{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreib5elolblzarankxxuvj35h7u7264tjctc42qx3h7o3rekviqbdbe",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mp7j4o6isx32"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreid2pc334ekwwsuak6osyzo6xwixzeplxk76lrbkcm4wimp437o4ue"
},
"mimeType": "image/webp",
"size": 50652
},
"path": "/apazik/genai-isnt-just-for-product-teams-dg7",
"publishedAt": "2026-06-26T17:27:20.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"devops",
"aws",
"GenAI for Ops Demo Library",
"AWS DevOps Agent",
"DevOps Agent Space",
"Libreswan",
"GoBGP",
"Amazon Linux 2023",
"Submit a pull request",
"Take the quick survey"
],
"textContent": "Most GenAI use cases today focus on product teams. Build a customer chatbot. Generate marketing copy. Develop a new product feature.\n\nBut DevOps, Site Reliability Engineering (SRE), and Cloud Center of Excellence (CCoE) teams have use cases too. Investigate an incident. Create a runbook. Generate cost optimization recommendations.\n\nThese are repetitive tasks that take time away from reliability improvements.\n\nIt's not that operations teams don't see the potential of GenAI. They're waiting for something useful — something that fits into their actual workflows, with code they can deploy and evaluate.\n\nThe gap is relevance, not readiness. What's missing is:\n\n * Practical use cases matched to real operational tasks\n * Deployable code samples that are production-ready\n * Flexible patterns that can be customized\n\n\n\nThe GenAI for Ops Demo Library was created to address this.\n\n## Introducing the GenAI for Ops Demo Library\n\nThe GenAI for Ops Demo Library is a collection of deployable code samples that demonstrate how generative AI can solve real operational challenges across security, cost optimization, resilience, and automation use cases. 
You can deploy each demo as-is or customize it for your environment.\n\nThere are currently 12 available demos:\n\nUse Case | Demos\n---|---\nSecurity | AI-Powered Security Posture with Prowler + DevOps Agent, AI Incident Response Playbook Builder\nCost Optimization | AI-Powered Graviton Migration Assessment, AWS GenAI Cost Optimization Kiro Power\nOperations Automation | AI-Powered Technical Documentation Generation, AI-Powered Legacy System Automation, AI Password Reset Chatbot, AWS Services Lifecycle Tracker, AI Lambda Runtime Migration Assistant\nObservability | Intelligent EKS Incident Investigation with AWS DevOps Agent, Intelligent AWS Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent\nResilience | Natural Language Chaos Engineering with AWS FIS\n\n## Technical Stack\n\nEach demo is built on AWS services and AI integration patterns familiar to operations teams:\n\n * **Amazon CloudWatch** for metrics, logs, and alarms\n * **AWS Lambda** for serverless compute\n * **Amazon Simple Notification Service (SNS)** for event routing\n * **AWS Cloud Development Kit (CDK)** for infrastructure as code\n * **Amazon Bedrock** and **Amazon Nova** for foundation model access\n * **Amazon Bedrock AgentCore** for multi-step AI orchestration\n * **Model Context Protocol (MCP) servers** for standardized tool integration\n\n\n\n## Demo Structure\n\nEach demo includes a deployment guide, technical design document, deployment script(s), and cost estimates with optimization tips.\n\nTo show how these demos work in practice, here's a walkthrough of one.\n\n## Example: Site-to-Site VPN Tunnel Investigation with AWS DevOps Agent\n\nAWS Site-to-Site VPN tunnels fail for a lot of reasons: pre-shared key mismatches, IKE proposal incompatibilities, dead-peer-detection timeouts, Border Gateway Protocol (BGP) session drops, route withdrawals, and throughput degradation. 
When a tunnel goes down at 2:00 AM, your on-call SRE has to read through CloudWatch metrics, VPN tunnel logs, and IPsec config to figure out what happened. That takes time and drives up your Mean Time to Resolution (MTTR). This demo shows how AWS DevOps Agent autonomously triages these and other incidents, providing root cause analysis and actions for resolution.\n\n### Overview\n\nThe demo deploys a self-contained VPN environment and creates a DevOps Agent Space to investigate failures automatically.\n\nWhen a tunnel fails or performance drops, DevOps Agent:\n\n 1. Reads VPN tunnel logs from CloudWatch and correlates metrics across both tunnels\n 2. Queries a self-contained MCP server for business context (service dependencies, cost impact, compliance status)\n 3. Produces a root cause analysis (RCA) and detailed mitigation plan\n\n\n\n### Architecture\n\nThe demo has three layers:\n\n**Network layer**\n\n * An Amazon Virtual Private Cloud (VPC) (10.0.0.0/16) and a simulated on-premises VPC (172.16.0.0/16) linked by a Site-to-Site VPN with two IPsec tunnels\n * A customer gateway on an Amazon EC2 instance running Libreswan for IPsec and GoBGP for BGP on Amazon Linux 2023\n\n\n\n**Monitoring layer**\n\n * CloudWatch alarms that monitor tunnel state, performance, and other failure conditions\n * An SNS topic that triggers a Lambda function, which sends a webhook to DevOps Agent\n\n\n\n**Intelligence layer**\n\n * A DevOps Agent Space that lets DevOps Agent access resources and investigate VPN operational issues\n\n\n\n### How it Works\n\n\n Tunnel Fails / Performance Degrades\n ↓\n CloudWatch Alarm Changes State\n ↓\n SNS Notification Received\n ↓\n Lambda Function Invoked\n ↓\n DevOps Agent Investigation Starts\n ↓\n Investigation Completes\n → Root Cause Identified\n → Remediation Plan Generated\n\n\n### Common Failure Scenarios\n\nThe demo includes 10 failure scenarios that you can inject and then watch DevOps Agent investigate:\n\n**IKE**\n\n * PSK mismatch (key rotation gone wrong)\n * 
DPD timeout (firewall blocking IKE traffic)\n * Proposal mismatch (incompatible DH group)\n * Traffic selector mismatch (subnet change breaking BGP)\n * Tunnel shutdown (customer gateway-initiated teardown)\n\n\n\n**BGP**\n\n * BGP daemon down\n * ASN mismatch after maintenance\n * Hold timer expired (blocked keepalives)\n\n\n\n**Other**\n\n * BGP route withdrawal (prefix no longer advertised)\n * Throughput degradation (performance drops while tunnels stay up)\n\n\n\n### The Results\n\n**Faster incident resolution.** Autonomous investigation of VPN failures and performance degradation reduces MTTR from hours to minutes.\n\n**Fewer repeat incidents.** Targeted recommendations address incident root causes and strengthen VPN tunnel resilience.\n\n**Greater operational efficiency.** Less time spent on repetitive investigations and more time spent on high-value work.\n\n### Cost Estimate\n\nEach demo is built with the AWS Well-Architected Framework Cost Optimization pillar in mind, so running costs stay minimal.\n\nResource | Hourly Cost\n---|---\nVPN connection (1.25 Gbps) | $0.05\n2× t3.micro EC2 instances | $0.03\n4× Public IPv4 addresses | $0.02\n4× CloudWatch alarms | < $0.01\nLambda, SNS, CloudWatch | < $0.01\n**Total** | **~$0.12/hour**\n\n_This specific demo is designed to be deployed, tested, and torn down. If left running continuously, the cost is estimated at ~$88/month ($0.12 × 730 hours)._\n\n## Get Started\n\n 1. **Explore** : Browse the demo library and choose a demo that aligns with your use case\n 2. **Try** : Deploy the demo in your AWS account\n 3. **Contribute** : Submit a pull request with your demo\n 4. **Feedback** : Take the quick survey and share your feedback\n\n",
"title": "GenAI Isn't Just for Product Teams"
}