HolmesGPT
Open-source SRE agent for investigating production incidents across any infrastructure — Kubernetes, VMs, cloud services, databases, and more.
New: Operator Mode - Find Problems 24/7 in the Background
Most AI agents are great at troubleshooting problems, but still need a human to notice something is wrong and trigger an investigation. Operator mode fixes that — HolmesGPT runs in the background 24/7, spots problems before your customers notice, and messages you in Slack with the fix. Connect the GitHub integration and it can even open PRs to fix what it finds.
While the operator itself runs in Kubernetes, health checks can query any data source Holmes is connected to — VMs, cloud services, databases, SaaS platforms, and more.
- Deployment Verification - Deploy a health check alongside your app to verify the new version is healthy
- Scheduled Health Checks - Continuously monitor services and catch regressions automatically
Quick Start
- Run HolmesGPT from your terminal
- Use through a web interface or K9s plugin
- Compare LLM performance across 150+ test scenarios
Already Installed?
Connect your data sources to unlock deeper investigations with metrics, logs, and cloud provider access.
Need Help?
- Join our Slack - Get help from the community
- Request features on GitHub - Suggest improvements or report bugs
We are a Cloud Native Computing Foundation sandbox project.
