Recommended Setup¶
After installing HolmesGPT and running your first investigation, connect your data sources so Holmes can perform deeper investigations.
How Holmes Works¶
HolmesGPT is an AI troubleshooting agent that investigates issues by pulling data from your existing observability stack. The more data sources you connect, the more thoroughly Holmes can investigate — correlating metrics with logs, tracing infrastructure changes to application failures, and building a complete picture of what went wrong.
Holmes works across cloud, on-premise, and hybrid environments. If you use Kubernetes, the Kubernetes toolsets are enabled automatically. But Kubernetes is not required — Holmes works equally well with Prometheus, Datadog, Elasticsearch, AWS, GCP, databases, and many other data sources. Configure the toolsets that match your stack.
1. Connect a Metrics Provider¶
Metrics give Holmes visibility into trends over time. Without metrics, Holmes can still investigate using logs and infrastructure state, but it won't be able to spot gradual degradation or correlate historical information as well. Metrics are also critical to answering numerical questions, like 'what is the error rate for service xyz?'
Connect whichever metrics platform you already use:
| Platform | Setup Guide | Notes |
|---|---|---|
| Prometheus | Setup | Most common. Works with self-hosted, Grafana Cloud (Mimir), AWS AMP, Azure Managed Prometheus, Google Managed Prometheus, and Coralogix PromQL |
| Datadog | Setup | Enable datadog/metrics (and optionally datadog/logs, datadog/traces, datadog/general) |
| New Relic | Setup | Uses NRQL for metrics, traces, and logs in one toolset |
| Coralogix | Setup | For Coralogix-native log and metrics queries |
Quick example (Prometheus):
Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:
toolsets:
prometheus/metrics:
enabled: true
config:
prometheus_url: http://prometheus-server.monitoring:9090
When using the standalone Holmes Helm Chart, update your values.yaml:
toolsets:
prometheus/metrics:
enabled: true
config:
prometheus_url: http://prometheus-server.monitoring:9090
Apply the configuration:
helm upgrade holmes holmes/holmes --values=values.yaml
When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:
holmes:
toolsets:
prometheus/metrics:
enabled: true
config:
prometheus_url: http://prometheus-server.monitoring:9090
Apply the configuration:
helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
2. Connect Centralized Logging¶
Centralized logging gives Holmes access to historical logs, cross-service log correlation, and full-text search across your environment. This is especially important for investigating issues where logs from the affected service are no longer available — crashed processes, terminated containers, rotated log files, or services running on VMs and bare metal.
| Platform | Setup Guide | Notes |
|---|---|---|
| Loki | Setup | Can connect through Grafana or directly |
| Elasticsearch / OpenSearch | Setup | elasticsearch/data for log search, elasticsearch/cluster for cluster health |
| Datadog Logs | Setup | Enable datadog/logs alongside metrics |
| Splunk | Setup | Via MCP server |
Quick example (Loki via Grafana):
Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:
toolsets:
grafana/loki:
enabled: true
config:
api_key: <your-grafana-token>
api_url: https://your-grafana.net
grafana_datasource_uid: <loki-datasource-uid>
When using the standalone Holmes Helm Chart, update your values.yaml:
toolsets:
grafana/loki:
enabled: true
config:
api_key: <your-grafana-token>
api_url: https://your-grafana.net
grafana_datasource_uid: <loki-datasource-uid>
Apply the configuration:
helm upgrade holmes holmes/holmes --values=values.yaml
When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:
holmes:
toolsets:
grafana/loki:
enabled: true
config:
api_key: <your-grafana-token>
api_url: https://your-grafana.net
grafana_datasource_uid: <loki-datasource-uid>
Apply the configuration:
helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
3. Connect Your Cloud Provider¶
Cloud provider access lets Holmes investigate infrastructure-level causes — misconfigured security groups, IAM permission changes, database failovers, load balancer issues, DNS misconfigurations, or resource quota limits. Many production incidents involve changes at the infrastructure layer that aren't visible from application metrics or logs alone.
| Platform | Setup Guide | Notes |
|---|---|---|
| AWS | Setup | Read-only access to EC2, RDS, ELB, CloudWatch, CloudTrail, and more via MCP server |
| GCP | Setup | Logging, monitoring, traces, gcloud CLI, and storage via MCP server |
| Azure | Setup | Azure resource management via MCP server |
4. Connect Grafana Dashboards (Bonus)¶
If you use Grafana, connecting the dashboards toolset lets Holmes see what you're already monitoring — it can find relevant dashboards, extract PromQL queries from panels, and use them during investigations.
| Platform | Setup Guide |
|---|---|
| Grafana Dashboards | Setup |
Verify Your Setup¶
After configuring your data sources, verify everything is connected:
# List all enabled toolsets
holmes toolset list
# Test with a real investigation
holmes ask "what is the health of my environment?"
Next Steps¶
- Interactive Mode - Use Holmes interactively for follow-up questions
- Investigating Prometheus Alerts - Automate alert investigation
- All Built-in Toolsets - Browse the full list of integrations
- Custom Toolsets - Create integrations for proprietary tools