infrastructure-monitoring

aj-geddes/useful-ai-prompts · updated Apr 8, 2026

$ npx skills add https://github.com/aj-geddes/useful-ai-prompts --skill infrastructure-monitoring

skill.md

Infrastructure Monitoring

Overview

Implement comprehensive infrastructure monitoring to track system health, performance metrics, and resource utilization with alerting and visualization across your entire stack.

When to Use

  • Real-time performance monitoring
  • Capacity planning and trends
  • Incident detection and alerting
  • Service health tracking
  • Resource utilization analysis
  • Performance troubleshooting
  • Compliance and audit trails
  • Historical data analysis

Quick Start

Minimal working example:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: "infrastructure-monitor"
    environment: "production"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - "alerts.yml"
  - "rules.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
# ... (see reference guides for full implementation)
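
The configuration above lists "alerts.yml" under rule_files but does not show its contents. As a minimal sketch of what such a file might contain: the group name, alert names, thresholds, and durations below are illustrative assumptions, not taken from the reference guides, and the second rule assumes node_exporter metrics are being scraped, which the quick-start config does not set up.

```yaml
# alerts.yml (sketch; names, thresholds, and durations are assumptions)
groups:
  - name: infrastructure-alerts
    rules:
      # `up` is recorded by Prometheus for every scrape target:
      # 1 if the last scrape succeeded, 0 if it failed.
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Target {{ $labels.instance }} of job {{ $labels.job }} has failed scrapes for more than 2 minutes."

      # Assumes node_exporter is scraped; it exposes node_cpu_seconds_total.
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
```

The `for` clause keeps an alert pending until the condition has held for the whole duration, which helps avoid paging on a single failed scrape.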

Reference Guides

Detailed implementations in the references/ directory:

  • Prometheus Configuration
  • Alert Rules
  • Alertmanager Configuration
  • Grafana Dashboard
  • Monitoring Deployment
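
To give a rough idea of what the Alertmanager side involves, here is a minimal sketch of an alertmanager.yml for the instance that the quick-start config expects at localhost:9093. The receiver name, the timing values, and the webhook URL are hypothetical placeholders, not values from the reference guide.

```yaml
# alertmanager.yml (sketch; receiver name, timings, and URL are assumptions)
route:
  receiver: "default"
  # Alerts sharing these labels are batched into one notification.
  group_by: ["alertname", "instance"]
  group_wait: 30s       # wait before sending the first notification for a new group
  group_interval: 5m    # wait before notifying about new alerts added to a group
  repeat_interval: 4h   # wait before re-sending a still-firing alert

receivers:
  - name: "default"
    webhook_configs:
      # Hypothetical endpoint; replace with your own notification receiver.
      - url: "http://localhost:5001/alerts"
```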

Best Practices

✅ DO

  • Follow established patterns and conventions
  • Write clean, maintainable code
  • Add appropriate documentation
  • Test thoroughly before deploying

❌ DON'T

  • Skip testing or validation
  • Ignore error handling
  • Hard-code configuration values
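
One way to avoid hard-coding scrape targets in prometheus.yml is file-based service discovery, where Prometheus reads targets from separate files and reloads them when they change. A minimal sketch follows; the job name, file path, hostname, and labels are hypothetical.

```yaml
# prometheus.yml fragment (sketch): load targets from files instead of static_configs
scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "targets/*.yml"     # hypothetical path, relative to the config file
        refresh_interval: 1m    # how often the files are re-read
```

```yaml
# targets/node.yml (sketch; hostname and label values are placeholders)
- targets: ["host-a.example.internal:9100"]   # 9100 is node_exporter's default port
  labels:
    environment: "production"
```

Target files can then be generated or edited per environment without touching the main configuration.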