Monitoring

Stack: Grafana + Prometheus + Node Exporter + cAdvisor + Uptime Kuma.

Stack Components

Component	Role	Port
Grafana	Dashboard UI	3000
Prometheus	Metrics collection & storage	9090
Node Exporter	Host-level metrics	9100
cAdvisor	Container-level metrics	8080
Uptime Kuma	Uptime / availability monitoring	3001

Prometheus Configuration

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Grafana Dashboards

Dashboard	Grafana.com ID	Purpose
Node Exporter Full	1860	Host metrics
cAdvisor	14282	Container resource usage
Docker & System	893	Docker host overview
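
These dashboards read from a Prometheus data source, which Grafana can load at startup through its provisioning directory. The fragment below is a minimal sketch, not the configuration actually deployed here: the file name is hypothetical, and the URL assumes Grafana runs on the same Docker network as a service named prometheus.

```yaml
# Hypothetical file: /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend proxies the queries
    url: http://prometheus:9090   # assumed service name; port from the components table
    isDefault: true
```

With the data source in place, the dashboards can be imported by their Grafana.com ID and pointed at it during import.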

Key Alert Rules

Alert	Condition	Severity
High CPU	> 85% for 5 min	Warning
High RAM	> 90% for 5 min	Warning
Disk usage	> 80% on any mount	Warning
Container down	0 running on expected service	Critical
Host unreachable	Node Exporter scrape fails	Critical
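
The conditions in the table could be expressed as a Prometheus rule file along the following lines. This is a sketch under stated assumptions, not the rules actually deployed: the file name, group name, alert names, the filesystem type filter, and the container name my-service are all illustrative. Prometheus would only load the file if prometheus.yml listed it under rule_files, and notifications would additionally require an Alertmanager; neither appears in the configuration shown earlier.

```yaml
# Hypothetical file: alerts.yml
groups:
  - name: host-and-container-alerts
    rules:
      - alert: HighCPU
        # Non-idle CPU percentage per host, averaged across cores
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning

      - alert: HighRAM
        # Percentage of memory not available to applications
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning

      - alert: DiskUsage
        # Evaluated per mount; excluding tmpfs and overlay is an assumption
        expr: (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 80
        labels:
          severity: warning

      - alert: ContainerDown
        # One possible approach: fire when cAdvisor reports no series for the
        # expected container. "my-service" is a placeholder name.
        expr: absent(container_last_seen{name="my-service"})
        labels:
          severity: critical

      - alert: HostUnreachable
        # The "up" metric is 0 when a scrape of the target fails
        expr: up{job="node-exporter"} == 0
        labels:
          severity: critical
```

The job label in the last rule matches the node-exporter job name from the scrape configuration. The container rule needs one entry per expected service.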

Retention

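Prometheus keeps 30 days of metrics (set via --storage.tsdb.retention.time=30d).

To check how much disk space the stored metrics currently use (the command assumes the container is named prometheus and stores its data under /prometheus):

docker exec prometheus du -sh /prometheus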
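
If the stack runs under Docker Compose, the retention flag might be passed as in the fragment below. This is a sketch, not the project's actual compose file: the image tag, mount paths, and volume name are assumptions. Setting command replaces the image's default arguments, so the config file and storage path flags are repeated alongside the retention flag.

```yaml
services:
  prometheus:
    image: prom/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=30d   # retention period stated above
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro   # assumed host path
      - prometheus-data:/prometheus                          # assumed volume name
    ports:
      - "9090:9090"

volumes:
  prometheus-data:
```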