jpskill.com

prometheus-alertmanager

Configure Prometheus Alertmanager for alert routing, grouping, silencing, and notification delivery. Use when a user needs to set up alert receivers (Slack, PagerDuty, email), define routing trees, manage silences and inhibition rules, or troubleshoot alert delivery pipelines.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are all automatic.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o prometheus-alertmanager.zip https://jpskill.com/download/15294.zip && unzip -o prometheus-alertmanager.zip && rm prometheus-alertmanager.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/15294.zip -OutFile "$d\prometheus-alertmanager.zip"; Expand-Archive "$d\prometheus-alertmanager.zip" -DestinationPath $d -Force; ri "$d\prometheus-alertmanager.zip"

When it finishes, restart Claude Code. From then on, simply make a relevant request in plain language and the skill activates automatically.

💾 Prefer to download manually? (for anyone who finds the command difficult)
  1. Use the download button to get prometheus-alertmanager.zip
  2. Double-click the ZIP file to extract it; this creates a prometheus-alertmanager folder
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
  4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this skill.

🎯 What this Skill can do

Read the description below to learn what this Skill does for you. It activates automatically whenever you ask Claude for something in this area.

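📦 How to install (3 steps)

  1. Click the "Download" button above to get the .skill file
  2. Rename the file extension from .skill to .zip and extract it (macOS can extract it automatically)
  3. Place the extracted folder in .claude/skills/ under your home folder
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You don't need to say "use this Skill…": it is invoked automatically for related requests.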

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1
📖 Original SKILL.md read by Claude

This body is the original text (in English or Chinese) that the AI (Claude) reads. Japanese translations are being added over time.

Prometheus Alertmanager

Overview

Configure Alertmanager to handle alerts from Prometheus, route them to the correct receivers, group related alerts, suppress duplicates, and manage silences. Covers routing trees, receiver configuration, inhibition rules, and high-availability setup.

Instructions

Task A: Basic Alertmanager Configuration

# alertmanager.yml — Main Alertmanager configuration
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: '<SMTP_PASSWORD>'
  slack_api_url: 'https://hooks.slack.com/services/T00/B00/XXXX'
  pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'

route:
  receiver: 'default-slack'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Sibling routes are checked top to bottom and the first match wins, so the
    # team-specific payments route must come before the generic severity routes.
    - match_re:
        service: ^(payment|billing)$
      receiver: 'payments-team'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-payments'
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      group_wait: 10s
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: 'slack-warnings'
      repeat_interval: 4h

receivers:
  - name: 'default-slack'
    slack_configs:
      - channel: '#ops-alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: |-
          {{ range .Alerts }}
          *{{ .Labels.alertname }}* on {{ .Labels.instance }}
          {{ .Annotations.description }}
          {{ end }}

  - name: 'pagerduty-critical'
    pagerduty_configs:
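      # routing_key is the PagerDuty Events API v2 integration key and pairs with
      # the v2 pagerduty_url above; the legacy service_key field uses the v1 API.
      - routing_key: '<PD_ROUTING_KEY>'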
        severity: '{{ if eq .CommonLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
        description: '{{ .CommonLabels.alertname }}: {{ .CommonAnnotations.summary }}'

  - name: 'slack-warnings'
    slack_configs:
      - channel: '#ops-warnings'
        send_resolved: true

  - name: 'payments-team'
    slack_configs:
      - channel: '#payments-alerts'
        send_resolved: true

  - name: 'pagerduty-payments'
    pagerduty_configs:
      - routing_key: '<PD_PAYMENTS_KEY>'
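
  # Email receiver sketch (hypothetical address). It reuses the global smtp_*
  # settings above; reference it from a route before it receives anything.
  # email_configs default to send_resolved: false, so enable it explicitly.
  - name: 'email-oncall'
    email_configs:
      - to: 'oncall@example.com'
        send_resolved: true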

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']
  - source_match:
      alertname: 'ClusterDown'
    target_match_re:
      alertname: '.+'
    equal: ['cluster']
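Since Alertmanager v0.22, match, match_re, source_match and target_match are deprecated in favor of matchers, which take PromQL-style label matchers such as severity = "critical". The sketch below assumes v0.22 or later and expresses the same routing tree and inhibition rules in that syntax. Receivers are unchanged and omitted. The payments route is listed first because sibling routes are evaluated in order and the first match wins. Alertmanager anchors regex matchers, so no ^...$ is needed.

```yaml
# Routing tree and inhibition rules using `matchers` (Alertmanager v0.22+).
route:
  receiver: 'default-slack'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # First match wins (unless `continue: true`), so team routes go first.
    - matchers:
        - service =~ "payment|billing"
      receiver: 'payments-team'
      routes:
        - matchers:
            - severity = "critical"
          receiver: 'pagerduty-payments'
    - matchers:
        - severity = "critical"
      receiver: 'pagerduty-critical'
      group_wait: 10s
      repeat_interval: 1h
    - matchers:
        - severity = "warning"
      receiver: 'slack-warnings'
      repeat_interval: 4h

inhibit_rules:
  # A critical alert mutes the matching warning for the same alert/cluster/service.
  - source_matchers:
      - severity = "critical"
    target_matchers:
      - severity = "warning"
    equal: ['alertname', 'cluster', 'service']
  # ClusterDown mutes every other alert from the same cluster.
  - source_matchers:
      - alertname = "ClusterDown"
    target_matchers:
      - alertname =~ ".+"
    equal: ['cluster']
```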

Task B: Define Prometheus Alert Rules

# prometheus/rules/alerts.yml — Alert rules for Prometheus
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          / sum(rate(http_requests_total[5m])) by (service) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.service }}"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.service }}."

      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 2s on {{ $labels.service }}"
          description: "P99 latency is {{ $value }}s for {{ $labels.service }}."

      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 3
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Pod {{ $labels.pod }} in {{ $labels.namespace }} restarted {{ $value }} times in 15m."

      - alert: DiskSpaceRunningLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Disk space below 10% on {{ $labels.instance }}"
          description: "{{ $labels.mountpoint }} has {{ $value | humanizePercentage }} free."
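These rules only reach Alertmanager once Prometheus loads the rule file and knows where to send alerts. The prometheus.yml excerpt below is a minimal sketch built on two assumptions: the file sits next to the rules/ directory, and Alertmanager is reachable at alertmanager:9093. Before reloading Prometheus, promtool check rules prometheus/rules/alerts.yml validates the rule syntax.

```yaml
# prometheus.yml (excerpt). Paths in rule_files are relative to this file.
global:
  evaluation_interval: 1m   # how often alerting rules are evaluated

rule_files:
  - 'rules/alerts.yml'

alerting:
  alertmanagers:
    - static_configs:
        # Assumed address: use whatever host:port Prometheus can reach.
        - targets: ['alertmanager:9093']
```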

Task C: Manage Silences

# Create a silence via Alertmanager API — Maintenance window
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      { "name": "instance", "value": "web-01:9090", "isRegex": false },
      { "name": "severity", "value": "warning|critical", "isRegex": true }
    ],
    "startsAt": "2026-02-20T02:00:00Z",
    "endsAt": "2026-02-20T06:00:00Z",
    "createdBy": "marta",
    "comment": "Scheduled maintenance on web-01"
  }'
# List active silences
curl -s http://localhost:9093/api/v2/silences | jq '.[] | select(.status.state=="active") | {id: .id, comment: .comment, endsAt: .endsAt}'
# Delete (expire) a silence
curl -X DELETE http://localhost:9093/api/v2/silence/<SILENCE_ID>

Task D: High Availability Setup

# docker-compose.yml — Alertmanager HA cluster with 3 instances
services:
  alertmanager-1:
    image: prom/alertmanager:v0.27.0
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--cluster.peer=alertmanager-2:9094'
      - '--cluster.peer=alertmanager-3:9094'
      - '--storage.path=/alertmanager'
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

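  alertmanager-2:
    image: prom/alertmanager:v0.27.0
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--cluster.peer=alertmanager-1:9094'
      - '--cluster.peer=alertmanager-3:9094'
      - '--storage.path=/alertmanager'
    # Every replica must mount the same config; otherwise it falls back to
    # the default alertmanager.yml bundled in the image.
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

  alertmanager-3:
    image: prom/alertmanager:v0.27.0
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--cluster.peer=alertmanager-1:9094'
      - '--cluster.peer=alertmanager-2:9094'
      - '--storage.path=/alertmanager'
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml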
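For the cluster to deduplicate, Prometheus should send every alert to every replica directly rather than through a load balancer. The replicas then gossip over port 9094 to share notification state and silences. The prometheus.yml excerpt below is a sketch that assumes Prometheus runs on the same Compose network, so the service names above resolve. Port 9093 is each container's internal web port, even though only alertmanager-1 publishes it to the host.

```yaml
# prometheus.yml (excerpt): list each Alertmanager replica individually.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager-1:9093'
            - 'alertmanager-2:9093'
            - 'alertmanager-3:9093'
```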

Task E: Test Alert Configuration

# Send a test alert to Alertmanager
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "critical",
      "service": "payment",
      "instance": "web-01:9090"
    },
    "annotations": {
      "summary": "Test alert — please ignore",
      "description": "This is a test alert to verify routing."
    },
    "startsAt": "2026-02-19T23:00:00Z"
  }]'
# Check which route an alert matches using amtool
amtool config routes test --config.file=alertmanager.yml \
  severity=critical service=payment
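amtool only exercises the Alertmanager side. The Task B rules themselves can be unit-tested offline with promtool test rules. The sketch below assumes the test file sits next to alerts.yml. It feeds synthetic counters for a hypothetical checkout service: 10 errors and 90 successes per minute, which is a 10% error rate and above the 5% threshold.

```yaml
# alerts_test.yml: run with `promtool test rules alerts_test.yml`
rule_files:
  - alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Hypothetical "checkout" service: 10 errors + 90 successes per minute.
    input_series:
      - series: 'http_requests_total{service="checkout", status="500"}'
        values: '0+10x20'
      - series: 'http_requests_total{service="checkout", status="200"}'
        values: '0+90x20'
    alert_rule_test:
      # Well past the 5m `for` duration, so the alert should be firing.
      - eval_time: 15m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              severity: critical
              service: checkout
            exp_annotations:
              summary: "High error rate on checkout"
              description: "Error rate is 10% for checkout."
```

promtool compares both labels and annotations exactly, so the expected description must match the rendered template.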

Best Practices

  • Use group_by to batch related alerts into single notifications and reduce noise
  • Always set send_resolved: true on Slack receivers so teams know when issues clear
  • Use inhibition rules to suppress warnings when a critical alert already fires for the same target
  • Test routing with amtool config routes test before deploying changes
  • Keep group_wait short (10-30s) for critical alerts and longer for warnings
  • Use time-based muting for known maintenance windows instead of disabling alerts
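For recurring maintenance, a route can be muted on a schedule instead of recreating a silence like the one in Task C each time. The sketch below assumes Alertmanager v0.24 or later, which uses a top-level time_intervals key; older releases use a top-level mute_time_intervals key instead. The weekly Sunday window is hypothetical.

```yaml
# Hypothetical recurring window: Sundays 02:00-06:00 (UTC by default).
time_intervals:
  - name: 'weekly-maintenance'
    time_intervals:
      - weekdays: ['sunday']
        times:
          - start_time: '02:00'
            end_time: '06:00'

route:
  receiver: 'default-slack'
  routes:
    - matchers:
        - severity = "warning"
      receiver: 'slack-warnings'
      # Notifications from this route are not sent while the interval is active.
      mute_time_intervals: ['weekly-maintenance']
```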