jpskill.com

gcp-gke

A Skill for building and operating production-ready Kubernetes clusters with Google Kubernetes Engine (GKE) Autopilot, including cost optimization and monitoring.

📜 Original English description (reference)

Provision and operate production-ready Google Kubernetes Engine (GKE) clusters using Autopilot as the golden path. Covers Autopilot vs Standard, private clusters, Workload Identity, autoscaling, GPU/TPU node pools for AI inference, cost optimization with Spot VMs, and observability via Managed Prometheus.

⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and places the files automatically.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o gcp-gke.zip https://jpskill.com/download/14932.zip && unzip -o gcp-gke.zip && rm gcp-gke.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/14932.zip -OutFile "$d\gcp-gke.zip"; Expand-Archive "$d\gcp-gke.zip" -DestinationPath $d -Force; ri "$d\gcp-gke.zip"

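When it finishes, restart Claude Code. The Skill then activates automatically when you make a related request in plain language, for example "Create a production GKE Autopilot cluster".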

💾 Manual download (if you prefer not to use the command line)
  1. Press the blue button below to download gcp-gke.zip
  2. Double-click the ZIP file to extract it; this creates a gcp-gke folder
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
  4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.

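📦 How to install (3 steps)

  1. Press the "Download" button above to get the .skill file
  2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
  3. Place the extracted folder in .claude/skills/ under your home folder
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.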

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English text Claude reads)

GCP Google Kubernetes Engine (GKE)

Overview

GKE is Google Cloud's managed Kubernetes platform. It runs the control plane for you, automates upgrades, and ships two operating modes: Autopilot (Google manages nodes; pay per pod) and Standard (you manage node pools; pay per node). Default to Autopilot — it eliminates node-level toil and is the recommended golden path for production.

Instructions

Autopilot vs Standard

| | Autopilot | Standard |
|---|---|---|
| Node management | Google | You |
| Billing model | Per-pod resources | Per-node VM |
| Node pool config | None | You configure |
| Best for | Most workloads | DaemonSets, GPUs with custom drivers, privileged pods |
| Workload Identity | Required | Recommended |

Use Standard only when you genuinely need node-level access (custom kernel, certain GPU configs, privileged DaemonSets). Otherwise, Autopilot.

Quick Start (Autopilot)

gcloud services enable container.googleapis.com

gcloud container clusters create-auto prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --enable-private-nodes \
  --network=default --subnetwork=default

gcloud container clusters get-credentials prod-cluster --region=us-central1

kubectl create deployment hello \
  --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
kubectl expose deployment hello --port=80 --target-port=8080 --type=LoadBalancer
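
After the Service is exposed, the load balancer can take a minute or more to receive an external IP. The following is a minimal sketch for waiting on it, assuming the `hello` Deployment and Service created above. The helper name `wait_for_lb_ip` and its retry settings are illustrative choices, not part of the Skill.

```shell
# Poll a Service until its load balancer has an external IP, then print the IP.
# wait_for_lb_ip, the attempt count, and the delay are hypothetical helper choices.
wait_for_lb_ip() {
  local service="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local ip=""
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    ip="$(kubectl get service "$service" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for an external IP on service/$service" >&2
  return 1
}

# Usage (requires cluster credentials from get-credentials):
#   kubectl rollout status deployment/hello
#   curl "http://$(wait_for_lb_ip hello)/"
```

The function only reads cluster state, so it is safe to re-run if the first attempt times out.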

Production Cluster Defaults

gcloud container clusters create-auto prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --enable-private-nodes \
  --enable-master-authorized-networks \
  --master-authorized-networks=10.0.0.0/8,YOUR_OFFICE_CIDR \
  --network=prod-vpc --subnetwork=prod-subnet \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services \
  --workload-pool=my-project.svc.id.goog \
  --enable-shielded-nodes
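
The command above assumes that a VPC named `prod-vpc` and a subnet named `prod-subnet`, with secondary ranges called `pods` and `services`, already exist. A sketch of creating them is shown here. Every CIDR range is a placeholder assumption, and the `run` dry-run wrapper is a hypothetical convenience so the commands can be reviewed before anything is created.

```shell
# Create the VPC and subnet that the cluster command refers to.
# All CIDR ranges below are placeholder assumptions; choose ranges that fit your network plan.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_prod_network() {
  run gcloud compute networks create prod-vpc --subnet-mode=custom
  run gcloud compute networks subnets create prod-subnet \
    --network=prod-vpc \
    --region=us-central1 \
    --range=10.10.0.0/20 \
    --secondary-range=pods=10.20.0.0/14,services=10.30.0.0/20 \
    --enable-private-ip-google-access
}

# Usage: review with `DRY_RUN=1 create_prod_network`, then apply with `DRY_RUN=0 create_prod_network`.
```

Private Google Access is enabled on the subnet so that nodes without public IPs can still reach Google APIs.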

Workload Identity (Pods → GCP APIs without keys)

# Create a Google Service Account for the workload
gcloud iam service-accounts create orders-api

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:orders-api@my-project.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"

# Bind the Kubernetes ServiceAccount to the GSA
gcloud iam service-accounts add-iam-policy-binding \
  orders-api@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[default/orders-api]"

# Kubernetes manifests: the annotated ServiceAccount and the Deployment that runs as it
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: orders-api@my-project.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  selector:
    matchLabels: { app: orders-api }
  template:
    metadata:
      labels: { app: orders-api }
    spec:
      serviceAccountName: orders-api  # → maps to GSA via Workload Identity
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/my-project/repo/orders-api:v1.4.2
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
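
One way to check the binding is a throwaway Pod that runs under the same Kubernetes ServiceAccount and asks the metadata server which identity it sees. This is a sketch under assumptions: the Pod name `wi-check` and the choice of the `google/cloud-sdk:slim` image are illustrative, not part of the Skill.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wi-check              # hypothetical name
  namespace: default
spec:
  serviceAccountName: orders-api   # same KSA as the Deployment
  restartPolicy: Never
  containers:
    - name: check
      image: google/cloud-sdk:slim
      # Ask the metadata server for the email of the identity this Pod runs as.
      command:
        - curl
        - -sS
        - -H
        - "Metadata-Flavor: Google"
        - http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
```

If the binding is in place, `kubectl logs wi-check` should print `orders-api@my-project.iam.gserviceaccount.com`. Delete the Pod afterwards with `kubectl delete pod wi-check`.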

Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 80 }
---
# PodDisruptionBudget: keep the service available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: orders-api }
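
For each metric, the HorizontalPodAutoscaler proposes ceil(currentReplicas × currentUtilization / targetUtilization), takes the largest proposal, and clamps it to the min/max bounds. The sketch below reproduces that arithmetic for the manifest above (70% CPU target, 80% memory target, 3 to 50 replicas). The load figures are made up, and the real controller's tolerance band and stabilization windows are not modeled.

```shell
# Integer ceiling of current * utilization / target.
hpa_desired() {
  local current="$1"
  local utilization="$2"
  local target="$3"
  echo $(( (current * utilization + target - 1) / target ))
}

# Largest per-metric proposal, clamped to the manifest's minReplicas/maxReplicas.
hpa_replicas() {
  local current="$1"
  local cpu_util="$2"
  local mem_util="$3"
  local min=3
  local max=50
  local by_cpu by_mem desired
  by_cpu="$(hpa_desired "$current" "$cpu_util" 70)"
  by_mem="$(hpa_desired "$current" "$mem_util" 80)"
  desired="$by_cpu"
  if [ "$by_mem" -gt "$desired" ]; then desired="$by_mem"; fi
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# Hypothetical load: 5 replicas averaging 90% CPU and 60% memory.
hpa_replicas 5 90 60   # prints 7
```

CPU drives the result here: ceil(5 × 90 / 70) = 7, while memory alone would propose ceil(5 × 60 / 80) = 4.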

Gateway API (Modern Ingress)

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: api-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders-route
spec:
  parentRefs: [{ name: external-gateway }]
  hostnames: ["api.example.com"]
  rules:
    - matches: [{ path: { type: PathPrefix, value: "/orders" } }]
      backendRefs:
        - name: orders-api
          port: 80
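
The HTTPRoute forwards to a Service named `orders-api` on port 80, which is not defined anywhere in this document. A minimal sketch of a matching Service follows. The `targetPort` value is an assumption about where the container listens.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-api
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: orders-api        # matches the Deployment's pod labels
  ports:
    - name: http
      port: 80             # the port the HTTPRoute backendRef points at
      targetPort: 8080     # assumption: the orders-api container listens on 8080
```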

GPU Inference Workload (Standard Mode)

# Create a node pool with L4 GPUs and Spot VMs for cheap inference
gcloud container node-pools create inference-l4 \
  --cluster=ml-cluster --region=us-central1 \
  --machine-type=g2-standard-8 \
  --accelerator=type=nvidia-l4,count=1,gpu-driver-version=LATEST \
  --num-nodes=0 --enable-autoscaling --min-nodes=0 --max-nodes=10 \
  --spot --node-taints=workload=inference:NoSchedule

# Deployment that tolerates the taint above and requests one L4 GPU
apiVersion: apps/v1
kind: Deployment
metadata: { name: vllm-llama }
spec:
  replicas: 1
  selector: { matchLabels: { app: vllm } }
  template:
    metadata: { labels: { app: vllm } }
    spec:
      tolerations:
        - key: workload
          operator: Equal
          value: inference
          effect: NoSchedule
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Meta-Llama-3-8B-Instruct", "--port", "8000"]  # Hugging Face repo ID
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: "24Gi"
              cpu: "4"

Cost Optimization

# Spot VMs for fault-tolerant batch
gcloud container node-pools create batch-spot \
  --cluster=prod-cluster --region=us-central1 \
  --machine-type=e2-standard-4 \
  --spot --num-nodes=0 --enable-autoscaling --max-nodes=20

# Compute Class definition (newer alternative to node pools)
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata: { name: spot-burst }
spec:
  priorities:
    - machineFamily: n4
      spot: true
    - machineFamily: n2
      spot: true
    - machineFamily: n2  # on-demand fallback
  nodePoolAutoCreation: { enabled: true }
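
A workload opts into a ComputeClass through a node selector on the `cloud.google.com/compute-class` label. The sketch below shows a batch Job selecting `spot-burst`. The Job name and image are hypothetical placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report            # hypothetical name
spec:
  backoffLimit: 4                 # Spot nodes can be reclaimed, so allow retries
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        cloud.google.com/compute-class: spot-burst
      containers:
        - name: worker
          image: us-central1-docker.pkg.dev/my-project/repo/report-job:v1  # hypothetical image
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
```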

Observability — Managed Prometheus

# Scrape app metrics with Google Cloud Managed Service for Prometheus
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata: { name: orders-api }
spec:
  selector:
    matchLabels: { app: orders-api }
  endpoints:
    - port: metrics
      interval: 30s
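
`port: metrics` in the PodMonitoring refers to a named container port, which the `orders-api` Deployment shown earlier does not declare. A sketch of the fragment that would need to be added to its pod template is below. The port number is an assumption.

```yaml
# Fragment of the orders-api Deployment (spec.template.spec)
containers:
  - name: api
    image: us-central1-docker.pkg.dev/my-project/repo/orders-api:v1.4.2
    ports:
      - name: metrics          # must match endpoints[].port in the PodMonitoring
        containerPort: 9090    # assumption: the app serves Prometheus metrics on 9090
```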

Examples

Example 1 — Stand up a production Autopilot cluster

User wants a hardened GKE cluster for a new service. Create an Autopilot cluster on the regular release channel with private nodes, master authorized networks, Workload Identity enabled, and Shielded Nodes. Wire the app's Kubernetes ServiceAccount to a GSA with the minimum IAM roles, deploy via a Deployment + HPA + PDB, and front it with the Gateway API for managed TLS.

Example 2 — Run a vLLM Llama 3 inference service on Spot L4s

User needs cheap LLM inference. Create a Standard cluster (Autopilot doesn't support custom GPU drivers consistently), add an L4 node pool with --spot and autoscaling 0→10, deploy vLLM with a node selector + toleration so only inference pods schedule there, and add Managed Prometheus scraping for token-throughput metrics.

Guidelines

  • Default to Autopilot — most workloads should never see a node pool config
  • Use the regular release channel in production; rapid for staging; stable only for highly conservative orgs
  • Always enable private nodes + master authorized networks; never expose the API server publicly
  • Workload Identity is mandatory — never put service account JSON keys in Secrets
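  • Set resource requests AND limits; Autopilot bills on requests and fills in default requests for containers that omit them, so unset values lead to unexpected sizing and cost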
  • Add a PodDisruptionBudget to every Deployment that serves traffic
  • Use Spot VMs / Compute Classes for batch and inference workloads to cut compute cost 60–90%
  • Use Gateway API for new ingress; the legacy Ingress resource is feature-frozen
  • Enable Managed Prometheus instead of self-hosting Prometheus
  • For multi-tenant clusters, isolate teams by namespace + RBAC + ResourceQuota
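
The last guideline can be sketched as three objects per team: a Namespace, a ResourceQuota that caps it, and a RoleBinding that grants the built-in `edit` ClusterRole inside it. The namespace name, quota sizes, and group address are hypothetical, and binding to a Google Group assumes Google Groups for RBAC is configured on the cluster.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-orders                # hypothetical team namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-orders-quota
  namespace: team-orders
spec:
  hard:                            # placeholder limits; size them per team
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-orders-edit
  namespace: team-orders
subjects:
  - kind: Group
    name: orders-team@example.com  # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                       # built-in role, scoped to this namespace by the RoleBinding
  apiGroup: rbac.authorization.k8s.io
```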