gcp-gke
A Skill for building and operating production-ready Kubernetes clusters with Google Kubernetes Engine (GKE) Autopilot, including cost optimization and monitoring.
📜 Original English description (for reference)
Provision and operate production-ready Google Kubernetes Engine (GKE) clusters using Autopilot as the golden path. Covers Autopilot vs Standard, private clusters, Workload Identity, autoscaling, GPU/TPU node pools for AI inference, cost optimization with Spot VMs, and observability via Managed Prometheus.
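Copy the matching command below and paste it into a terminal (macOS/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and puts the Skill in place in one step.

macOS / Linux:

```shell
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o gcp-gke.zip https://jpskill.com/download/14932.zip && unzip -o gcp-gke.zip && rm gcp-gke.zip
```

Windows (PowerShell):

```powershell
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/14932.zip -OutFile "$d\gcp-gke.zip"; Expand-Archive "$d\gcp-gke.zip" -DestinationPath $d -Force; ri "$d\gcp-gke.zip"
```

When it finishes, restart Claude Code. The Skill then triggers automatically whenever you make a related request in ordinary language, for example "set up a GKE cluster".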
💾 Manual download (if the command is difficult)
1. Press the blue button below to download gcp-gke.zip
2. Double-click the ZIP file to extract it; this creates a gcp-gke folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.
🎯 What this Skill can do
The description below explains what this Skill does for you. It triggers automatically when you ask Claude for something in this area.
📦 Installation (3 steps)
1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder into .claude/skills/ in your home folder
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. The Skill is invoked automatically for related requests, even if you never say "use this Skill...".
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Bundled files: 1
📜 Original SKILL.md (the English source that Claude reads)
GCP Google Kubernetes Engine (GKE)
Overview
GKE is Google Cloud's managed Kubernetes platform. It runs the control plane for you, automates upgrades, and ships two operating modes: Autopilot (Google manages nodes; pay per pod) and Standard (you manage node pools; pay per node). Default to Autopilot — it eliminates node-level toil and is the recommended golden path for production.
Instructions
Autopilot vs Standard
| | Autopilot | Standard |
|---|---|---|
| Node management | Google | You |
| Billing model | Per-pod resources | Per-node VM |
| Node pool config | None | You configure |
| Best for | Most workloads | DaemonSets, GPUs with custom drivers, privileged pods |
| Workload Identity | Required | Recommended |
Use Standard only when you genuinely need node-level access (custom kernel, certain GPU configs, privileged DaemonSets). Otherwise, Autopilot.
Quick Start (Autopilot)
```shell
# Enable the GKE API in the current project
gcloud services enable container.googleapis.com

# Create a regional Autopilot cluster with private nodes
gcloud container clusters create-auto prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --enable-private-nodes \
  --network=default --subnetwork=default

# Fetch kubeconfig credentials, then deploy and expose a sample app
gcloud container clusters get-credentials prod-cluster --region=us-central1
kubectl create deployment hello \
  --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
kubectl expose deployment hello --port=80 --target-port=8080 --type=LoadBalancer
```
Production Cluster Defaults
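```shell
# Hardened Autopilot cluster: private nodes, restricted control-plane access,
# and VPC-native networking on named secondary ranges.
# Autopilot always enables Workload Identity and Shielded Nodes, so
# --workload-pool and --enable-shielded-nodes (flags of `clusters create`)
# are not passed here.
gcloud container clusters create-auto prod-cluster \
  --region=us-central1 \
  --release-channel=regular \
  --enable-private-nodes \
  --enable-master-authorized-networks \
  --master-authorized-networks=10.0.0.0/8,YOUR_OFFICE_CIDR \
  --network=prod-vpc --subnetwork=prod-subnet \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services
```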
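`--master-authorized-networks` takes a list of CIDR ranges, and only clients whose source address falls inside one of them can reach the control plane endpoint. When a `kubectl` call times out against such a cluster, a quick first check is whether your address is actually in range. Below is a minimal sketch of that membership test for IPv4. The helpers `ip_to_int` and `in_cidr` are hypothetical, and the sample addresses are illustrative only.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when the address ($1) lies inside the CIDR range ($2).
in_cidr() {
  net=${2%/*}        # network part, e.g. 10.0.0.0
  bits=${2#*/}       # prefix length, e.g. 8
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

if in_cidr 10.20.30.40 10.0.0.0/8; then echo "allowed"; else echo "blocked"; fi
if in_cidr 192.168.1.5 10.0.0.0/8; then echo "allowed"; else echo "blocked"; fi
```

The first call prints `allowed` and the second prints `blocked`. The real decision is made by GKE, so treat this only as a local sanity check.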
Workload Identity (Pods → GCP APIs without keys)
```shell
# Create a Google Service Account for the workload
gcloud iam service-accounts create orders-api
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:orders-api@my-project.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"

# Bind the Kubernetes ServiceAccount to the GSA
gcloud iam service-accounts add-iam-policy-binding \
  orders-api@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[default/orders-api]"
```

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: orders-api@my-project.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  selector:
    matchLabels: { app: orders-api }
  template:
    metadata:
      labels: { app: orders-api }
    spec:
      serviceAccountName: orders-api  # → maps to GSA via Workload Identity
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/my-project/repo/orders-api:v1.4.2
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
```
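The binding only takes effect when the project, namespace, and ServiceAccount name in the IAM member string agree with the manifest, and the bracketed format is easy to mistype. As a small sketch, the hypothetical helper `wi_member` (not part of gcloud) assembles the member string from its three parts so the same values can be reused in the `add-iam-policy-binding` call:

```shell
# Build the IAM member string that ties a Kubernetes ServiceAccount to a GSA.
# wi_member is a hypothetical helper, not a gcloud command.
wi_member() {
  # $1 = project ID, $2 = namespace, $3 = Kubernetes ServiceAccount name
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

wi_member my-project default orders-api
# prints serviceAccount:my-project.svc.id.goog[default/orders-api]
```

The output can be passed as `--member="$(wi_member my-project default orders-api)"`.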
Autoscaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 80 }
---
# PodDisruptionBudget: keep the service available during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: orders-api
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: orders-api }
```
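For a sense of how the HPA reacts to these targets, the Kubernetes documentation gives the scaling rule as desired = ceil(current × observed / target), bounded by `minReplicas` and `maxReplicas`. The sketch below reproduces that arithmetic with integer shell math. `hpa_desired` is a hypothetical helper, and it leaves out the controller's tolerance band and stabilization window. When several metrics are configured, as in the manifest, the HPA acts on the largest result.

```shell
# desired = ceil(current * observed / target), clamped to [min, max]
hpa_desired() {
  # $1 = current replicas, $2 = observed utilization (%),
  # $3 = target utilization (%), $4 = minReplicas, $5 = maxReplicas
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

hpa_desired 3 140 70 3 50    # prints 6: CPU at twice the target doubles the pods
hpa_desired 10 35 70 3 50    # prints 5: half the target halves the pods
hpa_desired 40 140 70 3 50   # prints 50: 80 is clamped to maxReplicas
```

At the floor of three replicas, `minAvailable: 2` lets at most one pod be evicted voluntarily at a time.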
Gateway API (Modern Ingress)
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: api-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: orders-route
spec:
  parentRefs: [{ name: external-gateway }]
  hostnames: ["api.example.com"]
  rules:
    - matches: [{ path: { type: PathPrefix, value: "/orders" } }]
      backendRefs:
        - name: orders-api
          port: 80
```
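The `PathPrefix` match in the route is element-wise rather than a plain string prefix: as the Gateway API specification describes it, `/orders` covers `/orders` and `/orders/123` but not `/ordersfoo`. Below is a rough shell model of that rule. `path_prefix_match` is a hypothetical function, and the model ignores any normalization a particular load balancer may apply.

```shell
# Succeed when the request path ($2) matches the route prefix ($1)
# on whole path elements, in the manner of a Gateway API PathPrefix match.
path_prefix_match() {
  prefix=${1%/}                 # a trailing slash on the prefix is ignored
  case "$2" in
    "$prefix"|"$prefix"/*) return 0 ;;
    *) return 1 ;;
  esac
}

for p in /orders /orders/123 /ordersfoo /order; do
  if path_prefix_match /orders "$p"; then
    echo "$p -> match"
  else
    echo "$p -> no match"
  fi
done
```

The loop reports a match for the first two paths and no match for the last two.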
GPU Inference Workload (Standard Mode)
```shell
# Create a node pool with L4 GPUs and Spot VMs for cheap inference
gcloud container node-pools create inference-l4 \
  --cluster=ml-cluster --region=us-central1 \
  --machine-type=g2-standard-8 \
  --accelerator=type=nvidia-l4,count=1,gpu-driver-version=LATEST \
  --num-nodes=0 --enable-autoscaling --min-nodes=0 --max-nodes=10 \
  --spot --node-taints=workload=inference:NoSchedule
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: vllm-llama }
spec:
  replicas: 1
  selector: { matchLabels: { app: vllm } }
  template:
    metadata: { labels: { app: vllm } }
    spec:
      tolerations:
        - key: workload
          operator: Equal
          value: inference
          effect: NoSchedule
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          # Hugging Face repo ID; the model is gated, so a token is needed (not shown)
          args: ["--model", "meta-llama/Meta-Llama-3-8B-Instruct", "--port", "8000"]
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: "24Gi"
              cpu: "4"
```
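One detail worth checking when sizing this pool: in a regional cluster, `--min-nodes` and `--max-nodes` are per-zone limits, so the pool's real ceiling depends on how many zones it spans. The arithmetic below assumes three zones, which is an assumption about the cluster and not something the command states. `pool_ceiling` is a hypothetical helper.

```shell
# Upper bound on nodes and GPUs for a regional node pool.
# --max-nodes is applied per zone; the zone count is an assumption.
pool_ceiling() {
  # $1 = --max-nodes value, $2 = zones the pool spans, $3 = GPUs per node
  nodes=$(( $1 * $2 ))
  gpus=$(( nodes * $3 ))
  echo "max_nodes=$nodes max_gpus=$gpus"
}

pool_ceiling 10 3 1   # prints max_nodes=30 max_gpus=30
```

If a pool-wide cap is what you want, gcloud also provides `--total-max-nodes`; check the current reference before relying on it.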
Cost Optimization
```shell
# Spot VMs for fault-tolerant batch
gcloud container node-pools create batch-spot \
  --cluster=prod-cluster --region=us-central1 \
  --machine-type=e2-standard-4 \
  --spot --num-nodes=0 --enable-autoscaling --max-nodes=20
```

```yaml
# Compute Class definition (newer alternative to node pools)
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata: { name: spot-burst }
spec:
  priorities:
    - machineFamily: n4
      spot: true
    - machineFamily: n2
      spot: true
    - machineFamily: n2  # on-demand fallback
  nodePoolAutoCreation: { enabled: true }
```
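To put the Spot discount in concrete terms, the sketch below applies a percentage discount to an on-demand bill. The 60% and 90% figures are the range quoted in the guidelines; the 1000 USD monthly on-demand cost is a made-up input, and `spot_cost` is a hypothetical helper. Actual Spot prices vary by machine type and region.

```shell
# Cost after a Spot discount, in whole USD (integer arithmetic).
spot_cost() {
  # $1 = on-demand cost in USD, $2 = discount in percent
  echo $(( $1 * (100 - $2) / 100 ))
}

spot_cost 1000 60   # prints 400
spot_cost 1000 90   # prints 100
```

Spot nodes can be reclaimed at any time, so the saving only holds for workloads that tolerate interruption, which is why the examples pair Spot with batch and inference.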
Observability — Managed Prometheus
```yaml
# Scrape app metrics with Google Cloud Managed Service for Prometheus
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata: { name: orders-api }
spec:
  selector:
    matchLabels: { app: orders-api }
  endpoints:
    - port: metrics
      interval: 30s
```
Examples
Example 1 — Stand up a production Autopilot cluster
User wants a hardened GKE cluster for a new service. Create an Autopilot cluster on the regular release channel with private nodes, master authorized networks, Workload Identity enabled, and Shielded Nodes. Wire the app's Kubernetes ServiceAccount to a GSA with the minimum IAM roles, deploy via a Deployment + HPA + PDB, and front it with the Gateway API for managed TLS.
Example 2 — Run a vLLM Llama 3 inference service on Spot L4s
User needs cheap LLM inference. Create a Standard cluster (Autopilot doesn't support custom GPU drivers consistently), add an L4 node pool with --spot and autoscaling 0→10, deploy vLLM with a node selector + toleration so only inference pods schedule there, and add Managed Prometheus scraping for token-throughput metrics.
Guidelines
- Default to Autopilot — most workloads should never see a node pool config
- Use the regular release channel in production; rapid for staging; stable only for highly conservative orgs
- Always enable private nodes + master authorized networks; never expose the API server publicly
- Workload Identity is mandatory — never put service account JSON keys in Secrets
- Set resource `requests` AND `limits` explicitly; Autopilot bills on requests and fills in defaults when they are missing, which may not fit the workload
- Add a PodDisruptionBudget to every Deployment that serves traffic
- Use Spot VMs / Compute Classes for batch and inference workloads to cut compute cost 60–90%
- Use Gateway API for new ingress; the legacy `Ingress` resource is feature-frozen
- Enable Managed Prometheus instead of self-hosting Prometheus
- For multi-tenant clusters, isolate teams by namespace + RBAC + ResourceQuota