
gemini-computer-use

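A Skill that automates web browser operation with Playwright using the Gemini Computer Use model. It runs tasks in an agent loop and builds in a safety confirmation step for risky actions, so that work done in the browser takes less manual effort.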

📜 Original English description (for reference)

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

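🇯🇵 Notes for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of how the Skill itself behaves.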

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o gemini-computer-use.zip https://jpskill.com/download/17144.zip && unzip -o gemini-computer-use.zip && rm gemini-computer-use.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/17144.zip -OutFile "$d\gemini-computer-use.zip"; Expand-Archive "$d\gemini-computer-use.zip" -DestinationPath $d -Force; ri "$d\gemini-computer-use.zip"

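When it finishes, restart Claude Code. After that, the Skill triggers automatically when you ask in plain language, for example "Find the latest blog post title on example.com in the browser."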

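💾 Download manually (for anyone who finds the command line difficult)

1. Press the blue button below to download gemini-computer-use.zip
2. Double-click the ZIP file to extract it, which creates a gemini-computer-use folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code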

⬇ Download as .zip (recommended) ⬇ .skill format (for advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The Skill's description tells you what it can do for you. When you ask Claude for something in this area, the Skill triggers automatically.

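📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for relevant requests.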


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 3

📜 Original SKILL.md (the text Claude reads)

Gemini Computer Use

Quick start

Copy the env template, set your API key in it, and source it:

```
cp env.example env.sh
$EDITOR env.sh
source env.sh
```
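
Before starting the agent, it can help to confirm that sourcing `env.sh` actually exported a key. The following is a minimal preflight sketch, not part of the Skill. It assumes the key is exported as `GEMINI_API_KEY` or `GOOGLE_API_KEY`; `env.example` may use a different name, so adjust `KEY_NAMES` to match. `find_api_key` is a hypothetical helper introduced only for illustration.

```python
import os

# Assumed variable names; change these to match what env.example defines.
KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def find_api_key(env=None):
    """Return (name, value) of the first non-empty key variable, or None."""
    env = os.environ if env is None else env
    for name in KEY_NAMES:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None
```

Calling `find_api_key()` with no arguments inspects the current process environment; a result of `None` usually means `source env.sh` was not run in the same shell.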

Create a virtual environment and install dependencies:

```
python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium
```

Run the agent script with a prompt:

```
python scripts/computer_use_agent.py \
  --prompt "Find the latest blog post title on example.com" \
  --start-url "https://example.com" \
  --turn-limit 6
```
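
For orientation, the three flags in the command above could be parsed along the following lines. This is an illustrative sketch only: the argument handling inside `scripts/computer_use_agent.py` may differ, and the defaults and the `required` setting here are assumptions rather than the script's documented behavior.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the command (sketch)."""
    parser = argparse.ArgumentParser(description="Gemini Computer Use agent (sketch)")
    parser.add_argument("--prompt", required=True, help="Goal for the agent")
    # The defaults below are assumptions, not values taken from the script.
    parser.add_argument("--start-url", default="about:blank", help="Page opened first")
    parser.add_argument("--turn-limit", type=int, default=6, help="Maximum model turns")
    return parser


def parse_args(argv=None):
    """Parse argv (or sys.argv when argv is None) into a namespace."""
    return build_parser().parse_args(argv)
```

Note that argparse exposes `--start-url` and `--turn-limit` as `args.start_url` and `args.turn_limit`.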

Browser selection

- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
- Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
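
The precedence rule can be sketched as a small function that turns the two environment variables into keyword arguments for Playwright's `launch()`. The helper name `resolve_launch_kwargs` is hypothetical, and the script may implement the rule differently; `channel` and `executable_path` are existing `launch()` parameters in Playwright for Python.

```python
import os


def resolve_launch_kwargs(env=None):
    """Map the two env vars to keyword arguments for Playwright's launch()."""
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if executable:
        # The executable takes precedence when both variables are set.
        return {"executable_path": executable}
    if channel:
        # Playwright channel names include "chrome" and "msedge".
        return {"channel": channel}
    return {}  # neither set: fall back to the bundled Chromium
```

With the sync API the result would be passed straight through, for example `p.chromium.launch(**resolve_launch_kwargs())`.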

Core workflow (agent loop)

1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse function_call actions in the response.
3. Execute each action in Playwright.
4. If a safety_decision is require_confirmation, prompt the user before executing.
5. Send function_response objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.
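
The loop above can be sketched with the model and the browser replaced by injected callables, which keeps the control flow visible without depending on google-genai or Playwright. Everything here is a simplified assumption: `Action`, `ModelTurn`, `run_agent_loop`, and the dictionary payloads are hypothetical stand-ins, not the types used by the real API or by `scripts/computer_use_agent.py`. Ending the run when the user declines a confirmation is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """Simplified stand-in for one function_call returned by the model."""
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None  # e.g. "require_confirmation"


@dataclass
class ModelTurn:
    """Simplified stand-in for one model response."""
    text: str = ""
    actions: list = field(default_factory=list)


def run_agent_loop(goal, call_model, take_screenshot, current_url,
                   execute, confirm, turn_limit=6):
    """Return the model's final text, or None if the run stopped early."""
    # The first request carries the user goal plus a screenshot.
    payload = {"goal": goal, "screenshot": take_screenshot()}
    for _ in range(turn_limit):
        turn = call_model(payload)
        if not turn.actions:
            return turn.text  # text only, no actions: the task is finished
        responses = []
        for action in turn.actions:
            needs_ok = action.safety_decision == "require_confirmation"
            if needs_ok and not confirm(action):
                return None  # assumption: a declined action ends the run
            execute(action)
            # Report the page state after the action back to the model.
            responses.append({
                "name": action.name,
                "url": current_url(),
                "screenshot": take_screenshot(),
            })
        payload = {"function_responses": responses}
    return None  # turn limit reached
```

In a real agent, `call_model` would wrap the Gemini API call, `execute` would dispatch to Playwright, and `confirm` would prompt the user in the terminal.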

Operational guidance

- Run in a sandboxed browser profile or container.
- Use --exclude to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.
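
Two of these points can be illustrated on the client side. The sketch below assumes that the model reports coordinates on a normalized 0-999 grid that must be scaled to the viewport, which is why a fixed viewport matters; verify this against references/google-computer-use.md before relying on it. It also adds a defensive check that refuses any action whose name is on an exclusion list. `denormalize`, `is_allowed`, and the action names used in the example are hypothetical and not taken from the script.

```python
VIEWPORT_WIDTH, VIEWPORT_HEIGHT = 1440, 900  # the recommended viewport


def denormalize(x, y, width=VIEWPORT_WIDTH, height=VIEWPORT_HEIGHT):
    """Scale coordinates from an assumed 0-999 grid to viewport pixels."""
    return int(x / 1000 * width), int(y / 1000 * height)


def is_allowed(action_name, excluded):
    """Return False for actions the operator has excluded."""
    return action_name not in set(excluded)
```

For example, `denormalize(500, 500)` maps to the pixel `(720, 450)` at 1440x900, and `is_allowed("drag_and_drop", ["drag_and_drop"])` is `False`. In Playwright, the viewport itself can be fixed with `browser.new_context(viewport={"width": 1440, "height": 900})`.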

Resources

- Script: scripts/computer_use_agent.py
- Reference notes: references/google-computer-use.md
- Env template: env.example

Bundled files

Note: these are the files included in the ZIP. In addition to `SKILL.md` itself, it may contain reference material, samples, and scripts.

📄 SKILL.md (2,082 bytes)
📎 references/google-computer-use.md (636 bytes)
📎 scripts/computer_use_agent.py (11,557 bytes)