
claude-computer-use-advanced

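A Skill for advanced work with Claude's computer use tool: building UI automation, application control, and complex workflows for automating desktop tasks, testing apps, analyzing screens, controlling software, and building computer vision workflows.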

📜 Original English description (for reference)

Advanced computer use patterns for UI automation, application control, and multi-step workflows using Claude's computer use tool. Use when automating desktop tasks, testing applications, analyzing screen content, controlling software programmatically, or building computer vision workflows. Supports zoom tool for enhanced vision on Opus 4.5, multi-step automation, and sophisticated application control.

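🇯🇵 Notes for Japanese creators

※ These notes were added by the jpskill.com editorial team for Japanese business users. They are reference information only and independent of how the Skill itself behaves.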

⚡ Recommended: one-line install (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, extracts, and places the Skill automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o claude-computer-use-advanced.zip https://jpskill.com/download/9394.zip && unzip -o claude-computer-use-advanced.zip && rm claude-computer-use-advanced.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9394.zip -OutFile "$d\claude-computer-use-advanced.zip"; Expand-Archive "$d\claude-computer-use-advanced.zip" -DestinationPath $d -Force; ri "$d\claude-computer-use-advanced.zip"

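When it finishes, restart Claude Code. After that, just make a related request in plain language (for example, asking Claude to automate a desktop task) and the Skill activates automatically.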

💾 Manual download (if you'd rather not use the command)

1. Download claude-computer-use-advanced.zip from the Skill page
2. Double-click the ZIP file to extract it. This creates a claude-computer-use-advanced folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill does

The description below explains what this Skill can do for you. It activates automatically whenever you ask Claude for help in this area.

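📦 Installation (3 steps)

1. Download the .skill file from the Skill page
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ inside your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you're done. You don't need to say "Use this Skill to…"; Claude invokes it automatically for related requests.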


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the source text Claude reads)

Claude Computer Use Advanced

Overview

Claude's computer use tool enables AI agents to interact with desktop environments programmatically. This skill provides advanced patterns for building sophisticated UI automation workflows, testing applications, extracting data from screens, and controlling applications across operating systems.

Computer use is a significant step in AI capability. Instead of relying on APIs, Claude interacts with software the way a human does: it views the screen, clicks elements, types text, and uses keyboard shortcuts. This makes it possible to automate most desktop tasks that have a graphical interface, within the limitations listed under Security & Limitations.

Core Capabilities:

Take screenshots and analyze visual content
Click on screen elements using precise coordinates
Type text and submit forms
Press keyboard shortcuts and special keys
Scroll, drag, and perform complex mouse operations
Use the zoom action to inspect specific screen regions at full resolution (computer_20251124 with Opus 4.5 only)
Execute multi-step automation workflows
Control applications and software programmatically

Key Models:

Claude Opus 4.5 (computer_20251124): Newest tool version. It adds the zoom action for enhanced vision accuracy, enabled with "enable_zoom": true in the tool definition
Claude 4 models / Claude Sonnet 3.7 (computer_20250124): Stable version with the full standard action set

When to Use

Use computer use for these common scenarios:

Desktop Automation

Batch processing repetitive tasks
Form filling and data entry workflows
File and folder management
Configuration and setup automation
Multi-application workflows

Application Testing

Automated UI testing frameworks
Cross-platform testing workflows
Visual regression testing
User interaction simulation
Quality assurance automation

Screen Analysis & Data Extraction

Extracting text and data from applications
Analyzing visual layouts and designs
Reading content from non-API systems
Screenshot analysis and understanding
OCR-like text extraction from screens

Application Control & Integration

Controlling legacy applications without APIs
Creating automation agents for closed systems
Building RPA (Robotic Process Automation) workflows
Orchestrating multi-application processes
Software testing and validation

Computer Use Tasks

Task 1: Taking Screenshots

Screenshots are the foundation of computer use. Claude needs to see the screen to decide what to do next.

When to Use:

As the first action in any workflow
Before clicking on elements (to get coordinates)
To analyze the current application state
To understand what's visible on screen
After actions to verify results

How It Works: When Claude requests a screenshot, your tool implementation captures the entire display and returns it to Claude as a base64-encoded image in the tool result. Claude then analyzes the screenshot to understand the interface, identify elements, and choose the next action.

Basic Example:

action = {
    "action": "screenshot"  # the tool input names the action in its "action" field
}
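When your code carries out the screenshot action, it has to send the image back to Claude inside a tool_result block. The sketch below shows one minimal way to do that. It assumes you already have the screen as PNG bytes, because the capture method depends on your environment. The function name `screenshot_tool_result` is hypothetical.

```python
import base64


def screenshot_tool_result(tool_use_id: str, png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes in a tool_result block containing a base64 image."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": encoded,
                },
            }
        ],
    }


# Usage with placeholder bytes (a real call would pass a captured PNG)
block = screenshot_tool_result("toolu_123", b"\x89PNG\r\n\x1a\n")
print(block["content"][0]["source"]["media_type"])  # image/png
```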

Coordinate System:

Origin (0, 0) is at the top-left of the screen
X increases to the right
Y increases downward
Coordinates refer to pixel positions on the display
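Claude returns coordinates in the declared display space (display_width_px × display_height_px). If the physical screen is larger, you must scale those coordinates before clicking. The sketch below is a minimal version that assumes simple proportional scaling on both axes. `to_screen` is a hypothetical helper.

```python
def to_screen(x: int, y: int, declared: tuple[int, int], physical: tuple[int, int]) -> tuple[int, int]:
    """Map a coordinate from the declared display size to the physical screen."""
    dw, dh = declared
    pw, ph = physical
    return round(x * pw / dw), round(y * ph / dh)


# Claude sees a 1024x768 display; the real screen is 2048x1536
print(to_screen(512, 384, (1024, 768), (2048, 1536)))  # (1024, 768)
```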

Best Practices:

Always start with a screenshot to establish the initial state
Take screenshots after significant actions to verify success
Use zoom tool (Opus 4.5) for precise element location
Consider display resolution when planning coordinates
Keep the declared display at or below 1280x800 (1024x768 is a common, recommended choice)
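To choose a declared display size for a larger physical screen, one option is to scale the screen down until it fits inside 1280x800, keeping the aspect ratio. The sketch below is minimal and assumes that approach. `fit_display` is a hypothetical name.

```python
def fit_display(width: int, height: int, max_w: int = 1280, max_h: int = 800) -> tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, to fit within max_w x max_h."""
    scale = min(max_w / width, max_h / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale)


print(fit_display(2560, 1600))  # (1280, 800)
print(fit_display(1920, 1080))  # (1280, 720)
print(fit_display(1024, 768))   # (1024, 768), already small enough
```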

Task 2: Clicking Elements

Clicking is the primary way to interact with UI elements: buttons, links, checkboxes, menu items, and any other clickable interface element.

When to Use:

Activating buttons and submitting forms
Opening menus and selecting options
Clicking links to navigate
Toggling checkboxes and radio buttons
Selecting items in lists or dropdowns

How It Works: The click action takes x,y coordinates and sends a mouse click at that position. Claude uses the screenshot to identify element locations and then clicks on them.

Basic Example:

action = {
    "action": "left_click",
    "coordinate": [640, 400]  # x, y from screenshot
}

Coordinate Precision:

Aim for the center of clickable elements
Use zoom tool (Opus 4.5) when precise coordinates are needed
If click misses, take another screenshot and adjust
Small elements may need a zoomed view for accurate coordinates
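Once you have an element's bounding box (for example, read off a screenshot or a zoomed view), aiming for its center is simple arithmetic. The sketch below is minimal. The helper name and the clamping to the display bounds are illustrative assumptions. The output uses the {"action": ..., "coordinate": ...} input shape of the computer tool.

```python
def click_point(box: tuple[int, int, int, int], width: int, height: int) -> list[int]:
    """Return the center of box (x1, y1, x2, y2), clamped to the display."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2
    # Keep the point inside [0, width - 1] x [0, height - 1]
    cx = max(0, min(cx, width - 1))
    cy = max(0, min(cy, height - 1))
    return [cx, cy]


action = {"action": "left_click", "coordinate": click_point((600, 380, 680, 420), 1024, 768)}
print(action)  # {'action': 'left_click', 'coordinate': [640, 400]}
```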

Advanced Click Types:

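left_click: Standard single click
right_click: Opens context menus
double_click: Selects a word or triggers double-click actions
triple_click: Selects a line or paragraph (computer_20250124 and later)
middle_click: Used by some applications
left_click_drag, or left_mouse_down followed by left_mouse_up: Drag operations (the mouse down/up pair requires computer_20250124 or later)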

Best Practices:

Take a screenshot before clicking to find correct coordinates
Identify button/link boundaries visually
Use center coordinates for best accuracy
Verify clicks succeeded with follow-up screenshots
Handle missed clicks by retrying with adjusted coordinates

Task 3: Typing Text & Form Input

Typing enables text input - entering data into forms, search boxes, command prompts, and text fields.

When to Use:

Filling out form fields with text
Entering search queries
Typing commands or code
Inputting credentials (in secure environments only)
Text area and field population

How It Works: The type action sends keyboard input character by character. The text is typed at the current cursor position, typically after clicking a text field.

Basic Example:

# Click the text field first to focus it
click = {"action": "left_click", "coordinate": [500, 300]}

# Then type the text
type_text = {
    "action": "type",
    "text": "Hello, World!"
}

Text Input Workflow:

Take screenshot to see the form
Click on the target text field
Type the text
Optionally press Enter or Tab to submit/move to next field
Screenshot to verify input
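The text input workflow above maps onto a short, fixed list of actions. Listing them explicitly is handy for scripted replays, or for testing your action executor. The sketch below is minimal and makes three assumptions: it uses the {"action": ..., ...} input shape of the computer tool, it treats ctrl+a as selecting all text in the focused field (true in most applications, not all), and `fill_field` is a hypothetical name.

```python
def fill_field(coordinate: list[int], text: str, next_key: str = "Tab") -> list[dict]:
    """Build the actions to focus a field, replace its contents, and move on."""
    return [
        {"action": "screenshot"},                            # see the form first
        {"action": "left_click", "coordinate": coordinate},  # focus the field
        {"action": "key", "text": "ctrl+a"},                 # select existing text
        {"action": "type", "text": text},                    # typing replaces the selection
        {"action": "key", "text": next_key},                 # Tab to next field, or Return to submit
        {"action": "screenshot"},                            # verify the result
    ]


for step in fill_field([500, 300], "John Doe"):
    print(step)
```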

Special Characters:

# Use the key action for special keys (key names go in the "text" field)
action = {"action": "key", "text": "Return"}       # Enter key
action = {"action": "key", "text": "Tab"}          # Tab key
action = {"action": "key", "text": "BackSpace"}    # Delete previous character
action = {"action": "key", "text": "ctrl+a"}       # Select all
action = {"action": "key", "text": "ctrl+c"}       # Copy
action = {"action": "key", "text": "ctrl+v"}       # Paste

Best Practices:

Always click the text field first to focus it
Clear existing text with Ctrl+A and Delete if needed
Use Tab to move between form fields
Press Enter to submit forms
Take screenshots between actions to verify input

Task 4: Keyboard Control & Special Keys

Keyboard actions provide precise control over keys, shortcuts, and special inputs beyond text typing.

When to Use:

Pressing keyboard shortcuts (Ctrl+C, Ctrl+V, etc.)
Using special keys (Enter, Tab, Escape, arrows)
Navigating menus and dialogs with keyboard
Using application-specific hotkeys
Controlling focus and navigation

How It Works: The key action presses a single key or a key combination (like ctrl+a), passed in its text field using xdotool-style key names such as Return and BackSpace.

Common Keys:

# Navigation
{"action": "key", "text": "Return"}      # Enter key
{"action": "key", "text": "Tab"}         # Tab to next field
{"action": "key", "text": "BackSpace"}   # Delete previous character
{"action": "key", "text": "Delete"}      # Delete forward
{"action": "key", "text": "Escape"}      # Escape/Cancel

# Arrows
{"action": "key", "text": "Up"}
{"action": "key", "text": "Down"}
{"action": "key", "text": "Left"}
{"action": "key", "text": "Right"}

# Shortcuts
{"action": "key", "text": "ctrl+a"}      # Select all
{"action": "key", "text": "ctrl+c"}      # Copy
{"action": "key", "text": "ctrl+v"}      # Paste
{"action": "key", "text": "ctrl+z"}      # Undo
{"action": "key", "text": "ctrl+s"}      # Save
{"action": "key", "text": "alt+Tab"}     # Switch windows

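Key Holding (modifier clicks and timed holds, computer_20250124 and later):

{"action": "left_click", "coordinate": [400, 300], "text": "shift"}  # Shift+click
{"action": "hold_key", "text": "shift", "duration": 1}               # Hold shift for 1 second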

Best Practices:

Use keyboard shortcuts when available
Tab through form fields instead of clicking each one
Use Escape to close dialogs or cancel operations
Combine arrow keys for navigation
Press Ctrl+A before typing to replace the existing text

Task 5: Using the Zoom Tool (Opus 4.5 Exclusive)

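The zoom action is available only in the computer_20251124 tool version used with Claude Opus 4.5. It lets Claude inspect a specific region of the screen at full resolution, which enables precise element location and visual analysis. Enable it by setting "enable_zoom": true in the tool definition.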

What It Does: The zoom tool captures a rectangular region of the screen and returns it at full resolution without downscaling. This allows Claude to see fine details, read small text, identify exact element boundaries, and determine precise click coordinates.

When to Use:

Locating small UI elements accurately
Reading fine-print text
Analyzing icon details
Identifying exact button positions
Handling crowded interfaces
Improving coordinate precision for clicks

How It Works: The action takes a region parameter, [x1, y1, x2, y2], in full-screen coordinates, where:

(x1, y1) = top-left corner of region
(x2, y2) = bottom-right corner of region

Basic Example:

# Zoom into a specific region to see details
action = {
    "action": "zoom",
    "region": [400, 200, 800, 400]  # [x1, y1, x2, y2]
}

Zoom Workflow Example:

# 1. Take a full screenshot to understand the layout
{"action": "screenshot"}

# 2. Identify the region where the element's location is uncertain
# Need to find the exact position of the "Submit" button

# 3. Zoom into that region for a precise view
{"action": "zoom", "region": [300, 350, 700, 450]}

# 4. With the precise view, identify exact coordinates
# "Submit" button is at screen position [550, 385] (full-screen coordinates)

# 5. Click with confidence
{"action": "left_click", "coordinate": [550, 385]}

Region Selection:

Small regions (50x50) for individual elements
Medium regions (200x200) for control groups
Larger regions up to full screen
Leave sufficient margins around target element

Vision Accuracy Benefits:

Opus 4.5's improved vision can read text more accurately
Zoom provides full resolution for detail inspection
Better at identifying element boundaries
Helps with crowded or complex UIs
Reduces click coordinate errors

Best Practices:

Use zoom when initial screenshot doesn't clearly show element
Zoom into area 20-30 pixels beyond element on all sides
Specify the region in full-screen screenshot coordinates
Zoom only when precision is critical
Use ordinary screenshots for overall layout and zoom only for details, since each zoom adds another image to the context
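The margin advice above (20-30 pixels around the element) can be applied mechanically before issuing a zoom action. The sketch below is minimal. It assumes the region parameter of the computer_20251124 zoom action and clamps the region to the declared display size. `zoom_action` is a hypothetical name.

```python
def zoom_action(box: tuple[int, int, int, int], width: int, height: int, margin: int = 25) -> dict:
    """Build a zoom action around box (x1, y1, x2, y2) with a margin, clamped to the screen."""
    x1, y1, x2, y2 = box
    region = [
        max(0, x1 - margin),
        max(0, y1 - margin),
        min(width, x2 + margin),
        min(height, y2 + margin),
    ]
    return {"action": "zoom", "region": region}


print(zoom_action((520, 370, 580, 400), 1024, 768))
# {'action': 'zoom', 'region': [495, 345, 605, 425]}
```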

Task 6: Multi-Step Automation Workflows

Complex automation requires coordinating multiple actions across steps. This task covers orchestrating such workflows.

When to Use:

Multi-application workflows
Complex data entry processes
Testing procedures with multiple steps
Sequential automation tasks
Conditional workflows (if this, then that)

How It Works: Agent loops execute sequences of actions, using screenshots to understand results and determine next steps. The loop continues until the workflow is complete.

Basic Agent Loop Pattern:

# Sequence of actions Claude might request over several turns of the agent loop
# (coordinates are illustrative; in practice each comes from the latest screenshot)
actions = []

# Step 1: Take a screenshot to see the initial state
actions.append({"action": "screenshot"})

# Step 2: Analyze the screenshot and click a button
actions.append({"action": "left_click", "coordinate": [100, 50]})

# Step 3: Take a screenshot to see the result
actions.append({"action": "screenshot"})

# Step 4: Fill the form based on the new state
actions.append({"action": "left_click", "coordinate": [200, 200]})
actions.append({"action": "type", "text": "Form data"})

# Step 5: Submit
actions.append({"action": "left_click", "coordinate": [200, 300]})

# Step 6: Verify with a screenshot
actions.append({"action": "screenshot"})

Error Recovery:

# If a click misses or action fails:
# 1. Take screenshot
# 2. Re-evaluate coordinates
# 3. Retry with adjusted position
# 4. Use zoom for precision if needed
# 5. Continue workflow
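For scripted or replayed workflows, your harness rather than Claude may drive the retries. In that case the recovery steps above amount to a bounded loop: act, take a fresh screenshot, verify, and move to the next candidate if the check fails. The sketch below is minimal. `execute` and `verify` are hypothetical callables supplied by your harness; for example, `verify` might compare the new screenshot against the expected state.

```python
from typing import Callable, Optional


def click_with_retry(
    candidates: list[list[int]],
    execute: Callable[[dict], None],
    verify: Callable[[], bool],
) -> Optional[list[int]]:
    """Click each candidate coordinate in turn until verify() reports success."""
    for coordinate in candidates:
        execute({"action": "left_click", "coordinate": coordinate})
        execute({"action": "screenshot"})  # refresh the view before checking
        if verify():
            return coordinate
    return None  # every attempt failed: zoom, re-plan, or ask a human


# Simulated harness: the first click misses, the second succeeds
log: list[dict] = []
outcomes = iter([False, True])
print(click_with_retry([[100, 50], [104, 52]], log.append, lambda: next(outcomes)))  # [104, 52]
```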

Workflow State Tracking:

Track which steps are complete
Remember important data extracted
Maintain context about current application state
Use screenshots as state checkpoints
Save intermediate results for verification
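One way to keep this state in the harness that runs the agent loop is a small record. It tracks completed steps and extracted values, and it logs each transition for debugging. The sketch below is minimal, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Tracks progress through a multi-step computer use workflow."""
    completed: list[str] = field(default_factory=list)
    extracted: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def complete(self, step: str) -> None:
        # Record each step once and log the transition
        if step not in self.completed:
            self.completed.append(step)
            self.log.append(f"completed: {step}")

    def remember(self, key: str, value: str) -> None:
        # Store data read from the screen for later steps or verification
        self.extracted[key] = value
        self.log.append(f"extracted: {key}")

    def is_done(self, step: str) -> bool:
        return step in self.completed


state = WorkflowState()
state.complete("open_form")
state.remember("order_id", "A-1042")
print(state.is_done("open_form"), state.extracted)  # True {'order_id': 'A-1042'}
```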

Best Practices:

Take screenshots at workflow boundaries
Verify each major step with feedback
Handle unexpected states gracefully
Use try/except-style patterns to handle errors
Log important transitions for debugging

Task 7: Application Control & System Interaction

Beyond UI clicks, computer use enables controlling applications, navigating system interfaces, and performing system-level tasks.

When to Use:

Window/application navigation
File system interaction (opening files, folders)
System settings configuration
Application launching and management
Multi-window workflows

How It Works: Standard mouse/keyboard operations work with any application:

Clicking desktop, taskbar, menu items
Opening file dialogs and navigating folders
Using File/Edit/View menus
Performing system-level operations
Managing multiple windows

Application Navigation Example:

# Open the application menu
{"action": "left_click", "coordinate": [500, 10]}  # Menu bar

# Take a screenshot to see the menu
{"action": "screenshot"}

# Click a menu item
{"action": "left_click", "coordinate": [520, 100]}

# Take a screenshot to confirm the dialog opened
{"action": "screenshot"}

# Interact with the dialog
{"action": "left_click", "coordinate": [300, 300]}

File System Navigation:

# Open the File > Open dialog with a keyboard shortcut
{"action": "key", "text": "ctrl+o"}

# Take a screenshot to see the file dialog
{"action": "screenshot"}

# Navigate to the folder (two methods)
# Method 1: Type the path directly
{"action": "type", "text": "/path/to/folder"}

# Method 2: Double-click folders in the file browser
{"action": "double_click", "coordinate": [400, 200]}

# Select and open the file
{"action": "left_click", "coordinate": [400, 250]}
{"action": "key", "text": "Return"}

Multi-Window Workflows:

# Switch between windows
{"action": "key", "text": "alt+Tab"}

# Take a screenshot to verify the active window
{"action": "screenshot"}

# Interact with the new window
{"action": "left_click", "coordinate": [500, 400]}

Best Practices:

Use keyboard shortcuts when available
Navigate menus through visual screenshots
Handle different menu layouts gracefully
Use file dialog navigation carefully
Take screenshots between window switches

Quick Start Example

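Here's a minimal agent-loop example for a simple automation workflow, filling out a web form. The capture_screenshot, click_at_coordinates, and type_text functions are placeholders for your own environment-specific implementation: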

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Define the computer use tool
tools = [
    {
        "type": "computer_20251124",  # Opus 4.5 tool version
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1,
        "enable_zoom": True,  # Required for the zoom action
    }
]

# One user turn: the task plus a hint to start with a screenshot
messages = [
    {
        "role": "user",
        "content": (
            "Fill out the contact form with name 'John Doe' and email "
            "'john@example.com', then submit it. "
            "First, take a screenshot to see the current state of the screen."
        ),
    }
]


def execute_action(action):
    """Run one computer action and return the tool_result content."""
    name = action["action"]
    if name == "screenshot":
        png_base64 = capture_screenshot()  # Your implementation
        return [{
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": png_base64},
        }]
    if name == "left_click":
        click_at_coordinates(action["coordinate"])  # Your implementation
    elif name == "type":
        type_text(action["text"])  # Your implementation
    else:
        return [{"type": "text", "text": f"Action not implemented: {name}"}]
    return [{"type": "text", "text": f"Done: {name}"}]


# Agent loop: call the model, run requested actions, send results back
while True:
    response = client.beta.messages.create(
        model="claude-opus-4-5-20251101",
        max_tokens=1024,
        tools=tools,
        messages=messages,
        betas=["computer-use-2025-11-24"],
    )
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        break  # Claude has finished (or needs input from you)

    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": execute_action(block.input),
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})
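The if/elif chain in the Quick Start grows with every supported action. A dispatch table keeps the executor flat and makes unsupported actions explicit. The sketch below is minimal. The handler functions are hypothetical stand-ins for your own mouse and keyboard code, and it assumes the {"action": ..., ...} input shape the computer tool sends.

```python
from typing import Callable


def handle_left_click(params: dict) -> str:
    x, y = params["coordinate"]
    return f"clicked at ({x}, {y})"  # placeholder: call your real click code here


def handle_type(params: dict) -> str:
    return f"typed {len(params['text'])} characters"  # placeholder for real typing


# Map action names to handlers; add entries as you support more actions
HANDLERS: dict[str, Callable[[dict], str]] = {
    "left_click": handle_left_click,
    "type": handle_type,
}


def run_action(params: dict) -> dict:
    """Dispatch one tool_use input and return fields for its tool_result."""
    name = params.get("action", "")
    handler = HANDLERS.get(name)
    if handler is None:
        # Tell Claude the action is unsupported instead of failing silently
        return {"is_error": True, "content": f"Unsupported action: {name}"}
    return {"is_error": False, "content": handler(params)}


print(run_action({"action": "left_click", "coordinate": [640, 400]}))
print(run_action({"action": "middle_click", "coordinate": [10, 10]}))
```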

API Reference

For detailed API specifications, parameters, response formats, and advanced usage patterns, see:

references/computer-use-api.md - Complete API documentation with all tool versions and action types

Advanced Patterns

For sophisticated automation patterns, multi-step workflows, zoom tool techniques, and best practices:

references/advanced-patterns.md - Advanced automation, error handling, and optimization

Security & Deployment

For security considerations, safe deployment practices, and operational guidelines:

references/security-deployment.md - Security, containerization, monitoring, and responsible use

Security & Limitations

Security Considerations:

Isolation: Deploy in isolated virtual machines or containers with minimal privileges
Network Control: Restrict internet access via domain allowlists
Credentials: Avoid providing sensitive credentials unless absolutely necessary
Confirmation: Request human approval for significant decisions
Input Validation: Treat user inputs and on-screen content as untrusted; validation reduces, but does not eliminate, the risk of prompt injection
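Network control is usually enforced at the proxy or firewall level. A harness can also refuse to act on URLs outside a domain allowlist. The sketch below is minimal and uses only the standard library. The allowlist contents and the `is_allowed` name are hypothetical.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com", "intranet.example.org"}


def is_allowed(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if url's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://www.example.com/form"))    # True
print(is_allowed("https://example.com.evil.test/"))  # False
```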

Known Limitations:

Latency: Not suitable for time-sensitive interactive tasks
Vision Accuracy: Computer vision may misidentify elements or coordinates
Application Support: Spreadsheets and specialized applications can be unreliable
Account Management: Cannot reliably create accounts or share content on social platforms
Prompt Injection: Vulnerable to prompt injection in web-based environments
Resolution: Recommended maximum 1280x800 resolution
Token Cost: Screenshots consume tokens due to vision processing
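To budget for screenshot cost, Anthropic's vision documentation gives a rough approximation of (width × height) / 750 tokens per image. Treat the result as an estimate: it ignores the tool's own overhead, and earlier screenshots kept in the conversation are sent again with every request. The sketch below is minimal.

```python
def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image using (width * height) / 750."""
    return round(width * height / 750)


per_shot = estimate_image_tokens(1024, 768)
print(per_shot)       # 1049
print(per_shot * 20)  # rough image tokens if 20 screenshots stay in context
```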

Related Skills

anthropic-expert - Overview of Claude computer use and tool use capabilities
claude-opus-4-5-guide - Opus 4.5 features including zoom tool enhancements
multi-ai-research - Research patterns for investigating third-party applications

Learn More: Start with the Quick Start example above, then explore the reference guides for advanced patterns and complete API documentation.