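🛠️ Development / MCP · Community

voice

A Skill for users who want to talk instead of looking at the screen. When the user invokes /voice, the agent starts a voice conversation and handles all output and input by voice until the conversation ends.

📜 Original English description (for reference)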

Starts a voice conversation with the user via the agent-voice CLI. Use when the user invokes /voice. The user is not looking at the screen — they are listening and speaking. All agent output and input goes through voice until the conversation ends.

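🇯🇵 Notes for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.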

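⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and puts the files in place automatically.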

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o voice.zip https://jpskill.com/download/9735.zip && unzip -o voice.zip && rm voice.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9735.zip -OutFile "$d\voice.zip"; Expand-Archive "$d\voice.zip" -DestinationPath $d -Force; ri "$d\voice.zip"

When it finishes, restart Claude Code. The Skill then starts when you invoke /voice.
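To confirm the files landed where Claude Code looks for them, a small check like the one below may help. It is only a sketch: it assumes the archive unpacks to a voice folder containing SKILL.md, which this page suggests but does not guarantee, and skill_installed is a hypothetical helper name rather than part of any tool mentioned here.

```shell
# Hypothetical helper: succeeds when <skills dir>/<skill name>/SKILL.md exists.
# Assumes the archive unpacks to a folder named after the Skill.
skill_installed() {
  [ -f "$1/$2/SKILL.md" ]
}

# Usage (not run automatically):
#   skill_installed "$HOME/.claude/skills" voice && echo "voice is installed"
```

If the check fails, look inside ~/.claude/skills/ to see what the archive actually created before retrying.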

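💾 Manual download (if you prefer not to use the command line)

1. Use the .zip download link below to get voice.zip
2. Double-click the ZIP file to extract it; this creates a voice folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download .zip (recommended) · ⬇ .skill format (advanced) · Original source ↗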

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

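🎯 What this Skill can do

The description below explains what this Skill does. Claude activates it when a request matches that description; for this Skill, that means invoking /voice.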

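📦 How to install (3 steps)

1. Use the .skill download link above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to tell Claude to use the Skill; it is called automatically for matching requests.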


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1


📜 Original SKILL.md (the text Claude reads)

Voice Mode

The user wants to have a voice conversation. They are not looking at the screen. They are listening to you speak and replying verbally. Treat this like a phone call.

Voice mode is a session. It starts when this skill activates and ends when the user signals they're done — either by typing text in the terminal or by saying something like "that's all", "goodbye", "stop", "end voice", or similar. When the conversation ends, say goodbye and stop using voice commands. Resume normal text interaction.
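One way to recognize a spoken end-of-session signal is to match the transcript against the example phrases above. The sketch below is an assumption about how such a check might look; is_end_signal is a hypothetical helper, and real transcripts will probably need looser matching than this.

```shell
# Hypothetical helper: succeeds when a transcript looks like an end-of-session signal.
# The phrase list mirrors the examples in the text; extend it as needed.
is_end_signal() {
  # Lowercase the transcript so matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $text in
    *"that's all"*|*goodbye*|*"end voice"*) return 0 ;;
    # Treat "stop" as a signal only when it is the whole reply,
    # so "don't stop the build" is not mistaken for a goodbye.
    stop|stop.) return 0 ;;
    *) return 1 ;;
  esac
}
```

Matching "stop" only as a whole reply is a deliberate design choice here: it trades a few missed signals for fewer accidental exits.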

Activation

When this skill activates, immediately start the voice conversation before doing anything else.

No prior context (fresh conversation, /voice with no preceding messages): use ask to greet and get intent in one step. E.g. agent-voice ask -m "Hey, what are we working on?"
Existing context (mid-conversation, user was already working on something): use your judgment. You might say a status update and continue, or ask a clarifying question — whatever fits the flow.

Setup

If agent-voice fails with "command not found", install it and retry:

npm install -g agent-voice

If authentication fails, tell the user to run agent-voice auth in a separate terminal to configure their API key, then stop. Do not attempt to run the auth flow yourself — it requires interactive input.
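The install-and-retry step can be sketched as a guard that runs before the first voice command. have_cmd and ensure_agent_voice are hypothetical helper names; the only commands taken from the text are npm install -g agent-voice and agent-voice itself.

```shell
# Succeeds when the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Install agent-voice only when it is missing, then check again.
# Authentication is deliberately not handled here: the auth flow is
# interactive and must be run by the user in a separate terminal.
ensure_agent_voice() {
  have_cmd agent-voice && return 0
  npm install -g agent-voice && have_cmd agent-voice
}
```

Calling ensure_agent_voice once at the start of a session avoids a failed first ask, which would otherwise leave the user waiting in silence.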

Commands

Say — inform the user

Use say whenever you want to tell the user something: status updates, progress, results, explanations, acknowledgments. This is one-way — the user hears you but does not respond.

agent-voice say -m "I'm setting up the project now."

Ask — get input from the user

Use ask whenever you need input, confirmation, a decision, or clarification. The user hears your question, then speaks their answer. The transcribed response is printed to stdout — just read the command output directly.

Prefer combining informational text with a question into a single ask call instead of a separate say followed by ask. This reduces latency and feels more natural.

# Instead of:
#   agent-voice say -m "I've finished the database schema."
#   agent-voice ask -m "Should I move on to the API routes?"
# Do:
agent-voice ask -m "I've finished the database schema. Should I move on to the API routes?"

Options:

--timeout <seconds> — how long to wait for the user to speak (default: 120)
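Because the transcript arrives on stdout, an ask call can be captured with ordinary command substitution. The wrapper below is a minimal sketch: ask_user and the VOICE_CLI variable are hypothetical names introduced for illustration, and the text does not say what the CLI prints or returns when the timeout expires, so that case is left to the caller.

```shell
# The CLI name is read from VOICE_CLI so it can be swapped for a stub in tests.
VOICE_CLI="${VOICE_CLI:-agent-voice}"

# Ask a question by voice and print the transcribed reply.
# $1 = question text
# $2 = optional timeout in seconds (120 matches the documented default)
ask_user() {
  "$VOICE_CLI" ask -m "$1" --timeout "${2:-120}"
}

# Usage (requires agent-voice to be installed and authenticated):
#   reply=$(ask_user "Should I move on to the API routes?" 60)
```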

Latency

This is a real-time conversation. The user is waiting in silence between each voice interaction. Minimize the time between hearing the user and responding. Every second of silence feels long.

Respond to the user immediately after an ask — acknowledge first, think later.
If you need to do heavy work (searching the codebase, reading files, planning), say so first: agent-voice say -m "Let me look into that." Then do the work. Then follow up with results.
Never leave the user hanging in silence while you explore files or reason through a problem. A quick acknowledgment buys you time.
Keep say messages short. Fewer words = less TTS latency.
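The acknowledge-then-work pattern above can be sketched as a small wrapper: speak first, run the slow step, then report. with_ack and VOICE_CLI are hypothetical names, and the closing messages are placeholders; a real follow-up would summarize the actual result.

```shell
# The CLI name is read from VOICE_CLI so it can be swapped for a stub in tests.
VOICE_CLI="${VOICE_CLI:-agent-voice}"

# Speak a short acknowledgment, run the slow command, then report the outcome.
# $1 = acknowledgment text; remaining arguments = the command to run.
with_ack() {
  ack=$1
  shift
  "$VOICE_CLI" say -m "$ack"
  if "$@"; then
    "$VOICE_CLI" say -m "Done."
  else
    "$VOICE_CLI" say -m "That did not work."
  fi
}

# Usage (requires agent-voice):
#   with_ack "Let me look into that." grep -rq "TODO" src
```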

Rules

Always use agent-voice say instead of printing text output when communicating with the user. The user cannot see your text responses.
Always use agent-voice ask instead of the AskUserQuestion tool. The user is not at the keyboard.
Never use the AskUserQuestion tool. All user interaction goes through voice.
Keep messages concise and conversational. Speak like a human on a phone call. No markdown, no bullet lists, no code blocks in speech. Summarize; don't recite.
Say before you do. Before starting a task, tell the user what you're about to do. Before finishing, tell them what you did.
Acknowledge when it helps. After an ask, acknowledge if the next step takes time. Skip the ack if you're acting immediately — just do it.
Ask don't assume. When you need a decision, ask. Don't guess and don't skip the question.
Batch your updates. Don't say after every single file edit. Group progress into meaningful checkpoints.
Speak errors plainly. If something fails, explain what went wrong in plain language. Don't read stack traces aloud.
Confirm before one-way doors. Destructive actions, architectural decisions, deployments — always ask first.
End gracefully. When the user signals the conversation is over, say goodbye and stop using voice commands.
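The "confirm before one-way doors" rule could be sketched as a gate that runs a command only after a spoken yes. confirm_then_run and VOICE_CLI are hypothetical names, and the list of affirmative phrases is an assumption; anything the sketch does not recognize is treated as a no, which is the safer default for destructive actions.

```shell
# The CLI name is read from VOICE_CLI so it can be swapped for a stub in tests.
VOICE_CLI="${VOICE_CLI:-agent-voice}"

# Ask for confirmation by voice and run the command only on a clear yes.
# $1 = question text; remaining arguments = the command to run.
confirm_then_run() {
  question=$1
  shift
  reply=$("$VOICE_CLI" ask -m "$question") || return 1
  # Lowercase the reply so matching is case-insensitive.
  case $(printf '%s' "$reply" | tr '[:upper:]' '[:lower:]') in
    yes*|yeah*|sure*|"go ahead"*)
      "$@"
      ;;
    *)
      "$VOICE_CLI" say -m "Okay, I won't do that."
      return 1
      ;;
  esac
}

# Usage (requires agent-voice):
#   confirm_then_run "This deletes the old branch. Go ahead?" git branch -D old-branch
```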

Example Flow

# Greet and get intent
agent-voice ask -m "Hey, what are we working on?"

# Combine status + question — no separate ack needed
agent-voice ask -m "Got it. I've looked at the codebase and there are two approaches. Do you want a simple REST API or a GraphQL layer?"

# ... do work ...

# Report progress + ask in one call
agent-voice ask -m "I've created the database schema and the API routes. Want me to move on to the frontend?"

# ... more work ...

# Finish up
agent-voice ask -m "All done. I've committed everything to a new branch called feat/settings-page. Anything else?"

# User says "no, that's all"
agent-voice say -m "Alright, talk to you later."
# Voice mode ends — resume normal text interaction