🛠️ Development / MCP Community

qwen-voice

A Skill that uses Qwen to handle speech tasks: it transcribes users' voice messages to text and converts text into natural-sounding speech that can be shared on Telegram.

📜 Original English description (for reference)

Use Qwen (DashScope/百炼) for speech tasks: (1) ASR speech-to-text transcription of user audio/voice messages (Telegram .ogg opus, wav, mp3) using qwen3-asr-flash, optionally with coarse timestamps via chunking; (2) TTS text-to-speech voice reply using qwen3-tts-flash with selectable voice (default Cherry) and output as .ogg voice note for Telegram.

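🇯🇵 Commentary for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.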

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, extracts, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o qwen-voice.zip https://jpskill.com/download/9292.zip && unzip -o qwen-voice.zip && rm qwen-voice.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9292.zip -OutFile "$d\qwen-voice.zip"; Expand-Archive "$d\qwen-voice.zip" -DestinationPath $d -Force; ri "$d\qwen-voice.zip"

When it finishes, restart Claude Code. The Skill then activates automatically when you ask in plain language, for example "transcribe this voice message".

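💾 Manual download (for those who find the command difficult)

1. Press the blue button below to download qwen-voice.zip
2. Double-click the ZIP file to extract it; this creates a qwen-voice folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code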

⬇ Download as .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.

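📦 Installation (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.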
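The rename-and-extract steps can also be done in one go with Python's standard zipfile module. This is a minimal sketch, not part of the Skill: the function name install_skill and the download location in the usage note are hypothetical, and it assumes the .skill file is an ordinary ZIP archive, as the steps above describe.

```python
import zipfile
from pathlib import Path


def install_skill(archive: Path, skills_dir: Path) -> list:
    """Extract a .skill (or .zip) archive into skills_dir.

    Returns the names of the archive members that were extracted.
    """
    skills_dir.mkdir(parents=True, exist_ok=True)
    # zipfile does not care about the file extension, so the .skill file
    # can be opened directly without renaming it to .zip first.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(skills_dir)
        return zf.namelist()
```

For example, install_skill(Path.home() / "Downloads" / "qwen-voice.skill", Path.home() / ".claude" / "skills") would place the folder where Claude Code looks for Skills, assuming the file was saved to Downloads.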


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English/Chinese text Claude reads)

Qwen Voice (ASR + TTS)

Use the bundled scripts. Prefer environment variable DASHSCOPE_API_KEY. If missing, scripts attempt to read it from ~/.bashrc.
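The lookup order described above (environment variable first, then ~/.bashrc) could look roughly like the following. This is a sketch under assumptions, not the bundled scripts' actual code: the function names and the regular expression are hypothetical, and it only recognizes simple one-line assignments such as export DASHSCOPE_API_KEY="...".

```python
import os
import re
from pathlib import Path
from typing import Optional

# Matches simple assignments such as:  export DASHSCOPE_API_KEY="sk-..."
# (assumed format; lines with trailing comments are not handled).
_KEY_LINE = re.compile(
    r"""^\s*(?:export\s+)?DASHSCOPE_API_KEY=(["']?)(.+?)\1\s*$"""
)


def key_from_bashrc_text(text: str) -> Optional[str]:
    """Return the last DASHSCOPE_API_KEY assignment in bashrc-style text."""
    found = None
    for line in text.splitlines():
        match = _KEY_LINE.match(line)
        if match:
            found = match.group(2)  # later assignments override earlier ones
    return found


def resolve_api_key(env=None, bashrc: Optional[Path] = None) -> Optional[str]:
    """Prefer the environment variable, then fall back to ~/.bashrc."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key
    bashrc = Path.home() / ".bashrc" if bashrc is None else bashrc
    try:
        return key_from_bashrc_text(bashrc.read_text(encoding="utf-8"))
    except OSError:
        return None  # no readable ~/.bashrc, so no key is available
```

Setting the environment variable remains the preferred route; the file fallback is only a convenience for shells where the variable has not been exported.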

ASR (speech → text)

Non-timestamp (default)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg

With timestamps (chunk-based)

python3 skills/qwen-voice/scripts/qwen_asr.py --in /path/to/audio.ogg --timestamps --chunk-sec 3

Notes:

Timestamps are generated by fixed-length chunking (not word-level alignment).
Input audio is converted to mono 16kHz WAV before sending.
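To make the first note concrete: with fixed-length chunking, each timestamp is simply the boundary of the chunk a piece of text came from, so its resolution is the value of --chunk-sec. The sketch below illustrates the idea only; the function names and the output line format are assumptions, not what qwen_asr.py prints.

```python
def chunk_spans(duration_sec: float, chunk_sec: float = 3.0) -> list:
    """Split [0, duration_sec) into fixed-length (start, end) spans in seconds."""
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    spans = []
    index = 0
    while index * chunk_sec < duration_sec:
        start = float(index * chunk_sec)
        end = float(min(start + chunk_sec, duration_sec))  # last chunk may be shorter
        spans.append((start, end))
        index += 1
    return spans


def format_ts(seconds: float) -> str:
    """Format seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def label_chunks(texts: list, duration_sec: float, chunk_sec: float = 3.0) -> list:
    """Pair each per-chunk transcript with the span of the chunk it came from."""
    spans = chunk_spans(duration_sec, chunk_sec)
    return [
        f"[{format_ts(start)} - {format_ts(end)}] {text}"
        for (start, end), text in zip(spans, texts)
    ]
```

A word spoken at 4.2 s in a recording chunked at 3 s would therefore be reported under the 00:03.000 to 00:06.000 span, which is why these timestamps are coarse.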

TTS (text → speech)

Preset voice (default: Cherry)

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 Pi。' --voice Cherry --out /tmp/out.ogg

Clone voice (create once, reuse)

Create a voice profile from a sample audio:

python3 skills/qwen-voice/scripts/qwen_voice_clone.py --in ./voice_sample.ogg --name george --out work/qwen-voice/george.voice.json

Use the cloned voice to synthesize:

python3 skills/qwen-voice/scripts/qwen_tts.py --text '你好，我是 George。' --voice-profile work/qwen-voice/george.voice.json --out /tmp/out.ogg
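When calling the TTS script from other Python code, it can help to build the argument list in one place. The helper below is a hypothetical wrapper that uses only the flags shown in the commands above. It assumes that --voice and --voice-profile are alternatives; the source shows them separately and does not say whether they can be combined.

```python
TTS_SCRIPT = "skills/qwen-voice/scripts/qwen_tts.py"


def build_tts_command(text, out_path, voice="Cherry", voice_profile=None):
    """Build the argv list for qwen_tts.py.

    A cloned-voice profile, when given, is used instead of the preset voice
    (assumption: the two options are mutually exclusive).
    """
    cmd = ["python3", TTS_SCRIPT, "--text", text]
    if voice_profile is not None:
        cmd += ["--voice-profile", voice_profile]
    else:
        cmd += ["--voice", voice]
    cmd += ["--out", out_path]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with text that contains spaces or quotes.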

Notes:

.ogg output is Opus, suitable for Telegram voice messages.
Voice cloning uses DashScope customization endpoint + Qwen realtime TTS model.
Scripts use a local venv at work/venv-dashscope (auto-created on first run).
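If audio from another source needs to be sent as a Telegram voice note, it has to be Opus in an Ogg container as well. How the bundled script produces its .ogg is not shown here; the sketch below assumes ffmpeg built with libopus is installed, and the function names and the 32k bitrate are illustrative choices.

```python
import shutil
import subprocess


def opus_ogg_command(src: str, dst: str, bitrate: str = "32k") -> list:
    """Build an ffmpeg argv that encodes src to mono Opus in an Ogg container."""
    return [
        "ffmpeg", "-y",      # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",          # mono
        "-c:a", "libopus",   # Opus audio codec
        "-b:a", bitrate,
        dst,
    ]


def encode_to_opus_ogg(src: str, dst: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or the encode fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(opus_ogg_command(src, dst), check=True)
```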

Typical chat workflow

When user sends voice message/audio: run ASR and reply with the transcribed text.
When user explicitly asks for voice reply: run TTS and send the generated .ogg as a voice note.
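The two rules above amount to a small decision table. As a rough sketch (the function name and the action labels "asr" and "tts" are hypothetical), the choice of which script to run for an incoming message could be expressed as:

```python
def actions_for(has_audio: bool, asked_for_voice_reply: bool) -> list:
    """Return the speech steps to run for one incoming chat message.

    "asr": transcribe the attached audio and reply with the text.
    "tts": synthesize the reply and send the .ogg as a voice note.
    An empty list means an ordinary text reply with no speech step.
    """
    actions = []
    if has_audio:
        actions.append("asr")
    if asked_for_voice_reply:  # only when the user explicitly asks for it
        actions.append("tts")
    return actions
```

The rules are not exclusive: a voice message that also asks for a spoken answer yields both steps, with transcription first so that the reply can be based on the transcript.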