text-to-speech

A Skill that converts text into high-quality speech using a range of AI voice models, for everything from natural conversation to podcasts.

📜 Original English description (for reference)

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Higgs Audio, VibeVoice (podcasts). Capabilities: text-to-speech, voice cloning, multi-speaker dialogue, podcast generation, expressive speech. Use for: voiceovers, audiobooks, podcasts, accessibility, video narration, IVR, voice assistants. Triggers: text to speech, tts, voice generation, ai voice, speech synthesis, voice over, generate speech, ai narrator, voice cloning, text to audio, elevenlabs alternative, voice ai, ai voiceover, speech generator, natural voice

🇯🇵 Notes for Japanese creators

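Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.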

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o text-to-speech.zip https://jpskill.com/download/6213.zip && unzip -o text-to-speech.zip && rm text-to-speech.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/6213.zip -OutFile "$d\text-to-speech.zip"; Expand-Archive "$d\text-to-speech.zip" -DestinationPath $d -Force; ri "$d\text-to-speech.zip"

When it finishes, restart Claude Code. After that, an ordinary request such as "Turn this text into speech" triggers the Skill automatically.
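Before restarting, it can help to confirm that the files landed where Claude Code looks for them. The check below is a minimal sketch. It assumes the archive unpacks to a text-to-speech folder containing SKILL.md; that layout is inferred from this page, not verified against the archive. `skill_installed` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if <skills-dir>/text-to-speech/SKILL.md exists.
# Assumption: the archive unpacks to a text-to-speech/ folder with SKILL.md inside.
skill_installed() {
  [ -f "$1/text-to-speech/SKILL.md" ]
}

if skill_installed "$HOME/.claude/skills"; then
  echo "text-to-speech: installed"
else
  echo "text-to-speech: SKILL.md not found"
fi
```

If the second message appears, listing ~/.claude/skills/ shows what the archive actually created.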

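💾 Manual download (if you would rather not use the command line)

1. Click the blue download button on this page to get text-to-speech.zip
2. Double-click the ZIP file to extract it; this creates a text-to-speech folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code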

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. When you ask Claude for something in this area, the Skill activates automatically.

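📦 How to install (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for relevant requests.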

Last updated: 2026-05-17
Retrieved: 2026-05-17
Bundled files: 1

📜 Original SKILL.md (the source text Claude reads)

Text-to-Speech

Convert text to natural speech via inference.sh CLI.

Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Hello, welcome to our product demo."}'
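The inline JSON above works for simple sentences, but text containing double quotes or backslashes would break the payload. The sketch below builds the same `{"text": ...}` payload with basic escaping. `json_escape` and `tts_payload` are hypothetical helper names, not part of the infsh CLI, and newlines or other control characters are not handled.

```shell
#!/bin/sh
# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Limitation: newlines and other control characters are not handled here.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the {"text": "..."} payload used by the Quick Start command.
tts_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

payload=$(tts_payload 'She said "hello" to the team.')
printf '%s\n' "$payload"

# To synthesize, pass the payload on (requires the infsh CLI and a login):
# infsh app run infsh/kokoro-tts --input "$payload"
```

For longer or multi-line scripts, writing the input to a JSON file and passing the file name to `--input`, as the later examples do, is likely the safer route.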

Available Models

Model	App ID	Best For
DIA TTS	`infsh/dia-tts`	Conversational, expressive
Kokoro TTS	`infsh/kokoro-tts`	Fast, natural
Chatterbox	`infsh/chatterbox`	General purpose
Higgs Audio	`infsh/higgs-audio`	Emotional control
VibeVoice	`infsh/vibevoice`	Podcasts, long-form
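When scripting against several models, the table can be captured as a small lookup. This is only a sketch: the keywords (`conversation`, `fast`, `emotion`, `podcast`) and the function name `pick_tts_app` are made up for illustration, while the app IDs come from the table above.

```shell
#!/bin/sh
# Map a use-case keyword to an app ID, following the "Best For" column.
# The keywords are illustrative; only the app IDs come from the table.
pick_tts_app() {
  case "$1" in
    conversation) echo "infsh/dia-tts" ;;
    fast)         echo "infsh/kokoro-tts" ;;
    emotion)      echo "infsh/higgs-audio" ;;
    podcast)      echo "infsh/vibevoice" ;;
    *)            echo "infsh/chatterbox" ;;  # general-purpose fallback
  esac
}

pick_tts_app podcast
```

The result can then be used in place of a hard-coded app ID, for example `infsh app run "$(pick_tts_app podcast)" --input input.json`.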

Browse All Audio Apps

infsh app list --category audio

Examples

Basic Text-to-Speech

infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our tutorial."}'

Conversational TTS with DIA

infsh app sample infsh/dia-tts --save input.json

# Edit input.json:
# {
#   "text": "Hey! How are you doing today? I'm really excited to share this with you.",
#   "voice": "conversational"
# }

infsh app run infsh/dia-tts --input input.json
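The edit step can also be done without opening an editor. The sketch below writes the same two fields shown above with a here-document. It assumes those fields match what `infsh app sample` generates for this app, so it is worth comparing against the sampled file first.

```shell
#!/bin/sh
# Write the DIA input file non-interactively.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside,
# so the apostrophe in "I'm" and any $ signs are kept literally.
cat > input.json <<'EOF'
{
  "text": "Hey! How are you doing today? I'm really excited to share this with you.",
  "voice": "conversational"
}
EOF

cat input.json

# Then run (requires the infsh CLI and a login):
# infsh app run infsh/dia-tts --input input.json
```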

Long-form Audio (Podcasts)

infsh app sample infsh/vibevoice --save input.json

# Edit input.json with your podcast script
infsh app run infsh/vibevoice --input input.json

Expressive Speech with Higgs

infsh app sample infsh/higgs-audio --save input.json

# {
#   "text": "This is absolutely incredible!",
#   "emotion": "excited"
# }

infsh app run infsh/higgs-audio --input input.json

Use Cases

Voiceovers: Product demos, explainer videos
Audiobooks: Convert text to spoken word
Podcasts: Generate podcast episodes
Accessibility: Make content accessible
IVR: Phone system voice prompts
Video Narration: Add narration to videos

Combine with Video

Generate speech, then create a talking head video:

# 1. Generate speech
infsh app run infsh/kokoro-tts --input '{"text": "Your script here"}' > speech.json

# 2. Use the audio URL with OmniHuman for avatar video
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
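Step 2 needs the audio URL from step 1, but the shape of speech.json is not documented here. The sketch below takes the simplest approach under that uncertainty: it pulls the first http(s) URL out of the file. The sample file content and the `first_url` helper are hypothetical; inspect a real speech.json before relying on this.

```shell
#!/bin/sh
# Print the first http(s) URL found in a file.
# Assumption: the saved output contains the audio URL as a plain JSON string.
first_url() {
  grep -Eo 'https?://[^" ]+' "$1" | head -n 1
}

# Hypothetical example of what a saved output file might contain.
printf '%s\n' '{"audio": {"url": "https://example.com/out/speech.wav"}}' > speech.sample.json

audio_url=$(first_url speech.sample.json)
printf '%s\n' "$audio_url"

# Build the step 2 payload with the extracted URL
# (the image URL is the placeholder from the example above).
printf '{"image_url": "%s", "audio_url": "%s"}\n' "https://portrait.jpg" "$audio_url"
```

If the output turns out to contain several URLs, a JSON-aware tool that selects the specific field would be more reliable than pattern matching.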

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# AI avatars (combine TTS with talking heads)
npx skills add inferencesh/skills@ai-avatar-video

# AI music generation
npx skills add inferencesh/skills@ai-music-generation

# Speech-to-text (transcription)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

Browse all apps: infsh app list

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Audio processing workflows
Apps Overview - Understanding the app ecosystem
