acestep-lyrics-transcription

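A Skill that extracts lyrics from audio files using OpenAI Whisper or the ElevenLabs Scribe API and produces lyric data with word-level timestamps in LRC, SRT, or JSON format. It supports song transcription and the creation of karaoke lyric files.

📜 Original English description (for reference)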

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

⚡ Recommended: install with one command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o acestep-lyrics-transcription.zip https://jpskill.com/download/9148.zip && unzip -o acestep-lyrics-transcription.zip && rm acestep-lyrics-transcription.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9148.zip -OutFile "$d\acestep-lyrics-transcription.zip"; Expand-Archive "$d\acestep-lyrics-transcription.zip" -DestinationPath $d -Force; ri "$d\acestep-lyrics-transcription.zip"

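When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language, for example "transcribe the lyrics of this song".

💾 Manual download (if you prefer not to use the command line)

1. Download acestep-lyrics-transcription.zip with the download button on this page
2. Double-click the ZIP file to extract it; this creates the acestep-lyrics-transcription folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code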

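⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for something in this area.

📦 How to install (3 steps)

1. Use the download button on this page to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.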

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English text Claude reads)

Lyrics Transcription Skill

Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.

API Key Setup Guide

Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key

This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.

If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.

Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:

1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
2. Provide clear instructions on where to obtain a key:
- OpenAI: Get an API key at https://platform.openai.com/api-keys; this requires an OpenAI account with billing enabled. The Whisper API costs ~$0.006/min.
- ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys; this requires an ElevenLabs account. Free tier includes limited credits.
3. Also offer the option to switch to the other provider if they already have a key for it.

Once the user provides the key, configure it using:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

If the user wants to switch providers, also run:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>

After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.
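
The check-then-proceed flow above can be sketched as a small gate. This is only an illustration under stated assumptions: the source says `config --check-key` reports whether the key is set or empty but does not give the exact wording, so matching on the word "empty" is a guess, and the `SKILL_SH` variable and both function names are hypothetical.

```shell
# Path to the skill's script; SKILL_SH is a hypothetical variable for this sketch.
SKILL_SH="${SKILL_SH:-./scripts/acestep-lyrics-transcription.sh}"

# Return 0 when the active provider's key looks configured.
# ASSUMPTION: the report contains the word "empty" when no key is set;
# the exact output wording is not given in the source.
key_is_configured() (
  report=$(bash "$SKILL_SH" config --check-key 2>&1) || exit 1
  case "$report" in
    *[Ee]mpty*) exit 1 ;;
    *) exit 0 ;;
  esac
)

# Gate transcription on the key check without ever reading the key itself.
transcribe_if_ready() (
  if key_is_configured; then
    bash "$SKILL_SH" transcribe --audio "$1"
  else
    echo "API key not configured: ask the user for a key before transcribing" >&2
    exit 2
  fi
)
```

The gate only looks at the report from `config --check-key`, which keeps it in line with the rule that the key value is never read or displayed.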

Quick Start

# 1. cd to this skill's directory
cd {project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/

# 2. Configure API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output saved to: {project_root}/acestep_output/<filename>.lrc

Prerequisites

- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs

Script Usage

./scripts/acestep-lyrics-transcription.sh transcribe --audio <file> [options]

Options:
  -a, --audio      Audio file path (required)
  -l, --language   Language code (zh, en, ja, etc.)
  -f, --format     Output format: lrc, srt, json (default: lrc)
  -p, --provider   API provider: openai, elevenlabs (overrides config)
  -o, --output     Output file path (default: acestep_output/<filename>.lrc)
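
To make the default output location concrete, here is a minimal sketch of how a path of the form acestep_output/<filename>.lrc could be derived. It assumes that <filename> means the audio file's base name without its extension and that the extension follows the chosen format; the source only shows the .lrc case, and `default_output_path` is a hypothetical helper, not part of the script.

```shell
# Derive the default output path for an audio file and format.
# ASSUMPTION: <filename> is the audio basename without its extension, and the
# extension follows the chosen format; the source only shows the .lrc case.
default_output_path() (
  audio=$1
  format=${2:-lrc}
  name=$(basename "$audio")
  stem=${name%.*}
  printf '%s\n' "acestep_output/${stem}.${format}"
)
```

For example, `default_output_path /path/to/song.mp3` prints `acestep_output/song.lrc`.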

Post-Transcription Lyrics Correction (MANDATORY)

CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:

- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin' nup"

Correction Workflow

1. Read the transcribed LRC file using the Read tool
2. Read the original lyrics from the ACE-Step output JSON file
3. Use original lyrics as a whole reference: Do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on your knowledge of what the original lyrics say.
4. Fix transcription errors: Replace misrecognized words with the correct original words, keeping the timestamps intact
5. Write the corrected LRC back using the Write tool

What to Correct

- Replace misrecognized words with their correct original versions
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate)
- Do NOT add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines
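
The last rule can be checked mechanically after a correction pass. The sketch below is an optional sanity check, not part of the skill: `lrc_check_lines` is a hypothetical helper that lists any non-empty line that does not start with a [MM:SS.cc] timestamp, such as a stray [Chorus] tag.

```shell
# Print every non-empty line that does not start with "[MM:SS.cc]".
# Exits 0 only when no such line is found.
lrc_check_lines() (
  bad=$(grep -vE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1" | grep -v '^[[:space:]]*$') || true
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    exit 1
  fi
)
```

A non-zero exit status means the file still contains lines that a renderer expecting timestamped text only might not handle.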

Example

Transcribed (wrong):

[00:46.96]AC step alive,
[00:50.80]one point five eyes.

Original lyrics reference:

ACE-Step alive
One point five arrives

Corrected (right):

[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
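
In the example above only the words change; both timestamps stay identical. A minimal way to confirm this after editing is to compare the timestamp columns of the two files. The helper names `lrc_timestamps` and `timestamps_unchanged` are hypothetical, and the sketch relies on `grep -o`, which GNU and BSD grep provide.

```shell
# List the leading [MM:SS.cc] timestamp of each line, in file order.
lrc_timestamps() (
  grep -oE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1"
)

# Succeed only if both files carry the same timestamps in the same order.
timestamps_unchanged() (
  before=$(lrc_timestamps "$1") || true
  after=$(lrc_timestamps "$2") || true
  [ "$before" = "$after" ]
)
```

Keeping a copy of the transcribed file and running `timestamps_unchanged transcribed.lrc corrected.lrc` after the edit catches a timestamp that was changed or dropped by accident.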

Configuration

Config file: scripts/config.json

# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config
./scripts/acestep-lyrics-transcription.sh config --list

Option	Default	Description
`provider`	`openai`	Active provider: `openai` or `elevenlabs`
`output_format`	`lrc`	Default output: `lrc`, `srt`, or `json`
`openai.api_key`	`""`	OpenAI API key
`openai.api_url`	`https://api.openai.com/v1`	OpenAI API base URL
`openai.model`	`whisper-1`	OpenAI model (whisper-1 for word timestamps)
`elevenlabs.api_key`	`""`	ElevenLabs API key
`elevenlabs.api_url`	`https://api.elevenlabs.io/v1`	ElevenLabs API base URL
`elevenlabs.model`	`scribe_v2`	ElevenLabs model

Provider Notes

Provider	Model	Word Timestamps	Pricing
OpenAI	whisper-1	Yes (segment + word)	$0.006/min
ElevenLabs	scribe_v2	Yes (word-level)	Varies by plan

- OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription
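
As a rough guide to the OpenAI pricing in the table, the cost of one track is its length in minutes times $0.006. The sketch below does that arithmetic; `estimate_whisper_cost` is a hypothetical helper, the rate is the approximate figure quoted in this document, and actual billing may differ.

```shell
# Estimate the Whisper API cost in USD for a clip of the given length in seconds.
# Uses the ~$0.006/min rate quoted in this document; actual billing may differ.
estimate_whisper_cost() (
  awk -v s="$1" 'BEGIN { printf "%.4f\n", s / 60 * 0.006 }'
)
```

A three-minute song (180 seconds) comes out at about $0.018.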

Examples

# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
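
The single-file invocations above extend naturally to a folder of songs. The following is a sketch only: `transcribe_dir` and the `SKILL_SH` variable are hypothetical, and it uses just the `transcribe`, `--audio`, and `--language` options documented above.

```shell
# Path to the skill's script; SKILL_SH is a hypothetical variable for this sketch.
SKILL_SH="${SKILL_SH:-./scripts/acestep-lyrics-transcription.sh}"

# Transcribe every .mp3 in a directory using the documented flags.
transcribe_dir() (
  dir=$1
  lang=$2
  for audio in "$dir"/*.mp3; do
    [ -e "$audio" ] || continue   # skip the unexpanded glob when no files match
    bash "$SKILL_SH" transcribe --audio "$audio" --language "$lang" || exit 1
  done
)
```

Each resulting LRC file still needs the manual correction pass described earlier before it is used for MV rendering.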