🛠️ 開発・MCP コミュニティ

prompt-repetition

A prompt repetition technique for improving LLM accuracy. Achieves significant performance gains in 67% (47/70) of 70 benchmarks. Automatically applied on lightweight models (haiku, flash, mini).

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o prompt-repetition.zip https://jpskill.com/download/20918.zip && unzip -o prompt-repetition.zip && rm prompt-repetition.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/20918.zip -OutFile "$d\prompt-repetition.zip"; Expand-Archive "$d\prompt-repetition.zip" -DestinationPath $d -Force; ri "$d\prompt-repetition.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)

1. 下の青いボタンを押して prompt-repetition.zip をダウンロード
2. ZIPファイルをダブルクリックで解凍 → prompt-repetition フォルダができる
3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
4. Claude Code を再起動

⬇ .zip でダウンロード(推奨) ⬇ .skill 形式(上級者用) 元のソース ↗

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
- · macOS / Linux: ~/.claude/skills/
- · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →

最終更新: 2026-05-18
取得日時: 2026-05-18
同梱ファイル: 1

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

プロンプト繰り返し

解決される問題

LLMは因果言語モデルとして訓練されており、各トークンは前のトークンのみに注意を向けます。これは以下の問題につながります。

コンテキスト-質問問題: コンテキストを処理する際に質問が不明である
選択肢先行型MCQ問題: 回答の選択肢を見る際に質問のコンテキストを完全に理解できない
位置/インデックス問題: 長いリスト内の特定の位置情報に対する注意の重みが弱まる

プロンプト繰り返しにより、2回目のパスで1回目のパス全体を参照できるようになり、双方向アテンションのいくつかの利点を効果的に模倣します。

このスキルを使用するタイミング

軽量モデルを使用する場合: claude-haiku、gemini-flash、gpt-4o-miniなど
選択肢先行型MCQ: 質問の前に回答の選択肢が表示される多肢選択問題
コンテキスト + 質問: 長いコンテキストから特定の情報を検索する場合
インデックス/位置タスク: 在庫やリストにおける位置ベースのクエリ
NPCダイアログ: ゲームAIキャラクターの一貫性を維持する場合
非推論タスク: Chain-of-Thoughtを使用しないタスク

仕組み

因果アテンションの制限

[Context] → [Question]
    ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear

プロンプト繰り返しがこれを解決する方法

[First Pass]                [Second Pass]
Context → Question    →    Context' → Question'
                              ↑         ↑
                          Can reference entire first pass

2回目の繰り返しでは、モデルは最初のプロンプト全体にわたる情報を再処理し、主要な概念に対するアテンションの重みを強化することで、パフォーマンスが向上します。

注: これはモデルのアーキテクチャを双方向に変更するものではありません。因果モデルの制限を軽減するためのプロンプトエンジニアリング手法です。

研究結果 (Google Research 2025)

メトリック	結果
有意な改善 (p < 0.1)	70ベンチマーク中47
パフォーマンス低下	0
中立	23
改善率	67%

最も劇的な改善: Gemini 2.0 Flash-LiteのNameIndexで 21.33% → 97.33% (+76%p)

テスト済みモデル

Gemini 2.0 Flash / Flash Lite
GPT-4o / GPT-4o-mini
Claude 3.7 Sonnet / Claude 3 Haiku
Deepseek V3

テスト済みベンチマーク

ARC (Challenge) - 科学的推論
OpenBookQA - オープンドメインQA
GSM8K - 数学問題
MMLU-Pro - マルチタスク言語理解
MATH - 数学問題解決
NameIndex / MiddleMatch - カスタム位置タスク

適用手順

ステップ1: 自動適用対象モデルの確認

プロバイダー	自動適用モデル	除外モデル
Claude	haiku series	opus, sonnet
Gemini	flash, flash-lite	pro, ultra
OpenAI	gpt-4o-mini, gpt-low	gpt-4o, gpt-4

ステップ2: タスクタイプごとの繰り返し回数の決定

タスクタイプ	キーワードパターン	繰り返し回数	期待される改善
選択肢先行型MCQ	`A. B. C. D.` 選択肢が最初	2回	+15-40%p
インデックス/位置	`slot`, `position`, `index`, `N-th`	3回	+50-76%p
コンテキスト + 質問	一般的な質問	2回	+5-15%p
CoTあり	`step by step`, `think through`	0回 (適用しない)	~0%

ステップ3: トークン制限の確認

# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))

ステップ4: プロンプト変換

def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)

実践例

例1: 選択肢先行型MCQ (最大効果)

前:

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

後 (繰り返し×2適用):

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

期待される出力:

精度: 元の78% → 繰り返し後93% (+15%p)

例2: インデックス/位置タスク (最大効果)

前:

Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map

What item is in slot 25?

後 (繰り返し×3適用): プロンプトが3回繰り返されます。

期待される出力:

Dragon Scale

精度: 元の21% → 繰り返し後97% (+76%p)

例3: ツール呼び出しプロンプトの処理

注: ツール呼び出しの指示を含むプロンプトも全体が繰り返されます。実装の簡潔さと一貫性のために、完全繰り返しのアプローチが採用されました。

前:

Use the calculator tool to compute 234 * 567.
What is the result?

後 (繰り返し×2):

Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?

研究結果によると、ツール呼び出しセクションを含む完全な繰り返しも効果的であることが示されています。

実運用可能な実装

自動適用トランスフォーマー

"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_

📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

Prompt Repetition

Problem Being Solved

LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:

Context-Question Problem: The question is unknown when processing context
Options-First MCQ Problem: Cannot fully understand the question context when viewing answer choices
Position/Index Problem: Attention weights weaken for specific position information in long lists

Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.

When to use this skill

When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.
Options-First MCQ: Multiple choice where answer choices appear before the question
Context + Question: Searching for specific information in long contexts
Index/Position Tasks: Position-based queries in inventories or lists
NPC Dialogue: Maintaining consistency for game AI characters
Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought

How It Works

Limitations of Causal Attention

[Context] → [Question]
    ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear

How Prompt Repetition Solves This

[First Pass]                [Second Pass]
Context → Question    →    Context' → Question'
                              ↑         ↑
                          Can reference entire first pass

In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.

Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.

Research Results (Google Research 2025)

Metric	Result
Significant improvement (p < 0.1)	47 / 70 benchmarks
Performance degradation	0
Neutral	23
Improvement rate	67%

Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)

Tested Models

Gemini 2.0 Flash / Flash Lite
GPT-4o / GPT-4o-mini
Claude 3.7 Sonnet / Claude 3 Haiku
Deepseek V3

Tested Benchmarks

ARC (Challenge) - Scientific reasoning
OpenBookQA - Open-domain QA
GSM8K - Math problems
MMLU-Pro - Multitask language understanding
MATH - Mathematical problem solving
NameIndex / MiddleMatch - Custom position tasks

Application Procedure

Step 1: Verify Auto-Apply Target Models

Provider	Auto-apply models	Excluded models
Claude	haiku series	opus, sonnet
Gemini	flash, flash-lite	pro, ultra
OpenAI	gpt-4o-mini, gpt-low	gpt-4o, gpt-4

Step 2: Determine Repetition Count by Task Type

Task Type	Keyword Pattern	Repetitions	Expected Improvement
Options-First MCQ	`A. B. C. D.` choices first	2×	+15-40%p
Index/Position	`slot`, `position`, `index`, `N-th`	3×	+50-76%p
Context + Question	General question	2×	+5-15%p
With CoT	`step by step`, `think through`	0× (not applied)	~0%

Step 3: Check Token Limits

# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))

Step 4: Prompt Transformation

def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)

Practical Examples

Example 1: Options-First MCQ (Greatest Effect)

Before:

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

After (repetition ×2 applied):

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

A. Paris
B. London
C. Berlin
D. Madrid

Which city is the capital of France?
Reply with one letter.

Expected output:

Accuracy: original 78% → after repetition 93% (+15%p)

Example 2: Index/Position Tasks (Maximum Effect)

Before:

Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map

What item is in slot 25?

After (repetition ×3 applied): Prompt repeated 3 times

Expected output:

Dragon Scale

Accuracy: original 21% → after repetition 97% (+76%p)

Example 3: Tool Call Prompt Handling

Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.

Before:

Use the calculator tool to compute 234 * 567.
What is the result?

After (repetition ×2):

Use the calculator tool to compute 234 * 567.
What is the result?

Use the calculator tool to compute 234 * 567.
What is the result?

Research results show that full repetition including tool call sections is also effective.

Production-Ready Implementation

Auto-Apply Transformer

"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]

@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"

class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False

        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False

        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False

        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()

        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions

        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt

        repetitions = self.determine_repetitions(prompt, model)

        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break

        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1

        if repetitions <= 1:
            return prompt

        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped

How to Measure Effectiveness (Verification)

A/B Testing Method

def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()

    results = {"baseline": [], "repeated": []}

    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)

        # With Repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)

    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)

    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")

Key Metrics

Metric	Measurement Method
Accuracy	Compare correct answer rates
Consistency	Variance across 10 runs of same prompt
Token cost	Input token increase rate
Latency	Compare p50, p99 latency

When NOT to Use

Case	Reason
Using CoT	Reasoning process already provides context
Reasoning models (opus, sonnet)	Already optimized; minimal effect
Very long prompts	Risk of exceeding context limit
Already repeated	Duplicate application wastes tokens

Cost-Accuracy Analysis

Metric	Baseline	With Repetition	Change
Input tokens	500/req	1000/req	+100%
Output tokens	100/req	100/req	0%
Latency (p50)	450ms	460ms	+2%
Latency (p99)	1200ms	1250ms	+4%
Accuracy	78%	89%	+14%p
Cost per correct answer	$0.019	$0.020	+5%

Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.

Multi-Agent Integration

Auto-Apply Strategy Per Agent

Agent	Model	Repetition Applied	Applied At
Claude Orchestrator	opus/sonnet	Optional	-
Claude Executor	haiku	Auto	skill_loader.py
Gemini Analyst	flash	Auto	On MCP call
OpenAI	gpt-4o-mini	Auto	skill_loader.py

Preventing Duplicate Application

To prevent duplicate application in multi-agent pipelines:

Use markers: Detect already-applied prompts with  marker
Pass metadata: Pass x-prompt-repetition-applied: true header between agents
Orchestrator management: Claude Orchestrator tracks whether repetition is applied when calling sub-agents

Application Pattern

[Claude Sonnet] Planning (no repetition needed)
    ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
    ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)

skill_loader.py Integration Guide

Recommended Implementation

# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)

        return prompt

Constraints

Required Rules

Lightweight models first: Most effective for haiku, flash, mini series
Limit repetitions: 2× for general tasks, max 3× for position tasks
Context monitoring: Be cautious of context overflow due to repetition
Check markers: Mandatory marker check to prevent duplicate application

Prohibited Rules

No padding substitution: Increasing length with . etc. has no effect (per research)
Do not combine with CoT: Effects cancel out
Do not force-apply to reasoning models: Already optimized
No duplicate application: Consecutive application without markers wastes tokens

Quick Reference

=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->

prompt-repetition

🎯 このSkillでできること

📦 インストール方法 (3ステップ)

📖 Skill本文(日本語訳)

プロンプト繰り返し

解決される問題

このスキルを使用するタイミング

仕組み

因果アテンションの制限

プロンプト繰り返しがこれを解決する方法

研究結果 (Google Research 2025)

テスト済みモデル

テスト済みベンチマーク

適用手順

ステップ1: 自動適用対象モデルの確認

ステップ2: タスクタイプごとの繰り返し回数の決定

ステップ3: トークン制限の確認

ステップ4: プロンプト変換

実践例

例1: 選択肢先行型MCQ (最大効果)

例2: インデックス/位置タスク (最大効果)

例3: ツール呼び出しプロンプトの処理

実運用可能な実装

自動適用トランスフォーマー

Prompt Repetition

Problem Being Solved

When to use this skill

How It Works

Limitations of Causal Attention

How Prompt Repetition Solves This

Research Results (Google Research 2025)

Tested Models

Tested Benchmarks

Application Procedure

Step 1: Verify Auto-Apply Target Models

Step 2: Determine Repetition Count by Task Type

Step 3: Check Token Limits

Step 4: Prompt Transformation

Practical Examples

Example 1: Options-First MCQ (Greatest Effect)

Example 2: Index/Position Tasks (Maximum Effect)

Example 3: Tool Call Prompt Handling

Production-Ready Implementation

Auto-Apply Transformer

How to Measure Effectiveness (Verification)

A/B Testing Method

Key Metrics

When NOT to Use

Cost-Accuracy Analysis

Multi-Agent Integration

Auto-Apply Strategy Per Agent

Preventing Duplicate Application

Application Pattern

skill_loader.py Integration Guide

Recommended Implementation

Constraints

Required Rules

Prohibited Rules

Quick Reference

References