🛠️ 開発・MCP コミュニティ

nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o nowait-reasoning-optimizer.zip https://jpskill.com/download/18469.zip && unzip -o nowait-reasoning-optimizer.zip && rm nowait-reasoning-optimizer.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/18469.zip -OutFile "$d\nowait-reasoning-optimizer.zip"; Expand-Archive "$d\nowait-reasoning-optimizer.zip" -DestinationPath $d -Force; ri "$d\nowait-reasoning-optimizer.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)

1. 下の青いボタンを押して nowait-reasoning-optimizer.zip をダウンロード
2. ZIPファイルをダブルクリックで解凍 → nowait-reasoning-optimizer フォルダができる
3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
4. Claude Code を再起動

⬇ .zip でダウンロード(推奨) ⬇ .skill 形式(上級者用) 元のソース ↗

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
- · macOS / Linux: ~/.claude/skills/
- · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →

最終更新: 2026-05-18
取得日時: 2026-05-18
同梱ファイル: 2

📖 Skill本文(日本語訳)

※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。

NOWAIT Reasoning Optimizer

論文 "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency" (Wang et al., 2025) の NOWAIT 手法を実装します。

概要

NOWAIT は、推論時に自己反省トークン（例: "Wait"、"Hmm"、"Alternatively"）を抑制する、トレーニング不要な推論時介入手法です。モデルの有用性を損なうことなく、chain-of-thought (CoT) の軌跡長を 27-51% 削減します。

どのような時に使うか

計算リソースが限られている R1 スタイルの推論モデルをデプロイする場合
本番システムの推論レイテンシを削減する場合
推論タスクのトークンコストを最適化する場合
簡素化が必要な冗長な CoT 出力を扱う場合

サポートされているモデル

モデルシリーズ	タイプ	トークン削減率
QwQ-32B	RLベース	16-31%
Phi4-Reasoning-Plus	RLベース	23-28%
Qwen3-32B	RLベース	13-16%
Kimi-VL-A3B	マルチモーダル	40-60%
QvQ-72B-Preview	マルチモーダル	20-30%

重要: NOWAIT は RL ベースのモデルで最も効果を発揮します。蒸留モデル (Qwen3-4B/8B/14B) は、反省トークンを抑制するとパフォーマンスが低下します。

クイックスタート

1. 基本的な実装

from scripts.nowait_processor import NOWAITLogitProcessor

# モデルの tokenizer 用にプロセッサを初期化します
processor = NOWAITLogitProcessor(tokenizer)

# 生成時に使用します
outputs = model.generate(
    inputs,
    logits_processor=[processor],
    max_new_tokens=32768
)

2. 抑制されるキーワード

完全なリストは references/keywords.md を参照してください。主なキーワード:

wait, alternatively, hmm, but, however, check, 
double-check, maybe, verify, again, oh, ah

仕組み

キーワードの初期化: 経験的な分析から反省キーワードを特定します
トークンバリアントへの展開: キーワードを語彙内のすべてのトークンバリアントにマッピングします (例: "wait" → " wait", "Wait", " Wait", ".wait", "WAIT")
推論時の抑制: デコード中に反省トークンのロジットを大きな負の値に設定します

Logits (Before)         Logits (After)
Wait     0.8     →     Wait     -inf
First    0.6     →     First    0.6
Hmm      0.5     →     Hmm      -inf
Let      0.4     →     Let      0.4

主な発見

なぜ効果があるのか

NOWAIT は自己反省を完全に排除するのではなく、モデルが不要な「待機」推論をスキップするように誘導します
モデルは依然として、重要な意思決定ポイントで不可欠な検証を実行します
より線形で直接的な推論パスになります

RL vs 蒸留モデル

モデルタイプ	NOWAIT の効果	推奨事項
RLベース (QwQ, Phi4, Qwen3-32B)	安定した精度、大幅なトークン削減	✅ 推奨
蒸留 (Qwen3-4B/8B/14B)	難しいタスクでの精度低下	⚠️ 注意して使用してください

蒸留モデルは、トレーニングデータからの CoT 構造に大きく依存しています。反省トークンを削除すると、推論パターンが中断されます。

統合例

HuggingFace Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.nowait_processor import NOWAITLogitProcessor

model = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

processor = NOWAITLogitProcessor(tokenizer)

response = model.generate(
    tokenizer(prompt, return_tensors="pt").input_ids,
    logits_processor=[processor],
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.7
)

vLLM

from vllm import LLM, SamplingParams
from scripts.nowait_processor import get_nowait_bad_words_ids

llm = LLM(model="Qwen/QwQ-32B")
bad_words_ids = get_nowait_bad_words_ids(llm.get_tokenizer())

sampling_params = SamplingParams(
    max_tokens=32768,
    bad_words_ids=bad_words_ids
)

期待される結果

タスクタイプ	元のトークン数	NOWAIT トークン数	削減率
数学 (AIME)	15,000	10,500	30%
Visual QA (MMMU)	2,900	1,450	50%
Video QA (MMVU)	1,700	1,250	27%

制限事項

CoT のオーバーヘッドがすでに最小限である非常に単純な問題では効果が低い
蒸留モデルは、困難なタスクで精度が低下する可能性がある
一部のドメインでは、モデル固有のキーワードチューニングが必要になる場合がある

参考文献

論文: arXiv:2506.08343v2
完全なキーワードリスト: references/keywords.md
実装: scripts/nowait_processor.py

📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開

NOWAIT Reasoning Optimizer

Implements the NOWAIT technique from the paper "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency" (Wang et al., 2025).

Overview

NOWAIT is a training-free inference-time intervention that suppresses self-reflection tokens (e.g., "Wait", "Hmm", "Alternatively") during generation, reducing chain-of-thought (CoT) trajectory length by 27-51% without compromising model utility.

When to Use

Deploying R1-style reasoning models with limited compute
Reducing inference latency for production systems
Optimizing token costs for reasoning tasks
Working with verbose CoT outputs that need streamlining

Supported Models

Model Series	Type	Token Reduction
QwQ-32B	RL-based	16-31%
Phi4-Reasoning-Plus	RL-based	23-28%
Qwen3-32B	RL-based	13-16%
Kimi-VL-A3B	Multimodal	40-60%
QvQ-72B-Preview	Multimodal	20-30%

Important: NOWAIT works best with RL-based models. Distilled models (Qwen3-4B/8B/14B) show degraded performance when reflection tokens are suppressed.

Quick Start

1. Basic Implementation

from scripts.nowait_processor import NOWAITLogitProcessor

# Initialize processor for your model's tokenizer
processor = NOWAITLogitProcessor(tokenizer)

# Use during generation
outputs = model.generate(
    inputs,
    logits_processor=[processor],
    max_new_tokens=32768
)

2. Keywords Suppressed

See references/keywords.md for the complete list. Core keywords:

wait, alternatively, hmm, but, however, check, 
double-check, maybe, verify, again, oh, ah

How It Works

Initialize Keywords: Identify reflection keywords from empirical analysis
Expand to Token Variants: Map keywords to all token variants in vocabulary (e.g., "wait" → " wait", "Wait", " Wait", ".wait", "WAIT")
Suppress During Inference: Set logits of reflection tokens to large negative values during decoding

Logits (Before)         Logits (After)
Wait     0.8     →     Wait     -inf
First    0.6     →     First    0.6
Hmm      0.5     →     Hmm      -inf
Let      0.4     →     Let      0.4

Key Findings

Why It Works

NOWAIT doesn't eliminate self-reflection entirely—it guides models to skip unnecessary "waiting" reasoning
Models still perform essential verification at key decision points
Results in more linear, straightforward reasoning paths

RL vs Distilled Models

Model Type	NOWAIT Effect	Recommendation
RL-based (QwQ, Phi4, Qwen3-32B)	Stable accuracy, significant token reduction	✅ Recommended
Distilled (Qwen3-4B/8B/14B)	Accuracy degradation on hard tasks	⚠️ Use with caution

Distilled models rely heavily on CoT structure from training data—removing reflection tokens disrupts their reasoning patterns.

Integration Examples

HuggingFace Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.nowait_processor import NOWAITLogitProcessor

model = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

processor = NOWAITLogitProcessor(tokenizer)

response = model.generate(
    tokenizer(prompt, return_tensors="pt").input_ids,
    logits_processor=[processor],
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.7
)

vLLM

from vllm import LLM, SamplingParams
from scripts.nowait_processor import get_nowait_bad_words_ids

llm = LLM(model="Qwen/QwQ-32B")
bad_words_ids = get_nowait_bad_words_ids(llm.get_tokenizer())

sampling_params = SamplingParams(
    max_tokens=32768,
    bad_words_ids=bad_words_ids
)

Expected Results

Task Type	Original Tokens	NOWAIT Tokens	Reduction
Math (AIME)	15,000	10,500	30%
Visual QA (MMMU)	2,900	1,450	50%
Video QA (MMVU)	1,700	1,250	27%

Limitations

Less effective on very simple problems where CoT overhead is already minimal
Distilled models may suffer accuracy loss on challenging tasks
Some domains may require model-specific keyword tuning

References

Paper: arXiv:2506.08343v2
Complete keyword list: references/keywords.md
Implementation: scripts/nowait_processor.py

同梱ファイル

※ ZIPに含まれるファイル一覧。`SKILL.md` 本体に加え、参考資料・サンプル・スクリプトが入っている場合があります。

📄 SKILL.md (4,888 bytes)
📎 scripts/nowait_processor.py (10,186 bytes)