
rl-execution

Reinforcement learning for trade execution optimization, including order splitting, adaptive timing, and impact minimization.

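🇯🇵 Notes for Japanese creators

Note: This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.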

⚡ Recommended: install with one command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o rl-execution.zip https://jpskill.com/download/10432.zip && unzip -o rl-execution.zip && rm rl-execution.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10432.zip -OutFile "$d\rl-execution.zip"; Expand-Archive "$d\rl-execution.zip" -DestinationPath $d -Force; ri "$d\rl-execution.zip"

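When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language for something in its area, for example "compare TWAP and VWAP execution for a large order".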

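💾 Manual download (if the command line is not for you)

1. Press the blue button below to download rl-execution.zip
2. Double-click the ZIP file to extract it; this creates an rl-execution folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (for advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.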

🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

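📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.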


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the text Claude reads)

RL Execution Optimization

Reinforcement learning (RL) for trade execution teaches an agent to split and time large orders so that total market impact is minimized. Instead of following a fixed schedule (TWAP, VWAP), an RL agent observes real-time market state and adapts its trading rate on the fly.

Why Execution Optimization Matters

Every trade has a cost beyond the quoted spread:

Cost Component	Cause	Typical Magnitude
Spread cost	Crossing the bid-ask	5-50 bps on DEXs
Temporary impact	Consuming liquidity	Scales with trade rate
Permanent impact	Information leakage	Scales with total size
Timing risk	Price drifts while waiting	Scales with volatility and time

A 100 SOL market buy on a thin pool can move the price 2-5%. Splitting it into ten 10 SOL slices over a few minutes can cut that cost by 30-60%. The question is how to split optimally — and that is where execution algorithms and RL come in.

The RL Framework for Execution

State Space

The agent observes at each decision step:

state = [
    remaining_qty,    # How much is left to trade (0-1 normalized)
    time_remaining,   # Fraction of allowed horizon remaining
    current_price,    # Current mid-price (normalized to arrival price)
    spread,           # Current bid-ask spread
    volatility,       # Recent realized volatility
    volume,           # Recent trading volume (normalized)
]
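
As a rough illustration of how these six features might be assembled from raw market quantities, consider the sketch below. The helper name `build_state`, its parameters, the `avg_volume` normaliser, and the choice to express the spread relative to the arrival price are all assumptions made for this example; they are not taken from the skill's scripts.

```python
from typing import List


def build_state(
    remaining_qty: float,   # units still to trade
    total_qty: float,       # original order size
    steps_left: int,        # decision steps remaining
    total_steps: int,       # decision steps in the full horizon
    mid_price: float,       # current mid-price
    arrival_price: float,   # mid-price when the order arrived
    spread: float,          # current bid-ask spread in price units
    volatility: float,      # recent realized volatility
    volume: float,          # recent traded volume
    avg_volume: float,      # assumed normaliser: typical volume per step
) -> List[float]:
    """Assemble the six-feature observation (hypothetical helper)."""
    return [
        remaining_qty / total_qty,   # remaining_qty, normalized to 0-1
        steps_left / total_steps,    # time_remaining as a fraction
        mid_price / arrival_price,   # current_price relative to arrival
        spread / arrival_price,      # spread as a fraction of price (assumption)
        volatility,
        volume / avg_volume,         # volume relative to its typical level
    ]


state = build_state(60.0, 100.0, 5, 10, 101.0, 100.0, 0.05, 0.02, 1200.0, 1000.0)
print(state)
```

Keeping every feature dimensionless means the same policy can, in principle, be reused across tokens with very different price levels.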

Action Space

Discrete actions controlling how much to trade this step:

actions = [0.0, 0.10, 0.25, 0.50, 1.00]  # fraction of remaining quantity (0%, 10%, 25%, 50%, 100%)

A small action space keeps the problem tractable. Each action represents the fraction of the remaining order to execute in the current time step.
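
A minimal sketch of how a discrete action index could be turned into a trade size. The helper `action_to_quantity` is a hypothetical name introduced for illustration.

```python
ACTIONS = [0.0, 0.10, 0.25, 0.50, 1.00]  # fractions of the remaining quantity


def action_to_quantity(action_index: int, remaining_qty: float) -> float:
    """Convert a discrete action index into a trade size for this step."""
    return ACTIONS[action_index] * remaining_qty


remaining = 100.0
for idx in (2, 2, 4):  # trade 25%, then 25% of what is left, then everything
    qty = action_to_quantity(idx, remaining)
    remaining -= qty
    print(f"action={idx} traded={qty:.2f} remaining={remaining:.2f}")
```

Because the fractions apply to the remaining quantity rather than the original order, only the 100% action (or forced liquidation at expiry) can complete the order in a single step.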

Reward Function

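The reward penalizes execution cost relative to a benchmark, here the arrival price. The formula is written for a buy order; for a sell, the sign of the price difference flips: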

reward = -(execution_price - arrival_price) * quantity_traded

Summed over all steps, the total reward equals the negative implementation shortfall. The agent learns to minimize total cost.
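
The identity between summed rewards and implementation shortfall can be checked numerically. In the sketch below the fills are made-up numbers, and the `side` argument (+1 for a buy, -1 for a sell) is an assumption added so one function covers both directions; with the default `side=1` it reduces to the buy-order formula above.

```python
from typing import List, Tuple


def step_reward(exec_price: float, arrival_price: float, qty: float, side: int = 1) -> float:
    """Per-step reward; side is +1 for a buy, -1 for a sell (assumed convention)."""
    return -side * (exec_price - arrival_price) * qty


def implementation_shortfall(
    fills: List[Tuple[float, float]], arrival_price: float, side: int = 1
) -> float:
    """Cost of the fills versus trading the whole order at the arrival price."""
    paid = sum(price * qty for price, qty in fills)
    benchmark = arrival_price * sum(qty for _, qty in fills)
    return side * (paid - benchmark)


fills = [(100.5, 40.0), (101.0, 30.0), (100.25, 30.0)]  # (price, quantity), illustrative
arrival = 100.0
total_reward = sum(step_reward(p, arrival, q) for p, q in fills)
shortfall = implementation_shortfall(fills, arrival)
print(total_reward, shortfall)
```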

Episode Structure

One episode = one order from placement to completion:

1. Agent receives order: buy/sell Q units within T time steps
2. At each step, agent picks an action (trade amount)
3. Market simulator applies price impact and updates state
4. Episode ends when quantity is fully executed or time expires
5. Any remaining quantity at expiry is executed at market (penalty)
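
The episode steps above can be sketched as a toy environment with `reset` and `step` methods. This is a simplified stand-in, not the skill's simulator: the class name `ToyExecutionEnv`, the reduced three-feature state, the linear impact terms, and every parameter value are assumptions, and only buy orders are modelled.

```python
import random
from typing import Tuple

ACTIONS = [0.0, 0.10, 0.25, 0.50, 1.00]  # fraction of remaining quantity


class ToyExecutionEnv:
    """Toy buy-order environment; all parameter values are illustrative."""

    def __init__(self, eta: float = 0.5, gamma: float = 0.1, sigma: float = 0.05,
                 avg_volume: float = 100.0, seed: int = 0) -> None:
        self.eta, self.gamma, self.sigma = eta, gamma, sigma
        self.avg_volume = avg_volume
        self.rng = random.Random(seed)

    def reset(self, order_qty: float, horizon: int) -> Tuple[float, float, float]:
        self.total_qty = self.remaining = order_qty
        self.horizon = self.steps_left = horizon
        self.arrival = self.mid = 100.0
        return self._state()

    def _state(self) -> Tuple[float, float, float]:
        # Reduced state: fraction left, time left, price relative to arrival.
        return (self.remaining / self.total_qty,
                self.steps_left / self.horizon,
                self.mid / self.arrival)

    def step(self, action: int):
        qty = ACTIONS[action] * self.remaining
        self.steps_left -= 1
        if self.steps_left == 0:
            qty = self.remaining  # expiry: the leftover is forced out at market
        rate = qty / self.avg_volume
        exec_price = self.mid + (self.gamma + self.eta) * rate
        reward = -(exec_price - self.arrival) * qty
        self.remaining -= qty
        self.mid += self.gamma * rate + self.rng.gauss(0.0, self.sigma)
        done = self.remaining <= 1e-12 or self.steps_left == 0
        return self._state(), reward, done, {"exec_price": exec_price, "qty": qty}


env = ToyExecutionEnv()
state = env.reset(order_qty=50.0, horizon=5)
done, total = False, 0.0
while not done:
    state, reward, done, info = env.step(1)  # always trade 10% of what is left
    total += reward
print(f"remaining={env.remaining:.2f} total_reward={total:.3f}")
```

With this deliberately slow policy, most of the order is still outstanding at the final step, so the forced liquidation is large and pays a correspondingly large impact. That is the "penalty" the last step of the episode refers to.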

Standard Execution Algorithms

TWAP (Time-Weighted Average Price)

The simplest baseline — split the order equally across all time steps:

trade_per_step = total_quantity / num_steps

Pros: Simple, deterministic, easy to implement. Cons: Ignores market conditions entirely.
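
A minimal TWAP schedule, for reference. The function name `twap_schedule` is hypothetical; letting the last slice absorb floating-point rounding is a design choice of this sketch so the slices always sum to the full order.

```python
from typing import List


def twap_schedule(total_quantity: float, num_steps: int) -> List[float]:
    """Equal slices; the last slice absorbs any floating-point rounding."""
    slice_qty = total_quantity / num_steps
    schedule = [slice_qty] * (num_steps - 1)
    schedule.append(total_quantity - sum(schedule))
    return schedule


print(twap_schedule(100.0, 10))
```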

VWAP (Volume-Weighted Average Price)

Split proportional to expected volume in each period:

trade_at_step_t = total_quantity * (expected_volume[t] / total_expected_volume)

Pros: Trades more when liquidity is available. Cons: Requires accurate volume forecasts; still non-adaptive.
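
The same allocation rule as a small function. The name `vwap_schedule` and the U-shaped forecast are invented for illustration; in practice the forecast would come from historical volume profiles.

```python
from typing import List


def vwap_schedule(total_quantity: float, expected_volume: List[float]) -> List[float]:
    """Allocate the order in proportion to the forecast volume of each period."""
    total_expected = sum(expected_volume)
    if total_expected <= 0:
        raise ValueError("expected_volume must have a positive sum")
    return [total_quantity * v / total_expected for v in expected_volume]


# Made-up U-shaped forecast: busier at the start and end of the window.
forecast = [300.0, 150.0, 100.0, 150.0, 300.0]
print(vwap_schedule(100.0, forecast))  # → [30.0, 15.0, 10.0, 15.0, 30.0]
```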

Almgren-Chriss Optimal Execution

The foundational analytical model. Minimizes a combination of execution cost and timing risk:

minimize: E[cost] + λ * Var[cost]

With linear impact assumptions, this yields a closed-form optimal trajectory. See references/execution_algorithms.md for the full derivation.
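
As a hedged sketch of what that closed form looks like, the code below uses the commonly quoted continuous-time approximation, in which remaining holdings follow x(t) = X · sinh(κ(T − t)) / sinh(κT) with κ = sqrt(λσ²/η). Here η is the temporary-impact coefficient and σ the price volatility. The function name and parameter values are assumptions, the discrete-time correction from the original paper is omitted, and `scripts/almgren_chriss.py` may be implemented differently.

```python
import math
from typing import List


def almgren_chriss_holdings(total_qty: float, num_steps: int, horizon: float,
                            risk_aversion: float, sigma: float, eta: float) -> List[float]:
    """Remaining holdings x(t_k) for k = 0..num_steps (continuous-time form)."""
    kappa = math.sqrt(risk_aversion * sigma ** 2 / eta)
    times = [horizon * k / num_steps for k in range(num_steps + 1)]
    if kappa * horizon < 1e-8:
        # Risk-neutral limit: the trajectory degenerates to a straight line (TWAP).
        return [total_qty * (1.0 - t / horizon) for t in times]
    denom = math.sinh(kappa * horizon)
    return [total_qty * math.sinh(kappa * (horizon - t)) / denom for t in times]


holdings = almgren_chriss_holdings(100.0, 5, 1.0, risk_aversion=2.0, sigma=1.0, eta=0.5)
trades = [a - b for a, b in zip(holdings, holdings[1:])]
print([round(q, 2) for q in trades])
```

A larger `risk_aversion` front-loads the schedule to reduce exposure to price moves, while `risk_aversion=0` recovers the equal slices of TWAP.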

RL-Based Adaptive Execution

An RL agent (DQN, PPO, or similar) that learns the execution policy from simulated experience:

# Pseudocode training loop
for episode in range(num_episodes):
    state = env.reset(order_qty=Q, horizon=T)
    done = False
    while not done:
        action = agent.select_action(state)
        next_state, reward, done, info = env.step(action)
        agent.store_transition(state, action, reward, next_state, done)
        agent.update()
        state = next_state

Pros: Adapts to current market conditions and can learn non-linear patterns. Cons: Requires a realistic simulator, is exposed to the sim-to-real gap, and training can be unstable.
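
To make the training loop concrete without extra dependencies, the sketch below swaps in tabular Q-learning for DQN or PPO and uses a toy buy-order market. All of it is assumed for illustration: the constants, the linear impact terms, and the coarse state (remaining-quantity bucket plus steps left). Price is left out of the state for brevity, so this agent learns a schedule rather than a fully adaptive policy.

```python
import random
from collections import defaultdict

ACTIONS = [0.0, 0.10, 0.25, 0.50, 1.00]  # fraction of remaining quantity
HORIZON, ORDER_QTY, AVG_VOLUME = 5, 50.0, 100.0
ETA, GAMMA_IMPACT, SIGMA = 0.5, 0.1, 0.02  # illustrative impact and noise values


def simulate_step(remaining, steps_left, mid, action, rng):
    """One toy market step for a buy order; returns next values and the reward."""
    qty = remaining if steps_left == 1 else ACTIONS[action] * remaining
    rate = qty / AVG_VOLUME
    exec_price = mid + (GAMMA_IMPACT + ETA) * rate
    reward = -(exec_price - 100.0) * qty  # arrival price fixed at 100
    mid = mid + GAMMA_IMPACT * rate + rng.gauss(0.0, SIGMA)
    return remaining - qty, steps_left - 1, mid, reward


def bucket(remaining, steps_left):
    """Discretise the state: remaining quantity in 10% buckets plus steps left."""
    return (round(10 * remaining / ORDER_QTY), steps_left)


def train(num_episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(num_episodes):
        remaining, steps_left, mid = ORDER_QTY, HORIZON, 100.0
        while steps_left > 0 and remaining > 1e-9:
            s = bucket(remaining, steps_left)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])  # exploit
            remaining, steps_left, mid, r = simulate_step(remaining, steps_left, mid, a, rng)
            terminal = steps_left == 0 or remaining <= 1e-9
            target = r if terminal else r + max(q[bucket(remaining, steps_left)])
            q[s][a] += alpha * (target - q[s][a])  # undiscounted Q-learning update
    return q


def greedy_cost(q, seed=1):
    """Run the learned greedy policy once and return its total cost."""
    rng = random.Random(seed)
    remaining, steps_left, mid, cost = ORDER_QTY, HORIZON, 100.0, 0.0
    while steps_left > 0 and remaining > 1e-9:
        s = bucket(remaining, steps_left)
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        remaining, steps_left, mid, r = simulate_step(remaining, steps_left, mid, a, rng)
        cost -= r
    return cost


q_table = train()
print(f"greedy policy cost: {greedy_cost(q_table):.3f}")
```

A deep RL agent would replace the lookup table with a function approximator and take the full state vector as input, but the loop of act, observe, store, and update stays the same.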

Price Impact Model

The simulator uses a standard two-component impact model:

temporary_impact = η * (trade_rate / avg_volume)
permanent_impact = γ * (trade_rate / avg_volume)

Temporary impact decays after the trade (liquidity replenishes)
Permanent impact shifts the equilibrium price (information effect)

The execution price for a buy of size q at time t, and the resulting next mid-price (for a sell, the impact terms are subtracted rather than added):

exec_price = mid_price + permanent_impact + temporary_impact
mid_price_next = mid_price + permanent_impact + noise
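
A small sketch of the two-component model as a function, used to compare one large buy against ten equal slices. The helper name `apply_trade`, the `side` argument, and all coefficient values are assumptions; noise is switched off (`sigma=0.0`) so the comparison is deterministic.

```python
import random
from typing import Tuple


def apply_trade(mid_price: float, qty: float, avg_volume: float,
                eta: float, gamma: float, sigma: float,
                rng: random.Random, side: int = 1) -> Tuple[float, float]:
    """Return (exec_price, mid_price_next) for one trade; side=+1 buy, -1 sell."""
    rate = qty / avg_volume
    temporary = eta * rate     # paid on this fill only
    permanent = gamma * rate   # carried into the next mid-price
    exec_price = mid_price + side * (permanent + temporary)
    mid_next = mid_price + side * permanent + rng.gauss(0.0, sigma)
    return exec_price, mid_next


rng = random.Random(42)
one_shot, _ = apply_trade(100.0, 100.0, 1000.0, eta=2.0, gamma=0.5, sigma=0.0, rng=rng)
one_shot_cost = (one_shot - 100.0) * 100.0

mid, sliced_cost = 100.0, 0.0
for _ in range(10):
    px, mid = apply_trade(mid, 10.0, 1000.0, eta=2.0, gamma=0.5, sigma=0.0, rng=rng)
    sliced_cost += (px - 100.0) * 10.0

print(f"one-shot cost={one_shot_cost:.2f} sliced cost={sliced_cost:.2f}")
```

In this noise-free toy setting slicing is always cheaper because the temporary impact is paid on smaller trades. Once price noise is switched on, waiting also carries timing risk, which is the trade-off the execution algorithms above try to balance.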

When to Use This Skill

This skill is most valuable when:

Order size is large relative to available liquidity (>1% of daily volume)
Market impact is significant (thin DEX pools, low-cap tokens)
Execution window is flexible (minutes to hours, not milliseconds)
Cost savings justify complexity (institutional-scale orders)

For small retail orders (<$1,000 on liquid pairs), simple market orders or basic slippage limits are sufficient. See the slippage-modeling skill instead.

Practical Limitations

Sim-to-real gap: Simulated markets do not capture all real dynamics (queue position, adversarial flow, MEV).
Non-stationarity: Market microstructure changes over time; models trained on one regime may fail in another.
DEX specifics: On-chain execution has block-level granularity (~400ms on Solana), not continuous-time. Gas/priority fees add cost.
Data requirements: Training requires historical orderbook or trade data for realistic simulation.

Integration with Other Skills

Skill	Integration
`slippage-modeling`	Provides impact estimates to calibrate the simulator
`position-sizing`	Determines the total order size to execute
`liquidity-analysis`	Assesses available liquidity for realistic simulation
`volatility-modeling`	Supplies volatility estimates for the state vector
`jupiter-swap`	Actual on-chain execution of the computed trade schedule

Quick Start

Compare Execution Strategies (No API Needed)

python scripts/execution_simulator.py

Runs TWAP, VWAP, and adaptive strategies in a simulated market and compares execution costs across many trials.

Almgren-Chriss Optimal Trajectory

python scripts/almgren_chriss.py

Computes the analytically optimal execution trajectory and compares it to TWAP for a given set of market parameters.

Files

References

references/execution_algorithms.md — TWAP, VWAP, Almgren-Chriss, IS, and RL execution algorithms with formulas and comparison
references/rl_framework.md — MDP formulation, environment design, training methodology, and practical considerations for RL execution

Scripts

scripts/execution_simulator.py — Simulated order execution comparing TWAP, VWAP, and adaptive strategies with price impact
scripts/almgren_chriss.py — Almgren-Chriss optimal execution model with trajectory computation and cost analysis

Dependencies

uv pip install numpy

No API keys required — all scripts run in simulation/demo mode.

Disclaimer

This skill provides educational analysis tools for studying execution algorithms. It does not constitute financial advice. Simulated results do not guarantee real-world performance. Always test execution strategies with small sizes before scaling up.