🛠️ Development & MCP · Community

railway-troubleshooting

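A Skill that helps pinpoint the cause of, and find fixes for, problems on Railway such as deployment failures, build errors, service outages, performance degradation, and network failures.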

📜 Original English description (for reference)

Railway debugging and issue resolution. Use when deployments fail, builds error, services crash, performance degrades, or networking issues occur.

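🇯🇵 Notes for Japanese creators

In a nutshell

Note: This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download → extraction → placement are fully automatic.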

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o railway-troubleshooting.zip https://jpskill.com/download/9477.zip && unzip -o railway-troubleshooting.zip && rm railway-troubleshooting.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9477.zip -OutFile "$d\railway-troubleshooting.zip"; Expand-Archive "$d\railway-troubleshooting.zip" -DestinationPath $d -Force; ri "$d\railway-troubleshooting.zip"

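When it finishes, restart Claude Code. After that, simply describe a related problem in plain language (for example, "my Railway deploy keeps failing") and the Skill activates automatically.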

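💾 Prefer to download manually? (for those uncomfortable with commands)

1. Press the blue button below to download railway-troubleshooting.zip
2. Double-click the ZIP file to extract it → a railway-troubleshooting folder is created
3. Move that folder to C:\Users\YourName\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (advanced) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.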

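🎯 What this Skill can do

Read the description below to see what this Skill will do for you. It activates automatically when you ask Claude for help in this area.

📦 Installation (3 steps)

1. Press the "Download" button above to get the .skill file
2. Rename the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You don't need to say "use this Skill to…"; it is called automatically for related requests.

See the detailed usage guide →

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1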

📜 Original SKILL.md (the English/Chinese text Claude reads)

Railway Troubleshooting

Systematic debugging and issue resolution for Railway.com deployments.

Overview

This skill provides decision trees, diagnostic workflows, and recovery procedures for Railway platform issues. It covers build failures, runtime crashes, networking problems, database issues, and performance degradation.

Quick Start

Use this decision tree to diagnose and resolve Railway issues:

Railway Issue?
│
├── Deployment Failed?
│   ├── Build Error → Operation 4: Fix Build Errors
│   ├── Deploy Error → Operation 1: Diagnose Deployment Failures
│   ├── Health Check Failed → Check service health endpoint
│   └── Timeout → Check build/deploy timeouts in settings
│
├── Service Crashing?
│   ├── Immediate crash → Operation 2: Debug Runtime Crashes
│   ├── Crash after time → Check memory limits, memory leaks
│   ├── Restart loop → Check startup command, dependencies
│   └── Exit code errors → Check application logs for specifics
│
├── Networking Issues?
│   ├── Service unreachable → Operation 3: Troubleshoot Networking
│   ├── Intermittent connectivity → Check DNS, service discovery
│   ├── SSL errors → Check domain configuration, certificates
│   └── Timeout errors → Check port configuration, firewalls
│
├── Build Issues?
│   ├── Nixpacks detection wrong → Operation 4: Fix Build Errors
│   ├── Dependencies failing → Check package.json, requirements.txt
│   ├── Build commands failing → Verify build scripts
│   └── Cache issues → Clear build cache, force rebuild
│
└── Database Problems?
    ├── Connection refused → Operation 5: Resolve Database Issues
    ├── Timeout errors → Check connection pools, query performance
    ├── Performance slow → Check indices, query optimization
    └── Data corruption → Check backups, recovery procedures
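As a rough illustration, the routing in the tree above can be written as a small bash `case` statement. This is a minimal sketch. The symptom keywords (`build-error`, `oom`, and so on) and the function name `route_issue` are hypothetical labels invented for this example, not Railway terminology. Build errors are routed to Operation 4, the operation that covers build failures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a coarse symptom keyword to the operation
# that covers it. Keywords are illustrative, not Railway terms.
route_issue() {
  case "$1" in
    build-error|nixpacks-detection) echo "Operation 4: Fix Build Errors" ;;
    deploy-error|health-check)      echo "Operation 1: Diagnose Deployment Failures" ;;
    crash|restart-loop|oom)         echo "Operation 2: Debug Runtime Crashes" ;;
    unreachable|dns|ssl)            echo "Operation 3: Troubleshoot Networking" ;;
    db-refused|db-timeout|db-slow)  echo "Operation 5: Resolve Database Issues" ;;
    *)
      # Unknown symptom: fall back to reading the logs first
      echo "Unknown symptom: start with build, deploy, and runtime logs" >&2
      return 1
      ;;
  esac
}
```

For example, `route_issue crash` prints `Operation 2: Debug Runtime Crashes`.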

Operations

Operation 1: Diagnose Deployment Failures

Identify and resolve deployment failures through systematic log analysis.

When to use: Deployment status shows failed, builds succeed but deploys fail, health checks failing.

Workflow:

Check Deployment Status

# CLI approach
railway status
railway logs --deployment

# API approach (see references/debug-workflow.md for GraphQL)
# Query deployment status and recent deploys

Analyze Deploy Logs
- Check for port binding issues (Railway expects PORT env var)
- Verify health check endpoint responding
- Check startup command execution
- Identify timeout issues
Common Deploy Failures
- Port not bound: App must listen on the PORT env var (process.env.PORT in Node.js), bound to 0.0.0.0 rather than localhost
- Health check timeout: Increase timeout or fix endpoint
- Missing environment variables: Check service variables
- Startup command wrong: Verify start command in settings
Fix and Redeploy
- Apply fix to code/configuration
- Trigger new deployment
- Monitor deployment logs
- Verify service healthy
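Missing variables are a frequent cause of failed deploys. A start script can check for them up front and fail with a clear message, instead of letting the app crash later. The following is a minimal bash sketch, assuming you know which variables your app needs. `check_required_vars` is a hypothetical helper, and the names you pass to it are your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report required environment variables that are
# unset or empty. Returns non-zero if any are missing.
check_required_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_required_vars DATABASE_URL PORT || exit 1` at the top of a start script makes the process exit early. The deploy logs then show `missing: ...` lines instead of a later, less obvious crash.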

See: references/common-errors.md for specific error messages and solutions.

Operation 2: Debug Runtime Crashes

Investigate and resolve service crashes and restart loops.

When to use: Service shows restarting, exit codes in logs, OOM errors, crash reports.

Workflow:

Gather Crash Information

# Get runtime logs
railway logs --tail 500

# Check service metrics
railway metrics

# Use diagnostic script
./scripts/diagnose.sh [service-id] --verbose

Identify Crash Pattern
- Immediate crash: Startup issue (missing deps, config error)
- Crash after time: Memory leak, resource exhaustion
- Intermittent crash: Race condition, external dependency
- Exit code 137: Killed by SIGKILL (128 + 9), most often the out-of-memory (OOM) killer
Check Resource Limits
- Memory usage trending up → Memory leak
- CPU at 100% → Infinite loop, CPU-intensive operation
- Disk full → Log rotation issue, temp files
- Connection limits → Database pool exhausted
Common Crash Causes
- OOM: Increase memory limit or fix memory leak
- Missing dependencies: Check package installation
- Uncaught exceptions: Add error handling
- External service down: Add retry logic, circuit breakers
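Exit codes above 128 encode the signal that killed the process (code minus 128), which is why 137 points to SIGKILL. Below is a minimal bash sketch that turns an exit code from the logs into a first guess. The mapping follows generic Unix conventions plus some heuristics. It is not a Railway-defined table, and `explain_exit_code` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a process exit code into a likely cause.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "exited normally: start command finished (is it a long-running server?)" ;;
    1)   echo "generic application error: check logs for an uncaught exception" ;;
    127) echo "command not found: check the start command and installed dependencies" ;;
    137) echo "SIGKILL: likely out of memory (OOM)" ;;
    143) echo "SIGTERM: the process was asked to stop, e.g. during a redeploy" ;;
    *)
      # Codes above 128 mean "terminated by signal (code - 128)"
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-defined exit code $code: check application logs"
      fi
      ;;
  esac
}
```

For example, `explain_exit_code 139` prints `killed by signal 11` (SIGSEGV).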

See: references/debug-workflow.md for systematic debugging steps.

Operation 3: Troubleshoot Networking

Resolve networking issues including service discovery, DNS, and connectivity.

When to use: Services can't reach each other, DNS resolution fails, external access issues, SSL errors.

Workflow:

Verify Service Discovery

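# Private networking lets services in the same project and environment
# reach each other at: [service-name].railway.internal

# Test DNS resolution from a shell inside a running service; `railway run`
# executes on your local machine, where .railway.internal names do not resolve
nslookup [service-name].railway.internal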

Check Network Configuration
- Private networking enabled in project settings
- Service names correct (use Railway-provided names)
- Port configuration matches application
- Environment variables for service URLs set
Debug External Access
- Domain configured correctly in service settings
- DNS records pointing to Railway
- SSL certificate provisioned (check domain settings)
- Generate domain option enabled for public access
Common Network Issues
- Service discovery: Use full internal domain name
- Port mismatch: App must listen on PORT env var
- SSL not working: Allow time for cert provisioning (5-10 min)
- Timeout: Check for firewall rules, rate limiting
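To separate DNS or port problems from application errors, test raw TCP connectivity from one service to another. The following is a minimal sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout` are available in the calling service's image. `check_tcp` is a hypothetical helper. Run it from inside a deployed service, because private hostnames resolve only on Railway's private network.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether host:port accepts TCP connections
# within a timeout (default 3 seconds), using bash's /dev/tcp.
check_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "ok: $host:$port is accepting connections"
  else
    echo "fail: cannot connect to $host:$port within ${secs}s" >&2
    return 1
  fi
}
```

For example, run `check_tcp "$API_HOST" 8080`, where `API_HOST` is a hypothetical variable holding the target service's private hostname. If the hostname resolves but the connection is refused, the target is usually not listening on that port or interface.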

See: references/common-errors.md Network Errors section.

Operation 4: Fix Build Errors

Resolve build failures, nixpacks configuration issues, and dependency problems.

When to use: Build fails, wrong builder detected, dependencies not installing, build commands fail.

Workflow:

Check Build Logs

railway logs --build

# Identify build phase failure:
# - Detection phase: Nixpacks provider detection
# - Install phase: Dependencies installation
# - Build phase: Build commands execution

Verify Builder Configuration
- Check nixpacks.toml or railway.toml for custom config
- Verify build command in service settings
- Check for language version specification
- Ensure correct provider detected (Node, Python, Go, etc.)
Fix Dependency Issues
- Lock or pinned dependency file present (package-lock.json, yarn.lock; pinned versions in requirements.txt)
- Dependencies compatible with build environment
- Private packages have auth configured
- Build dependencies vs runtime dependencies separated
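Before pushing, you can confirm the lock-file item above by listing which lock or pin files the repository actually contains. The following is a minimal bash sketch. `find_lock_files` is a hypothetical helper, and its file list covers common ecosystems only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the dependency lock/pin files found in a
# directory (default: current directory). Returns 1 if none are found.
find_lock_files() {
  local dir="${1:-.}" f found=0
  for f in package-lock.json yarn.lock pnpm-lock.yaml bun.lockb \
           requirements.txt poetry.lock Pipfile.lock go.sum Cargo.lock Gemfile.lock; do
    if [ -f "$dir/$f" ]; then
      echo "$f"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no lock file found in $dir" >&2
    return 1
  fi
}
```

For example, run `find_lock_files .` from the repository root.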

Force Rebuild if Needed

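# Clear cache and rebuild
./scripts/force-rebuild.sh [service-id] --no-cache

# Or deploy the local directory again via CLI (--detach returns without
# streaming build logs; this alone does not clear the build cache)
railway up --detach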

Common Build Errors:

- Wrong nixpacks provider: Add nixpacks.toml with correct provider
- Dependency resolution: Update lock files, fix version conflicts
- Build timeout: Optimize build, increase timeout in settings
- Cache issues: Clear build cache with force rebuild

See: references/common-errors.md Build Errors section.

Operation 5: Resolve Database Issues

Debug database connection problems, timeouts, and performance issues.

When to use: Connection refused, database timeouts, slow queries, connection pool exhausted.

Workflow:

Verify Database Connection

# Check database service status
railway status

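# Test the connection with the service's database URL. `railway run` runs
# locally, so quote the command to let Railway inject $DATABASE_URL first
# (a private .railway.internal host in the URL will not resolve locally)
railway run sh -c 'psql "$DATABASE_URL" -c "SELECT 1"'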

Check Connection Configuration
- DATABASE_URL environment variable set correctly
- Connection pool size appropriate for service plan
- Connection timeout settings reasonable
- SSL mode configured if required
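Many connection problems come down to a malformed or unexpected DATABASE_URL. The following is a minimal bash sketch that parses a Postgres-style URL and prints its parts without the password. `inspect_db_url` is a hypothetical helper. It assumes the `postgres://user:password@host:port/db?params` form and reports any other shape as unrecognized. It does not open a connection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a Postgres-style URL (argument or
# $DATABASE_URL) and print user, host, port, db, and sslmode presence.
# The password is matched but never printed.
inspect_db_url() {
  local url="${1:-${DATABASE_URL:-}}"
  if [ -z "$url" ]; then
    echo "DATABASE_URL is empty" >&2
    return 1
  fi
  local re='^(postgres|postgresql)://([^:@/]+)(:[^@]*)?@([^:/?]+)(:([0-9]+))?/([^?]+)(\?(.*))?$'
  if [[ ! "$url" =~ $re ]]; then
    echo "unrecognized URL format" >&2
    return 1
  fi
  echo "user=${BASH_REMATCH[2]}"
  echo "host=${BASH_REMATCH[4]}"
  echo "port=${BASH_REMATCH[6]:-5432}"   # 5432 is the PostgreSQL default port
  echo "db=${BASH_REMATCH[7]}"
  case "${BASH_REMATCH[9]}" in
    *sslmode=*) echo "sslmode set" ;;
    *)          echo "sslmode not set" ;;
  esac
}
```

Running `inspect_db_url` with no argument checks `$DATABASE_URL`. If the printed host ends in `.railway.internal`, the URL is usable only from inside Railway's network.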
Debug Connection Issues
- Connection refused: Database not started, wrong host/port
- Timeout: Network issue, slow queries, pool exhausted
- Auth failed: Wrong credentials, user permissions
- Too many connections: Pool size exceeded, connection leak
Performance Troubleshooting
- Slow queries: Check query plans, add indices
- High CPU: Identify expensive queries, optimize
- Connection pool exhausted: Increase pool size or fix leaks
- Disk space: Clean up old data, increase storage
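Pool exhaustion and "too many connections" errors are often a matter of arithmetic, because every replica opens its own pool. The following is a minimal bash sketch of that budget check. The reserve of 5 connections for admin or migration sessions is an arbitrary assumption, and `check_pool_budget` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that replicas * pool_size fits under the
# database connection limit minus a reserve (default 5).
check_pool_budget() {
  local replicas="$1" pool_size="$2" max_conn="$3" reserve="${4:-5}"
  local used=$(( replicas * pool_size ))
  local budget=$(( max_conn - reserve ))
  if [ "$used" -le "$budget" ]; then
    echo "ok: $used of $budget usable connections"
  else
    echo "over budget: $used > $budget; lower pool size or replicas" >&2
    return 1
  fi
}
```

With PostgreSQL's default `max_connections` of 100, `check_pool_budget 3 20 100` passes (60 of 95), while `check_pool_budget 5 20 100` fails. Read the actual limit with `SHOW max_connections;` rather than assuming the default.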

Emergency Recovery:

- Restart database service: railway restart [service-id]
- Check backups: Restore from a Railway volume backup if backups are enabled for the database
- Scale vertically: Upgrade database plan if needed
- Connection leak: Restart application services

See: references/recovery-procedures.md for emergency procedures.

Related Skills

- railway-auth: Authentication setup for Railway CLI/API
- railway-logs: Advanced log querying and analysis
- railway-deployment: Deployment workflows and strategies
- railway-api: GraphQL API queries and operations

When to Use This Skill

Use railway-troubleshooting when you encounter:

❌ Deployment failures or build errors
🔄 Service restart loops or crashes
🌐 Networking or connectivity issues
🐛 Runtime errors or performance problems
💾 Database connection or query issues
⚡ Performance degradation
🔧 Configuration or environment issues

Quick Diagnostic

Run the diagnostic script for automated issue detection:

cd ~/.claude/skills/railway-troubleshooting/scripts
./diagnose.sh [service-id] --verbose

The script will:

- Check service health status
- Analyze recent deployment logs
- Scan for common error patterns
- Check resource utilization
- Provide specific recommendations
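To give a rough idea of what the "scan for common error patterns" step can look like, here is a minimal, hypothetical bash sketch (not the actual diagnose.sh) that counts matching lines in a saved log file. `scan_log` and its patterns are illustrative assumptions, not Railway-specific messages.

```shell
#!/usr/bin/env bash
# Hypothetical log scan (not the bundled diagnose.sh): count lines in a
# saved log file that match a few generic patterns. Each heredoc line is
# "label|extended-regex"; only labels with at least one match are printed.
scan_log() {
  local file="$1" label pattern n
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  while IFS='|' read -r label pattern; do
    # -E extended regex, -i case-insensitive, -c count matching lines
    n=$(grep -Eic -- "$pattern" "$file")
    if [ "$n" -gt 0 ]; then
      echo "$label: $n"
    fi
  done <<'EOF'
port|EADDRINUSE|address already in use
memory|out of memory|OOMKilled|MemoryError
missing-module|cannot find module|ModuleNotFoundError
connection|ECONNREFUSED|connection refused
EOF
}
```

For example, run `scan_log deploy.log`, where `deploy.log` is a log you saved from the dashboard or CLI.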

Additional Resources

- Common Errors Guide: references/common-errors.md - 20+ documented errors with solutions
- Debug Workflow: references/debug-workflow.md - Systematic debugging methodology
- Recovery Procedures: references/recovery-procedures.md - Emergency recovery steps
- Diagnostic Script: scripts/diagnose.sh - Automated diagnostics
- Force Rebuild: scripts/force-rebuild.sh - Clear cache and rebuild

Best Practices

- Always check logs first: Build logs, deploy logs, runtime logs
- Verify environment variables: Missing vars are a common cause of deployment failures
- Check resource limits: Memory/CPU limits appropriate for workload
- Test locally first: Reproduce issues locally when possible
- Monitor metrics: Use Railway dashboard for trends
- Document solutions: Update common-errors.md with new patterns
- Use private networking: For inter-service communication
- Enable health checks: Catch deployment issues early

Support

For issues not covered by this skill:

- Railway Documentation: https://docs.railway.com
- Railway Discord: Active community support
- Railway Status: https://status.railway.com
- GitHub Issues: https://github.com/railwayapp/railway/issues