jpskill.com
🛠️ 開発・MCP コミュニティ 🔴 エンジニア向け 👤 エンジニア・AI開発者

🛠️ Openguardrails

openguardrails

AIエージェントの実行時セキュリティを確保し、データ漏洩や資格情報盗難、コマンドインジェクションなどから保護するオープンソースのプラグインです。

⏱ テスト計画作成 2時間 → 20分
📜 元の英語説明(参考)

Runtime security plugin for AI agents. Provides local-first protection against data exfiltration, credential theft, command injection, and sensitive data leakage. Includes a free AI Security Gateway that sanitizes PII before it reaches LLM providers. Fully open source (Apache 2.0) — all detection and sanitization logic is auditable on GitHub. Source: github.com/openguardrails/openguardrails

🇯🇵 日本人クリエイター向け解説

一言でいうと

AIエージェントの実行時セキュリティを確保し、データ漏洩や資格情報盗難、コマンドインジェクションなどから保護するオープンソースのプラグインです。

※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。

⚡ おすすめ: コマンド1行でインストール(60秒)

下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o openguardrails.zip https://jpskill.com/download/5126.zip && unzip -o openguardrails.zip && rm openguardrails.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/5126.zip -OutFile "$d\openguardrails.zip"; Expand-Archive "$d\openguardrails.zip" -DestinationPath $d -Force; ri "$d\openguardrails.zip"

完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。

💾 手動でダウンロードしたい(コマンドが難しい人向け)
  1. 1. 下の青いボタンを押して openguardrails.zip をダウンロード
  2. 2. ZIPファイルをダブルクリックで解凍 → openguardrails フォルダができる
  3. 3. そのフォルダを C:\Users\あなたの名前\.claude\skills\(Win)または ~/.claude/skills/(Mac)へ移動
  4. 4. Claude Code を再起動

⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。

🎯 このSkillでできること

下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。

📦 インストール方法 (3ステップ)

  1. 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
  2. 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
  3. 3. 展開してできたフォルダを、ホームフォルダの .claude/skills/ に置く
    • · macOS / Linux: ~/.claude/skills/
    • · Windows: %USERPROFILE%\.claude\skills\

Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。

詳しい使い方ガイドを見る →
最終更新
2026-05-17
取得日時
2026-05-17
同梱ファイル
1

💬 こう話しかけるだけ — サンプルプロンプト

  • Openguardrails を使って、最小構成のサンプルコードを示して
  • Openguardrails の主な使い方と注意点を教えて
  • Openguardrails を既存プロジェクトに組み込む方法を教えて

これをClaude Code に貼るだけで、このSkillが自動発動します。

📖 Claude が読む原文 SKILL.md(中身を展開)

この本文は AI(Claude)が読むための原文(英語または中国語)です。日本語訳は順次追加中。

OpenGuardrails

What this skill is: This is the installation and usage guide for the OpenGuardrails security plugin. The plugin code itself lives in the open-source repository at github.com/openguardrails/openguardrails (subdirectory openclaw-security/). This skill does not execute code on its own — it documents how to install, configure, and verify the plugin.

ClawHub ↔ GitHub identity: This skill is published on ClawHub as ThomasLWang/openguardrails. The upstream source is at github.com/openguardrails/openguardrails, maintained by the same author (Thomas Wang). The npm package is @openguardrails/openclaw-security. All three point to the same codebase.

Runtime security guard for OpenClaw agents. Protects against the most critical AI agent threats:

  • Data exfiltration defense — detects and blocks when an agent reads sensitive files then attempts to send them to external servers
  • Sensitive data leakage prevention — sanitizes PII, credentials, and secrets before they reach LLM providers
  • Prompt injection protection — identifies crafted inputs designed to hijack agent behavior
  • Command injection blocking — catches shell escapes, backtick substitution, and command chaining in tool parameters
  • Content safety — filters NSFW content and enforces minor protection policies

Security & Trust

Open source and auditable. All code is Apache 2.0 licensed at github.com/openguardrails/openguardrails. You can audit every line before installing — especially the tool-event hooks, sanitization logic, and network calls. Key files to review:

  • agent/sanitizer.ts — what gets sanitized before any cloud transmission
  • agent/content-injection-scanner.ts — local-only regex patterns for injection detection
  • gateway/src/sanitizer.ts — AI Security Gateway sanitization (fully local)
  • index.ts — plugin entry point showing all event hooks

What is transmitted to the cloud API (and what is not):

  • Sent: sanitized tool metadata only — tool names, parameter keys, session signals (tool ordering, timing). All sensitive values (PII, credentials, file contents, secrets) are replaced with category placeholders (<EMAIL>, <SECRET>, <CREDIT_CARD>, etc.) locally before transmission.
  • Never sent: raw file contents, user messages, conversation history, actual credential values, or any unsanitized parameter values.
  • Data retention: Detection request payloads (sanitized tool metadata) are not retained after the response is returned. Account data is stored persistently for billing: agent ID and API key (created at registration in Step 2), plus email (provided by you in Step 3 via the claim web form), plan tier, and per-agent usage counts.

Local-only mode. The plugin works without any cloud connection. Local fast-path detection (shell escape blocking, read-then-exfil patterns, content injection redaction) operates entirely on your machine with no network calls. Cloud assessment is only used for borderline behavioral patterns and is opt-in via registration. If you skip registration, you still get all local protections.

No install-time network calls. The plugin makes zero network requests at install time. It loads a local BehaviorDetector and waits until you explicitly run /og_activate.

Registration sends minimal data. The /og_activate command sends exactly { name, description } to register — no email, no machine identifiers, no file paths, no user data. Your email is only collected later in Step 3, when you voluntarily visit the claim URL in your browser and enter it yourself. See the "Step 2: Register" section for the full request/response specification.

Fail-open design. If the cloud API is unreachable or times out, tool calls are allowed — the plugin never blocks your workflow due to network issues.

This skill is documentation-only. This SKILL.md file contains no executable code. It is an installation and usage guide for the open-source plugin at github.com/openguardrails/openguardrails. You are encouraged to audit the source before installing.


How It Works

The plugin hooks into before_tool_call, after_tool_call, and tool_result_persist events. It classifies every tool call in real time — tracking sensitive file reads, external network calls, credential access, and shell commands across the session. Critical patterns are blocked instantly on the local fast path; borderline signals are assessed via the cloud API.

Additionally, when the agent reads files or fetches web pages, the plugin scans the content for injection patterns across 7 categories and automatically redacts them in-place before the agent processes the content.

Agent calls tool
      ↓
[openguardrails] classifies tool + updates session state
  • sensitive file read → then network call?      → BLOCK (local)
  • shell escape in params ($(), backtick)?        → BLOCK (local)
  • prompt injection in file/web content?          → REDACT in-place (local)
  • credential access + low intent overlap?        → assess via cloud
  • external domains + intent mismatch?            → assess via cloud
      ↓
Allow, redact, alert, or block — with explanation returned to agent

No content is sent externally until the plugin is registered and activated.


Step 1: Install

Install from npm or directly from source:

# From npm
openclaw plugins install @openguardrails/openclaw-security

# From source (audit first, then install)
git clone https://github.com/openguardrails/openguardrails.git
openclaw plugins install -l ./openguardrails/openclaw-security

Installation completes immediately — no network calls are made at install time. The plugin loads, creates a BehaviorDetector, and waits.

Verify it's loaded:

/og_status

Expected output when not yet registered:

OpenGuardrails Status

- Status:    not registered — run `/og_activate` to register
- Platform:  https://www.openguardrails.com/core

- blockOnRisk: true

Step 2: Register (optional — local-only mode works without this)

Registration is triggered by running /og_activate. It enables cloud-based behavioral assessment on top of the local protections you already have.

What the registration request sends

The plugin calls POST /api/v1/agents/register with exactly two fields:

{ "name": "OpenClaw Agent", "description": "" }

That's it — an agent display name and an optional description. No machine identifiers, no file paths, no user data. See agent/config.ts:65-68 in the source.

What gets stored locally

The response is saved to ~/.openclaw/credentials/openguardrails/credentials.json:

{
  "apiKey": "sk-og-<32 hex chars>",
  "agentId": "<uuid>",
  "claimUrl": "https://www.openguardrails.com/core/claim/<token>",
  "verificationCode": "word-XXXX"
}

These credentials are generated server-side and stored as plaintext JSON on your machine (consistent with how CLI tools like gh, aws, and gcloud store credentials). The apiKey authenticates subsequent detection requests. You can revoke it anytime from the account portal or by deleting the credentials file.

Run registration

/og_activate

If the platform is reachable, you'll see:

OpenGuardrails: Claim Your Agent

Agent ID: <uuid>

Complete these steps to activate behavioral detection:

  1. Visit:  https://www.openguardrails.com/core/claim/<token>
  2. Code:   <word-XXXX>  (e.g. reef-X4B2)
  3. Email:  your email becomes your login for the account portal

After claiming you get 30,000 free detections.
Platform: https://www.openguardrails.com/core

Using an existing API key

If you already have a key (e.g. from a previous registration or from the account portal), set it directly — no /og_activate needed:

{
  "plugins": {
    "entries": {
      "openguardrails": {
        "config": {
          "apiKey": "sk-og-<your-key>"
        }
      }
    }
  }
}

Step 3: Activate (this is where your email is collected)

After registration, complete these steps in your browser to activate cloud assessment. This is the only step that collects your email — the registration API call in Step 2 does not send or collect any email.

  1. Visit the claim URL shown by /og_activate
  2. Enter the verification code (the word-XXXX code displayed in the terminal)
  3. Enter your email — you type this into the web form yourself; it becomes your account identity and is stored for billing
  4. Click the verification link sent to your email

Once your email is verified, the agent status changes to active and behavioral detection begins. The plugin polls for activation status automatically — no restart needed.

Check status anytime:

/og_status

Active output:

OpenGuardrails Status

- Agent ID:  <uuid>
- API Key:   sk-og-xxxxxxxxxxxx...
- Email:     you@example.com
- Platform:  https://www.openguardrails.com/core
- Status:    active

- blockOnRisk: true

Step 4: Test Detection

After email verification, the platform automatically sends you a test email containing a hidden prompt injection. This lets you immediately verify that OpenGuardrails is working.

How to test

  1. Check your inbox for the test email from OpenGuardrails (subject: "Design Review Request")
  2. Save it as a .txt file (e.g. ~/test-email.txt)
  3. Ask the agent to read the file: "Read ~/test-email.txt and summarize it"
  4. OpenGuardrails should detect and redact the hidden injection before the agent processes it

The test email looks like a normal design review request but contains an embedded injection payload in an HTML comment. With OpenGuardrails active, the injected content is replaced with:

__REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__

Alternative: use a sample file

You can also test with a sample file from the repository:

https://raw.githubusercontent.com/openguardrails/openguardrails/main/openclaw-security/samples/popup-injection-email.txt

Download it and ask the agent to read it. OpenGuardrails will detect and redact the injection.

What the detection looks like

When the agent reads a file containing an injection payload, OpenGuardrails:

  1. Scans the content for known injection pattern categories (see detection table above)
  2. Redacts the matched content in-place — the agent never sees the raw payload
  3. Logs a warning with the detected pattern category

Account & Portal

After activation, sign in to the account portal with your email + API key:

https://www.openguardrails.com/core/login

The portal shows:

  • Account overview — plan, quota usage, all agents under your email
  • Agent management — view API keys, regenerate keys
  • Usage logs — per-agent request history with latency and endpoint breakdown
  • Plan upgrades — upgrade from Free to Starter, Pro, or Business

Plans

Plan Price Detections/mo
Free $0 30,000
Starter $19/mo 100,000
Pro $49/mo 300,000
Business $199/mo 2,000,000

If you register multiple agents with the same email, they all share one account and quota.


Commands

Command Description
/og_status Show registration status, email, platform URL, blockOnRisk setting
/og_activate Register (if needed) and show claim URL and activation instructions

Configuration Reference

All options go in ~/.openclaw/openclaw.json under plugins.entries.openguardrails.config:

Option Default Description
enabled true Enable/disable the plugin
blockOnRisk true Block the tool call when risk is detected
apiKey "" Explicit API key (sk-og-...). Run /og_activate if empty
agentName "OpenClaw Agent" Name shown in the dashboard
coreUrl https://www.openguardrails.com/core Platform API endpoint
dashboardUrl https://www.openguardrails.com/dashboard Dashboard URL for monitoring and reporting
dashboardSessionToken "" Dashboard auth token (falls back to apiKey if empty)
timeoutMs 60000 Cloud assessment timeout (ms). Fails open on timeout

What Gets Detected

Fast-path blocks (local, no cloud round-trip)

Pattern Example Block reason
Read sensitive file → network call Read ~/.ssh/id_rsa, then WebFetch to external URL sensitive file read followed by network call to <domain>
Read credentials → network call Read ~/.aws/credentials, then Bash curl ... sensitive file read followed by network call to <domain>
Shell escape in params Bash with `cmd`, $(cmd), ;, &&, \|, or newline injection | suspicious shell command detected — potential command injection

Content injection detection (local, in-place redaction)

When the agent reads files or fetches web pages, OpenGuardrails scans the content for injection patterns and redacts them before the agent processes the content:

Pattern Category Description Redaction marker
Instruction override Attempts to override or discard prior context __REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__
Fake system message Spoofed system-level directives embedded in user content __REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__
Mode switching Attempts to change the agent's operating mode __REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__
Concealment directive Instructions to hide output from the user __REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__
Command execution Embedded shell commands or execution directives __REDACTED_BY_OPENGUARDRAILS_DUE_TO_COMMAND_EXECUTION__
Task hijacking Attempts to redirect the agent's current objective __REDACTED_BY_OPENGUARDRAILS_DUE_TO_PROMPT_INJECTION__
Data exfiltration Shell substitution targeting sensitive files __REDACTED_BY_OPENGUARDRAILS_DUE_TO_DATA_EXFILTRATION__

A single high-confidence match or 2+ medium-confidence matches from different categories triggers redaction. See agent/content-injection-scanner.ts for the full pattern list.

Cloud-assessed patterns

Tag Risk Action Description
READ_SENSITIVE_WRITE_NETWORK critical block Sensitive read followed by outbound call
DATA_EXFIL_PATTERN critical block Large data read, then sent externally
MULTI_CRED_ACCESS high block Multiple credential files accessed in one session
SHELL_EXEC_AFTER_WEB_FETCH high block Shell command executed after fetching external content
INTENT_ACTION_MISMATCH medium alert Tool sequence doesn't match stated user goal
UNUSUAL_TOOL_SEQUENCE medium alert Statistical anomaly in tool ordering

Risk levels and actions

Risk Level Action Meaning
critical block Tool call is blocked, agent sees block reason
high block Tool call is blocked, agent sees block reason
medium alert Tool call is allowed, warning logged
low / no_risk allow Tool call proceeds normally

Block reason format

When a tool call is blocked, the agent receives a message like:

OpenGuardrails blocked [critical]: Sensitive file read followed by data
sent to external server. Agent accessed credentials despite low relevance
to user intent "fetch weather for user". (confidence: 97%)

Sensitive file categories recognized

SSH_KEY, AWS_CREDS, GPG_KEY, ENV_FILE, CRYPTO_CERT, SYSTEM_AUTH, BROWSER_COOKIE, KEYCHAIN


AI Security Gateway (Free)

OpenGuardrails includes a free AI Security Gateway — a local HTTP proxy that protects sensitive data from being sent to external LLM providers (Anthropic, OpenAI, Gemini, and compatible APIs like Kimi and DeepSeek).

How it works

The gateway runs locally on your machine. It intercepts LLM API calls, sanitizes sensitive data before sending to the provider, and restores original values in responses. The entire process is transparent — you use your agent normally, and your data stays protected.

Your prompt: "My card is 6222021234567890, book a hotel"
      ↓ Gateway sanitizes locally
LLM sees: "My card is __bank_card_1__, book a hotel"
      ↓ LLM responds
LLM: "Booking with card __bank_card_1__"
      ↓ Gateway restores locally
You see: "Booking with card 6222021234567890"

The LLM provider never sees the real card number. You see the correct response. No impact on functionality.

Data types sanitized by the gateway

Data Type Placeholder Examples
Email addresses __email_N__ user@example.com
Credit cards __credit_card_N__ 1234-5678-9012-3456
Bank cards __bank_card_N__ 16-19 digit card numbers
Phone numbers __phone_N__ +1-555-123-4567, +86-138-1234-5678
API keys & secrets __secret_N__ sk-..., ghp_..., Bearer tokens, high-entropy tokens
IP addresses __ip_N__ 192.168.1.1
SSN __ssn_N__ 123-45-6789
IBAN __iban_N__ GB82WEST12345698765432
URLs __url_N__ https://example.com/path

More data types will be added based on user needs. Contact us if you need a specific type.

Setup

  1. Set your LLM API keys as environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY)
  2. Start the gateway: npx @openguardrails/gateway (runs on port 8900 by default)
  3. Point your agent's API base URL to http://127.0.0.1:8900

The gateway supports Anthropic (/v1/messages), OpenAI (/v1/chat/completions), and Gemini (/v1/models/{model}:generateContent) endpoints, including streaming.

Key properties

  • 100% local — the gateway runs on localhost, sensitive data never leaves your machine unsanitized
  • Zero dependencies — no npm dependencies beyond Node.js
  • Stateless — placeholder-to-original mappings exist only during the request cycle and are discarded after the response is restored
  • Free — no registration, no API key, no usage limits

Privacy & Data Protection

OpenGuardrails does not collect or sell your content. The detection engine is rule-driven and operates on structured signals — it has no LLM to train and no use for your content. Detection request payloads are not retained. Account data (agent ID, email, plan, usage counts) is stored for billing.

Local-first sanitization

All sensitive data is replaced with category placeholders on your machine before anything is sent to the cloud API:

Data Type Placeholder
Email addresses <EMAIL>
Credit card numbers <CREDIT_CARD>
SSNs <SSN>
IBANs <IBAN>
IP addresses <IP_ADDRESS>
Phone numbers <PHONE>
URLs <URL>
API keys & secrets <SECRET>
High-entropy tokens <SECRET>

The sanitization logic is in agent/sanitizer.ts — audit it yourself.

What stays local (no network calls)

  • Injection redaction — regex-based scanning, fully local
  • Fast-path blocks — shell escape detection, read-then-exfil patterns
  • AI Security Gateway — sanitization and restoration
  • Credentials — stored at ~/.openclaw/credentials/openguardrails/credentials.json
  • Low-risk / no-risk tool calls — never leave the machine

Verification guide

Before installing in production, we recommend:

  1. Audit the source — clone the repo and review these files:
    • index.ts — all event hooks (before_tool_call, after_tool_call, tool_result_persist); confirm no unexpected side effects
    • agent/sanitizer.ts — the sanitization logic that strips PII before any cloud call
    • platform-client/ — every outbound network call the plugin makes; confirm all go to openguardrails.com/core only
    • agent/config.ts:65-68 — the registration request; confirm it sends only { name, description }
  2. Install from source — clone from GitHub, inspect the code, then install locally:
    git clone https://github.com/openguardrails/openguardrails.git
    # Audit the code, then:
    openclaw plugins install -l ./openguardrails/openclaw-security
  3. Run in local-only mode first — skip /og_activate to use all local protections (injection redaction, shell escape blocking, read-then-exfil detection) with zero cloud connectivity
  4. Monitor network traffic — after registration, the plugin only contacts openguardrails.com/core for behavioral assessment; verify with your network monitor of choice
  5. Use a disposable email for initial testing if you prefer not to use your primary email during evaluation
  6. Revoke anytime — each agent gets its own API key; revoke from the account portal or delete ~/.openclaw/credentials/openguardrails/credentials.json

Contact

Have questions, feature requests, or need enterprise deployment support?

We welcome feedback on detection accuracy, requests for new sanitized data types, and enterprise inquiries for private deployment, custom rules, and dedicated support.


Uninstall

rm -rf ~/.openclaw/extensions/openguardrails
# Then manually delete openguardrails config in ~/.openclaw/openclaw.json
# Optionally remove credentials
rm -rf ~/.openclaw/credentials/openguardrails