🛠️ Verification & Quality Assurance
Improves the reliability of agent output and programs.
📜 Original English description (for reference)
Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
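⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.
🎯 What this Skill can do
The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.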
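📦 Installation (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder
  - · macOS / Linux: ~/.claude/skills/
  - · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. The Skill is invoked automatically for relevant requests, even if you do not say "Use this Skill to…".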
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
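💬 Just ask like this: sample prompts
- › Use Verification & Quality Assurance to show a minimal sample code setup
- › Explain the main uses of Verification & Quality Assurance and what to watch out for
- › Explain how to integrate Verification & Quality Assurance into an existing project
Paste any of these into Claude Code and this Skill activates automatically.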
📖 Original SKILL.md that Claude reads
The text below is the original (English or Chinese) written for the AI (Claude) to read. Japanese translations are being added gradually.
Verification & Quality Assurance Skill
What This Skill Does
This skill provides a comprehensive verification and quality assurance system that ensures code quality and correctness through:
- Truth Scoring: Real-time reliability metrics (0.0-1.0 scale) for code, agents, and tasks
- Verification Checks: Automated code correctness, security, and best practices validation
- Automatic Rollback: Instant reversion of changes that fail verification (default threshold: 0.95)
- Quality Metrics: Statistical analysis with trends, confidence intervals, and improvement tracking
- CI/CD Integration: Export capabilities for continuous integration pipelines
- Real-time Monitoring: Live dashboards and watch modes for ongoing verification
Shipped vs. aspirational. The concrete, in-CI verification stack (the 6 regression-guard jobs, the witness manifest, and the tool-discoverability audit) is real and runs on every push. The truth-scoring / auto-rollback / WebSocket-dashboard surface described later in this doc is partly shipped (ruflo verify runs the witness checks) and partly design; treat the "CI Guards" section below as the authoritative current state.
CI Guards — what's actually shipped (current state)
Ruflo's regression protection is three layers, all gated before publish. Authoritative reference: verification/README.md.
| Layer | What | CI job(s) in .github/workflows/v3-ci.yml | ADR |
|---|---|---|---|
| 1: install/behavioral smoke | Exercise user-visible failure modes against a real build | smoke-install-no-bsqlite (npm install on platforms w/o prebuilds), plugin-hooks-smoke (#1859/#1862: hook flag parsing), mcp-protocol-smoke (#1874: HTTP MCP wire format), memory-import-smoke (#1883/#1884: WSL path + key sanitization), mcp-roundtrip-smoke (#1889 paired-tool round-trip + #1863 cli-no-crash + ADR-095 G2 consensus-transport) | ADR-102 |
| 1: discoverability gate | Every MCP tool description must answer "use this over native when?" | tool-descriptions-audit: scripts/audit-tool-descriptions.mjs, baseline at verification/mcp-tool-baseline.json (monotone-decreasing: noGuidance / tooShort / duplicates) | ADR-112 |
| 2: cryptographic witness | Every documented fix's load-bearing marker must still be present in dist; Ed25519-signed, per-OS bundles | witness-verify (ubuntu/macos/windows): plugins/ruflo-core/scripts/witness/verify.mjs against verification/<os>/manifest.md.json | ADR-103 |
| 3: temporal history | When was a regression introduced | verification/<os>/history.jsonl + history.mjs (summary / regressions / timeline) | ADR-103 |
Run the guards locally
# Tool-description discoverability audit (ADR-112)
node scripts/audit-tool-descriptions.mjs # fails if any baseline count rises
node scripts/audit-tool-descriptions.mjs --update-baseline # lock the new floor after a fix lands
# Behavioral smokes (each builds what it needs; safe to run individually)
node plugins/ruflo-core/scripts/test-hooks.mjs "node $PWD/v3/@claude-flow/cli/bin/cli.js"
node plugins/ruflo-core/scripts/test-mcp-protocol.mjs
node plugins/ruflo-core/scripts/test-memory-import.mjs
node plugins/ruflo-core/scripts/test-mcp-roundtrips.mjs # #1889 paired-tool round-trip
node plugins/ruflo-core/scripts/test-cli-no-crash.mjs # #1863 unhandled-exception class
node plugins/ruflo-core/scripts/test-consensus-transport.mjs # ADR-095 G2 consensus transport
# Witness manifest — regenerate + verify
node scripts/regen-witness.mjs
node plugins/ruflo-core/scripts/witness/verify.mjs --manifest verification/macos/manifest.md.json
# Temporal history
node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl summary
node plugins/ruflo-core/scripts/witness/history.mjs --history verification/macos/history.jsonl regressions
Adding a new guard
- Behavioral smoke → write plugins/ruflo-core/scripts/test-<name>.mjs. Pattern: static dist-scan first (fast, always completes), behavioral probe second with an internal timeout + a process-level watchdog so CI never hangs. Add a step to the relevant job in v3-ci.yml.
- Static gate with a baseline → write scripts/audit-<name>.mjs that scans, counts violations, and fails if the count exceeds a monotone-decreasing baseline in verification/<name>-baseline.json. Support --update-baseline. Add a CI job; wire it into witness-verify needs[] if it should gate publish.
- Documented-fix marker → append { id, desc, file, marker } to verification/witness-fixes.json, run node scripts/regen-witness.mjs. The marker must be a substring the fix specifically creates (not a generic pattern like 'function').
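The "static gate with a baseline" pattern can be sketched as follows. This is an illustration in Python, not the project's actual .mjs scripts: the helper name check_against_baseline and the flat {metric: count} layout of the baseline file are assumptions made for this sketch, although the metric names in the usage note mirror the noGuidance / tooShort / duplicates counts mentioned earlier.

```python
import json
from pathlib import Path


def check_against_baseline(counts, baseline_path, update=False):
    """Fail if any violation count rises above its recorded baseline.

    Hypothetical helper: `counts` maps a metric name to the number of
    violations found by the current scan. The baseline file layout
    (a flat JSON object) is an assumption for this sketch.
    Returns (ok, regressions), where regressions maps each metric that
    rose to its (baseline, current) pair.
    """
    path = Path(baseline_path)
    baseline = json.loads(path.read_text())

    # A metric missing from the baseline is treated as having a floor of 0.
    regressions = {
        name: (baseline.get(name, 0), current)
        for name, current in counts.items()
        if current > baseline.get(name, 0)
    }

    if update and not regressions:
        # Lock in the new floor; it is never allowed to rise again.
        path.write_text(json.dumps(counts, indent=2, sort_keys=True))

    return (not regressions, regressions)
```

With a baseline of {"noGuidance": 3, "tooShort": 2, "duplicates": 0}, a scan that finds 4 noGuidance violations is rejected, while a scan that finds fewer violations passes and, with update=True, becomes the new floor.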
Prerequisites
- Ruflo installed (npx ruflo@alpha)
- Git repository (for rollback features)
- Node.js 18+ (for dashboard features)
- @noble/ed25519 (for the witness verifier; a single runtime dep, npm i @noble/ed25519)
Quick Start
# View current truth scores
npx ruflo@alpha truth
# Run verification check
npx ruflo@alpha verify check
# Verify specific file with custom threshold
npx ruflo@alpha verify check --file src/app.js --threshold 0.98
# Rollback last failed verification
npx ruflo@alpha verify rollback --last-good
Complete Guide
Truth Scoring System
View Truth Metrics
Display comprehensive quality and reliability metrics for your codebase and agent tasks.
Basic Usage:
# View current truth scores (default: table format)
npx ruflo@alpha truth
# View scores for specific time period
npx ruflo@alpha truth --period 7d
# View scores for specific agent
npx ruflo@alpha truth --agent coder --period 24h
# Find files/tasks below threshold
npx ruflo@alpha truth --threshold 0.8
Output Formats:
# Table format (default)
npx ruflo@alpha truth --format table
# JSON for programmatic access
npx ruflo@alpha truth --format json
# CSV for spreadsheet analysis
npx ruflo@alpha truth --format csv
# HTML report with visualizations
npx ruflo@alpha truth --format html --export report.html
Real-time Monitoring:
# Watch mode with live updates
npx ruflo@alpha truth --watch
# Export metrics automatically
npx ruflo@alpha truth --export .claude-flow/metrics/truth-$(date +%Y%m%d).json
Truth Score Dashboard
Example dashboard output:
📊 Truth Metrics Dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Truth Score: 0.947 ✅
Trend: ↗️ +2.3% (7d)
Top Performers:
verification-agent 0.982 ⭐
code-analyzer 0.971 ⭐
test-generator 0.958 ✅
Needs Attention:
refactor-agent 0.821 ⚠️
docs-generator 0.794 ⚠️
Recent Tasks:
task-456 0.991 ✅ "Implement auth"
task-455 0.967 ✅ "Add tests"
task-454 0.743 ❌ "Refactor API"
Metrics Explained
Truth Scores (0.0-1.0):
- 1.0-0.95: Excellent ⭐ (production-ready)
- 0.94-0.85: Good ✅ (acceptable quality)
- 0.84-0.75: Warning ⚠️ (needs attention)
- <0.75: Critical ❌ (requires immediate action)
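As a rough illustration of how these bands could be applied in a script, here is a minimal Python sketch. The function name classify_truth_score is hypothetical, and the handling of scores that fall between the listed ranges (such as 0.945) is an assumption: each band's lower bound is treated as inclusive.

```python
def classify_truth_score(score):
    """Map a truth score in [0.0, 1.0] to one of the four bands.

    Assumption: each band's lower bound is inclusive, so a score that
    falls between two listed ranges (e.g. 0.945) lands in the lower band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.95:
        return "Excellent"
    if score >= 0.85:
        return "Good"
    if score >= 0.75:
        return "Warning"
    return "Critical"
```

Under this reading, the overall score of 0.947 from the example dashboard counts as "Good", and task-454 at 0.743 counts as "Critical".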
Trend Indicators:
- ↗️ Improving (positive trend)
- → Stable (consistent performance)
- ↘️ Declining (quality regression detected)
Statistics:
- Mean Score: Average truth score across all measurements
- Median Score: Middle value (less affected by outliers)
- Standard Deviation: Consistency of scores (lower = more consistent)
- Confidence Interval: Statistical reliability of measurements
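To make these statistics concrete, the sketch below computes them for a list of scores using Python's standard library. How the tool itself computes them is not documented here, so the choices are assumptions: the sample standard deviation is used, and the confidence interval is a normal-approximation interval around the mean (z = 1.96 for roughly 95%). The function name summarize_scores is hypothetical.

```python
import math
import statistics


def summarize_scores(scores, z=1.96):
    """Summarize a list of truth scores.

    Assumptions: sample standard deviation, and a normal-approximation
    confidence interval around the mean (z=1.96 for roughly 95%).
    The tool may compute these differently.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)
    margin = z * stdev / math.sqrt(len(scores))
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "stdev": stdev,
        "ci": (mean - margin, mean + margin),
    }
```

For the scores 0.95, 0.96, 0.97, and 0.40, the mean drops to about 0.82 while the median stays near 0.955, which is why the median is the more robust summary when a single task scores badly.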
Verification Checks
Run Verification
Execute comprehensive verification checks on code, tasks, or agent outputs.
File Verification:
# Verify single file
npx ruflo@alpha verify check --file src/app.js
# Verify directory recursively
npx ruflo@alpha verify check --directory src/
# Verify with auto-fix enabled
npx ruflo@alpha verify check --file src/utils.js --auto-fix
# Verify current working directory
npx ruflo@alpha verify check
Task Verification:
# Verify specific task output
npx ruflo@alpha verify check --task task-123
# Verify with custom threshold
npx ruflo@alpha verify check --task task-456 --threshold 0.99
# Verbose output for debugging
npx ruflo@alpha verify check --task task-789 --verbose
Batch Verification:
# Verify multiple files in parallel
npx ruflo@alpha verify batch --files "*.js" --parallel
# Verify with pattern matching
npx ruflo@alpha verify batch --pattern "src/**/*.ts"
# Integration test suite
npx ruflo@alpha verify integration --test-suite full
Verification Criteria
The verification system evaluates:
- Code Correctness
  - Syntax validation
  - Type checking (TypeScript)
  - Logic flow analysis
  - Error handling completeness
- Best Practices
  - Code style adherence
  - SOLID principles
  - Design patterns usage
  - Modularity and reusability
- Security
  - Vulnerability scanning
  - Secret detection
  - Input validation
  - Authentication/authorization checks
- Performance
  - Algorithmic complexity
  - Memory usage patterns
  - Database query optimization
  - Bundle size impact
- Documentation
  - JSDoc/TypeDoc completeness
  - README accuracy
  - API documentation
  - Code comments quality
JSON Output for CI/CD
# Get structured JSON output
npx ruflo@alpha verify check --json > verification.json
# Example JSON structure:
{
  "overallScore": 0.947,
  "passed": false,
  "threshold": 0.95,
  "checks": [
    {
      "name": "code-correctness",
      "score": 0.98,
      "passed": true
    },
    {
      "name": "security",
      "score": 0.91,
      "passed": false,
      "issues": [...]
    }
  ]
}
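A CI step that consumes this JSON might look like the following Python sketch. It assumes the field names shown in the example (overallScore, threshold, checks[].name, checks[].passed) and applies the pass rule from the exit-code description in this document, score ≥ threshold. The function name evaluate_report is hypothetical, and the real tool may weigh individual checks differently.

```python
import json


def evaluate_report(report_text):
    """Return (passed, failing_check_names) for a verification report.

    Assumes the field names from the example JSON. The overall result
    is recomputed as overallScore >= threshold instead of trusting the
    report's own "passed" field.
    """
    report = json.loads(report_text)
    passed = report["overallScore"] >= report["threshold"]
    failing = [
        check["name"]
        for check in report.get("checks", [])
        if not check.get("passed", False)
    ]
    return passed, failing
```

Recomputing the result from the score and threshold keeps the gate independent of how the report was produced, and the list of failing check names tells a reviewer where to look first.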
Automatic Rollback
Rollback Failed Changes
Automatically revert changes that fail verification checks.
Basic Rollback:
# Rollback to last known good state
npx ruflo@alpha verify rollback --last-good
# Rollback to specific commit
npx ruflo@alpha verify rollback --to-commit abc123
# Interactive rollback with preview
npx ruflo@alpha verify rollback --interactive
Smart Rollback:
# Rollback only failed files (preserve good changes)
npx ruflo@alpha verify rollback --selective
# Rollback with automatic backup
npx ruflo@alpha verify rollback --backup-first
# Dry-run mode (preview without executing)
npx ruflo@alpha verify rollback --dry-run
Rollback Performance:
- Git-based rollback: <1 second
- Selective file rollback: <500ms
- Backup creation: Automatic before rollback
Verification Reports
Generate Reports
Create detailed verification reports with metrics and visualizations.
Report Formats:
# JSON report
npx ruflo@alpha verify report --format json
# HTML report with charts
npx ruflo@alpha verify report --export metrics.html --format html
# CSV for data analysis
npx ruflo@alpha verify report --format csv --export metrics.csv
# Markdown summary
npx ruflo@alpha verify report --format markdown
Time-based Reports:
# Last 24 hours
npx ruflo@alpha verify report --period 24h
# Last 7 days
npx ruflo@alpha verify report --period 7d
# Last 30 days with trends
npx ruflo@alpha verify report --period 30d --include-trends
# Custom date range
npx ruflo@alpha verify report --from 2025-01-01 --to 2025-01-31
Report Content:
- Overall truth scores
- Per-agent performance metrics
- Task completion quality
- Verification pass/fail rates
- Rollback frequency
- Quality improvement trends
- Statistical confidence intervals
Interactive Dashboard
Launch Dashboard
Run interactive web-based verification dashboard with real-time updates.
# Launch dashboard on default port (3000)
npx ruflo@alpha verify dashboard
# Custom port
npx ruflo@alpha verify dashboard --port 8080
# Export dashboard data
npx ruflo@alpha verify dashboard --export
# Dashboard with auto-refresh
npx ruflo@alpha verify dashboard --refresh 5s
Dashboard Features:
- Real-time truth score updates (WebSocket)
- Interactive charts and graphs
- Agent performance comparison
- Task history timeline
- Rollback history viewer
- Export to PDF/HTML
- Filter by time period/agent/score
Configuration
Default Configuration
Set verification preferences in .claude-flow/config.json:
{
  "verification": {
    "threshold": 0.95,
    "autoRollback": true,
    "gitIntegration": true,
    "hooks": {
      "preCommit": true,
      "preTask": true,
      "postEdit": true
    },
    "checks": {
      "codeCorrectness": true,
      "security": true,
      "performance": true,
      "documentation": true,
      "bestPractices": true
    }
  },
  "truth": {
    "defaultFormat": "table",
    "defaultPeriod": "24h",
    "warningThreshold": 0.85,
    "criticalThreshold": 0.75,
    "autoExport": {
      "enabled": true,
      "path": ".claude-flow/metrics/truth-daily.json"
    }
  }
}
Threshold Configuration
Adjust verification strictness:
# Strict mode (99% accuracy required)
npx ruflo@alpha verify check --threshold 0.99
# Lenient mode (90% acceptable)
npx ruflo@alpha verify check --threshold 0.90
# Set default threshold
npx ruflo@alpha config set verification.threshold 0.98
Per-environment thresholds:
{
  "verification": {
    "thresholds": {
      "production": 0.99,
      "staging": 0.95,
      "development": 0.90
    }
  }
}
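One way a script could pick the right threshold from such a config is sketched below. The lookup order is an assumption made for this example (per-environment value first, then verification.threshold, then a default of 0.95), and the function name resolve_threshold is hypothetical.

```python
def resolve_threshold(config, environment, default=0.95):
    """Pick the verification threshold for an environment.

    Lookup order is an assumption for this sketch: the per-environment
    value, then verification.threshold, then `default`.
    """
    verification = config.get("verification", {})
    per_environment = verification.get("thresholds", {})
    if environment in per_environment:
        return per_environment[environment]
    return verification.get("threshold", default)
```

With the config shown here, "production" resolves to 0.99, while an environment that is not listed falls back to the global threshold or the default.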
Integration Examples
CI/CD Integration
GitHub Actions:
name: Quality Verification
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Dependencies
        run: npm install
      - name: Run Verification
        run: |
          npx ruflo@alpha verify check --json > verification.json
      - name: Check Truth Score
        run: |
          score=$(jq '.overallScore' verification.json)
          if (( $(echo "$score < 0.95" | bc -l) )); then
            echo "Truth score too low: $score"
            exit 1
          fi
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: verification-report
          path: verification.json
GitLab CI:
verify:
  stage: test
  script:
    - npx ruflo@alpha verify check --threshold 0.95 --json > verification.json
    - |
      score=$(jq '.overallScore' verification.json)
      if [ "$(echo "$score < 0.95" | bc)" -eq 1 ]; then
        echo "Verification failed with score: $score"
        exit 1
      fi
  artifacts:
    paths:
      - verification.json
Swarm Integration
Run verification automatically during swarm operations:
# Swarm with verification enabled
npx ruflo@alpha swarm --verify --threshold 0.98
# Hive Mind with auto-rollback
npx ruflo@alpha hive-mind --verify --rollback-on-fail
# Training pipeline with verification
npx ruflo@alpha train --verify --threshold 0.99
Pair Programming Integration
Enable real-time verification during collaborative development:
# Pair with verification
npx ruflo@alpha pair --verify --real-time
# Pair with custom threshold
npx ruflo@alpha pair --verify --threshold 0.97 --auto-fix
Advanced Workflows
Continuous Verification
Monitor codebase continuously during development:
# Watch directory for changes
npx ruflo@alpha verify watch --directory src/
# Watch with auto-fix
npx ruflo@alpha verify watch --directory src/ --auto-fix
# Watch with notifications
npx ruflo@alpha verify watch --notify --threshold 0.95
Monitoring Integration
Send metrics to external monitoring systems:
# Export to Prometheus
npx ruflo@alpha truth --format json | \
  curl -X POST https://pushgateway.example.com/metrics/job/claude-flow \
  -d @-
# Send to DataDog
npx ruflo@alpha verify report --format json | \
  curl -X POST "https://api.datadoghq.com/api/v1/series?api_key=${DD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @-
# Custom webhook
npx ruflo@alpha truth --format json | \
  curl -X POST https://metrics.example.com/api/truth \
  -H "Content-Type: application/json" \
  -d @-
Pre-commit Hooks
Automatically verify before commits:
# Install pre-commit hook
npx ruflo@alpha verify install-hook --pre-commit
# .git/hooks/pre-commit example:
#!/bin/bash
npx ruflo@alpha verify check --threshold 0.95 --json > /tmp/verify.json
score=$(jq '.overallScore' /tmp/verify.json)
if (( $(echo "$score < 0.95" | bc -l) )); then
  echo "❌ Verification failed with score: $score"
  echo "Run 'npx ruflo@alpha verify check --verbose' for details"
  exit 1
fi
echo "✅ Verification passed with score: $score"
Performance Metrics
Verification Speed:
- Single file check: <100ms
- Directory scan: <500ms (per 100 files)
- Full codebase analysis: <5s (typical project)
- Truth score calculation: <50ms
Rollback Speed:
- Git-based rollback: <1s
- Selective file rollback: <500ms
- Backup creation: <2s
Dashboard Performance:
- Initial load: <1s
- Real-time updates: <100ms latency (WebSocket)
- Chart rendering: 60 FPS
Troubleshooting
Common Issues
Low Truth Scores:
# Get detailed breakdown
npx ruflo@alpha truth --verbose --threshold 0.0
# Check specific criteria
npx ruflo@alpha verify check --verbose
# View agent-specific issues
npx ruflo@alpha truth --agent <agent-name> --format json
Rollback Failures:
# Check git status
git status
# View rollback history
npx ruflo@alpha verify rollback --history
# Manual rollback
git reset --hard HEAD~1
Verification Timeouts:
# Increase timeout
npx ruflo@alpha verify check --timeout 60s
# Verify in batches
npx ruflo@alpha verify batch --batch-size 10
Exit Codes
Verification commands return standard exit codes:
- 0: Verification passed (score ≥ threshold)
- 1: Verification failed (score < threshold)
- 2: Error during verification (invalid input, system error)
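A wrapper script can branch on these codes. The Python sketch below runs an arbitrary command and translates its exit code using the meanings listed here. The names EXIT_MEANINGS and run_and_interpret are hypothetical, and reporting any other code as "unknown" is an assumption.

```python
import subprocess

# Meanings taken from the exit-code list in this section.
EXIT_MEANINGS = {
    0: "passed",
    1: "failed",
    2: "error",
}


def run_and_interpret(command):
    """Run a command and translate its exit code.

    `command` is an argument list, for example
    ["npx", "ruflo@alpha", "verify", "check"]. Codes outside 0-2 are
    reported as "unknown"; that fallback is an assumption.
    """
    result = subprocess.run(command)
    return EXIT_MEANINGS.get(result.returncode, "unknown")
```

Separating "failed" from "error" lets a pipeline treat a low score as a quality problem and a crash as an infrastructure problem.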
Related Commands
- npx ruflo@alpha pair: Collaborative development with verification
- npx ruflo@alpha train: Training with verification feedback
- npx ruflo@alpha swarm: Multi-agent coordination with quality checks
- npx ruflo@alpha report: Generate comprehensive project reports
Best Practices
- Set Appropriate Thresholds: Use 0.99 for critical code, 0.95 for standard, 0.90 for experimental
- Enable Auto-rollback: Prevent bad code from persisting
- Monitor Trends: Track improvement over time, not just current scores
- Integrate with CI/CD: Make verification part of your pipeline
- Use Watch Mode: Get immediate feedback during development
- Export Metrics: Track quality metrics in your monitoring system
- Review Rollbacks: Understand why changes were rejected
- Train Agents: Use verification feedback to improve agent performance
Additional Resources
- Truth Scoring Algorithm: See /docs/truth-scoring.md
- Verification Criteria: See /docs/verification-criteria.md
- Integration Examples: See /examples/verification/
- API Reference: See /docs/api/verification.md