assessment-builder
学習目標に沿った多様な質問形式(選択式、コード補完、デバッグ、プロジェクト)で、よくある誤解に基づいた意味のある選択肢を持つ評価を作成し、理解度を測るテストや試験を設計する際に役立つSkill。
📜 元の英語説明(参考)
Creates assessments with varied question types (MCQ, code-completion, debugging, projects) aligned to learning objectives with meaningful distractors based on common misconceptions. Activate when educators design quizzes, exams, or tests measuring understanding; need questions at appropriate cognitive levels (Bloom's taxonomy); want balanced cognitive distribution (60%+ non-recall); or require rubrics for open-ended questions. Generates MCQs with diagnostic distractors, code-writing prompts, debugging challenges, and project-based assessments targeting deep understanding.
🇯🇵 日本人クリエイター向け解説
学習目標に沿った多様な質問形式(選択式、コード補完、デバッグ、プロジェクト)で、よくある誤解に基づいた意味のある選択肢を持つ評価を作成し、理解度を測るテストや試験を設計する際に役立つSkill。
※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。
下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o assessment-builder.zip https://jpskill.com/download/16850.zip && unzip -o assessment-builder.zip && rm assessment-builder.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/16850.zip -OutFile "$d\assessment-builder.zip"; Expand-Archive "$d\assessment-builder.zip" -DestinationPath $d -Force; ri "$d\assessment-builder.zip"
完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。
💾 手動でダウンロードしたい(コマンドが難しい人向け)
- 1. 下の青いボタンを押して
assessment-builder.zipをダウンロード - 2. ZIPファイルをダブルクリックで解凍 →
assessment-builderフォルダができる - 3. そのフォルダを
C:\Users\あなたの名前\.claude\skills\(Win)または~/.claude/skills/(Mac)へ移動 - 4. Claude Code を再起動
⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。
🎯 このSkillでできること
下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。
📦 インストール方法 (3ステップ)
- 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
- 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
- 3. 展開してできたフォルダを、ホームフォルダの
.claude/skills/に置く- · macOS / Linux:
~/.claude/skills/ - · Windows:
%USERPROFILE%\.claude\skills\
- · macOS / Linux:
Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。
詳しい使い方ガイドを見る →- 最終更新
- 2026-05-18
- 取得日時
- 2026-05-18
- 同梱ファイル
- 2
📖 Skill本文(日本語訳)
※ 原文(英語/中国語)を Gemini で日本語化したものです。Claude 自身は原文を読みます。誤訳がある場合は原文をご確認ください。
目的
assessment-builder スキルは、教育者が概念的な理解(単なる暗記ではない)を測定する、包括的でバランスの取れた評価を作成するのに役立ちます。このスキルは、多様な質問タイプを生成し、MCQ のための意味のあるディストラクターを設計し、質問をブルームのタキソノミーレベルに合わせ、自由形式の質問のためのルーブリックを提供します。
Constitution v4.0.1 への適合: このスキルは、評価優先の評価設計(すべての柱の基礎)を実装します。つまり、評価を作成する前に成功基準を定義し、セクション IIb(AI の 3 つの役割フレームワーク)の共同学習評価を統合します。
評価優先の評価設計 (Constitution v4.0.1)
重要なワークフロー:
- 評価優先: 質問を設計する前に、章の仕様から成功基準を定義します。
- 目的を次に: 質問を(仕様からの)学習目標にマッピングします。
- 質問を第三に: 評価をテストする評価項目を設計します。
- 検証を第四に: 質問が実際に評価で定義されたものを測定していることを確認します。
評価計画のテンプレート:
### 評価計画 (評価優先)
**ソース**: `specs/part-X/chapter-Y/spec.md` の章の仕様
**仕様からの成功評価**:
1. 75% 以上の学生が有効な仕様を記述する(演習で測定)
2. 80% 以上が曖昧な要件を特定する(クイズで測定)
3. 学生が共同学習を実証する(振り返りで測定)
**学習目標** (仕様から):
- LO-001: 明確な仕様を記述する
- LO-002: 曖昧な要件を特定する
- LO-003: AI パートナーと効果的に連携する
**評価設計**:
- Q1-3: LO-001 (仕様記述) をテスト → 評価 #1
- Q4-5: LO-002 (曖昧さの特定) をテスト → 評価 #2
- Q6: LO-003 (共同学習) をテスト → 評価 #3
以下なしで評価を作成しないでください:
- ✅ 承認された評価を含む仕様への参照
- ✅ 明示的なマッピング: 質問 → 目標 → 評価
- ✅ 質問が定義された成功基準を測定していることの検証
いつアクティブ化するか
このスキルは、以下の場合に使用します。
- 教育者がクイズ、テスト、または試験を作成する必要がある場合
- 学習目標に沿った評価を設計する場合
- 多様な質問タイプ (MCQ、コード記述、デバッグ、プロジェクト) を作成する場合
- 一般的な学生の誤りに基づいた意味のあるディストラクターが必要な場合
- バランスの取れた認知分布 (単なる想起問題ではない) が必要な場合
- 自由形式のプログラミング問題のルーブリックを生成する場合
- 既存の評価の品質とバランスを評価する場合
- 60% 以上の質問が応用または高次の思考をテストしていることを確認する場合
入力
必須:
- 学習目標: 学習者が実証できるはずのこと
- 概念/トピック: 評価する Python の概念
オプション:
- 質問数: 生成する質問の数
- 質問タイプ: 希望する形式 (MCQ、コード記述、デバッグなど)
- 対象読者: beginner | intermediate | advanced
- 制限時間: 利用可能な評価時間
- ブルームの分布: 希望する認知レベルの組み合わせ
プロセス
ステップ 1: 評価目標を明確にする
以下を理解します。
- 評価する特定のスキル/知識
- 必要な理解の深さ
- 形式の制約 (時間、配信方法)
- 重要度 (形成的チェック vs. 重大な試験)
ステップ 2: 質問タイプのリファレンスをロードする
質問タイプのパターンを読みます。
Read reference/question-types.md
利用可能なタイプ:
- MCQ: 概念認識、意思決定
- Code-completion: 構文の適用、戦略的なギャップの埋め込み
- Code-tracing: メンタル実行モデルのテスト
- Debugging: エラー認識と修正
- Code-writing: 仕様からの実装
- Explanation: 概念的な理解のテスト
- Code-review: コード品質の評価
- Prediction: 動作の予測
- Comparison: アプローチの区別
- Project: 複数の概念の統合
ステップ 3: ブルームの評価アライメントをロードする
認知レベルのガイドラインを読みます。
Read reference/blooms-assessment-alignment.md
質問タイプをブルームのレベルにマッピングします。
- Remember (10-15%): MCQ の想起、用語
- Understand (15-20%): トレーシング、予測、説明
- Apply (30-40%): コード補完、簡単なコード記述
- Analyze (20-25%): デバッグ、比較、コードレビュー
- Evaluate (10-15%): 最適なアプローチの選択、トレードオフ
- Create (5-10%): プロジェクト、設計問題
ターゲット: 60% 以上の非想起 (Apply 以上)
CEFR 熟練度統合 (Constitution v3.1.2)
評価の難易度を CEFR 熟練度レベル (skills-proficiency-mapper に準拠) にマッピングします。
A1 (初心者 - 認識):
- MCQ: 構文の認識、基本的な概念の識別
- Code-tracing: 単純な線形コードの出力を予測
- まだデバッグや設計の質問はない
A2 (初級 - ガイド付き適用):
- MCQ: スキャフォールディングによる正しいアプローチの選択
- Code-completion: ヒントによる戦略的な空白の埋め込み
- 簡単なデバッグ: ガイダンスによる構文エラーの識別
B1 (中級 - 独立した適用):
- Code-writing: 明確な仕様からの実装
- Debugging: ロジックエラーの独立した発見と修正
- Code-review: 簡単なコード品質の評価
B2 (準上級 - 分析):
- 設計問題: 適切なデータ構造/アルゴリズムの選択
- 最適化: パフォーマンス改善の識別
- トレードオフ評価: 複数の有効なアプローチの比較
C1 (上級 - 統合):
- アーキテクチャ設計: システム構造の計画
- 複雑なデバッグ: 多層的な問題の診断
- ベストプラクティスの正当化: 設計上の決定の擁護
評価設計ルール: 質問は、レッスンが対象とする CEFR レベル (仕様から) と一致する必要があります。
3 つの役割の AI パートナーシップ評価 (セクション IIb, Constitution v4.0.1)
重要: AI ネイティブな開発では、学生が AI と双方向の共同学習パートナーシップで連携する能力(セクション IIb の強制関数による)を評価する必要があります。単に独立して行うだけではありません。
AI の 3 つの役割 - 評価タイプ:
1. AI を教師として (学生は AI から学ぶか?)
- 振り返り質問: "何
(原文はここで切り詰められています)
📜 原文 SKILL.md(Claudeが読む英語/中国語)を展開
Purpose
The assessment-builder skill helps educators create comprehensive, balanced assessments that measure conceptual understanding (not just memorization). This skill generates varied question types, designs meaningful distractors for MCQs, aligns questions with Bloom's taxonomy levels, and provides rubrics for open-ended questions.
Constitution v4.0.1 Alignment: This skill implements evals-first assessment design (foundational to all pillars)—defining success criteria BEFORE creating assessments, and integrating Section IIb (AI Three Roles Framework) co-learning evaluation.
Evals-First Assessment Design (Constitution v4.0.1)
CRITICAL WORKFLOW:
- Evals First: Define success criteria from chapter spec BEFORE designing questions
- Objectives Second: Map questions to learning objectives (from spec)
- Questions Third: Design assessment items that test evals
- Validation Fourth: Verify questions actually measure what evals define
Template for Assessment Planning:
### Assessment Planning (Evals-First)
**Source**: Chapter spec at `specs/part-X/chapter-Y/spec.md`
**Success Evals from Spec**:
1. 75%+ students write valid specification (measured by exercise)
2. 80%+ identify vague requirements (measured by quiz)
3. Students demonstrate co-learning (measured by reflection)
**Learning Objectives** (from spec):
- LO-001: Write clear specifications
- LO-002: Identify ambiguous requirements
- LO-003: Work effectively with AI partner
**Assessment Design**:
- Q1-3: Test LO-001 (spec writing) → Eval #1
- Q4-5: Test LO-002 (identify vagueness) → Eval #2
- Q6: Test LO-003 (co-learning) → Eval #3
Do NOT create assessments without:
- ✅ Reference to approved spec with evals
- ✅ Explicit mapping: Question → Objective → Eval
- ✅ Validation that questions measure defined success criteria
When to Activate
Use this skill when:
- Educators need to create quizzes, tests, or exams
- Designing assessments aligned with learning objectives
- Creating varied question types (MCQ, code-writing, debugging, projects)
- Need meaningful distractors based on common student errors
- Want balanced cognitive distribution (not just recall questions)
- Generating rubrics for open-ended programming questions
- Evaluating existing assessments for quality and balance
- Ensuring 60%+ questions test application or higher-order thinking
Inputs
Required:
- Learning objectives: What learners should be able to demonstrate
- Concept/topic: Python concepts to assess
Optional:
- Number of questions: How many questions to generate
- Question types: Preferred formats (MCQ, code-writing, debugging, etc.)
- Target audience: beginner | intermediate | advanced
- Time limit: Available assessment time
- Bloom's distribution: Desired cognitive level mix
Process
Step 1: Clarify Assessment Goals
Understand:
- What specific skills/knowledge to assess
- Depth of understanding required
- Format constraints (time, delivery method)
- Stakes (formative check vs. high-stakes exam)
Step 2: Load Question Type Reference
Read question type patterns:
Read reference/question-types.md
Available types:
- MCQ: Concept recognition, decision-making
- Code-completion: Apply syntax, fill strategic gaps
- Code-tracing: Test mental execution model
- Debugging: Error recognition and fixing
- Code-writing: Implementation from specification
- Explanation: Test conceptual understanding
- Code-review: Evaluate code quality
- Prediction: Anticipate behavior
- Comparison: Distinguish approaches
- Project: Integrate multiple concepts
Step 3: Load Bloom's Assessment Alignment
Read cognitive level guidelines:
Read reference/blooms-assessment-alignment.md
Map question types to Bloom's levels:
- Remember (10-15%): MCQ recall, terminology
- Understand (15-20%): Tracing, prediction, explanation
- Apply (30-40%): Code-completion, simple code-writing
- Analyze (20-25%): Debugging, comparison, code-review
- Evaluate (10-15%): Best-approach selection, trade-offs
- Create (5-10%): Projects, design problems
Target: 60%+ non-recall (Apply and higher)
CEFR Proficiency Integration (Constitution v3.1.2)
Map assessment difficulty to CEFR proficiency levels (aligned with skills-proficiency-mapper):
A1 (Beginner - Recognition):
- MCQs: Recognize syntax, identify basic concepts
- Code-tracing: Predict simple, linear code output
- No debugging or design questions yet
A2 (Elementary - Guided Application):
- MCQs: Choose correct approach with scaffolding
- Code-completion: Fill strategic blanks with hints
- Simple debugging: Identify syntax errors with guidance
B1 (Intermediate - Independent Application):
- Code-writing: Implement from clear specification
- Debugging: Find and fix logic errors independently
- Code-review: Evaluate simple code quality
B2 (Upper-Intermediate - Analysis):
- Design questions: Choose appropriate data structures/algorithms
- Optimization: Identify performance improvements
- Trade-off evaluation: Compare multiple valid approaches
C1 (Advanced - Synthesis):
- Architecture design: Plan system structure
- Complex debugging: Diagnose multi-layered issues
- Best practices justification: Defend design decisions
Assessment Design Rule: Questions must match lesson's target CEFR level (from spec).
Three-Role AI Partnership Assessment (Section IIb, Constitution v4.0.1)
CRITICAL: AI-native development requires assessing students' ability to work WITH AI in bidirectional co-learning partnership (per Section IIb forcing functions), not just independently.
AI's Three Roles - Assessment Types:
1. AI as Teacher (Does student learn from AI?)
- Reflection question: "What did AI suggest that you hadn't considered?"
- Evaluation: "Which of AI's three suggestions was most valuable? Why?"
- Transfer: "How would you apply AI's pattern to a new problem?"
2. AI as Student (Does student effectively teach AI?)
- Specification quality: "Write a spec that produces correct code on first try"
- Feedback quality: "Improve AI's output by providing clear corrections"
- Iteration: "Refine your prompt based on AI's initial response"
3. AI as Co-Worker (Does student collaborate effectively?)
- Convergence: "Show iteration where you and AI refined solution together"
- Decision-making: "Which strategic decisions did you make vs. AI's tactical suggestions?"
- Validation: "How did you verify AI's output was correct?"
Example Assessment Items:
### Q: Co-Learning Reflection (10 points)
In the previous exercise, you worked with AI to implement authentication.
1. What pattern or approach did AI suggest that you hadn't considered? (5 pts)
2. How did you validate that AI's suggestion was appropriate? (5 pts)
**Rubric**:
- Excellent (10): Specific AI suggestion identified, clear validation method
- Good (7): General AI contribution mentioned, basic validation
- Fair (4): Vague answer, no validation mentioned
- Poor (0): No evidence of learning from AI
Assessment Balance for AI-Native Content:
- 60-70%: Traditional skills (code-writing, debugging)
- 20-30%: Co-learning skills (working WITH AI)
- 10-20%: Validation/verification skills (checking AI outputs)
Step 4: Design Varied Question Set
Create 5-10 questions with variety:
Example Distribution:
Q1: MCQ (Understand) - Concept check
Q2: Code-tracing (Understand) - Predict output
Q3: Code-completion (Apply) - Fill strategic blank
Q4: Code-writing (Apply) - Implement function
Q5: Debugging (Analyze) - Find and fix errors
Q6: Code-review (Evaluate) - Assess quality
Q7: Project (Create) - Integrate concepts
Avoid: 10 identical MCQs (tests memorization, not understanding)
Step 5: Design MCQ Distractors
For multiple-choice questions, load distractor design guide:
Read reference/distractor-design.md
Create meaningful distractors based on common misconceptions:
Process:
- Identify correct answer
- List common errors students make
- Create distractor for each specific misconception
- Ensure distractors are plausible (not obviously wrong)
Example:
Q: After this code, what does x contain?
x = [1, 2, 3]
y = x
y.append(4)
A) [1, 2, 3, 4] ← Correct (understands references)
B) [1, 2, 3] ← Thinks x is independent of y
C) [4] ← Misunderstands append
D) Error ← Thinks modification through y is invalid
Distractor Analysis:
B tests: Understanding of assignment vs. copy
C tests: Understanding of append operation
D tests: Understanding of reference semantics
Step 6: Create Rubrics for Open-Ended Questions
For code-writing, projects, and explanations, load rubric guidelines:
Read reference/rubric-guidelines.md
Create analytic rubric with criteria:
- Correctness (40%): Produces correct output
- Code Quality (25%): Readable, well-structured
- Efficiency (20%): Appropriate algorithm
- Error Handling (10%): Edge cases
- Style (5%): Python best practices
Each criterion has 4 levels: excellent, good, fair, poor
Load templates:
Read templates/rubric-template.yml
Step 7: Validate Cognitive Distribution
Check assessment balance using validation script:
python .claude/skills/assessment-builder/scripts/validate-assessment.py assessment.yml
The script checks:
- Cognitive distribution: 60%+ non-recall?
- Question variety: Multiple question types?
- MCQ distractors: Analysis provided?
- Rubrics: Present for open-ended questions?
- Objective alignment: Questions map to objectives?
Review output:
- Validation score: >80 is good, <60 needs revision
- Issues: Critical problems to fix
- Warnings: Quality improvements
- Recommendations: Specific suggestions
Step 8: Refine Based on Validation
If validation identifies issues:
Issue: "Only 40% non-recall questions" Fix: Replace 2-3 MCQ recall questions with code-writing or debugging
Issue: "MCQs lack distractor analysis" Fix: Document what misconception each distractor tests
Issue: "Missing rubrics for code-writing" Fix: Add rubric with correctness, quality, efficiency criteria
Step 9: Add Answer Key and Explanations
For each question:
- Correct answer: What is right
- Explanation: Why it's correct
- Common errors: Why distractors are wrong
- Grading guidance: How to score
Example:
Q3: Write function to sum even numbers
Answer:
def sum_evens(numbers):
return sum(num for num in numbers if num % 2 == 0)
Explanation:
- Iterates through numbers
- Filters even (num % 2 == 0)
- Sums filtered values
- Handles empty list (sum returns 0)
Common Errors:
- Forgetting to check if num % 2 == 0
- Using == instead of %= for modulo
- Not handling empty list
Step 10: Package Complete Assessment
Structure assessment following template:
Read templates/assessment-template.yml
Include:
- Metadata: Title, audience, time limit, points
- Learning objectives: What's being assessed
- Questions: All with type, points, Bloom's level
- Answer key: Solutions and explanations
- Rubrics: For open-ended questions
- Cognitive analysis: Distribution breakdown
Output Format
Provide assessment as structured markdown:
# Assessment: [Title]
**Target Audience**: [beginner/intermediate/advanced]
**Time Limit**: [X minutes]
**Total Points**: [X]
**Passing Score**: [X]
## Learning Objectives Assessed
- [Objective 1]
- [Objective 2]
- [...]
## Cognitive Distribution
- Remember/Understand: X%
- Apply: X%
- Analyze/Evaluate: X%
- Create: X%
---
## Question 1 (X points) [Type: MCQ] [Bloom: Understand]
[Question text]
A) [Option]
B) [Option]
C) [Option]
D) [Option]
---
## Question 2 (X points) [Type: Code-Writing] [Bloom: Apply]
Write a function that [specification].
**Requirements**:
- [Requirement 1]
- [Requirement 2]
**Test Cases**:
1. Input: [x], Expected: [y]
2. Input: [x], Expected: [y]
---
[Continue for all questions...]
---
## Answer Key
### Question 1
**Answer**: B
**Explanation**: [Why B is correct and why A, C, D are wrong]
**Distractor Analysis**:
- A tests: [misconception]
- C tests: [misconception]
- D tests: [misconception]
### Question 2
**Solution**:
```python
[Complete correct solution]
Acceptance Checks
- [ ] SpecRef present and each question maps to at least one objective ID
- [ ] Objective Coverage Matrix included (question → objective IDs, Bloom level)
- [ ] ≥ 60% non‑recall; variety of question types present
- [ ] Validation Pack attached: answer key, explanations, distractor analyses, and rubric(s)
Objective Coverage Matrix (example)
Q1 → [LO-001] (Understand)
Q2 → [LO-002] (Apply)
Q3 → [LO-002, LO-003] (Analyze)
...
Rubric:
Correctness (40 pts):
- Excellent (40): Works for all test cases
- Good (30): Works for standard cases
- Fair (20): Partial functionality
- Poor (10): Significant errors
Code Quality (25 pts):
- Excellent (25): Clear names, good structure
- Good (19): Mostly readable
- Fair (13): Some confusion
- Poor (6): Poor quality
[Continue for all criteria...]
[Continue for all questions...]
## Examples
### Example 1: Create Balanced Assessment for Functions
**Input**: "Create 6-question assessment for Python functions (beginner level)"
**Process**:
1. Identify objectives: Define functions, use parameters, return values
2. Design variety:
- Q1: MCQ on function syntax (Understand)
- Q2: Trace execution (Understand)
- Q3: Complete function (Apply)
- Q4: Write function from spec (Apply)
- Q5: Debug function (Analyze)
- Q6: Mini-project combining functions (Create)
3. Create MCQ distractors based on common errors
4. Add rubric for Q4, Q6
5. Validate (check 60%+ non-recall)
**Output**: Complete 6-question assessment with answer key and rubrics
---
### Example 2: Add Distractors to Existing MCQ
**Input**: "Improve distractors for this MCQ" [provides question]
**Process**:
1. Load distractor design reference
2. Identify concept being tested
3. List common student misconceptions
4. Create diagnostic distractor for each misconception
5. Document what each distractor tests
**Output**: Improved MCQ with meaningful, plausible distractors and analysis
---
### Example 3: Validate Assessment Balance
**Input**: "Is this assessment balanced?" [provides 10-question exam]
**Process**:
1. Analyze cognitive distribution
2. Check for question variety
3. Validate MCQ distractors
4. Check for rubrics on open-ended
5. Calculate non-recall percentage
6. Provide specific recommendations
**Output**: Validation report with issues and recommendations
## Common Patterns
### Pattern 1: Formative Assessment (Quick Check)
3-5 questions 30% Understand (quick concept check) 50% Apply (demonstrate skill) 20% Analyze (identify errors) 15-20 minutes
### Pattern 2: Summative Assessment (Unit Exam)
10-15 questions 15% Understand 40% Apply 30% Analyze/Evaluate 15% Create (project) 60-90 minutes
### Pattern 3: Skills Demonstration
5-7 questions, mostly code-writing and projects 20% Apply (basic implementations) 40% Analyze (debugging, optimization) 40% Create (design and build)
## Validation Checklist
Before finalizing assessment:
- [ ] 60%+ questions test application or higher-order thinking
- [ ] Multiple question types used (not all MCQ)
- [ ] MCQs have meaningful, diagnostic distractors
- [ ] Open-ended questions have rubrics
- [ ] Questions align with learning objectives
- [ ] Clear instructions for each question
- [ ] Answer key and explanations provided
- [ ] Reasonable time allocation
- [ ] Validation score >80
## References
Supporting documentation (loaded as needed):
- `reference/question-types.md` - MCQ, code-writing, debugging, projects, etc.
- `reference/distractor-design.md` - Misconception-based distractor creation
- `reference/blooms-assessment-alignment.md` - Cognitive levels and question types
- `reference/rubric-guidelines.md` - Analytic rubric creation
## Error Handling
If validation fails:
1. Report specific issues (e.g., "Only 30% non-recall questions", "Missing rubrics")
2. Suggest remediation (e.g., "Replace 3 MCQs with code-writing questions")
3. Halt and require user intervention (hard failure mode)
Assessments must meet quality standards: balanced cognitive distribution, varied types, meaningful distractors, clear rubrics. 同梱ファイル
※ ZIPに含まれるファイル一覧。`SKILL.md` 本体に加え、参考資料・サンプル・スクリプトが入っている場合があります。
- 📄 SKILL.md (17,916 bytes)
- 📎 scripts/validate-assessment.py (7,552 bytes)