agent-native-architecture
A skill for building applications where autonomous agents are the central actors, achieving features by operating in a loop.
📜 Original English description (for reference)
Build applications where agents are first-class citizens. Use this skill when designing autonomous agents, creating MCP tools, implementing self-modifying systems, or building apps where features are outcomes achieved by agents operating in a loop.
🇯🇵 Notes for Japanese creators
A skill for building applications where autonomous agents are the central actors, achieving features by operating in a loop.
※ This commentary was added by the jpskill.com editorial team for Japanese business users. It is reference material, independent of the skill's actual behavior.
⚠️ Download and use at your own risk. This site accepts no responsibility for the skill's content, behavior, or safety.
🎯 What this skill can do
The description below explains what this skill does for you. It activates automatically when you ask Claude for help in this area.
📦 Installation (3 steps)
- 1. Click the "Download" button above to get the .skill file
- 2. Rename the extension from .skill to .zip and extract it (macOS can extract automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you're done. You don't need to say "use this skill" — it is invoked automatically for relevant requests.
See the detailed usage guide →
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
📜 Original SKILL.md (the English text Claude actually reads)
<why_now>
Why Now
Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: a really good coding agent is actually a really good general-purpose agent. The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding. </why_now>
<core_principles>
Core Principles
1. Parity
Whatever the user can do through the UI, the agent should be able to achieve through tools.
This is the foundational principle. Without it, nothing else matters.
Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."
If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.
The fix: Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.
This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can achieve the same outcomes. Sometimes that's a single tool (create_note). Sometimes it's composing primitives (write_file to a notes directory with proper formatting).
The discipline: When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How Agent Achieves It |
|---|---|
| Create a note | write_file to notes directory, or create_note tool |
| Tag a note as urgent | update_file metadata, or tag_note tool |
| Search notes | search_files or search_notes tool |
| Delete a note | delete_file or delete_note tool |
The test: Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
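The parity test can be automated. A minimal sketch (tool and action names are hypothetical) that flags UI actions the agent cannot achieve with any registered tool:

```typescript
// Hypothetical set of tools registered with the agent.
const agentTools = new Set(["write_file", "update_file", "search_files", "delete_file"]);

// Capability map: each UI action lists the tool(s) that can achieve the same outcome.
const capabilityMap: Record<string, string[]> = {
  "create a note": ["write_file"], // or a create_note domain tool
  "tag a note as urgent": ["update_file"],
  "search notes": ["search_files"],
  "delete a note": ["delete_file"],
  "share a note": ["share_note"], // UI action with no agent tool yet
};

// Return the UI actions the agent cannot achieve with any available tool.
function parityGaps(map: Record<string, string[]>, tools: Set<string>): string[] {
  return Object.entries(map)
    .filter(([, toolNames]) => !toolNames.some((t) => tools.has(t)))
    .map(([action]) => action);
}

console.log(parityGaps(capabilityMap, agentTools)); // ["share a note"]
```

Running a check like this in CI keeps the capability map honest as the UI grows.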
2. Granularity
Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.
A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.
A feature is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.
Less granular (limits the agent):
```
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
```
More granular (empowers the agent):
```
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
        determine appropriate locations based on content and recency,
        and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
       checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
```
The key shift: The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
The test: To change how a feature behaves, do you edit prose or refactor code?
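The loop itself is simple to sketch. In the snippet below, `decideNextAction` stands in for the model's judgment — here it is scripted so the example is self-contained, but in a real system each call would be a model turn:

```typescript
type Action = { tool: string; args: Record<string, string> };

// Scripted stand-in for the model's decisions (illustrative only).
const script: Action[] = [
  { tool: "list_directory", args: { path: "downloads" } },
  { tool: "move_file", args: { from: "downloads/report.pdf", to: "documents/report.pdf" } },
  { tool: "complete_task", args: { summary: "Organized 1 file" } },
];

let step = 0;
const decideNextAction = (): Action => script[step++]; // a model call in a real system

const executed: string[] = [];
function runLoop(maxTurns = 10): string {
  for (let turn = 0; turn < maxTurns; turn++) {
    const action = decideNextAction();
    // The loop ends on an explicit signal, not on heuristics.
    if (action.tool === "complete_task") return action.args.summary;
    executed.push(action.tool); // a real system would execute the primitive here
  }
  throw new Error("agent did not signal completion within maxTurns");
}

console.log(runLoop()); // "Organized 1 file"
```

The shape to notice: the outcome lives in the prompt, the loop just dispatches primitives until the agent says it is done.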
3. Composability
With atomic tools and parity, you can create new features just by writing new prompts.
This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.
Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
The agent uses list_files, read_file, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.
This works for developers and users. You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.
The constraint: This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.
The test: Can you add a new feature by writing a new prompt section, without adding new code?
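One way to make "features are prompts" concrete is a prompt registry. A sketch (feature names and wording are illustrative) where shipping or customizing a feature means adding an entry, not writing code:

```typescript
// Features defined as outcome descriptions, not functions.
const features: Record<string, string> = {
  weeklyReview:
    "Review files modified this week. Summarize key changes and suggest three priorities.",
  fileThis:
    "When the user says 'file this', move the item to the Action folder and tag it urgent.",
};

// "Adding a feature" at runtime -- e.g. loaded from a user-level prompt file.
features["inboxZero"] = "Archive every read note older than 30 days.";

// The runner hands the selected prompt to the agent loop unchanged.
function featurePrompt(name: string): string {
  const p = features[name];
  if (!p) throw new Error(`no feature named ${name}`);
  return p;
}
```

Developer-shipped and user-written prompts can live in the same registry, which is what makes the user-level customization described above possible.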
4. Emergent Capability
The agent can accomplish things you didn't explicitly design for.
When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.
"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."
You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.
This reveals latent demand. Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
The flywheel:
- Build with atomic tools and parity
- Users ask for things you didn't anticipate
- Agent composes tools to accomplish them (or fails, revealing a gap)
- You observe patterns in what's being requested
- Add domain tools or prompts to make common patterns efficient
- Repeat
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
The test: Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.
5. Improvement Over Time
Agent-native applications get better through accumulated context and prompt refinement.
Unlike traditional software, agent-native applications can improve without shipping code:
Accumulated context: The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A context.md file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.
Prompt refinement at multiple levels:
- Developer level: You ship updated prompts that change agent behavior for all users
- User level: Users customize prompts for their workflow
- Agent level: The agent modifies its own prompts based on feedback (advanced)
Self-modification (advanced): Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.
The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.
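The layer-one version of accumulated context can be sketched in a few lines — a context.md file the agent appends to as it learns and re-reads at startup (file location and entry format are illustrative):

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// A context.md file the agent reads and updates across sessions.
const contextPath = path.join(os.tmpdir(), "context.md");
fs.writeFileSync(contextPath, "# Context\n");

// Called when the agent learns something worth keeping.
function remember(fact: string): void {
  fs.appendFileSync(contextPath, `- ${fact}\n`);
}

// Called at session start to rebuild accumulated knowledge.
function recall(): string[] {
  return fs
    .readFileSync(contextPath, "utf8")
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}

remember("User prefers notes tagged by project, not by date");
remember("'file this' means move to Action folder");
console.log(recall().length); // 2
```

Structured memory and learned preferences are refinements of this same read/append cycle.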
The test: Does the application work better after a month of use than on day one, even without code changes? </core_principles>
<intake>
What aspect of agent-native architecture do you need help with?
- Design architecture - Plan a new agent-native system from scratch
- Files & workspace - Use files as the universal interface, shared workspace patterns
- Tool design - Build primitive tools, dynamic capability discovery, CRUD completeness
- Domain tools - Know when to add domain tools vs stay with primitives
- Execution patterns - Completion signals, partial completion, context limits
- System prompts - Define agent behavior in prompts, judgment criteria
- Context injection - Inject runtime app state into agent prompts
- Action parity - Ensure agents can do everything users can do
- Self-modification - Enable agents to safely evolve themselves
- Product design - Progressive disclosure, latent demand, approval patterns
- Mobile patterns - iOS storage, background execution, checkpoint/resume
- Testing - Test agent-native apps for capability and parity
- Refactoring - Make existing code more agent-native
Wait for response before proceeding. </intake>
<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read architecture-patterns.md, then apply Architecture Checklist below |
| 2, "files", "workspace", "filesystem" | Read files-universal-interface.md and shared-workspace-architecture.md |
| 3, "tool", "mcp", "primitive", "crud" | Read mcp-tool-design.md |
| 4, "domain tool", "when to add" | Read from-primitives-to-domain-tools.md |
| 5, "execution", "completion", "loop" | Read agent-execution-patterns.md |
| 6, "prompt", "system prompt", "behavior" | Read system-prompt-design.md |
| 7, "context", "inject", "runtime", "dynamic" | Read dynamic-context-injection.md |
| 8, "parity", "ui action", "capability map" | Read action-parity-discipline.md |
| 9, "self-modify", "evolve", "git" | Read self-modification.md |
| 10, "product", "progressive", "approval", "latent demand" | Read product-implications.md |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read mobile-patterns.md |
| 12, "test", "testing", "verify", "validate" | Read agent-native-testing.md |
| 13, "review", "refactor", "existing" | Read refactoring-to-prompt-native.md |
After reading the reference, apply those patterns to the user's specific context. </routing>
<architecture_checklist>
Architecture Review Checklist
When designing an agent-native system, verify these before implementation:
Core Principles
- [ ] Parity: Every UI action has a corresponding agent capability
- [ ] Granularity: Tools are primitives; features are prompt-defined outcomes
- [ ] Composability: New features can be added via prompts alone
- [ ] Emergent Capability: Agent can handle open-ended requests in your domain
Tool Design
- [ ] Dynamic vs Static: For external APIs where agent should have full access, use Dynamic Capability Discovery
- [ ] CRUD Completeness: Every entity has create, read, update, AND delete
- [ ] Primitives not Workflows: Tools enable capability, don't encode business logic
- [ ] API as Validator: Use z.string() inputs when the API validates, not z.enum()
Files & Workspace
- [ ] Shared Workspace: Agent and user work in same data space
- [ ] context.md Pattern: Agent reads/updates context file for accumulated knowledge
- [ ] File Organization: Entity-scoped directories with consistent naming
Agent Execution
- [ ] Completion Signals: Agent has an explicit complete_task tool (not heuristic detection)
- [ ] Partial Completion: Multi-step tasks track progress for resume
- [ ] Context Limits: Designed for bounded context from the start
Context Injection
- [ ] Available Resources: System prompt includes what exists (files, data, types)
- [ ] Available Capabilities: System prompt documents tools with user vocabulary
- [ ] Dynamic Context: Context refreshes for long sessions (or provide a refresh_context tool)
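The Context Injection items above can be sketched as a prompt builder that folds runtime app state into the system prompt (the state shape and wording are illustrative):

```typescript
// Runtime app state the agent should know about.
interface AppState {
  noteFiles: string[];
  tags: string[];
}

// Build the system prompt from a static base plus current resources.
function buildSystemPrompt(base: string, state: AppState): string {
  return [
    base,
    "## Available resources",
    `Notes: ${state.noteFiles.join(", ") || "(none)"}`,
    `Tags in use: ${state.tags.join(", ") || "(none)"}`,
  ].join("\n");
}

const prompt = buildSystemPrompt("You manage the user's notes.", {
  noteFiles: ["meeting.md", "ideas.md"],
  tags: ["urgent"],
});
```

Re-running this builder (or exposing it behind a refresh tool) is what keeps long sessions from going stale.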
UI Integration
- [ ] Agent → UI: Agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] No Silent Actions: Agent writes trigger UI updates immediately
- [ ] Capability Discovery: Users can learn what agent can do
Mobile (if applicable)
- [ ] Checkpoint/Resume: Handle iOS app suspension gracefully
- [ ] iCloud Storage: iCloud-first with local fallback for multi-device sync
- [ ] Cost Awareness: Model tier selection (Haiku/Sonnet/Opus)
When designing architecture, explicitly address each checkbox in your plan. </architecture_checklist>
<quick_start>
Quick Start: Build an Agent-Native Feature
Step 1: Define atomic tools
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
```
Step 2: Write behavior in the system prompt
```markdown
## Your Responsibilities
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done

You decide the structure. Make it good.
```
Step 3: Let the agent work in a loop
```typescript
const result = await agent.run({
  prompt: userMessage,
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
</quick_start>
<reference_index>
Reference Files
All references in references/:
Core Patterns:
- architecture-patterns.md - Event-driven, unified orchestrator, agent-to-UI
- files-universal-interface.md - Why files, organization patterns, context.md
- mcp-tool-design.md - Tool design, dynamic capability discovery, CRUD
- from-primitives-to-domain-tools.md - When to add domain tools, graduating to code
- agent-execution-patterns.md - Completion signals, partial completion, context limits
- system-prompt-design.md - Features as prompts, judgment criteria
Agent-Native Disciplines:
- dynamic-context-injection.md - Runtime context, what to inject
- action-parity-discipline.md - Capability mapping, parity workflow
- shared-workspace-architecture.md - Shared data space, UI integration
- product-implications.md - Progressive disclosure, latent demand, approval
- agent-native-testing.md - Testing outcomes, parity tests
Platform-Specific:
- mobile-patterns.md - iOS storage, checkpoint/resume, cost awareness
- self-modification.md - Git-based evolution, guardrails
- refactoring-to-prompt-native.md - Migrating existing code </reference_index>
<anti_patterns>
Anti-Patterns
Common Approaches That Aren't Fully Agent-Native
These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.
Agent as router — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.
Build the app, then add agent — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.
Request/response thinking — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.
Defensive tool design — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.
Happy path in code, agent just executes — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.
Specific Anti-Patterns
THE CARDINAL SIN: Agent executes your code instead of figuring things out
```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message); // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority); // Your code orchestrates
  if (priority > 3) await notify(); // Your code decides
});
```

```
// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```
Workflow-shaped tools — analyze_and_organize bundles judgment into the tool. Break it into primitives and let the agent compose them.
Context starvation — Agent doesn't know what resources exist in the app.
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
Fix: Inject available resources, capabilities, and vocabulary into system prompt.
Orphan UI actions — User can do something through the UI that the agent can't achieve. Fix: maintain parity.
Silent actions — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.
Heuristic completion detection — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a complete_task tool.
Static tool mapping for dynamic APIs — Building 50 tools for 50 API endpoints when a discover + access pattern would give more flexibility.
```typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required

// RIGHT - Dynamic capability discovery
tool("list_available_types", ...) // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...) // Access any type
```
Incomplete CRUD — Agent can create but not update or delete.
```typescript
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...) // Missing: update, delete
```
Fix: Every entity needs full CRUD.
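A CRUD audit over the tool list catches this gap mechanically. A sketch assuming a verb_entity naming convention (tool names here are illustrative):

```typescript
const toolNames = [
  "create_journal_entry", "read_journal_entry", // update/delete missing
  "create_note", "read_note", "update_note", "delete_note",
];

const verbs = ["create", "read", "update", "delete"];

// Map each entity to the CRUD verbs it lacks.
function missingCrud(tools: string[]): Record<string, string[]> {
  const entities = new Set(tools.map((t) => t.replace(/^(create|read|update|delete)_/, "")));
  const gaps: Record<string, string[]> = {};
  for (const entity of Array.from(entities)) {
    const missing = verbs.filter((v) => !tools.includes(`${v}_${entity}`));
    if (missing.length > 0) gaps[entity] = missing;
  }
  return gaps;
}

console.log(missingCrud(toolNames)); // { journal_entry: ["update", "delete"] }
```

An empty result means every entity the agent can create, it can also read, update, and delete.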
Sandbox isolation — Agent works in separate data space from user.
```
Documents/
├── user_files/    ← User's space
└── agent_output/  ← Agent's space (isolated)
```
Fix: Use shared workspace where both operate on same files.
Gates without reason — Domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.
Artificial capability limits — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do. </anti_patterns>
<success_criteria>
Success Criteria
You've built an agent-native application when:
Architecture
- [ ] The agent can achieve anything users can achieve through the UI (parity)
- [ ] Tools are atomic primitives; domain tools are shortcuts, not gates (granularity)
- [ ] New features can be added by writing new prompts (composability)
- [ ] The agent can accomplish tasks you didn't explicitly design for (emergent capability)
- [ ] Changing behavior means editing prompts, not refactoring code
Implementation
- [ ] System prompt includes dynamic context about app state
- [ ] Every UI action has a corresponding agent tool (action parity)
- [ ] Agent tools are documented in system prompt with user vocabulary
- [ ] Agent and user work in the same data space (shared workspace)
- [ ] Agent actions are immediately reflected in the UI
- [ ] Every entity has full CRUD (Create, Read, Update, Delete)
- [ ] Agents explicitly signal completion (no heuristic detection)
- [ ] context.md or equivalent for accumulated knowledge
Product
- [ ] Simple requests work immediately with no learning curve
- [ ] Power users can push the system in unexpected directions
- [ ] You're learning what users want by observing what they ask the agent to do
- [ ] Approval requirements match stakes and reversibility
Mobile (if applicable)
- [ ] Checkpoint/resume handles app interruption
- [ ] iCloud-first storage with local fallback
- [ ] Background execution uses available time wisely
- [ ] Model tier matched to task complexity
The Ultimate Test
Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.
Can it figure out how to accomplish it, operating in a loop until it succeeds?
If yes, you've built something agent-native.
If it says "I don't have a feature for that"—your architecture is still too constrained. </success_criteria>