WISE Agent Platform MVP + Frontier 4-Pillar Architecture

Shipped WISE Agent Platform MVP — registry, executor, CLI — then restructured agency/ to mirror Frontier's 4-pillar architecture. 59 tests green.

2026-02-06 By Sean Weldon

Atlas Development Log — WISE Agent Platform MVP + Frontier Architecture

Overview

Two-phase session. The first phase built the WISE Agent Platform Phase 1 MVP in agency/ with a YAML-backed agent registry, a Claude Agent SDK executor, an MCP tool factory, and a Typer+Rich CLI. The second phase restructured the entire directory to mirror OpenAI Frontier's 4-pillar architecture: governance, execution, context, and evaluation. The sow-generator is registered as the first managed agent, demonstrating the Fathom_to_SOW migration path.

1. Objectives

Implement the WISE Agent Platform Phase 1 MVP from the approved plan
Register sow-generator as the first managed agent
Verify CLI commands work end-to-end
Restructure the entire agency/ package to match Frontier's 4-pillar architecture

Success looks like: All tests passing, CLI commands functional (wise-agents list/info/status), and architecture mapping cleanly to Frontier's pillars.

2. Key Developments

Phase 1 — MVP Build:

Created Pydantic v2 identity models (AgentIdentity, AgentPermissions, AgentExecution)
Built YAML-backed AgentRegistry with auto-discovery from config/agents/
Implemented AgentExecutor wrapping Claude Agent SDK's ClaudeSDKClient
Added ToolRegistry factory pattern for MCP server creation
Built 5-command CLI: list, info, run, register, status
25 tests passing

Phase 2 — Frontier Restructure:

Rewrote all code into a 4-pillar directory structure
Added PermissionGuard — pre-execution governance checks (active status, tool allowlists, cost limits)
Added AuditLog — automatic audit entries for every executor lifecycle event
Added ExecutionSandbox — per-run artifact directory isolation
Added ContextService — 3-layer context assembly (standards, product, task) with connector protocol
Added FileConnector implementing @runtime_checkable Connector protocol
Added RunLog with per-agent stats (success rate, avg cost, avg duration)
Added Scorer with 3 dimensions: completion, cost efficiency, tool efficiency
Added AgentMemory — per-agent key-value store
Expanded test suite from 25 to 59 tests across 5 files
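
The connector protocol and 3-layer context assembly listed above can be sketched roughly as follows. This is a minimal illustration, not the code in agency/context/: the `fetch` method name, the `standards.md` and `product.md` file names, and the `assemble_context` helper are all assumptions made for the example.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class Connector(Protocol):
    """Anything that can fetch a piece of context by key (hypothetical interface)."""

    def fetch(self, key: str) -> str: ...


class FileConnector:
    """Reads context from text files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def fetch(self, key: str) -> str:
        path = self.root / key
        # Missing files yield an empty layer instead of an error.
        return path.read_text(encoding="utf-8") if path.is_file() else ""


def assemble_context(connector: Connector, task: str) -> str:
    """Join the three layers in order: standards, product, task."""
    layers = [
        connector.fetch("standards.md"),  # assumed file name
        connector.fetch("product.md"),    # assumed file name
        task,
    ]
    return "\n\n".join(layer for layer in layers if layer)
```

Because the protocol is decorated with `@runtime_checkable`, `isinstance(obj, Connector)` succeeds for any object that has a `fetch` attribute, without inheritance. Note that this check only looks at attribute presence, not at the method signature.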

3. Design Decisions

4-Pillar Directory Structure

Decision: Map agency/ 1:1 to Frontier's architecture — governance/, execution/, context/, evaluation/
Rationale: Mirrors a well-known reference architecture; makes the codebase self-documenting
Alternative considered: Keep the flat registry/runtime/tools layout
Trade-off: More directories but clearer separation of concerns

Tools Nested Under Execution

Decision: Move tools/ from top-level to execution/tools/
Rationale: Tools are an execution concern, not a standalone pillar
Trade-off: Slightly deeper import paths but architecturally correct

PermissionGuard as Separate Class

Decision: Extract permission logic into governance/permissions.py rather than embedding in executor
Rationale: Clean separation — governance defines policy, execution enforces it
Alternative considered: Permission checks inline in executor
Trade-off: Extra module but testable and replaceable independently
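
To illustrate why a separate guard is easy to test in isolation, here is a minimal sketch of the three pre-execution checks (active status, tool allowlist, cost limit). It uses plain dataclasses to stay dependency-free, whereas the log's models are Pydantic v2. The field names, the `PermissionDenied` exception, and the `check` signature are assumptions, not the actual contents of governance/permissions.py.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPermissions:
    # Field names are assumptions chosen to match the three checks in the log.
    active: bool = True
    allowed_tools: set[str] = field(default_factory=set)
    max_cost_usd: float = 1.0


class PermissionDenied(Exception):
    """Raised when a pre-execution governance check fails (hypothetical)."""


class PermissionGuard:
    """Policy lives here; an executor would only call check() before a run."""

    def check(self, perms: AgentPermissions, tools: list[str], est_cost_usd: float) -> None:
        if not perms.active:
            raise PermissionDenied("agent is not active")
        blocked = sorted(set(tools) - perms.allowed_tools)
        if blocked:
            raise PermissionDenied(f"tools not on allowlist: {blocked}")
        if est_cost_usd > perms.max_cost_usd:
            raise PermissionDenied("estimated cost exceeds limit")
```

With this shape, the guard can be unit-tested without constructing an executor or touching the SDK, and a stricter policy can be swapped in by replacing one class.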

Typed AgentContext Model

Decision: Replace context: dict[str, Any] with a Pydantic AgentContext model
Rationale: Typed fields (connectors, system_prompt_path, knowledge_base) enable YAML validation
Alternative considered: Keep generic dict for flexibility
Trade-off: Slightly more rigid but catches config errors at load time
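
The load-time benefit can be shown with a small stand-in. The sketch below uses a dataclass with explicit validation instead of Pydantic, so it runs without third-party packages. The three field names come from the decision above, while their types, defaults, and the `from_dict` helper are assumptions. Rejecting unknown keys here mimics what a Pydantic model does when configured with `extra="forbid"`.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class AgentContext:
    # Field names come from the log; the types and defaults are assumptions.
    connectors: list[str] = field(default_factory=list)
    system_prompt_path: Optional[str] = None
    knowledge_base: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "AgentContext":
        """Build a context from parsed YAML, failing fast on bad config."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise ValueError(f"unknown context keys: {unknown}")
        if not isinstance(raw.get("connectors", []), list):
            raise ValueError("connectors must be a list")
        return cls(**raw)
```

A misspelled key such as `system_promt_path` would pass silently through a `dict[str, Any]`, but raises `ValueError` here as soon as the config is loaded.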

In-Memory Stores for Phase 1

Decision: RunLog, AuditLog, and AgentMemory are all in-memory
Rationale: SQLite persistence is planned for Phase 3; avoids premature complexity
Alternative considered: SQLite from day one
Trade-off: Data lost on restart, but appropriate for MVP validation
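
As a rough picture of what an in-memory store with per-agent stats involves, consider the sketch below. It is an assumption-laden illustration, not the code in evaluation/run_log.py: `RunRecord`, its fields, and the keys returned by `stats` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    # Hypothetical record shape; the real RunLog may track more fields.
    agent: str
    success: bool
    cost_usd: float
    duration_s: float


class RunLog:
    """Keeps run history in a plain list, so everything is lost on restart."""

    def __init__(self) -> None:
        self._runs: list[RunRecord] = []

    def record(self, run: RunRecord) -> None:
        self._runs.append(run)

    def stats(self, agent: str) -> dict[str, float]:
        runs = [r for r in self._runs if r.agent == agent]
        if not runs:
            # Avoid dividing by zero for agents that have never run.
            return {"runs": 0, "success_rate": 0.0, "avg_cost_usd": 0.0, "avg_duration_s": 0.0}
        return {
            "runs": len(runs),
            "success_rate": sum(r.success for r in runs) / len(runs),
            "avg_cost_usd": mean(r.cost_usd for r in runs),
            "avg_duration_s": mean(r.duration_s for r in runs),
        }
```

Keeping the storage behind `record` and `stats` means a later SQLite-backed version could replace the list without changing callers.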

4. Challenges & Solutions

setuptools Build Backend Error

Problem: The build failed because the configured backend, setuptools.backends._legacy:_Backend, doesn't exist
Root cause: Incorrect build-backend string in pyproject.toml
Solution: Changed build-backend to the standard setuptools.build_meta

Package Discovery for Flat Layout

Problem: agency/ is both the Python package and the directory that contains pyproject.toml
Root cause: With include = ["agency*"], setuptools searched inside the project root and so looked for agency/agency/
Solution: Explicit package-dir mapping with agency = "." and find = {where = [".."], include = ["agency", "agency.*"]}

Executor Not Catching Missing Agents

Problem: registry.get() raised KeyError outside the try/except block
Root cause: The lookup was before the error-handling scope
Solution: Moved all executor code into a single try/except, returning failed RunResult for missing agents
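
The shape of this fix can be sketched as below. The `RunResult` fields, the dict-based registry, and the `run_agent` function are simplified stand-ins invented for illustration. The real executor wraps the Claude Agent SDK, which is replaced here by a placeholder string.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical result shape for the example.
    agent: str
    success: bool
    output: str = ""
    error: str = ""


class AgentRegistry:
    """Stand-in registry backed by a dict instead of YAML files."""

    def __init__(self, agents: dict[str, dict]) -> None:
        self._agents = agents

    def get(self, name: str) -> dict:
        return self._agents[name]  # raises KeyError for unknown agents


def run_agent(registry: AgentRegistry, name: str, prompt: str) -> RunResult:
    try:
        # The lookup sits inside the try block, so a missing agent
        # becomes a failed RunResult rather than an uncaught KeyError.
        agent = registry.get(name)
        role = agent.get("role", "agent")
        output = f"{role} handled: {prompt}"  # placeholder for the real SDK call
        return RunResult(agent=name, success=True, output=output)
    except KeyError:
        return RunResult(agent=name, success=False, error=f"unknown agent: {name}")
    except Exception as exc:
        return RunResult(agent=name, success=False, error=str(exc))
```

Callers such as a CLI command can then branch on `result.success` instead of wrapping every call in their own error handling.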

Phase 2 Import Rewiring

Problem: All imports broke after restructure (e.g., from ..registry.models no longer valid)
Root cause: Moving files to new pillar directories changed all import paths
Solution: Wrote all new files from scratch with correct import paths rather than patching the old imports with sed or an automated refactor

5. Code Changes

File	Change
`agency/pyproject.toml`	Package config with setuptools, package-dir mapping, CLI entry point
`agency/__init__.py`	4-pillar docstring, version
`agency/governance/identity.py`	AgentIdentity, AgentPermissions, AgentExecution, AgentContext models
`agency/governance/registry.py`	YAML-backed agent catalog with auto-discovery
`agency/governance/permissions.py`	PermissionGuard with pre-execution checks
`agency/governance/audit.py`	In-memory audit trail with filtering
`agency/execution/executor.py`	Agent lifecycle with permission + audit integration
`agency/execution/events.py`	Structured event dataclasses
`agency/execution/sandbox.py`	Per-run artifact directory isolation
`agency/execution/tools/registry.py`	MCP server factory registry
`agency/context/service.py`	3-layer context assembly service
`agency/context/connectors/base.py`	Connector protocol + FileConnector
`agency/evaluation/run_log.py`	Run history with per-agent stats
`agency/evaluation/scorer.py`	Multi-dimensional run scoring
`agency/evaluation/memory.py`	Per-agent key-value memory store
`agency/cli.py`	5-command Typer+Rich CLI
`agency/config/agents/sow-generator.yaml`	First registered agent config
`agency/tests/`	59 tests across 5 test files

6. Next Steps

Phase 2: Build Obsidian and Fathom connectors for the context service
Phase 2: Wire context service into executor for automatic prompt enrichment
Phase 3: Add SQLite persistence to RunLog, AuditLog, and AgentMemory
Phase 3: Build evaluation dashboard CLI command
Phase 4: Multi-agent orchestration with YAML workflow definitions
Phase 5: Migrate Fathom_to_SOW fully into the agency runtime

7. Session Notes

OpenAI launched Frontier on Feb 5, 2026 — an enterprise agent platform with employee-like identities, shared context, and scoped permissions. This session was a response: building an internal Anthropic-powered equivalent that maps to the same 4-pillar architecture but runs on Claude Agent SDK instead of OpenAI's closed ecosystem.

The key insight is that Frontier's architecture is sound but its value proposition (identity, governance, context, evaluation) can be replicated with open tools. The sow-generator agent config proves the pattern works — a single YAML file defines who the agent is, what tools it can use, what data it can access, and how much it's allowed to spend.

Architecture mapping:

Governance = Frontier's IAM/identity layer
Execution = Frontier's managed runtime
Context = Frontier's semantic/business context layer
Evaluation = Frontier's metrics and memory system

59 tests in ~10 seconds. All CLI commands verified working. Ready for Phase 2.