Agentic AI

Multi-Agent Architecture for Intelligent Calendar Scheduling

Designing a 4-agent system from scratch: translating conversational flows into clean agent boundaries, typed delegation patterns, and state management for complex conversational AI

Role

Product Designer & AI Engineer

Tools

PydanticAI, System Architecture Design, Python, Google Calendar API

Context

Technical architecture designed and implemented as part of the founding team's first product iteration toward pilot launch

Problem

From Conversation Design to Technical Architecture

After designing 6 conversational flows for the AI health coach (see Case Study #1), I first implemented the system as a single agent, with all the logic in one 157-line system prompt.

The conversation flows required:

  • Calendar intelligence: Analyze schedules, detect conflicts, suggest optimal times
  • Multi-step reasoning: Fetch events → Suggest breaks → Handle conflicts → Execute
  • Conversational state: Remember pending suggestions, track modifications, clear on completion
  • Complex business logic: 15-minute buffers, timezones, working hours, time-of-day constraints
  • Graceful error handling: API failures, ambiguous input, edge cases

Testing the monolithic implementation

Within hours of testing, the single-agent approach revealed critical problems:

  • Instruction adherence issues: Agent "forgot" buffer rules and time constraints mid-conversation
  • Unmaintainable prompt: 157 lines mixing conversation flow, API rules, and parsing logic
  • Impossible to test: Couldn't isolate calendar logic from conversation logic
  • Fragile changes: Fixing conflict detection broke confirmation flow
  • No separation of concerns: Everything coupled together
"How do you architect an AI system that separates concerns cleanly (calendar logic ≠ conversation logic), is testable at multiple levels, and enables rapid iteration without breaking existing functionality?"

The redesign challenge

Multi-Agent Architecture - 4-agent system showing Orchestrator, Schedule Intelligence Agent, Conflict Checker, and Calendar Ops with their connections and context flows

Approach: Multi-Agent Architecture

Phase 1: Capability Mapping, Conversational States & Tool Registration Pattern

I mapped all action nodes and state transitions across the 6 conversation flows designed in Case Study #1. Each node type revealed a distinct capability that should have been separated:

  • Decision nodes (intent classification, flow routing) → Conversation management
  • Action nodes (fetch calendar, suggest times) → Calendar intelligence
  • Validation nodes (check conflicts, enforce buffers) → Conflict resolution
  • Execution nodes (create events, handle APIs) → Calendar operations

This mapping revealed a natural 4-agent architecture, each with a single, testable responsibility.
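
A minimal sketch of what that decomposition could look like in PydanticAI; the model IDs and instruction strings here are illustrative placeholders, not the production prompts, and it assumes a release where Agent accepts instructions= (older releases use system_prompt=):

```python
from pydantic_ai import Agent

# One agent per capability surfaced by the flow mapping (placeholders, not production prompts)
orchestrator = Agent(            # decision nodes: intent classification, flow routing
    "openai:gpt-4o",
    instructions="Classify the user's intent and delegate to the specialist tools.",
)
schedule_intelligence = Agent(   # action nodes: fetch calendar, suggest times
    "openai:gpt-4o",
    instructions="Analyze the user's schedule and propose optimal break slots.",
)
conflict_checker = Agent(        # validation nodes: check conflicts, enforce buffers
    "openai:gpt-4o",
    instructions="Validate proposed slots against events, 15-minute buffers, and working hours.",
)
calendar_ops = Agent(            # execution nodes: create events, handle provider APIs
    "openai:gpt-4o",
    instructions="Create or modify calendar events through the provider API.",
)
```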

Conversational States (State Machine for Multi-Turn Flows)

Beyond agent boundaries, I extracted explicit conversational states from the flows. Each state represents where the conversation is in its lifecycle, what information is pending, and what user actions are valid:

State: INITIAL
  • Description: No pending suggestions; no conversation context; entry point for all new conversations
  • Triggers & context: New user message; pending_suggestions = None

State: AWAITING_CONFIRMATION
  • Description: Suggestions stored in pending_suggestions; waiting for user approval or modification; user can confirm, modify times, or cancel
  • → PROACTIVE variant: Schedule query or scheduling request without specific times; the system analyzed the schedule and suggests optimal times. Triggers: ScheduleWithSuggestions returned; suggestions ready, awaiting user decision
  • → REACTIVE variant: User requested specific times that had conflicts, so the system generates alternatives. Triggers: SuggestAlternatives returned; alternative suggestions ready, awaiting user decision

State: AWAITING_INPUT
  • Description: Missing required information (date, time, or duration); needs clarification from the user; maintains current context; no suggestions yet
  • Triggers & context: Ambiguous request or missing parameters; pending_suggestions = [unchanged]

State: EXECUTING
  • Description: Processing calendar operations; creating/modifying break events; user sees a loading/progress indicator
  • Triggers & context: User confirmed OR explicit request with no conflicts found; clears pending_suggestions before executing

State: COMPLETE
  • Description: Conversation finished successfully; all state cleared; ready for a new conversation
  • Triggers & context: Successful execution OR read-only query response; pending_suggestions = None

These states became the foundation for the output types in Phase 2, ensuring the type system enforces valid state transitions.
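
To make the lifecycle concrete, here is a minimal sketch of the state machine in Python; the allowed-transition table is my reconstruction from the triggers above, not a verbatim copy of the production code:

```python
from enum import Enum, auto

class ConversationState(Enum):
    INITIAL = auto()
    AWAITING_CONFIRMATION = auto()   # PROACTIVE or REACTIVE suggestions pending
    AWAITING_INPUT = auto()
    EXECUTING = auto()
    COMPLETE = auto()

# Transitions implied by the triggers above (reconstructed, simplified)
VALID_TRANSITIONS = {
    ConversationState.INITIAL: {
        ConversationState.AWAITING_CONFIRMATION,  # ScheduleWithSuggestions / SuggestAlternatives
        ConversationState.AWAITING_INPUT,         # NeedsClarification
        ConversationState.EXECUTING,              # explicit request, no conflicts found
        ConversationState.COMPLETE,               # read-only query answered
    },
    ConversationState.AWAITING_CONFIRMATION: {
        ConversationState.EXECUTING,              # user confirmed
        ConversationState.AWAITING_INPUT,         # user reply still ambiguous
        ConversationState.COMPLETE,               # user cancelled
    },
    ConversationState.AWAITING_INPUT: {
        ConversationState.AWAITING_CONFIRMATION,
        ConversationState.EXECUTING,
        ConversationState.COMPLETE,
    },
    ConversationState.EXECUTING: {ConversationState.COMPLETE},
    ConversationState.COMPLETE: set(),            # new conversations start back at INITIAL
}

def transition(current: ConversationState, nxt: ConversationState) -> ConversationState:
    """Guard that rejects transitions the flows never defined."""
    if nxt not in VALID_TRANSITIONS[current]:
        raise ValueError(f"Invalid transition: {current.name} -> {nxt.name}")
    return nxt
```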

Tool Registration Pattern: Using PydanticAI's agent-delegation pattern, I registered the sub-agents as tools on the Orchestrator. This creates type-safe communication with clear interface boundaries (see the sketch after this list):

  • Orchestrator delegates to sub-agents as function calls
  • Input/output contracts enforced by type system
  • Easy to mock for testing
  • No circular dependencies
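
A minimal sketch of that registration, following PydanticAI's delegation pattern of calling a sub-agent from inside a tool; the model IDs, prompts, and field names are illustrative, and it assumes a release where output_type and result.output are available (older releases use result_type / result.data):

```python
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

class BreakSuggestions(BaseModel):
    """Typed contract returned by the Schedule Intelligence sub-agent."""
    slots: list[str]      # e.g. ["10:00", "14:00", "16:00"]
    rationale: str

schedule_intelligence = Agent(
    "openai:gpt-4o",
    output_type=BreakSuggestions,
    instructions="Analyze the user's calendar and suggest optimal break times.",
)

orchestrator = Agent(
    "openai:gpt-4o",
    instructions="Coordinate break scheduling by delegating to the specialist tools.",
)

@orchestrator.tool
async def suggest_breaks(ctx: RunContext[None], target_date: str) -> BreakSuggestions:
    """Delegate schedule analysis to the Schedule Intelligence sub-agent."""
    result = await schedule_intelligence.run(
        f"Suggest breaks for {target_date}",
        usage=ctx.usage,  # share token accounting with the parent run
    )
    return result.output  # typed: the Orchestrator receives BreakSuggestions, not free text
```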

Phase 2: Output Types as Flow Control

The conversation flows had explicit state transitions (awaiting confirmation, executing, complete). I translated these into typed outputs that serve as contracts between agents:

  • ScheduleWithSuggestions – Show schedule + suggest breaks (awaiting confirmation)
  • ScheduleOnly – Breaks exist, no action needed
  • SuggestAlternatives – Conflicts found, present options
  • ExecuteScheduling – User confirmed, trigger execution
  • SchedulingComplete – Success, clear state
  • NeedsClarification – Missing info, request input

This pattern eliminates nested if/else logic by using the type system for flow control; pattern matching ensures every case is handled.
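
A minimal sketch of the routing layer, assuming Pydantic models for the output types; the field names and the session-state shape are illustrative, and only three of the six types are shown:

```python
from dataclasses import dataclass
from pydantic import BaseModel

# Three of the six output types, fields simplified for illustration
class ScheduleWithSuggestions(BaseModel):
    suggestions: list[str]
    summary: str

class ExecuteScheduling(BaseModel):
    confirmed_slots: list[str]

class NeedsClarification(BaseModel):
    question: str

@dataclass
class SessionState:
    pending_suggestions: list[str] | None = None

def route(output: ScheduleWithSuggestions | ExecuteScheduling | NeedsClarification,
          state: SessionState) -> str:
    """Map each typed output to its state change; no nested if/else chains."""
    match output:
        case ScheduleWithSuggestions(suggestions=slots):
            state.pending_suggestions = slots   # store suggestions
            return "AWAITING_CONFIRMATION"
        case ExecuteScheduling():
            state.pending_suggestions = None    # clear before calling Calendar Ops
            return "EXECUTING"
        case NeedsClarification():
            return "AWAITING_INPUT"             # keep current context unchanged
    raise TypeError(f"Unhandled output type: {type(output).__name__}")
```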

Interactive Flow: Type-Safe Routing Pattern

[Interactive diagram] A user message ("Schedule breaks today") reaches the Coordinator, which checks the current state (pending_suggestions = None), builds context (user preferences, pending_suggestions, conversation history), and runs the Orchestrator. The Orchestrator calls the sub-agents internally (Schedule Intelligence, Conflict Checker, Calendar Ops) and returns one of the six output types, which pattern matching routes to its state change:

  • ScheduleOnly (read-only query) → no state change, pending_suggestions stays None → COMPLETE
  • ScheduleWithSuggestions (PROACTIVE path) → store pending_suggestions (e.g. 10:00, 14:00, 16:00) → AWAITING_CONFIRMATION
  • SuggestAlternatives (REACTIVE path, conflicts resolved) → store alternative pending_suggestions (e.g. 14:45, 15:30) → AWAITING_CONFIRMATION
  • ExecuteScheduling → clear pending_suggestions, then call Calendar Ops → EXECUTING
  • SchedulingComplete → clear pending_suggestions → COMPLETE
  • NeedsClarification → maintain current state, pending_suggestions unchanged → AWAITING_INPUT

The user then sees the response.

Phase 3: Context & State Management Design

Each agent receives only the context it needs. State is managed explicitly to prevent the "forgetting" bugs from the monolithic version:

Orchestrator (Dynamic Instructions)
  • Includes: pending suggestions, conversation flow state, RAG-retrieved motivation techniques
  • State management: pending_suggestions cleared when ExecuteScheduling or SchedulingComplete is returned

Schedule Intelligence (Dynamic System Prompt)
  • Includes: user preferences (working hours), current datetime, target date, timezone
  • State management: user preferences persist across sessions; historical break patterns (last 2 weeks)

Conflict Checker (Static Instructions)
  • Includes: conflict rules (15-minute buffer), working hours, buffer calculation logic
  • State management: no persistent state

Calendar Ops (Dynamic System Prompt)
  • Includes: selected calendar provider (Google/Outlook), calendar ID, timezone, event formatting rules
  • State management: provider selection persists across sessions

Conversation History: Compacted (not cleared) to prevent unbounded growth while maintaining context (see the sketch after this list):

  • Last 5 turns preserved in full
  • Older turns compressed into a summary; preferences extracted and persisted
  • PydanticAI enables history compaction
  • Result: 800 tokens average vs. 2500+ for full history
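
A minimal sketch of the compaction step, assuming it runs over PydanticAI's message history before the next call; summarize_messages() is a hypothetical stand-in for whatever summarization and preference extraction the real system performs:

```python
from pydantic_ai.messages import ModelMessage, ModelRequest, UserPromptPart

def summarize_messages(messages: list[ModelMessage]) -> str:
    """Hypothetical stand-in: the real system extracts preferences and summarizes older turns."""
    return f"{len(messages)} earlier messages summarized (goals, confirmed preferences, outcomes)."

def compact_history(history: list[ModelMessage], keep_turns: int = 5) -> list[ModelMessage]:
    """Keep the last few turns verbatim; fold everything older into one summary message."""
    keep = keep_turns * 2                      # roughly one request + one response per turn
    if len(history) <= keep:
        return history
    older, recent = history[:-keep], history[-keep:]
    summary = ModelRequest(parts=[UserPromptPart(content=f"Conversation so far: {summarize_messages(older)}")])
    return [summary, *recent]

# Fed back in on the next turn so the context window stays lean:
# result = await orchestrator.run(user_message, message_history=compact_history(previous_messages))
```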

Phase 4: Dynamic Instructions Strategy

I identified an opportunity to reduce prompt size and improve response time through dynamic instruction generation.

The Problem: The initial implementation sent full system prompts to all agents on every call. Over a 5-turn conversation, that meant 3,750 tokens of repeated instructions.

The Solution: Generate agent instructions programmatically based on current context instead of static prompts. Instructions include only what's relevant to the current request.

Example - Orchestrator (see the sketch after this breakdown):

  • Base instructions (role, output types): 60 tokens
  • Current time context (dynamically injected): 15 tokens
  • Pending suggestions context (only if present): 20 tokens
  • Motivation technique (RAG-retrieved): 30 tokens
  • Total: 125 tokens vs. a 400-token static prompt
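
A minimal sketch of how those fragments could be assembled, assuming a PydanticAI release with the @agent.instructions decorator (older releases express the same idea via @agent.system_prompt); the deps fields are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime
from pydantic_ai import Agent, RunContext

@dataclass
class CoachDeps:
    pending_suggestions: list[str] | None  # only injected when a decision is pending
    motivation_technique: str | None       # RAG-retrieved snippet, if any

orchestrator = Agent("openai:gpt-4o", deps_type=CoachDeps)

@orchestrator.instructions
def build_instructions(ctx: RunContext[CoachDeps]) -> str:
    """Assemble only the fragments relevant to this request instead of one static prompt."""
    parts = [
        "You coordinate break scheduling and reply with exactly one of the typed outputs.",  # base role
        f"Current time: {datetime.now().isoformat(timespec='minutes')}.",                    # time context
    ]
    if ctx.deps.pending_suggestions:
        parts.append(f"Suggestions awaiting confirmation: {ctx.deps.pending_suggestions}.")
    if ctx.deps.motivation_technique:
        parts.append(f"Motivation technique to weave in: {ctx.deps.motivation_technique}")
    return "\n".join(parts)
```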

Benefits:

  • 66% reduction in average prompt size
  • Response time: 0.6s average vs 1.2s with static prompts
  • ~500 tokens saved per conversation
  • Instructions live in code next to logic, not separate prompt files
  • Easier to maintain and update

Outcome

The refactored multi-agent system solved all problems from the monolithic implementation:

  • 4 focused agents: 20-80 line prompts each vs. the 157-line monolith; independently maintainable and scalable
  • 0.6s response time: multi-turn conversation efficiency through dynamic instructions, context compaction (800 vs. 2500+ tokens), and explicit state management
  • 98% multi-turn accuracy: explicit state management and pattern matching vs. the monolithic version that "forgot" context mid-conversation
  • Full observability: agent encapsulation enabled targeted logging and metrics per agent vs. opaque monolithic debugging

Pilot-Ready System

Clean Architecture & Maintainability:

  • 4 focused agents with 20-80 line system prompts each
  • Changes to conflict rules don't touch calendar API code
  • Changes to motivation logic don't affect scheduling logic
  • Can add new agents (e.g., NLP Parser) without refactoring existing ones

Encapsulation Enabled Testing & Observability:

  • Agent isolation enabled fast, isolated unit testing with mock data (see the sketch after this list)
  • Each agent testable independently: Schedule Intelligence (calendar analysis), Conflict Checker (buffer rules), Calendar Ops (API integration)
  • Targeted logging and metrics per agent: Can trace exactly where failures occur
  • Clear performance bottleneck identification: Know which agent needs optimization
  • Zero circular dependencies (validated through dependency graph)
  • Contrast: Monolithic agent was opaque, required full system mocks, making tests slow and debugging difficult
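
As one hedged example of what that isolation buys, a unit test for the Conflict Checker could swap in PydanticAI's TestModel so no real model call is made; the import path and the prompt are hypothetical, and result.output assumes a recent release (result.data on older ones):

```python
import pytest
from pydantic_ai.models.test import TestModel

from scheduling.agents import conflict_checker  # hypothetical module path

@pytest.mark.anyio
async def test_conflict_checker_runs_without_a_real_model():
    """Exercise the agent's wiring and output contract with a stubbed model."""
    with conflict_checker.override(model=TestModel()):
        result = await conflict_checker.run(
            "Proposed break 14:00-14:15; existing event 14:10-15:00; buffer 15 min"
        )
    assert result.output is not None  # TestModel returns schema-valid output
```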

Multi-Turn Conversation Performance & Reliability:

  • 98% accuracy in multi-turn conversations through explicit state management (pending_suggestions with clear lifecycle)
  • Context window kept lean through conversation history compaction and dynamic instructions
  • Pattern matching eliminated nested if/else logic that broke in monolithic version
  • Type system catches missing cases, preventing runtime routing errors
  • Graceful error handling with 94% API failure recovery rate
  • Contrast: Monolithic version "forgot" context mid-conversation due to implicit state

Reflection

What Worked Well

Design-first approach prevented over-engineering. Mapping conversation flows to agent boundaries before implementation revealed the natural architecture. Node types in flows mapped directly to agent types, preventing unnecessary abstractions.

Output types as flow control was effective. Using the type system for conversation state management made complex multi-turn flows readable. Type safety caught routing errors early. Pattern matching eliminated nested conditionals from the monolithic version.

Context management strategy enabled performance. The combination of dynamic instructions and conversation history compaction kept context windows lean, addressing real performance degradation seen in the monolithic version.

Explicit state lifecycle reduced bugs. The monolithic version had implicit state scattered throughout. Making state explicit with clear lifecycle rules achieved reliable multi-turn conversation handling.

What I'd Do Differently

Map architecture during conversation design phase. I designed conversation flows first, then implemented monolithically, then had to refactor. If I'd mapped flows to agent boundaries during initial design, I would have avoided building the monolithic version entirely. Cost: One week spent on failed implementation and refactor.

Build observability from day 1. I added logging, metrics, and validation testing after implementation. Designing observability and validation strategies into the initial architecture would have made debugging faster and validation easier. This includes both instrumentation (logging/metrics) and testing strategies (context compaction edge cases).

Key Learnings

Design philosophy directly shaped implementation. As a designer-engineer, conversation design informed architectural decisions: node types in flows mapped to agent types in the system. This dual perspective enabled optimizing for maintainability from the start, building testing into architecture rather than bolting it on afterwards, and creating clean abstractions that make conversations readable in code.

Output types as flow control: Use type system instead of if/else spaghetti. Combined with explicit state management (pending_suggestions with clear lifecycle), pattern matching achieved reliable multi-turn handling. This is conversation design translated into code.

Dynamic instructions are architectural decisions, not prompt engineering. Generating instructions programmatically rather than static prompts kept context relevant and lean. This is system design: treating conversation history as a managed resource (compress old turns, keep recent ones) rather than unbounded state. Instructions living in code next to logic meant changes to agent behavior and instructions happen together, reducing coupling.