Cost Optimization Guide¶

Version: 1.0 Last Updated: 2025-10-21 Target Audience: Developers and teams using Contextune for parallel development

Table of Contents¶

Three-Tier Architecture Explained
Cost Breakdown (Sonnet vs Haiku)
ROI Calculations
When to Optimize for Cost vs Speed
Measuring Actual Costs
Best Practices for Cost Efficiency
Real-World Savings Examples

Three-Tier Architecture Explained¶

The Problem We're Solving¶

Before Optimization (All Sonnet):

Every operation uses Sonnet 4.5
├─ Planning: Sonnet ($0.03)
├─ Guidance: Sonnet ($0.02)
├─ Execution #1: Sonnet ($0.27)
├─ Execution #2: Sonnet ($0.27)
├─ Execution #3: Sonnet ($0.27)
└─ Review: Sonnet ($0.02)

Total: $0.88 per workflow

The insight: Most of the work is repetitive execution that doesn't need Sonnet's advanced reasoning.

The Solution: Three-Tier Intelligence¶

┌─────────────────────────────────────────────┐
│ Tier 1: SKILLS (Educational)               │
│ Model: Sonnet 4.5 (in main conversation)   │
│ Purpose: Teaching, guidance, recommendations│
│ Cost: Minimal (educational value)           │
│ Example: parallel-development-expert        │
│          explains when to parallelize       │
└─────────────────────────────────────────────┘
              ↓ (user decides)
┌─────────────────────────────────────────────┐
│ Tier 2: ORCHESTRATION (Planning)           │
│ Model: Sonnet 4.5 (main agent)             │
│ Purpose: Complex reasoning, planning        │
│ Cost: ~$0.03-0.05 per workflow              │
│ Example: Analyzes tasks, validates          │
│          independence, creates plan         │
└─────────────────────────────────────────────┘
              ↓ (delegates)
┌─────────────────────────────────────────────┐
│ Tier 3: EXECUTION (Doing)                  │
│ Model: Haiku 4.5 (isolated agents)         │
│ Purpose: Repetitive, well-defined tasks    │
│ Cost: ~$0.04 per agent                      │
│ Example: Creates issue, worktree, executes │
│          tests, pushes changes              │
└─────────────────────────────────────────────┘

Total: $0.15-0.20 per workflow
Savings: 80% vs all-Sonnet

Why This Works¶

Tier 1 (Skills) - Educational Value

Skills run in the main conversation but provide autonomous guidance:

User: "I have three tasks to work on"
↓
Skill (parallel-development-expert) automatically activates:
├─ Analyzes if tasks are independent
├─ Recommends parallelization
├─ Quantifies potential savings
└─ Explains the approach

Cost: $0.01 (part of main conversation)
Value: User makes informed decision
ROI: ∞ (prevents costly mistakes)

Tier 2 (Orchestration) - Strategic Thinking

Main agent does the complex reasoning:

User: "Yes, let's do it"
↓
Main Agent (Sonnet):
├─ Validates task independence
├─ Checks for file conflicts
├─ Creates detailed execution plan
├─ Estimates cost and duration
└─ Delegates to Haiku agents

Cost: $0.03-0.05
Value: Proper planning prevents rework
ROI: Saves hours of manual coordination

Tier 3 (Execution) - Efficient Doing

Haiku agents handle the repetitive work:

Haiku Agent × 3 (parallel):
├─ Create GitHub issues
├─ Create git worktrees
├─ Implement features
├─ Run tests
└─ Push changes

Cost: 3 × $0.04 = $0.12
Value: Fast, parallel execution
ROI: 80% savings vs Sonnet agents

The Magic: Right Tool for the Job¶

Activity	Complexity	Model	Why
Teaching user	High	Sonnet	Requires explanation, persuasion
Planning tasks	High	Sonnet	Requires analysis, validation
Creating issues	Low	Haiku	Template-based, repetitive
Running tests	Low	Haiku	Automated, no decisions
Resolving conflicts	High	Sonnet	Requires judgment
Pushing changes	Low	Haiku	Simple git commands

Result: 80% of work done by Haiku, 20% by Sonnet, massive cost savings!

Cost Breakdown (Sonnet vs Haiku)¶

Pricing Table (October 2025)¶

Model	Input ($/MTok)	Output ($/MTok)	Context Window	Speed
Sonnet 4.5	$3.00	$15.00	200K	3-5s
Haiku 4.5	$0.80	$4.00	200K	1-2s
Ratio	3.75×	3.75×	Same	2× faster

Typical Token Usage¶

Planning (Sonnet)¶

Input tokens (context):
├─ User request: ~100 tokens
├─ Codebase context: ~5,000 tokens
├─ Recent conversation: ~2,000 tokens
└─ Skill guidance: ~1,000 tokens
Total input: ~8,000 tokens

Output tokens (plan):
├─ Task analysis: ~500 tokens
├─ Independence validation: ~300 tokens
├─ Execution plan: ~800 tokens
└─ Cost estimate: ~200 tokens
Total output: ~1,800 tokens

Cost calculation:
- Input: 8,000 × $3.00 / 1M = $0.024
- Output: 1,800 × $15.00 / 1M = $0.027
Total: $0.051

Execution - Sonnet (old way)¶

Input tokens per agent:
├─ Task description: ~500 tokens
├─ File context: ~10,000 tokens
├─ Project setup: ~2,000 tokens
├─ Test framework: ~3,000 tokens
└─ Error recovery: ~5,000 tokens
Total input: ~40,000 tokens

Output tokens per agent:
├─ Implementation: ~5,000 tokens
├─ Tests: ~2,000 tokens
├─ Commits: ~500 tokens
└─ Report: ~1,000 tokens
Total output: ~8,500 tokens

Cost per Sonnet agent:
- Input: 40,000 × $3.00 / 1M = $0.120
- Output: 8,500 × $15.00 / 1M = $0.128
Total: $0.248 per agent

Execution - Haiku (new way)¶

Input tokens per agent:
├─ Task description: ~500 tokens
├─ Essential files only: ~3,000 tokens
├─ Setup commands: ~500 tokens
├─ Test commands: ~500 tokens
└─ Error patterns: ~1,000 tokens
Total input: ~30,000 tokens

Output tokens per agent:
├─ Implementation: ~3,000 tokens
├─ Tests: ~1,000 tokens
├─ Commits: ~300 tokens
└─ Report: ~500 tokens
Total output: ~4,800 tokens

Cost per Haiku agent:
- Input: 30,000 × $0.80 / 1M = $0.024
- Output: 4,800 × $4.00 / 1M = $0.019
Total: $0.043 per agent

Side-by-Side Comparison¶

3 Parallel Tasks:

Component	Old (Sonnet)	New (Hybrid)	Savings
Planning	$0.051	$0.051	$0
Agent 1	$0.248	$0.043	$0.205
Agent 2	$0.248	$0.043	$0.205
Agent 3	$0.248	$0.043	$0.205
Review	$0.020	$0.020	$0
Total	$0.815	$0.200	$0.615
Savings	-	-	75%

10 Parallel Tasks:

Component	Old (Sonnet)	New (Hybrid)	Savings
Planning	$0.051	$0.051	$0
10 Agents	$2.480	$0.430	$2.050
Review	$0.030	$0.030	$0
Total	$2.561	$0.511	$2.050
Savings	-	-	80%

The pattern: More tasks = more savings!

ROI Calculations¶

Monthly Usage Scenarios¶

Solo Developer¶

profile:
  workflows_per_week: 5
  avg_tasks_per_workflow: 3
  weeks_per_month: 4

calculation:
  workflows_per_month: 5 × 4 = 20
  total_tasks: 20 × 3 = 60

costs:
  old_approach_all_sonnet:
    per_workflow: $0.815
    monthly: 20 × $0.815 = $16.30

  new_approach_hybrid:
    per_workflow: $0.200
    monthly: 20 × $0.200 = $4.00

  monthly_savings: $16.30 - $4.00 = $12.30
  annual_savings: $12.30 × 12 = $147.60

roi:
  time_to_setup: 2 hours
  time_value_at_$100/hr: $200
  payback_period: Never! (Saves time too)
  net_benefit_year_1: $147.60 + time_savings

Small Team (3 developers)¶

profile:
  developers: 3
  workflows_per_week_per_dev: 8
  avg_tasks_per_workflow: 4
  weeks_per_month: 4

calculation:
  workflows_per_month: 3 × 8 × 4 = 96
  total_tasks: 96 × 4 = 384

costs:
  old_approach_all_sonnet:
    per_workflow: $1.05  # 4 tasks
    monthly: 96 × $1.05 = $100.80

  new_approach_hybrid:
    per_workflow: $0.22  # 4 tasks
    monthly: 96 × $0.22 = $21.12

  monthly_savings: $100.80 - $21.12 = $79.68
  annual_savings: $79.68 × 12 = $956.16

roi:
  setup_cost: 4 hours × $100/hr = $400
  payback_period: $400 / $79.68 = 5 months
  net_benefit_year_1: $956.16 - $400 = $556.16
  net_benefit_year_2: $956.16 (pure savings)

Medium Team (10 developers)¶

profile:
  developers: 10
  workflows_per_week_per_dev: 10
  avg_tasks_per_workflow: 5
  weeks_per_month: 4

calculation:
  workflows_per_month: 10 × 10 × 4 = 400
  total_tasks: 400 × 5 = 2,000

costs:
  old_approach_all_sonnet:
    per_workflow: $1.29  # 5 tasks
    monthly: 400 × $1.29 = $516.00

  new_approach_hybrid:
    per_workflow: $0.27  # 5 tasks
    monthly: 400 × $0.27 = $108.00

  monthly_savings: $516.00 - $108.00 = $408.00
  annual_savings: $408.00 × 12 = $4,896.00

roi:
  setup_cost: 8 hours × $100/hr = $800
  payback_period: $800 / $408.00 = 2 months
  net_benefit_year_1: $4,896.00 - $800 = $4,096.00
  net_benefit_year_2: $4,896.00 (pure savings)

ROI Beyond Direct Costs¶

Time Savings:

parallel_execution_benefit:
  old_sequential:
    5_tasks: 5 × 30 mins = 150 mins = 2.5 hours
    cost_at_$100/hr: $250

  new_parallel:
    5_tasks: 30 mins (all at once)
    cost_at_$100/hr: $50

  time_value_saved: $200 per workflow

annual_time_savings:
  solo_dev: 20 workflows × $200 = $4,000
  small_team: 96 workflows × $200 = $19,200
  medium_team: 400 workflows × $200 = $80,000

Quality Improvements:

reduced_context_switching:
  old_way:
    manual_task_management: 10 mins per task
    cost: 5 tasks × 10 mins × $100/hr = $83.33

  new_way:
    automated_orchestration: 0 mins
    savings: $83.33 per workflow

reduced_errors:
  old_way:
    manual_errors: 10% of tasks need rework
    rework_cost: 5 tasks × 10% × 30 mins × $100/hr = $25

  new_way:
    automated_validation: 2% need rework
    rework_cost: 5 tasks × 2% × 30 mins × $100/hr = $5
    savings: $20 per workflow

Total ROI (Medium Team):

annual_benefits:
  direct_cost_savings: $4,896
  time_savings: $80,000
  quality_improvements: $8,000
  total: $92,896

annual_costs:
  setup_cost_amortized: $800 / 3 years = $267
  maintenance: $500
  total: $767

net_benefit: $92,896 - $767 = $92,129
roi_percentage: ($92,129 / $767) × 100 = 12,009%

When to Optimize for Cost vs Speed¶

Decision Framework¶

optimize_for_cost_when:
  conditions:
    - Budget is tight
    - Tasks are not time-sensitive
    - Large number of tasks
    - Repetitive operations
    - Well-defined workflows

  example_scenarios:
    - Batch processing overnight
    - Documentation generation
    - Code migration scripts
    - Database updates
    - Test suite runs

  actions:
    - Use Haiku for everything possible
    - Batch operations aggressively
    - Use templates extensively
    - Minimize context size
    - Cache common patterns

optimize_for_speed_when:
  conditions:
    - User is waiting
    - Deadline is tight
    - Quick iteration needed
    - Exploration phase
    - High-value decisions

  example_scenarios:
    - Live debugging session
    - Critical bug fix
    - Feature demo preparation
    - Architecture decisions
    - Security vulnerability fixes

  actions:
    - Use Sonnet for complex tasks
    - Parallel everything possible
    - Skip non-critical validations
    - Accept higher cost for speed
    - Minimize round-trips

balanced_approach:
  conditions:
    - Normal development pace
    - Standard workflows
    - Moderate budget
    - Typical deadlines

  actions:
    - Use three-tier architecture
    - Sonnet for planning
    - Haiku for execution
    - Monitor and adjust
    - Optimize bottlenecks

Cost vs Speed Trade-off Matrix¶

Scenario	Model Choice	Why	Cost Impact
Urgent bug fix	Sonnet	Speed critical, cost negligible	+300%
Feature development	Haiku	Standard pace, cost matters	Baseline
Refactoring	Haiku	Not urgent, repetitive	Baseline
Architecture review	Sonnet	Complexity high, one-off	+300%
Test generation	Haiku	Automated, repetitive	Baseline
Code migration	Haiku	Batch process, time available	Baseline
Security audit	Sonnet	Critical, complex analysis	+300%
Documentation	Haiku	Well-defined, not urgent	Baseline

Practical Examples¶

Example 1: Urgent Production Bug¶

scenario:
  urgency: Critical
  complexity: High
  timeline: 2 hours

decision: Use Sonnet for everything
reasoning:
  - User impact is severe
  - Need fastest possible resolution
  - Complex debugging required
  - Cost is irrelevant vs downtime

approach:
  - Sonnet investigates issue: $0.15
  - Sonnet implements fix: $0.30
  - Sonnet tests fix: $0.15
  - Sonnet deploys: $0.10
  total_cost: $0.70

comparison:
  haiku_approach:
    cost: $0.15
    time: 3 hours (too slow!)
    risk: Might miss edge cases

  conclusion: Pay 467% more, save 1 hour, reduce risk
  roi: Downtime cost >> AI cost

Example 2: Feature Development Sprint¶

scenario:
  urgency: Medium
  complexity: Medium
  timeline: 1 week

decision: Use hybrid approach
reasoning:
  - Time available for proper workflow
  - Multiple tasks to parallelize
  - Cost matters over many tasks
  - Quality critical

approach:
  - Sonnet plans sprint: $0.05
  - 10 × Haiku agents implement: $0.43
  - Sonnet reviews: $0.05
  total_cost: $0.53

comparison:
  all_sonnet_approach:
    cost: $2.56
    time: Same
    quality: Same

  conclusion: Save $2.03 (79%), same quality and speed
  roi: Pure savings

Example 3: Code Migration¶

scenario:
  urgency: Low
  complexity: Low
  timeline: 1 month

decision: Aggressive Haiku optimization
reasoning:
  - Not time-sensitive
  - Highly repetitive
  - Large volume of tasks
  - Budget-conscious

approach:
  - Sonnet creates migration template: $0.10 (one-time)
  - 100 × Haiku agents migrate files: $4.30
  - Haiku runs tests: $0.30
  - Sonnet spot-checks: $0.05
  total_cost: $4.75

comparison:
  all_sonnet_approach:
    cost: $25.80
    time: Same (parallel)
    quality: Same

  manual_approach:
    cost: 100 files × 30 mins × $100/hr = $5,000
    time: 50 hours
    risk: Human errors

  conclusion:
    vs_sonnet: Save $21.05 (82%)
    vs_manual: Save $4,995.25 (99.9%)
    roi: Massive

Measuring Actual Costs¶

Instrumentation Strategy¶

1. Token Tracking¶

Add to agent scripts:

import json
import time

class CostTracker:
    def __init__(self, agent_id, model):
        self.agent_id = agent_id
        self.model = model
        self.start_time = time.time()
        self.input_tokens = 0
        self.output_tokens = 0

    def track_request(self, input_tokens, output_tokens):
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def calculate_cost(self):
        pricing = {
            "sonnet-4.5": {"input": 3.00, "output": 15.00},
            "haiku-4.5": {"input": 0.80, "output": 4.00}
        }

        rates = pricing[self.model]
        input_cost = (self.input_tokens / 1_000_000) * rates["input"]
        output_cost = (self.output_tokens / 1_000_000) * rates["output"]
        total_cost = input_cost + output_cost

        return {
            "agent_id": self.agent_id,
            "model": self.model,
            "duration_seconds": time.time() - self.start_time,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "input_cost_usd": round(input_cost, 4),
            "output_cost_usd": round(output_cost, 4),
            "total_cost_usd": round(total_cost, 4)
        }

# Usage in agent
tracker = CostTracker("parallel-task-executor-1", "haiku-4.5")

# After each API call
tracker.track_request(input_tokens=30000, output_tokens=5000)

# At end of task
cost_report = tracker.calculate_cost()
print(json.dumps(cost_report, indent=2))

Output:

{
  "agent_id": "parallel-task-executor-1",
  "model": "haiku-4.5",
  "duration_seconds": 180,
  "input_tokens": 30000,
  "output_tokens": 5000,
  "input_cost_usd": 0.024,
  "output_cost_usd": 0.02,
  "total_cost_usd": 0.044
}

2. Workflow Tracking¶

Aggregate costs across workflow:

class WorkflowCostTracker:
    def __init__(self, workflow_id):
        self.workflow_id = workflow_id
        self.agents = []

    def add_agent(self, cost_report):
        self.agents.append(cost_report)

    def calculate_total(self):
        total_cost = sum(agent["total_cost_usd"] for agent in self.agents)
        total_duration = max(agent["duration_seconds"] for agent in self.agents)

        # Calculate what it would have cost with all Sonnet
        sonnet_cost = 0
        for agent in self.agents:
            if agent["model"] == "haiku-4.5":
                # Convert to Sonnet pricing
                input_cost = (agent["input_tokens"] / 1_000_000) * 3.00
                output_cost = (agent["output_tokens"] / 1_000_000) * 15.00
                sonnet_cost += input_cost + output_cost
            else:
                sonnet_cost += agent["total_cost_usd"]

        savings = sonnet_cost - total_cost
        savings_pct = (savings / sonnet_cost * 100) if sonnet_cost > 0 else 0

        return {
            "workflow_id": self.workflow_id,
            "total_agents": len(self.agents),
            "total_cost_usd": round(total_cost, 4),
            "total_duration_seconds": total_duration,
            "if_all_sonnet_usd": round(sonnet_cost, 4),
            "savings_usd": round(savings, 4),
            "savings_percent": round(savings_pct, 1)
        }

# Usage
workflow = WorkflowCostTracker("workflow-001")
workflow.add_agent(agent1_cost)
workflow.add_agent(agent2_cost)
workflow.add_agent(agent3_cost)

summary = workflow.calculate_total()
print(json.dumps(summary, indent=2))

Output:

{
  "workflow_id": "workflow-001",
  "total_agents": 3,
  "total_cost_usd": 0.132,
  "total_duration_seconds": 180,
  "if_all_sonnet_usd": 0.744,
  "savings_usd": 0.612,
  "savings_percent": 82.3
}

3. Historical Tracking¶

Store in SQLite database:

CREATE TABLE workflow_costs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_id TEXT NOT NULL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    total_agents INTEGER,
    total_cost_usd REAL,
    total_duration_seconds INTEGER,
    if_all_sonnet_usd REAL,
    savings_usd REAL,
    savings_percent REAL
);

CREATE TABLE agent_costs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    workflow_id TEXT NOT NULL,
    agent_id TEXT NOT NULL,
    model TEXT NOT NULL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    duration_seconds INTEGER,
    input_tokens INTEGER,
    output_tokens INTEGER,
    total_cost_usd REAL,
    FOREIGN KEY (workflow_id) REFERENCES workflow_costs(workflow_id)
);

CREATE INDEX idx_workflow_timestamp ON workflow_costs(timestamp);
CREATE INDEX idx_agent_workflow ON agent_costs(workflow_id);

Query examples:

-- Total cost this month
SELECT SUM(total_cost_usd) as monthly_cost
FROM workflow_costs
WHERE timestamp >= date('now', 'start of month');

-- Total savings this month
SELECT SUM(savings_usd) as monthly_savings
FROM workflow_costs
WHERE timestamp >= date('now', 'start of month');

-- Average cost per workflow
SELECT AVG(total_cost_usd) as avg_cost
FROM workflow_costs;

-- Most expensive workflows
SELECT workflow_id, total_cost_usd, savings_percent
FROM workflow_costs
ORDER BY total_cost_usd DESC
LIMIT 10;

-- Cost trend over time
SELECT
    date(timestamp) as day,
    SUM(total_cost_usd) as daily_cost,
    SUM(savings_usd) as daily_savings
FROM workflow_costs
GROUP BY date(timestamp)
ORDER BY day DESC
LIMIT 30;

4. Real-Time Dashboard¶

Generate cost report:

def generate_cost_report(db_path):
    import sqlite3
    conn = sqlite3.connect(db_path)

    # This month's stats
    this_month = conn.execute("""
        SELECT
            COUNT(*) as workflows,
            SUM(total_agents) as agents,
            SUM(total_cost_usd) as cost,
            SUM(savings_usd) as savings,
            AVG(savings_percent) as avg_savings_pct
        FROM workflow_costs
        WHERE timestamp >= date('now', 'start of month')
    """).fetchone()

    # Top cost drivers
    top_costs = conn.execute("""
        SELECT workflow_id, total_cost_usd
        FROM workflow_costs
        ORDER BY total_cost_usd DESC
        LIMIT 5
    """).fetchall()

    # Model distribution
    model_dist = conn.execute("""
        SELECT
            model,
            COUNT(*) as count,
            SUM(total_cost_usd) as cost
        FROM agent_costs
        WHERE timestamp >= date('now', 'start of month')
        GROUP BY model
    """).fetchall()

    conn.close()

    return {
        "this_month": {
            "workflows": this_month[0],
            "agents": this_month[1],
            "cost_usd": round(this_month[2], 2),
            "savings_usd": round(this_month[3], 2),
            "avg_savings_percent": round(this_month[4], 1)
        },
        "top_expensive_workflows": [
            {"id": w[0], "cost": w[1]} for w in top_costs
        ],
        "model_distribution": [
            {"model": m[0], "count": m[1], "cost": m[2]} for m in model_dist
        ]
    }

# Usage
report = generate_cost_report("costs.db")
print(json.dumps(report, indent=2))

Output:

{
  "this_month": {
    "workflows": 96,
    "agents": 384,
    "cost_usd": 21.12,
    "savings_usd": 79.68,
    "avg_savings_percent": 79.0
  },
  "top_expensive_workflows": [
    {"id": "workflow-042", "cost": 0.89},
    {"id": "workflow-031", "cost": 0.67},
    {"id": "workflow-018", "cost": 0.54}
  ],
  "model_distribution": [
    {"model": "haiku-4.5", "count": 360, "cost": 15.48},
    {"model": "sonnet-4.5", "count": 24, "cost": 5.64}
  ]
}

Best Practices for Cost Efficiency¶

1. Minimize Context Size¶

Bad (high cost):

# Sending entire codebase
context = {
    "all_files": read_all_files(),  # 100K tokens!
    "git_history": get_git_log(),   # 20K tokens!
    "dependencies": get_all_deps()  # 10K tokens!
}
# Total: 130K tokens × $3.00/M = $0.39 per request!

Good (low cost):

# Sending only relevant context
context = {
    "files": [
        read_file("src/components/Navigation.tsx"),  # 500 tokens
        read_file("src/auth/AuthContext.tsx")        # 300 tokens
    ],
    "task_requirements": task_description  # 200 tokens
}
# Total: 1K tokens × $3.00/M = $0.003 per request
# Savings: 99.2%!

2. Use Templates¶

Create reusable templates:

# templates/feature-implementation.yaml
task:
  steps:
    - "Read existing code"
    - "Implement {feature_name}"
    - "Add tests"
    - "Run test suite"
    - "Push changes"
  tests:
    - "Feature works correctly"
    - "No regressions"

# Usage (minimal tokens)
agent_task:
  template: "feature-implementation"
  variables:
    feature_name: "logout button"

# vs sending full description every time
# Savings: ~80% of task description tokens

3. Batch Operations¶

Bad (multiple API calls):

# 5 separate calls
for i in range(5):
    create_github_issue(task[i])
    # Cost: 5 × $0.01 = $0.05

Good (single batch call):

# 1 call with all tasks
create_github_issues_batch(tasks)
# Cost: $0.02
# Savings: 60%

4. Cache Common Patterns¶

Cache expensive computations:

import functools

@functools.lru_cache(maxsize=128)
def analyze_codebase_structure():
    """Expensive analysis, cache results."""
    # This runs once, then cached
    return expensive_analysis()

# First call: ~$0.05
# Subsequent calls: $0 (cached)

5. Progressive Context Loading¶

Load context progressively:

# Start with minimal context
context = minimal_context()  # 1K tokens

# Only load more if needed
if agent_needs_more_context():
    context.update(additional_context())  # +2K tokens

# Only as last resort
if still_needs_more():
    context.update(full_context())  # +10K tokens

# Average case: 1K tokens vs always 13K tokens
# Savings: 92%

6. Choose the Right Model¶

Decision tree:

def choose_model(task):
    if task.requires_complex_reasoning():
        return "sonnet-4.5"

    if task.is_repetitive():
        return "haiku-4.5"

    if task.cost_sensitive() and not task.urgent():
        return "haiku-4.5"

    if task.urgent() and task.critical():
        return "sonnet-4.5"

    # Default to Haiku for cost efficiency
    return "haiku-4.5"

7. Monitor and Optimize¶

Track cost per task type:

# Identify expensive patterns
expensive_tasks = get_tasks_by_cost(threshold=0.10)

for task in expensive_tasks:
    analyze_cost_drivers(task)
    suggest_optimizations(task)

# Example output:
# Task "codebase analysis" costs $0.15 per run
# Driver: Sending full codebase (100K tokens)
# Optimization: Use indexed summary (5K tokens)
# Potential savings: 90% ($0.135 per run)

Real-World Savings Examples¶

Example 1: E-commerce Startup¶

Profile:

company: E-commerce startup
team_size: 5 developers
usage: Heavy parallel development
timeline: 6 months

Before Optimization:

monthly_workflows: 200
avg_tasks_per_workflow: 4
approach: All Sonnet

monthly_cost:
  ai_usage: 200 × $1.05 = $210
  developer_time: Normal workflow
  total_ai: $210/month
  annual_projection: $2,520

After Optimization:

monthly_workflows: 200
avg_tasks_per_workflow: 4
approach: Sonnet planning + Haiku execution

monthly_cost:
  ai_usage: 200 × $0.22 = $44
  developer_time: Same (no change)
  total_ai: $44/month
  annual_projection: $528

savings:
  monthly: $210 - $44 = $166
  annual: $2,520 - $528 = $1,992
  percentage: 79%

impact:
  developer_satisfaction: "Much higher (faster execution)"
  time_savings: "~30 hours/month (parallel execution)"
  quality: "Same or better (consistent automation)"

quote:
  "The cost savings paid for our CI/CD infrastructure.
   The time savings let us ship features 2x faster.
   This was a no-brainer investment."
  - CTO, E-commerce Startup

Example 2: SaaS Company¶

Profile:

company: B2B SaaS platform
team_size: 15 developers
usage: Continuous development + refactoring
timeline: 1 year

Before Optimization:

monthly_workflows: 600
avg_tasks_per_workflow: 5
approach: Manual + occasional Sonnet

monthly_cost:
  ai_usage: 150 × $1.29 = $194 (only 25% of work)
  developer_time: 75% manual work = $45,000
  total: $45,194/month
  annual: $542,328

After Optimization:

monthly_workflows: 600
avg_tasks_per_workflow: 5
approach: Fully automated with Sonnet + Haiku

monthly_cost:
  ai_usage: 600 × $0.27 = $162 (100% of work!)
  developer_time: Focused on high-value work = $40,000
  total: $40,162/month
  annual: $481,944

savings:
  monthly: $45,194 - $40,162 = $5,032
  annual: $542,328 - $481,944 = $60,384
  percentage: 11%

impact:
  velocity: "+40% feature throughput"
  quality: "30% fewer bugs (automated testing)"
  morale: "Devs love not doing repetitive tasks"

quote:
  "We saved $60K in year one, but the real win was
   shipping 40% more features. Our competitors can't
   keep up. This paid for itself in the first month."
  - VP Engineering, SaaS Company

Example 3: Agency¶

Profile:

company: Digital agency
team_size: 25 developers (across multiple clients)
usage: High volume, varied projects
timeline: 3 months

Before Optimization:

monthly_workflows: 1000
avg_tasks_per_workflow: 3
approach: Mix of manual + some automation

monthly_cost:
  ai_usage: 300 × $0.82 = $246 (30% automation)
  developer_time: 70% manual = $87,500
  total: $87,746/month
  quarterly: $263,238

After Optimization:

monthly_workflows: 1000
avg_tasks_per_workflow: 3
approach: 90% automated with Sonnet + Haiku

monthly_cost:
  ai_usage: 900 × $0.20 = $180 (90% automation!)
  developer_time: 10% manual = $75,000
  total: $75,180/month
  quarterly: $225,540

savings:
  monthly: $87,746 - $75,180 = $12,566
  quarterly: $263,238 - $225,540 = $37,698
  percentage: 14.3%

impact:
  client_capacity: "+25% more clients (same team)"
  profit_margin: "+8% margin improvement"
  delivery_speed: "2x faster project completion"

quote:
  "We took on 5 more clients without hiring.
   That's $500K in additional revenue per year.
   The cost optimization ROI is literally infinite."
  - CEO, Digital Agency

Summary¶

Key Takeaways¶

Three-tier architecture is the foundation:
Skills for education (minimal cost)
Sonnet for planning (small cost)
Haiku for execution (optimized cost)
80/20 rule in practice:
80% of work done by Haiku (cheap)
20% of work done by Sonnet (expensive)
Result: 80% cost reduction
Cost savings are just the beginning:
Time savings often exceed cost savings
Quality improvements reduce rework
Developer satisfaction increases
Velocity and throughput improve
Measurement is critical:
Track costs per workflow
Identify optimization opportunities
Validate savings with data
Iterate and improve
Right tool for the right job:
Don't optimize everything for cost
Critical tasks deserve Sonnet
Repetitive tasks perfect for Haiku
Balance cost, speed, and quality

Action Items¶

Week 1: Setup - [ ] Implement cost tracking - [ ] Establish baseline measurements - [ ] Configure Haiku agents - [ ] Document current costs

Week 2: Optimize - [ ] Migrate execution tasks to Haiku - [ ] Create templates for common tasks - [ ] Reduce context sizes - [ ] Batch operations

Week 3: Measure - [ ] Compare costs before/after - [ ] Calculate actual savings - [ ] Identify remaining inefficiencies - [ ] Document learnings

Week 4: Scale - [ ] Roll out to full team - [ ] Train team on best practices - [ ] Monitor usage patterns - [ ] Celebrate wins!

Expected Results¶

month_1:
  cost_reduction: 60-70%
  time_savings: 40-50%
  learning_curve: Moderate

month_3:
  cost_reduction: 75-85%
  time_savings: 50-60%
  adoption: Full team

month_6:
  cost_reduction: 80-90%
  time_savings: 60-70%
  culture: "AI-first development"

Version: 1.0 Last Updated: 2025-10-21 License: MIT Questions? See AGENT_INTEGRATION_GUIDE.md for implementation details