# Context Usage Integration Guide

## Overview

This document explains how Contextune integrates with Claude Code's `/usage` and `/context` commands to provide intelligent context optimization and cost savings.
## Problem Statement

Claude Code tracks usage across three dimensions:

1. **Session limits**: reset every 12 hours
2. **Weekly limits (all models)**: reset weekly
3. **Weekly limits (Opus)**: a separate quota for Opus

Users need to manually run `/usage` to check these limits. Contextune automates this optimization.
## Solution Architecture

### 1. Manual Usage Logging (`/ctx:usage`)

**User workflow:**

1. Run `/usage` in Claude Code
2. Paste the output into `/ctx:usage`

**What Contextune does:**

- Parses the `/usage` output
- Stores a snapshot in `observability.db`
- Analyzes trends over time
- Provides recommendations
**Example**

Input (pasted by user):

```text
Current session: 7% used
Current week (all models): 89% used
Current week (Opus): 0% used
```

Output (Contextune):

```text
⚠️ 89% weekly usage - approaching limit
💡 Switch research tasks to Haiku (87% savings)
💡 Max parallel tasks: 2 (based on remaining 11%)
✨ Opus available - use for complex architecture
```
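For illustration, the pasted text can be reduced to a stats dict with a few regular expressions. A minimal sketch of the parsing step (the shipped `_parse_usage_output()` in `lib/usage_monitor.py` may differ in its details):

```python
import re

def parse_usage_output(text: str) -> dict:
    """Extract usage percentages from pasted /usage output.

    Hypothetical sketch; field names and patterns are assumptions.
    """
    patterns = {
        "session_percent": r"Current session:\s*(\d+(?:\.\d+)?)%",
        "weekly_percent": r"Current week \(all models\):\s*(\d+(?:\.\d+)?)%",
        "opus_percent": r"Current week \(Opus\):\s*(\d+(?:\.\d+)?)%",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        stats[key] = float(match.group(1)) if match else None
    return stats

sample = """Current session: 7% used
Current week (all models): 89% used
Current week (Opus): 0% used"""
print(parse_usage_output(sample))
# {'session_percent': 7.0, 'weekly_percent': 89.0, 'opus_percent': 0.0}
```

Missing lines simply yield `None`, so a partial paste still produces a usable snapshot.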
### 2. Automatic Token Estimation

**What we track:**

- Prompt lengths (word count → token estimate)
- Response sizes (from the observability DB)
- Model used (Haiku vs Sonnet vs Opus)
- Parallel task spawning (context multiplication)
**Estimation formula** (pseudocode):

```python
# Rough estimate (~1.3 tokens per word for the Claude 3.5 tokenizer)
tokens = words * 1.3

# Session usage estimate
session_tokens = sum(prompt_tokens + response_tokens)
session_percent = (session_tokens / SESSION_LIMIT) * 100

# Weekly usage estimate
weekly_tokens = sum_last_7_days(session_tokens)
weekly_percent = (weekly_tokens / WEEKLY_LIMIT) * 100
```
**Accuracy:**

- ±10% for session estimates
- ±15% for weekly estimates
- Good enough for proactive warnings
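The word-count heuristic above can be made concrete. A minimal runnable sketch, assuming the flat words × 1.3 ratio; the limit constants here are illustrative placeholders, since the real quotas are not published as token counts:

```python
# Illustrative placeholder limits - NOT real Claude Code quotas.
SESSION_LIMIT = 200_000
WEEKLY_LIMIT = 2_000_000

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~1.3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.3)

def usage_percent(token_counts: list[int], limit: int) -> float:
    """Sum of token counts as a percentage of the limit."""
    return sum(token_counts) / limit * 100

print(estimate_tokens("ten words " * 5))               # 10 words -> 13 tokens
print(usage_percent([50_000, 30_000], SESSION_LIMIT))  # 40.0
```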
### 3. Smart Model Selection

**Decision logic** (implemented in `usage_monitor.py`):

```python
def should_use_haiku(task_type: str, weekly_usage: float) -> bool:
    """
    Research tasks: always Haiku (fast, cheap, good enough).
    Execute tasks: always Haiku (deterministic).
    Design tasks: Sonnet unless weekly usage > 80%.
    General tasks: Haiku if weekly usage > 80%.
    """
    if task_type in ("research", "execute"):
        return True
    return weekly_usage > 80
```
**Cost savings:**

- Haiku: $0.25 / 1M input tokens
- Sonnet: $3.00 / 1M input tokens
- Savings: ~87% by using Haiku for research

**Example:**

- 10 research tasks @ Sonnet: $0.30
- 10 research tasks @ Haiku: $0.04
- Saved: $0.26 per 10 tasks
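The example follows from the per-token prices. A sketch that counts input tokens only, assuming ~10k input tokens per research task (the $0.30 figure implies roughly that workload; the document's $0.04 Haiku figure presumably also includes output tokens, which this sketch omits):

```python
HAIKU_INPUT = 0.25   # $ per 1M input tokens
SONNET_INPUT = 3.00  # $ per 1M input tokens

def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of input tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million

# 10 research tasks at ~10k input tokens each (assumed workload)
tokens = 10 * 10_000
sonnet = input_cost(tokens, SONNET_INPUT)
haiku = input_cost(tokens, HAIKU_INPUT)
print(f"Sonnet: ${sonnet:.2f}, Haiku: ${haiku:.2f}, saved: ${sonnet - haiku:.2f}")
```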
### 4. Parallel Task Limits

**Context budget calculation:**

```python
def get_parallel_task_limit(remaining_percent: float) -> int:
    """Each parallel task consumes ~10-15% of the context window."""
    if remaining_percent < 15:
        return 1  # 1 task only
    if remaining_percent < 30:
        return 2
    if remaining_percent < 45:
        return 3
    if remaining_percent < 60:
        return 4
    return 5  # max
```
**Integration points:**

- `/ctx:plan` - checks usage before creating a plan
- `/ctx:execute` - validates usage before spawning agents
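For illustration, the `/ctx:execute` check might split requested tasks into an execute-now batch and a queue. A self-contained sketch using the context-budget thresholds above (`split_tasks` is a hypothetical name, not the shipped API):

```python
def get_parallel_task_limit(remaining_percent: float) -> int:
    """Thresholds from the context-budget rules: each task costs ~10-15%."""
    for threshold, limit in ((15, 1), (30, 2), (45, 3), (60, 4)):
        if remaining_percent < threshold:
            return limit
    return 5  # max

def split_tasks(tasks: list[str], remaining_percent: float):
    """Return (execute_now, queued_for_reset) under the current budget."""
    limit = get_parallel_task_limit(remaining_percent)
    return tasks[:limit], tasks[limit:]

now, queued = split_tasks(["auth", "dashboard", "api", "tests", "docs"], 8.0)
print(now, queued)  # ['auth'] runs now; the other 4 are queued
```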
### 5. Proactive Warnings

**Warning levels:**

| Weekly Usage | Status | Action |
|---|---|---|
| 0-70% | ✅ Healthy | Normal operation |
| 71-85% | ⚠️ Warning | Suggest Haiku for non-critical tasks |
| 86-95% | 🚨 Critical | Auto-switch to Haiku, limit parallel tasks |
| 96-100% | 🛑 Limit | Defer all tasks until reset |
**Hook integration:**

The `user_prompt_submit.py` hook checks usage before every prompt:

```python
# In the hook
monitor = UsageMonitor()
usage = monitor.get_current_usage()

if usage.weekly_percent > 90:
    # Add a warning to the prompt
    warning = f"⚠️ {usage.weekly_percent}% weekly usage. Using Haiku for this request."
    # Auto-switch the model
    model = "haiku-4-5"
```
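The warning-level table can be encoded as a small lookup for the hook to use. A sketch with the thresholds from the table above (`warning_level` is an assumed name, not the shipped API):

```python
def warning_level(weekly_percent: float) -> tuple[str, str]:
    """Map weekly usage to (status, action) per the warning-level table."""
    if weekly_percent <= 70:
        return ("healthy", "normal operation")
    if weekly_percent <= 85:
        return ("warning", "suggest Haiku for non-critical tasks")
    if weekly_percent <= 95:
        return ("critical", "auto-switch to Haiku, limit parallel tasks")
    return ("limit", "defer all tasks until reset")

print(warning_level(89.0))
# ('critical', 'auto-switch to Haiku, limit parallel tasks')
```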
## Implementation Status

### ✅ Completed (v0.8.8)

- `lib/usage_monitor.py` - core usage tracking
- Manual usage parsing (`_parse_usage_output()`)
- Smart model selection (`should_use_haiku()`)
- Parallel task limits (`get_parallel_task_limit()`)
- Recommendations engine (`get_recommendation()`)
- Database integration (`save_usage_history()`)
### 🚧 In Progress

- `/ctx:usage` slash command implementation
- Hook integration for automatic warnings
- Marimo dashboard for usage trends
- Token estimation algorithm
- Auto-model-switching in research agents
### 📋 Planned

- Weekly usage reports via email
- Budget alerts ("$X spent this week")
- Opus usage optimization (use when available)
- Session reset notifications
- Cost forecasting ("At current rate, you'll hit the limit in X days")
## Usage Examples

### Example 1: Research Task with Auto-Optimization

User: "research best React state libraries"

Contextune detects:

- Command: `/ctx:research`
- Weekly usage: 89%
- Recommendation: use Haiku (not Sonnet)

Contextune executes:

- Spawns 3 Haiku research agents
- Cost: $0.02 (vs $0.24 with Sonnet)
- Time: 2 minutes
- Saved: $0.22 ✅
### Example 2: Parallel Plan with Context Limits

User: "create parallel plan for auth, dashboard, API, tests, docs"

Contextune checks:

- 5 tasks requested
- Session usage: 92%
- Remaining: 8%
- Max tasks: 1

Contextune warns:

```text
⚠️ 92% session usage (resets 12:59am)
💡 Can only execute 1 task now
💡 Options:
  1. Execute Task 1 now (highest priority)
  2. Wait 4 hours for session reset
  3. Queue all tasks for after reset
```

User choice: queue for reset

Contextune: ✅ Tasks queued, will auto-execute at 1:00am
### Example 3: Opus Opportunity Detection

Contextune monitors:

- Weekly usage: 45%
- Opus usage: 0%
- Task: "design distributed cache architecture"

Contextune suggests:

```text
✨ Opus available (0% used)!
This is a complex architecture task - perfect for Opus.

Cost comparison:
  • Sonnet: $0.15 estimated
  • Opus: $0.75 estimated (+$0.60)

Trade-off: 5x cost for highest-quality reasoning

Use Opus? [y/N]
```
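The 5x figure in the prompt follows from relative input pricing: Opus input tokens cost roughly five times Sonnet's ($15 vs $3 per 1M), so a $0.15 Sonnet estimate scales to $0.75. A sketch of that scaling (`opus_delta` is a hypothetical helper, not the shipped API):

```python
SONNET_INPUT = 3.00  # $ per 1M input tokens
OPUS_INPUT = 15.00   # $ per 1M input tokens (5x Sonnet)

def opus_delta(sonnet_estimate: float) -> tuple[float, float]:
    """Scale a Sonnet cost estimate to Opus; return (opus_cost, extra_cost)."""
    opus = sonnet_estimate * (OPUS_INPUT / SONNET_INPUT)
    return opus, opus - sonnet_estimate

opus, extra = opus_delta(0.15)
print(f"Opus: ${opus:.2f} (+${extra:.2f})")  # Opus: $0.75 (+$0.60)
```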
## Dashboard Integration

The Marimo dashboard (`notebooks/contextune_metrics_dashboard.py`) will show:

- **Usage Trends**
    - Session usage over the last 24 hours
    - Weekly usage over the last 4 weeks
    - Opus usage (if any)
- **Cost Savings**
    - Money saved by auto-switching to Haiku
    - Comparison: "Without Contextune" vs "With Contextune"
    - ROI calculation
- **Model Distribution**
    - % of tasks on Haiku vs Sonnet vs Opus
    - Recommended vs actual usage
- **Parallel Efficiency**
    - Tasks executed vs tasks queued
    - Context utilization rate
    - Time saved by parallelization
## Testing

```bash
# Test the usage monitor
uv run lib/usage_monitor.py
```

Test with mock data:

```bash
python3 <<EOF
import time

from lib.usage_monitor import UsageMonitor, UsageStats

# Simulate high usage
stats = UsageStats(
    session_percent=92.0,
    session_reset_time="12:59am",
    weekly_percent=89.0,
    weekly_reset_time="Oct 29, 9:59pm",
    opus_percent=0.0,
    timestamp=time.time(),
    raw_output="",
)

monitor = UsageMonitor()
monitor._cache = stats

# Get recommendations
rec = monitor.get_recommendation()
print(rec)
EOF
```
## Future Enhancements

- **Predictive Analysis**
    - "At current rate, you'll hit 95% by Tuesday"
    - "Recommend deferring 3 tasks to next week"
- **Budget Tracking**
    - "$X spent this week / $Y monthly budget"
    - "On track to spend $Z this month"
- **Team Coordination**
    - Share usage data across the team
    - Coordinate parallel tasks to avoid limit collisions
- **API Integration**
    - Direct integration with Anthropic's usage API (when available)
    - Real-time usage tracking without manual pasting
## Summary

**Key benefits:**

- ✅ Automatic optimization: no manual checking needed
- 💰 Cost savings: ~87% reduction through smart model selection
- ⚡ Faster execution: Haiku is 5x faster for research
- 🎯 Better planning: context-aware task scheduling
- 📊 Visibility: historical trends and forecasting

**User experience:**

Before Contextune: manually run `/usage`, interpret the limits, and pick models by hand.

After Contextune:

```text
User: [runs 10 research tasks]
Contextune: Auto-switched to Haiku (89% weekly usage)
Result: $0.04 spent, saved $0.26, 11% capacity preserved
```
**Next steps:**

1. Implement the `/ctx:usage` slash command
2. Add hook integration for automatic warnings
3. Create the Marimo usage dashboard
4. Test with real usage data
5. Document user workflows