Design Decisions

Key design decisions and their rationale.

HTML as Primary Format

Decision: Use HTML files as the primary data format, not JSON or a database.

Rationale:

- HTML provides structure (nodes), relationships (hyperlinks), and presentation (CSS) in one format
- Browsers are universal graph viewers; no special tools needed
- Human readable and machine parseable
- Git-friendly text format

Trade-offs:

- Slightly more verbose than JSON
- Requires HTML parsing (vs JSON.parse)
- Not a "standard" database format

Why it's worth it: Being able to open any node in a browser with full styling is invaluable for debugging and understanding. The files diff and merge as plain text under version control, and no special tools are required.
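To make the "structure plus relationships in one file" idea concrete, here is a minimal sketch that reads one node with Python's standard-library `html.parser`. The markup, the `data-relationship` attribute, and the `NodeParser` class are illustrative assumptions, not the actual HtmlGraph schema or SDK.

```python
from html.parser import HTMLParser

# Hypothetical node file; the real HtmlGraph markup may differ.
NODE_HTML = """
<article id="feature-001" data-type="feature" data-status="todo">
  <h1>User authentication</h1>
  <a href="feature-002.html" data-relationship="blocked-by">Session store</a>
</article>
"""


class NodeParser(HTMLParser):
    """Collect the node's attributes and its outgoing hyperlinks."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self.edges = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            # The element's attributes act as the node's properties.
            self.properties = attrs
        elif tag == "a" and "href" in attrs:
            # Each hyperlink is an edge: (relationship, target file).
            self.edges.append((attrs.get("data-relationship"), attrs["href"]))


parser = NodeParser()
parser.feed(NODE_HTML)
print(parser.properties["id"], parser.edges)
```

The same file that this script parses can be opened directly in a browser, which is the point of the decision.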

CSS Selectors for Queries

Decision: Use CSS selectors instead of a custom query language (like Cypher or GraphQL).

Rationale:

- Most developers already know CSS selectors
- Powerful enough for most queries
- Native browser support
- Libraries available in most languages

Trade-offs:

- Less expressive than Cypher for complex graph patterns
- No built-in graph traversal syntax

Why it's worth it: Zero learning curve. '[data-status="blocked"]' is immediately understandable. For complex queries, use the Python/JS graph algorithms.
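As a rough illustration of what such a query does, the sketch below matches the single selector form `[attr="value"]` using only the standard library. It is deliberately not a CSS engine, and `select_ids`, `AttributeMatcher`, and the sample markup are hypothetical names for this example; a real project would rely on a full selector library.

```python
import re
from html.parser import HTMLParser

# Only the attribute-equals form, e.g. [data-status="blocked"], is supported.
ATTR_SELECTOR = re.compile(r'^\[([\w-]+)="([^"]*)"\]$')


class AttributeMatcher(HTMLParser):
    """Record the id of every element whose attribute equals the given value."""

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get(self.attr_name) == self.attr_value:
            self.matches.append(attrs.get("id"))


def select_ids(html, selector):
    """Return ids of elements matching a [attr="value"] selector."""
    parsed = ATTR_SELECTOR.match(selector)
    if parsed is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    matcher = AttributeMatcher(*parsed.groups())
    matcher.feed(html)
    return matcher.matches


GRAPH_HTML = """
<article id="task-1" data-status="blocked"></article>
<article id="task-2" data-status="done"></article>
<article id="task-3" data-status="blocked"></article>
"""

print(select_ids(GRAPH_HTML, '[data-status="blocked"]'))  # ['task-1', 'task-3']
```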

Pydantic for Validation

Decision: Use Pydantic models for all data structures.

Rationale:

- Type safety and validation
- Automatic serialization
- Excellent error messages
- IDE autocomplete support

Trade-offs:

- Adds dependency (but only on Python side)
- Schema changes require code updates

Why it's worth it: Catch errors early with validation. Type hints make the SDK easier to use. Documentation comes from type annotations.

justhtml for Parsing

Decision: Use the justhtml library for HTML parsing in Python.

Rationale:

- Pure Python, no C dependencies
- Simple API
- Works in restrictive environments
- Small footprint

Trade-offs:

- Slower than lxml for large files
- Fewer features than BeautifulSoup

Why it's worth it: Zero-dependency install works everywhere. Performance is fine for typical graph sizes (thousands of nodes).

Optional SQLite Index

Decision: Make the SQLite index optional, not required.

Rationale:

- Most use cases work fine with file system queries
- An index adds complexity
- An index must be kept in sync with the HTML files

Trade-offs:

- Slower queries for large graphs without the index
- Some features are available only when the index is enabled

Why it's worth it: Start simple. Add complexity only when needed. The 80% case doesn't need a database.
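The difference between the two modes can be sketched with the standard-library `sqlite3` module. The table layout, the `nodes` records, and the function names below are assumptions made for illustration; they are not HtmlGraph's actual index schema.

```python
import sqlite3

# Hypothetical node records, as they might be read from the HTML files.
nodes = [
    {"id": "feature-001", "type": "feature", "status": "blocked"},
    {"id": "feature-002", "type": "feature", "status": "done"},
    {"id": "bug-001", "type": "bug", "status": "blocked"},
]


def blocked_ids_scan(records):
    """Default path: no index, just scan every record."""
    return sorted(r["id"] for r in records if r["status"] == "blocked")


def build_index(records):
    """Optional path: mirror the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, status TEXT)")
    conn.execute("CREATE INDEX idx_nodes_status ON nodes (status)")
    conn.executemany("INSERT INTO nodes VALUES (:id, :type, :status)", records)
    conn.commit()
    return conn


def blocked_ids_indexed(conn):
    """Same question, answered by the index instead of a scan."""
    rows = conn.execute(
        "SELECT id FROM nodes WHERE status = ? ORDER BY id", ("blocked",)
    )
    return [row[0] for row in rows]


index = build_index(nodes)
print(blocked_ids_scan(nodes))     # ['bug-001', 'feature-001']
print(blocked_ids_indexed(index))  # ['bug-001', 'feature-001']
```

Both paths return the same answer; the index only pays off once scanning becomes slow, and it has to be rebuilt or updated whenever the files change.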

TrackBuilder Fluent API

Decision: Provide a fluent builder pattern for track creation.

Rationale:

- Reads like English
- Self-documenting
- Guides users through required fields
- Chainable for conciseness

Example:

track = (
    sdk.tracks.builder()
    .title("Project")
    .with_spec(overview="...")
    .with_plan_phases([...])
    .create()
)

Trade-offs:

- More code than dictionary-based API
- Another pattern to learn

Why it's worth it: Discoverability through method chaining. IDE autocomplete shows what's available. Errors caught at build time, not creation time.
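The mechanics behind a fluent builder are small enough to sketch. The class below is a hypothetical stand-in written for this example, not the SDK's `TrackBuilder`; it only shows how returning `self` enables chaining and how a builder can reject bad input before anything is created.

```python
class TrackBuilder:
    """Hypothetical stand-in, not the SDK's TrackBuilder."""

    def __init__(self):
        self._fields = {}

    def title(self, text):
        # Validate as each field is set, so mistakes surface early.
        if not text.strip():
            raise ValueError("title must not be empty")
        self._fields["title"] = text
        return self  # returning self is what makes the calls chainable

    def with_spec(self, overview):
        self._fields["spec"] = {"overview": overview}
        return self

    def create(self):
        # Final check for required fields before producing the result.
        if "title" not in self._fields:
            raise ValueError("title is required")
        return dict(self._fields)


track = (
    TrackBuilder()
    .title("Project")
    .with_spec(overview="Initial scope")
    .create()
)
print(track)
```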

Auto-generated IDs

Decision: Auto-generate IDs (timestamp-based) instead of user-provided.

Rationale:

- Low collision risk (only two nodes of the same type created within the same second would share an ID)
- Sortable by creation time
- Predictable format (type + timestamp)
- No user decision required

Format: {type}-YYYYMMDD-HHMMSS

Trade-offs:

- Less human-friendly than "user-auth"
- Timestamps are in UTC, so they do not match the user's local clock

Why it's worth it: One less thing for users to think about. Collisions are unlikely in normal use. Easy to debug (the timestamp shows when the node was created).
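A minimal sketch of an ID generator that follows the documented format is shown below. The function name `generate_id` and its `now` parameter are assumptions added to make the example testable; the SDK's own implementation may differ.

```python
from datetime import datetime, timezone


def generate_id(node_type, now=None):
    """Return an ID in the documented {type}-YYYYMMDD-HHMMSS format (UTC).

    `now` is a hypothetical parameter for testing; it must be timezone-aware.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{node_type}-{stamp}"


first = generate_id("feature", datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
later = generate_id("feature", datetime(2025, 1, 2, 3, 4, 6, tzinfo=timezone.utc))

print(first)          # feature-20250102-030405
print(first < later)  # True: string order matches creation order
```

Because the format has one-second resolution, two calls with the same type inside the same second return the same string, so a caller that creates nodes in bulk would need an extra disambiguator.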

Session Management via Hooks

Decision: Use hooks for automatic session management, not manual SDK calls.

Rationale:

- Zero user effort
- Can't forget to start/end session
- Consistent across all agents
- Framework-agnostic

Trade-offs:

- Requires hook configuration
- Less explicit than manual calls
- Might capture unwanted activity

Why it's worth it: Agent developers don't think about sessions. Attribution happens automatically. Context preserved across conversations.

HTML + CSS + JS Dashboard

Decision: Build the dashboard with vanilla HTML/CSS/JS, not React/Vue.

Rationale:

- No build step required
- Works offline immediately
- Easier to understand and modify
- No framework lock-in

Trade-offs:

- More verbose code
- Less sophisticated state management
- Limited component reuse

Why it's worth it: Open index.html in a browser and it works. No npm install, no build process. View source to understand how it works.

Git as Version Control

Decision: Design for Git from the start, not as an afterthought.

Rationale:

- Developers already use Git
- Perfect for text files
- Branching and merging work naturally
- History and diffs are meaningful

Trade-offs:

- Large graphs might have slow diffs
- Merge conflicts possible (though readable)

Why it's worth it: Real version control, not change logs. Branches for experimentation. History shows evolution of work.

Python SDK First

Decision: Build Python SDK first, JavaScript second.

Rationale:

- AI agents primarily use Python
- Rich ecosystem (Pydantic, etc.)
- Type hints for documentation
- Easier to validate designs

Trade-offs:

- JavaScript users wait longer
- Some duplication of logic

Why it's worth it: Focus on primary users (agents) first. Get it right in one language, then port. JavaScript can still use HTML directly.

MIT License

Decision: Use the MIT license, not GPL or proprietary.

Rationale:

- Maximum freedom for users
- Commercial use allowed
- Compatible with most other licenses
- Simple and clear

Trade-offs:

- Can't prevent proprietary forks
- No explicit patent grant

Why it's worth it: Maximize adoption. No licensing worries. Good for community.

Immutable by Default

Decision: Pydantic models are immutable by default.

Rationale:

- Prevents accidental modifications
- Thread-safe reads
- Explicit about changes (must call .save())

Trade-offs:

- Must create new instances for changes
- More verbose update code

Why it's worth it: Explicit is better than implicit. Prevents bugs from unexpected mutations. Clear when data is being persisted.
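The update pattern this implies can be illustrated without any dependency. The sketch below uses a standard-library frozen dataclass as a stand-in for an immutable model; it is not the SDK's Pydantic model (Pydantic offers a comparable frozen setting), and the `Node` class and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Node:
    """Stand-in for an immutable model; fields are illustrative."""

    id: str
    status: str


node = Node(id="feature-001", status="todo")

try:
    node.status = "done"  # in-place assignment is rejected
except FrozenInstanceError:
    print("in-place assignment rejected")

# A change produces a new instance; the original is left untouched.
updated = replace(node, status="done")
print(node.status, updated.status)  # todo done
```

Persisting `updated` would then be a separate, explicit step, which is the role the document assigns to `.save()`.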

Decisions We Didn't Make

Why not MongoDB?

Binary format (not human-readable)
Requires server
Complex deployment
Not a graph database

Why not GraphQL?

Too complex for this use case
Schema definition overhead
Requires server
CSS selectors are simpler

Why not Markdown?

Can't represent structured properties
No native relationship types
Needs front matter (not standard)
HTML is more flexible

Why not RDF/Semantic Web?

Too complex
Poor tooling
Steep learning curve
Overkill for most use cases

Next Steps

Why HTML? - Core philosophy
Comparisons - vs alternatives
Contributing - Help improve HtmlGraph