HtmlGraph¶
Core graph operations, queries, and traversal algorithms.
Overview¶
The HtmlGraph class provides:
- Node and edge management
- Multiple query methods (CSS selectors, QueryBuilder, Find API)
- Graph traversal algorithms
- O(1) edge index for reverse lookups
Initialization¶
from htmlgraph import HtmlGraph
# Initialize with graph directory
graph = HtmlGraph(graph_dir=".htmlgraph")
Node Operations¶
Adding Nodes¶
from htmlgraph.models import Node
node = Node(
id="feature-001",
title="Add login",
type="feature",
status="todo",
priority="high"
)
graph.add(node)
Getting Nodes¶
# Get single node by ID
node = graph.get("feature-001")
# Get all nodes
nodes = list(graph.nodes())
# Get nodes by type
features = [n for n in graph.nodes() if n.type == "feature"]
Updating Nodes¶
Removing Nodes¶
Query Methods¶
HtmlGraph provides four ways to query nodes:
1. CSS Selector Queries¶
# Simple attribute queries
blocked = graph.query('[data-status="blocked"]')
high_priority = graph.query('[data-priority="high"]')
# Combined selectors (AND)
urgent = graph.query('[data-status="blocked"][data-priority="high"]')
2. QueryBuilder (Fluent API)¶
For complex queries with OR, NOT, numeric comparisons, and text search.
from htmlgraph import QueryBuilder
# Simple equality
features = graph.query_builder().where("type", "feature").execute()
# Chained conditions
urgent = (graph.query_builder()
.where("status", "blocked")
.and_("priority", "high")
.execute())
# OR conditions
high_or_critical = (graph.query_builder()
.where("priority", "high")
.or_("priority", "critical")
.execute())
# NOT conditions
not_done = (graph.query_builder()
.where("type", "feature")
.not_("status").eq("done")
.execute())
# Numeric comparisons
high_effort = (graph.query_builder()
.where("properties.effort").gt(8)
.execute())
# Text search
auth_features = (graph.query_builder()
.where("title").contains("auth")
.execute())
# Regex matching
api_features = (graph.query_builder()
.where("title").matches(r"API|REST")
.execute())
# List operations
priorities = (graph.query_builder()
.where("priority").in_(["high", "critical"])
.execute())
# Result methods
first_blocked = graph.query_builder().where("status", "blocked").first()
blocked_count = graph.query_builder().where("status", "blocked").count()
3. Find API (BeautifulSoup-style)¶
Simple, intuitive queries with Django-style lookup suffixes.
# Find first match
feature = graph.find(type="feature")
blocked = graph.find(status="blocked", priority="high")
# Find all matches
all_features = graph.find_all(type="feature")
blocked_high = graph.find_all(status="blocked", priority="high")
# With limit
top_5 = graph.find_all(type="feature", limit=5)
# Django-style lookup suffixes
auth = graph.find_all(title__contains="auth")
high_effort = graph.find_all(properties__effort__gt=8)
priorities = graph.find_all(priority__in=["high", "critical"])
Available lookup suffixes:
- __contains, __icontains - substring search
- __startswith, __endswith - prefix/suffix match
- __regex - regular expression
- __gt, __gte, __lt, __lte - numeric comparisons
- __in, __not_in - list membership
- __isnull - null check
4. Relationship Queries¶
# Find nodes related to a specific node
related = graph.find_related("feature-001")
# By relationship type
blockers = graph.find_related("feature-001", relationship="blocked_by")
# Convenience methods
blocking = graph.find_blocking("feature-001") # What this blocks
blocked_by = graph.find_blocked_by("feature-001") # What blocks this
Edge Index¶
O(1) reverse edge lookups (vs O(V×E) linear scan).
# Get edges pointing TO a node (incoming)
incoming = graph.get_incoming_edges("feature-001")
blockers = graph.get_incoming_edges("feature-001", relationship="blocked_by")
# Get edges pointing FROM a node (outgoing)
outgoing = graph.get_outgoing_edges("feature-001")
# Get all connected neighbors
neighbors = graph.get_neighbors("feature-001")
Graph Traversal¶
Ancestors and Descendants¶
# Get all ancestors (nodes this depends on)
ancestors = graph.ancestors("feature-001")
# With depth limit
immediate_deps = graph.ancestors("feature-001", max_depth=1)
# Get all descendants (nodes that depend on this)
descendants = graph.descendants("feature-001")
# With depth limit
immediate_dependents = graph.descendants("feature-001", max_depth=1)
Transitive Dependencies¶
# Get all transitive dependencies (follows blocked_by edges)
deps = graph.transitive_deps("feature-001")
Path Finding¶
# Find shortest path between nodes
shortest = graph.shortest_path("feature-001", "feature-045")
# Find all paths
all_paths = graph.all_paths("feature-001", "feature-045")
# With max length constraint
short_paths = graph.all_paths("feature-001", "feature-045", max_length=4)
Subgraph Extraction¶
# Extract subgraph with specific nodes
subset = graph.subgraph(["feature-001", "feature-002", "feature-003"])
# Without internal edges
nodes_only = graph.subgraph(["f-001", "f-002"], include_edges=False)
Connected Components¶
# Get all nodes in the same connected component
component = graph.connected_component("feature-001")
# Filter by relationship type
blocking_component = graph.connected_component("feature-001", relationship="blocked_by")
Graph Algorithms¶
Find Bottlenecks¶
Critical Path¶
Dependents¶
# Nodes that directly depend on this one (O(1) with EdgeIndex)
dependents = graph.dependents("feature-001")
Complete Method Reference¶
Node Management¶
| Method | Description |
|---|---|
add(node, overwrite=False) |
Add a node to the graph |
get(node_id) |
Get node by ID |
update(node) |
Update an existing node |
remove(node_id) |
Remove a node |
nodes() |
Iterator over all nodes |
Query Methods¶
| Method | Description |
|---|---|
query(css_selector) |
CSS selector query |
query_builder() |
Start fluent query builder |
find(type=None, **kwargs) |
Find first matching node |
find_all(type=None, limit=None, **kwargs) |
Find all matching nodes |
find_related(node_id, relationship=None) |
Find related nodes |
find_blocking(node_id) |
Find nodes this blocks |
find_blocked_by(node_id) |
Find nodes blocking this |
Edge Index¶
| Method | Description |
|---|---|
get_incoming_edges(node_id, relationship=None) |
Edges pointing to node |
get_outgoing_edges(node_id, relationship=None) |
Edges from node |
get_neighbors(node_id) |
All connected node IDs |
Traversal¶
| Method | Description |
|---|---|
ancestors(node_id, relationship, max_depth) |
All ancestor nodes |
descendants(node_id, relationship, max_depth) |
All descendant nodes |
transitive_deps(node_id) |
All transitive dependencies |
shortest_path(from_id, to_id) |
Shortest path between nodes |
all_paths(from_id, to_id, relationship, max_length) |
All paths between nodes |
subgraph(node_ids, include_edges) |
Extract subgraph |
connected_component(node_id, relationship) |
Connected component |
Analysis¶
| Method | Description |
|---|---|
find_bottlenecks() |
Nodes blocking many others |
find_critical_path() |
Longest dependency chain |
dependents(node_id) |
Direct dependents |
See Also¶
- Query Cookbook - Comprehensive query examples
- Migration Guide - Migrating from CSS-only queries
- QueryBuilder - QueryBuilder API reference
- FindAPI - Find API reference