sovereign ai agents

Causal Graphs as the Missing Layer: Bridging Context Graphs, Decision Traces, and Semantic Spacetime

VP(WP) January 12, 2026

The convergence of three architectural patterns reveals the next evolution in AI memory systems: causal knowledge graphs (prioritizing cause-effect relationships), context graphs (capturing decision provenance), and semantic spacetime (modeling temporal-relational knowledge). Recent research from Luo et al. (2025) reports that filtering knowledge graphs to emphasize causal edges yields precision gains of up to 10 percentage points on medical reasoning tasks. When combined with Foundation Capital’s context graph thesis and the temporal-relational modeling of semantic spacetime, a clear architecture emerges for building AI systems that don’t just retrieve facts: they trace why decisions happened and how knowledge flows through time.

The Core Problem — Correlation Masquerading as Causation

What Traditional Knowledge Graphs Get Wrong

Knowledge graphs excel at modeling what exists and how things relate, but they fundamentally conflate two distinct relationship types:

When an LLM retrieves from a traditional knowledge graph, it receives a massive subgraph mixing both types. The retrieval system cannot distinguish between:

Gene_Y --ASSOCIATED_WITH--> Disease_X
Gene_Y --CAUSES--> Protein_Dysfunction --LEADS_TO--> Disease_X

The first edge is correlation noise. The second is a causal mechanism. But standard graph traversal treats both as equally valid paths.
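To make the problem concrete, here is a minimal sketch in plain Python. The edge list mirrors the example above; the `CAUSAL_RELATIONS` set and the `find_paths` helper are illustrative assumptions, not part of any particular graph library.

```python
from collections import defaultdict

# Toy edge list mirroring the example above (illustrative data only).
EDGES = [
    ("Gene_Y", "ASSOCIATED_WITH", "Disease_X"),
    ("Gene_Y", "CAUSES", "Protein_Dysfunction"),
    ("Protein_Dysfunction", "LEADS_TO", "Disease_X"),
]

# Assumption: the set of relation labels treated as causal is decided up front.
CAUSAL_RELATIONS = {"CAUSES", "LEADS_TO"}


def find_paths(edges, source, target, allowed=None, max_hops=3):
    """Enumerate simple paths from source to target as lists of triples.

    If `allowed` is given, only edges whose relation is in that set are followed.
    """
    adjacency = defaultdict(list)
    for u, r, v in edges:
        if allowed is None or r in allowed:
            adjacency[u].append((r, v))

    paths = []

    def walk(node, visited, trail):
        if node == target and trail:
            paths.append(list(trail))
            return
        if len(trail) >= max_hops:
            return
        for r, nxt in adjacency[node]:
            if nxt not in visited:
                trail.append((node, r, nxt))
                walk(nxt, visited | {nxt}, trail)
                trail.pop()

    walk(source, {source}, [])
    return paths


# Standard traversal returns the correlational shortcut alongside the mechanism.
all_paths = find_paths(EDGES, "Gene_Y", "Disease_X")
# Restricting traversal to causal relations keeps only the mechanism.
causal_paths = find_paths(EDGES, "Gene_Y", "Disease_X", allowed=CAUSAL_RELATIONS)
```

Unfiltered traversal hands the LLM both paths with no signal about which one explains the disease; the filtered call returns only the two-hop mechanism.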

The Medical Reasoning Benchmark Reveals the Gap

Luo et al. tested this hypothesis on MedMCQA and MedQA datasets using SemMedDB (a medical knowledge graph with 94+ million edges). Their key insight:

When you filter a knowledge graph to retain only edges with explicit causal significance, then align retrieval with the LLM’s chain-of-thought reasoning steps, accuracy improves by up to 10 percentage points.

Their CGMT (Causal Graphs Meet Thoughts) pipeline works in three stages:

1. Causal Subgraph Construction
   └── Scan KG, score edges by causality function f(r)
   └── Discard edges where f(r) < threshold θ
   └── Result: GC = filtered graph with only cause-effect edges

2. CoT-Driven Stepwise Retrieval
   └── LLM generates chain-of-thought: "S₁ → S₂ → S₃"
   └── For each step Si, extract entities E(Si)
   └── Query GC for paths connecting E(Si) to E(Si+1)
   └── Fallback to full KG if no causal path exists

3. Path Scoring & Re-injection
   └── Score paths: α·CUI_overlap + β·semantic_overlap + γ·length_penalty
   └── Merge similar paths, prune loops
   └── Re-inject top paths + original CoT into LLM for synthesis
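The "merge similar paths, prune loops" step in stage 3 can be sketched as follows. This is one plausible reading, not the paper's implementation: here "similar" is assumed to mean "visits the same node sequence", and the helper names `has_loop` and `merge_and_prune` are hypothetical.

```python
def node_sequence(path):
    """Nodes visited by a path given as a list of (source, relation, target) triples."""
    return (path[0][0],) + tuple(target for _, _, target in path)


def has_loop(path):
    """A path loops if it visits any node more than once."""
    nodes = node_sequence(path)
    return len(nodes) != len(set(nodes))


def merge_and_prune(paths):
    """Drop looping paths and keep one path per distinct node sequence.

    Assumption: two paths count as "similar" when they visit the same nodes
    in the same order, regardless of which relation labels connect them.
    """
    seen = set()
    kept = []
    for path in paths:
        if not path or has_loop(path):
            continue
        signature = node_sequence(path)
        if signature not in seen:
            seen.add(signature)
            kept.append(path)
    return kept


candidates = [
    [("A", "CAUSES", "B"), ("B", "CAUSES", "C")],
    [("A", "INDUCES", "B"), ("B", "RESULTS_IN", "C")],  # Same node sequence
    [("A", "CAUSES", "B"), ("B", "CAUSES", "A")],        # Loops back to A
]
kept = merge_and_prune(candidates)
```

Only the first candidate survives: the second duplicates its node sequence and the third revisits its starting node.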

Critical result: GPT-4o on MedMCQA achieved 92.90% precision with causal filtering versus 85.52% with direct inference — a 7.38 point gain from graph structure alone.

Why This Matters for Context Graphs

Foundation Capital’s “context graph” thesis argues that enterprises lose decision context — the reasoning chains that connect data to actions. When a sales agent approves a 20% discount (violating policy), the CRM stores the outcome but discards:

The causal graph research provides the filtering mechanism that context graphs need. Not all edges in a decision trace have equal explanatory power. The challenge is identifying which edges represent cause (input that drove the decision) versus correlation (data that happened to be present).

Decision Traces Are Reified Causal Chains

Reification: Making Reasoning Visible

TrustGraph’s Daniel Davis correctly argues that “decision trace” is a misnomer — computers don’t “decide” in the epistemological sense. What these systems actually capture is reification: representing statements about statements.

In RDF 1.2 syntax (December 2024 release):

<< :Agent_Sales_001 :approved :Discount_20pct >>
    :timestamp "2025-01-12T14:32Z" ;
    :causedBy :ChurnRisk_High ;
    :overridesPolicy :MaxDiscount_10pct ;
    :authorizedBy :VP_Sales .

This is a second-order statement — a claim about a claim. The discount approval (first-order) is wrapped in metadata explaining why it occurred (second-order causation).
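The same idea can be shown without an RDF store. The sketch below is a plain-Python illustration under stated assumptions: the `Triple` class and the `about` helper are hypothetical names, and the identifiers are copied from the discount example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    subject: Union["Triple", str]  # A triple may itself be the subject
    predicate: str
    obj: str


# First-order statement: what happened.
approval = Triple(":Agent_Sales_001", ":approved", ":Discount_20pct")

# Second-order statements: claims about the approval itself.
annotations = [
    Triple(approval, ":causedBy", ":ChurnRisk_High"),
    Triple(approval, ":overridesPolicy", ":MaxDiscount_10pct"),
    Triple(approval, ":authorizedBy", ":VP_Sales"),
]


def about(statement, predicate, store):
    """Return the objects of every annotation on `statement` with `predicate`."""
    return [t.obj for t in store if t.subject == statement and t.predicate == predicate]


reasons = about(approval, ":causedBy", annotations)
```

Because the dataclass is frozen, a triple is hashable and comparable by value, which is what lets it stand in the subject position of another triple.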

Connecting to W3C PROV Ontology

The W3C PROV-O standard (2013) already provides the formal framework:

:discount_approval a prov:Activity ;
    prov:used :customer_incident_history ;
    prov:used :churn_risk_score ;
    prov:wasAssociatedWith :agent_001 ;
    prov:wasInfluencedBy :policy_exception_rule .

:customer_incident_history a prov:Entity ;
    prov:wasGeneratedBy :pagerduty_query ;
    prov:wasDerivedFrom :sev1_incident_001, :sev1_incident_002, :sev1_incident_003 .

The causal graph filtering from Luo et al. maps directly to PROV’s wasInfluencedBy relationships—these are directional, explanatory edges rather than mere associations.

The Epistemology Connection

For AI systems to produce justified beliefs (not just correlated outputs), they need:

The CGMT pipeline provides #1 through edge filtering. Context graphs add #2 through bi-temporal modeling. Your epistemology layer work provides #3 through belief strength propagation.

Semantic Spacetime as the Unifying Framework

The Four Fundamental Relations Revisited

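Your Semantic Spacetime framework defines four primitive relationships that all knowledge graphs ultimately reduce to: NEAR/SIMILAR_TO (correlation space), LEADS_TO (causal and temporal chains), CONTAINS (hierarchical structure), and EXPRESSES_PROPERTY (attribution).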

Key insight: Causal edges are a specialized form of LEADS_TO with mechanistic grounding.

When Luo et al. filter for causality using Causality(r) = f(r), they're implicitly scoring how strongly an edge belongs to the LEADS_TO category versus NEAR/SIMILAR_TO.

Compare these edge types:

# NEAR/SIMILAR_TO (correlation, low causal weight)
Gene_Y --CO_OCCURS_WITH--> Disease_X    f(r) = 0.2

# LEADS_TO with weak causal mechanism
Gene_Y --ASSOCIATED_WITH--> Disease_X    f(r) = 0.4

# LEADS_TO with strong causal mechanism
Gene_Y --CAUSES--> Pathway_Z --RESULTS_IN--> Disease_X    f(r) = 0.9

The causality function f(r) is essentially measuring directional strength of the LEADS_TO relation.

Temporal Validity: When Does Causation Hold?

Context graphs add bi-temporal modeling:

This maps to semantic spacetime’s temporal dimension. A causal edge like:

Smoking --CAUSES--> Lung_Cancer [causal_strength: 0.85]

Actually requires temporal bounds:

<<:smoking :LEADS_TO :lung_cancer>>
    :valid_from "1950-01-01"^^xsd:date ;    # When medical consensus formed
    :causal_strength 0.85 ;
    :mechanism :chronic_inflammation ;
    :latency_period "20-30 years" .

The latency_period annotation is critical — causation in semantic spacetime isn’t instantaneous. LEADS_TO edges have temporal extent.
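One way to put the latency annotation to work is to check whether an observed gap between exposure and outcome falls inside the annotated window. The sketch below assumes the annotation keeps the "N-M years" string form used above; `parse_latency_years` and `within_latency` are hypothetical helper names.

```python
import re
from datetime import date


def parse_latency_years(text):
    """Parse a string like '20-30 years' into (min_years, max_years)."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*years?\s*", text)
    if match is None:
        raise ValueError(f"Unrecognized latency format: {text!r}")
    return int(match.group(1)), int(match.group(2))


def within_latency(exposure_start, outcome_date, latency_text):
    """True if the time from exposure to outcome fits the edge's latency window."""
    low, high = parse_latency_years(latency_text)
    elapsed_years = (outcome_date - exposure_start).days / 365.25
    return low <= elapsed_years <= high


# About 25 years between exposure and outcome: consistent with the edge.
plausible = within_latency(date(1990, 1, 1), date(2015, 1, 1), "20-30 years")
# About 10 years: too early for this LEADS_TO edge to be the explanation.
too_early = within_latency(date(1990, 1, 1), date(2000, 1, 1), "20-30 years")
```

A retrieval layer could use such a check to discard causal paths whose temporal extent does not match the case at hand, rather than treating every LEADS_TO edge as instantly applicable.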

Bringing It Together: The Four-Layer Architecture

Combining these three frameworks yields a unified stack:

┌─────────────────────────────────────────────┐
│   Layer 4: Synthetic Reasoning Layer        │
│   (LLM chain-of-thought outputs)            │
├─────────────────────────────────────────────┤
│   Layer 3: Causal Knowledge Graph           │
│   (Filtered for cause-effect edges)         │
│   • Causality scoring: f(r) ≥ θ             │
│   • Maps to LEADS_TO relations              │
├─────────────────────────────────────────────┤
│   Layer 2: Context Graph / Decision Traces  │
│   (Bi-temporal provenance)                  │
│   • PROV-O: Entity-Activity-Agent           │
│   • Valid time + Transaction time           │
│   • Reified statements about reasoning      │
├─────────────────────────────────────────────┤
│   Layer 1: Semantic Spacetime Foundation    │
│   (Four fundamental relations)              │
│   • NEAR/SIMILAR_TO: Correlation space      │
│   • LEADS_TO: Causal & temporal chains      │
│   • CONTAINS: Hierarchical structure        │
│   • EXPRESSES_PROPERTY: Attribution         │
└─────────────────────────────────────────────┘

Data flow:

Implementation Architecture & Technical Patterns

Causal Edge Scoring in Practice

The Causality(r) function from the paper can be implemented as a relation type hierarchy with pre-assigned weights:

CAUSAL_WEIGHTS = {
    # Strong causal (LEADS_TO with mechanism)
    "CAUSES": 1.0,
    "RESULTS_IN": 0.95,
    "MANIFESTATION_OF": 0.9,
    "INDUCES": 0.85,

    # Moderate causal (LEADS_TO without full mechanism)
    "ASSOCIATED_WITH": 0.4,
    "PREDISPOSES": 0.5,
    "COMPLICATES": 0.45,

    # Weak/correlation (NEAR/SIMILAR_TO)
    "CO_OCCURS_WITH": 0.2,
    "RELATED_TO": 0.15,
    "LOCATION_OF": 0.1,

    # Structural (CONTAINS, EXPRESSES_PROPERTY)
    "HAS_PART": 0.0,  # Not causal
    "PROPERTY_OF": 0.0,
}


def filter_causal_subgraph(kg, threshold=0.5):
    return {
        (u, r, v) for (u, r, v) in kg.edges
        if CAUSAL_WEIGHTS.get(r, 0.0) >= threshold
    }

This trivially maps to Cypher queries in Neo4j:

// Build causal subgraph view
MATCH (u)-[r]->(v)
WHERE r.causal_weight >= 0.5
RETURN u, r, v

CoT-Aligned Retrieval Pattern

The CGMT paper’s stepwise retrieval aligns perfectly with agentic workflows:

import heapq


def cot_causal_retrieval(query, llm, causal_graph, full_kg):
    # parse_cot_steps, extract_entities, find_causal_paths, find_any_paths,
    # score_paths, and serialize_paths are application-specific helpers.

    # Stage 1: Generate chain-of-thought
    cot_prompt = f"Break down this query into reasoning steps:\n{query}"
    cot = llm.generate(cot_prompt)
    steps = parse_cot_steps(cot)  # ["S1", "S2", "S3"]

    # Stage 2: Stepwise entity extraction + path finding
    all_paths = []
    for step_i, step_j in zip(steps[:-1], steps[1:]):
        entities_i = extract_entities(step_i)
        entities_j = extract_entities(step_j)

        # Query causal subgraph first
        paths = find_causal_paths(
            causal_graph,
            source=entities_i,
            target=entities_j,
            max_hops=3,
        )

        # Fallback to full KG if no causal path
        if not paths:
            paths = find_any_paths(full_kg, entities_i, entities_j)

        all_paths.extend(paths)

    # Stage 3: Score and re-inject
    scored_paths = score_paths(all_paths, query)
    top_paths = heapq.nlargest(5, scored_paths, key=lambda p: p.score)

    final_prompt = f"""
    Original query: {query}
    Chain of thought: {cot}
    Relevant knowledge paths: {serialize_paths(top_paths)}

    Synthesize a final answer using only the provided paths as evidence.
    """
    return llm.generate(final_prompt)

Path Scoring with Semantic Spacetime Awareness

The paper’s scoring function:

TotalScore(p) = α·CUI_overlap + β·semantic_overlap + γ·length_penalty

Can be enhanced with relation-type awareness from semantic spacetime:

from datetime import datetime


def score_path_semantic_spacetime(path, query):
    if not path:
        return 0.0  # Guard: the per-edge averages below divide by len(path)

    # Original scoring components
    cui_overlap = compute_cui_overlap(path, query)
    semantic_overlap = compute_embedding_similarity(path, query)
    length_penalty = 1 / (1 + len(path))

    # NEW: Relation type scoring
    relation_score = 0
    for edge in path:
        if edge.relation in ["CAUSES", "RESULTS_IN", "LEADS_TO"]:
            relation_score += edge.causal_weight * 1.0  # Strongly prefer causal
        elif edge.relation in ["CONTAINS", "HAS_PART"]:
            relation_score += 0.3  # Structural context is useful
        elif edge.relation in ["SIMILAR_TO", "RELATED_TO"]:
            relation_score += 0.1  # Correlation is weak evidence
    relation_score /= len(path)  # Normalize

    # NEW: Temporal validity scoring
    temporal_score = 0
    current_time = datetime.now()
    for edge in path:
        if hasattr(edge, 't_valid') and hasattr(edge, 't_invalid'):
            if edge.t_valid <= current_time < edge.t_invalid:
                temporal_score += 1.0  # Edge is currently valid
            else:
                temporal_score += 0.2  # Historical edge, less relevant
        else:
            temporal_score += 0.5  # No temporal bounds = assume valid
    temporal_score /= len(path)

    # Combined scoring with semantic spacetime awareness
    return (
        0.25 * cui_overlap +
        0.20 * semantic_overlap +
        0.15 * length_penalty +
        0.30 * relation_score +   # Prioritize causal chains
        0.10 * temporal_score     # Prefer current knowledge
    )
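A short worked example shows how much the relation term moves the ranking. The weights are copied from the combined scoring function above; the component values for the two paths are hypothetical inputs chosen so that only `relation_score` differs.

```python
# Weights copied from the combined scoring function above.
WEIGHTS = {
    "cui_overlap": 0.25,
    "semantic_overlap": 0.20,
    "length_penalty": 0.15,
    "relation_score": 0.30,
    "temporal_score": 0.10,
}


def combine(components):
    """Weighted sum of the five components, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Two hypothetical two-hop paths with identical overlap, length, and validity.
shared = {
    "cui_overlap": 0.6,
    "semantic_overlap": 0.5,
    "length_penalty": 1 / (1 + 2),
    "temporal_score": 1.0,
}
# CAUSES (1.0) then RESULTS_IN (0.95), averaged over two edges.
causal_path = dict(shared, relation_score=(1.0 + 0.95) / 2)
# Two RELATED_TO edges at 0.1 each, averaged over two edges.
correlational_path = dict(shared, relation_score=0.1)

gap = combine(causal_path) - combine(correlational_path)
```

With everything else equal, the causal path outscores the correlational one by 0.30 × (0.975 − 0.1) = 0.2625, which is larger than the entire contribution available from the length or temporal terms.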

Bi-Temporal Context Graph Schema

Combining PROV-O with bi-temporal modeling:

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TemporalCausalEdge:
    """
    A causal edge with bi-temporal validity tracking.
    Maps to both PROV-O and semantic spacetime frameworks.
    """
    source: str              # Source entity
    target: str              # Target entity
    relation: str            # LEADS_TO, CAUSES, etc.
    causal_weight: float     # Causality strength f(r)

    # Bi-temporal tracking
    t_valid_start: datetime  # When relationship became true
    t_valid_end: datetime    # When relationship ceased being true
    t_transaction: datetime  # When system recorded this edge
    t_expired: datetime      # When record was marked invalid

    # Provenance (PROV-O)
    generated_by: str        # prov:Activity that created this edge
    derived_from: List[str]  # prov:Entity sources
    attributed_to: str       # prov:Agent responsible

    # Semantic spacetime metadata
    mechanism: Optional[str] # Causal mechanism description
    confidence: float        # Belief strength (0-1)
    evidence: List[str]      # Supporting entity IDs


# Query pattern for "what did we know at time T?"
def query_knowledge_at_time(kg, query_entities, as_of_date):
    """
    Reconstruct knowledge state as of historical date.
    Uses transaction time to determine what was recorded by then.
    """
    return [
        edge for edge in kg.edges
        if edge.t_transaction <= as_of_date
        and edge.t_valid_start <= as_of_date < edge.t_valid_end
        and edge.source in query_entities
    ]
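The value of keeping two time axes shows up when a fact is recorded after it became true. The following is a minimal, self-contained sketch: `BiTemporalEdge` is a trimmed-down stand-in for the schema above, and `snapshot`, `OPEN_ENDED`, and the sample edge are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

OPEN_ENDED = datetime.max  # Sentinel for "still valid" / "not yet superseded"


@dataclass(frozen=True)
class BiTemporalEdge:
    source: str
    relation: str
    target: str
    t_valid_start: datetime  # Valid time: when the relationship became true
    t_valid_end: datetime    # Valid time: when it stopped being true
    t_transaction: datetime  # Transaction time: when the system recorded it
    t_expired: datetime      # Transaction time: when the record was superseded


def snapshot(edges, valid_at, known_at):
    """Edges that were true at `valid_at` according to records live at `known_at`."""
    return [
        e for e in edges
        if e.t_valid_start <= valid_at < e.t_valid_end
        and e.t_transaction <= known_at < e.t_expired
    ]


edges = [
    BiTemporalEdge(
        "Customer_A", "HAS_CHURN_RISK", "High",
        t_valid_start=datetime(2025, 1, 1),
        t_valid_end=OPEN_ENDED,
        t_transaction=datetime(2025, 1, 10),  # Recorded nine days late
        t_expired=OPEN_ENDED,
    ),
]

decision_time = datetime(2025, 1, 5)
# What the system had recorded when the decision was made: nothing yet.
known_then = snapshot(edges, valid_at=decision_time, known_at=decision_time)
# What we now know was true at decision time: the risk was already high.
known_now = snapshot(edges, valid_at=decision_time, known_at=datetime(2025, 2, 1))
```

Auditing a decision needs the first query (what the agent could have known); explaining an outcome needs the second (what was actually the case). A single timestamp cannot answer both.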

The Hype vs. Substance Assessment

What’s Genuinely New

What’s Repackaged

What’s Actually Hard

The paper downplays three major challenges:

1. Causal weight estimation

The causality function f(r) requires either:

No perfect solution exists. The paper sidesteps this by using SemMedDB’s pre-existing relation types.

2. CoT instability

The paper acknowledges: “CoT outlines can vary under identical prompts, leading to contradictory intermediate states.” This is a killer problem for production systems. If retrieval depends on CoT parsing, and CoT is stochastic, you get non-deterministic results for the same query.

3. Knowledge graph completeness

The paper admits: “Certain clinically relevant edges may be missing, forcing fallback retrieval from correlation-based links.” In practice, causal subgraphs will have massive coverage gaps. The fallback mechanism undermines the core thesis.
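How much the fallback undermines the thesis is measurable. The sketch below estimates a fallback rate: the share of reasoning-step entity pairs that the causal subgraph alone cannot connect. The `reachable` and `fallback_rate` helpers and the sample data are hypothetical, and the three-hop limit is an assumption borrowed from the retrieval code earlier in the article.

```python
from collections import defaultdict, deque


def reachable(edges, source, target, max_hops=3):
    """Breadth-first check for a directed path of at most max_hops edges."""
    adjacency = defaultdict(set)
    for u, _, v in edges:
        adjacency[u].add(v)

    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, depth = frontier.popleft()
        if node == target and depth > 0:
            return True
        if depth == max_hops:
            continue
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def fallback_rate(entity_pairs, causal_edges, max_hops=3):
    """Fraction of (source, target) pairs with no path in the causal subgraph."""
    if not entity_pairs:
        return 0.0
    misses = sum(
        1 for source, target in entity_pairs
        if not reachable(causal_edges, source, target, max_hops)
    )
    return misses / len(entity_pairs)


causal_edges = [
    ("Gene_Y", "CAUSES", "Protein_Dysfunction"),
    ("Protein_Dysfunction", "LEADS_TO", "Disease_X"),
]
pairs = [("Gene_Y", "Disease_X"), ("Drug_A", "Disease_X")]
rate = fallback_rate(pairs, causal_edges)
```

Tracking this number per domain or per query class would show where the causal subgraph actually carries the answer and where the system is quietly running on correlation-based links.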

Synthesis — A Unified Model

Mapping the Three Frameworks

| Concept | Causal Graphs (Luo et al.) | Context Graphs (Foundation Capital) | Semantic Spacetime (Volodia) |
|---|---|---|---|
| Core Problem | Correlation noise drowns causal signal | Decision context is lost post-hoc | Need temporal-relational primitives |
| Solution | Filter KG for cause-effect edges | Capture provenance at decision time | Four fundamental relation types |
| Data Structure | Weighted directed graph G_C | Bi-temporal triple store | 4D manifold: entities × relations × time × confidence |
| Key Operation | Causality scoring f(r) ≥ θ | Reification: <<S P O>> metadata | Projection onto LEADS_TO subspace |
| Retrieval Pattern | CoT-driven stepwise queries | Query by decision event + time range | Navigate relation-type-filtered paths |
| Epistemology | Causal inference (Pearl) | Provenance (PROV-O) | Justified belief propagation |

The Complete Architecture

User Query
    ↓
┌───────────────────────────────────────┐
│ 1. Chain-of-Thought Generation        │
│    └─ LLM produces reasoning steps    │
└───────────────────────────────────────┘
    ↓
┌───────────────────────────────────────┐
│ 2. Semantic Spacetime Query Planning  │
│    └─ Map CoT to relation types:      │
│       "Why?" → LEADS_TO filter        │
│       "What contains?" → CONTAINS     │
│       "Similar to?" → NEAR/SIMILAR_TO │
└───────────────────────────────────────┘
    ↓
┌───────────────────────────────────────┐
│ 3. Causal Subgraph Retrieval          │
│    └─ Filter: f(r) ≥ threshold        │
│    └─ Find paths connecting CoT steps │
│    └─ Fallback to full KG if needed   │
└───────────────────────────────────────┘
    ↓
┌───────────────────────────────────────┐
│ 4. Context Graph Validation           │
│    └─ Check bi-temporal validity      │
│    └─ Verify provenance chain         │
│    └─ Score by temporal recency       │
└───────────────────────────────────────┘
    ↓
┌───────────────────────────────────────┐
│ 5. Path Scoring & Synthesis           │
│    └─ Multi-factor scoring:           │
│       α·overlap + β·semantic +        │
│       γ·length + δ·causal_weight +    │
│       ε·temporal_validity             │
└───────────────────────────────────────┘
    ↓
┌───────────────────────────────────────┐
│ 6. LLM Re-injection & Final Answer    │
│    └─ Combine: query + CoT + paths    │
│    └─ Generate: justified response    │
│    └─ Annotate: confidence + sources  │
└───────────────────────────────────────┘

Production Implementation Sketch

import heapq
from datetime import datetime


class UnifiedCausalContextGraph:
    """
    Combines causal graph filtering (Luo et al.),
    context graph provenance (Foundation Capital),
    and semantic spacetime relations (Volodia).
    """

    def __init__(self, neo4j_uri, llm):
        # Neo4jGraph is assumed to be a thin client wrapper whose query()
        # accepts a Cypher string plus keyword parameters.
        self.graph = Neo4jGraph(neo4j_uri)
        self.llm = llm

        # Precompute causal subgraph
        self.causal_view = self.graph.query("""
            MATCH (u)-[r]->(v)
            WHERE r.causal_weight >= 0.5
            RETURN u, r, v
        """)

    def query(self, user_query, as_of_time=None):
        # Stage 1: Generate CoT
        cot = self.llm.generate_cot(user_query)
        steps = self.parse_cot(cot)

        # Stage 2: Map CoT steps to semantic spacetime relation types
        relation_filters = []
        for step in steps:
            if "why" in step.lower() or "cause" in step.lower():
                relation_filters.append(["LEADS_TO"])
            elif "what" in step.lower():
                relation_filters.append(["CONTAINS", "EXPRESSES_PROPERTY"])
            else:
                relation_filters.append(None)  # No filter

        # Stage 3: Stepwise causal retrieval
        all_paths = []
        for i, (step_i, step_j) in enumerate(zip(steps[:-1], steps[1:])):
            entities_i = self.extract_entities(step_i)
            entities_j = self.extract_entities(step_j)

            # Build Cypher query with relation type filter. In a
            # variable-length match, r is a list of relationships, so
            # per-edge predicates go through all(...).
            rel_filter = relation_filters[i]
            if rel_filter:
                type_constraint = "AND all(rel IN r WHERE type(rel) IN $rel_types)"
            else:
                type_constraint = ""

            paths = self.graph.query(f"""
                MATCH path = (u)-[r*1..3]->(v)
                WHERE u.id IN $source_ids
                  AND v.id IN $target_ids
                  AND all(rel IN r WHERE rel.causal_weight >= 0.5)
                  {type_constraint}
                RETURN path
                LIMIT 10
            """, source_ids=entities_i, target_ids=entities_j,
                rel_types=rel_filter or [])

            all_paths.extend(paths)

        # Stage 4: Context graph temporal filtering
        if as_of_time:
            all_paths = [
                p for p in all_paths
                if all(
                    edge.t_valid_start <= as_of_time < edge.t_valid_end
                    and edge.t_transaction <= as_of_time
                    for edge in p
                )
            ]

        # Stage 5: Multi-dimensional path scoring
        scored_paths = [
            (path, self.score_path(path, user_query))
            for path in all_paths
        ]
        top_paths = heapq.nlargest(5, scored_paths, key=lambda x: x[1])

        # Stage 6: LLM synthesis with provenance
        context = self.serialize_paths_with_provenance(top_paths)
        final_prompt = f"""
        Query: {user_query}
        Reasoning trace: {cot}
        Supporting evidence: {context}

        Synthesize an answer. For each claim, cite the supporting path ID.
        """

        answer = self.llm.generate(final_prompt)
        return {
            "answer": answer,
            "cot": cot,
            "evidence_paths": top_paths,
            "as_of_time": as_of_time,
        }

    def score_path(self, path, query):
        """Multi-factor scoring per semantic spacetime framework."""
        # Entity overlap (CUI matching)
        entity_score = self.compute_entity_overlap(path, query)

        # Semantic similarity (embedding distance)
        semantic_score = self.compute_semantic_similarity(path, query)

        # Length penalty (prefer shorter paths)
        length_score = 1 / (1 + len(path))

        # Causal weight (prefer LEADS_TO over NEAR/SIMILAR_TO)
        causal_score = sum(e.causal_weight for e in path) / len(path)

        # Temporal validity (prefer current knowledge)
        now = datetime.now()
        temporal_score = sum(
            1.0 if e.t_valid_start <= now < e.t_valid_end else 0.2
            for e in path
        ) / len(path)

        return (
            0.20 * entity_score +
            0.15 * semantic_score +
            0.10 * length_score +
            0.40 * causal_score +      # Highest weight
            0.15 * temporal_score
        )

Open Questions & Research Directions

Automated Causal Weight Estimation

Problem: Manual annotation doesn’t scale. Automated causal discovery (PC algorithm, etc.) assumes:

Medical KGs violate all three. Research direction: Can LLMs reliably score causal strength from relation type + entity context?

# Experiment: LLM-based causal scoring
def llm_estimate_causality(source, relation, target, context):
    prompt = f"""
    Given: {source} --{relation}--> {target}
    Context: {context}

    On a scale 0-1, how strong is the causal relationship?
    0 = Pure correlation/co-occurrence
    1 = Direct mechanistic causation

    Score:
    """
    score = llm.generate(prompt)
    return float(score)

Validation needed: Compare LLM scores to expert-annotated medical literature.
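A first-pass validation could look like the sketch below. It reports two numbers: mean absolute error between LLM and expert scores, and how often both land on the same side of the filtering threshold, since that is what decides whether an edge survives. The function name `compare_scores` and all score values are made-up placeholders for illustration, not measured results.

```python
def compare_scores(llm_scores, expert_scores, threshold=0.5):
    """Compare two {edge_key: score} mappings on their shared keys.

    Returns (mean_absolute_error, threshold_agreement), where agreement is the
    fraction of edges that both sources place on the same side of `threshold`.
    """
    keys = sorted(set(llm_scores) & set(expert_scores))
    if not keys:
        raise ValueError("No overlapping edges to compare")

    mae = sum(abs(llm_scores[k] - expert_scores[k]) for k in keys) / len(keys)
    agreement = sum(
        (llm_scores[k] >= threshold) == (expert_scores[k] >= threshold)
        for k in keys
    ) / len(keys)
    return mae, agreement


# Placeholder values only; real validation needs expert-annotated edges.
llm_scores = {"CAUSES": 0.9, "ASSOCIATED_WITH": 0.6, "CO_OCCURS_WITH": 0.1}
expert_scores = {"CAUSES": 1.0, "ASSOCIATED_WITH": 0.4, "CO_OCCURS_WITH": 0.2}

mae, agreement = compare_scores(llm_scores, expert_scores)
```

In this toy input the average error is small, yet one of three edges flips across the threshold. Threshold agreement is the metric that tracks downstream retrieval behavior, so a low error alone would be a misleading success signal.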

CoT Stabilization for Deterministic Retrieval

Problem: Stochastic CoT → non-deterministic retrieval → unreliable production systems.

Potential solutions:

Research direction: Benchmark CoT variance across different models and prompt strategies.
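One simple variance measure is the mean pairwise Jaccard similarity of the entity sets extracted from repeated CoT runs on the same prompt. This is a sketch of one possible metric, not an established benchmark; the helper names and the sample runs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cot_stability(entity_sets):
    """Mean pairwise Jaccard similarity across runs (1.0 = fully stable)."""
    pairs = list(combinations(entity_sets, 2))
    if not pairs:
        return 1.0  # Zero or one run: nothing to disagree with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Entities extracted from three hypothetical runs of the same prompt.
runs = [
    {"smoking", "chronic_inflammation", "lung_cancer"},
    {"smoking", "chronic_inflammation", "lung_cancer"},
    {"smoking", "lung_cancer"},  # This run skipped the mechanism step
]
stability = cot_stability(runs)
```

Since retrieval is keyed on the extracted entities, a stability score well below 1.0 means the same query will hit different parts of the graph from run to run.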

Ontology Alignment: Semantic Spacetime → Domain KGs

Problem: Medical KGs use domain relations (TREATS, DIAGNOSES). How do these map to the four semantic spacetime primitives?

TREATS: Drug → Disease
  → LEADS_TO? (Drug causes symptom reduction)
  → NEAR/SIMILAR_TO? (Drug and disease co-occur in treatment contexts)

DIAGNOSES: Symptom → Disease
  → LEADS_TO? (Symptom is caused by disease)
  → EXPRESSES_PROPERTY? (Symptom is a manifestation of disease)

Research direction: Build explicit mapping functions from domain ontologies to semantic spacetime.
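A mapping function does not have to resolve the ambiguity to be useful; it can surface it. The sketch below keeps every candidate primitive for a domain relation and reports whether the mapping is resolved. The table entries for TREATS and DIAGNOSES restate the open questions above, HAS_PART follows the structural grouping used in the weight table, and the names `CANDIDATE_PRIMITIVES` and `map_relation` are hypothetical.

```python
# Candidate mappings, not settled answers. Order carries no preference.
CANDIDATE_PRIMITIVES = {
    "TREATS": ["LEADS_TO", "NEAR/SIMILAR_TO"],
    "DIAGNOSES": ["LEADS_TO", "EXPRESSES_PROPERTY"],
    "HAS_PART": ["CONTAINS"],
}


def map_relation(domain_relation):
    """Return (candidates, status) for a domain relation.

    status is "resolved" for exactly one candidate, "ambiguous" for several,
    and "unmapped" when the relation is not in the table.
    """
    candidates = CANDIDATE_PRIMITIVES.get(domain_relation, [])
    if not candidates:
        status = "unmapped"
    elif len(candidates) == 1:
        status = "resolved"
    else:
        status = "ambiguous"
    return candidates, status


treats = map_relation("TREATS")
has_part = map_relation("HAS_PART")
unknown = map_relation("INTERACTS_WITH")
```

Keeping ambiguous and unmapped relations explicit gives a worklist for ontology review instead of silently forcing every domain relation into a single primitive.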

Provenance Chain Compression

Problem: Reifying every decision edge creates graph explosion. A single agent action might generate 100+ provenance triples.

Example:

:action_123 a prov:Activity ;
    prov:used :input_1, :input_2, ..., :input_50 ;
    prov:wasAssociatedWith :agent_X ;
    prov:wasInfluencedBy :rule_A, :rule_B, ..., :rule_Z .

Research direction: Develop provenance summarization techniques that preserve causal chain fidelity while reducing storage overhead.
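One naive summarization strategy is to collapse large fan-outs into a single collection node and keep the members in side storage, similar in spirit to PROV-O's prov:Collection and prov:hadMember. The sketch below is an assumption-laden illustration: `summarize_provenance`, the `max_objects` cutoff, and the collection ID scheme are all hypothetical.

```python
from collections import defaultdict


def summarize_provenance(triples, max_objects=5):
    """Collapse (subject, predicate) groups with many objects into one node.

    Returns (summary_triples, members), where `members` maps each generated
    collection ID to the objects it replaced, so nothing is discarded.
    """
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[(subject, predicate)].append(obj)

    summary, members = [], {}
    for (subject, predicate), objects in grouped.items():
        if len(objects) <= max_objects:
            summary.extend((subject, predicate, obj) for obj in objects)
        else:
            collection_id = f"{subject}/{predicate}/members"
            members[collection_id] = objects
            summary.append((subject, predicate, collection_id))
    return summary, members


triples = [(":action_123", "prov:used", f":input_{i}") for i in range(1, 51)]
triples.append((":action_123", "prov:wasAssociatedWith", ":agent_X"))

summary, members = summarize_provenance(triples)
```

Here 51 triples shrink to 2 in the hot graph while the 50 inputs remain recoverable through the collection ID. Whether this preserves enough causal fidelity depends on how often queries need an individual input rather than the group, which is the open question.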

Multi-Agent Causal Attribution

Problem: In agent collaboration, decisions emerge from interaction. How do you attribute causality when multiple agents contribute?

Agent_A suggests Action_X (confidence: 0.7)
Agent_B critiques (confidence: 0.4)
Agent_C approves modified Action_X' (confidence: 0.9)

Which edge is causal?

Research direction: Extend PROV-O with multi-agent attribution patterns.

Conclusion: The Path Forward

The convergence of causal graphs, context graphs, and semantic spacetime reveals a coherent architecture for next-generation AI memory systems:

For practitioners:

The honest assessment: This is solid integration engineering, not revolutionary invention. The academic novelty is validating that causal prioritization + CoT alignment improves accuracy. The engineering novelty is packaging three established patterns (temporal KGs, provenance tracking, causal inference) into a coherent stack optimized for LLM retrieval.

The “trillion-dollar opportunity” framing is venture theater. The real value is making provenance infrastructure accessible for the agentic era — ensuring AI systems don’t just answer questions but can justify their reasoning with auditable causal chains.