MemState
Data model
Topics, fields, and links.
MemState stores what the system has learned from agent observations as a graph of topics. Each topic is one self-contained unit of knowledge — its own fields, its own history, its own embedding, and its own links to other topics — shaped by MemState policy, not by hand-authored graph edits in the agent loop.
Memory has three parts
Every MemState deployment keeps three things in sync. Together they define what context retrieval returns and how new observations are absorbed.
Content
The topics themselves — the things the agent remembers, each with its own fields, summary, and embedding.
Structure
Typed links between topics, plus field-level references that let retrieval expand context.
Policies
The rules that govern every change: history limits, salience thresholds, and archival behavior.
Formal state (GEM lens)
MemState is easiest to reason about if you treat it as a concrete Governed Evolving Memory (GEM) for agents: one service absorbs observations, answers context questions, and runs maintenance so the store stays truthful and bounded. At each logical time $t$, write the deployment state as a triple

$$M_t = (D_t, S_t, P_t).$$
- $D_t$: Durable graph. What lives in the embedded Kuzu database: topic nodes (identity, summary, kind, salience, embedding, and fields_json with per-field revision stacks), RELATED edges with a kind string, and archival metadata.
- $S_t$: Retrieval-facing semantics. The executor's view of $D_t$ when answering a query: semantic candidate selection over stored embeddings (which topics to hydrate), optional structural expansion along edges and field references, optional temporal expansion over field histories, and salience bumps on topics that were actually used.
- $P_t$: Policies and limits. Tunable governance in the Policies model (history caps, salience thresholds, scan limits, embedding width) plus environment-driven settings. In code, $P_t$ is the subset of the full research policy lattice that the reference build materializes today (see Limits and configuration).
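The triple can be pictured as plain data. The sketch below is illustrative only: the class names (Policies, TopicNode, DurableGraph, MemoryState), the field names, and the default values are hypothetical placeholders, not the reference build's model. It shows the one property the formalism cares about: $S_t$ is not stored anywhere, and $P_t$ constrains every write to $D_t$ (here, by rejecting embeddings of the wrong width).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policies:
    # P_t: hypothetical names and placeholder values, not the real defaults.
    max_field_history: int = 50        # history cap per field
    archive_below_salience: float = 0.1  # salience threshold for archival
    max_scan_topics: int = 1000        # scan limit for candidate selection
    embedding_dim: int = 4             # embedding width


@dataclass
class TopicNode:
    id: str
    title: str
    embedding: list[float]
    salience: float = 1.0
    archived: bool = False


@dataclass
class DurableGraph:
    # D_t: topic vertices plus RELATED edges stored as (src, dst, kind).
    topics: dict[str, TopicNode] = field(default_factory=dict)
    related: list[tuple[str, str, str]] = field(default_factory=list)


@dataclass
class MemoryState:
    # M_t = (D_t, S_t, P_t). S_t has no storage of its own: it is the
    # query path over `graph`, parameterized by `policies`.
    t: int
    graph: DurableGraph
    policies: Policies

    def put_topic(self, node: TopicNode) -> None:
        """Write one topic, enforcing the embedding width from P_t."""
        if len(node.embedding) != self.policies.embedding_dim:
            raise ValueError(
                f"embedding has {len(node.embedding)} dims, "
                f"policy requires {self.policies.embedding_dim}"
            )
        self.graph.topics[node.id] = node
        self.t += 1  # each accepted write advances logical time
```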
Each topic is one memory unit $u_i$: one vertex carrying its own scalar metadata, one embedding over title and summary text, typed fields whose values are versioned over time, and graph incidence (typed RELATED edges plus an optional ref_topic_id on field revisions). Retrieval returns a bundle for each selected unit so callers do not have to stitch partial rows together.
The triple $M_t = (D_t, S_t, P_t)$ separates what is stored durably ($D_t$), how queries are allowed to read and narrow it ($S_t$), and which thresholds and caps constrain change ($P_t$). In the reference service, $S_t$ is not a second database; it is the query path over the same graph, with explicit stages (semantic, structural, temporal) controlled by the request and by $P_t$.
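To make "one query path, several stages" concrete, here is a minimal in-memory sketch of $S_t$. It is not MemState's executor: the function name `retrieve`, its parameters (`top_k`, `hops`, `bump`), and the use of plain cosine similarity are assumptions for illustration. It covers the semantic stage, structural expansion along RELATED edges only (field references and the temporal stage are omitted), and the salience bump on returned topics, with archived topics never eligible.

```python
import math
from dataclasses import dataclass


@dataclass
class Topic:
    id: str
    embedding: list[float]
    salience: float = 1.0
    archived: bool = False


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(topics, related, query_vec, *, semantic=True, structural=True,
             top_k=2, hops=1, bump=0.05):
    """Hypothetical staged read over one graph.

    `topics` maps id -> Topic; `related` is a list of (src, dst, kind) edges.
    """
    live = {tid: t for tid, t in topics.items() if not t.archived}

    # Semantic stage: rank live topics by similarity to the query embedding.
    if semantic:
        ranked = sorted(live.values(),
                        key=lambda t: cosine(query_vec, t.embedding),
                        reverse=True)
        selected = [t.id for t in ranked[:top_k]]
    else:
        selected = sorted(live)

    # Structural stage: add live neighbours within `hops` edges (either direction).
    if structural:
        frontier = set(selected)
        for _ in range(hops):
            reached = set()
            for src, dst, _kind in related:
                if src in frontier and dst in live:
                    reached.add(dst)
                if dst in frontier and src in live:
                    reached.add(src)
            new = reached - set(selected)
            selected.extend(sorted(new))
            frontier = new

    # Reinforcement: bump salience only on topics that were actually returned.
    for tid in selected:
        topics[tid].salience += bump
    return selected
```

Because each stage is a switch on the same call, turning off `structural` narrows the result to the semantic hits without touching a different store.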
Four operations act on $M_t$: ingest (observations → graph writes), retrieve (natural-language question → structured context from the relevant topics), revise (background consolidation such as duplicate merges), and forget (non-destructive attenuation such as salience-based archival). The developer pages Ingest, Query, Revision, and Forget document each path against the current codebase.
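"Non-destructive" is the key word for forget. The following sketch, with hypothetical function names `forget_pass` and `default_candidates` and a caller-supplied threshold standing in for the salience threshold in $P_t$, shows the idea: low-salience topics are flagged as archived rather than deleted, so they drop out of default candidates but the nodes and their histories remain.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    id: str
    salience: float
    archived: bool = False


def forget_pass(topics: list[Topic], archive_below: float) -> list[str]:
    """Archive, never delete, every live topic whose salience is below the threshold.

    Returns the ids archived during this pass.
    """
    archived_now = []
    for topic in topics:
        if not topic.archived and topic.salience < archive_below:
            topic.archived = True  # flag only; the node itself stays in place
            archived_now.append(topic.id)
    return archived_now


def default_candidates(topics: list[Topic]) -> list[str]:
    """Default retrieval skips archived topics, but they remain stored."""
    return [t.id for t in topics if not t.archived]
```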
Why a graph, and why not just a graph
MemState is built on an embedded property graph (Kuzu) because typed nodes and typed edges are the cleanest way to express what a topic is and how it relates to others. But a plain graph is not enough on its own for agent memory.
What the topic graph is
A topic graph is not a free-form sketch of entities. It is a graph whose vertices
are topics — each vertex already bundles identity, fields (with history), summary, salience,
and embedding — and whose edges are typed links between those bundles. A separate
mechanism, field references (ref_topic_id), points from a field value to
another topic so the pointer travels with the field. Retrieval can walk edges, follow refs, or use
embeddings on the same store to choose candidates when the semantic stage is on.
The topic graph is the property graph formed by topic vertices and RELATED edges, each carrying a kind such as extension or association. Field references add a second way to point at a topic without duplicating its bundle; see Relationships for modeling rules.
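Edges and field references are two routes to the same neighbours. The sketch below collects both for one topic. Several details are assumptions, not documented MemState behavior: the function name `one_hop_neighbors`, the fields_json shape `{field_name: [revision, ...]}` with the newest revision last, treating RELATED edges as traversable in both directions, and consulting only the newest revision's ref_topic_id.

```python
import json


def one_hop_neighbors(topic_id, topics, related):
    """Topics reachable in one step from `topic_id`, tagged with how they were reached.

    `topics` maps id -> {"fields_json": str}; `related` holds (src, dst, kind)
    RELATED edges.
    """
    found = set()

    # Route 1: typed RELATED edges, followed in either direction.
    for src, dst, kind in related:
        if src == topic_id:
            found.add((dst, f"edge:{kind}"))
        elif dst == topic_id:
            found.add((src, f"edge:{kind}"))

    # Route 2: field references. Assumed fields_json shape:
    # {field_name: [revision, ...]}, oldest first.
    fields = json.loads(topics[topic_id]["fields_json"])
    for name, revisions in fields.items():
        ref = revisions[-1].get("ref_topic_id") if revisions else None
        if ref:
            found.add((ref, f"field:{name}"))
    return sorted(found)
```

Tagging each neighbour with its route keeps the distinction the text draws: an edge relates two bundles, while a field reference says which attribute the pointer belongs to.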
| Classical model | What's missing for agent memory |
|---|---|
| Plain property graph | No history on attributes. A change overwrites the old value instead of recording when it changed. |
| Vector database | Ranks by similarity only. Has no concept of freshness, supersession, or structural neighbors. |
| Document store | Every write tends to overwrite the whole document. Cross-document relationships are flattened out. |
| Temporal database | Keeps past states but doesn't know when a past value was corrected versus merely changed. |
MemState adds three things on top of the graph substrate:
- Topic nodes carry full context. Title, summary, fields, embedding, salience, and links all live on one node. One read returns the full picture of one thing.
- Every field keeps its history. When a value changes, the previous value is preserved with its timestamp and source, so retrieval can distinguish the current truth from what used to be true.
- Embeddings live on the topic node. The same record feeds optional semantic candidate selection and graph-backed expansion — no drift between a separate vector service and the graph store.
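Field history is the piece a plain property graph lacks, so it is worth seeing in isolation. This is a minimal sketch under stated assumptions: the class names `Revision` and `VersionedField`, the `max_history` parameter (standing in for the history cap in Policies), and the convention that timestamps are UTC ISO-8601 strings appended in time order are all hypothetical, not MemState's storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Revision:
    value: Any
    at: str       # ISO-8601 timestamp, UTC
    source: str   # e.g. the observation that produced this value


@dataclass
class VersionedField:
    history: list[Revision] = field(default_factory=list)  # oldest first

    def set(self, value: Any, source: str, at: Optional[str] = None,
            max_history: Optional[int] = None) -> None:
        """Append a revision instead of overwriting; optionally cap the stack."""
        at = at or datetime.now(timezone.utc).isoformat()
        self.history.append(Revision(value, at, source))
        if max_history is not None and len(self.history) > max_history:
            # Drop the oldest revisions beyond the cap.
            del self.history[: len(self.history) - max_history]

    @property
    def current(self) -> Any:
        return self.history[-1].value if self.history else None

    def value_at(self, when: str) -> Any:
        """Value in effect at `when`. Assumes time-ordered appends and one shared
        UTC ISO format, so plain string comparison orders timestamps correctly."""
        result = None
        for rev in self.history:
            if rev.at <= when:
                result = rev.value
        return result
```

For example, after setting an owner to "Ada" in January and "Grace" in March, `current` returns "Grace" while `value_at` for a February timestamp still returns "Ada".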
Anatomy of a topic
A topic is the boundary of what MemState stores as one indexed, versioned unit. Its identity, versioned fields, importance (salience), embedding, and links all live on one node, so everything you need to know about one subject is in one place.
Topic versus entity: where to draw the line
An entity is anything the agent talks about — a person, a paper, a task, a concept. A topic is how the agent stores one of those as a first-class unit of knowledge. Not every entity needs to be its own topic.
| Situation | What to do |
|---|---|
| The entity is small, local to one topic's narrative, and not reused elsewhere. | Keep it inside the parent topic as a field value (string, list, or JSON). |
| The entity grows complex, starts being referenced from other topics, or needs its own history. | Promote it to its own topic. Link from the parent with an extension or association edge, or from a field with a reference. |
| Two things share real-world meaning but have independent lifecycles. | Keep them as separate topics and connect them with an association link. |
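Promotion from the second row of the table can be sketched as a small transformation. Everything here is hypothetical: the function `promote_field`, the dict layout of topics, and the choice to record the promotion as a new revision on the parent field (carrying ref_topic_id) so the inline value's history is preserved rather than overwritten. The sketch only handles JSON-object values.

```python
from typing import Any


def promote_field(topics: dict[str, dict[str, Any]], parent_id: str,
                  field_name: str, new_id: str, title: str) -> dict[str, Any]:
    """Move an inline JSON-object field value into its own topic and leave a
    field reference behind on the parent."""
    history = topics[parent_id]["fields"][field_name]  # revisions, oldest first
    inline = history[-1]["value"]
    if not isinstance(inline, dict):
        raise ValueError("this sketch only promotes JSON-object values")

    # New topic: each key of the inline object becomes its own versioned field.
    topics[new_id] = {
        "id": new_id,
        "title": title,
        "fields": {k: [{"value": v, "source": "promotion"}] for k, v in inline.items()},
    }

    # Parent keeps its history; the newest revision now points at the new topic.
    history.append({"value": title, "ref_topic_id": new_id, "source": "promotion"})
    return topics[new_id]
```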
Topic record
The physical shape of a topic node on disk.
| Property | Type | Role |
|---|---|---|
| id | STRING (primary key) | Stable identity used by APIs, links, and field references. |
| title, summary | STRING | Human-readable metadata and the default source text for the embedding. |
| topic_kind | STRING | Optional classification label for filtering or workflow grouping. |
| salience, failed_salience | DOUBLE | Importance and failure signals used by retrieval reinforcement and forgetting. |
| fields_json | STRING (JSON) | Typed field map with per-field history and optional field-level references. |
| topic_history_json | STRING (JSON) | Append-only audit trail of topic-level events. |
| embedding | DOUBLE[] | Semantic search vector. |
| archived | BOOLEAN | Excludes the topic from default retrieval candidates without deleting it. |
| created_at, updated_at | STRING (ISO timestamp) | Temporal metadata. |
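Because fields_json and topic_history_json are stored as JSON strings, application code typically decodes them on read and re-encodes them on write. The sketch below mirrors the columns listed above in a Python dataclass; the class name `TopicRecord`, the decoded attribute names `fields` and `topic_history`, and the `to_row`/`from_row` helpers are hypothetical, and no Kuzu client call is shown.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class TopicRecord:
    id: str
    title: str
    summary: str
    topic_kind: Optional[str]
    salience: float
    failed_salience: float
    fields: dict[str, Any]        # decoded form of fields_json
    topic_history: list[Any]      # decoded form of topic_history_json
    embedding: list[float]
    archived: bool
    created_at: str               # ISO timestamp
    updated_at: str               # ISO timestamp

    def to_row(self) -> dict[str, Any]:
        """Flatten to the stored column layout: JSON structures become strings."""
        row = asdict(self)
        row["fields_json"] = json.dumps(row.pop("fields"), sort_keys=True)
        row["topic_history_json"] = json.dumps(row.pop("topic_history"))
        return row

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> "TopicRecord":
        """Inverse of to_row: decode the two JSON string columns."""
        data = dict(row)
        data["fields"] = json.loads(data.pop("fields_json"))
        data["topic_history"] = json.loads(data.pop("topic_history_json"))
        return cls(**data)
```

Keeping salience, archived, and the timestamps as typed attributes while the domain attributes stay inside one JSON column matches the split described in the modeling principles.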
Modeling principles
Use scalar columns for hot paths
Properties you filter, tune, or maintain on (salience, archived, timestamps) stay as top-level columns.
Use fields for domain attributes
Anything domain-specific that evolves — owner, status, location, metrics — belongs in fields_json with full history.
Use links for relationships
Typed edges for structural relationships; field-level references for attribute-scoped pointers.
Start simple, split later
Keep small entities inside a parent topic. Only promote them when they earn standalone identity.