MemState

Data model

Topics, fields, and links.

MemState stores what the system has learned from agent observations as a graph of topics. Each topic is one self-contained unit of knowledge — its own fields, its own history, its own embedding, and its own links to other topics — shaped by MemState policy, not by hand-authored graph edits in the agent loop.

Memory has three parts

Every MemState deployment keeps three things in sync. Together they define what context retrieval returns and how new observations are absorbed.

🗂

Content

The topics themselves — the things the agent remembers, each with its own fields, summary, and embedding.

🔗

Structure

Typed links between topics, plus field-level references that let retrieval expand context.

⚙

Policies

The rules that govern every change: history limits, salience thresholds, and archival behavior.

Formal state (GEM lens)

MemState is easiest to reason about if you treat it as a concrete Governed Evolving Memory (GEM) for agents: one service absorbs observations, answers context questions, and runs maintenance so the store stays truthful and bounded. At each logical time t, write the deployment state as a triple M_t = (D_t, S_t, P_t).

D_t — Durable graph. What lives in the embedded Kuzu database: topic nodes (identity, summary, kind, salience, embedding, fields_json with per-field revision stacks), RELATED edges with a kind string, and archival metadata.
S_t — Retrieval-facing semantics. The executor’s view of D_t when answering a query: semantic candidate selection over stored embeddings (which topics to hydrate), optional structural expansion along edges and field references, optional temporal expansion over field histories, and salience bumps on topics that were actually used.
P_t — Policies and limits. Tunable governance in the Policies model (history caps, salience thresholds, scan limits, embedding width) plus environment-driven settings. In code, P_t is the subset of the full research policy lattice that the reference build materializes today (see Limits and configuration).
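The triple can be pictured as plain data structures. The following is a minimal sketch, not the reference implementation: the class names, the policy field names, and their default values are all assumptions chosen for illustration. It shows S_t as a read path over D_t that is bounded by P_t rather than as a separate store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Policies:
    """P_t: governance knobs. Field names and defaults are assumptions."""
    max_field_history: int = 20
    archive_salience_threshold: float = 0.1
    scan_limit: int = 500
    embedding_width: int = 384


@dataclass
class DurableGraph:
    """D_t: topic rows keyed by id, plus RELATED edges as (src, dst, kind)."""
    topics: Dict[str, dict] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class MemoryState:
    """M_t = (D_t, S_t, P_t), with S_t expressed as a read path over D_t."""
    durable: DurableGraph
    policies: Policies

    def retrieval_view(self, include_archived: bool = False) -> List[dict]:
        """S_t: filter and bound what a query may read; no second store."""
        visible = [
            topic
            for topic in self.durable.topics.values()
            if include_archived or not topic.get("archived", False)
        ]
        return visible[: self.policies.scan_limit]
```

Keeping the retrieval view as a method on the same object mirrors the idea that queries read the durable graph directly, with policy limits applied on the way out.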

Definition 1. Semantic unit (topic). A topic is an indexed semantic unit u_i: one vertex carrying its own scalar metadata, one embedding over title and summary text, typed fields whose values are versioned over time, and graph incidence (typed RELATED edges plus optional ref_topic_id on field revisions). Retrieval returns a bundle for each selected unit so callers do not have to stitch partial rows.

Definition 2. Memory state decomposition. The triple M_t = (D_t, S_t, P_t) separates what is stored durably (D_t), how queries are allowed to read and narrow it (S_t), and which thresholds and caps constrain change (P_t). In the reference service, S_t is not a second database; it is the query path over the same graph with explicit stages (semantic, structural, temporal) controlled by the request and by P_t.

Definition 3. Operators on memory. Four operators advance M_t: ingest (observations → graph writes), retrieve (natural-language question → structured context from the relevant topics), revise (background consolidation such as duplicate merges), and forget (non-destructive attenuation such as salience-based archival). The developer pages Ingest, Query, Revision, and Forget document each path against the current codebase.
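As an illustration of one operator, the sketch below shows what non-destructive forgetting could look like over in-memory topic rows. The function name `forget`, the `archive_below` parameter, and the dict-based row shape are hypothetical; only the `salience`, `archived`, and `id` property names come from the topic record.

```python
from typing import Dict, List


def forget(topics: List[Dict], archive_below: float) -> List[str]:
    """Archive low-salience topics in place and return the ids that changed.

    Nothing is deleted: the row stays in the graph with archived=True, so it
    only drops out of default retrieval candidates.
    """
    newly_archived = []
    for topic in topics:
        if not topic["archived"] and topic["salience"] < archive_below:
            topic["archived"] = True
            newly_archived.append(topic["id"])
    return newly_archived
```

Because the operator only flips a flag, running it twice with the same threshold changes nothing the second time.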

Why a graph, and why not just a graph

MemState is built on an embedded property graph (Kuzu) because typed nodes and typed edges are the cleanest way to express what a topic is and how it relates to others. But a plain graph is not enough on its own for agent memory.

What the topic graph is

A topic graph is not a free-form sketch of entities. It is a graph whose vertices are topics — each vertex already bundles identity, fields (with history), summary, salience, and embedding — and whose edges are typed links between those bundles. A separate mechanism, field references (ref_topic_id), points from a field value to another topic so the pointer travels with the field. Retrieval can walk edges, follow refs, or use embeddings on the same store to choose candidates when the semantic stage is on.

The topic graph is the property graph formed by topic vertices and RELATED edges, each edge carrying a kind such as extension or association. Field references add a second way to point at a topic without duplicating the bundle; see Relationships for modeling rules.
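The two pointer mechanisms can be combined in a single expansion step. The sketch below is an assumption-heavy illustration, not the executor's actual logic: the function name `expand_one_hop` is hypothetical, edges are treated as traversable in both directions, and `fields_json` is assumed to map each field name to a list of revisions with the newest last.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def expand_one_hop(
    seed_ids: Iterable[str],
    edges: List[Tuple[str, str, str]],
    topics: Dict[str, dict],
) -> Set[str]:
    """Return seeds plus topics one hop away via RELATED edges or field refs."""
    seeds = set(seed_ids)
    reached = set(seeds)

    # Walk RELATED edges in either direction from any seed (assumed behavior).
    for src, dst, _kind in edges:
        if src in seeds:
            reached.add(dst)
        if dst in seeds:
            reached.add(src)

    # Follow ref_topic_id on the latest revision of each seed's fields.
    for topic_id in seeds:
        raw = topics.get(topic_id, {}).get("fields_json", "{}")
        for revisions in json.loads(raw).values():
            if revisions and revisions[-1].get("ref_topic_id"):
                reached.add(revisions[-1]["ref_topic_id"])

    return reached
```

For example, a seed topic with an incoming extension edge and an `author` field that references another topic would pull in both neighbors.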

Classical model	What's missing for agent memory
Plain property graph	No history on attributes. A change overwrites the old value instead of recording when it changed.
Vector database	Ranks by similarity only. Has no concept of freshness, supersession, or structural neighbors.
Document store	Every write tends to overwrite the whole document. Cross-document relationships are flattened out.
Temporal database	Keeps past states but cannot distinguish a correction of a past value from an ordinary change.

MemState adds three things on top of the graph substrate:

Topic nodes carry full context. Title, summary, fields, embedding, salience, and links all live on one node. One read returns the full picture of one thing.
Every field keeps its history. When a value changes, the previous value is preserved with its timestamp and source, so retrieval can distinguish the current truth from what used to be true.
Embeddings live on the topic node. The same record feeds optional semantic candidate selection and graph-backed expansion — no drift between a separate vector service and the graph store.
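Per-field history can be sketched as a revision stack. The layout below (a list of revisions per field, newest last, with `value`, `at`, `source`, and `ref_topic_id` keys) and the helper names are assumptions for illustration; the actual shape of `fields_json` may differ. Skipping a write whose value is unchanged is also an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

FieldMap = Dict[str, List[Dict[str, Any]]]


def set_field(
    fields: FieldMap,
    name: str,
    value: Any,
    source: str,
    ref_topic_id: Optional[str] = None,
    max_history: Optional[int] = None,
) -> None:
    """Push a new revision; older revisions stay on the stack."""
    revisions = fields.setdefault(name, [])
    if revisions and revisions[-1]["value"] == value:
        return  # assumed: an unchanged value does not create a revision
    revisions.append({
        "value": value,
        "at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "ref_topic_id": ref_topic_id,
    })
    # Apply a history cap by dropping the oldest revisions first.
    if max_history is not None and len(revisions) > max_history:
        del revisions[: len(revisions) - max_history]


def current(fields: FieldMap, name: str) -> Any:
    """The current truth is the newest revision."""
    return fields[name][-1]["value"]


def previous_values(fields: FieldMap, name: str) -> List[Any]:
    """Everything that used to be true, oldest first."""
    return [rev["value"] for rev in fields[name][:-1]]
```

With this shape, a retrieval layer can return `current(...)` by default and consult `previous_values(...)` only when temporal expansion is requested.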

Anatomy of a topic

A topic is the boundary of what MemState stores as one indexed, versioned unit. Everything you need to know about one subject lives on one node.

A topic carries its identity, its versioned fields, its importance, its embedding, and its links — all on one node.

Topic versus entity: where to draw the line

An entity is anything the agent talks about — a person, a paper, a task, a concept. A topic is how the agent stores one of those as a first-class unit of knowledge. Not every entity needs to be its own topic.

Situation	What to do
The entity is small, local to one topic's narrative, and not reused elsewhere.	Keep it inside the parent topic as a field value (string, list, or JSON).
The entity grows complex, starts being referenced from other topics, or needs its own history.	Promote it to its own topic. Link from the parent with an extension or association edge, or from a field with a reference.
Two things share real-world meaning but have independent lifecycles.	Keep them as separate topics and connect them with an association link.

Topic record

The physical shape of a topic node on disk.

Property	Type	Role
`id`	STRING (primary key)	Stable identity used by APIs, links, and field references.
`title`, `summary`	STRING	Human-readable metadata and the default source text for the embedding.
`topic_kind`	STRING	Optional classification label for filtering or workflow grouping.
`salience`, `failed_salience`	DOUBLE	Importance and failure signals used by retrieval reinforcement and forgetting.
`fields_json`	STRING (JSON)	Typed field map with per-field history and optional field-level references.
`topic_history_json`	STRING (JSON)	Append-only audit trail of topic-level events.
`embedding`	DOUBLE[]	Semantic search vector.
`archived`	BOOLEAN	Excludes the topic from default retrieval candidates without deleting it.
`created_at`, `updated_at`	STRING (ISO timestamp)	Temporal metadata.

There are two distinct histories on a topic. Topic history logs node-level events like creation, salience changes, and embedding refreshes. Field history logs each value change on each field.
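The record and its topic-level history can be sketched as a plain row. The property names below come from the table above; the helper names, the starting salience values, and the event shape (`event` plus `at`) are assumptions, and the sketch builds a dict rather than writing to Kuzu.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import List, Optional


def new_topic_row(title: str, summary: str, embedding: List[float],
                  topic_kind: Optional[str] = None) -> dict:
    """Build a row with every property from the topic record table."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "id": str(uuid.uuid4()),
        "title": title,
        "summary": summary,
        "topic_kind": topic_kind,
        "salience": 1.0,          # assumed starting value
        "failed_salience": 0.0,   # assumed starting value
        "fields_json": json.dumps({}),
        "topic_history_json": json.dumps([{"event": "created", "at": now}]),
        "embedding": list(embedding),
        "archived": False,
        "created_at": now,
        "updated_at": now,
    }


def log_topic_event(row: dict, event: str) -> None:
    """Append a node-level event; field value changes are logged per field."""
    now = datetime.now(timezone.utc).isoformat()
    history = json.loads(row["topic_history_json"])
    history.append({"event": event, "at": now})
    row["topic_history_json"] = json.dumps(history)
    row["updated_at"] = now
```

Calling `log_topic_event(row, "embedding_refreshed")` extends the topic history without touching `fields_json`, which keeps the two histories separate.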

Modeling principles

✓

Use scalar columns for hot paths

Properties you filter, tune, or maintain on (salience, archived, timestamps) stay as top-level columns.

✓

Use fields for domain attributes

Anything domain-specific that evolves — owner, status, location, metrics — belongs in fields_json with full history.

✓

Use links for relationships

Typed edges for structural relationships; field-level references for attribute-scoped pointers.

✓

Start simple, split later

Keep small entities inside a parent topic. Only promote them when they earn standalone identity.
