MemState

Data model

Topics, fields, and links.

MemState stores what the system has learned from agent observations as a graph of topics. Each topic is one self-contained unit of knowledge — its own fields, its own history, its own embedding, and its own links to other topics — shaped by MemState policy, not by hand-authored graph edits in the agent loop.

Memory has three parts

Every MemState deployment keeps three things in sync. Together they define what context retrieval returns and how new observations are absorbed.

Content

The topics themselves — the things the agent remembers, each with its own fields, summary, and embedding.

Structure

Typed links between topics, plus field-level references that let retrieval expand context.

Policies

The rules that govern every change: history limits, salience thresholds, and archival behavior.

Formal state (GEM lens)

MemState is easiest to reason about if you treat it as a concrete Governed Evolving Memory (GEM) for agents: one service absorbs observations, answers context questions, and runs maintenance so the store stays truthful and bounded. At each logical time t, write the deployment state as a triple Mt = (Dt, St, Pt).

  • Dt: Durable graph. What lives in the embedded Kuzu database: topic nodes (identity, summary, kind, salience, embedding, fields_json with per-field revision stacks), RELATED edges with a kind string, and archival metadata.
  • St: Retrieval-facing semantics. The executor’s view of Dt when answering a query: semantic candidate selection over stored embeddings (which topics to hydrate), optional structural expansion along edges and field references, optional temporal expansion over field histories, and salience bumps on topics that were actually used.
  • Pt: Policies and limits. Tunable governance in the Policies model (history caps, salience thresholds, scan limits, embedding width) plus environment-driven settings. In code, Pt is the subset of the full research policy lattice that the reference build materializes today (see Limits and configuration).

Definition 1: Semantic unit (topic). A topic is an indexed semantic unit u_i: one vertex carrying its own scalar metadata, one embedding over title and summary text, typed fields whose values are versioned over time, and graph incidence (typed RELATED edges plus optional ref_topic_id on field revisions). Retrieval returns a bundle for each selected unit so callers do not have to stitch partial rows.

Definition 2: Memory state decomposition. The triple Mt = (Dt, St, Pt) separates what is stored durably (Dt), how queries are allowed to read and narrow it (St), and which thresholds and caps constrain change (Pt). In the reference service, St is not a second database; it is the query path over the same graph with explicit stages (semantic, structural, temporal) controlled by the request and by Pt.

Definition 3: Operators on memory. Four operators advance Mt: ingest (observations → graph writes), retrieve (natural-language question → structured context from the relevant topics), revise (background consolidation such as duplicate merges), and forget (non-destructive attenuation such as salience-based archival). The developer pages Ingest, Query, Revision, and Forget document each path against the current codebase.
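
To make the decomposition concrete, here is a minimal in-memory sketch. It is not the reference service's code: the class and attribute names (`Policies`, `Topic`, `MemoryState`, `archive_below`, `max_field_history`) are hypothetical, a keyword match stands in for semantic candidate selection, and the revise operator is left out. Only the split into durable data, a read path, and policies, plus the operator names, comes from the definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Policies:
    """Pt: illustrative thresholds; names and defaults are assumptions."""
    max_field_history: int = 5
    archive_below: float = 1.0


@dataclass
class Topic:
    id: str
    title: str
    salience: float = 5.0
    archived: bool = False


@dataclass
class MemoryState:
    """Mt = (Dt, St, Pt). St is not stored: it is the read path over Dt."""
    topics: dict[str, Topic] = field(default_factory=dict)   # Dt
    policies: Policies = field(default_factory=Policies)     # Pt

    def ingest(self, topic: Topic) -> None:
        """Observation -> graph write (here: upsert one topic)."""
        self.topics[topic.id] = topic

    def retrieve(self, keyword: str) -> list[Topic]:
        """St: query path over Dt; archived topics are skipped by default."""
        return [t for t in self.topics.values()
                if not t.archived and keyword.lower() in t.title.lower()]

    def forget(self) -> list[str]:
        """Non-destructive attenuation: archive low-salience topics, never delete."""
        archived = []
        for t in self.topics.values():
            if not t.archived and t.salience < self.policies.archive_below:
                t.archived = True
                archived.append(t.id)
        return archived


state = MemoryState()
state.ingest(Topic("t1", "Alpha release", salience=6.0))
state.ingest(Topic("t2", "Alpha release notes draft", salience=0.5))
archived_ids = state.forget()
hits = state.retrieve("alpha")
```

After `forget()`, the low-salience topic is still present in `state.topics` but no longer appears in default retrieval, which is the sense in which forgetting is non-destructive.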

Why a graph, and why not just a graph

MemState is built on an embedded property graph (Kuzu) because typed nodes and typed edges are the cleanest way to express what a topic is and how it relates to others. But a plain graph is not enough on its own for agent memory.

What the topic graph is

A topic graph is not a free-form sketch of entities. It is a graph whose vertices are topics — each vertex already bundles identity, fields (with history), summary, salience, and embedding — and whose edges are typed links between those bundles. A separate mechanism, field references (ref_topic_id), points from a field value to another topic so the pointer travels with the field. Retrieval can walk edges, follow refs, or use embeddings on the same store to choose candidates when the semantic stage is on.

Figure: Example topic graph. Four topic nodes (Regression #4412, Alpha release, Sprint board, Acme Corp) connected by one extension edge, one association edge, and one dashed field reference from a bug topic to a release topic. Each vertex is a topic, that is, one indexed unit and one retrieval bundle (fields, history, embedding, salience). Solid arrows are native RELATED edges used for traversal; the dashed arrow is a ref_topic_id pointer on a field value that also resolves to a topic.

The topic graph is the property graph formed by topic vertices and RELATED edges (here: an extension and an association). Field references add a second way to point at a topic without duplicating the bundle; see Relationships for modeling rules.
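
A rough sketch of structural expansion over both link mechanisms follows. The topic ids, the edge layout, the field name `found_in`, and the choice to walk RELATED edges in both directions are assumptions made for illustration; the reference executor may traverse differently.

```python
from collections import deque

# Hypothetical toy graph. Topic ids and the edge layout are illustrative only.
related_edges = [
    ("sprint_board", "alpha_release", "extension"),
    ("alpha_release", "acme_corp", "association"),
]
# Field references: (source topic, field name) -> ref_topic_id
field_refs = {("regression_4412", "found_in"): "alpha_release"}


def expand(seed: str, max_hops: int = 1) -> set[str]:
    """Collect topics reachable from `seed` within `max_hops`, following
    RELATED edges in either direction and outgoing field references."""
    neighbors: dict[str, set[str]] = {}
    for src, dst, _kind in related_edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    for (src, _field), ref_topic_id in field_refs.items():
        neighbors.setdefault(src, set()).add(ref_topic_id)

    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent; do not expand further from here
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen
```

With one hop, the bug topic reaches only the release it points to through its field reference; a second hop pulls in the release's RELATED neighbors.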

| Classical model | What's missing for agent memory |
| --- | --- |
| Plain property graph | No history on attributes. A change overwrites the old value instead of recording when it changed. |
| Vector database | Ranks by similarity only. Has no concept of freshness, supersession, or structural neighbors. |
| Document store | Every write tends to overwrite the whole document. Cross-document relationships are flattened out. |
| Temporal database | Keeps past states but doesn't know when a past value was corrected versus merely changed. |

MemState adds three things on top of the graph substrate:

  1. Topic nodes carry full context. Title, summary, fields, embedding, salience, and links all live on one node. One read returns the full picture of one thing.
  2. Every field keeps its history. When a value changes, the previous value is preserved with its timestamp and source, so retrieval can distinguish the current truth from what used to be true.
  3. Embeddings live on the topic node. The same record feeds optional semantic candidate selection and graph-backed expansion — no drift between a separate vector service and the graph store.
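
The per-field history in item 2 can be pictured as a capped revision stack. This is a sketch under assumptions: `Revision`, `VersionedField`, and `max_history` are hypothetical names, and the actual layout inside fields_json may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    value: object
    timestamp: str   # ISO timestamp supplied by the caller
    source: str      # where the observation came from


@dataclass
class VersionedField:
    """Newest revision is last; older revisions are kept up to `max_history`."""
    max_history: int = 3   # hypothetical cap, standing in for a history-limit policy
    revisions: list[Revision] = field(default_factory=list)

    def update(self, value: object, timestamp: str, source: str) -> None:
        """Append a revision instead of overwriting, then enforce the cap."""
        self.revisions.append(Revision(value, timestamp, source))
        excess = len(self.revisions) - self.max_history
        if excess > 0:
            del self.revisions[:excess]   # drop the oldest entries

    @property
    def current(self) -> object:
        return self.revisions[-1].value

    def previous_values(self) -> list[object]:
        return [r.value for r in self.revisions[:-1]]


owner = VersionedField(max_history=2)
owner.update("alice", "2024-05-01T09:00:00Z", "ingest")
owner.update("bob", "2024-05-02T09:00:00Z", "ingest")
owner.update("carol", "2024-05-03T09:00:00Z", "ingest")
```

Because each revision carries its own timestamp and source, a reader can tell the current value apart from earlier ones without consulting a separate log.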

Anatomy of a topic

A topic is the boundary of what MemState stores as one indexed, versioned unit. Everything you need to know about one subject lives on one node.

Figure: Anatomy of a topic. A topic node holds identity and metadata (id, title, summary, kind, created_at, updated_at, archived), salience (importance score, 0-10), fields (typed values with full history per field), and an embedding (384-dim vector for semantic search). It connects to other topics through an extension link, an association link, or a field reference. The embedding is computed from title + summary as one vector per topic and is searched by KNN or cosine similarity (not per-field, not chunked).

A topic carries its identity, its versioned fields, its importance, its embedding, and its links — all on one node.
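
Since there is one vector per topic, semantic candidate selection reduces to ranking topics by similarity to a query vector. The sketch below uses plain cosine similarity over toy 3-dimensional vectors in place of real 384-dimensional embeddings; the function names and topic ids are hypothetical, and the reference build's search path is not shown here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], topic_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k topics most similar to the query vector."""
    ranked = sorted(topic_vecs,
                    key=lambda tid: cosine(query_vec, topic_vecs[tid]),
                    reverse=True)
    return ranked[:k]


# Toy vectors: one per topic, as if embedded from title + summary.
topic_vecs = {
    "alpha_release": [1.0, 0.0, 0.0],
    "regression_4412": [0.8, 0.6, 0.0],
    "acme_corp": [0.0, 0.0, 1.0],
}
best = top_k([1.0, 0.1, 0.0], topic_vecs, k=2)
```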

Topic versus entity: where to draw the line

An entity is anything the agent talks about — a person, a paper, a task, a concept. A topic is how the agent stores one of those as a first-class unit of knowledge. Not every entity needs to be its own topic.

| Situation | What to do |
| --- | --- |
| The entity is small, local to one topic's narrative, and not reused elsewhere. | Keep it inside the parent topic as a field value (string, list, or JSON). |
| The entity grows complex, starts being referenced from other topics, or needs its own history. | Promote it to its own topic. Link from the parent with an extension or association edge, or from a field with a reference. |
| Two things share real-world meaning but have independent lifecycles. | Keep them as separate topics and connect them with an association link. |
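
Promotion through a field reference might look like the sketch below. The dictionary shape, the `promote_field` helper, and the field name `customer` are all hypothetical; they only illustrate the idea that the parent keeps a readable value while gaining a ref_topic_id pointer to the new topic.

```python
def promote_field(topics: dict, parent_id: str, field_name: str, new_id: str) -> dict:
    """Turn an inline field value into its own topic and leave a reference behind."""
    parent = topics[parent_id]
    inline_value = parent["fields"][field_name]["value"]
    # The new topic starts out with the inline value as its title.
    topics[new_id] = {"title": str(inline_value), "fields": {}}
    # The parent keeps the readable value and gains a pointer to the new topic.
    parent["fields"][field_name] = {"value": inline_value, "ref_topic_id": new_id}
    return topics[new_id]


topics = {
    "regression_4412": {
        "title": "Regression #4412",
        "fields": {"customer": {"value": "Acme Corp"}},
    }
}
promote_field(topics, "regression_4412", "customer", "acme_corp")
```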

Topic record

The physical shape of a topic node on disk.

| Property | Type | Role |
| --- | --- | --- |
| id | STRING (primary key) | Stable identity used by APIs, links, and field references. |
| title, summary | STRING | Human-readable metadata and the default source text for the embedding. |
| topic_kind | STRING | Optional classification label for filtering or workflow grouping. |
| salience, failed_salience | DOUBLE | Importance and failure signals used by retrieval reinforcement and forgetting. |
| fields_json | STRING (JSON) | Typed field map with per-field history and optional field-level references. |
| topic_history_json | STRING (JSON) | Append-only audit trail of topic-level events. |
| embedding | DOUBLE[] | Semantic search vector. |
| archived | BOOLEAN | Excludes the topic from default retrieval candidates without deleting it. |
| created_at, updated_at | STRING (ISO timestamp) | Temporal metadata. |

There are two distinct histories on a topic. Topic history logs node-level events like creation, salience changes, and embedding refreshes. Field history logs each value change on each field.
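
The record and its two histories can be mirrored in a small dataclass. The property names follow the table; the JSON layout inside fields_json and topic_history_json, the default values, and the helper methods `log_event` and `set_field` are assumptions made for this sketch rather than the stored format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TopicRecord:
    id: str
    title: str
    summary: str
    topic_kind: str = ""
    salience: float = 0.0
    failed_salience: float = 0.0
    fields_json: str = "{}"            # field map with per-field history
    topic_history_json: str = "[]"     # append-only topic-level events
    embedding: list[float] = field(default_factory=list)
    archived: bool = False
    created_at: str = ""
    updated_at: str = ""

    def log_event(self, event: str, at: str) -> None:
        """Topic history: node-level events such as creation."""
        history = json.loads(self.topic_history_json)
        history.append({"event": event, "at": at})
        self.topic_history_json = json.dumps(history)
        self.updated_at = at

    def set_field(self, name: str, value: object, at: str, source: str) -> None:
        """Field history: every value change is appended, never overwritten."""
        fields = json.loads(self.fields_json)
        entry = fields.setdefault(name, {"history": []})
        entry["history"].append({"value": value, "at": at, "source": source})
        self.fields_json = json.dumps(fields)
        self.updated_at = at


rec = TopicRecord(id="t1", title="Alpha release", summary="First public build.")
rec.log_event("created", "2024-05-01T09:00:00Z")
rec.set_field("owner", "alice", "2024-05-01T09:05:00Z", "ingest")
rec.set_field("owner", "bob", "2024-05-03T14:00:00Z", "ingest")
```

Note that the two field updates leave the topic history untouched: it still holds only the creation event, while the `owner` field carries both of its values.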

Modeling principles

Use scalar columns for hot paths

Properties that you filter on, tune, or run maintenance against (salience, archived, timestamps) stay as top-level columns.

Use fields for domain attributes

Anything domain-specific that evolves — owner, status, location, metrics — belongs in fields_json with full history.

Use links for relationships

Use typed edges for structural relationships, and field-level references for attribute-scoped pointers.

Start simple, split later

Keep small entities inside a parent topic. Only promote them when they earn standalone identity.
