Architecture

One process. One store. Three access surfaces.

MemState runs as a single FastAPI service with an embedded Kuzu graph. All three API surfaces share the same in-process store, so state is immediately consistent across product endpoints, UI tools, and the agent chat loop. Agent paths stay limited to observations and context queries; revise and forget run inside the reasoner, not in the agent integration.


Under the GEM state split M_t = (D_t, S_t, P_t), this architecture keeps one durable D_t in Kuzu, evaluates S_t (semantic / structural / temporal query stages) in the executor over that same store, and applies P_t from the in-process Policies object plus environment settings.

Topology

MemState is deployed as one service. There is no separate worker, no external database tier, and no message queue. Routing, write semantics, background maintenance, and persistence all live in one process.

🧩

Single service

FastAPI + Uvicorn. Launched via python -m memstate.api.cli. One process owns the socket and the graph handle.

🗄

Embedded store

Kuzu is embedded and opened from MEMSTATE_KUZU_PATH. There is no external DB to operate.

↻

In-process maintenance

Reasoner work runs on FastAPI BackgroundTasks — no separate queue or worker.

🔗

Shared singletons

All three API surfaces resolve the same Executor and the same store handle via dependency injection.

Components

Component	Role	Called by
`api.main`	Route layer, health endpoints, auth wiring, router mounting.	HTTP clients
`api.deps`	Cached providers for settings, executor, reasoner, and the shared store.	FastAPI dependency injection
`core.Executor`	Write and read semantics: how observations become graph updates, retrieval stages, salience reinforcement (placement may be explicit on the wire today).	`/v1/ingest`, `/v1/query`
`reasoner.Reasoner`	Background maintenance: revise duplicates, forget by low salience.	Post-response background tasks
`store.GraphStore`	Kuzu facade for topic, field, edge, and vector operations.	Executor, UI API, tool runner
`llm.MemoryToolRunner`	Runs LLM tool calls against the same shared store.	`/api/llm/chat`, MCP server
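
To make the Reasoner row concrete, here is a minimal Python sketch of the two maintenance passes, applied in the order the ingest lifecycle lists them (forget, then revise). Everything in it is an assumption for illustration: the `Topic` and `Policies` shapes, the `forget_below` threshold, and the rule of keeping the most salient topic per normalized title are hypothetical and may differ from MemState's actual heuristics.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Topic:
    id: str
    title: str
    salience: float


@dataclass
class Policies:
    # Hypothetical policy knob: topics below this salience are forgotten.
    forget_below: float = 0.1


def run_forget_low_salience(topics: Dict[str, Topic], policies: Policies) -> Dict[str, Topic]:
    """Return a copy of the topic map without low-salience topics."""
    return {tid: t for tid, t in topics.items() if t.salience >= policies.forget_below}


def run_revise_duplicates(topics: Dict[str, Topic]) -> Dict[str, Topic]:
    """Duplicate-title heuristic (assumed): keep the most salient topic per normalized title."""
    best: Dict[str, Topic] = {}
    for topic in topics.values():
        key = " ".join(topic.title.lower().split())  # case- and whitespace-insensitive
        if key not in best or topic.salience > best[key].salience:
            best[key] = topic
    return {t.id: t for t in best.values()}


def run_maintenance(topics: Dict[str, Topic], policies: Policies) -> Dict[str, Topic]:
    """Forget first, then revise; the input map is left unmodified."""
    return run_revise_duplicates(run_forget_low_salience(topics, policies))
```

Returning new maps instead of mutating in place keeps the sketch easy to test; a real reasoner would issue deletes and merges against the shared store.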

Request lifecycles

Ingest

Client -> FastAPI route -> Executor.ingest
  -> GraphStore writes (topic / fields / edges / embedding)
  -> HTTP response returned
  -> BackgroundTasks callback -> Reasoner.run("ingest_complete")
      -> run_forget_low_salience  (if policy threshold met)
      -> run_revise_duplicates    (duplicate-title heuristic)

Retrieve

Client -> FastAPI route -> Executor.query
  -> embed query text
  -> pick candidate topics (vector KNN or cosine fallback)
  -> optional structural expansion (neighbors + fields)
  -> optional temporal expansion (history when explain=true)
  -> salience bump + topic-history append per touched topic
  -> HTTP response returned
  -> BackgroundTasks callback -> Reasoner.run("query_complete")
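
Two steps of the retrieve flow above lend themselves to a short sketch: the cosine fallback for picking candidate topics, and the salience bump with a history append for each touched topic. This is a pure-Python illustration under assumptions: the function names, the in-memory dictionaries, the `bump` size, and the `"retrieved"` history marker are all hypothetical, and the real executor works against the Kuzu store.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], topic_vecs: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Brute-force fallback: score every topic and keep the k most similar ids."""
    ranked = sorted(topic_vecs, key=lambda tid: cosine(query_vec, topic_vecs[tid]), reverse=True)
    return ranked[:k]


def touch(topic_ids: List[str], salience: Dict[str, float],
          history: Dict[str, List[str]], bump: float = 0.1) -> None:
    """Reinforce salience and record a history entry for each retrieved topic."""
    for tid in topic_ids:
        salience[tid] = salience.get(tid, 0.0) + bump
        history.setdefault(tid, []).append("retrieved")
```

The brute-force scan is linear in the number of topics, which is why it is plausible only as a fallback when a vector index is unavailable.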

Shared-singleton dependency model

Every API surface resolves the Executor through one dependency function. The Executor holds the shared store; the store holds one Kuzu handle per configured database path. That means writes from the chat loop are instantly visible to /v1/query without any cross-process sync.

One dependency cache guarantees every surface uses the same Executor and the same Kuzu handle.
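
One common way to get this behavior is to cache the provider functions, so that every caller receives the same objects. The sketch below uses `functools.lru_cache` and stand-in classes. It illustrates the pattern only and is not the contents of `api.deps`: the class bodies, the `_executor_for` helper, and the default path are assumptions.

```python
import os
from functools import lru_cache
from typing import Dict


class GraphStore:
    """Stand-in for the Kuzu facade; a real one would open the database at `path`."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.topics: Dict[str, str] = {}


class Executor:
    def __init__(self, store: GraphStore) -> None:
        self.store = store


@lru_cache(maxsize=None)
def get_store(path: str) -> GraphStore:
    # Cached per argument, so there is one store handle per database path.
    return GraphStore(path)


@lru_cache(maxsize=None)
def _executor_for(path: str) -> Executor:
    return Executor(get_store(path))


def get_executor() -> Executor:
    # The default path here is a placeholder, not MemState's actual default.
    path = os.environ.get("MEMSTATE_KUZU_PATH", "./memstate-db")
    return _executor_for(path)
```

In a FastAPI app, each route on every surface would declare `Depends(get_executor)`, so a write made through one surface is visible to the others because they hold the same object.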

Error behavior

Validation errors — ValueError becomes HTTP 400 through a global exception handler.
Auth errors — optional API key mismatch becomes HTTP 401 from verify_api_key.
Graph health failures — /health/graph returns 503 with a path and operator hint.
Background failures — a reasoner error does not affect the already-sent ingest or query response.