What It Does
Most AI agents forget everything between conversations. Long-Term Memory gives agents a durable, structured knowledge system that persists across sessions, grows through use, and maintains itself automatically.
Memories are not chat logs. They are durable knowledge — facts, rules, instructions, and observations — stored in a knowledge graph with typed relationships, vector embeddings, and three temperature tiers optimized for different access patterns.
- **Hot:** Always-loaded index. A bounded table of contents injected into every agent context. The agent always knows what it remembers.
- **Warm:** On-demand recall. LLM-selected retrieval with graph traversal. Full content loaded only when relevant to the current conversation.
- **Cold:** Archival search. Stale observations and pruned memories remain searchable via vector similarity but are never bulk-loaded.
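The tier split above can be sketched as a small data model. This is an illustrative sketch, not the actual implementation; all names (`Tier`, `Memory`, `hot_index`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    HOT = "hot"    # always-loaded index entry
    WARM = "warm"  # full content loaded on demand
    COLD = "cold"  # archival; reachable only via vector search

@dataclass
class Memory:
    title: str
    content: str
    tier: Tier = Tier.WARM

def hot_index(memories):
    """Only hot-tier titles are injected into every agent context."""
    return [m.title for m in memories if m.tier is Tier.HOT]

mems = [
    Memory("deploy policy", "Always run CI before deploying.", Tier.HOT),
    Memory("q3 db snapshot", "Orders table had 1.2M rows.", Tier.COLD),
]
```

The key property: the agent's context cost is bounded by the hot tier alone, no matter how large the warm and cold tiers grow.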
Memory Taxonomy
Not all memories are equal. Four types ensure semantic consistency and enable intelligent consolidation.
| Type | What It Stores | Persistence |
|---|---|---|
| Fact | Verified truths, domain knowledge, system behaviors, established states. Other memories link to Facts. | Long-lived. Rarely pruned. |
| Rule | Business rules, constraints, policies, conventions, conditions that govern decisions. | Long-lived. Relationship-rich. |
| Instruction | User directives, preferences, standing orders, explicit guidance from the user. | Persists until explicitly revoked or superseded. |
| Observation | Patterns noticed, inferences, situational context, unverified assessments. | Candidates for promotion to Fact or pruning as stale. |
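The persistence column of the table maps directly to pruning behavior. A minimal sketch, with hypothetical names (`MemoryType`, `prune_candidate`):

```python
from enum import Enum

class MemoryType(Enum):
    FACT = "fact"
    RULE = "rule"
    INSTRUCTION = "instruction"
    OBSERVATION = "observation"

def prune_candidate(mem_type: MemoryType, revoked: bool = False) -> bool:
    """Facts and Rules are long-lived; Instructions persist until revoked;
    Observations are the usual candidates for staleness pruning."""
    if mem_type is MemoryType.INSTRUCTION:
        return revoked
    return mem_type is MemoryType.OBSERVATION
```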
The Retrieval Pipeline
Recall is a three-stage pipeline that minimizes latency and token cost while maximizing relevance.
Stage 1: Header Scan
A lightweight query retrieves metadata only — titles, types, descriptions, timestamps — for up to 200 memories. No content or embeddings loaded. This gives the system a fast manifest of everything the agent knows.
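A metadata-only scan like this can be sketched with SQLite. The schema and column names here are assumptions for illustration; the point is that `content` and `embedding` are stored but never selected in Stage 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, title TEXT, type TEXT, "
    "description TEXT, updated_at TEXT, content TEXT, embedding BLOB)"
)
conn.execute(
    "INSERT INTO memories VALUES (1, 'deploy policy', 'rule', "
    "'CI requirements', '2024-06-01', '...long content...', NULL)"
)

HEADER_COLUMNS = ("id", "title", "type", "description", "updated_at")

def header_scan(conn, limit=200):
    """Stage 1: fetch metadata only; content and embedding stay on disk."""
    rows = conn.execute(
        f"SELECT {', '.join(HEADER_COLUMNS)} FROM memories "
        "ORDER BY updated_at DESC LIMIT ?",
        (limit,),
    )
    return [dict(zip(HEADER_COLUMNS, row)) for row in rows]

manifest = header_scan(conn)
```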
Stage 2: LLM Relevance Selection
The manifest is passed to an LLM that selects the top memories relevant to the current conversation. This uses semantic understanding, not just cosine distance — a rare memory with a non-obvious connection to the query can still be selected. Falls back to vector-based ranking if the LLM is unavailable.
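The selection-with-fallback shape can be sketched as follows. The `llm_select` callable stands in for the real LLM call (its signature is an assumption); the fallback is plain cosine ranking over embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_relevant(manifest, query_vec, llm_select=None, k=5):
    """Stage 2: let the LLM pick; fall back to cosine ranking if it fails."""
    if llm_select is not None:
        try:
            return llm_select(manifest, k)  # hypothetical LLM callable
        except Exception:
            pass  # LLM unavailable: degrade gracefully to vector ranking
    ranked = sorted(manifest, key=lambda m: cosine(m["vec"], query_vec), reverse=True)
    return ranked[:k]
```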
Stage 3: Full Content Load + Graph Traversal
Selected memories are loaded with full content. In parallel, graph traversal discovers related entities within configurable hops — surfacing connections the agent didn’t explicitly search for. Freshness warnings are attached to stale or point-in-time memories so the agent knows when to re-verify information.
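The hop-bounded traversal is a breadth-first walk. A sketch, assuming the graph is available as an adjacency dict (the real storage layer is not specified here):

```python
from collections import deque

def related_within_hops(graph, start_ids, max_hops=2):
    """Stage 3 traversal: collect entities reachable within `max_hops`
    edges of the selected memories."""
    seen = set(start_ids)
    frontier = deque((node, 0) for node in start_ids)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget spent along this path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - set(start_ids)
```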
Intelligent Consolidation
Memories accumulate. Without maintenance, the knowledge graph degrades. A four-pass consolidation pipeline keeps it clean.
Pass 1: Deduplication
Finds near-duplicate memories by vector distance, keeps the newer entity, and merges content from the older one using LLM synthesis. Knowledge compounds instead of duplicating.
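The keep-newer, merge-older pass can be sketched like this. `distance` and `synthesize` (the LLM merge step) are assumed callables, and the threshold is illustrative.

```python
def deduplicate(memories, distance, synthesize, threshold=0.1):
    """Pass 1: absorb each near-duplicate into its newer counterpart."""
    kept = []
    for mem in sorted(memories, key=lambda m: m["created"]):  # oldest first
        dup = next(
            (k for k in kept if distance(k["vec"], mem["vec"]) < threshold), None
        )
        if dup is not None:
            # Merge the older entity's content into the newer one.
            mem["content"] = synthesize(dup["content"], mem["content"])
            kept.remove(dup)  # the newer entity survives
        kept.append(mem)
    return kept
```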
Pass 2: Staleness Pruning
Identifies memories older than the threshold that are not in the hot index. An LLM confirms before archival. Demotes to cold, never hard deletes. Point-in-time memories are pruned more aggressively.
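A sketch of the demotion pass. `confirm_stale` stands in for the LLM confirmation; the 90-day threshold and the tighter limit for point-in-time memories are assumptions, not documented values.

```python
from datetime import datetime, timedelta, timezone

def staleness_pass(memories, hot_ids, confirm_stale, max_age=timedelta(days=90)):
    """Pass 2: demote old, non-hot memories to cold; never hard-delete."""
    now = datetime.now(timezone.utc)
    for mem in memories:
        if mem["id"] in hot_ids:
            continue  # hot-index entries are exempt
        # Point-in-time snapshots go stale faster (factor is illustrative).
        limit = max_age / 3 if mem.get("point_in_time") else max_age
        if now - mem["updated_at"] > limit and confirm_stale(mem):
            mem["tier"] = "cold"  # demoted, but still vector-searchable
    return memories
```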
Pass 3: Contradiction Resolution
Analyzes recent memories for incompatible claims. Uses a preference hierarchy (Facts > Observations, recent > old) to demote the stale memory. Prevents the agent from holding conflicting knowledge.
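The preference hierarchy can be expressed as a rank tuple: type first, recency as tiebreaker. The text only specifies Facts > Observations and recent > old; the other positions in the rank table are assumptions for illustration.

```python
# Hypothetical rank table; only fact > observation is specified by the design.
TYPE_RANK = {"fact": 3, "rule": 3, "instruction": 2, "observation": 1}

def resolve_contradiction(a, b):
    """Pass 3: keep the memory favored by type rank, then recency;
    demote the loser to cold rather than deleting it."""
    def key(m):
        return (TYPE_RANK.get(m["type"], 0), m["updated_at"])
    winner, loser = (a, b) if key(a) >= key(b) else (b, a)
    loser["tier"] = "cold"
    return winner
```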
Pass 4: Index Truncation
Enforces dual caps on the hot index: maximum entry count and maximum token budget. Evicts oldest entries when either limit is exceeded. Keeps the always-loaded context tight.
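Dual-cap eviction is a simple loop: while either limit is exceeded, drop the oldest entry. The caps and the `len`-based token counter below are illustrative defaults, not documented values.

```python
def truncate_index(entries, max_entries=50, max_tokens=2000, count_tokens=len):
    """Pass 4: evict oldest hot-index entries while either cap is exceeded."""
    entries = sorted(entries, key=lambda e: e["updated_at"])  # oldest first

    def total_tokens():
        return sum(count_tokens(e["title"]) for e in entries)

    while entries and (len(entries) > max_entries or total_tokens() > max_tokens):
        entries.pop(0)  # evict the oldest entry
    return entries
```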
Key Capabilities
Entity Matching on Save
When a new memory is saved, the system checks for existing entities with similar content. If a match is found, knowledge is merged using LLM synthesis instead of creating a duplicate.
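The save path mirrors the deduplication pass, applied at write time. A sketch with assumed callables for `distance` and the LLM-backed `synthesize` step:

```python
def save_memory(store, new_mem, distance, synthesize, threshold=0.1):
    """On save: merge into an existing near-match instead of duplicating."""
    match = next(
        (m for m in store if distance(m["vec"], new_mem["vec"]) < threshold), None
    )
    if match is not None:
        match["content"] = synthesize(match["content"], new_mem["content"])
        return match  # existing entity, enriched
    store.append(new_mem)
    return new_mem
```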
Synthetic Imagining
Beyond reactive search. Analyzes the full conversation to generate diverse queries that proactively discover memories the agent should know about — before the user asks.
Automatic Extraction
During message compaction, durable knowledge is automatically extracted from conversation history before messages are summarized away. Nothing valuable is lost.
Point-in-Time Snapshots
Special mode for agents with live data sources. Database snapshots and current statuses go stale quickly, so snapshot memories are pruned aggressively and always carry freshness warnings.
Knowledge Graph Storage
Memories are concept nodes in a graph with typed, weighted relationships. Agents understand how their knowledge connects, not just what individual facts say.
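Typed, weighted edges can be modeled minimally like this. The relationship names and node identifiers are illustrative, not the system's actual vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    rel: str       # typed relationship, e.g. "supports" or "derived_from"
    weight: float  # relationship strength

def neighbors(edges, node, rel=None):
    """Answer 'how does this memory connect?', optionally filtered by type."""
    return [e for e in edges if e.src == node and (rel is None or e.rel == rel)]

edges = [
    Edge("obs:slow-ci", "fact:ci-runner-specs", "derived_from", 0.8),
    Edge("obs:slow-ci", "rule:deploy-policy", "supports", 0.5),
]
```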
Agent Isolation
Each agent maintains a completely separate knowledge graph. Memories are scoped by agent handle and never leaked between agents, even on the same host.
Interested in Long-Term Memory?
Long-Term Memory is experimental and actively evolving. If you’re building agents that need to grow their knowledge over time, we’d love to talk.