Experimental

Long-Term Memory

Three-Temperature Knowledge Persistence

Give your agents durable knowledge that persists across conversations. Three temperature tiers, intelligent consolidation, and a knowledge graph that grows and maintains itself over time.

What It Does

Most AI agents forget everything between conversations. Long-Term Memory gives agents a durable, structured knowledge system that persists across sessions, grows through use, and maintains itself automatically.

Memories are not chat logs. They are durable knowledge — facts, rules, instructions, and observations — stored in a knowledge graph with typed relationships, vector embeddings, and three temperature tiers optimized for different access patterns.

Hot

Always-loaded index. A bounded table of contents injected into every agent context. The agent always knows what it remembers.

Warm

On-demand recall. LLM-selected retrieval with graph traversal. Full content loaded only when relevant to the current conversation.

Cold

Archival search. Stale observations and pruned memories remain searchable via vector similarity but are never bulk-loaded.
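The three tiers above differ mainly in their access policy. A minimal sketch, assuming an illustrative `Tier` enum (the names and helper here are not the product's API):

```python
from enum import Enum

class Tier(Enum):
    """Temperature tiers with distinct access patterns (illustrative)."""
    HOT = "hot"    # always loaded: bounded index injected into every context
    WARM = "warm"  # on demand: full content loaded only when relevant
    COLD = "cold"  # archival: vector-searchable, never bulk-loaded

def loaded_by_default(tier: Tier) -> bool:
    # Only the hot index rides along in every agent context.
    return tier is Tier.HOT
```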

Memory Taxonomy

Not all memories are equal. Four types ensure semantic consistency and enable intelligent consolidation.

Type | What It Stores | Persistence
Fact | Verified truths, domain knowledge, system behaviors, established states. Other memories link to Facts. | Long-lived. Rarely pruned.
Rule | Business rules, constraints, policies, conventions, conditions that govern decisions. | Long-lived. Relationship-rich.
Instruction | User directives, preferences, standing orders, explicit guidance from the user. | Persists until explicitly revoked or superseded.
Observation | Patterns noticed, inferences, situational context, unverified assessments. | Candidates for promotion to Fact or pruning as stale.
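The taxonomy above can be sketched as a type enum plus a memory record. Field names and the `pruning_candidate` helper are illustrative assumptions, not the product's schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class MemoryType(Enum):
    FACT = "fact"                # verified truths; rarely pruned
    RULE = "rule"                # constraints and policies; relationship-rich
    INSTRUCTION = "instruction"  # user directives; until revoked or superseded
    OBSERVATION = "observation"  # unverified; promoted to Fact or pruned

@dataclass
class Memory:
    id: str
    type: MemoryType
    title: str
    content: str
    created_at: float                                # unix timestamp
    embedding: list[float] = field(default_factory=list)
    links: list[str] = field(default_factory=list)   # ids of related memories

def pruning_candidate(m: Memory) -> bool:
    # Only Observations are routinely considered for staleness pruning.
    return m.type is MemoryType.OBSERVATION
```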

The Retrieval Pipeline

Recall is a three-stage pipeline that minimizes latency and token cost while maximizing relevance.

Stage 1: Header Scan

A lightweight query retrieves metadata only — titles, types, descriptions, timestamps — for up to 200 memories. No content or embeddings loaded. This gives the system a fast manifest of everything the agent knows.
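The header scan can be sketched as a projection over a metadata-only query. `fetch_headers` stands in for whatever store query the system actually uses (an assumption):

```python
def header_scan(fetch_headers, limit=200):
    """Stage 1: build a fast manifest -- metadata only, no content or embeddings."""
    manifest = []
    for h in fetch_headers(limit=limit):
        # Just enough for the Stage 2 selector to judge relevance.
        manifest.append({"id": h["id"], "title": h["title"], "type": h["type"],
                         "description": h["description"],
                         "updated_at": h["updated_at"]})
    return manifest
```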

Stage 2: LLM Relevance Selection

The manifest is passed to an LLM that selects the top memories relevant to the current conversation. This uses semantic understanding, not just cosine distance — a rare memory with a non-obvious connection to the query can still be selected. Falls back to vector-based ranking if the LLM is unavailable.
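The LLM-first selection with a vector fallback can be sketched as follows. `llm_select` is a hypothetical callable standing in for the model call; the cosine ranking is the fallback path:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def select_memories(manifest, llm_select, query_embedding, top_k=5):
    """Stage 2: prefer the LLM's semantic pick; fall back to cosine ranking."""
    try:
        return llm_select(manifest, top_k)           # semantic understanding
    except Exception:
        ranked = sorted(manifest, reverse=True,
                        key=lambda m: cosine(query_embedding, m["embedding"]))
        return [m["id"] for m in ranked[:top_k]]     # vector-based fallback
```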

Stage 3: Full Content Load + Graph Traversal

Selected memories are loaded with full content. In parallel, graph traversal discovers related entities within configurable hops — surfacing connections the agent didn’t explicitly search for. Freshness warnings are attached to stale or point-in-time memories so the agent knows when to re-verify information.
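The hop-limited traversal can be sketched as a breadth-first walk. The adjacency-map shape is an assumption about how the graph is exposed:

```python
from collections import deque

def related_within_hops(graph, seeds, max_hops=2):
    """Stage 3 traversal: walk out to max_hops from the selected memories.
    graph: adjacency map {id: [related ids]} (illustrative shape)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    discovered = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # stop expanding at the configured hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                discovered.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return discovered
```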

Intelligent Consolidation

Memories accumulate. Without maintenance, the knowledge graph degrades. A four-pass consolidation pipeline keeps it clean.

Pass 1: Deduplication

Finds near-duplicate memories by vector distance. Keeps the newer entity and merges content from the older one using LLM synthesis. Knowledge compounds, not duplicates.
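A minimal sketch of the deduplication pass, with a plain string-join `merge` standing in for LLM synthesis and a similarity threshold chosen for illustration:

```python
import math

def _cos(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def dedup_pass(memories, threshold=0.95, merge=lambda new, old: new + "\n" + old):
    """Pass 1: collapse near-duplicates. The newer memory survives and absorbs
    the older one's content; `merge` stands in for LLM synthesis."""
    alive = {m["id"]: m for m in memories}
    for i, a in enumerate(memories):
        for b in memories[i + 1:]:
            if (a["id"] in alive and b["id"] in alive
                    and _cos(a["embedding"], b["embedding"]) >= threshold):
                newer, older = (a, b) if a["created_at"] >= b["created_at"] else (b, a)
                newer["content"] = merge(newer["content"], older["content"])
                del alive[older["id"]]  # knowledge compounds instead of duplicating
    return list(alive.values())
```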

Pass 2: Staleness Pruning

Identifies memories older than the threshold that are not in the hot index. LLM confirms before archival. Demotes to cold — never hard deletes. Point-in-time memories pruned more aggressively.
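A sketch of the staleness pass, under stated assumptions: a 90-day default threshold, a 7-day window for point-in-time snapshots, and a `confirm` callable standing in for the LLM confirmation step. None of these values come from the source:

```python
DAY = 86400

def staleness_pass(memories, hot_index_ids, now, max_age_days=90,
                   confirm=lambda m: True):
    """Pass 2: demote stale, non-hot memories to cold -- never hard-delete."""
    for m in memories:
        stale = m["updated_at"] < now - max_age_days * DAY
        if m.get("point_in_time"):
            # Point-in-time snapshots age out much faster (7 days is an assumption).
            stale = m["updated_at"] < now - 7 * DAY
        if stale and m["id"] not in hot_index_ids and confirm(m):
            m["tier"] = "cold"
    return memories
```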

Pass 3: Contradiction Resolution

Analyzes recent memories for incompatible claims. Uses a preference hierarchy (Facts > Observations, recent > old) to demote the stale memory. Prevents the agent from holding conflicting knowledge.
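The preference hierarchy can be sketched as a rank table plus a tiebreak on recency. The source specifies only Facts > Observations and recent > old; the ranks for Rules and Instructions below are assumptions:

```python
# Rank values are assumptions beyond "Facts > Observations".
RANK = {"fact": 3, "rule": 3, "instruction": 2, "observation": 1}

def resolve(a, b):
    """Pass 3: given two memories with incompatible claims, keep one and
    demote the other to cold (never hard-delete)."""
    ra, rb = RANK[a["type"]], RANK[b["type"]]
    if ra != rb:
        keep, lose = (a, b) if ra > rb else (b, a)
    else:
        keep, lose = (a, b) if a["updated_at"] >= b["updated_at"] else (b, a)
    lose["tier"] = "cold"
    return keep
```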

Pass 4: Index Truncation

Enforces dual caps on the hot index: maximum entry count and maximum token budget. Evicts oldest entries when either limit is exceeded. Keeps the always-loaded context tight.
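The dual-cap eviction can be sketched as keeping the newest entries while both limits hold. The cap defaults and token-count field are illustrative, not documented values:

```python
def truncate_hot_index(entries, max_entries=50, max_tokens=2000):
    """Pass 4: keep the newest entries while both caps hold; everything
    older is evicted. Cap values here are illustrative defaults."""
    kept, budget = [], 0
    for e in sorted(entries, key=lambda e: e["updated_at"], reverse=True):
        if len(kept) >= max_entries or budget + e["tokens"] > max_tokens:
            break  # either cap exceeded: evict this entry and all older ones
        kept.append(e)
        budget += e["tokens"]
    return kept
```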

Key Capabilities

Entity Matching on Save

When a new memory is saved, the system checks for existing entities with similar content. If a match is found, knowledge is merged using LLM synthesis instead of creating a duplicate.
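Save-time merging can be sketched as a check-then-merge over existing entities. `similar` and `merge` are hypothetical callables standing in for embedding comparison and LLM synthesis:

```python
def save_memory(store, new, similar, merge=lambda a, b: a + "\n" + b):
    """On save, fold the new memory into a matching existing entity
    instead of creating a duplicate."""
    for existing in store:
        if similar(existing, new):
            existing["content"] = merge(existing["content"], new["content"])
            return existing
    store.append(new)  # no match: a genuinely new entity
    return new
```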

Synthetic Imagining

Beyond reactive search. Analyzes the full conversation to generate diverse queries that proactively discover memories the agent should know about — before the user asks.

Automatic Extraction

During message compaction, durable knowledge is automatically extracted from conversation history before messages are summarized away. Nothing valuable is lost.

Point-in-Time Snapshots

Special mode for agents with live data sources. Database snapshots and current statuses become stale — snapshot memories are aggressively pruned and always carry freshness warnings.

Knowledge Graph Storage

Memories are concept nodes in a graph with typed, weighted relationships. Agents understand how their knowledge connects, not just what individual facts say.
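Typed, weighted relationships can be sketched with a simple adjacency structure. This is an illustrative shape, not the product's storage schema:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Concept nodes joined by typed, weighted edges (illustrative)."""
    def __init__(self):
        self._edges = defaultdict(list)  # node -> [(neighbor, rel_type, weight)]

    def link(self, a, b, rel_type, weight=1.0):
        # Stored symmetrically so traversal works in both directions.
        self._edges[a].append((b, rel_type, weight))
        self._edges[b].append((a, rel_type, weight))

    def neighbors(self, node, min_weight=0.0):
        return [(n, t, w) for n, t, w in self._edges[node] if w >= min_weight]
```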

Agent Isolation

Each agent maintains a completely separate knowledge graph. Memories are scoped by agent handle and never leaked between agents, even on the same host.
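Per-agent scoping can be sketched as a host that hands each agent handle its own isolated store, created on first use (a minimal sketch, not the actual implementation):

```python
class MemoryHost:
    """One knowledge graph per agent handle; no sharing across agents."""
    def __init__(self):
        self._graphs = {}

    def graph_for(self, agent_handle):
        # Created lazily; the same handle always maps to the same graph.
        return self._graphs.setdefault(agent_handle, {})
```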


Interested in Long-Term Memory?

Long-Term Memory is experimental and actively evolving. If you’re building agents that need to grow their knowledge over time, we’d love to talk.