The AI Memory Revolution
Transform how your AI remembers. Onelist provides the structured, secure, cost-efficient memory layer that Moltbot needs to truly understand and serve you.
curl -sSL https://onelist.my/install.sh | bash -s -- --with-moltbot
Most AI memory implementations are fundamentally broken. Here's what's wrong.
Most AI memory systems dump everything into a single MEMORY.md file or similar flat structure. This approach has critical limitations:
- User's name is Alex
- User prefers dark mode
- User asked about Python yesterday
- User likes coffee
- User mentioned deadline on Friday
- User's timezone is PST
- ... (hundreds more unstructured lines)
When you load the entire memory file into context on every request, costs scale with the size of the file and the number of requests - and the file only grows:
With GPT-4 at $0.03/1K input tokens, a 100KB memory file (roughly 25,000 tokens) costs $0.75 just to read - before generating any response. Over 20 daily interactions, that's $450/month for a single user.
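For illustration, here is the arithmetic behind those numbers as a small Python snippet (the four-characters-per-token ratio is a rough rule of thumb):

price_per_1k_input_tokens = 0.03      # GPT-4 input pricing, USD
memory_file_bytes = 100_000           # ~100KB flat MEMORY.md
tokens = memory_file_bytes / 4        # rough chars-to-tokens estimate -> ~25,000
cost_per_request = tokens / 1000 * price_per_1k_input_tokens   # ~$0.75
monthly_cost = cost_per_request * 20 * 30                      # 20 chats/day, 30 days -> ~$450
print(f"${cost_per_request:.2f} per request, ${monthly_cost:.0f} per month")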
The common solution is to give the LLM a "memory_search" tool, but this approach fails in practice.
When memory files grow too large, systems attempt "compaction" - using an LLM to summarize and reduce content. This process is deeply flawed:
Each compaction cycle loses nuance. After 3-4 cycles, critical details vanish entirely. "User prefers Python for data science, JavaScript for web" becomes "User codes."
LLMs sometimes "helpfully" add inferred information during compaction. These hallucinations become permanent "memories."
Once compacted, original memories are gone. If compaction goes wrong, there's no recovery. Critical context lost forever.
Compaction requires reading the entire memory, processing it with an LLM, and rewriting it. For large memories, this alone costs $5-20.
Human memory naturally forgets irrelevant information. AI memory systems don't:
Without decay, your AI treats a coffee preference mentioned once 2 years ago with the same weight as your current critical project deadline.
Most AI memory implementations store sensitive data in plaintext local files.
When something goes wrong with AI memory, there's no way back.
A fundamentally different approach that solves every problem above.
Every memory is a typed entry with rich metadata, relationships, and representations:
{
  "id": "mem_abc123",
  "entry_type": "memory",
  "title": "User's Python preferences",
  "content": "Uses Python 3.11, prefers type hints...",
  "tags": ["preferences", "programming", "python"],
  "metadata": {
    "importance": 0.85,
    "source": "conversation",
    "confidence": 0.95,
    "last_accessed": "2026-01-28T10:30:00Z"
  },
  "representations": {
    "summary": "Prefers Python 3.11 with type hints",
    "embedding": [0.023, -0.145, ...],
    "structured": {"language": "python", "version": "3.11"}
  },
  "links": ["mem_def456", "mem_ghi789"]
}
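As a rough sketch, assuming a Python client with a memories.create method (the call and its parameters are hypothetical here, shown only to mirror the schema above):

# Hypothetical sketch: creating a typed entry with a Python client.
# client.memories.create and its parameters are assumptions for
# illustration, not a documented Onelist call.
entry = client.memories.create(
    entry_type="memory",
    title="User's Python preferences",
    content="Uses Python 3.11, prefers type hints...",
    tags=["preferences", "programming", "python"],
    metadata={"importance": 0.85, "source": "conversation", "confidence": 0.95},
    links=["mem_def456"],
)
print(entry.id)  # e.g. "mem_abc123"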
Your memories are encrypted before leaving your device.
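The exact scheme isn't spelled out here, but as a minimal sketch of what client-side encryption looks like in general (using the Python cryptography package's Fernet recipe, not Onelist's actual implementation):

# General illustration of client-side encryption, not Onelist's scheme.
from cryptography.fernet import Fernet

key = Fernet.generate_key()              # generated and kept on your device
cipher = Fernet(key)

plaintext = b"Uses Python 3.11, prefers type hints..."
ciphertext = cipher.encrypt(plaintext)   # only the ciphertext leaves the device

# The server stores ciphertext; reading it back requires the local key.
assert cipher.decrypt(ciphertext) == plaintext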
Onelist handles storage, search, and retrieval. Your AI handles intelligence.
┌───────────────────────┐                      ┌───────────────────────┐
│        ONELIST        │◄───── REST API ─────►│        MOLTBOT        │
│                       │                      │                       │
│ - Structured Storage  │                      │ - Understanding       │
│ - Hybrid Search       │                      │ - Reasoning           │
│ - Versioning          │                      │ - Generation          │
│ - Encryption          │                      │ - Context Assembly    │
│ - Decay/Relevance     │                      │ - Tool Use            │
└───────────────────────┘                      └───────────────────────┘
      STORAGE LAYER                                INTELLIGENCE LAYER
This separation means you can upgrade your AI without losing memories, and your memories stay secure even if the AI is compromised.
Every change to every memory is tracked. Roll back any entry to any point in time.
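The versioning endpoints aren't shown on this page; a purely hypothetical sketch of what history and rollback could look like from a Python client:

# Hypothetical sketch only: these version/rollback calls are illustrative
# assumptions, not documented Onelist API methods.
versions = client.memories.versions("mem_abc123")   # full change history
for v in versions:
    print(v.created_at, v.summary)

# Restore the entry to the state it had three versions ago
client.memories.rollback("mem_abc123", version=versions[-3].id)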
Not all memories are equal. Our hierarchy ensures the right context at the right time.
┌───────────────────────────────────────────────────────────┐
│ │
│ TOKEN BUDGET: 1,200 - 3,800 tokens │
│ (vs. unbounded in naive approaches) │
│ │
└───────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────┴─────────────────────────────────┐
│ │
▼ ▼
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ │ │ │
│ LAYER 1: FOUNDATIONAL │ │ LAYER 2: PROFILE │
│ 200-500 tokens │ │ 300-800 tokens │
│ │ │ │
│ Always loaded. Core facts │ │ Topic-based loading │
│ that never change. │ │ Dynamic preferences │
│ │ │ │
│ - Name, timezone │ │ - Communication style │
│ - Critical rules │ │ - Domain expertise │
│ - Key relationships │ │ - Tool preferences │
│ - Absolute truths │ │ - Work patterns │
│ │ │ │
└─────────────────────────────────┘ └─────────────────────────────────┘
│ │
└───────────────────────────┬───────────────────────────────────────┘
│
┌───────────────────────────┴───────────────────────────────────────┐
│ │
▼ ▼
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ │ │ │
│ LAYER 3: EPISODIC │ │ LAYER 4: TASK-SPECIFIC │
│ 500-2000 tokens │ │ 200-500 tokens │
│ │ │ │
│ Recency + Relevance scoring │ │ Generated on-demand │
│ Recent conversations │ │ Temporary, not stored │
│ │ │ │
│ - Last 3-5 interactions │ │ - Synthesized answers │
│ - Active project context │ │ - Computed insights │
│ - Recent decisions │ │ - Search result summaries │
│ - Pending tasks │ │ - Cross-reference data │
│ │ │ │
└─────────────────────────────────┘ └─────────────────────────────────┘
Core facts that define who you are. Always loaded, never filtered.
Dynamic preferences loaded based on detected topic or domain.
Recent context scored by recency and relevance. The "working memory."
Generated on-demand for the current task. Not persisted.
Every memory is stored in multiple formats, optimized for different uses.
Markdown: full content with formatting
Summary: condensed key points
Title: just the title/headline
Embedding: vector for semantic search
Structured: JSON key-value pairs
Load summaries by default. Fetch full content only when explicitly needed.
# Loads 50,000 tokens
memories = client.memories.list(
    representation="markdown",  # Full content!
    limit=100
)
# Cost: $0.50 just to load context
# Latency: 2-3 seconds
# Result: Context window exhausted
# Loads 5,000 tokens
memories = client.memories.list(
    representation="summary",  # Summaries only!
    limit=100
)

# Expand only what's needed
for memory in memories:
    if needs_detail(memory):
        full = client.memories.get(
            memory.id,
            representation="markdown"
        )
Memories are retrieved before the LLM is invoked. No tool calls, no extra round trips.
PROACTIVE PRE-LOADING PIPELINE
==============================
User Input LLM Response
│ ▲
▼ │
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ │ │ │ │ │ │ │
│ STEP 1 │───►│ STEP 2 │───►│ STEP 3 │───►│ STEP 4 │─┘
│ │ │ │ │ │ │ │
│ Intent │ │ Parallel │ │ Context │ │ LLM │
│ Classify │ │ Memory │ │ Assembly │ │ Invocation │
│ │ │ Retrieval │ │ + Budget │ │ │
│ ~50ms │ │ ~100-150ms │ │ ~50ms │ │ ~500-2000ms │
│ │ │ │ │ │ │ │
│ Fast model │ │ Parallel │ │ Token │ │ Full model │
│ or rules │ │ queries │ │ counting │ │ with rich │
│ │ │ │ │ │ │ context │
└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘
│
▼
┌─────────────────────────┐
│ │
│ ONELIST API │
│ │
│ • Foundational Layer │◄─── Always loaded
│ • Profile Layer │◄─── Topic-matched
│ • Episodic Layer │◄─── Relevance-scored
│ • Task Layer │◄─── Query-synthesized
│ │
└─────────────────────────┘
TOTAL LATENCY: 200-400ms
(vs. 2-5 seconds with tool-based retrieval)
Fast, cheap classification of user intent before the expensive LLM call.
classify_intent(user_message)
# Returns: {
# "domain": "programming",
# "topics": ["python", "testing"],
# "intent": "question",
# "urgency": "normal"
# }
Uses a lightweight model (e.g., GPT-3.5-turbo) or a rule-based classifier. Cost: ~$0.0001 per classification.
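A minimal sketch of the rule-based path (the keyword lists and scoring here are illustrative assumptions, not Onelist's actual rules):

# Illustrative rule-based intent classifier. Keyword lists are made up
# for the sketch; a real deployment would use richer rules or a small model.
TOPIC_KEYWORDS = {
    "programming": ["python", "testing", "bug", "deploy", "refactor"],
    "scheduling": ["meeting", "deadline", "calendar", "reschedule"],
}

def classify_intent(user_message: str) -> dict:
    text = user_message.lower()
    matches = {
        domain: [kw for kw in keywords if kw in text]
        for domain, keywords in TOPIC_KEYWORDS.items()
    }
    matches = {domain: kws for domain, kws in matches.items() if kws}
    domain = next(iter(matches), "general")
    return {
        "domain": domain,
        "topics": matches.get(domain, []),
        "intent": "question" if "?" in user_message else "statement",
        "urgency": "high" if any(w in text for w in ("urgent", "asap")) else "normal",
    }

classify_intent("Help me with Python testing?")
# {"domain": "programming", "topics": ["python", "testing"],
#  "intent": "question", "urgency": "normal"}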
Multiple queries executed in parallel, not sequentially.
await Promise.all([
getFoundational(), // Always
getProfile("python"), // Topic-matched
getEpisodic(query), // Relevance-scored
synthesizeTask(query) // On-demand
])
Parallel execution means 4 queries complete in 150ms, not 600ms.
Assemble context within strict token budget. Prioritize by layer.
budget = TokenBudget(max=3500)
budget.add(foundational, priority=1)
budget.add(profile, priority=2)
budget.add(episodic, priority=3)
budget.add(task_specific, priority=4)
# Automatically truncates lowest priority if budget exceeded
Full LLM called with complete, relevant context already assembled.
response = await llm.complete(
    system=system_prompt,
    context=assembled_context,  # Pre-loaded!
    user=user_message,
    # No memory_search tool needed
    # No multi-turn tool calls
    # Single, fast completion
)
Pre-loading eliminates the memory_search tool call, the extra model round trips it requires, and the 2-5 seconds of tool-based retrieval latency.
Memories that matter surface. Stale memories fade. Just like human memory.
How recently was this memory created or updated?
How often is this memory retrieved?
How connected is this to other memories?
Has the user or AI marked it as important?
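As a rough sketch of how those four signals could roll up into a single relevance score (the weights and the 30-day half-life are illustrative assumptions, not Onelist's actual formula):

from datetime import datetime, timezone

# Illustrative relevance score combining recency, frequency, connectivity,
# and importance. Weights and half-life are assumptions for this sketch.
def relevance(entry, now=None):
    now = now or datetime.now(timezone.utc)
    age_days = (now - entry["last_accessed"]).days

    recency = 0.5 ** (age_days / 30)                    # halves every 30 days
    frequency = min(entry["access_count"] / 20, 1.0)    # saturates at 20 retrievals
    connectivity = min(len(entry["links"]) / 10, 1.0)   # saturates at 10 links
    importance = entry["importance"]                    # explicit 0..1 signal

    return 0.35 * recency + 0.20 * frequency + 0.15 * connectivity + 0.30 * importance

# A coffee preference mentioned once two years ago scores far below an
# important, frequently touched project deadline.
old_note = {"last_accessed": datetime(2024, 2, 1, tzinfo=timezone.utc),
            "access_count": 1, "links": [], "importance": 0.2}
print(round(relevance(old_note), 3))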
10x reduction in LLM costs. 98.5% storage reduction via compaction.
Naive memory approach vs. optimized memory system
# Configure per-layer token budgets
config = MoltbotConfig(
    token_budgets={
        "foundational": 500,   # Core facts, always loaded
        "profile": 800,        # Topic-matched preferences
        "episodic": 2000,      # Recent context
        "task": 500,           # On-demand synthesis
    },
    total_budget=3800,         # Hard ceiling
    overflow_strategy="truncate_lowest_priority",
)

# Budget is automatically enforced
context = await moltbot.get_context(
    user_message="Help me with Python testing",
    budget=config.token_budgets,
)
# context.token_count guaranteed <= 3800
# Lowest-priority items truncated first
Embeddings generated in batches during off-peak hours. Up to 80% cost reduction vs. real-time generation.
Embeddings cached and reused. Only regenerated when content changes significantly.
Use cheaper embedding models for low-importance memories. Premium embeddings only for high-value content.
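A minimal sketch of the caching idea, keyed on a content hash (exact-match caching for illustration; "changes significantly" suggests the real system uses a fuzzier threshold):

import hashlib

# Illustrative content-hash cache: embeddings are only regenerated when the
# text changes. embed() stands in for whatever embedding provider is configured.
_embedding_cache: dict[str, list[float]] = {}

def get_embedding(text: str, embed) -> list[float]:
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed(text)   # pay only for new or changed content
    return _embedding_cache[key]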
Run Onelist alongside Moltbot with bidirectional file sync.
# Download and configure
curl -sSL https://onelist.my/install/docker-compose.yml -o docker-compose.yml
curl -sSL https://onelist.my/install/.env.example -o .env
nano .env  # Set POSTGRES_PASSWORD, SECRET_KEY_BASE, MOLTBOT_MEMORY_DIR

# Start everything
docker compose up -d

# Verify
curl http://localhost:4000/health
# Full installation with Moltbot integration
curl -sSL https://onelist.my/install.sh | bash -s -- \
  --postgres-port 5433 \
  --onelist-port 4000 \
  --with-moltbot \
  --enable-web \
  --no-agents
Installs PostgreSQL 16 with pgvector, Erlang/Elixir, and Onelist, and creates systemd services.
# In your .env file:
MOLTBOT_MEMORY_DIR=/home/user/moltbot/memory   # Point to your files

# Optional components (all default to false for minimal install)
ENABLE_WEB=true        # Enable Onelist Web UI
ENABLE_AGENTS=false    # Enable background agents
ENABLE_RIVER=false     # Enable River AI assistant

# If enabling agents
OPENAI_API_KEY=sk-...  # Or other LLM provider
Key guarantee: Files are ALWAYS the source of truth. If Onelist fails or is removed, Moltbot works exactly as before using native file operations.
Your VPS can run full Onelist alongside Moltbot. Get Onelist Web + River agent without the $20/mo Web Access fee.
Use Free tier for local-only, or add Cloud Sync ($3/mo) for backup. See Pricing →
Join thousands of developers building smarter AI with Onelist.