Give Moltbot superpowered memory with one command

Moltbot Integration

The AI Memory Revolution

Transform how your AI remembers. Onelist provides the structured, secure, cost-efficient memory layer that Moltbot needs to truly understand and serve you.

10x
Cost Reduction
4-Layer
Memory Hierarchy
98.5%
Storage Reduction
E2EE
End-to-End Encrypted
# Full Stack installation - files always stay source of truth
curl -sSL https://onelist.my/install.sh | bash -s -- --with-moltbot
The Problem

Why Current AI Memory Solutions Fail

Most AI memory implementations are fundamentally broken. Here's what's wrong.

Flat Storage in Markdown Files

Most AI memory systems dump everything into a single MEMORY.md file or similar flat structure. This approach has critical limitations:

No hierarchy: All memories treated equally, regardless of importance
No relationships: Memories exist in isolation, can't reference each other
No types: Facts, preferences, and events mixed together
Linear growth: File grows unbounded until unusable
# Typical MEMORY.md - everything in one flat file
- User's name is Alex
- User prefers dark mode
- User asked about Python yesterday
- User likes coffee
- User mentioned deadline on Friday
- User's timezone is PST
- ... (hundreds more unstructured lines)

Token Inefficiency - The $11 "Hi" Problem

When you load an entire memory file into context for every request, you pay to re-read the whole memory on every single query, and costs balloon:

50,000
Tokens loaded per request
$0.50+
Cost per simple query
$11.00
Just to say "Hi" with full context

With GPT-4 at $0.03/1K input tokens, a 100KB memory file (roughly 25,000 tokens) costs $0.75 just to read - before generating any response. Over 20 daily interactions, that's $450/month for a single user.
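
The arithmetic is easy to verify; a quick sketch, assuming ~4 characters per token and the GPT-4 input price quoted above:

# Rough cost of re-reading a flat memory file on every request.
# Assumes ~4 characters per token and GPT-4 input at $0.03 per 1K tokens.
MEMORY_BYTES = 100_000        # ~100KB MEMORY.md
PRICE_PER_1K = 0.03           # USD per 1K input tokens
REQUESTS_PER_DAY = 20

tokens = MEMORY_BYTES / 4                          # ~25,000 tokens
per_request = tokens / 1000 * PRICE_PER_1K         # ~$0.75
per_month = per_request * REQUESTS_PER_DAY * 30    # ~$450

print(f"{tokens:,.0f} tokens -> ${per_request:.2f}/request, ${per_month:.0f}/month")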

LLMs Fail to Use memory_search Tools

The common solution is to give LLMs a "memory_search" tool. But this approach fails in practice:

Tool call latency: Each tool call adds 500-2000ms latency. Multiple searches = unusable delays.
Wrong queries: LLMs generate poor search queries. "What does user prefer?" instead of specific terms.
Skipped searches: LLMs often skip searching entirely, preferring to "wing it" with incomplete context.
Missing connections: LLMs can't know what they don't know. They won't search for context they're unaware of.

Compaction Failures

When memory files grow too large, systems attempt "compaction" - using an LLM to summarize and reduce content. This process is deeply flawed:

Information Loss

Each compaction cycle loses nuance. After 3-4 cycles, critical details vanish entirely. "User prefers Python for data science, JavaScript for web" becomes "User codes."

Hallucination Injection

LLMs sometimes "helpfully" add inferred information during compaction. These hallucinations become permanent "memories."

No Undo

Once compacted, original memories are gone. If compaction goes wrong, there's no recovery. Critical context lost forever.

Expensive Process

Compaction requires reading entire memory, processing with LLM, and rewriting. For large memories, this alone costs $5-20.

No Decay Mechanism

Human memory naturally forgets irrelevant information. AI memory systems don't:

Without decay, your AI treats a coffee preference mentioned once 2 years ago with the same weight as your current critical project deadline.

Old, irrelevant facts persist
Context window polluted
Important items buried

Security Vulnerabilities

Most AI memory implementations store sensitive data in plaintext local files:

Unencrypted Storage

  • MEMORY.md readable by any process
  • Indexed by Spotlight/search
  • Backed up to cloud unencrypted
  • Visible in file sync clients

Attack Surface

  • Malware can read/modify memories
  • Prompt injection via memory file
  • No access control or audit log
  • Shared system = shared memories

No Versioning

When something goes wrong with AI memory, there's no way back:

  • AI writes incorrect information - no undo, manual repair required
  • Compaction destroys data - no recovery possible
  • File corruption - start from scratch
  • No audit trail of what changed, when, or why
The Solution

Onelist: Purpose-Built AI Memory

A fundamentally different approach that solves every problem above.

Structured Storage with Entry Types

Every memory is a typed entry with rich metadata, relationships, and representations:

{
  "id": "mem_abc123",
  "entry_type": "memory",
  "title": "User's Python preferences",
  "content": "Uses Python 3.11, prefers type hints...",
  "tags": ["preferences", "programming", "python"],
  "metadata": {
    "importance": 0.85,
    "source": "conversation",
    "confidence": 0.95,
    "last_accessed": "2026-01-28T10:30:00Z"
  },
  "representations": {
    "summary": "Prefers Python 3.11 with type hints",
    "embedding": [0.023, -0.145, ...],
    "structured": {"language": "python", "version": "3.11"}
  },
  "links": ["mem_def456", "mem_ghi789"]
}

End-to-End Encryption

Your memories are encrypted before leaving your device:

AES-256-GCM
Authenticated encryption for every entry
PBKDF2
Key derivation from passphrase
Zero-Knowledge
We can't read your data
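
This is the standard client-side pattern; a minimal sketch using Python's cryptography package (the iteration count and payload layout here are illustrative assumptions, not Onelist's exact implementation):

import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def encrypt_memory(plaintext: bytes, passphrase: bytes) -> dict:
    # Derive a 256-bit key from the passphrase with PBKDF2-HMAC-SHA256.
    salt = os.urandom(16)
    key = PBKDF2HMAC(
        algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000
    ).derive(passphrase)
    # Encrypt with AES-256-GCM; the nonce must be unique per message.
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # Only salt, nonce, and ciphertext ever leave the device.
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}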

Clear Separation: Storage vs. Intelligence

Onelist handles storage, search, and retrieval. Your AI handles intelligence.

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  ┌─────────────────────┐              ┌─────────────────────┐      │
│  │                     │              │                     │      │
│  │      ONELIST        │◄────────────►│      MOLTBOT        │      │
│  │                     │              │                     │      │
│  │  - Structured       │   REST API   │  - Understanding    │      │
│  │    Storage          │              │  - Reasoning        │      │
│  │  - Hybrid Search    │              │  - Generation       │      │
│  │  - Versioning       │              │  - Context Assembly │      │
│  │  - Encryption       │              │  - Tool Use         │      │
│  │  - Decay/Relevance  │              │                     │      │
│  │                     │              │                     │      │
│  └─────────────────────┘              └─────────────────────┘      │
│                                                                     │
│        STORAGE LAYER                     INTELLIGENCE LAYER         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

This separation means you can upgrade your AI without losing memories, and your memories stay secure even if the AI is compromised.
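
A minimal sketch of that round-trip, assuming a GET /api/memories endpoint with bearer-token auth and a representation parameter (the routes and response shape shown are assumptions for illustration, not documented API):

import requests

ONELIST = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer <token>"}

# Storage layer: fetch topic-relevant summaries from Onelist.
resp = requests.get(
    f"{ONELIST}/api/memories",
    headers=HEADERS,
    params={"representation": "summary", "tags": "python", "limit": 20},
    timeout=5,
)
context = "\n".join(m["summary"] for m in resp.json()["data"])

# Intelligence layer: hand the assembled context to whatever model Moltbot runs.
# answer = llm.complete(system=context, user="Help me with Python testing")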

Full Version History

Every change to every memory is tracked. Roll back any entry to any point in time.

Complete Audit Trail

  • Who made each change (human or AI)
  • When changes occurred
  • What the previous state was
  • Diff between versions

One-Click Rollback

  • Restore any previous version
  • Undo AI mistakes instantly
  • Recover from compaction errors
  • Never lose data again
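
If version history is exposed over the same REST API, inspecting and rolling back an entry could look like the sketch below (the /versions and /rollback routes and field names are hypothetical):

import requests

ONELIST = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer <token>"}
memory_id = "mem_abc123"

# List the audit trail for one entry (hypothetical route and fields).
versions = requests.get(
    f"{ONELIST}/api/memories/{memory_id}/versions", headers=HEADERS, timeout=5
).json()["data"]
for v in versions:
    print(v["version"], v["changed_by"], v["changed_at"])

# Roll back to the previous version if the AI wrote something wrong.
requests.post(
    f"{ONELIST}/api/memories/{memory_id}/rollback",
    headers=HEADERS,
    json={"version": versions[-2]["version"]},
    timeout=5,
)
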
Memory Architecture

The 4-Layer Memory Hierarchy

Not all memories are equal. Our hierarchy ensures the right context at the right time.

                    ┌───────────────────────────────────────────────────────────┐
                    │                                                           │
                    │              TOKEN BUDGET: 1,200 - 3,800 tokens           │
                    │              (vs. unbounded in naive approaches)           │
                    │                                                           │
                    └───────────────────────────────────────────────────────────┘
                                              │
            ┌─────────────────────────────────┴─────────────────────────────────┐
            │                                                                   │
            ▼                                                                   ▼
┌─────────────────────────────────┐                       ┌─────────────────────────────────┐
│                                 │                       │                                 │
│   LAYER 1: FOUNDATIONAL        │                       │   LAYER 2: PROFILE              │
│   200-500 tokens                │                       │   300-800 tokens                │
│                                 │                       │                                 │
│   Always loaded. Core facts     │                       │   Topic-based loading           │
│   that never change.            │                       │   Dynamic preferences           │
│                                 │                       │                                 │
│   - Name, timezone              │                       │   - Communication style         │
│   - Critical rules              │                       │   - Domain expertise            │
│   - Key relationships           │                       │   - Tool preferences            │
│   - Absolute truths             │                       │   - Work patterns               │
│                                 │                       │                                 │
└─────────────────────────────────┘                       └─────────────────────────────────┘
            │                                                                   │
            └───────────────────────────┬───────────────────────────────────────┘
                                        │
            ┌───────────────────────────┴───────────────────────────────────────┐
            │                                                                   │
            ▼                                                                   ▼
┌─────────────────────────────────┐                       ┌─────────────────────────────────┐
│                                 │                       │                                 │
│   LAYER 3: EPISODIC             │                       │   LAYER 4: TASK-SPECIFIC         │
│   500-2000 tokens               │                       │   200-500 tokens                │
│                                 │                       │                                 │
│   Recency + Relevance scoring   │                       │   Generated on-demand           │
│   Recent conversations          │                       │   Temporary, not stored         │
│                                 │                       │                                 │
│   - Last 3-5 interactions       │                       │   - Synthesized answers         │
│   - Active project context      │                       │   - Computed insights           │
│   - Recent decisions            │                       │   - Search result summaries     │
│   - Pending tasks               │                       │   - Cross-reference data        │
│                                 │                       │                                 │
└─────────────────────────────────┘                       └─────────────────────────────────┘
1

Foundational Layer

200-500 tokens

Core facts that define who you are. Always loaded, never filtered.

Loading: 100% - every request
Update frequency: Rare - only when facts change
Examples: Name: Alex, TZ: PST, Language: English
2

Profile Layer

300-800 tokens

Dynamic preferences loaded based on detected topic or domain.

Loading: Topic-triggered (coding, writing, health...)
Update frequency: Regular - as preferences evolve
Examples: Prefers TypeScript, Concise responses
3

Episodic Layer

500-2000 tokens

Recent context scored by recency and relevance. The "working memory."

Loading: Top-N by composite score
Update frequency: Every conversation
Examples: Yesterday's project discussion, pending decisions
4

Task-Specific Layer

200-500 tokens

Generated on-demand for the current task. Not persisted.

Loading: Query-specific synthesis
Update frequency: Generated fresh each time
Examples: Aggregated search results, computed summaries
1,200 - 3,800 tokens
Total context budget per request
Compare to: 50,000+ tokens with naive "load everything" approaches
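
The 1,200 - 3,800 token range is simply the sum of the per-layer floors and ceilings; a small sketch of the hierarchy as data (class and field names are illustrative):

from dataclasses import dataclass

@dataclass
class LayerPolicy:
    name: str
    budget: tuple          # (min_tokens, max_tokens)
    loading: str           # when this layer is pulled into context

HIERARCHY = [
    LayerPolicy("foundational", (200, 500),  "always loaded"),
    LayerPolicy("profile",      (300, 800),  "topic-triggered"),
    LayerPolicy("episodic",     (500, 2000), "top-N by recency + relevance"),
    LayerPolicy("task",         (200, 500),  "synthesized per query, not stored"),
]

# Floors sum to 1,200 tokens, ceilings to 3,800 - the range quoted above.
print(sum(p.budget[0] for p in HIERARCHY), sum(p.budget[1] for p in HIERARCHY))
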
Multi-Format Storage

Representation Strategy

Every memory stored in multiple formats, optimized for different uses.

markdown

Full content with formatting

~500 tokens

summary

Condensed key points

~50 tokens
DEFAULT LOAD

title_only

Just the title/headline

~10 tokens

embedding

Vector for semantic search

1536 floats

structured

JSON key-value pairs

Variable

The Summary-First Loading Pattern

Load summaries by default. Fetch full content only when explicitly needed.

Bad: Load Full Content

# Loads 50,000 tokens
memories = client.memories.list(
    representation="markdown",  # Full content!
    limit=100
)

# Cost: $0.50 just to load context
# Latency: 2-3 seconds
# Result: Context window exhausted

Good: Summary-First

# Loads 5,000 tokens
memories = client.memories.list(
    representation="summary",  # Summaries only!
    limit=100
)

# Expand only what's needed
if needs_detail(memory):
    full = client.memories.get(
        memory.id,
        representation="markdown"
    )
Proactive Architecture

The Proactive Pre-Loading Pipeline

Memories retrieved before the LLM is invoked. No tool calls, no latency.

                                    PROACTIVE PRE-LOADING PIPELINE
                                    ==============================

    User Input                                                              LLM Response
        │                                                                        ▲
        ▼                                                                        │
┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐ │
│               │    │               │    │               │    │               │ │
│   STEP 1      │───►│   STEP 2      │───►│   STEP 3      │───►│   STEP 4      │─┘
│               │    │               │    │               │    │               │
│   Intent      │    │   Parallel    │    │   Context     │    │   LLM         │
│   Classify    │    │   Memory      │    │   Assembly    │    │   Invocation  │
│               │    │   Retrieval   │    │   + Budget    │    │               │
│   ~50ms       │    │   ~100-150ms  │    │   ~50ms       │    │   ~500-2000ms │
│               │    │               │    │               │    │               │
│   Fast model  │    │   Parallel    │    │   Token       │    │   Full model  │
│   or rules    │    │   queries     │    │   counting    │    │   with rich   │
│               │    │               │    │               │    │   context     │
└───────────────┘    └───────────────┘    └───────────────┘    └───────────────┘

                           │
                           ▼
              ┌─────────────────────────┐
              │                         │
              │   ONELIST API           │
              │                         │
              │   • Foundational Layer  │◄─── Always loaded
              │   • Profile Layer       │◄─── Topic-matched
              │   • Episodic Layer      │◄─── Relevance-scored
              │   • Task Layer          │◄─── Query-synthesized
              │                         │
              └─────────────────────────┘

              TOTAL LATENCY: 200-400ms
              (vs. 2-5 seconds with tool-based retrieval)
1

Intent Classification

~50ms

Fast, cheap classification of user intent before expensive LLM call.

classify_intent(user_message)
# Returns: {
#   "domain": "programming",
#   "topics": ["python", "testing"],
#   "intent": "question",
#   "urgency": "normal"
# }

Uses a lightweight model (e.g. GPT-3.5-turbo) or a rule-based classifier. Cost: ~$0.0001

2

Parallel Memory Retrieval

~100-150ms

Multiple queries executed in parallel, not sequentially.

await Promise.all([
  getFoundational(),      // Always
  getProfile("python"),   // Topic-matched
  getEpisodic(query),     // Relevance-scored
  synthesizeTask(query)   // On-demand
])

Parallel execution means 4 queries complete in 150ms, not 600ms.

3

Context Assembly + Budget

~50ms

Assemble context within strict token budget. Prioritize by layer.

budget = TokenBudget(max=3500)
budget.add(foundational, priority=1)
budget.add(profile, priority=2)
budget.add(episodic, priority=3)
budget.add(task_specific, priority=4)
# Automatically truncates lowest
# priority if budget exceeded
4

LLM Invocation

~500-2000ms

Full LLM called with complete, relevant context already assembled.

response = await llm.complete(
  system=system_prompt,
  context=assembled_context,  # Pre-loaded!
  user=user_message,
  # No memory_search tool needed
  # No multi-turn tool calls
  # Single, fast completion
)

Latency Target Achieved

Proactive Pre-Loading: 200-400ms
Tool-based Retrieval: 2-5 seconds

Pre-loading eliminates:

  • Tool call round-trips
  • LLM query generation time
  • Sequential retrieval delays
  • "Should I search?" decision latency
Intelligent Memory Management

Relevance Scoring & Decay

Memories that matter surface. Stale memories fade. Just like human memory.

The Relevance Score Formula

relevance_score = recency_weight + access_frequency + link_density + explicit_importance
Recency Weight

How recently was this memory created or updated?

decay = e^(-days_since / 30)
Weight: 0.0 - 0.3
Access Frequency

How often is this memory retrieved?

freq = accesses / time_span
Weight: 0.0 - 0.25
Link Density

How connected is this to other memories?

links = inbound + outbound
Weight: 0.0 - 0.2
Explicit Importance

User or AI marked as important?

starred, pinned, tagged
Weight: 0.0 - 0.25
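
A minimal sketch of that scoring; only the recency decay formula is spelled out above, so the way each remaining component is normalized into its weight range is an assumption:

import math

def relevance_score(days_since_update: float, accesses_per_day: float,
                    link_count: int, pinned: bool) -> float:
    # Recency: exponential decay with a 30-day time constant, capped at 0.3.
    recency = 0.3 * math.exp(-days_since_update / 30)
    # Access frequency: saturating map into 0.0 - 0.25 (normalization assumed).
    frequency = 0.25 * min(1.0, accesses_per_day)
    # Link density: inbound + outbound links, saturating at 10 (assumed cap).
    links = 0.2 * min(1.0, link_count / 10)
    # Explicit importance: starred/pinned entries get the full 0.25.
    importance = 0.25 if pinned else 0.0
    return recency + frequency + links + importance

# A pinned memory touched yesterday, accessed now and then, lightly linked:
print(round(relevance_score(1, 0.5, 4, True), 2))   # ~0.75 -> Active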

Decay Thresholds & Actions

Score: 1.0 - 0.5
Status: Active
  • Eligible for context loading
  • Full search visibility
  • Normal representation

Score: 0.5 - 0.3
Status: Fading
  • Lower loading priority
  • Still searchable
  • Candidate for compaction

Score: 0.3 - 0.1
Status: Archived
  • Not auto-loaded
  • Explicit search only
  • Compacted representation

Score: < 0.1
Status: Deletion Candidate
  • Flagged for review
  • User confirmation required
  • Auto-delete after 30 days
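
Mapping a score onto these lifecycle stages is a straightforward lookup:

def memory_status(score: float) -> str:
    # Map a relevance score onto the lifecycle stages above.
    if score >= 0.5:
        return "active"               # eligible for context loading
    if score >= 0.3:
        return "fading"               # lower priority, compaction candidate
    if score >= 0.1:
        return "archived"             # explicit search only
    return "deletion_candidate"       # flagged for review, auto-delete in 30 days

print(memory_status(0.75))   # active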

Intelligent Compaction Schedule

Daily

  • Recalculate relevance scores
  • Update access frequency metrics
  • Merge duplicate entries

Weekly

  • Compact fading memories
  • Regenerate summary representations
  • Archive low-score entries

Monthly

  • Deep compaction pass
  • Propose deletions for review
  • Rebalance embedding index
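
This cadence maps onto ordinary cron-style scheduling; a sketch using APScheduler, with placeholder job functions standing in for the maintenance passes:

from apscheduler.schedulers.blocking import BlockingScheduler

def daily_pass(): ...     # recalculate relevance, update access metrics, merge duplicates
def weekly_pass(): ...    # compact fading memories, regenerate summaries, archive low scores
def monthly_pass(): ...   # deep compaction, propose deletions, rebalance embedding index

sched = BlockingScheduler()
sched.add_job(daily_pass,   "cron", hour=3)                     # every night
sched.add_job(weekly_pass,  "cron", day_of_week="sun", hour=4)  # once a week
sched.add_job(monthly_pass, "cron", day=1, hour=5)              # first of each month
sched.start()
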
Economics

Dramatic Cost Optimization

10x reduction in LLM costs. 98.5% storage reduction via compaction.

Without Onelist

Naive memory approach

Context tokens per request: 50,000
Cost per request (GPT-4): $0.50+
Monthly cost (20 requests/day): $300+
Storage growth: Unbounded

With Onelist

Optimized memory system

Context tokens per request: 3,000
Cost per request (GPT-4): $0.05
Monthly cost (20 requests/day): $30
Storage after compaction: 1.5% of original

Token Budget Management

# Configure per-layer token budgets
config = MoltbotConfig(
    token_budgets={
        "foundational": 500,    # Core facts, always loaded
        "profile":      800,    # Topic-matched preferences
        "episodic":     2000,   # Recent context
        "task":         500,    # On-demand synthesis
    },
    total_budget=3800,              # Hard ceiling
    overflow_strategy="truncate_lowest_priority"
)

# Budget is automatically enforced
context = await moltbot.get_context(
    user_message="Help me with Python testing",
    budget=config.token_budgets
)

# context.token_count guaranteed <= 3800
# Lowest-priority items truncated first

Embedding Cost Reduction Strategies

Batch Processing

Embeddings generated in batches during off-peak hours. Up to 80% cost reduction vs. real-time generation.

Embedding Cache

Embeddings cached and reused. Only regenerated when content changes significantly.

Tiered Models

Use cheaper embedding models for low-importance memories. Premium embeddings only for high-value content.
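
A content-hash cache is enough to skip re-embedding unchanged entries; a minimal sketch, where embed() stands in for whichever embedding provider is configured:

import hashlib

_cache: dict = {}   # content hash -> embedding vector

def cached_embedding(content: str, embed) -> list:
    # Key by content hash so identical text never hits the embedding API twice.
    key = hashlib.sha256(content.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed(content)   # only runs when content actually changed
    return _cache[key]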

10x
LLM Cost Reduction
98.5%
Storage Reduction
5x
Faster Response Times
Get Started

Full Stack Installation

Run Onelist alongside Moltbot with bidirectional file sync.

What You Get

Required Components

  • Onelist Core (Phoenix API)
  • PostgreSQL with pgvector
  • Bidirectional file sync layer
  • Files always source of truth

Optional Components

  • Onelist Web (LiveView UI) - ENABLE_WEB=true
  • River agent (GTD, proactive) - ENABLE_RIVER=true
  • Other agents (Reader, Librarian) - ENABLE_AGENTS=true
RECOMMENDED

Option A: Docker Compose

# Download and configure
curl -sSL https://onelist.my/install/docker-compose.yml -o docker-compose.yml
curl -sSL https://onelist.my/install/.env.example -o .env
nano .env  # Set POSTGRES_PASSWORD, SECRET_KEY_BASE, MOLTBOT_MEMORY_DIR

# Start everything
docker compose up -d

# Verify
curl http://localhost:4000/health

Option B: Native Installation

# Full installation with Moltbot integration
curl -sSL https://onelist.my/install.sh | bash -s -- \
  --postgres-port 5433 \
  --onelist-port 4000 \
  --with-moltbot \
  --enable-web \
  --no-agents

Installs PostgreSQL 16 with pgvector, Erlang/Elixir, Onelist, and creates systemd services.

Configure Optional Components

# In your .env file:
MOLTBOT_MEMORY_DIR=/home/user/moltbot/memory  # Point to your files

# Optional components (all default to false for minimal install)
ENABLE_WEB=true           # Enable Onelist Web UI
ENABLE_AGENTS=false       # Enable background agents
ENABLE_RIVER=false        # Enable River AI assistant

# If enabling agents
OPENAI_API_KEY=sk-...     # Or other LLM provider

Key guarantee: Files are ALWAYS the source of truth. If Onelist fails or is removed, Moltbot works exactly as before using native file operations.

Already Running Your Own Moltbot Instance?

Your VPS can run full Onelist alongside Moltbot. Get Onelist Web + River agent without the $20/mo Web Access fee.

Use Free tier for local-only, or add Cloud Sync ($3/mo) for backup. See Pricing →

Transform Your AI's Memory Today

Join thousands of developers building smarter AI with Onelist.