Give Moltbot superpowered memory with one command

Moltbot Integration

The AI Memory Revolution

Transform how your AI remembers. Onelist provides the structured, secure, cost-efficient memory layer that Moltbot needs to truly understand and serve you.

10x
Cost Reduction
4-Layer
Memory Hierarchy
98.5%
Storage Reduction
E2EE
End-to-End Encrypted
# Full Stack installation - files always stay source of truth
curl -sSL https://onelist.my/install.sh | bash -s -- --with-moltbot
The Problem

Why Current AI Memory Solutions Fail

Most AI memory implementations are fundamentally broken. Here's what's wrong.

Flat Storage in Markdown Files

Most AI memory systems dump everything into a single MEMORY.md file or similar flat structure. This approach has critical limitations:

No hierarchy: All memories treated equally, regardless of importance
No relationships: Memories exist in isolation, can't reference each other
No types: Facts, preferences, and events mixed together
Linear growth: File grows unbounded until unusable
# Typical MEMORY.md - everything in one flat file
- User's name is Alex
- User prefers dark mode
- User asked about Python yesterday
- User likes coffee
- User mentioned deadline on Friday
- User's timezone is PST
- ... (hundreds more unstructured lines)

Token Inefficiency - The $11 "Hi" Problem

When you load an entire memory file into context for every request, you pay to re-read the whole memory on every single query, and costs balloon:

50,000
Tokens loaded per request
$0.50+
Cost per simple query
$11.00
Just to say "Hi" with full context

With GPT-4 at $0.03/1K input tokens, a 100KB memory file (roughly 25,000 tokens) costs $0.75 just to read - before generating any response. Over 20 daily interactions, that's $450/month for a single user.
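
The arithmetic is easy to verify; a quick sketch, assuming ~4 characters per token and the GPT-4 input price quoted above:

# Rough cost of re-reading a flat memory file on every request.
# Assumes ~4 characters per token and GPT-4 input at $0.03 per 1K tokens.
MEMORY_BYTES = 100_000        # ~100KB MEMORY.md
PRICE_PER_1K = 0.03           # USD per 1K input tokens
REQUESTS_PER_DAY = 20

tokens = MEMORY_BYTES / 4                          # ~25,000 tokens
per_request = tokens / 1000 * PRICE_PER_1K         # ~$0.75
per_month = per_request * REQUESTS_PER_DAY * 30    # ~$450

print(f"{tokens:,.0f} tokens -> ${per_request:.2f}/request, ${per_month:.0f}/month")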

LLMs Fail to Use memory_search Tools

The common solution is to give LLMs a "memory_search" tool. But this approach fails in practice:

Tool call latency: Each tool call adds 500-2000ms latency. Multiple searches = unusable delays.
Wrong queries: LLMs generate poor search queries. "What does user prefer?" instead of specific terms.
Skipped searches: LLMs often skip searching entirely, preferring to "wing it" with incomplete context.
Missing connections: LLMs can't know what they don't know. They won't search for context they're unaware of.

Compaction Failures

When memory files grow too large, systems attempt "compaction" - using an LLM to summarize and reduce content. This process is deeply flawed:

Information Loss

Each compaction cycle loses nuance. After 3-4 cycles, critical details vanish entirely. "User prefers Python for data science, JavaScript for web" becomes "User codes."

Hallucination Injection

LLMs sometimes "helpfully" add inferred information during compaction. These hallucinations become permanent "memories."

No Undo

Once compacted, original memories are gone. If compaction goes wrong, there's no recovery. Critical context lost forever.

Expensive Process

Compaction requires reading entire memory, processing with LLM, and rewriting. For large memories, this alone costs $5-20.

No Decay Mechanism

Human memory naturally forgets irrelevant information. AI memory systems don't:

Without decay, your AI treats a coffee preference mentioned once 2 years ago with the same weight as your current critical project deadline.

Old, irrelevant facts persist
Context window polluted
Important items buried

Security Vulnerabilities

Most AI memory implementations store sensitive data in plaintext local files:

Unencrypted Storage

  • MEMORY.md readable by any process
  • Indexed by Spotlight/search
  • Backed up to cloud unencrypted
  • Visible in file sync clients

Attack Surface

  • Malware can read/modify memories
  • Prompt injection via memory file
  • No access control or audit log
  • Shared system = shared memories

No Versioning

When something goes wrong with AI memory, there's no way back:

  • AI writes incorrect information - no undo, manual repair required
  • Compaction destroys data - no recovery possible
  • File corruption - start from scratch
  • No audit trail of what changed, when, or why
The Solution

Onelist: Purpose-Built AI Memory

A fundamentally different approach that solves every problem above.

Structured Storage with Entry Types

Every memory is a typed entry with rich metadata, relationships, and representations:

{
  "id": "mem_abc123",
  "entry_type": "memory",
  "title": "User's Python preferences",
  "content": "Uses Python 3.11, prefers type hints...",
  "tags": ["preferences", "programming", "python"],
  "metadata": {
    "importance": 0.85,
    "source": "conversation",
    "confidence": 0.95,
    "last_accessed": "2026-01-28T10:30:00Z"
  },
  "representations": {
    "summary": "Prefers Python 3.11 with type hints",
    "embedding": [0.023, -0.145, ...],
    "structured": {"language": "python", "version": "3.11"}
  },
  "links": ["mem_def456", "mem_ghi789"]
}

End-to-End Encryption

Your memories are encrypted before leaving your device:

AES-256-GCM
Authenticated encryption for every entry
PBKDF2
Key derivation from passphrase
Zero-Knowledge
We can't read your data
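
This is the standard client-side pattern; a minimal sketch using Python's cryptography package (the iteration count and payload layout here are illustrative assumptions, not Onelist's exact implementation):

import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def encrypt_memory(plaintext: bytes, passphrase: bytes) -> dict:
    # Derive a 256-bit key from the passphrase with PBKDF2-HMAC-SHA256.
    salt = os.urandom(16)
    key = PBKDF2HMAC(
        algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000
    ).derive(passphrase)
    # Encrypt with AES-256-GCM; the nonce must be unique per message.
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    # Only salt, nonce, and ciphertext ever leave the device.
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext}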

Clear Separation: Storage vs. Intelligence

Onelist handles storage, search, and retrieval. Your AI handles intelligence.

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  ┌─────────────────────┐              ┌─────────────────────┐      │
│  │                     │              │                     │      │
│  │      ONELIST        │◄────────────►│      MOLTBOT        │      │
│  │                     │              │                     │      │
│  │  - Structured       │   REST API   │  - Understanding    │      │
│  │    Storage          │              │  - Reasoning        │      │
│  │  - Hybrid Search    │              │  - Generation       │      │
│  │  - Versioning       │              │  - Context Assembly │      │
│  │  - Encryption       │              │  - Tool Use         │      │
│  │  - Decay/Relevance  │              │                     │      │
│  │                     │              │                     │      │
│  └─────────────────────┘              └─────────────────────┘      │
│                                                                     │
│        STORAGE LAYER                     INTELLIGENCE LAYER         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

This separation means you can upgrade your AI without losing memories, and your memories stay secure even if the AI is compromised.
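
A minimal sketch of that round-trip, assuming a GET /api/memories endpoint with bearer-token auth and a representation parameter (the routes and response shape shown are assumptions for illustration, not documented API):

import requests

ONELIST = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer <token>"}

# Storage layer: fetch topic-relevant summaries from Onelist.
resp = requests.get(
    f"{ONELIST}/api/memories",
    headers=HEADERS,
    params={"representation": "summary", "tags": "python", "limit": 20},
    timeout=5,
)
context = "\n".join(m["summary"] for m in resp.json()["data"])

# Intelligence layer: hand the assembled context to whatever model Moltbot runs.
# answer = llm.complete(system=context, user="Help me with Python testing")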

Full Version History

Every change to every memory is tracked. Roll back any entry to any point in time.

Complete Audit Trail

  • Who made each change (human or AI)
  • When changes occurred
  • What the previous state was
  • Diff between versions

One-Click Rollback

  • Restore any previous version
  • Undo AI mistakes instantly
  • Recover from compaction errors
  • Never lose data again
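
If version history is exposed over the same REST API, inspecting and rolling back an entry could look like the sketch below (the /versions and /rollback routes and field names are hypothetical):

import requests

ONELIST = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer <token>"}
memory_id = "mem_abc123"

# List the audit trail for one entry (hypothetical route and fields).
versions = requests.get(
    f"{ONELIST}/api/memories/{memory_id}/versions", headers=HEADERS, timeout=5
).json()["data"]
for v in versions:
    print(v["version"], v["changed_by"], v["changed_at"])

# Roll back to the previous version if the AI wrote something wrong.
requests.post(
    f"{ONELIST}/api/memories/{memory_id}/rollback",
    headers=HEADERS,
    json={"version": versions[-2]["version"]},
    timeout=5,
)
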
Memory Architecture

The 4-Layer Memory Hierarchy

Not all memories are equal. Our hierarchy ensures the right context at the right time.

                    ┌───────────────────────────────────────────────────────────┐
                    │                                                           │
                    │              TOKEN BUDGET: 1,200 - 3,800 tokens           │
                    │              (vs. unbounded in naive approaches)           │
                    │                                                           │
                    └───────────────────────────────────────────────────────────┘
                                              │
            ┌─────────────────────────────────┴─────────────────────────────────┐
            │                                                                   │
            ▼                                                                   ▼
┌─────────────────────────────────┐                       ┌─────────────────────────────────┐
│                                 │                       │                                 │
│   LAYER 1: FOUNDATIONAL        │                       │   LAYER 2: PROFILE              │
│   200-500 tokens                │                       │   300-800 tokens                │
│                                 │                       │                                 │
│   Always loaded. Core facts     │                       │   Topic-based loading           │
│   that never change.            │                       │   Dynamic preferences           │
│                                 │                       │                                 │
│   - Name, timezone              │                       │   - Communication style         │
│   - Critical rules              │                       │   - Domain expertise            │
│   - Key relationships           │                       │   - Tool preferences            │
│   - Absolute truths             │                       │   - Work patterns               │
│                                 │                       │                                 │
└─────────────────────────────────┘                       └─────────────────────────────────┘
            │                                                                   │
            └───────────────────────────┬───────────────────────────────────────┘
                                        │
            ┌───────────────────────────┴───────────────────────────────────────┐
            │                                                                   │
            ▼                                                                   ▼
┌─────────────────────────────────┐                       ┌─────────────────────────────────┐
│                                 │                       │                                 │
│   LAYER 3: EPISODIC             │                       │   LAYER 4: TASK-SPECIFIC         │
│   500-2000 tokens               │                       │   200-500 tokens                │
│                                 │                       │                                 │
│   Recency + Relevance scoring   │                       │   Generated on-demand           │
│   Recent conversations          │                       │   Temporary, not stored         │
│                                 │                       │                                 │
│   - Last 3-5 interactions       │                       │   - Synthesized answers         │
│   - Active project context      │                       │   - Computed insights           │
│   - Recent decisions            │                       │   - Search result summaries     │
│   - Pending tasks               │                       │   - Cross-reference data        │
│                                 │                       │                                 │
└─────────────────────────────────┘                       └─────────────────────────────────┘
1

Foundational Layer

200-500 tokens

Core facts that define who you are. Always loaded, never filtered.

Loading: 100% - every request
Update frequency: Rare - only when facts change
Examples: Name: Alex, TZ: PST, Language: English
2

Profile Layer

300-800 tokens

Dynamic preferences loaded based on detected topic or domain.

Loading: Topic-triggered (coding, writing, health...)
Update frequency: Regular - as preferences evolve
Examples: Prefers TypeScript, Concise responses
3

Episodic Layer

500-2000 tokens

Recent context scored by recency and relevance. The "working memory."

Loading: Top-N by composite score
Update frequency: Every conversation
Examples: Yesterday's project discussion, pending decisions
4

Task-Specific Layer

200-500 tokens

Generated on-demand for the current task. Not persisted.

Loading: Query-specific synthesis
Update frequency: Generated fresh each time
Examples: Aggregated search results, computed summaries
1,200 - 3,800 tokens
Total context budget per request
Compare to: 50,000+ tokens with naive "load everything" approaches
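
The 1,200 - 3,800 token range is simply the sum of the per-layer floors and ceilings; a small sketch of the hierarchy as data (class and field names are illustrative):

from dataclasses import dataclass

@dataclass
class LayerPolicy:
    name: str
    budget: tuple          # (min_tokens, max_tokens)
    loading: str           # when this layer is pulled into context

HIERARCHY = [
    LayerPolicy("foundational", (200, 500),  "always loaded"),
    LayerPolicy("profile",      (300, 800),  "topic-triggered"),
    LayerPolicy("episodic",     (500, 2000), "top-N by recency + relevance"),
    LayerPolicy("task",         (200, 500),  "synthesized per query, not stored"),
]

# Floors sum to 1,200 tokens, ceilings to 3,800 - the range quoted above.
print(sum(p.budget[0] for p in HIERARCHY), sum(p.budget[1] for p in HIERARCHY))
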
Multi-Format Storage

Representation Strategy

Every memory stored in multiple formats, optimized for different uses.

markdown

Full content with formatting

~500 tokens

summary

Condensed key points

~50 tokens
DEFAULT LOAD

title_only

Just the title/headline

~10 tokens

embedding

Vector for semantic search

1536 floats

structured

JSON key-value pairs

Variable

The Summary-First Loading Pattern

Load summaries by default. Fetch full content only when explicitly needed.

Bad: Load Full Content

# Loads 50,000 tokens
memories = client.memories.list(
    representation="markdown",  # Full content!
    limit=100
)

# Cost: $0.50 just to load context
# Latency: 2-3 seconds
# Result: Context window exhausted

Good: Summary-First

# Loads 5,000 tokens
memories = client.memories.list(
    representation="summary",  # Summaries only!
    limit=100
)

# Expand only what's needed
if needs_detail(memory):
    full = client.memories.get(
        memory.id,
        representation="markdown"
    )
Proactive Architecture

The Proactive Pre-Loading Pipeline

Memories retrieved before the LLM is invoked. No tool calls, no latency.

                                    PROACTIVE PRE-LOADING PIPELINE
                                    ==============================

    User Input                                                              LLM Response
        │                                                                        ▲
        ▼                                                                        │
┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐ │
│               │    │               │    │               │    │               │ │
│   STEP 1      │───►│   STEP 2      │───►│   STEP 3      │───►│   STEP 4      │─┘
│               │    │               │    │               │    │               │
│   Intent      │    │   Parallel    │    │   Context     │    │   LLM         │
│   Classify    │    │   Memory      │    │   Assembly    │    │   Invocation  │
│               │    │   Retrieval   │    │   + Budget    │    │               │
│   ~50ms       │    │   ~100-150ms  │    │   ~50ms       │    │   ~500-2000ms │
│               │    │               │    │               │    │               │
│   Fast model  │    │   Parallel    │    │   Token       │    │   Full model  │
│   or rules    │    │   queries     │    │   counting    │    │   with rich   │
│               │    │               │    │               │    │   context     │
└───────────────┘    └───────────────┘    └───────────────┘    └───────────────┘

                           │
                           ▼
              ┌─────────────────────────┐
              │                         │
              │   ONELIST API           │
              │                         │
              │   • Foundational Layer  │◄─── Always loaded
              │   • Profile Layer       │◄─── Topic-matched
              │   • Episodic Layer      │◄─── Relevance-scored
              │   • Task Layer          │◄─── Query-synthesized
              │                         │
              └─────────────────────────┘

              TOTAL LATENCY: 200-400ms
              (vs. 2-5 seconds with tool-based retrieval)
1

Intent Classification

~50ms

Fast, cheap classification of user intent before expensive LLM call.

classify_intent(user_message)
# Returns: {
#   "domain": "programming",
#   "topics": ["python", "testing"],
#   "intent": "question",
#   "urgency": "normal"
# }

Uses a lightweight model (e.g. GPT-3.5-turbo) or a rule-based classifier. Cost: ~$0.0001

2

Parallel Memory Retrieval

~100-150ms

Multiple queries executed in parallel, not sequentially.

await Promise.all([
  getFoundational(),      // Always
  getProfile("python"),   // Topic-matched
  getEpisodic(query),     // Relevance-scored
  synthesizeTask(query)   // On-demand
])

Parallel execution means 4 queries complete in 150ms, not 600ms.

3

Context Assembly + Budget

~50ms

Assemble context within strict token budget. Prioritize by layer.

budget = TokenBudget(max=3500)
budget.add(foundational, priority=1)
budget.add(profile, priority=2)
budget.add(episodic, priority=3)
budget.add(task_specific, priority=4)
# Automatically truncates lowest
# priority if budget exceeded
4

LLM Invocation

~500-2000ms

Full LLM called with complete, relevant context already assembled.

response = await llm.complete(
  system=system_prompt,
  context=assembled_context,  # Pre-loaded!
  user=user_message,
  # No memory_search tool needed
  # No multi-turn tool calls
  # Single, fast completion
)

Latency Target Achieved

Proactive Pre-Loading: 200-400ms
Tool-based Retrieval: 2-5 seconds

Pre-loading eliminates:

  • Tool call round-trips
  • LLM query generation time
  • Sequential retrieval delays
  • "Should I search?" decision latency
Intelligent Memory Management

Relevance Scoring & Decay

Memories that matter surface. Stale memories fade. Just like human memory.

The Relevance Score Formula

relevance_score = recency_weight + access_frequency + link_density + explicit_importance
Recency Weight

How recently was this memory created or updated?

decay = e^(-days_since / 30)
Weight: 0.0 - 0.3
Access Frequency

How often is this memory retrieved?

freq = accesses / time_span
Weight: 0.0 - 0.25
Link Density

How connected is this to other memories?

links = inbound + outbound
Weight: 0.0 - 0.2
Explicit Importance

User or AI marked as important?

starred, pinned, tagged
Weight: 0.0 - 0.25
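
A minimal sketch of that scoring; only the recency decay formula is spelled out above, so the way each remaining component is normalized into its weight range is an assumption:

import math

def relevance_score(days_since_update: float, accesses_per_day: float,
                    link_count: int, pinned: bool) -> float:
    # Recency: exponential decay with a 30-day time constant, capped at 0.3.
    recency = 0.3 * math.exp(-days_since_update / 30)
    # Access frequency: saturating map into 0.0 - 0.25 (normalization assumed).
    frequency = 0.25 * min(1.0, accesses_per_day)
    # Link density: inbound + outbound links, saturating at 10 (assumed cap).
    links = 0.2 * min(1.0, link_count / 10)
    # Explicit importance: starred/pinned entries get the full 0.25.
    importance = 0.25 if pinned else 0.0
    return recency + frequency + links + importance

# A pinned memory touched yesterday, accessed now and then, lightly linked:
print(round(relevance_score(1, 0.5, 4, True), 2))   # ~0.75 -> Active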

Decay Thresholds & Actions

Score: 1.0 - 0.5
Status: Active
  • Eligible for context loading
  • Full search visibility
  • Normal representation

Score: 0.5 - 0.3
Status: Fading
  • Lower loading priority
  • Still searchable
  • Candidate for compaction

Score: 0.3 - 0.1
Status: Archived
  • Not auto-loaded
  • Explicit search only
  • Compacted representation

Score: < 0.1
Status: Deletion Candidate
  • Flagged for review
  • User confirmation required
  • Auto-delete after 30 days
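
Mapping a score onto these lifecycle stages is a straightforward lookup:

def memory_status(score: float) -> str:
    # Map a relevance score onto the lifecycle stages above.
    if score >= 0.5:
        return "active"               # eligible for context loading
    if score >= 0.3:
        return "fading"               # lower priority, compaction candidate
    if score >= 0.1:
        return "archived"             # explicit search only
    return "deletion_candidate"       # flagged for review, auto-delete in 30 days

print(memory_status(0.75))   # active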

Intelligent Compaction Schedule

Daily

  • Recalculate relevance scores
  • Update access frequency metrics
  • Merge duplicate entries

Weekly

  • Compact fading memories
  • Regenerate summary representations
  • Archive low-score entries

Monthly

  • Deep compaction pass
  • Propose deletions for review
  • Rebalance embedding index
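
This cadence maps onto ordinary cron-style scheduling; a sketch using APScheduler, with placeholder job functions standing in for the maintenance passes:

from apscheduler.schedulers.blocking import BlockingScheduler

def daily_pass(): ...     # recalculate relevance, update access metrics, merge duplicates
def weekly_pass(): ...    # compact fading memories, regenerate summaries, archive low scores
def monthly_pass(): ...   # deep compaction, propose deletions, rebalance embedding index

sched = BlockingScheduler()
sched.add_job(daily_pass,   "cron", hour=3)                     # every night
sched.add_job(weekly_pass,  "cron", day_of_week="sun", hour=4)  # once a week
sched.add_job(monthly_pass, "cron", day=1, hour=5)              # first of each month
sched.start()
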
Economics

Dramatic Cost Optimization

10x reduction in LLM costs. 98.5% storage reduction via compaction.

Without Onelist

Naive memory approach

Context tokens per request: 50,000
Cost per request (GPT-4): $0.50+
Monthly cost (20 requests/day): $300+
Storage growth: Unbounded

With Onelist

Optimized memory system

Context tokens per request: 3,000
Cost per request (GPT-4): $0.05
Monthly cost (20 requests/day): $30
Storage after compaction: 1.5% of original

Token Budget Management

# Configure per-layer token budgets
config = MoltbotConfig(
    token_budgets={
        "foundational": 500,    # Core facts, always loaded
        "profile":      800,    # Topic-matched preferences
        "episodic":     2000,   # Recent context
        "task":         500,    # On-demand synthesis
    },
    total_budget=3800,              # Hard ceiling
    overflow_strategy="truncate_lowest_priority"
)

# Budget is automatically enforced
context = await moltbot.get_context(
    user_message="Help me with Python testing",
    budget=config.token_budgets
)

# context.token_count guaranteed <= 3800
# Lowest-priority items truncated first

Embedding Cost Reduction Strategies

Batch Processing

Embeddings generated in batches during off-peak hours. Up to 80% cost reduction vs. real-time generation.

Embedding Cache

Embeddings cached and reused. Only regenerated when content changes significantly.

Tiered Models

Use cheaper embedding models for low-importance memories. Premium embeddings only for high-value content.
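
A content-hash cache is enough to skip re-embedding unchanged entries; a minimal sketch, where embed() stands in for whichever embedding provider is configured:

import hashlib

_cache: dict = {}   # content hash -> embedding vector

def cached_embedding(content: str, embed) -> list:
    # Key by content hash so identical text never hits the embedding API twice.
    key = hashlib.sha256(content.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = embed(content)   # only runs when content actually changed
    return _cache[key]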

10x
LLM Cost Reduction
98.5%
Storage Reduction
5x
Faster Response Times
Get Started

Full Stack Installation

Run Onelist alongside Moltbot with bidirectional file sync.

What You Get

Required Components

  • Onelist Core (Phoenix API)
  • PostgreSQL with pgvector
  • Bidirectional file sync layer
  • Files always source of truth

Optional Components

  • Onelist Web (LiveView UI) - ENABLE_WEB=true
  • River agent (GTD, proactive) - ENABLE_RIVER=true
  • Other agents (Reader, Librarian) - ENABLE_AGENTS=true
RECOMMENDED

Option A: Docker Compose

# Download and configure
curl -sSL https://onelist.my/install/docker-compose.yml -o docker-compose.yml
curl -sSL https://onelist.my/install/.env.example -o .env
nano .env  # Set POSTGRES_PASSWORD, SECRET_KEY_BASE, MOLTBOT_MEMORY_DIR

# Start everything
docker compose up -d

# Verify
curl http://localhost:4000/health

Option B: Native Installation

# Full installation with Moltbot integration
curl -sSL https://onelist.my/install.sh | bash -s -- \
  --postgres-port 5433 \
  --onelist-port 4000 \
  --with-moltbot \
  --enable-web \
  --no-agents

Installs PostgreSQL 16 with pgvector, Erlang/Elixir, Onelist, and creates systemd services.

Configure Optional Components

# In your .env file:
MOLTBOT_MEMORY_DIR=/home/user/moltbot/memory  # Point to your files

# Optional components (all default to false for minimal install)
ENABLE_WEB=true           # Enable Onelist Web UI
ENABLE_AGENTS=false       # Enable background agents
ENABLE_RIVER=false        # Enable River AI assistant

# If enabling agents
OPENAI_API_KEY=sk-...     # Or other LLM provider

Key guarantee: Files are ALWAYS the source of truth. If Onelist fails or is removed, Moltbot works exactly as before using native file operations.

Already Running Your Own Moltbot Instance?

Your VPS can run full Onelist alongside Moltbot. Get Onelist Web + River agent without the $20/mo Web Access fee.

Use Free tier for local-only, or add Cloud Sync ($3/mo) for backup. See Pricing →

Transform Your AI's Memory Today

Join thousands of developers building smarter AI with Onelist.