Mnemosyne — Cognitive OS for Local-First AI Agents

The Premise

The problem isn't that agents don't remember. The problem is that memory only flows in one direction.

Most memory systems only retrieve upward from stored text. Mnemosyne adds the return path: higher-level reflection distilled back down into fast, user-specific instinct. The agent's first response is shaped by what it has learned about you, not just by base-model priors.

↑ Retrieval (everyone)

› Raw turn — "what happened"

› Vector store — embed + retrieve

› Summary — compress to abstract

› Response — context injected

↓ Reflection → Instinct (Mnemosyne)

‹ L5 identity + L4 patterns

‹ Distilled into L0 reflex cache

‹ Fast-path shapes first token

‹ Agent acts like it knows you

Eval-Gated · Reproducible

Benchmarks, not vibes.

Every number ships with the command that produced it and a regression gate that fails any change which drops a metric. To our knowledge, no other Hermes memory provider publishes eval-gated baselines.
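A gate of this kind can be sketched in a few lines. This is a minimal illustration and not the project's check_regression.py; the function name, the metric names, the tolerance parameter, and the sample numbers are all assumptions made for the example.

```python
def find_regressions(baseline, current, higher_is_better, tolerance=0.0):
    """Return the names of metrics that are worse than the recorded baseline."""
    failed = []
    for name, base in baseline.items():
        value = current.get(name)
        if value is None:
            failed.append(name)  # a metric that disappeared counts as a regression
            continue
        if higher_is_better.get(name, True):
            worse = value < base - tolerance
        else:
            worse = value > base + tolerance
        if worse:
            failed.append(name)
    return failed


# Hypothetical numbers, for illustration only
baseline = {"recall_at_5": 0.90, "search_p50_ms": 5.0}
current = {"recall_at_5": 0.88, "search_p50_ms": 4.0}
direction = {"recall_at_5": True, "search_p50_ms": False}

regressions = find_regressions(baseline, current, direction)
if regressions:
    print("regression in:", ", ".join(regressions))
```

A CI job would typically exit non-zero when the returned list is non-empty, which is what makes the gate block a merge.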

Hermes runtime checks PASS

Live Hermes v0.16.0 — discovery, routing, persistence, fresh-session recall.

Retrieval recall@5

Deterministic probe set — recall@5 / MRR / hit@1 by category.
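For readers unfamiliar with the three metrics, the standard definitions can be sketched as follows. The function names and the probe format (a ranked result list paired with the set of relevant ids) are assumptions for this example, not the harness's actual interface.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    relevant = set(relevant_ids)
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def hit_at_1(ranked_ids, relevant_ids):
    """1.0 when the top result is relevant, otherwise 0.0."""
    return 1.0 if ranked_ids and ranked_ids[0] in set(relevant_ids) else 0.0


def evaluate(probes, k=5):
    """Average each metric over (ranked_ids, relevant_ids) probe pairs."""
    n = len(probes)
    return {
        "recall@k": sum(recall_at_k(r, g, k) for r, g in probes) / n,
        "mrr": sum(reciprocal_rank(r, g) for r, g in probes) / n,
        "hit@1": sum(hit_at_1(r, g) for r, g in probes) / n,
    }


# Two toy probes with hypothetical ids
probes = [(["a", "b", "c"], ["b"]), (["x", "y", "z"], ["x", "q"])]
scores = evaluate(probes)
```

Reporting by category amounts to grouping the probes by a category label and calling evaluate once per group.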

LOCOMO retrieval-only

Across 1,986 questions · snap-research/locomo runner.

Continuity score

1.00 cross-session · 50 scenarios, 6 categories (v0.7.1).

Per-memory write

INSERT + FTS5 sync · 10,000 writes in 2.13 s.
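The per-write figure follows directly from the two numbers in that line. The arithmetic, using only the 10,000 writes and 2.13 s stated above:

```python
total_seconds = 2.13  # wall time reported for the write benchmark
writes = 10_000       # number of writes in that run

per_write_ms = total_seconds / writes * 1000
writes_per_second = writes / total_seconds

print(f"{per_write_ms:.3f} ms per write, about {writes_per_second:.0f} writes/s")
```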

Search p50 · 10K corpus

FTS5 BM25, 2-token query · p95 18.4 ms.

Wrapper overhead

Just 0.24% at realistic model latency, so observability is effectively free.

Identity slips rewritten

4-layer lock vs. a 40-prompt jailbreak suite.

Reproduce any of these yourself: python3 tests/test_all.py · bash test-harness.sh · mnemosyne-continuity run. Throughput measured single-thread on a Linux sandbox; your hardware will vary. Full methodology in docs/BENCHMARKS.md.

The Hermes Plugin

A drop-in memory provider for Hermes Agent.

Plug the full six-tier ICMS into any Hermes Agent (Nous Research) as its persistent backend. One SQLite file. No API keys, no vector DB, no cloud. Demonstrated, not aspirational — validated end-to-end on a live runtime.

hermes · v0.16.0 · runtime validation

Three tools. Zero ceremony.

Turns persist automatically through a single-writer SQLite queue — the agent loop is never blocked. Session-end hooks run salient extraction, writing facts, preferences, and goals up-tier.
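The single-writer pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Mnemosyne's implementation: the SingleWriter class, its method names, and the one-column schema are hypothetical.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class SingleWriter:
    """Funnel every write through one background thread that owns the connection."""

    def __init__(self, path):
        self._path = path
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The connection is created and used only inside the writer thread.
        conn = sqlite3.connect(self._path)
        conn.execute("CREATE TABLE IF NOT EXISTS memories (content TEXT)")
        conn.commit()
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: drain is complete, stop the thread
                break
            conn.execute("INSERT INTO memories (content) VALUES (?)", (item,))
            conn.commit()
        conn.close()

    def submit(self, content):
        """Enqueue a turn and return immediately; the caller never waits on disk."""
        self._queue.put(content)

    def close(self):
        """Flush everything queued so far, then stop the writer."""
        self._queue.put(None)
        self._thread.join()


path = os.path.join(tempfile.mkdtemp(), "memory.db")
writer = SingleWriter(path)
for turn in ["hello", "world"]:
    writer.submit(turn)
writer.close()

reader = sqlite3.connect(path)
rows = reader.execute("SELECT content FROM memories ORDER BY rowid").fetchall()
reader.close()
```

Because only one thread ever writes, there is no lock contention between writers, and the agent loop pays only the cost of a queue put.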

memory_search(query, limit)	FTS5 BM25 + strength-weighted retrieval across all tiers
memory_write(content, kind, tier)	store fact / preference / goal / pattern at tier 2–5
memory_stats()	direct SQLite per-tier row counts
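As a rough picture of what an FTS5 BM25 search weighted by strength could look like, here is a minimal sketch. The schema, the sample rows, and the way relevance is multiplied by strength are assumptions for illustration; the source does not specify how Mnemosyne combines the two. It requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory FTS5 table; this schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE memories USING fts5("
        "content, tier UNINDEXED, strength UNINDEXED)"
    )
    rows = [
        ("User prefers dark mode in the editor", 2, 1.0),
        ("Project deadline is Friday", 2, 1.0),
        ("User responds best to concise summaries", 3, 2.0),
    ]
    conn.executemany(
        "INSERT INTO memories (content, tier, strength) VALUES (?, ?, ?)", rows
    )
    return conn


def memory_search(conn, query, limit=5):
    """Rank FTS5 matches by BM25 relevance scaled by stored strength."""
    # FTS5's bm25() returns lower (more negative) values for better matches,
    # so it is negated before being multiplied by strength.
    sql = (
        "SELECT content, tier, -bm25(memories) * strength AS score "
        "FROM memories WHERE memories MATCH ? "
        "ORDER BY score DESC LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


conn = build_demo_store()
results = memory_search(conn, "dark mode")
```

The query string is passed to MATCH unmodified here, so FTS5 query syntax applies; a production version would need to escape or validate user input.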

✓ Local SQLite core ✓ ACT-R decay ✓ Hebbian strength ✓ Offline dreams
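To make the decay and strength items concrete: ACT-R's base-level learning equation scores a memory as B = ln(Σ t_j^(-d)), where each t_j is the time since a past access and d is the decay rate (0.5 is the conventional ACT-R default). The sketch below implements that equation plus a simple saturating reinforcement step. The reinforce function, its rate, and its ceiling are assumptions standing in for "Hebbian strength"; the source does not give Mnemosyne's actual update rule.

```python
import math


def base_level_activation(ages_seconds, decay=0.5):
    """ACT-R base-level learning: ln of the summed power-law traces of past accesses."""
    total = sum(t ** -decay for t in ages_seconds if t > 0)
    return math.log(total) if total > 0 else float("-inf")


def reinforce(strength, rate=0.1, ceiling=1.0):
    """Hypothetical Hebbian-style bump: move strength part of the way to the ceiling."""
    return strength + rate * (ceiling - strength)


recent = base_level_activation([10.0, 60.0])        # accessed 10 s and 60 s ago
stale = base_level_activation([86_400.0])           # accessed once, a day ago
bumped = reinforce(0.5)                             # one reinforcement step
```

Per-kind decay, as the page describes it, would correspond to passing a different decay value for facts, preferences, goals, and so on.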

ICMS · Six-Tier Cognitive Memory

Not a flat fact store. A mind with tiers.

Memories carry tier semantics (working context vs. long-term vs. consolidated patterns vs. human-approved identity), with promotion, decay, and a distilled fast-path reflex cache.

L0 INSTINCT · Distilled fast-path reflex cache, always checked first

L1 HOT · Working context: current session, selected slice

L2 WARM · Short-term: default for new writes, feeds dream consolidation

L3 COLD · Long-term: demoted from L2, TF-IDF clustering target

L4 PATTERN · Consolidated patterns: promoted by the compactor, plus the user_instinct overlay

L5 IDENTITY · Human-approved core values: the 4-layer lock the agent never breaks
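One way to picture the tier model is as an ordered enum with a tier field on each memory. The sketch below is an assumption-laden illustration: the numbering L0 to L5 is inferred from the page's own references ("L0 reflex cache", "demoted from L2", "L4 patterns", "L5 identity"), and the Memory class, the demote_if_weak helper, and its threshold are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    INSTINCT = 0  # distilled reflex cache, checked first
    HOT = 1       # working context for the current session
    WARM = 2      # short-term, default for new writes
    COLD = 3      # long-term, demoted from WARM
    PATTERN = 4   # consolidated patterns
    IDENTITY = 5  # human-approved core values


@dataclass
class Memory:
    content: str
    tier: Tier = Tier.WARM  # new writes land in WARM by default
    strength: float = 1.0


def demote_if_weak(memory, threshold=0.2):
    """Move a faded WARM memory to COLD; the threshold value is hypothetical."""
    if memory.tier == Tier.WARM and memory.strength < threshold:
        memory.tier = Tier.COLD
    return memory


fresh = Memory("User prefers dark mode")
faded = demote_if_weak(Memory("Old scratch note", strength=0.1))
```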

Figure: Mnemosyne system architecture (Channels to Brain to Tool Executor, six-tier ICMS, inner dialogue, dream consolidation, and the Meta-Harness self-improvement loop).

Channels → Brain (context assembly + identity lock) → Tool Executor + 19-provider backend · Inner Dialogue (Planner / Critic / Doer / Evaluator) · Dream Consolidation · the closed Meta-Harness loop.

Observability

An avatar that visualizes its own mind.

◐ 29 derived traits, all observable. Every visual property of the SVG avatar maps to one number, with no opaque personality engine.
▤ Memory browser across all tiers. FTS5 search over L1–L5, live per-tier row counts, goals that persist across sessions.
⟳ Real-time event stream. memory_write, tool_call, persona_call: every action logged and inspectable.
⌂ Survivable data. sqlite3 memory.db "SELECT content FROM memories" works with the framework gone.
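The same read works from Python's standard library, with no Mnemosyne code installed. The table and column names come from the query quoted above; the dump_memories helper itself is a hypothetical name for this sketch.

```python
import sqlite3


def dump_memories(db_path):
    """Return every stored memory's content using only the standard library."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("SELECT content FROM memories")]
    finally:
        conn.close()
```

Calling dump_memories("memory.db") against an existing database file returns the contents as a plain list of strings.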

The avatar's visual contract ↗

Foundations

Standing on real research.

Mnemosyne is engineering, not mysticism — every mechanism traces to a paper. The honest split of shipped vs. experimental vs. research lives in ROADMAP.md.

Preprint · Mnemosyne

A Cognitive OS for Local-First Agents

The architecture writeup: ICMS tiers, the Reflection → Instinct loop, the Meta-Harness, and the eval contract behind every number on this page.

Read the docs

Benchmark · 2024

LOCOMO — Very Long-Term Conversational Memory

Maharana et al. The 10-conversation, 1,986-question benchmark Mnemosyne's retrieval runner targets, with recorded baselines.

arXiv:2402.17753

Cognitive Science

ACT-R — The Adaptive Character of Thought

Anderson et al. The activation-decay model behind Mnemosyne's per-kind memory decay and Hebbian strength reinforcement.

act-r.psy.cmu.edu

Interpretability · Roadmap

Neural Geometry & Concept Manifolds

Goodfire's manifold work guides the eval-gated roadmap for L4 pattern memory — preserving relational structure, not just flat abstracts.

goodfire.ai/research

Runtime · Nous Research

Hermes Agent

The agent framework Mnemosyne validates against as a drop-in MemoryProvider — discovery, tool routing, and the plugin manifest.

hermes-agent.nousresearch.com

Reproduce

The Eval Harness

Retrieval probe set, the full LOCOMO runner with LM Studio + Mem0 adapters, and the check_regression.py gate — in the lab repo.

atxgreene/mnemosyne-lab

From the Field

Written up on 𝕏.

Mnemosyne

@atxgreene

"I built a brain for my AI agent — a six-tier memory that flows both directions, runs entirely local, and survives as plain SQLite. Here's what I learned."

Quickstart

Ten lines to a first conversation.

bash · ~/projects/mnemosyne

# zero runtime deps: this pulls nothing else from PyPI
$ pip install mnemosyne-harness
$ mnemosyne-serve &                      # daemon + dashboard
$ open http://127.0.0.1:8484/ui          # avatar evolves in real time

# or drop it into Hermes as a memory provider:
$ hermes plugins enable mnemosyne
$ hermes plugins list
→ enabled user 0.1.0 mnemosyne

Verify: python3 tests/test_all.py · E2E: bash test-harness.sh · Demo: ./demo.sh

★ Star on GitHub ↗ Quickstart guide ↗