memory-lancedb-pro Plugin Maintenance Guide
Overview
memory-lancedb-pro is an enhanced long-term memory plugin for OpenClaw. It replaces the built-in memory-lancedb plugin with advanced retrieval capabilities, multi-scope memory isolation, and a management CLI.
Repository: https://github.com/win4r/memory-lancedb-pro
License: MIT | Language: TypeScript (ESM) | Runtime: Node.js via OpenClaw Gateway
Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β index.ts (Entry Point) β
β Plugin Registration Β· Config Parsing Β· Lifecycle Hooks β
ββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬ββββββββββββββββ
β β β β
ββββββΌββββ ββββββΌββββ βββββΌβββββ ββββΌβββββββββββ
β store β βembedderβ βretrieverβ β scopes β
β .ts β β .ts β β .ts β β .ts β
ββββββββββ ββββββββββ ββββββββββ βββββββββββββββ
β β
ββββββΌββββ βββββββΌβββββββββββ
βmigrate β βnoise-filter.ts β
β .ts β βadaptive- β
ββββββββββ βretrieval.ts β
ββββββββββββββββββ
βββββββββββββββ ββββββββββββ
β tools.ts β β cli.ts β
β (Agent API) β β (CLI) β
βββββββββββββββ ββββββββββββ
File Reference (Quick Navigation)
| File |
Purpose |
Key Exports |
index.ts |
Plugin entry point. Registers with OpenClaw Plugin API, parses config, mounts lifecycle hooks |
memoryLanceDBProPlugin (default), shouldCapture, detectCategory |
openclaw.plugin.json |
Plugin metadata + full JSON Schema config with uiHints |
β |
package.json |
NPM package. Deps: @lancedb/lancedb, openai, @sinclair/typebox |
β |
cli.ts |
CLI: memory-pro list/search/stats/delete/delete-bulk/export/import/reembed/migrate |
createMemoryCLI, registerMemoryCLI |
src/store.ts |
LanceDB storage layer. Table creation, FTS indexing, CRUD, vector/BM25 search |
MemoryStore, MemoryEntry, loadLanceDB |
src/embedder.ts |
Embedding abstraction. OpenAI-compatible API, task-aware, LRU cache |
Embedder, createEmbedder, getVectorDimensions |
src/retriever.ts |
Hybrid retrieval engine. Full scoring pipeline |
MemoryRetriever, createRetriever, DEFAULT_RETRIEVAL_CONFIG |
src/scopes.ts |
Multi-scope access control |
MemoryScopeManager, createScopeManager |
src/tools.ts |
Agent tool definitions: memory_recall/store/forget/update/stats/list |
registerAllMemoryTools |
src/noise-filter.ts |
Noise filter for low-quality content |
isNoise, filterNoise |
src/adaptive-retrieval.ts |
Skip retrieval for greetings, commands, emoji |
shouldSkipRetrieval |
src/migrate.ts |
Migration from legacy memory-lancedb |
MemoryMigrator, createMigrator |
scripts/jsonl_distill.py |
JSONL session distillation script (Python) |
β |
Core Subsystem Reference
For detailed deep-dives into each subsystem, read the appropriate reference file:
- Retrieval Pipeline (scoring math, RRF fusion, reranking, all scoring stages): See references/retrieval_pipeline.md
- Storage & Data Model (LanceDB schema, FTS indexing, CRUD, vector dim): See references/storage_and_schema.md
- Embedding System (providers, task-aware API, caching, dimensions): See references/embedding_system.md
- Plugin Lifecycle & Config (hooks, registration, config parsing): See references/plugin_lifecycle.md
- Scope System (multi-scope isolation, agent access, patterns): See references/scope_system.md
- Tools & CLI (agent tools, CLI commands, parameters): See references/tools_and_cli.md
- Common Gotchas & Troubleshooting: See references/troubleshooting.md
Development Workflows
Adding a New Embedding Provider
- Check if it's OpenAI-compatible (most are). If so, no code change needed β just config
- If the model is not in
EMBEDDING_DIMENSIONS map in src/embedder.ts, add it
- If the provider needs special request fields beyond
task and normalized, extend buildPayload() in src/embedder.ts
- Test with
embedder.test() method
- Document the provider in README.md table
Adding a New Rerank Provider
- Add provider name to
RerankProvider type in src/retriever.ts
- Add case in
buildRerankRequest() for request format (headers + body)
- Add case in
parseRerankResponse() for response parsing
- Add to
rerankProvider enum in openclaw.plugin.json
- Test with actual API calls β reranker has 5s timeout protection
Adding a New Scoring Stage
- Create a
private apply<StageName>(results: RetrievalResult[]): RetrievalResult[] method in MemoryRetriever
- Add corresponding config fields to
RetrievalConfig interface
- Insert the stage in the pipeline sequence in both
hybridRetrieval() and vectorOnlyRetrieval()
- Add defaults to
DEFAULT_RETRIEVAL_CONFIG
- Add JSON Schema fields to
openclaw.plugin.json
- Pipeline order: Fusion β Rerank β Recency β Importance β LengthNorm β TimeDecay β HardMin β Noise β MMR
Adding a New Agent Tool
- Create
registerMemory<ToolName>Tool() in src/tools.ts
- Define parameters with
Type.Object() from @sinclair/typebox
- Use
stringEnum() from openclaw/plugin-sdk for enum params
- Always validate scope access via
context.scopeManager
- Register in
registerAllMemoryTools() β decide if core (always) or management (optional)
- Return
{ content: [{ type: "text", text }], details: {...} }
Adding a New CLI Command
- Add command in
registerMemoryCLI() in cli.ts
- Pattern:
memory.command("name <args>").description("...").option("--flag", "...").action(async (args, opts) => { ... })
- Support
--json flag for machine-readable output
- Use
process.exit(1) for error cases
- CLI is registered via
api.registerCli() in index.ts
Modifying Auto-Capture Logic
shouldCapture(text) in index.ts controls what gets auto-captured
MEMORY_TRIGGERS regex array defines trigger patterns (supports EN/CJK)
detectCategory(text) classifies captures as preference/fact/decision/entity/other
- Auto-capture runs in
agent_end hook, limited to 3 per turn
- Duplicate detection threshold: cosine similarity > 0.95
Modifying Auto-Recall Logic
- Auto-recall uses
before_agent_start hook (OFF by default)
shouldSkipRetrieval() from src/adaptive-retrieval.ts gates retrieval
- Injected as
<relevant-memories> XML block with UNTRUSTED DATA warning
sanitizeForContext() strips HTML, newlines, limits to 300 chars per memory
- Max 3 memories injected per turn
Key Design Decisions
- autoRecall defaults to OFF β prevents model from echoing injected memory context
- autoCapture defaults to ON β transparent memory accumulation
- sessionMemory defaults to OFF β raw session summaries degrade retrieval quality; use JSONL distillation instead
- LanceDB dynamic import β loaded asynchronously to avoid blocking; cached in singleton promise
- Startup checks are fire-and-forget β gateway binds HTTP port immediately; embedding/retrieval tests run in background with 8s timeout
- Daily JSONL backup β 24h interval, keeps last 7 files, runs 1 min after start
- BM25 score normalization β raw BM25 scores are unbounded, normalized with sigmoid:
1 / (1 + exp(-score/5))
- Update = delete + re-add β LanceDB doesn't support in-place updates
- ID prefix matching β 8+ hex char prefix resolves to full UUID for user convenience
- CJK-aware thresholds β shorter minimum lengths for Chinese/Japanese/Korean text (4β6 chars vs 10β15 for English)
- Env var resolution β
${VAR} syntax resolved at config parse time; gateway service may not inherit shell env
Testing
- Smoke test:
node test/cli-smoke.mjs
- Manual verification:
openclaw plugins doctor, openclaw memory-pro stats
- Embedding test:
embedder.test() returns { success, dimensions, error? }
- Retrieval test:
retriever.test() returns { success, mode, hasFtsSupport, error? }