Knowledge Base Manager
Build and maintain high-quality knowledge bases for AI systems and human consumption.
Core Principle
Knowledge Base = Structured Information + Quality Curation + Accessibility
A knowledge base is not just a data dump: it is curated, validated, versioned information designed to answer questions and enable reasoning.
When to Use Knowledge Bases
Use Knowledge Bases When:
- Need to answer factual questions consistently
- Information changes frequently and needs version control
- Multiple sources need to be unified and reconciled
- Provenance and citation tracking is critical
- Building AI systems that need grounded, verifiable information
- Organizational knowledge needs to be preserved and searchable
- Complex domain with interconnected concepts
Don't Use Knowledge Bases When:
- Static documentation is sufficient (use docs + search)
- No one will maintain/update it (knowledge rot guaranteed)
- Simple FAQ covers all questions (<50 items)
- Information doesn't change (static site faster/cheaper)
- Team lacks resources for curation
Knowledge Base Types: Decision Framework
1. Document-Based Knowledge Base (RAG)
What it is: Collection of documents, chunked and embedded for semantic search
Best for:
- Technical documentation
- Support articles, FAQs
- Policy documents
- Research papers
- Blog content
- User manuals
Strengths:
- Easy to add new documents
- Preserves full context
- Natural for text-heavy content
Weaknesses:
- Hard to query relationships ("Who works where?")
- Duplicate information across documents
- Difficult to keep facts consistent
Use: rag-implementer skill + vector-database-mcp
2. Entity-Based Knowledge Base (Knowledge Graph)
What it is: Network of entities (people, places, things) connected by relationships
Best for:
- Organizational charts
- Product catalogs with relationships
- Social networks
- Recommendation systems
- Fraud detection
- Supply chain tracking
Strengths:
- Excellent for "how are X and Y related?" queries
- Consistent facts (one source of truth)
- Powerful traversal ("friends of friends")
Weaknesses:
- Upfront modeling required (ontology design)
- Harder to add unstructured information
- Learning curve for graph queries
Use: knowledge-graph-builder skill + graph-database-mcp
3. Hybrid Knowledge Base (RAG + Graph)
What it is: Documents for unstructured knowledge + Graph for structured entities/relationships
Best for:
- Enterprise knowledge management
- Research with citations and relationships
- Medical systems (documents + patient/drug relationships)
- Legal systems (cases + precedents + entities)
- E-commerce (products + specs + relationships)
Strengths:
- Best of both worlds
- Flexible for different knowledge types
- Rich querying capabilities
Weaknesses:
- Most complex to build and maintain
- Requires expertise in both RAG and graphs
- Higher infrastructure costs
Use: Both rag-implementer + knowledge-graph-builder skills
Decision Tree: Which KB Type?
What kind of knowledge do you have?
├─ Mostly unstructured text (docs, articles, content)?
│  └─ Document-Based KB (RAG)
│     Use: rag-implementer skill
│
├─ Mostly structured entities with relationships?
│  └─ Entity-Based KB (Graph)
│     Use: knowledge-graph-builder skill
│
└─ Mix of both?
   └─ Hybrid KB (RAG + Graph)
      Use: Both skills + This skill for integration
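The decision tree above can be sketched as a small lookup function. This is only an illustration: the names `KnowledgeShape`, `KbRecommendation`, and `chooseKbType` are hypothetical and do not belong to any of the skills mentioned.

```typescript
// Hypothetical types for illustration; not part of any skill's API.
type KnowledgeShape = 'unstructured-text' | 'structured-entities' | 'mixed';

interface KbRecommendation {
  kbType: 'Document-Based KB (RAG)' | 'Entity-Based KB (Graph)' | 'Hybrid KB (RAG + Graph)';
  skills: string[];
}

// Maps the dominant shape of the knowledge to the KB type and skills
// named in the decision tree.
function chooseKbType(shape: KnowledgeShape): KbRecommendation {
  switch (shape) {
    case 'unstructured-text':
      return { kbType: 'Document-Based KB (RAG)', skills: ['rag-implementer'] };
    case 'structured-entities':
      return { kbType: 'Entity-Based KB (Graph)', skills: ['knowledge-graph-builder'] };
    case 'mixed':
      return {
        kbType: 'Hybrid KB (RAG + Graph)',
        skills: ['rag-implementer', 'knowledge-graph-builder'],
      };
  }
}
```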
6-Phase Knowledge Base Implementation
Phase 1: Knowledge Audit & Architecture
Goal: Understand what knowledge exists and how to structure it
Actions:
- Inventory existing knowledge sources
  - Internal: databases, documents, wikis, Slack, emails
  - External: public data, APIs, third-party sources
  - Tribal: SME interviews, recorded conversations
- Classify knowledge types
  - Factual: Verifiable facts ("Product X costs $50")
  - Procedural: How-to knowledge ("How to deploy")
  - Conceptual: Definitions and explanations
  - Relationship: Connections between entities
- Choose KB architecture
  - Document-based? Entity-based? Hybrid?
  - Decision: Use framework above
- Define knowledge schema
  - For documents: metadata fields (source, date, author, category)
  - For entities: ontology (entity types, relationship types, properties)
Validation:
Phase 2: Knowledge Curation & Ingestion
Goal: Transform raw information into high-quality knowledge
Actions:
- Extract knowledge from sources
  - Automated: scraping, API ingestion, file parsing
  - Manual: expert input, annotation, validation
- Clean and normalize
  - Remove duplicates
  - Standardize formats
  - Fix inconsistencies
  - Enrich with metadata
- Structure knowledge
  - For documents: chunk intelligently (semantic boundaries)
  - For entities: extract entities, relationships, properties
- Add provenance
  - Source URL or reference
  - Last updated timestamp
  - Author/contributor
  - Confidence score (if applicable)
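The clean-and-normalize and provenance steps above might look like the following minimal sketch. The `Provenance` and `RawEntry` shapes and both function names are assumptions made for illustration, and the exact-match deduplication is deliberately naive.

```typescript
// Hypothetical shapes for illustration only.
interface Provenance {
  source: string;      // source URL or reference
  updatedAt: string;   // ISO 8601 UTC timestamp, e.g. "2024-06-01T00:00:00Z"
  author: string;
  confidence?: number; // optional, 0..1
}

interface RawEntry {
  content: string;
  provenance: Provenance;
}

// Collapse whitespace and case so trivially different copies compare equal.
function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, ' ').toLowerCase();
}

// Keep one entry per normalized content. When duplicates exist, keep the most
// recently updated one. Assumes all timestamps share the same ISO 8601 UTC
// format, so plain string comparison orders them chronologically.
function dedupeEntries(entries: RawEntry[]): RawEntry[] {
  const byKey = new Map<string, RawEntry>();
  for (const entry of entries) {
    const key = normalizeText(entry.content);
    const existing = byKey.get(key);
    if (!existing || entry.provenance.updatedAt > existing.provenance.updatedAt) {
      byKey.set(key, entry);
    }
  }
  return Array.from(byKey.values());
}
```

Exact matching after normalization only catches trivial duplicates; near-duplicate detection needs a similarity measure.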
Curation Best Practices:
- Single Source of Truth: One canonical answer per question
- Deduplication: Merge similar knowledge entries
- Conflict Resolution: When sources disagree, establish priority rules
- Metadata Richness: More metadata = better filtering and search
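As one possible reading of the conflict-resolution practice, the sketch below picks a winner among disagreeing claims using an ordered list of trusted sources. The `Claim` shape, the `resolveConflict` name, and the idea of a flat priority list are all assumptions for illustration.

```typescript
// Hypothetical shape: one source's claim about one fact.
interface Claim {
  fact: string;   // e.g. "price of Product X"
  value: string;
  source: string;
}

// Returns the claim from the highest-priority source. Sources earlier in
// `sourcePriority` win; sources not listed rank last. Returns undefined
// when there are no claims.
function resolveConflict(claims: Claim[], sourcePriority: string[]): Claim | undefined {
  let best: Claim | undefined;
  let bestRank = Infinity;
  for (const claim of claims) {
    const index = sourcePriority.indexOf(claim.source);
    const rank = index === -1 ? sourcePriority.length : index;
    if (rank < bestRank) {
      best = claim;
      bestRank = rank;
    }
  }
  return best;
}
```

Ties between claims of equal rank are resolved in favor of the first one encountered; a real system would likely also log the losing claims for review.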
Validation:
Phase 3: Storage & Retrieval Setup
Goal: Implement technical infrastructure for knowledge access
Architecture Patterns:
For Document-Based KB:
interface DocumentKB {
  store: 'Pinecone' | 'Weaviate' | 'pgvector'
  chunks: {
    content: string
    embedding: number[]
    metadata: {
      source: string
      title: string
      updated_at: string
      category: string
    }
  }[]
}
For Entity-Based KB:
interface EntityKB {
  store: 'Neo4j' | 'ArangoDB'
  nodes: {
    id: string
    type: 'Person' | 'Organization' | 'Product' | 'Concept'
    properties: Record<string, any>
  }[]
  relationships: {
    from: string
    to: string
    type: string
    properties: Record<string, any>
  }[]
}
For Hybrid KB:
interface HybridKB {
  vectorDB: DocumentKB
  graphDB: EntityKB
  linker: {
    linkDocumentToEntities(docId: string): string[]
    linkEntityToDocuments(entityId: string): string[]
  }
}
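The `linker` in the hybrid interface could be backed by a simple bidirectional index. The sketch below is an in-memory stand-in, assuming links are registered explicitly. The class name `InMemoryLinker` and the `addLink` method are hypothetical, and a real deployment would more likely persist these links in the graph or vector store.

```typescript
// In-memory bidirectional index between document IDs and entity IDs.
class InMemoryLinker {
  private docToEntities = new Map<string, Set<string>>();
  private entityToDocs = new Map<string, Set<string>>();

  // Hypothetical helper: record that a document mentions an entity.
  addLink(docId: string, entityId: string): void {
    let entities = this.docToEntities.get(docId);
    if (!entities) {
      entities = new Set<string>();
      this.docToEntities.set(docId, entities);
    }
    entities.add(entityId);

    let docs = this.entityToDocs.get(entityId);
    if (!docs) {
      docs = new Set<string>();
      this.entityToDocs.set(entityId, docs);
    }
    docs.add(docId);
  }

  // Entity IDs mentioned in a document, in insertion order.
  linkDocumentToEntities(docId: string): string[] {
    const entities = this.docToEntities.get(docId);
    return entities ? Array.from(entities) : [];
  }

  // Document IDs that mention an entity, in insertion order.
  linkEntityToDocuments(entityId: string): string[] {
    const docs = this.entityToDocs.get(entityId);
    return docs ? Array.from(docs) : [];
  }
}
```

Keeping both directions in sync at write time makes either lookup a single map access, at the cost of storing each link twice.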
Actions:
- Choose database(s)
  - Document: Pinecone, Weaviate, pgvector
  - Entity: Neo4j, ArangoDB
  - Hybrid: Both + linking layer
- Implement search/query layer
  - Vector similarity search (for documents)
  - Graph traversal (for entities)
  - Hybrid queries (combining both)
- Add caching and optimization
  - Cache frequent queries
  - Optimize for common access patterns
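To make the vector similarity search step concrete, here is a minimal brute-force sketch using cosine similarity over in-memory chunks. It assumes embeddings have already been computed elsewhere; the names `EmbeddedChunk`, `cosineSimilarity`, and `topK` are illustrative. In practice the chosen vector database performs this search with an index rather than a linear scan.

```typescript
// Hypothetical minimal chunk shape: text plus a precomputed embedding.
interface EmbeddedChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```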
Validation:
Phase 4: Quality Control & Validation
Goal: Ensure knowledge base accuracy and reliability
Quality Metrics:
- Accuracy: % of correct answers to test questions
- Coverage: % of user questions answerable
- Freshness: Average age of knowledge
- Consistency: % of entries involved in conflicts/contradictions (lower is better)
- Source Quality: % from authoritative sources
Validation Strategies:
1. Test Question Sets
Create 100+ test questions with known correct answers:
interface TestQuestion {
  question: string
  expected_answer: string
  category: string
  difficulty: 'easy' | 'medium' | 'hard'
}
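A test set like this can be scored with a short harness. The sketch below assumes the KB is exposed as a synchronous function from question to answer (the hypothetical `AnswerFn`) and grades by exact match after normalization. That grading rule is a simplifying assumption; free-text answers usually need fuzzy or reviewer-based grading.

```typescript
interface TestQuestion {
  question: string;
  expected_answer: string;
  category: string;
  difficulty: 'easy' | 'medium' | 'hard';
}

// Hypothetical stand-in for whatever answers questions from the KB.
type AnswerFn = (question: string) => string;

// Normalize answers so case and spacing differences do not count as errors.
function normalizeAnswer(text: string): string {
  return text.trim().replace(/\s+/g, ' ').toLowerCase();
}

// Fraction of questions answered correctly (0..1). Returns 0 for an empty set.
function evaluateAccuracy(questions: TestQuestion[], answer: AnswerFn): number {
  if (questions.length === 0) {
    return 0;
  }
  let correct = 0;
  for (const q of questions) {
    if (normalizeAnswer(answer(q.question)) === normalizeAnswer(q.expected_answer)) {
      correct++;
    }
  }
  return correct / questions.length;
}
```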
2. Human Review
- Sample random knowledge entries
- Subject matter expert validation
- User feedback loops
3. Automated Checks
- Duplicate Detection: Find near-identical entries
- Conflict Detection: Find contradictory facts
- Staleness Detection: Flag outdated information
- Citation Validation: Verify sources still exist
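Two of the automated checks above, staleness detection and duplicate detection, can be sketched in a few lines. The `DatedEntry` shape, the age cutoff, and the use of word-level Jaccard similarity are assumptions chosen for simplicity; embedding-based similarity is a common alternative for near-duplicates.

```typescript
// Hypothetical minimal entry shape for the checks below.
interface DatedEntry {
  id: string;
  content: string;
  updated_at: string; // ISO 8601 timestamp
}

// Entries last updated more than maxAgeDays before `now`.
function findStale(entries: DatedEntry[], maxAgeDays: number, now: Date): DatedEntry[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return entries.filter((entry) => new Date(entry.updated_at).getTime() < cutoff);
}

// Jaccard similarity of the two texts' lowercase word sets (0..1).
function jaccard(a: string, b: string): number {
  const toWords = (text: string) =>
    new Set(text.toLowerCase().split(/\s+/).filter((word) => word.length > 0));
  const wordsA = toWords(a);
  const wordsB = toWords(b);
  let shared = 0;
  wordsA.forEach((word) => {
    if (wordsB.has(word)) shared++;
  });
  const unionSize = wordsA.size + wordsB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// ID pairs whose content similarity meets the threshold.
// Compares every pair, so cost grows quadratically with entry count.
function findNearDuplicates(entries: DatedEntry[], threshold: number): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      if (jaccard(entries[i].content, entries[j].content) >= threshold) {
        pairs.push([entries[i].id, entries[j].id]);
      }
    }
  }
  return pairs;
}
```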
4. Continuous Monitoring
interface KBHealthMetrics {
  accuracy_score: number
  coverage_score: number
  freshness_score: number
  consistency_score: number
  user_satisfaction: number
}
Actions:
- Run test question validation (target: >90% accuracy)
- Conduct human review (sample 10% of entries)
- Fix detected issues (duplicates, conflicts, staleness)
- Establish monitoring dashboards
Validation:
Phase 5: Versioning & Evolution
Goal: Track knowledge changes over time and enable rollback
Why Versioning Matters:
- Knowledge changes (facts update, policies change)
- Need audit trail (who changed what, and when)
- Rollback capability (undo bad updates)
- Historical queries ("What was policy on X in 2023?")
Versioning Strategies:
1. Snapshot Versioning
interface KnowledgeEntry {
  id: string
  content: string
  version: number
  created_at: string
  updated_at: string
  updated_by: string
  changelog: string
  previous_version?: string
}
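One way to use this entry shape is an append-only store in which every save produces a new snapshot. The sketch below is an in-memory illustration under two assumptions not stated in the interface: `previous_version` holds a key of the form `id@vN`, and timestamps are ISO 8601 UTC strings in one format so they compare correctly as strings. The `SnapshotStore` class and its methods are hypothetical.

```typescript
interface KnowledgeEntry {
  id: string;
  content: string;
  version: number;
  created_at: string;
  updated_at: string;
  updated_by: string;
  changelog: string;
  previous_version?: string;
}

// Append-only store: saving never overwrites an earlier snapshot.
class SnapshotStore {
  private history = new Map<string, KnowledgeEntry[]>();

  // Save content as the next version of an entry and return the new snapshot.
  save(id: string, content: string, updatedBy: string, changelog: string, now: string): KnowledgeEntry {
    const versions = this.history.get(id) || [];
    const previous = versions.length > 0 ? versions[versions.length - 1] : undefined;
    const entry: KnowledgeEntry = {
      id,
      content,
      version: previous ? previous.version + 1 : 1,
      created_at: previous ? previous.created_at : now,
      updated_at: now,
      updated_by: updatedBy,
      changelog,
    };
    if (previous) {
      // Assumed key format for referencing the prior snapshot.
      entry.previous_version = `${id}@v${previous.version}`;
    }
    versions.push(entry);
    this.history.set(id, versions);
    return entry;
  }

  // Historical query: the snapshot that was current at the given timestamp.
  asOf(id: string, timestamp: string): KnowledgeEntry | undefined {
    const versions = this.history.get(id) || [];
    let match: KnowledgeEntry | undefined;
    for (const snapshot of versions) {
      if (snapshot.updated_at <= timestamp) {
        match = snapshot;
      }
    }
    return match;
  }
}
```

Under this design a rollback is just another `save` that re-submits the content of an older snapshot, which keeps the audit trail intact.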
2. Event Sourcing
interface KnowledgeEvent {
  event_id: string
  entity_id: string
  event_type: 'created' | 'updated' | 'deleted'
  timestamp: string
  changes: {
    field: string
    old_value: any
    new_value: any
  }[]
  author