Ebook Analysis: Non-Fiction Knowledge Extraction
You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:
- Concept Extraction - Extract ideas classified by abstraction (principle → tactic)
- Entity Extraction - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books
Core Principle
Every extraction must be traceable to its exact source. Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.
Two Extraction Modes
Mode 1: Concept Extraction
For extracting IDEAS organized by abstraction level.
Use when: Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
Output: JSON files (analysis.json, concepts.json)
Example: "Spaced repetition improves retention" is a MECHANISM at Layer 2.
Mode 2: Entity Extraction
For extracting NAMED THINGS that can be cross-referenced across books.
Use when: Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution: recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
Output: Markdown files in knowledge base structure
Example: "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.
Choosing a Mode
| If you want to... | Use Mode |
|-------------------|----------|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |
Entity Extraction Mode (Detailed)
Entity Types
| Type | What It Captures | Example |
|------|------------------|---------|
| study | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| researcher | People and their contributions | Anders Ericsson, Robin Hogarth |
| framework | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| anecdote | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| concept | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |
Extended Entity Type Guidance
Some entities don't fit cleanly into the five types. Guidelines:
| Entity Kind | Use Type | Rationale |
|-------------|----------|-----------|
| Simulations/Games (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| Institutions (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| Historical events (Challenger disaster) | anecdote | Stories that illustrate principles |
| Hypothetical scenarios | anecdote | Future scenarios from books like Imaginable |
| Thought experiments | framework | If systematic; otherwise concept |
When uncertain: Default to anecdote for narratives/events, concept for ideas, framework for systematic methods.
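The guidance table and fallback rule above can be read as a small lookup. The sketch below is one possible encoding, not part of the skill's scripts; the `EntityKind` labels and the `defaultEntityType` function are hypothetical names introduced only for illustration.

```typescript
// The five entity types defined by this skill.
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Hypothetical labels for what an extractor notices about a candidate entity.
type EntityKind =
  | "simulation"
  | "institution"
  | "historical-event"
  | "hypothetical-scenario"
  | "thought-experiment"
  | "narrative"
  | "idea"
  | "systematic-method";

// Map a candidate's kind to the type the guidance recommends.
// `isSystematic` only matters for thought experiments.
function defaultEntityType(kind: EntityKind, isSystematic = false): EntityType {
  switch (kind) {
    case "simulation":
    case "historical-event":
    case "hypothetical-scenario":
    case "narrative":
      return "anecdote";
    case "institution":
      return "researcher";
    case "thought-experiment":
      return isSystematic ? "framework" : "concept";
    case "systematic-method":
      return "framework";
    case "idea":
      return "concept";
  }
}

console.log(defaultEntityType("institution")); // "researcher"
```

Keeping the mapping in one function makes the "when uncertain" defaults explicit, so two extraction passes over the same book are more likely to file an entity under the same type.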
Author-as-Subject Pattern
When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
Create a researcher entity if:
- Author has notable prior work or institutional affiliation
- Author appears in Wikipedia or other reference sources
- Author's background/credentials are relevant to understanding the book
- Other books in your collection might reference them
Skip if:
- Author is primarily known only for this book
- No external sources to verify/enrich the entity
Template addition for author-subjects:
## Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
Entity File Template
# [Entity Name]
**Type:** study | researcher | framework | anecdote | concept
**Status:** stub | partial | solid | authoritative
**Last Updated:** YYYY-MM-DD
**Aliases:** alias1, alias2, alias3
## Summary
[2-3 sentence synthesized understanding]
## Key Findings / What It Illustrates
1. [Claim or finding with source]
   - Source: [Book], Ch.[X]
2. [Another claim]
   - Source: [Book], Ch.[X]
## Key Quotes
> "Quotable text here."
> "Another memorable quote."
## Sources in Collection
| Book | Author | Role in Book | Location |
|------|--------|---------------|----------|
| Range | Epstein | [Role in book] | Ch.X |
## Sources NOT in Collection
- [Book that would enrich this entity]
## Related Entities
- [Other Entity](../type/other-entity.md) - Relationship description
## Open Questions
- [What we don't yet know]
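As a rough illustration of how the template's top sections might be filled in programmatically, the sketch below renders the header, summary, and findings from a typed object and refuses any finding without a source, in line with the core principle. `EntityDraft`, `Finding`, and `renderEntity` are hypothetical names; the skill's own scripts may work differently.

```typescript
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";
type EntityStatus = "stub" | "partial" | "solid" | "authoritative";

// One claim plus the citation that makes it traceable.
interface Finding {
  claim: string;
  book: string;
  chapter: string;
}

interface EntityDraft {
  name: string;
  type: EntityType;
  status: EntityStatus;
  lastUpdated: string; // YYYY-MM-DD
  aliases: string[];
  summary: string;
  findings: Finding[];
}

// Render the first sections of the entity template.
// Throws if any finding lacks a book or chapter.
function renderEntity(e: EntityDraft): string {
  const lines: string[] = [
    `# ${e.name}`,
    `**Type:** ${e.type}`,
    `**Status:** ${e.status}`,
    `**Last Updated:** ${e.lastUpdated}`,
    `**Aliases:** ${e.aliases.join(", ")}`,
    "## Summary",
    e.summary,
    "## Key Findings / What It Illustrates",
  ];
  e.findings.forEach((f, i) => {
    if (f.book.trim() === "" || f.chapter.trim() === "") {
      throw new Error(`Finding ${i + 1} has no source; extract less rather than cite nothing`);
    }
    lines.push(`${i + 1}. ${f.claim}`);
    lines.push(`   - Source: ${f.book}, Ch.${f.chapter}`);
  });
  return lines.join("\n");
}
```

Failing loudly on an uncited finding is a deliberate choice here: it turns the traceability requirement into something the tooling enforces instead of a convention the extractor has to remember.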
Knowledge Base Structure
/knowledge/
├── _index.md                  # Master registry
├── _entities.json             # Searchable index (generated)
│
├── nonfiction/
│   ├── _index.md              # Domain index
│   ├── _[book]-quotes.md      # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
│
├── cooking/                   # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
│
└── technical/
    ├── patterns/
    └── technologies/
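The file names in the tree suggest a naming convention: lowercase, hyphen-separated slugs, with researcher files written surname-first. A minimal sketch of that convention follows; `slugify` and `entityPath` are hypothetical helpers, and treating the last word of a name as the surname is a simplifying assumption that will not hold for every name.

```typescript
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Directory name for each entity type, as shown in the tree.
const TYPE_DIRS: Record<EntityType, string> = {
  study: "studies",
  researcher: "researchers",
  framework: "frameworks",
  anecdote: "anecdotes",
  concept: "concepts",
};

// Lowercase, replace runs of non-alphanumerics with "-", trim stray hyphens.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build the relative path of an entity file inside the knowledge base.
function entityPath(domain: string, type: EntityType, name: string): string {
  let slug = slugify(name);
  if (type === "researcher") {
    // Assumption: the final word is the surname (Robin Hogarth -> hogarth-robin).
    const parts = name.trim().split(/\s+/);
    if (parts.length > 1) {
      const surname = parts[parts.length - 1];
      slug = slugify([surname, ...parts.slice(0, -1)].join(" "));
    }
  }
  return `knowledge/${domain}/${TYPE_DIRS[type]}/${slug}.md`;
}

console.log(entityPath("nonfiction", "researcher", "Robin Hogarth"));
// knowledge/nonfiction/researchers/hogarth-robin.md
```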
Quotes Extraction
Quotable quotes are a distinct extraction type. For each book, create a quotes file:
File: _[book-slug]-quotes.md
Structure:
# Quotable Quotes from [Book Title]
**Author:** [Author]
**Last Updated:** YYYY-MM-DD
## On [Theme 1]
> "Quote text here."
> "Another quote on same theme."
## On [Theme 2]
> "Quote on different theme."
What makes a good quote:
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference
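If quotes are collected as structured records during extraction, the quotes file can be generated from them. The following is only a sketch of that idea: the `Quote` shape and `renderQuotesFile` are hypothetical, and themes are emitted in the order they first appear.

```typescript
interface Quote {
  theme: string;
  text: string;
}

// Render a quotes file following the structure shown above,
// grouping quotes under one "## On <theme>" heading per theme.
function renderQuotesFile(
  title: string,
  author: string,
  lastUpdated: string,
  quotes: Quote[],
): string {
  const themeOrder: string[] = [];
  const byTheme: Record<string, string[]> = {};
  for (const q of quotes) {
    if (byTheme[q.theme] === undefined) {
      byTheme[q.theme] = [];
      themeOrder.push(q.theme);
    }
    byTheme[q.theme].push(q.text);
  }

  const lines: string[] = [
    `# Quotable Quotes from ${title}`,
    `**Author:** ${author}`,
    `**Last Updated:** ${lastUpdated}`,
  ];
  for (const theme of themeOrder) {
    lines.push(`## On ${theme}`);
    for (const text of byTheme[theme]) {
      lines.push(`> "${text}"`);
    }
  }
  return lines.join("\n");
}
```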
Entity Extraction Workflow
- Scan book - Read through identifying named studies, researchers, frameworks, illustrative stories
- Check existing entities - Use kb-resolve-entity.ts to see if entity already exists
- Create or update - New entity → create file; existing → add as source
- Add quotes - Extract memorable quotes to quotes file
- Cross-link - Add Related Entities sections
- Regenerate index - Run kb-generate-index.ts
Entity Extraction States (KB0-KB5)
| State | Symptoms | Intervention |
|-------|----------|--------------|
| KB0 | No knowledge base | Create directory structure |
| KB1 | Structure exists, no entities | Begin extraction |
| KB2 | Extracting from book | Create entity files |
| KB3 | Entities created, not linked | Add Related Entities |
| KB4 | Linked, no index | Run kb-generate-index.ts |
| KB5 | Complete for this book | Proceed to next book |
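The state table can be treated as a diagnosis: observe a few facts about the knowledge base, pick the first state whose symptoms match, and look up its intervention. The sketch below assumes a hypothetical `KbSnapshot` of observations; how those facts are gathered (directory listing, link checking, index timestamps) is left open.

```typescript
type KbState = "KB0" | "KB1" | "KB2" | "KB3" | "KB4" | "KB5";

// Hypothetical observations about the knowledge base for the current book.
interface KbSnapshot {
  hasStructure: boolean;         // directory tree exists
  entityCount: number;           // entity files present
  extractionInProgress: boolean; // still working through the book
  unlinkedEntityCount: number;   // entities with no Related Entities
  indexIsCurrent: boolean;       // _entities.json regenerated since last change
}

// Interventions copied from the state table.
const INTERVENTIONS: Record<KbState, string> = {
  KB0: "Create directory structure",
  KB1: "Begin extraction",
  KB2: "Create entity files",
  KB3: "Add Related Entities",
  KB4: "Run kb-generate-index.ts",
  KB5: "Proceed to next book",
};

// Return the earliest state whose symptoms match the snapshot.
function diagnoseKb(s: KbSnapshot): KbState {
  if (!s.hasStructure) return "KB0";
  if (s.extractionInProgress) return "KB2";
  if (s.entityCount === 0) return "KB1";
  if (s.unlinkedEntityCount > 0) return "KB3";
  if (!s.indexIsCurrent) return "KB4";
  return "KB5";
}

const state = diagnoseKb({
  hasStructure: true,
  entityCount: 12,
  extractionInProgress: false,
  unlinkedEntityCount: 0,
  indexIsCurrent: false,
});
console.log(state, INTERVENTIONS[state]); // KB4 Run kb-generate-index.ts
</gr_insert_placeholder>
```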
Cross-Book Synthesis Workflow
Triggered when: 2+ books have been extracted to the knowledge base.
Goals:
- Find entities that appear in multiple books
- Identify conceptual connections between books
- Surface contradictions or complementary perspectives
- Update entity files with multi-source synthesis
Process:
- Entity overlap detection
  grep -l "Sources in Collection" knowledge/nonfiction/**/*.md | \
    xargs grep -l "| .* | .* |" | head -20
  Or manually review entities updated with new source.
- Conceptual connection mapping
  - Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
  - Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
  - Look for complementary themes (prediction failure → preparation despite uncertainty)
- Synthesis documentation
  For entities appearing in 2+ books, update the Summary section:
  ## Summary
  [Synthesized understanding from BOTH sources, noting agreements and differences]
- Cross-book insights
  Document thematic connections in context/insights/cross-book-{theme}.md:
  # Cross-Book Insight: [Theme]
  ## Books Contributing
  - Range (Epstein) - [perspective]
  - Imaginable (McGonigal) - [perspective]
  ## Synthesis
  [How the books complement or contradict each other]
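For the overlap-detection step, a more targeted check than a line-level grep is to count the data rows in each entity's "Sources in Collection" table and flag files with two or more. The sketch below works on the file contents as a string and assumes the table has a `|---|` separator row before its data rows, as in the entity template; `countCollectionSources` is a hypothetical helper.

```typescript
// Count data rows in the "## Sources in Collection" table of one entity file.
// Rows are counted only after the separator row, so a header row is ignored.
function countCollectionSources(markdown: string): number {
  let inSection = false;
  let seenSeparator = false;
  let count = 0;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line.indexOf("## ") === 0) {
      // Any new level-2 heading starts or ends the section.
      inSection = line === "## Sources in Collection";
      seenSeparator = false;
      continue;
    }
    if (!inSection || line.charAt(0) !== "|") continue;
    if (/^[|\-:\s]+$/.test(line)) {
      seenSeparator = true; // the |------|------| row
      continue;
    }
    if (seenSeparator) count++;
  }
  return count;
}

// An entity is a cross-book synthesis candidate once it has 2+ sources.
function isCrossBookEntity(markdown: string): boolean {
  return countCollectionSources(markdown) >= 2;
}
```

Run over every file under `knowledge/nonfiction/`, this yields the list of entities whose Summary sections should be revisited for multi-source synthesis.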
Concept Extraction Mode (Detailed)
Concept Types (Abstract → Concrete)
| Type | Definition | Example |
|------|------------|---------|
| Principle | Foundational truth or axiom | "Communities form around shared identity" |
| Mechanism | How something works | "Reciprocity creates social bonds" |
| Pattern | Recurring structure or approach | "The community lifecycle pattern" |
| Strategy | High-level approach to achieve goals | "Build trust before asking for contribution" |
| Tactic | Specific actionable technique | "Send welcome emails within 24 hours" |
Abstraction Layers
| Layer | Name | Abstraction | Example |
|-------|------|-------------|---------|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
Relationship Types
| Relationship | Meaning | When to Use |
|--------------|---------|-------------|
| INFLUENCES | A affects B | Causal or correlational connection |
| SUPPORTS | A provides evidence for B | Citation, example, validation |
| CONTRADICTS | A conflicts with B | Opposing claims |
| COMPOSED_OF | A contains B | Part-whole relationships |
| DERIVES_FROM | A is derived from B | Logical conclusions |
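Taken together, the type, layer, and relationship tables imply a small data model. The sketch below is one way it might be typed, with a validator that enforces citation traceability; the field names and the `validateExtraction` function are assumptions for illustration and are not the actual schema of analysis.json or concepts.json.

```typescript
type ConceptType = "principle" | "mechanism" | "pattern" | "strategy" | "tactic";
type Layer = 0 | 1 | 2 | 3 | 4;
type RelationshipType =
  | "INFLUENCES"
  | "SUPPORTS"
  | "CONTRADICTS"
  | "COMPOSED_OF"
  | "DERIVES_FROM";

// Where in the parsed book a concept came from.
interface Citation {
  quote: string;   // exact text from the source
  chapter: string;
  start: number;   // position of the quote in the parsed text
  end: number;     // exclusive end position
}

interface Concept {
  id: string;
  statement: string;
  type: ConceptType;
  layer: Layer;
  citation: Citation;
}

interface Relationship {
  from: string; // concept id (A)
  to: string;   // concept id (B)
  type: RelationshipType;
}

// Return a list of problems; an empty list means the extraction is traceable
// and every relationship points at known concepts.
function validateExtraction(concepts: Concept[], relationships: Relationship[]): string[] {
  const problems: string[] = [];
  const known: Record<string, boolean> = {};
  for (const c of concepts) {
    known[c.id] = true;
    if (c.citation.quote.trim() === "") problems.push(`${c.id}: missing exact quote`);
    if (c.citation.end <= c.citation.start) problems.push(`${c.id}: empty source span`);
  }
  for (const r of relationships) {
    if (!known[r.from]) problems.push(`${r.type}: unknown concept ${r.from}`);
    if (!known[r.to]) problems.push(`${r.type}: unknown concept ${r.to}`);
    if (r.from === r.to) problems.push(`${r.type}: ${r.from} relates to itself`);
  }
  return problems;
}
```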
Concept Extraction States (EA0-EA7)
| State | Symptoms | Intervention |
|-------|----------|--------------|
| EA0 | No input file | Guide file preparation |
| EA1 | Raw file, not parsed | Run ea-parse.ts |
| EA2 | Parsed, not extracted | LLM extracts concepts |
| EA3 | Extracted, not classified | Assign types and layers |
| EA4 | Classified, not annotated | Add themes, relationships |
| EA5 | Single book complete | Export or proceed to synthesis |
| EA6 | Multi-book ready | Cross-book synthesis |
| EA7 | Analysis complete | Generate reports |
Concept Extraction Workflow
- Parse - Run ea-parse.ts to chunk book with position tracking
- Extract - Present chunks to LLM for concept identification with exact quotes
- Classify - Assign type (principle → tactic) and layer (0-4)
- Annotate - Add themes and functional analysis
- Link - Connect related concepts
- Export - Generate analysis.json, concepts.json, report.md
Available Tools
Parsing Tools
ea-parse.ts
Parse ebook files into chunks with metadata and position tracking.
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
Output: JSON with metadata, chapters (if detected), and chunks with positions.
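To make the `--chunk-size` and `--overlap` options concrete, here is a minimal sketch of overlapping chunking with position tracking. It is not the implementation of ea-parse.ts: it assumes sizes are measured in characters (the real script may count words or tokens), and the defaults simply mirror the example flag values above.

```typescript
interface Chunk {
  index: number;
  start: number; // offset of the first character in the source text
  end: number;   // exclusive end offset
  text: string;
}

// Split text into fixed-size windows where each window re-reads the last
// `overlap` characters of the previous one. Offsets let any quote found in a
// chunk be traced back to its exact position in the book.
function chunkText(text: string, chunkSize = 1500, overlap = 150): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // last window reached the end of the text
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1).map((c) => c.text));
// [ "abcd", "defg", "ghij" ]
```

The overlap exists so that a sentence straddling a chunk boundary appears whole in at least one chunk, which matters when the extractor must return exact quotes.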
Knowledge Base Tools
kb-generate-index.ts
Scan knowledge base and generate searchable entity index.
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
Output: Creates _entities.json with all entities, aliases, and metadata.
kb-resolve-entity.ts
Search for existing entities before creating duplicates.
deno run --allow-read scripts/kb-resolve-entity.ts
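The core of entity resolution is matching a mention against known names and aliases after normalizing away case and punctuation. The sketch below illustrates that idea only; it does not describe how kb-resolve-entity.ts works, and the `IndexedEntity` shape is an assumed reading of what `_entities.json` might contain.

```typescript
// Assumed shape of one record in the generated index.
interface IndexedEntity {
  name: string;
  type: string;
  path: string;
  aliases: string[];
}

// Lowercase and collapse punctuation/whitespace so that
// "Kind/Wicked environments" and "kind wicked environments" compare equal.
function normalizeName(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Return the first entity whose name or alias matches the query exactly
// after normalization, or null if the entity appears to be new.
function resolveEntity(query: string, index: IndexedEntity[]): IndexedEntity | null {
  const q = normalizeName(query);
  if (q === "") return null;
  for (const entity of index) {
    const candidates = [entity.name, ...entity.aliases].map(normalizeName);
    if (candidates.indexOf(q) !== -1) return entity;
  }
  return null;
}

const index: IndexedEntity[] = [
  {
    name: "Kind vs Wicked Environments",
    type: "framework",
    path: "nonfiction/frameworks/kind-vs-wicked-environments.md",
    aliases: ["kind/wicked environments", "Hogarth's framework"],
  },
];
console.log(resolveEntity("Hogarth's Framework", index)?.path);
// nonfiction/frameworks/kind-vs-wicked-environments.md
```

Exact matching on normalized aliases is conservative: it avoids merging distinct entities by accident, at the cost of relying on the Aliases line in each entity file being kept up to date.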