Translate Book (Parallel Subagents)
Skill by ara.so (Daily 2026 Skills collection).
A Claude Code skill that translates entire books (PDF/DOCX/EPUB) into any language using parallel subagents. Each chunk gets an isolated context window, which prevents the truncation and context accumulation that plague single-session translation.
Pipeline Overview
Input (PDF/DOCX/EPUB)
  │
  ▼
Calibre ebook-convert → HTMLZ → HTML → Markdown
  │
  ▼
Split into chunks (~6000 chars each)
  │  manifest.json tracks SHA-256 hashes
  ▼
Parallel subagents (8 concurrent by default)
  │  each: read chunk → translate → write output_chunk*.md
  ▼
Validate (manifest hash check, 1:1 source↔output match)
  │
  ▼
Merge → Pandoc → HTML (with TOC) → Calibre → DOCX / EPUB / PDF
Prerequisites
brew install --cask calibre           # macOS
sudo apt-get install calibre          # Debian/Ubuntu
brew install pandoc                   # macOS
sudo apt-get install pandoc           # Debian/Ubuntu
pip install pypandoc beautifulsoup4   # Python dependencies
Verify all tools are available:
ebook-convert --version
pandoc --version
python3 -c "import pypandoc; print('pypandoc ok')"
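The same checks can be run in one pass from Python. This is a minimal sketch, not part of the skill: the helper names missing_tools and missing_modules are hypothetical, and it only tests that the tools and modules can be found, not that they work.

```python
import importlib.util
import shutil


def missing_tools(tools=("ebook-convert", "pandoc")):
    """Return the command names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def missing_modules(modules=("pypandoc", "bs4")):
    """Return the top-level module names from `modules` that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical preflight report; prints nothing alarming when all is present.
    print("missing tools:", missing_tools())
    print("missing modules:", missing_modules())
```

Note that the beautifulsoup4 package is imported under the name bs4, which is why that name appears in the default module list.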
Installation
Option A: npx (recommended)
npx skills add deusyu/translate-book -a claude-code -g
Option B: ClawHub
clawhub install translate-book
Option C: Git clone
git clone https://github.com/deusyu/translate-book.git ~/.claude/skills/translate-book
Usage in Claude Code
Once the skill is installed, use natural language inside Claude Code:
translate /path/to/book.pdf to Chinese
translate ~/Downloads/mybook.epub to Japanese
/translate-book translate /path/to/book.docx to French
The skill orchestrates the full pipeline automatically.
Supported Languages
| Code | Language |
|------|----------|
| zh | Chinese |
| en | English |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |
Language codes are extensible: add new ones in the skill definition.
Running Pipeline Steps Manually
Step 1: Convert to Markdown Chunks
python3 scripts/convert.py /path/to/book.pdf --olang zh
This produces the following inside {book_name}_temp/:
- chunk0001.md, chunk0002.md, ... (source chunks, ~6000 chars each)
- manifest.json (SHA-256 hashes for validation)
The same command works for EPUB and DOCX input:
python3 scripts/convert.py /path/to/book.epub --olang ja
python3 scripts/convert.py /path/to/book.docx --olang fr
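As a rough illustration of how a book's Markdown might be cut into ~6000-character chunks, the sketch below packs whole paragraphs greedily. It is not the code in convert.py: the paragraph-boundary rule and the names split_into_chunks and chunk_filename are assumptions made for illustration.

```python
def split_into_chunks(text, target_size=6000):
    """Greedily pack paragraphs into chunks of roughly `target_size` characters.

    Paragraphs are never split, so a single paragraph longer than
    `target_size` becomes an oversized chunk of its own.
    """
    chunks, current, current_len = [], [], 0
    for para in text.split("\n\n"):
        # The +2 accounts for the blank line that rejoins paragraphs.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > target_size:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_filename(index):
    """Name a chunk in the chunk0001.md style used in this README."""
    return f"chunk{index:04d}.md"
```

Splitting on paragraph boundaries keeps each subagent's input readable on its own, at the cost of chunk sizes that only approximate the target.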
Step 2: Translate (Parallel Subagents)
The skill handles this step. It launches 8 concurrent subagents per batch, each translating one chunk independently:
# Each subagent receives exactly this task:
Read chunk0042.md → translate to target language → write output_chunk0042.md
Resumable: Already-translated chunks (valid output_chunk*.md files) are skipped on re-run.
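The skip-and-batch behaviour can be pictured with a short sketch. This is an illustration under stated assumptions, not the skill's orchestration logic: "valid" is approximated here as "exists and is non-empty", and the names pending_chunks and batches are hypothetical.

```python
from pathlib import Path


def pending_chunks(temp_dir):
    """Return source chunk names that have no non-empty output_chunk*.md yet."""
    temp = Path(temp_dir)
    pending = []
    for src in sorted(temp.glob("chunk*.md")):
        out = temp / f"output_{src.name}"
        if not out.is_file() or out.stat().st_size == 0:
            pending.append(src.name)
    return pending


def batches(items, size=8):
    """Group items into consecutive batches of at most `size` (8 by default)."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 20 pending chunks and the default batch size, batches() yields groups of 8, 8, and 4, so a re-run only spends subagent calls on chunks that still need translating.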
Step 3: Merge and Build All Formats
python3 scripts/merge_and_build.py \
--temp-dir book_name_temp \
--title "《Book Title in Target Language》"
Before merging, validation checks:
- Every source chunk has a matching output file (1:1)
- Source chunk hashes match manifest.json (no stale outputs)
- No output files are empty
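The three checks above could be expressed roughly as follows. This is a sketch, not the logic in manifest.py: it assumes the manifest is a flat mapping of chunk filename to a "sha256:<hex>" string, and the function names sha256_of and validate are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file's bytes and return it in 'sha256:<hex>' form (assumed format)."""
    return "sha256:" + hashlib.sha256(Path(path).read_bytes()).hexdigest()


def validate(temp_dir, manifest):
    """Return a list of problems; an empty list means the merge can proceed."""
    temp = Path(temp_dir)
    problems = []
    for name, expected in sorted(manifest.items()):
        src = temp / name
        out = temp / f"output_{name}"
        # Check 2: the source chunk must still match its recorded hash.
        if not src.is_file():
            problems.append(f"missing source chunk: {name}")
        elif sha256_of(src) != expected:
            problems.append(f"hash mismatch: {name}")
        # Checks 1 and 3: every source needs a non-empty output.
        if not out.is_file():
            problems.append(f"missing output: {out.name}")
        elif out.stat().st_size == 0:
            problems.append(f"empty output: {out.name}")
    # Check 1, other direction: no output without a source in the manifest.
    for out in sorted(temp.glob("output_chunk*.md")):
        if out.name[len("output_"):] not in manifest:
            problems.append(f"orphan output: {out.name}")
    return problems
```

Collecting every problem instead of stopping at the first makes it easier to see which chunks need re-translation in one pass.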
Outputs produced:
| File | Description |
|------|-------------|
| output.md | Merged translated Markdown |
| book.html | Web version with floating TOC |
| book.docx | Word document |
| book.epub | E-book format |
| book.pdf | Print-ready PDF |
Project Structure
translate-book/
├── SKILL.md                      # Claude Code skill definition (orchestrator)
├── scripts/
│   ├── convert.py                # PDF/DOCX/EPUB → Markdown chunks via Calibre HTMLZ
│   ├── manifest.py               # SHA-256 chunk tracking and merge validation
│   ├── merge_and_build.py        # Merge chunks → HTML → DOCX/EPUB/PDF
│   ├── calibre_html_publish.py   # Calibre wrapper for format conversion
│   ├── template.html             # Web HTML template with floating TOC
│   └── template_ebook.html       # Ebook HTML template
└── README.md
How Manifest Validation Works
manifest = {
"chunk0001.md": "sha256:abc123...",
"chunk0002.md": "sha256:def456...",
}
If validation fails, the script auto-deletes stale output.md and re-merges from valid chunk outputs.
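A manifest in the shape shown above could be built and compared along these lines. This is a minimal sketch assuming the "sha256:<hex>" value format from the example; the real manifest.json may carry more fields, and the names build_manifest and stale_chunks are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(temp_dir):
    """Map each source chunk filename to 'sha256:<hex digest>' of its bytes."""
    manifest = {}
    for src in sorted(Path(temp_dir).glob("chunk*.md")):
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        manifest[src.name] = f"sha256:{digest}"
    return manifest


def stale_chunks(recorded, temp_dir):
    """Return names whose current hash differs from, or is absent in, `recorded`."""
    current = build_manifest(temp_dir)
    names = recorded.keys() | current.keys()
    return sorted(n for n in names if recorded.get(n) != current.get(n))
```

Any name returned by stale_chunks() marks a chunk whose translation no longer corresponds to its source, which is the situation that should invalidate a previously merged output.md.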
Real-World Example: Translate a Technical Book
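# Install the skill
npx skills add deusyu/translate-book -a claude-code -g
# Go to the directory that holds the book
cd ~/books
# Inspect the working directory the pipeline creates for the book
ls clean-code_temp/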
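Resuming an Interrupted Translation
Re-run the same translation request. Chunks that already have a valid output_chunk*.md file are skipped, so work resumes where it stopped.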
Changing Output Metadata After Translation
If you need to update the title, author, template, or image assets without re-translating:
cd book_name_temp/
rm -f output.md book*.html book.docx book.epub book.pdf
python3 ../scripts/merge_and_build.py \
--temp-dir . \
--title "《New Title》"
Do NOT delete chunk files: those are your translated content. Only delete final artifacts when changing metadata.
Troubleshooting
| Problem | Solution |
|---------|----------|
| Calibre ebook-convert not found | Install Calibre; ensure ebook-convert is in $PATH |
| Manifest validation failed | Source chunks changed; re-run convert.py |
| Missing source chunk | Source file deleted; re-run convert.py to regenerate |
| Incomplete translation | Re-run the skill; it resumes from the last valid chunk |
| Changed title/template but output unchanged | Delete output.md, book*.html, book.docx, book.epub, book.pdf, then re-run merge_and_build.py |
| output.md exists but manifest invalid | Script auto-deletes stale output and re-merges |
| PDF generation fails | Verify Calibre has PDF output support; try ebook-convert --help |
| Empty output chunks | Retry failed chunks; check API rate limits |
Diagnosing Chunk Issues
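# Count source chunks and translated outputs (the two numbers should match)
ls book_temp/chunk*.md | wc -l
ls book_temp/output_chunk*.md | wc -l
# List chunks whose output is missing or empty
for f in book_temp/chunk*.md; do
  base=$(basename "$f" .md)
  out="book_temp/output_${base}.md"
  if [ ! -f "$out" ] || [ ! -s "$out" ]; then
    echo "Missing or empty: $out"
  fi
done
# Pretty-print the start of the manifest
python3 -m json.tool book_temp/manifest.json | head -30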
Configuration Tips
- Chunk size: ~6000 chars per chunk is the default. Smaller chunks mean more parallelism but more API calls.
- Concurrency: Default is 8 parallel subagents per batch. Adjust in SKILL.md if hitting rate limits.
- Languages: Add new language codes to the skill triggers and translation prompt in SKILL.md.
- Templates: Customize scripts/template.html and scripts/template_ebook.html for different HTML/ebook styling.
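To get a feel for the chunk-size and concurrency trade-off, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: real chunk sizes are approximate, and the function name estimate_work is hypothetical.

```python
import math


def estimate_work(total_chars, chunk_size=6000, concurrency=8):
    """Estimate (chunks, batches) for a book of `total_chars` characters.

    Each chunk is assumed to cost one subagent call, and each batch
    runs up to `concurrency` chunks at once.
    """
    chunks = math.ceil(total_chars / chunk_size)
    batches = math.ceil(chunks / concurrency)
    return chunks, batches
```

For a 600,000-character book, estimate_work(600_000) gives 100 chunks in 13 batches, while halving the chunk size to 3000 gives 200 chunks in 25 batches, which is the "more API calls" cost mentioned above.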
Key Design Principles
- Isolated context per chunk: each subagent starts fresh, preventing context overflow on long books
- Hash-based integrity: SHA-256 tracking catches stale or corrupt translated chunks before merging
- Resumable at chunk granularity: never re-translate what's already done
- Format-agnostic input: Calibre handles PDF/DOCX/EPUB normalization before the pipeline begins
- Multiple output formats: a single pipeline run produces HTML, DOCX, EPUB, and PDF