ElevenLabs Voiceover Generation
Generate professional AI voiceovers for Remotion videos using the ElevenLabs API.
Prerequisites
ELEVENLABS_API_KEY in .env.local
Quick Start
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text here" --output public/audio/voiceover.mp3
node .claude/skills/elevenlabs-remotion-skill/generate.js --text "Your text" --character narrator --output voiceover.mp3
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes remotion/scenes.json --output-dir public/audio/project/
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --scene scene2 --new-text "Updated text"
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-voices
node .claude/skills/elevenlabs-remotion-skill/generate.js --list-characters
Character Presets
Character presets shape the delivery so the voiceover sounds natural rather than like a literal reading of on-screen text:
| Character | Description | Best For |
| --- | --- | --- |
| literal | Reads text exactly as written | Screen text, quotes |
| narrator | Professional storyteller, smooth, engaging | Explainers, documentaries |
| salesperson | Enthusiastic, persuasive, energetic | Marketing, ads |
| expert | Authoritative, confident, knowledgeable | Legal content, tutorials |
| conversational | Casual, friendly, natural | Social media, casual content |
| dramatic | Intense, emotional, impactful | Hooks, problem statements |
| calm | Soothing, reassuring, gentle | Trust-building, conclusions |
node .claude/skills/elevenlabs-remotion-skill/generate.js --scenes scenes.json --character narrator --output-dir public/audio/
{
  "scenes": [
    { "id": "scene1", "text": "Problem statement", "character": "dramatic" },
    { "id": "scene2", "text": "Solution", "character": "calm" }
  ]
}
Scene-Based Generation with Request Stitching
Generate multiple scenes with consistent prosody using ElevenLabs request stitching:
scenes.json Format
{
  "name": "product-demo",
  "voice": "George",
  "character": "narrator",
  "scenes": [
    {
      "id": "scene1",
      "text": "Generic text-to-speech sounds robotic. Your brand deserves better.",
      "duration": 4.5,
      "character": "dramatic"
    },
    {
      "id": "scene2",
      "text": "With voice cloning, you can use your own voice for unlimited content.",
      "duration": 5.5
    },
    {
      "id": "scene3",
      "text": "Record a short sample. Clone it. Create professional voiceovers in minutes.",
      "duration": 6,
      "delay": 0.3
    }
  ]
}
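The format above can be described with a pair of TypeScript types. The sketch below is illustrative only: the type names and the `resolveCharacter` helper are hypothetical, and the precedence it applies (scene-level preset, then file-level preset, then the documented `literal` default) is an assumption about how the tool behaves, not something confirmed by its source.

```typescript
// Shape of scenes.json as shown in the example. Optional fields are absent in some scenes.
interface SceneEntry {
  id: string;
  text: string;
  duration?: number;  // expected length in seconds
  delay?: number;     // appears on scene3 in the example; its exact semantics are not documented here
  character?: string; // per-scene preset override
}

interface ScenesFile {
  name: string;
  voice?: string;
  character?: string; // file-level default preset
  scenes: SceneEntry[];
}

// Assumed precedence: scene preset, then file preset, then "literal" (the CLI default).
function resolveCharacter(file: ScenesFile, scene: SceneEntry): string {
  return scene.character ?? file.character ?? "literal";
}

const demoScenesFile: ScenesFile = {
  name: "product-demo",
  voice: "George",
  character: "narrator",
  scenes: [
    { id: "scene1", text: "Your brand deserves better.", duration: 4.5, character: "dramatic" },
    { id: "scene2", text: "Use your own voice for unlimited content.", duration: 5.5 },
  ],
};

// scene1 keeps its own preset; scene2 falls back to the file-level preset.
const resolvedCharacters = demoScenesFile.scenes.map((s) => resolveCharacter(demoScenesFile, s));
```

Under these assumptions `resolvedCharacters` holds `dramatic` for scene1 and `narrator` for scene2.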
Generate All Scenes
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/product-demo-scenes.json \
--output-dir public/audio/product-demo/
This creates:
- product-demo-scene1.mp3 through product-demo-sceneN.mp3 (one file per scene)
- product-demo-combined.mp3 (all scenes stitched)
- product-demo-info.json (metadata with durations)
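If a build script needs to check that generation produced everything, the file names can be derived from the project name and scene ids. This is a small sketch; the `expectedOutputFiles` helper is hypothetical, and the `<name>-<id>.mp3` pattern is inferred from the example names above rather than taken from the tool's source.

```typescript
// Lists the files a run is expected to leave in the output directory.
// includeCombined mirrors the --no-combined option (pass false when that flag is used).
function expectedOutputFiles(
  projectName: string,
  sceneIds: string[],
  includeCombined: boolean = true
): string[] {
  const files = sceneIds.map((id) => `${projectName}-${id}.mp3`);
  if (includeCombined) {
    files.push(`${projectName}-combined.mp3`);
  }
  files.push(`${projectName}-info.json`);
  return files;
}

const demoOutputFiles = expectedOutputFiles("product-demo", ["scene1", "scene2", "scene3"]);
```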
Single Scene Regeneration
If a scene starts too early, is mistimed, or needs different text, regenerate only that scene:
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene2 \
--new-text "Updated scene 2 text" \
--output-dir public/audio/project/
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene3 \
--character salesperson \
--output-dir public/audio/project/
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--scenes remotion/scenes.json \
--scene scene1 \
--output-dir public/audio/project/
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-with-thumb.mp4
The tool automatically:
- Uses request stitching from previous scenes for consistent prosody
- Updates the info.json file with new metadata
- Updates scenes.json if --new-text is provided
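For readers curious how request stitching looks at the API level, here is a minimal sketch of the request a regeneration might send. It only builds the request; it does not call the network. To my understanding, the ElevenLabs text-to-speech endpoint accepts a `previous_request_ids` array of at most three ids, and each response carries its id in a `request-id` header. The `buildStitchedRequest` helper and the `VOICE_ID` placeholder are hypothetical, and this is not necessarily how generate.js implements it.

```typescript
interface StitchedTtsRequest {
  url: string;
  body: {
    text: string;
    model_id: string;
    previous_request_ids: string[];
  };
}

// The API accepts at most three previous request ids, so only the most recent ones are kept.
const MAX_PREVIOUS_REQUEST_IDS = 3;

function buildStitchedRequest(
  voiceId: string,
  text: string,
  earlierRequestIds: string[],
  modelId: string = "eleven_multilingual_v2"
): StitchedTtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    body: {
      text,
      model_id: modelId,
      previous_request_ids: earlierRequestIds.slice(-MAX_PREVIOUS_REQUEST_IDS),
    },
  };
}

// Regenerating scene5 after four earlier scenes: only the last three ids are sent.
const stitchedDemoRequest = buildStitchedRequest(
  "VOICE_ID",
  "Updated scene 5 text",
  ["req-1", "req-2", "req-3", "req-4"]
);
```

The body would be POSTed as JSON with the API key in the `xi-api-key` header, and the `request-id` header of the response stored so that later scenes can reference it.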
Thumbnail Embedding
Embed a thumbnail image into MP4 videos so that platforms such as Twitter and YouTube, as well as video players, display your custom thumbnail instead of the first frame.
Embed Thumbnail into Video
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/promo.mp4 \
--thumbnail public/videos/thumbnail.png \
--output public/videos/promo-final.mp4
Workflow with Remotion
npx remotion render MyVideo public/videos/my-video.mp4
npx remotion still MyVideoThumbnail public/videos/my-thumbnail.png
node .claude/skills/elevenlabs-remotion-skill/generate.js \
--embed-thumbnail public/videos/my-video.mp4 \
--thumbnail public/videos/my-thumbnail.png \
--output public/videos/my-video-final.mp4
Supported Formats
- Video: MP4 (H.264/H.265)
- Thumbnail: PNG, JPG, JPEG
The embedding uses ffmpeg's -disposition:v:1 attached_pic flag to set the thumbnail as an attached picture, which most video players and platforms recognize.
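As a rough illustration of that flag, the sketch below builds an ffmpeg argument list that attaches an image as cover art. The `buildEmbedThumbnailArgs` helper is hypothetical, and the exact arguments the skill passes are not shown in this document. The sketch assumes the source video contains exactly one video stream, so the mapped image becomes video stream index 1.

```typescript
// Builds ffmpeg arguments that copy all streams from the video, add the image
// as a second video stream, and mark that stream as an attached picture.
function buildEmbedThumbnailArgs(
  videoPath: string,
  thumbnailPath: string,
  outputPath: string
): string[] {
  return [
    "-i", videoPath,       // input 0: the rendered video
    "-i", thumbnailPath,   // input 1: the thumbnail image
    "-map", "0",           // keep every stream from the video
    "-map", "1",           // add the image as an extra video stream
    "-c", "copy",          // no re-encoding
    "-disposition:v:1", "attached_pic",
    outputPath,
  ];
}

const embedDemoArgs = buildEmbedThumbnailArgs(
  "public/videos/promo.mp4",
  "public/videos/thumbnail.png",
  "public/videos/promo-final.mp4"
);
```

The resulting array could be handed to Node's `child_process.spawn("ffmpeg", embedDemoArgs)`.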
Timing Validation
The skill automatically validates timing after generation using ffprobe:
What It Checks
| Check | Threshold | Description |
| --- | --- | --- |
| Duration mismatch | >15% | Warns if actual differs from expected duration |
| Leading silence | >200ms | Audio starts late (voiceover delayed) |
| Trailing silence | >500ms | Unnecessary silence at end |
| Speaking rate | 2-4.5 wps | Optimal ~3 words/second |
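The thresholds in the table can be expressed as a small pure function. This is a sketch under stated assumptions: the `timingWarnings` helper and its field names are hypothetical, the duration mismatch is taken relative to the expected duration, and the measurements are assumed to have been obtained already (for example from ffprobe).

```typescript
interface TimingMeasurement {
  expectedDuration: number; // seconds
  actualDuration: number;   // seconds
  leadingSilence: number;   // seconds
  trailingSilence: number;  // seconds
  wordsPerSecond: number;
}

// Returns one label per threshold that the measurement violates.
function timingWarnings(m: TimingMeasurement): string[] {
  const warnings: string[] = [];
  const mismatch = Math.abs(m.actualDuration - m.expectedDuration) / m.expectedDuration;
  if (mismatch > 0.15) warnings.push("duration-mismatch");
  if (m.leadingSilence > 0.2) warnings.push("leading-silence");
  if (m.trailingSilence > 0.5) warnings.push("trailing-silence");
  if (m.wordsPerSecond < 2 || m.wordsPerSecond > 4.5) warnings.push("speaking-rate");
  return warnings;
}

// A scene that runs long, starts late, and is spoken slowly.
const slowSceneWarnings = timingWarnings({
  expectedDuration: 5.5,
  actualDuration: 6.35,
  leadingSilence: 0.235,
  trailingSilence: 0,
  wordsPerSecond: 1.8,
});
```

With these inputs, `slowSceneWarnings` contains the duration, leading-silence, and speaking-rate labels.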
Validate Existing Audio
node .claude/skills/elevenlabs-remotion-skill/generate.js --validate public/audio/product-demo/
Output example:
Validating product-demo (6 scenes)
scene1: 3.00s (expected: 4.5s)
  Audio 1.50s shorter than expected
  8 words @ 3.1 words/sec
⚠️ scene2: 6.35s (expected: 5.5s)
  ⚠️ Leading silence: 235ms (may start late)
  🐢 10 words @ 1.8 words/sec
✅ scene4: 4.36s (expected: 4s)
  9 words @ 2.3 words/sec
Total duration: 30.80s (expected: 30.00s)
Updated info.json
After validation, the info.json includes actual measurements:
{
  "scenes": [
    {
      "id": "scene1",
      "duration": 4.5,
      "actualDuration": 3.0,
      "leadingSilence": 0.05,
      "wordsPerSecond": 3.1
    }
  ]
}
Use actualDuration in your Remotion composition for precise sync.
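One way to do this is to convert the measured durations into frame offsets before rendering. The sketch below is an assumption-laden illustration: the `layoutScenes` helper is hypothetical, scenes are laid out back to back with no gaps, and a scene without `actualDuration` falls back to its planned `duration`. The resulting `from` and `durationInFrames` values match the props of Remotion's `<Sequence>` component.

```typescript
interface MeasuredScene {
  id: string;
  duration: number;        // planned length in seconds
  actualDuration?: number; // measured length in seconds, if validation has run
}

interface SceneTiming {
  id: string;
  from: number;             // first frame of the scene
  durationInFrames: number; // length of the scene in frames
}

// Lays scenes out back to back, rounding each one up to a whole number of frames.
function layoutScenes(scenes: MeasuredScene[], fps: number): SceneTiming[] {
  let cursor = 0;
  return scenes.map((scene) => {
    const seconds = scene.actualDuration ?? scene.duration;
    const durationInFrames = Math.ceil(seconds * fps);
    const timing: SceneTiming = { id: scene.id, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return timing;
  });
}

const demoTimings = layoutScenes(
  [
    { id: "scene1", duration: 4.5, actualDuration: 3.0 },
    { id: "scene2", duration: 5.5 },
  ],
  30
);
```

At 30 fps, scene1 occupies frames 0 to 89 (90 frames) and scene2 starts at frame 90 and lasts 165 frames.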
Options
| Option | Description | Default |
| --- | --- | --- |
| --text, -t | Text to convert to speech | Required (or --file/--scenes) |
| --file, -f | Read text from file | - |
| --output, -o | Output file path | output.mp3 |
| --output-dir | Output directory for scenes | public/audio |
| --voice, -v | Voice name or ID | George |
| --model, -m | Model ID | eleven_multilingual_v2 |
| --character, -c | Character preset | literal |
| --scenes | JSON file with scenes | - |
| --scene | Regenerate single scene ID | - |
| --new-text | New text for scene regen | - |
| --validate | Validate existing audio dir | - |
| --skip-validation | Skip auto-validation | false |
| --embed-thumbnail | Video file to embed thumbnail into | - |
| --thumbnail | Thumbnail image file (PNG/JPG) | - |
| --stability | Voice stability (0-1) | varies by character |
| --similarity | Voice similarity (0-1) | varies by character |
| --style | Style exaggeration (0-1) | varies by character |
| --no-combined | Skip combined file | false |
Recommended Voices
| Voice | Style | Best For |
| --- | --- | --- |
| George | Warm, captivating British | Narration, explainers |
| Antoni | Professional, warm | Legal content, tutorials |
| Arnold | Authoritative, deep | Corporate, serious topics |
| Josh | Friendly, conversational | Marketing, casual content |
Integration with Remotion
After generating scene voiceovers, use them in your composition:
import { Audio, Sequence, staticFile } from "remotion";
const SCENE_DURATIONS