video-understand
heygen-com/skills · updated Apr 18, 2026
Understand video content locally using ffmpeg for frame extraction and Whisper for transcription. Runs fully offline once Whisper has downloaded the weights for the chosen model size (this happens on first use); no API keys required.
Prerequisites
- ffmpeg + ffprobe (required): `brew install ffmpeg`
- openai-whisper (optional, for transcription): `pip install openai-whisper`
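Missing prerequisites are easier to diagnose before the script runs than after it fails partway. Below is a minimal sketch that uses only the standard library. `check_prerequisites` is a hypothetical helper. It only confirms that `ffmpeg` and `ffprobe` are on `PATH` and that some module named `whisper` is importable. It does not check versions.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Report which of the documented dependencies are available (hypothetical helper)."""
    return {
        # Required: both binaries must be discoverable on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
        # Optional: openai-whisper installs an importable module named "whisper"
        # (an unrelated package with the same module name would also match)
        "whisper": importlib.util.find_spec("whisper") is not None,
    }
```

For example, `print(check_prerequisites())` prints a dict of booleans. Transcription can only work if `"whisper"` is `True`. Otherwise, run with `--no-transcribe`.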
Commands
```bash
# Scene detection + transcribe (default)
python3 skills/video-understand/scripts/understand_video.py video.mp4
# Keyframe extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m keyframe
# Regular interval extraction
python3 skills/video-understand/scripts/understand_video.py video.mp4 -m interval
# Limit frames extracted
python3 skills/video-understand/scripts/understand_video.py video.mp4 --max-frames 10
# Use a larger Whisper model
python3 skills/video-understand/scripts/understand_video.py video.mp4 --whisper-model small
# Frames only, skip transcription
python3 skills/video-understand/scripts/understand_video.py video.mp4 --no-transcribe
# Quiet mode (JSON only, no progress)
python3 skills/video-understand/scripts/understand_video.py video.mp4 -q
# Output to file
python3 skills/video-understand/scripts/understand_video.py video.mp4 -o result.json
```
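You can drive the same commands from Python by building the argument list and parsing stdout. This is a sketch, not an official wrapper. `build_args` and `understand_video` are hypothetical names. The script path assumes you run from the repository root, as the commands above do. The code also assumes that with `-q` stdout contains only the JSON result, as the docs state.

```python
import json
import subprocess
import sys

# Same relative path as in the commands above; adjust to your install location.
SCRIPT = "skills/video-understand/scripts/understand_video.py"


def build_args(video, mode=None, max_frames=None, whisper_model=None,
               transcribe=True, quiet=True):
    """Build an argv list using only the documented CLI flags."""
    args = [sys.executable, SCRIPT, video]  # current interpreter instead of "python3"
    if mode is not None:
        args += ["-m", mode]
    if max_frames is not None:
        args += ["--max-frames", str(max_frames)]
    if whisper_model is not None:
        args += ["--whisper-model", whisper_model]
    if not transcribe:
        args.append("--no-transcribe")
    if quiet:
        # -q suppresses progress messages so stdout is JSON only
        args.append("-q")
    return args


def understand_video(video, **options):
    """Run the script and parse the JSON it prints; raises on a non-zero exit."""
    proc = subprocess.run(build_args(video, **options),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

For example, `understand_video("video.mp4", mode="keyframe", max_frames=10)` returns the result as a Python dict.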
CLI Options
| Flag | Description |
|---|---|
| `video` | Input video file (positional, required) |
| `-m, --mode` | Extraction mode: `scene` (default), `keyframe`, `interval` |
| `--max-frames` | Maximum frames to keep (default: 20) |
| `--whisper-model` | Whisper model size: `tiny`, `base`, `small`, `medium`, `large` (default: `base`) |
| `--no-transcribe` | Skip audio transcription; extract frames only |
| `-o, --output` | Write result JSON to a file instead of stdout |
| `-q, --quiet` | Suppress progress messages; output only JSON |
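These options map directly onto a standard `argparse` interface. The parser below is a hypothetical mirror of the documented flags and defaults. It can help when wrapping or validating arguments, but it is not the script's actual source. Restricting `--whisper-model` to the listed sizes is an assumption; the real script may accept other values.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    p = argparse.ArgumentParser(prog="understand_video.py")
    p.add_argument("video", help="Input video file")
    p.add_argument("-m", "--mode", choices=["scene", "keyframe", "interval"],
                   default="scene", help="Extraction mode")
    p.add_argument("--max-frames", type=int, default=20,
                   help="Maximum frames to keep")
    # Choices limited to the sizes listed in the table (assumption)
    p.add_argument("--whisper-model", default="base",
                   choices=["tiny", "base", "small", "medium", "large"],
                   help="Whisper model size")
    p.add_argument("--no-transcribe", action="store_true",
                   help="Skip audio transcription, extract frames only")
    p.add_argument("-o", "--output", help="Write result JSON to this file")
    p.add_argument("-q", "--quiet", action="store_true",
                   help="Suppress progress messages")
    return p
```

`argparse` converts dashes to underscores in attribute names, so the parsed values appear as `ns.max_frames`, `ns.whisper_model`, and `ns.no_transcribe`.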
Extraction Modes
| Mode | How it works | Best for |
|---|---|---|
| `scene` | Detects scene changes via ffmpeg `select='gt(scene,0.3)'` | Most videos, varied content |
| `keyframe` | Extracts I-frames (codec keyframes) | Encoded video with natural keyframe placement |
| `interval` | Evenly spaced frames based on duration and `--max-frames` | Fixed sampling, predictable output |
If scene mode detects no scene changes, it automatically falls back to interval mode.
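The sketch below shows how each mode could translate into a single ffmpeg call, along with the duration probe that interval mode needs, the scene-to-interval fallback, and the `--max-frames` cap. It builds argument lists only and does not run anything. Apart from the `select='gt(scene,0.3)'` filter quoted in the table, the flags are assumptions, not taken from the script: `-skip_frame nokey` for keyframes, the `fps` filter for interval sampling, and `frame_%04d.jpg` naming borrowed from the sample output. The function names are hypothetical.

```python
import json
import os


def probe_args(video):
    """ffprobe call reporting duration and video-stream resolution as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=width,height:format=duration",
            "-of", "json", video]


def parse_probe(stdout):
    """Turn ffprobe's JSON into the duration/resolution fields of the output."""
    info = json.loads(stdout)
    stream = info["streams"][0]
    return {"duration": float(info["format"]["duration"]),  # ffprobe emits a string
            "resolution": {"width": stream["width"], "height": stream["height"]}}


def ffmpeg_args(video, mode, out_dir, duration=None, max_frames=20, threshold=0.3):
    """Build one ffmpeg invocation for the given extraction mode (assumed flags)."""
    pre, vf = [], []
    if mode == "scene":
        # Keep only frames whose scene-change score exceeds the threshold
        vf = ["-vf", f"select='gt(scene,{threshold})'"]
    elif mode == "keyframe":
        # Input option: have the decoder emit keyframes (I-frames) only
        pre = ["-skip_frame", "nokey"]
    elif mode == "interval":
        if not duration or duration <= 0:
            raise ValueError("interval mode needs a positive duration")
        # max_frames/duration frames per second spreads roughly max_frames over the clip
        vf = ["-vf", f"fps={max_frames}/{duration}"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # -vsync vfr writes only the selected frames instead of duplicating them
    # to hold a constant rate (newer FFmpeg releases prefer -fps_mode vfr)
    return ["ffmpeg", "-hide_banner", *pre, "-i", video, *vf,
            "-vsync", "vfr", "-q:v", "2", os.path.join(out_dir, "frame_%04d.jpg")]


def cap_frames(frames, max_frames):
    """Evenly subsample a list down to at most max_frames items."""
    if max_frames <= 0:
        return []
    if len(frames) <= max_frames:
        return list(frames)
    step = len(frames) / max_frames
    return [frames[int(i * step)] for i in range(max_frames)]


def extract_with_fallback(run, video, mode, out_dir, **kwargs):
    """run(argv) executes ffmpeg and returns frame paths; scene falls back to interval."""
    frames = run(ffmpeg_args(video, mode, out_dir, **kwargs))
    if mode == "scene" and not frames:
        frames = run(ffmpeg_args(video, "interval", out_dir, **kwargs))
    return cap_frames(frames, kwargs.get("max_frames", 20))
```

Injecting `run` as a callable keeps the mode logic testable without ffmpeg installed. A real `run` would call `subprocess.run(argv, check=True)` and then list the JPEGs in `out_dir`. Pass `duration` from `parse_probe` whenever interval mode might be reached.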
Output
The script writes JSON to stdout, or to a file with `-o`. See `references/output-format.md` for the full schema.
```json
{
  "video": "video.mp4",
  "duration": 18.076,
  "resolution": {"width": 1224, "height": 1080},
  "mode": "scene",
  "frames": [
    {"path": "/abs/path/frame_0001.jpg", "timestamp": 0.0, "timestamp_formatted": "00:00"}
  ],
  "frame_count": 12,
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome..."}
  ],
  "text": "Full transcript...",
  "note": "Use the Read tool to view frame images for visual understanding."
}
```
Use the Read tool on frame image paths to visually inspect extracted frames.
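The `transcript` and `text` fields have the same shape as Whisper's own output, and frames can be matched to speech by timestamp. The sketch below illustrates both. `whisper.load_model` and `model.transcribe` are openai-whisper's Python API. The field mapping, the `MM:SS` behaviour past one hour, and all helper names are assumptions, not the script's code.

```python
def format_timestamp(seconds):
    """MM:SS as in the sample's "00:00"; minutes keep counting past 59 (assumption)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def transcript_fields(whisper_result):
    """Map an openai-whisper transcribe() result onto the transcript/text keys."""
    segments = [{"start": s["start"], "end": s["end"], "text": s["text"].strip()}
                for s in whisper_result.get("segments", [])]
    return {"transcript": segments, "text": whisper_result.get("text", "").strip()}


def transcribe(video, model_size="base"):
    """Run Whisper locally; it decodes the audio track through ffmpeg."""
    import whisper  # optional dependency, so imported only when needed
    model = whisper.load_model(model_size)
    return transcript_fields(model.transcribe(video))


def pair_frames_with_speech(result):
    """Attach to each frame the transcript text being spoken at its timestamp."""
    pairs = []
    for frame in result.get("frames") or []:
        t = frame["timestamp"]
        # "or []" also covers runs made with --no-transcribe
        spoken = [seg["text"] for seg in result.get("transcript") or []
                  if seg["start"] <= t < seg["end"]]
        pairs.append({"path": frame["path"], "timestamp": t,
                      "speech": " ".join(spoken)})
    return pairs
```

For a saved result, `pair_frames_with_speech(json.load(open("result.json")))` returns a list of frame and speech pairs. You can then view each frame with the Read tool alongside what was being said at that moment.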
References
- `references/output-format.md`: full JSON output schema documentation