explainx / blog
video-use: Edit Videos With Claude Code (No Premiere Pro Needed)
video-use turns Claude Code into a video editor. No Premiere Pro needed. Drop footage in a folder, describe the cut, get final.mp4. Full setup guide.
Adobe Premiere Pro costs $60/month and takes hundreds of hours to master. DaVinci Resolve is free but has a steep learning curve. Final Cut Pro is Mac-only and $300 upfront.
For a large class of video editing tasks (cutting filler words, removing dead air, color grading, burning subtitles, combining multiple takes into a launch video), none of that complexity is necessary anymore.
video-use is an open-source tool from the browser-use team that turns Claude Code into a video editor. You drop raw footage in a folder, describe what you want, and get edit/final.mp4 back. No timeline. No menus. No NLE (non-linear editor).
11.6k GitHub stars in roughly two months. MIT license. Here is how it works and how to set it up.
video-use is a skill: a self-contained plugin that registers with Claude Code (or Codex, Hermes, Openclaw) and gives it video editing capabilities via shell access.
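It handles filler-word and dead-air removal, take selection, color grading, subtitle burning, and animation overlays, and it keeps session memory in project.md so next week's session picks up exactly where it left off. The output lands in `<your_videos_dir>/edit/`, so the skill directory stays clean.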
The LLM never watches the video. It reads it.
Traditional frame-based AI video approaches dump tens of thousands of frames as images: 30,000 frames × 1,500 tokens = 45 million tokens of visual noise, and at 30 fps those 30,000 frames cover only about 17 minutes of footage. That is why they are slow, expensive, and miss context.
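The arithmetic is easy to check. A minimal Python sketch: the 30,000-frame and 1,500-tokens-per-frame figures come from the paragraph above, while the 30 fps used to convert frames into footage duration is an assumption.

```python
# Back-of-envelope cost of the frame-dump approach.
FRAMES = 30_000           # frame count from the example above
TOKENS_PER_FRAME = 1_500  # per-image token cost from the example above
ASSUMED_FPS = 30          # assumption: a typical frame rate, not from the source


def frame_dump_tokens(frames: int, tokens_per_frame: int) -> int:
    """Total tokens if every frame is sent to the model as an image."""
    return frames * tokens_per_frame


def footage_minutes(frames: int, fps: int) -> float:
    """Duration of footage that many frames represent, in minutes."""
    return frames / fps / 60


total = frame_dump_tokens(FRAMES, TOKENS_PER_FRAME)
minutes = footage_minutes(FRAMES, ASSUMED_FPS)
print(f"{total:,} tokens for about {minutes:.1f} minutes of footage")
```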
video-use does what browser-use does for web browsing: instead of showing the LLM raw pixels, it gives it a structured representation. For the web, that is the DOM. For video, that is the transcript.
Layer 1: Audio transcript (always loaded). One ElevenLabs Scribe call per source clip gives word-level timestamps, speaker diarization (who is speaking), and audio events like (laughter) and (applause). All takes are packed into a single ~12KB markdown file, takes_packed.md:
```text
## C0103 (duration: 43.0s, 8 phrases)
[002.52-005.36] S0 Ninety percent of what a web agent does is completely wasted.
[006.08-006.74] S0 We fixed this.
```
This is the LLM's primary editing surface. It can reason over word boundaries, identify the best take of a phrase, detect false starts before the word ends, and place cuts at exact millisecond timestamps, all from text.
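To make the format concrete, here is a minimal Python sketch that parses phrase lines like the sample above into structured records. The regex, the Phrase type, and parse_packed are hypothetical, inferred from the sample; video-use's own scripts live in helpers/ and may read the file differently.

```python
import re
from dataclasses import dataclass

# Assumed line shape, inferred from the sample: [start-end] SPEAKER text
PHRASE_RE = re.compile(r"^\[(\d+\.\d+)-(\d+\.\d+)\]\s+(S\d+)\s+(.*)$")


@dataclass
class Phrase:
    clip: str
    start: float  # seconds from the start of the clip
    end: float
    speaker: str
    text: str


def parse_packed(markdown: str) -> list[Phrase]:
    """Parse a takes_packed.md-style transcript into Phrase records (hypothetical helper)."""
    phrases: list[Phrase] = []
    clip = ""
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Clip header, e.g. "## C0103 (duration: 43.0s, 8 phrases)"
            clip = line[3:].split()[0]
            continue
        match = PHRASE_RE.match(line)
        if match:
            start, end, speaker, text = match.groups()
            phrases.append(Phrase(clip, float(start), float(end), speaker, text))
    return phrases


sample = """## C0103 (duration: 43.0s, 8 phrases)
[002.52-005.36] S0 Ninety percent of what a web agent does is completely wasted.
[006.08-006.74] S0 We fixed this."""

for p in parse_packed(sample):
    print(p.clip, p.start, p.end, p.speaker, p.text)
```

Once phrases are records, comparing two takes of the same sentence or measuring the gap between phrases is ordinary list processing, which is the appeal of a text-first editing surface.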
Layer 2: Visual composite (on demand). When the LLM needs to compare a close-up cut or sanity-check an ambiguous pause, it calls timeline_view, a tool that renders a single PNG combining a filmstrip, speaker track, waveform, and word labels for any time range. It is called only at decision points, not for every frame.
The result: the same quality of edit decision a human editor makes from scrubbing, but derived from 12KB of text and a handful of targeted images.
```text
Transcribe ──> Pack ──> LLM Reasons ──> EDL ──> Render ──> Self-Eval
                                                               │
                                                               └── issue? fix + re-render (max 3x)
```
Pack collects every transcript into takes_packed.md, one file the LLM holds in context. Self-eval runs timeline_view on the rendered output at every cut boundary, catching visual jumps, audio pops, and hidden subtitles; if it finds issues, the agent fixes them and re-renders. You see the preview only after it passes.

The "ask → confirm → execute → self-eval → persist" sequence is the same pattern as Claude Code's agentic coding loop: the agent never takes a destructive action (overwriting footage, finalizing a render) without strategy approval first.
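The render, self-eval, and retry cycle in the diagram is a bounded loop. A minimal Python sketch, where render, evaluate, and fix are hypothetical callables standing in for the ffmpeg render, the timeline_view checks at cut boundaries, and the agent's corrections; only the cap of three re-renders comes from the diagram.

```python
from typing import Any, Callable

Edl = dict[str, Any]  # assumption: an EDL is some JSON-like dict
MAX_RERENDERS = 3     # from the diagram: "fix + re-render (max 3x)"


def render_with_self_eval(
    edl: Edl,
    render: Callable[[Edl], str],          # hypothetical: renders the EDL, returns an output path
    evaluate: Callable[[str], list[str]],  # hypothetical: returns issues found at cut boundaries
    fix: Callable[[Edl, list[str]], Edl],  # hypothetical: returns a corrected EDL
) -> tuple[str, list[str]]:
    """Render once, then fix and re-render while issues remain, at most MAX_RERENDERS times.

    Returns the last output path and any issues still unresolved.
    """
    output = render(edl)
    issues = evaluate(output)
    rerenders = 0
    while issues and rerenders < MAX_RERENDERS:
        edl = fix(edl, issues)
        output = render(edl)
        issues = evaluate(output)
        rerenders += 1
    return output, issues


# Usage with stand-in callables: the first render has an audio pop, the fix clears it.
def fake_render(edl: Edl) -> str:
    return f"edit/final_v{edl['version']}.mp4"


def fake_evaluate(path: str) -> list[str]:
    return ["audio pop at cut 2"] if path.endswith("_v1.mp4") else []


def fake_fix(edl: Edl, issues: list[str]) -> Edl:
    return {**edl, "version": edl["version"] + 1}


out, remaining = render_with_self_eval({"version": 1}, fake_render, fake_evaluate, fake_fix)
print(out, remaining)
```

Returning the unresolved issues instead of raising lets the caller decide, in this sketch, whether to show the preview anyway or hand the problem back to the user once the cap is hit.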
Prerequisites:

- Python, with uv or pip
- ffmpeg (brew install ffmpeg on Mac, apt install ffmpeg on Linux)
- An ElevenLabs API key (elevenlabs.io/app/settings/api-keys)

Paste this into Claude Code:
```text
Set up https://github.com/browser-use/video-use for me.

Read install.md first to install the repo, wire up ffmpeg, register the skill with whichever agent you're running under, and set up the ElevenLabs API key (ask me to paste it when you need it). Then read SKILL.md for daily usage, and always read helpers/ because that's where the editing scripts live. After install, don't transcribe anything on your own; just tell me it's ready and wait for me to drop footage into a folder.
```
Claude Code handles the clone, dependency install, and skill registration, and prompts you once for the ElevenLabs API key.
Or install manually:

```bash
# 1. Clone and symlink into your agent's skills directory
git clone https://github.com/browser-use/video-use ~/Developer/video-use
mkdir -p ~/.claude/skills   # make sure the skills directory exists before linking
ln -sfn ~/Developer/video-use ~/.claude/skills/video-use   # Claude Code
# ln -sfn ~/Developer/video-use ~/.codex/skills/video-use  # Codex

# 2. Install Python deps
cd ~/Developer/video-use
uv sync   # or: pip install -e .

# 3. Install system deps
brew install ffmpeg   # required
brew install yt-dlp   # optional, for downloading online sources

# 4. API key
cp .env.example .env
# Edit .env and add: ELEVENLABS_API_KEY=your_key_here
```
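After a manual install, a quick preflight can confirm the pieces are in place. A minimal Python sketch, assuming the repo lives at ~/Developer/video-use, the Claude Code symlink from step 1, and a .env made of plain KEY=value lines; parse_env_text and preflight are illustrative helpers, not part of video-use.

```python
import shutil
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments (assumed .env format)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("\"'")
    return env


def preflight(repo: Path) -> list[str]:
    """Return human-readable problems with the install; an empty list means ready."""
    problems: list[str] = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    link = Path.home() / ".claude" / "skills" / "video-use"  # assumed Claude Code location
    if not link.exists():
        problems.append(f"skill not linked at {link}")
    env_file = repo / ".env"
    env = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    key = env.get("ELEVENLABS_API_KEY", "")
    if not key or key == "your_key_here":
        problems.append("ELEVENLABS_API_KEY missing from .env")
    return problems


if __name__ == "__main__":
    found = preflight(Path.home() / "Developer" / "video-use")
    print("ready" if not found else "\n".join(found))
```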
To start editing, open Claude Code in the folder that holds your footage:

```bash
cd /path/to/your/videos
claude   # opens Claude Code in that folder
```
Then in Claude Code:
```text
edit these into a launch video
```
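The agent then transcribes and packs the footage, proposes an editing strategy and waits for your approval, renders and self-evaluates the cut, and writes the result to edit/final.mp4.

For an always-on setup (editing from your phone or via Telegram), the team recommends running through Browser Use Box on a VPS.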
Subtitles default to 2-word UPPERCASE chunks burned in at the bottom of the frame. To customize them, describe your style in the session:
```text
use sentence-case subtitles, white text with black outline, position at top
```
The agent translates your description into ffmpeg drawtext filter parameters.
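The default 2-word UPPERCASE style boils down to grouping word-level timestamps into short chunks, each timed from its first word's start to its last word's end. A minimal Python sketch of that grouping; Word, Chunk, chunk_words, and srt_time are hypothetical names, the word timings in the example are illustrative rather than real Scribe output, and the SRT-style timestamps only make the result readable (the paragraph above says video-use renders subtitles through ffmpeg drawtext).

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float


class Chunk(NamedTuple):
    text: str
    start: float
    end: float


def chunk_words(words: list[Word], size: int = 2, upper: bool = True) -> list[Chunk]:
    """Group consecutive words into subtitle chunks of at most `size` words."""
    chunks: list[Chunk] = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        text = " ".join(w.text for w in group)
        chunks.append(Chunk(text.upper() if upper else text, group[0].start, group[-1].end))
    return chunks


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT-style timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


# Illustrative word timings only, not real transcript output.
words = [Word("We", 6.08, 6.25), Word("fixed", 6.25, 6.50), Word("this.", 6.50, 6.74)]
for n, c in enumerate(chunk_words(words), start=1):
    print(n, f"{srt_time(c.start)} --> {srt_time(c.end)}", c.text)
```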
video-use supports four animation engines:
| Engine | Best for | Notes |
|---|---|---|
| HyperFrames | Motion graphics, lower thirds | Default recommendation |
| Remotion | React-based frame-accurate animations | Requires Node.js |
| Manim | Mathematical/technical visualizations | Requires Python Manim |
| PIL | Simple image overlays, static graphics | Fastest, no extra deps |
Animations run as parallel sub-agents, one per animation block, so a video with five animated sections generates all five concurrently instead of sequentially.
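The fan-out is straightforward to sketch with Python's concurrent.futures. render_animation here is a hypothetical stand-in that sleeps instead of rendering; the sketch only shows running blocks concurrently and collecting results in timeline order, not how video-use actually launches its sub-agents.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_animation(block: dict) -> str:
    """Hypothetical stand-in for one animation sub-agent; returns an output path."""
    time.sleep(block.get("seconds", 0.1))  # simulate render work
    return f"edit/anim_{block['id']}.mp4"


def render_all(blocks: list[dict], max_workers: int = 5) -> list[str]:
    """Render every block concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_animation, blocks))


blocks = [{"id": i, "seconds": 0.2} for i in range(5)]
start = time.perf_counter()
paths = render_all(blocks)
elapsed = time.perf_counter() - start
print(paths)
print(f"{elapsed:.2f}s wall time for five 0.2s jobs")
```

Threads suit this shape of work because each job mostly waits on an external process or model call, and pool.map returns results in input order, so a compositing step can splice the overlays back in sequence.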
| | video-use | Adobe Premiere Pro | DaVinci Resolve |
|---|---|---|---|
| Price | Free (MIT) | $60/month | Free / $295 (Studio) |
| Interface | Natural language | Timeline GUI | Timeline GUI |
| Learning curve | Minutes | Weeks to months | Weeks |
| Filler word removal | Automatic | Manual or plugin | Manual or scripted |
| Frame-precise control | Via description | Yes | Yes |
| Color grading | Auto (ffmpeg presets) | Full professional suite | Industry-leading |
| Subtitle generation | Automatic | Via plugin/AI | Auto-caption feature |
| Animation overlays | Remotion / Manim / PIL | After Effects integration | Fusion (built-in) |
| Self-evaluation | Built-in | Manual preview | Manual preview |
| Session memory | project.md | Project files | Project files |
| Best for | Fast rough cuts, filler removal, launch videos | Professional broadcast, film | Color work, feature-length |
For talking-head content, YouTube tutorials, podcast clips, and launch videos, video-use matches or beats the speed of an experienced Premiere Pro editor on the tasks that consume 80% of editing time (filler removal, take selection, subtitle burning). For color work, complex motion graphics, and frame-precise creative cuts, Premiere Pro and DaVinci Resolve remain stronger.
This mirrors how Claude Code compares with traditional coding tools: agents win on repetitive, well-defined tasks; humans with traditional tools win on open-ended creative decisions.
Launch and product videos: Multiple takes, redundant sentences, filler words to cut. The LLM picks the cleanest take of each phrase, cuts dead air, grades warm/cinematic, and burns subtitles. This is exactly the use case Thariq's Claude Code Fable 5 launch video pipeline handled: the same ffmpeg + Remotion + transcript approach, now packaged as a shareable skill.
Podcast clip generation: Drop a 90-minute recording, ask for the five best 90-second clips for social. The agent reads the full transcript, identifies the most quotable moments, and renders clips with subtitles.
Tutorial screen recordings: Cut dead pauses between steps, add callout animations over specific regions (Manim overlays), burn chapter markers as subtitle events.
Interview editing: Multi-speaker diarization from ElevenLabs tells the agent who is speaking when. It can cut between speakers based on content flow, not just silence detection.
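Diarized phrases turn speaker changes into a grouping problem. A minimal Python sketch that merges consecutive phrases from the same speaker into turns, a natural unit for deciding where to cut between speakers; the tuple layout mirrors the packed transcript lines, speaker_turns is a hypothetical helper, and the interview lines are placeholders.

```python
from itertools import groupby

# (start_s, end_s, speaker, text): the same fields as the packed transcript lines.
Phrase = tuple[float, float, str, str]


def speaker_turns(phrases: list[Phrase]) -> list[Phrase]:
    """Merge runs of consecutive phrases by the same speaker into single turns."""
    turns: list[Phrase] = []
    for speaker, run in groupby(phrases, key=lambda p: p[2]):
        items = list(run)
        text = " ".join(p[3] for p in items)
        turns.append((items[0][0], items[-1][1], speaker, text))
    return turns


# Placeholder interview lines, for illustration only.
phrases = [
    (0.0, 2.0, "S0", "Thanks for joining."),
    (2.1, 3.0, "S0", "First question."),
    (3.4, 6.0, "S1", "Happy to be here."),
    (6.2, 7.5, "S0", "Let's start."),
]
for start, end, who, text in speaker_turns(phrases):
    print(f"{start:5.1f}-{end:5.1f} {who}: {text}")
```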
Batch processing: Point the agent at a folder of 20 videos and ask for subtitle-burned versions of all of them. Runs sequentially; session memory tracks which are done.
SKILL.md documents 12 hard production rules the agent enforces regardless of creative direction, including:

- Read helpers/ before every session; that is where the editing scripts live.
- Write to edit/ only; sources are never modified.
- Save state to project.md; the next session reads it first.

The philosophy: 12 rules for production correctness, artistic freedom for everything else.
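The edit/-only rule is the kind of check that is easy to enforce mechanically. A minimal Python sketch; safe_output_path is a hypothetical guard, not video-use code, that resolves a requested output name and refuses anything landing outside `<videos_dir>/edit/`, such as a path that climbs back up to a source clip.

```python
from pathlib import Path


def safe_output_path(videos_dir: Path, name: str) -> Path:
    """Resolve `name` under <videos_dir>/edit/ and refuse anything that escapes it."""
    edit_dir = (videos_dir / "edit").resolve()
    target = (edit_dir / name).resolve()
    if not target.is_relative_to(edit_dir):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"refusing to write outside {edit_dir}: {target}")
    return target


print(safe_output_path(Path("/videos"), "final.mp4"))
try:
    safe_output_path(Path("/videos"), "../C0103.mp4")  # would overwrite a source clip
except ValueError as err:
    print(err)
```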
video-use sits in the same agentic video space as ViMax (which generates video from scripts using AI models) and Kling/Sora/Wan 2.5 (which generate video from text or image prompts). The distinction: those tools generate new footage, while video-use edits footage that already exists.
These are complementary. You might generate a product demo clip with Kling, shoot talking-head footage yourself, then bring both into video-use to cut and composite the final video.
As a loop-engineering pattern, video-use's self-eval loop is a good example of a production-correct agent loop: structured output (EDL JSON) → deterministic execution (ffmpeg) → automated verification → conditional retry.
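To illustrate the deterministic-execution step, here is a minimal Python sketch that turns an EDL into an ffmpeg command using the trim, atrim, and concat filters. The EDL schema (a list of source/start/end segments) and edl_to_ffmpeg are assumptions for illustration; video-use's actual EDL JSON format and render helper may differ.

```python
import json
import shlex


def edl_to_ffmpeg(edl: dict, output: str = "edit/final.mp4") -> list[str]:
    """Build an ffmpeg argument list that cuts and concatenates EDL segments.

    Assumed EDL shape: {"segments": [{"source": path, "start": s, "end": s}, ...]}
    """
    sources: list[str] = []
    for seg in edl["segments"]:
        if seg["source"] not in sources:
            sources.append(seg["source"])  # one -i input per distinct clip

    parts, labels = [], []
    for i, seg in enumerate(edl["segments"]):
        idx = sources.index(seg["source"])
        start, end = seg["start"], seg["end"]
        # Trim video and audio to the segment, then reset timestamps to zero.
        parts.append(f"[{idx}:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[{idx}:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[v{i}][a{i}]")
    n = len(edl["segments"])
    parts.append(f"{''.join(labels)}concat=n={n}:v=1:a=1[outv][outa]")

    args = ["ffmpeg", "-y"]
    for src in sources:
        args += ["-i", src]
    args += ["-filter_complex", ";".join(parts), "-map", "[outv]", "-map", "[outa]", output]
    return args


edl = json.loads("""{"segments": [
  {"source": "C0103.mp4", "start": 2.52, "end": 5.36},
  {"source": "C0103.mp4", "start": 6.08, "end": 6.74}
]}""")
print(shlex.join(edl_to_ffmpeg(edl)))
```

The sketch assumes every segment has both a video and an audio stream in matching formats; clips recorded at different resolutions would need scaling before concat.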
| Task | Command in session |
|---|---|
| First edit | edit these into a launch video |
| Specific style | keep all the takes, just remove filler words and grade warm |
| Subtitles only | burn white sentence-case subtitles, no other edits |
| Social clips | find the 3 best 90-second clips for Instagram |
| Animation | add a lower-third title animation for each speaker |
| Download + edit | download [URL] and edit into a 3-minute summary |
Feature details reflect video-use's main branch as of June 29, 2026. The project is under active development; check the GitHub README for the latest capabilities.