mckruz/comfyui-expert · updated Apr 8, 2026
ComfyUI Video Pipeline
Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.
Engine Selection
VIDEO REQUEST
|
|-- Need film-level quality?
| |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
| |-- Yes + 8GB VRAM → Wan 2.2 1.3B
|
|-- Need long video (>10 seconds)?
| |-- Yes → FramePack (60 seconds on 6GB)
|
|-- Need fast iteration?
| |-- Yes → AnimateDiff Lightning (4-8 steps)
|
|-- Need camera/motion control?
| |-- Yes → AnimateDiff V3 + Motion LoRAs
|
|-- Need first+last frame control?
| |-- Yes → Wan 2.2 MoE (exclusive feature)
|
|-- Default → Wan 2.2 (best general quality)
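The decision tree above can be sketched as a small selection function. This is an illustrative sketch only: the parameter names and the 24GB/10-second thresholds mirror the branch points in the tree and are not part of any real API.

```python
# Hypothetical encoding of the engine-selection tree; parameter names
# and thresholds are taken from the decision points above.
def select_engine(needs_film_quality=False, vram_gb=8.0,
                  duration_s=5.0, fast_iteration=False,
                  camera_control=False, first_last_frame=False):
    """Return the engine name the decision tree would pick."""
    if needs_film_quality:
        return "Wan 2.2 MoE 14B" if vram_gb >= 24 else "Wan 2.2 1.3B"
    if duration_s > 10:
        return "FramePack"
    if fast_iteration:
        return "AnimateDiff Lightning"
    if camera_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"
```

Branch order matters: quality and length requirements take precedence over speed and control, matching the top-to-bottom order of the tree.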
Pipeline 1: Wan 2.2 MoE (Highest Quality)
Image-to-Video
Prerequisites:
- `wan2.1_i2v_720p_14b_bf16.safetensors` in `models/diffusion_models/`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors` in `models/clip/`
- `open_clip_vit_h_14.safetensors` in `models/clip_vision/`
- `wan_2.1_vae.safetensors` in `models/vae/`
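A quick preflight check helps catch missing files before queuing a long generation. This is a sketch under one assumption: `MODELS_ROOT` is hypothetical and should point at your own ComfyUI models directory.

```python
# Preflight check for the Wan I2V prerequisites listed above.
# MODELS_ROOT is an assumption -- adjust it to your install.
from pathlib import Path

MODELS_ROOT = Path("ComfyUI/models")

REQUIRED = {
    "diffusion_models": "wan2.1_i2v_720p_14b_bf16.safetensors",
    "clip": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "clip_vision": "open_clip_vit_h_14.safetensors",
    "vae": "wan_2.1_vae.safetensors",
}

def missing_models(root: Path = MODELS_ROOT) -> list[str]:
    """Return relative paths of required model files that are absent."""
    return [f"{sub}/{name}" for sub, name in REQUIRED.items()
            if not (root / sub / name).is_file()]
```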
Settings:
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 1280x720 (landscape) or 720x1280 (portrait) | Native training resolution |
| Frames | 81 (~5 seconds at 16fps) | Must be 4n + 1 |
| Steps | 30-50 | Higher = better quality |
| CFG | 5-7 | |
| Sampler | uni_pc | Recommended for Wan |
| Scheduler | normal | |
Frame count guide:
| Duration | Frames (16fps) |
|---|---|
| 1 second | 17 |
| 3 seconds | 49 |
| 5 seconds | 81 |
| 10 seconds | 161 |
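The table follows `frames = duration × 16 + 1`, rounded to the nearest valid 4n + 1 count. A small helper (the function name is illustrative) reproduces it for arbitrary durations:

```python
# Frame counts for Wan must have the form 4n + 1; this rounds an
# arbitrary duration to the nearest valid count at the given fps.
def wan_frame_count(duration_s: float, fps: int = 16) -> int:
    """Nearest valid (4n + 1) frame count for the requested duration."""
    raw = round(duration_s * fps)
    return 4 * round(raw / 4) + 1
```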
VRAM optimization:
- FP8 quantization: halves VRAM with minimal quality loss
- SageAttention: faster attention computation
- Reduce frames if OOM
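Why reducing frames helps: latent memory grows linearly with frame count. The back-of-envelope below is illustrative only; it assumes 16 latent channels, 8x spatial downsampling, and fp16 elements, which are simplifying assumptions rather than the exact Wan internals.

```python
# Rough latent-tensor footprint estimate. The channel count, 8x
# downsampling, and fp16 element size are assumptions for
# illustration, not exact Wan architecture details.
def latent_mib(frames: int, width: int = 1280, height: int = 720,
               channels: int = 16, bytes_per_el: int = 2) -> float:
    """Approximate latent size in MiB; scales linearly with frames."""
    elements = frames * channels * (height // 8) * (width // 8)
    return elements * bytes_per_el / 2**20
```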
Text-to-Video
Same as I2V but uses wan2.1_t2v_14b_bf16.safetensors and EmptySD3LatentImage instead of image conditioning.
First+Last Frame Control (Wan 2.2 Exclusive)
Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning:
- Generate two hero images with consistent character
- Use first as start frame, second as end frame
- Wan interpolates the motion between them
Pipeline 2: FramePack (Long Videos, Low VRAM)
Key Innovation
VRAM usage is invariant to video length: FramePack can generate 60-second videos at 30fps on just 6GB of VRAM.
How it works:
- Dynamic context compression: key frames keep 1536 tokens, transitional frames are compressed to 192
- Bidirectional memory with reverse generation prevents drift
- Frame-by-frame generation with context window
Settings
| Parameter | Value | Notes |
|---|---|---|
| Resolution | 640x384 to 1280x720 | Depends on VRAM |
| Duration | Up to 60 seconds | VRAM-invariant |
| Quality | High (comparable to Wan) | Uses same base models |
When to Use
- Videos longer than 10 seconds
- Limited VRAM systems (but RTX 5090 doesn't need this)
- When VRAM is needed for parallel operations
- Batch video generation
Pipeline 3: AnimateDiff V3 (Fast, Controllable)
Strengths
- Motion LoRAs for camera control (pan, zoom, tilt, roll)
- Effect LoRAs (shatter, smoke, explosion, liquid)
- Sliding context window for infinite length
- Very fast with Lightning model (4-8 steps)
Settings
| Parameter | Value (Standard) | Value (Lightning) |
|---|---|---|
| Motion Module | v3_sd15_mm.ckpt | animatediff_lightning_4step.safetensors |
| Steps | 20-25 | 4-8 |
| CFG | 7-8 | 1.5-2.0 |
| Sampler | euler_ancestral | lcm |
| Resolution | 512x512 | 512x512 |
| Context Length | 16 | 16 |
| Context Overlap | 4 | 4 |
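The sliding context window in the table (length 16, overlap 4) covers a longer animation by sharing the last 4 frames of each window with the next. The sketch below only illustrates the index arithmetic; the AnimateDiff Evolved nodes handle windowing internally, and the final window is clamped to the clip end, which can enlarge the last overlap.

```python
# Illustrative sliding-window index generator: length 16, overlap 4
# means a stride of 12 frames; the last window is clamped to the end.
def context_windows(total_frames: int, length: int = 16, overlap: int = 4):
    """Return (start, end) frame ranges, end exclusive."""
    stride = length - overlap
    windows, start = [], 0
    while start + length < total_frames:
        windows.append((start, start + length))
        start += stride
    windows.append((max(total_frames - length, 0), total_frames))
    return windows
```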
Camera Motion LoRAs
| LoRA | Motion |
|---|---|
| v2_lora_ZoomIn | Camera zooms in |
| v2_lora_ZoomOut | Camera zooms out |
| v2_lora_PanLeft | Camera pans left |
| v2_lora_PanRight | Camera pans right |
| v2_lora_TiltUp | Camera tilts up |
| v2_lora_TiltDown | Camera tilts down |
| v2_lora_RollingClockwise | Camera rolls clockwise |
Post-Processing Pipeline
After any video generation:
1. Frame Interpolation (RIFE)
Doubles or quadruples frame count for smoother motion:
Input (16fps) → RIFE 2x → Output (32fps)
Input (16fps) → RIFE 4x → Output (64fps)
Use rife47 or rife49 model.
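The arithmetic behind the diagrams above: each 2x pass inserts one intermediate frame between every existing pair, so the frame count becomes `(n - 1) × factor + 1` while clip duration is unchanged. A small helper (the name is illustrative, not a RIFE API) makes this concrete:

```python
# Frame-count and fps arithmetic for RIFE interpolation: duration is
# preserved because both frame count (minus endpoints) and fps scale
# by the same factor.
def after_rife(n_frames: int, fps: int, factor: int = 2):
    """Return (frame_count, fps) after interpolating by `factor`."""
    return (n_frames - 1) * factor + 1, fps * factor
```

For example, an 81-frame, 16fps Wan clip stays a 5-second clip after 2x interpolation, just at twice the frame rate.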
2. Face Enhancement (if character video)
Apply FaceDetailer to each frame:
- denoise: 0.3-0.4 (lower than image - preserves temporal consistency)
- guide_size: 384 (speed optimization for video)
- detection_model: face_yolov8m.pt
3. Deflicker (if needed)
Reduces temporal inconsistencies between frames.
4. Color Correction
Maintain consistent color grading across frames.
5. Video Combine
Final output via VHS Video Combine:
frame_rate: 16 (native) or 24/30 (after interpolation)
format: "video/h264-mp4"
crf: 19 (high quality) to 23 (smaller file)
Talking Head Pipeline
Complete pipeline for character dialogue:
1. Generate audio → comfyui-voice-pipeline
2. Generate base video → This skill (Wan I2V or AnimateDiff)
- Prompt: "{character}, talking naturally, slight head movement"
- Duration: match audio length
3. Apply lip-sync → Wav2Lip or LatentSync
4. Enhance faces → FaceDetailer + CodeFormer
5. Final output → video-assembly
Quality Checklist
Before marking video as complete:
- Character identity consistent across frames
- No flickering or temporal artifacts
- Motion looks natural (not jerky or frozen)
- Face enhancement applied if character video
- Frame rate is smooth (24+ fps for delivery)
- Audio synced (if talking head)
- Resolution matches delivery target
Reference
- references/workflows.md - Workflow templates for Wan and AnimateDiff
- references/models.md - Video model download links
- references/research-log.md - Latest video generation advances
- state/inventory.json - Available video models