ai-avatar-video

inferen-sh/skills · updated Apr 8, 2026

$ npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video
summary

Generate talking head and avatar videos from images and audio using OmniHuman, Fabric, and PixVerse models.

  • Four model options: OmniHuman 1.5 (multi-character), OmniHuman 1.0 (single character), Fabric 1.0 (image lipsync), and PixVerse Lipsync (highly realistic)
  • Audio-driven workflow: pair portrait images with speech files to generate realistic avatar videos with synchronized lip movement
  • Composable with text-to-speech and video transcription for end-to-end pipelines: generate speech
skill.md

AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh). Follow the install instructions, then log in before running any app.

infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| OmniHuman 1.5 | bytedance/omnihuman-1-5 | Multi-character, best quality |
| OmniHuman 1.0 | bytedance/omnihuman-1-0 | Single character |
| Fabric 1.0 | falai/fabric-1-0 | Image talks with lipsync |
| PixVerse Lipsync | falai/pixverse-lipsync | Highly realistic |

Search Avatar Apps

infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"

Examples

OmniHuman 1.5 (Multi-Character)

infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Supports specifying which character to drive in multi-person images.

Fabric 1.0 (Image Talks)

infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'

PixVerse Lipsync

infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'

Generates highly realistic lipsync from any audio.
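
To compare models on the same portrait and audio, the runs above can be wrapped in a small loop. This is a minimal sketch, not part of the skill: the helper names (avatar_apps, avatar_input, result_file, compare_models) are hypothetical, it assumes every listed app accepts the same image_url/audio_url input (OmniHuman 1.0 is not shown above, so check it first), and it assumes the URLs contain no characters that need JSON escaping.

```bash
# Hypothetical helpers; the app IDs come from the Available Models table.
avatar_apps() {
  printf '%s\n' \
    bytedance/omnihuman-1-5 \
    bytedance/omnihuman-1-0 \
    falai/fabric-1-0 \
    falai/pixverse-lipsync
}

# Build the JSON input from an image URL ($1) and an audio URL ($2).
avatar_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Derive an output file name from an app ID (slashes become underscores).
result_file() {
  printf '%s.json\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Run every app on the same input and save each result to its own file.
compare_models() {
  input="$(avatar_input "$1" "$2")"
  avatar_apps | while IFS= read -r app; do
    # stdin is redirected so the CLI cannot consume the app list
    infsh app run "$app" --input "$input" < /dev/null > "$(result_file "$app")" \
      || echo "failed: $app" >&2
  done
}

# Usage (requires infsh and a prior `infsh login`):
#   compare_models "https://portrait.jpg" "https://speech.mp3"
```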

Full Workflow: TTS + Avatar

# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
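
The audio URL for step 2 has to be copied out of speech.json. One way to script that is sketched below. The output schema of the TTS app is not documented here, so this hypothetical first_url helper makes no assumption about field names and simply takes the first http(s) URL found in the file; inspect speech.json once to confirm that this is the audio file before relying on it.

```bash
# Hypothetical helper: print the first http(s) URL found in a file.
# Assumes the first URL in the saved output is the generated audio.
first_url() {
  grep -oE 'https?://[^"[:space:]]+' "$1" | head -n 1
}

# Usage (requires infsh; not executed here):
#   audio_url="$(first_url speech.json)"
#   infsh app run bytedance/omnihuman-1-5 --input "{
#     \"image_url\": \"https://presenter-photo.jpg\",
#     \"audio_url\": \"$audio_url\"
#   }"
```

If jq is installed, selecting the field by name is more robust once the actual output structure is known.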

Full Workflow: Dub Video in Another Language

# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
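
Translated text often contains double quotes or line breaks, which break a hand-written JSON string in step 3. The sketch below builds the input safely from a text value. The helper names (json_escape, tts_input) are hypothetical, the "text" key is taken from the step 3 command above, and only backslashes, double quotes, and newlines are escaped; other control characters such as tabs are not handled.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, and newline only.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build the TTS input object from raw text ($1).
tts_input() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Usage (requires infsh; translated.txt is a hypothetical file from step 2):
#   infsh app run infsh/kokoro-tts \
#     --input "$(tts_input "$(cat translated.txt)")" > new_speech.json
```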

Use Cases

  • Marketing: Product demos with AI presenter
  • Education: Course videos, explainers
  • Localization: Dub content in multiple languages
  • Social Media: Consistent virtual influencer
  • Corporate: Training videos, announcements

Tips

  • Use high-quality portrait photos (front-facing, good lighting)
  • Audio should be clear with minimal background noise
  • OmniHuman 1.5 supports multiple people in one image
  • LatentSync is best for syncing existing videos to new audio

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@infsh-cli

# Text-to-speech (generate audio for avatars)
npx skills add inference-sh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inference-sh/skills@speech-to-text

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inference-sh/skills@ai-image-generation

Browse all video apps: infsh app list --category video

general reviews

Ratings

4.5 · 10 reviews
  • Shikha Mishra · Oct 10, 2024

    ai-avatar-video is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Piyush G · Sep 9, 2024

    Keeps context tight: ai-avatar-video is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Chaitanya Patil · Aug 8, 2024

    Registry listing for ai-avatar-video matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Sakshi Patil · Jul 7, 2024

    ai-avatar-video reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Ganesh Mohane · Jun 6, 2024

    I recommend ai-avatar-video for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Oshnikdeep · May 5, 2024

    Useful defaults in ai-avatar-video — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Dhruvi Jain · Apr 4, 2024

    ai-avatar-video has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Rahul Santra · Mar 3, 2024

    Solid pick for teams standardizing on skills: ai-avatar-video is focused, and the summary matches what you get after install.

  • Pratham Ware · Feb 2, 2024

    We added ai-avatar-video from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Yash Thakker · Jan 1, 2024

    ai-avatar-video fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.