dialogue-audio

inferen-sh/skills · updated Apr 8, 2026

$ npx skills add https://github.com/inferen-sh/skills --skill dialogue-audio
summary

Realistic multi-speaker dialogue audio generation with Dia TTS via inference.sh CLI.

  • Supports two-speaker conversations with automatic voice assignment using [S1] and [S2] speaker tags
  • Emotion and pacing controlled through punctuation (".", ",", "!", "?", "...", "—") and parenthetical sound cues like (laughs), (sighs), and (whispers)
  • Includes structured patterns for interviews, tutorials, debates, and conversational content with practical script-writing guidelines
  • Post-production support
skill.md

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (infsh); see its install instructions.

infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

| Tag | Role | Voice |
|------|-----------|--------------------------------|
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |

Rules:

  • Always start each speaker turn with the tag
  • Tags must be uppercase: [S1] not [s1]
  • Maximum 2 speakers per generation
  • Each speaker maintains a consistent voice within a session
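
The rules above can be checked before a script is sent for generation. The following is a minimal sketch of such a check; `check_tags` is a hypothetical helper written for illustration and is not part of infsh or Dia TTS.

```shell
# check_tags PROMPT: return 0 if PROMPT follows the tag rules above,
# otherwise print a reason and return 1.
# Hypothetical helper for illustration; not part of infsh.
check_tags() {
  prompt=$1
  # The script must open with a speaker tag.
  case $prompt in
    "[S1]"*|"[S2]"*) ;;
    *) echo "prompt must start with [S1] or [S2]"; return 1 ;;
  esac
  # Tags must be uppercase.
  if printf '%s\n' "$prompt" | grep -q '\[s[0-9]*\]'; then
    echo "lowercase speaker tag found"; return 1
  fi
  # At most two speakers: reject any numbered tag other than [S1] or [S2].
  if printf '%s\n' "$prompt" | grep -o '\[S[0-9][0-9]*\]' | grep -qv '^\[S[12]\]$'; then
    echo "only [S1] and [S2] are allowed"; return 1
  fi
  return 0
}
# Usage:
#   check_tags "[S1] Hi there. [S2] Hello." && echo "tags look fine"
```

The check only covers tag syntax. It cannot tell whether a turn in the middle of the script is missing its tag, so reading the script aloud is still worthwhile.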

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

| Punctuation | Effect | Example |
|-------------|--------|---------|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, questioning | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

(laughs)      — laughter
(sighs)       — exasperation or relief
(clears throat) — attention-getting pause
(whispers)    — softer delivery
(gasps)       — surprise

Examples with Emotion

# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'

Pacing Control

Pause Hierarchy

| Technique | Pause Length | Use For |
|-----------|--------------|---------|
| Comma (,) | ~0.3 seconds | Between clauses, list items |
| Period (.) | ~0.5 seconds | Between sentences |
| Ellipsis (...) | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
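
As a worked example of the arithmetic, the approximate figures in the table can be summed for a script. This is a rough sketch: `estimate_pauses` is a hypothetical helper, it treats the table values as exact, it ignores "!" and "?" because the table gives no figure for them, and it counts one turn-taking gap per speaker change.

```shell
# estimate_pauses PROMPT: print the approximate total pause time in seconds,
# using the table's figures (comma 0.3, period 0.5, ellipsis 1.0, turn gap 0.3).
# Hypothetical helper for illustration; the figures are estimates, not guarantees.
estimate_pauses() {
  printf '%s\n' "$1" | awk '
    { text = text $0 " " }
    END {
      # Count "..." first so its dots are not counted again as periods.
      ellipses = gsub(/\.\.\./, "", text)
      periods  = gsub(/\./, "", text)
      commas   = gsub(/,/, "", text)
      tags     = gsub(/\[S[12]\]/, "", text)
      # The first tag opens the script, so it adds no turn-taking gap.
      gaps = (tags > 1) ? tags - 1 : 0
      printf "%.1f\n", ellipses * 1.0 + periods * 0.5 + commas * 0.3 + gaps * 0.3
    }'
}
# Usage:
#   estimate_pauses "[S1] I thought it would work... but it didn't. [S2] First, we analyze. Then, we act."
#   One ellipsis (1.0) + three periods (1.5) + two commas (0.6) + one gap (0.3) prints 3.4
```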

Speed Control

  • Shorter sentences = faster perceived pace
  • Longer sentences with commas = measured, thoughtful pace
  • Questions followed by answers = engaging back-and-forth rhythm
# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'

Conversation Structure Patterns

Interview Format

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Explainer

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

Debate / Discussion

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'

Post-Production Tips

Volume Normalization

Both speakers should be at a consistent volume. If one is louder:

# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'

Adding Background/Music

# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into the topic for today..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
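
Splitting a long script by hand is error-prone, so the segments can be cut at speaker tags instead. The sketch below groups turns into segments of at most N turns each; `split_turns` and the choice of N are assumptions for illustration, since the source only gives the ~30 second guideline and not a turn count.

```shell
# split_turns PROMPT N: print PROMPT as segments of at most N turns,
# one segment per output line. Cuts happen only at [S1]/[S2] tags.
# Hypothetical helper for illustration; not part of infsh.
split_turns() {
  printf '%s\n' "$1" |
    sed 's/ *\[S\([12]\)\] */\
[S\1] /g' |
    awk -v n="$2" '
      NF == 0 { next }   # skip the empty line produced before the first tag
      { seg = (seg == "" ? $0 : seg " " $0); count++ }
      count == n { print seg; seg = ""; count = 0 }
      END { if (seg != "") print seg }'
}
# Usage:
#   split_turns "[S1] Welcome back. [S2] Glad to be here. [S1] Let us begin." 2
# prints two lines: the first two turns, then the remaining turn.
```

Each output line can then be used as the "prompt" value of a separate generation, and the resulting files merged as shown above.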

Script Writing Tips

| Do | Don't |
|----|-------|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
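
Contractions contain an apostrophe, which ends a single-quoted shell argument like the ones used in the examples in this document. One way around this is to build the JSON in a small function and pass it in double quotes. This is a sketch under assumptions: `dia_input` is a hypothetical helper, and it handles single-line prompts only (it escapes backslashes and double quotes, not newlines).

```shell
# dia_input PROMPT: print {"prompt": "..."} with backslashes and double
# quotes escaped so the result is valid JSON for a single-line prompt.
# Hypothetical helper for illustration; not part of infsh.
dia_input() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}
# Usage (requires infsh):
#   infsh app run falai/dia-tts --input "$(dia_input "[S1] Can't wait to try it. [S2] Same, I won't miss it.")"
```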

Common Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with [S1] or [S2] |
| Formal written language | Sounds unnatural spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use ... or scene breaks |
| All same energy level | Monotonous | Vary between high/low energy moments |
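
The first mistake in the table is easy to flag automatically. The following sketch lists turns with more than three sentences; `long_turns` is a hypothetical helper, and its sentence count is a heuristic that treats each run of ".", "!" or "?" as one sentence end, so an ellipsis in the middle of a sentence is counted too.

```shell
# long_turns PROMPT [MAX]: print each turn that has more than MAX sentences
# (default 3), prefixed with its sentence count.
# Hypothetical helper for illustration; not part of infsh.
long_turns() {
  printf '%s\n' "$1" |
    sed 's/ *\[S\([12]\)\] */\
[S\1] /g' |
    awk -v max="${2:-3}" '
      NF == 0 { next }   # skip the empty line produced before the first tag
      {
        turn = $0
        # Each run of ., ! or ? counts as one sentence end.
        sentences = gsub(/[.!?]+/, "", turn)
        if (sentences > max) print sentences ": " $0
      }'
}
# Usage:
#   long_turns "[S1] One. Two. Three. Four. [S2] Okay!"
# prints the [S1] turn with its count; an empty result means no turn is too long.
```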

Related Skills

# ElevenLabs dialogue (22+ voices, voice direction)
npx skills add inference-sh/skills@elevenlabs-dialogue

npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@ai-podcast-creation
npx skills add inference-sh/skills@ai-avatar-video

Browse all apps: infsh app list

How to use dialogue-audio on Cursor

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add dialogue-audio
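
The Node.js requirement can be checked in a script rather than by eye. This is a small sketch; `node_ok` is a hypothetical helper that only compares the major version in the string printed by `node --version`.

```shell
# node_ok VERSION: succeed if VERSION (as printed by `node --version`,
# for example v18.17.0) has a major version of 16 or higher.
# Hypothetical helper for illustration.
node_ok() {
  major=${1#v}         # drop the leading "v"
  major=${major%%.*}   # keep the text before the first dot
  [ "$major" -ge 16 ] 2>/dev/null
}
# Usage:
#   node_ok "$(node --version)" || echo "Node.js 16.0+ is required"
```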

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/inferen-sh/skills --skill dialogue-audio

The skills CLI fetches dialogue-audio from GitHub repository inferen-sh/skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/dialogue-audio

Reload or restart Cursor to activate dialogue-audio. Access the skill through slash commands (e.g., /dialogue-audio) or your agent's skill management interface.
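
The directory check can also be scripted. This sketch only tests that the path named above exists under a project root; `skill_installed` is a hypothetical helper and makes no assumption about which files the directory contains.

```shell
# skill_installed [ROOT]: succeed if ROOT/.cursor/skills/dialogue-audio
# exists as a directory. ROOT defaults to the current directory.
# Hypothetical helper for illustration.
skill_installed() {
  [ -d "${1:-.}/.cursor/skills/dialogue-audio" ]
}
# Usage, from the project root:
#   skill_installed && echo "dialogue-audio is installed" || echo "dialogue-audio not found"
```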

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security.

Skills execute code in your development environment. Before production use, review the skill's source code and recent commits, verify the publisher's identity and reputation, and test in an isolated environment.

general reviews

Ratings

4.4 · 54 reviews
  • Carlos Iyer · Dec 20, 2024

    We added dialogue-audio from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Pratham Ware · Dec 16, 2024

    Registry listing for dialogue-audio matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Henry Liu · Dec 16, 2024

    Useful defaults in dialogue-audio — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Hana Shah · Dec 8, 2024

    dialogue-audio reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Hana Srinivasan · Dec 8, 2024

    dialogue-audio fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Min Huang · Dec 4, 2024

    dialogue-audio is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Soo Desai · Nov 27, 2024

    Registry listing for dialogue-audio matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Anaya Huang · Nov 15, 2024

    dialogue-audio is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Min Li · Nov 11, 2024

    Useful defaults in dialogue-audio — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Sakshi Patil · Nov 7, 2024

    dialogue-audio reduced setup friction for our internal harness; good balance of opinion and flexibility.
