Pauses & rhythm
Older SSML-compatible stacks accept an explicit <break time="1.5s" /> tag; newer expressive models rely on narrative punctuation and bracketed delivery cues instead. Place pauses at natural punctuation rather than at arbitrary mid-clause commas, unless the line calls for an ironic beat. Too many breaks can destabilize some voices.
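For stacks that do accept SSML breaks, the "pause at punctuation, and not too often" rule can be sketched as below. This is a minimal illustration only: the function name add_breaks, the default pause length, and the cap of three breaks are assumptions made for the example, not values taken from any vendor documentation.

```python
import re

# Sentence-ending punctuation followed by whitespace; commas are left alone on purpose.
_BOUNDARY = re.compile(r"([.!?;])\s+")


def add_breaks(text: str, pause: str = "0.5s", max_breaks: int = 3) -> str:
    """Insert SSML <break> tags after sentence-ending punctuation, capped at max_breaks."""
    if max_breaks <= 0:
        return text  # re.sub treats count=0 as "unlimited", so bail out early
    tag = f'<break time="{pause}" />'
    return _BOUNDARY.sub(lambda m: f"{m.group(1)} {tag} ", text, count=max_breaks)


if __name__ == "__main__":
    print(add_breaks("Wait. Now go, quickly!"))
```

The cap is there because over-breaking is the failure mode to avoid; tune or remove it per voice.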
Normalization playbook
- Expand currencies, ordinals, phone numbers, and ambiguous decimals into the words a listener should hear; the goal is conversational clarity, not spreadsheet fidelity.
- Convert keyboard shortcuts (Cmd/Alt/Ctrl combos) into spoken phrases instead of glyphs.
- For URLs, either spell out the full path in words or shorten it to a brand reference; avoid ambiguous runs of slashes.
- When an upstream LLM drafts the copy, prepend an instruction block that mirrors the ExplainX normalization recipes (cardinal vs. ordinal distinctions, saints vs. streets for “St.”).
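Two of the rules above (keyboard shortcuts and currencies) can be sketched as a small pre-processing pass. Everything here is a hypothetical illustration: the KEY_NAMES table, the function names, and the choice to handle only US dollars with two-digit cents are assumptions for the example, and digits are left as digits for the TTS engine to read.

```python
import re

# Hypothetical spoken names for modifier keys; extend as needed.
KEY_NAMES = {"cmd": "Command", "ctrl": "Control", "alt": "Alt",
             "shift": "Shift", "opt": "Option"}

# A modifier followed by one or more "+key" parts, e.g. Cmd+Shift+P.
SHORTCUT = re.compile(r"\b(?:Cmd|Ctrl|Alt|Shift|Opt)(?:\+\w+)+", re.IGNORECASE)

# Whole dollars with optional two-digit cents; anything else is left untouched.
DOLLARS = re.compile(r"\$(\d+)(?:\.(\d{2}))?(?!\.?\d)")


def _speak_shortcut(match: re.Match) -> str:
    keys = match.group(0).split("+")
    return " ".join(KEY_NAMES.get(k.lower(), k) for k in keys)


def _speak_dollars(match: re.Match) -> str:
    dollars = int(match.group(1))
    cents = int(match.group(2) or 0)
    spoken = f"{dollars} dollar{'' if dollars == 1 else 's'}"
    if cents:
        spoken += f" and {cents} cent{'' if cents == 1 else 's'}"
    return spoken


def normalize(text: str) -> str:
    """Rewrite shortcuts and dollar amounts into speakable phrases."""
    text = SHORTCUT.sub(_speak_shortcut, text)
    return DOLLARS.sub(_speak_dollars, text)


if __name__ == "__main__":
    print(normalize("Press Cmd+Shift+P, then pay $4.50."))
```

Amounts the pattern does not recognize, such as "$4.5", pass through unchanged rather than being guessed at.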
Pronunciation controls
Phoneme tags work best on the English flash models that support them, so verify compatibility before committing to SSML-heavy scripts. Alias (grapheme→phoneme) substitutions defined in pronunciation dictionaries apply project-wide; keep case sensitivity in mind during bulk imports.
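The case-sensitivity caveat can be checked locally before a bulk import. The sketch below is not the pronunciation-dictionary API of any service; it only assumes that aliases are a plain mapping from written form to spoken form and that matching is whole-word and case-sensitive. The function names are hypothetical.

```python
import re
from collections import defaultdict


def apply_aliases(text: str, aliases: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches with their spoken alias."""
    if not aliases:
        return text
    # Longest keys first so a longer entry wins over a shorter prefix of itself.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: aliases[m.group(0)], text)


def case_collisions(aliases: dict[str, str]) -> list[list[str]]:
    """Return groups of keys that differ only by letter case."""
    groups = defaultdict(list)
    for key in aliases:
        groups[key.lower()].append(key)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    aliases = {"SQL": "sequel", "nginx": "engine x"}
    print(apply_aliases("Restart nginx, then query SQL. Nginx docs.", aliases))
    print(case_collisions({"SQL": "sequel", "sql": "S Q L", "nginx": "engine x"}))
```

In the first call the capitalized "Nginx" is left alone, which is the kind of miss worth catching before an import rather than in the rendered audio.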
Eleven “v3” expressive tags
Use bracket tags such as [whispers], [laughs], and [sighs] sparingly. They steer delivery, but they can clash with a voice whose underlying corpus does not match the requested style. Compose dialogue cinematically, and prune tags downstream if audible artifacts creep in.
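Pruning tags downstream can be as simple as a regex pass over the script. The sketch below assumes tags are lowercase words in square brackets, as in the examples above; the function name prune_tags and the keep parameter are hypothetical.

```python
import re

# A bracketed lowercase cue such as [whispers] or [sighs], plus trailing whitespace.
TAG = re.compile(r"\[([a-z][a-z ]*)\]\s*")


def prune_tags(text: str, keep: frozenset[str] = frozenset()) -> str:
    """Remove bracket tags from text, except those whose name is in keep."""
    def _replace(match: re.Match) -> str:
        return match.group(0) if match.group(1) in keep else ""
    return TAG.sub(_replace, text).strip()


if __name__ == "__main__":
    line = "[whispers] Stay close. [sighs] Fine."
    print(prune_tags(line))
    print(prune_tags(line, keep=frozenset({"whispers"})))
```

Keeping an allow-list makes it easy to drop only the tags that cause artifacts on a given voice while leaving the rest of the script intact.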
Multi-pass composition
Stitch complex beds (ambience loops, narration, SFX) externally when timing must be frame-accurate. For dense productions, a single-shot prose prompt rarely outperforms layered stems.
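As a rough sketch of what external stitching involves, the snippet below places stems on a timeline by video frame and sums them. It is a toy model under stated assumptions: stems are plain lists of float samples in the range -1.0 to 1.0, all at the same sample rate, and the names frames_to_samples and mix_stems are hypothetical. A real production would use a DAW or an audio library instead.

```python
def frames_to_samples(frame: int, fps: float = 24, sample_rate: int = 48000) -> int:
    """Convert a video frame index to the nearest audio sample offset."""
    return round(frame * sample_rate / fps)


def mix_stems(stems, fps: float = 24, sample_rate: int = 48000) -> list[float]:
    """Sum stems given as (start_frame, samples, gain) tuples into one track."""
    placed = [(frames_to_samples(start, fps, sample_rate), samples, gain)
              for start, samples, gain in stems]
    length = max((offset + len(samples) for offset, samples, _ in placed), default=0)
    out = [0.0] * length
    for offset, samples, gain in placed:
        for i, value in enumerate(samples):
            out[offset + i] += value * gain
    # Hard-clip to the valid range; a real mix would use a limiter instead.
    return [max(-1.0, min(1.0, v)) for v in out]


if __name__ == "__main__":
    # Tiny sample rate so the result is easy to read: 2 samples per frame.
    ambience = (0, [0.25] * 6, 1.0)
    narration = (1, [0.5, 0.5], 1.0)
    print(mix_stems([ambience, narration], fps=24, sample_rate=48))
```

Because each stem carries its own start frame and gain, a narration take can be regenerated or nudged without re-rendering the ambience or effects.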