The first LLM for text-to-speech

Prompt to create AI voices, change emotions, and more

Trusted By

University of Zurich

A text-to-speech system that understands what it's saying

Octave (Omni-capable text and voice engine) isn't a traditional TTS model. It’s a voice-based LLM. That means it understands what words mean in context, so it can predict emotions, cadence, and more.

Create any voice you can imagine with Octave Voice Design

Create any AI voice you can imagine, like a "sarcastic medieval peasant," with a brief prompt or evocative script

Sarcastic medieval peasant

Full prompt: The speaker is a medieval peasant with a cockney accent, raspy voice, dripping with sarcasm.

Literature professor

Full prompt: A retired Black female literature professor who analyzes poetry with precise academic language and references to her own published criticism.

Charming cowboy

Full prompt: The speaker is a grizzled old cowboy with a folksy Texan drawl Southern accent, speaking in a charismatic tone with a deep but relaxed vibe.

Sitcom inner monologue

Full prompt: The star of a popular sitcom, with frequent inner monologues about her life.

Dungeon master

Full prompt: A know-it-all dungeons and dragons dungeon master speaking excitedly with a lisp.

Warm English narrator

Full prompt: The speaker is a sophisticated British female narrator with a gentle, warm voice, recounting the ending of a classic romance novel.

Unserious movie trailer guy

Full prompt: The speaker is an American, deep middle-aged male film trailer narrator for a film about chickens.

Raspy evil vampire

Full prompt: A villainous undead vampire, with a horrifying raspy voice, and a slight Transylvanian accent.

Reminiscing man

Full prompt: A middle-aged African American man, reminiscing with a slightly gravelly voice and a tone of hard-earned wisdom.

Nature documentary narrator

Full prompt: The speaker is a distinguished British narrator, whose voice carries a deep sense of wisdom and curiosity.

Texan fishing guru

Full prompt: The speaker has a booming, charismatic radio voice, like a Texan fishing guru with a hint of gravel and an infectious laugh, perfect for reeling in listeners to 'Big Dicky's live fishing frenzy.'

Any emotion or speaking style, on command

Octave is the first TTS system that can take natural language instructions to change emotional delivery and speaking style. Give directions like "sound sarcastic" or "whisper fearfully." For the first time, creators have total control.

For creators and developers alike

Octave was built to generate the most expressive AI voices for any content: podcasts, voiceovers, audiobooks, and more. With our API, you can bring it to any application.

TTS Projects

The full developer platform for deploying emotionally intelligent voice agents

Everything you need to create voice experiences that users actually want to engage with—from rapid prototyping to enterprise-scale deployment

Custom Voices

Create your own voices using prompts, or choose from pre-curated default voices for each use case.

Naturalness

A majority of users prefer Hume over other TTS providers for the naturalness and expressiveness of Octave's voices.

High Quality Audio

Hume-generated voices were preferred in 71.6% of trials against ElevenLabs.

Speed & Pause Control

Add pauses and modify the speed/pacing to alter the delivery of your text.

Expression Control

Control your character's delivery with "acting instructions," which let you direct exactly how you'd like the text to be spoken.

Low Latency

Turn on Instant Mode for a 200ms response time, so you get expressiveness, high quality, and speed, all in one.
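
As a rough sketch of how the controls above (voice prompt, acting instructions, speed, pauses, Instant Mode) might be combined in one request, the snippet below assembles a JSON-style payload. The `build_utterance` helper and every field name in it (`voice_description`, `acting_instructions`, `speed`, `trailing_pause_seconds`, `instant_mode`) are hypothetical and chosen only for illustration; they are not taken from Hume's API reference, so consult the documentation for the actual schema.

```python
import json
from typing import Optional


def build_utterance(
    text: str,
    voice_description: Optional[str] = None,
    acting_instructions: Optional[str] = None,
    speed: float = 1.0,
    trailing_pause_seconds: float = 0.0,
    instant_mode: bool = False,
) -> dict:
    """Assemble a hypothetical TTS request payload.

    All field names are illustrative assumptions, not a real API schema.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    if trailing_pause_seconds < 0:
        raise ValueError("trailing_pause_seconds must be non-negative")

    payload = {
        "text": text,
        "speed": speed,
        "trailing_pause_seconds": trailing_pause_seconds,
        "instant_mode": instant_mode,
    }
    # Optional fields are included only when the caller provides them.
    if voice_description:
        payload["voice_description"] = voice_description
    if acting_instructions:
        payload["acting_instructions"] = acting_instructions
    return payload


if __name__ == "__main__":
    request = build_utterance(
        text="Oh, splendid. Another turnip harvest.",
        voice_description=(
            "The speaker is a medieval peasant with a cockney accent, "
            "raspy voice, dripping with sarcasm."
        ),
        acting_instructions="sound sarcastic",
        speed=0.9,
        trailing_pause_seconds=0.5,
    )
    print(json.dumps(request, indent=2))
```

Keeping the voice prompt separate from the acting instructions mirrors the split described above: one describes who is speaking, the other how a particular line should be delivered.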

Ultra low latency streaming without compromising on quality

Use Instant Mode to reduce your time to first token (TTFT) to 200ms with our streaming API for developers. Hume TTS is perfect for AI assistants, avatars, phone calling, and so much more.
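
To check a latency figure like this in your own integration, one option is to time how long a streaming response takes to deliver its first non-empty audio chunk. The sketch below works on any iterable of byte chunks; `measure_first_chunk_latency` and `fake_audio_stream` are hypothetical helpers written for this example, and the fake stream only simulates a delay rather than calling any real service.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_first_chunk_latency(
    stream: Iterable[bytes],
) -> Tuple[Optional[float], int]:
    """Consume a stream of audio chunks.

    Returns (milliseconds until the first non-empty chunk, total bytes read).
    The latency is None if the stream never yields a non-empty chunk.
    """
    start = time.perf_counter()
    first_chunk_ms: Optional[float] = None
    total_bytes = 0
    for chunk in stream:
        if first_chunk_ms is None and chunk:
            first_chunk_ms = (time.perf_counter() - start) * 1000.0
        total_bytes += len(chunk)
    return first_chunk_ms, total_bytes


def fake_audio_stream(
    first_delay_s: float = 0.05, chunks: int = 3, chunk_size: int = 1024
) -> Iterator[bytes]:
    """Stand-in for a real streaming response: waits, then yields silent chunks."""
    time.sleep(first_delay_s)
    for _ in range(chunks):
        yield b"\x00" * chunk_size


if __name__ == "__main__":
    latency_ms, size = measure_first_chunk_latency(fake_audio_stream())
    print(f"first chunk after {latency_ms:.1f} ms, {size} bytes total")
```

In a real measurement the timer should start when the request is sent, so that network and connection setup time are included in the result.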

References

Octave TTS: the first text-to-speech system that understands what it’s saying

Today we’re launching Octave (Omni-capable text and voice engine), the first LLM for text-to-speech. Unlike conventional TTS that merely “reads” words, Octave is a speech-language model that understands what words mean in context, unlocking a new level of expressiveness and nuance—and new AI voice capabilities.

Developer Documentation

Explore our documentation with concise guides, hands-on tutorials, and an in-depth API reference—crafted to support your integration.

Explore the docs

Controlling the speed of AI voices

The rise of artificial intelligence has revolutionized many aspects of our lives, and one area where its impact is increasingly felt is in the realm of voice technology. AI voices are now commonly used in virtual assistants, customer service bots, audiobooks, video games, and various accessibility tools. As this technology continues to evolve, a key question emerges: to what extent can we control the speed (speech rate) of these synthetic voices?
