
Octave

Text-to-speech with emotional intelligence. Generate expressive, natural-sounding speech that conveys the full range of human emotion.

Get started free Read the docs


Your voice, your way

Choose from a curated library of expressive voices, clone any voice from a sample, or design entirely new voices from natural language descriptions. Create the perfect voice for your brand.

Acting instructions

Direct the emotional delivery of every line. Specify tone, pacing, emphasis, and mood with natural language instructions. From whispered secrets to excited announcements, Octave performs as directed.

“With warm enthusiasm”

“Speak slowly and in a whisper”

“Speak with a sarcastic tone”

Real-time streaming

Start playback in milliseconds with streaming audio output. Octave begins generating speech instantly, streaming audio chunks as they're ready. Perfect for real-time applications where every millisecond counts.

~300 ms time to first byte
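To check the time-to-first-byte figure from your own client, you can record when the first audio chunk arrives. The sketch below is a hypothetical helper, not part of any Octave SDK: `FirstChunkTimer` and its method names are assumptions, and timestamps are passed in so the logic stays independent of how the stream is consumed. A client-side measurement also includes network latency, so it may come out higher than the figure quoted above.

```typescript
// Hypothetical helper (not an Octave API): tracks when the first
// non-empty audio chunk of a streamed response arrives.
class FirstChunkTimer {
  private startedAt: number | null = null;
  private firstChunkAt: number | null = null;
  private bytesReceived = 0;

  // Call immediately before sending the synthesis request.
  start(nowMs: number = Date.now()): void {
    this.startedAt = nowMs;
    this.firstChunkAt = null;
    this.bytesReceived = 0;
  }

  // Call once for every audio chunk the stream delivers.
  onChunk(chunk: Uint8Array, nowMs: number = Date.now()): void {
    if (this.startedAt === null) {
      throw new Error("start() must be called before onChunk()");
    }
    if (this.firstChunkAt === null && chunk.byteLength > 0) {
      this.firstChunkAt = nowMs;
    }
    this.bytesReceived += chunk.byteLength;
  }

  // Milliseconds from start() to the first non-empty chunk,
  // or null if no audio has arrived yet.
  timeToFirstChunkMs(): number | null {
    if (this.startedAt === null || this.firstChunkAt === null) {
      return null;
    }
    return this.firstChunkAt - this.startedAt;
  }

  // Total number of audio bytes seen since start().
  totalBytes(): number {
    return this.bytesReceived;
  }
}
```

Call `start()` before the request, `onChunk()` inside whatever loop reads the response body, and read `timeToFirstChunkMs()` once playback has begun.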

Features

Everything you need for text-to-speech

Multilingual

Native-quality speech in 16+ languages with authentic accents.

Word- and phoneme-level timestamps

Precise timing data for lip sync, captions, and highlighting.
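As one illustration of the captions use case, word-level timestamps can be grouped into WebVTT cues. This is a minimal sketch under stated assumptions: the `WordTimestamp` shape (`text`, `startMs`, `endMs`) is hypothetical and may not match the field names Octave actually returns, so map the real response onto it first.

```typescript
// Hypothetical shape for a word-level timestamp; the real API's
// field names and units may differ.
interface WordTimestamp {
  text: string;
  startMs: number;
  endMs: number;
}

// Formats milliseconds as a WebVTT timestamp: HH:MM:SS.mmm
function formatVttTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) {
      s = "0" + s;
    }
    return s;
  };
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "." + pad(millis, 3);
}

// Groups words into caption cues of at most `maxWords` words each.
// Each cue spans from its first word's start to its last word's end.
function wordsToVtt(words: WordTimestamp[], maxWords: number = 7): string {
  if (maxWords < 1) {
    throw new Error("maxWords must be at least 1");
  }
  const lines: string[] = ["WEBVTT", ""];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    const start = formatVttTime(group[0].startMs);
    const end = formatVttTime(group[group.length - 1].endMs);
    lines.push(start + " --> " + end);
    lines.push(group.map((w) => w.text).join(" "));
    lines.push("");
  }
  return lines.join("\n");
}
```

Grouping by a fixed word count keeps the sketch short. A production captioner would more likely break cues at punctuation or pauses.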

Multiple formats

Export as MP3, WAV, OGG, FLAC, or raw PCM audio.
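Raw PCM carries no header, so a player cannot know its sample rate or bit depth without being told. If you request raw PCM and later need a playable file, the samples can be wrapped in a standard 44-byte WAV header, as sketched below. The helper name `wrapPcmInWav` is hypothetical, and the sample rate, channel count, and bit depth are caller-supplied assumptions that must match what the API actually produced.

```typescript
// Wraps little-endian integer PCM samples in a canonical 44-byte
// RIFF/WAVE header. The audio parameters are NOT detected; the caller
// must pass the values that match the PCM data.
function wrapPcmInWav(
  pcm: Uint8Array,
  sampleRate: number,
  channels: number,
  bitsPerSample: number = 16
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.byteLength);
  const view = new DataView(out.buffer);

  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) {
      view.setUint8(offset + i, s.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.byteLength, true); // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                 // fmt chunk size for PCM
  view.setUint16(20, 1, true);                  // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.byteLength, true);     // size of the sample data
  out.set(pcm, 44);
  return out;
}
```

Requesting WAV output directly avoids this step. The sketch is mainly useful when PCM is streamed for low latency and archived afterwards.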

Contextual continuation

Natural flow across requests with smart heteronym disambiguation.

Streaming output

Start playback immediately with chunked audio delivery.

Speed control

Adjust speaking rate from 0.25x to 4x speed.
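A client can validate the requested rate against this range before sending a request. The sketch below assumes the bounds are inclusive and that out-of-range values should be clamped instead of rejected. Both are assumptions, and `clampSpeed` is a hypothetical helper, not an Octave function.

```typescript
// Bounds taken from the stated 0.25x to 4x range; treating them as
// inclusive is an assumption.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4;

// Returns the requested speaking rate limited to the supported range.
// Non-finite input (NaN, Infinity) is rejected because clamping it
// would hide a caller bug.
function clampSpeed(requested: number): number {
  if (!isFinite(requested)) {
    throw new Error("speed must be a finite number");
  }
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, requested));
}
```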

Audio normalization

Consistent volume levels across all generated audio.

Voice presets

Save and reuse voice configurations for consistency.

For Developers

Generate speech in seconds

Integrate Octave into your application with just a few lines of code. Full SDK support for Python, TypeScript, .NET, Swift, and more.


Case Studies

See what others are building


Niantic Spatial × Hume AI: Creating Interactive & Spatially Aware AI Companions

In partnership with Snap Inc. (hardware) and Hume AI (voice), Niantic Spatial has developed location-aware companions for Spectacles, blending Snap Inc.’s AR glasses, Niantic Spatial’s Large Geospatial Model, and Hume’s Empathic Voice Interface (EVI) for natural, emotionally intelligent conversation. Niantic Spatial, the team pioneering AI that understands the physical world, is showcasing a compelling glimpse of what can happen when spatial intelligence and augmented reality meet.


GAF Powers Professional Training with Hume’s Text-to-Speech

To support their extensive training programs and marketing initiatives, GAF leverages Hume's text-to-speech technology to make internal training videos and marketing voiceovers. Our partnership addresses several key needs:

Professional training content: Delivering consistent, high-quality audio for thousands of contractors and employees.

Marketing collateral: Producing engaging voiceovers for promotional content and product demonstrations.

Scalable production: Generating content without the logistics and cost of traditional voice recording.

Hume's voice design also proved ideal for GAF. The platform's natural, expressive voices maintain the authoritative yet approachable tone that GAF needs to communicate with contractors, retailers, and customers. Unlike synthetic voices that can sound robotic or overly casual, Hume's TTS technology delivers the polished, trustworthy quality expected from an industry leader.

Hume AI powers conversational learning with Coconote

While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI. Coconote’s voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations. Students can ask natural questions about their lecture content, receive contextual explanations that reference specific notes, and engage in quiz-style conversations for active learning, all through natural voice interaction.

Enterprise Ready

Built for business

Deploy with confidence knowing Octave meets the security, compliance, and scale requirements of enterprise applications.


SOC 2 Type II

Enterprise-grade security with industry-leading practices.

HIPAA compliant

Build healthcare applications with confidence.

Enterprise plans

Custom SLAs, dedicated support, and volume pricing.

FAQ

Frequently asked questions

What is Octave?

Octave is Hume's text-to-speech API with emotional intelligence. It generates expressive, natural-sounding speech that can convey the full range of human emotion, going beyond flat robotic voices to deliver truly lifelike audio.

Start generating with Octave

Create expressive, natural speech in seconds. Free to start, scales with you.

