Octave

Text-to-speech with emotional intelligence. Generate expressive, natural-sounding speech that conveys the full range of human emotion.

Your voice, your way

Choose from a curated library of expressive voices, clone any voice from a sample, or design entirely new voices from natural language descriptions. Create the perfect voice for your brand.

Acting instructions

Direct the emotional delivery of every line. Specify tone, pacing, emphasis, and mood with natural language instructions. From whispered secrets to excited announcements, Octave performs as directed.

Speak slowly and in a whisper

Speak with urgency and excitement

With warm enthusiasm
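
Directions like the ones above can be paired with the lines they apply to. The following is a minimal sketch, assuming a hypothetical request shape: the `DirectedLine` type and its `text` and `instruction` fields are illustrative names, not Octave's documented parameters, so check the API reference for the real ones.

```typescript
// Hypothetical shape: one line of text plus a natural-language direction.
// Field names are illustrative, not taken from the Octave API.
interface DirectedLine {
  text: string;
  instruction?: string;
}

// Build a directed line, trimming whitespace and rejecting empty text.
function directLine(text: string, instruction?: string): DirectedLine {
  const cleanText = text.trim();
  if (cleanText.length === 0) {
    throw new Error("text must not be empty");
  }
  const cleanInstruction = instruction ? instruction.trim() : "";
  // Omit the instruction entirely when none was given.
  return cleanInstruction.length > 0
    ? { text: cleanText, instruction: cleanInstruction }
    : { text: cleanText };
}

const script: DirectedLine[] = [
  directLine("I have something to tell you.", "Speak slowly and in a whisper"),
  directLine("We just shipped it!", "Speak with urgency and excitement"),
  directLine("Welcome back."),
];
```

Keeping the direction separate from the text means the same line can be re-read with a different delivery without editing the script itself.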

Real-time streaming

Start playback in milliseconds with streaming audio output. Octave begins generating speech instantly, streaming audio chunks as they're ready. Perfect for real-time applications where every millisecond counts.

~300 ms time to first byte

Features

Everything you need for text-to-speech

Multilingual

Native-quality speech in 16+ languages with authentic accents.

Word and phoneme level timestamps

Precise timing data for lip sync, captions, and highlighting.
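
As one illustration of the captions use case, word-level timing can be grouped into SubRip (SRT) cues. This is a sketch under assumptions: the `WordTimestamp` shape (`text`, `startMs`, `endMs`) is hypothetical and may not match the fields Octave actually returns.

```typescript
// Hypothetical timestamp shape; the real response format may differ.
interface WordTimestamp {
  text: string;
  startMs: number;
  endMs: number;
}

// Left-pad a non-negative integer with zeros to a fixed width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "," + pad(millis, 3);
}

// Group words into cues of at most `wordsPerCue` words (a positive integer)
// and render them as SRT text.
function wordsToSrt(words: WordTimestamp[], wordsPerCue: number = 6): string {
  if (wordsPerCue < 1) throw new Error("wordsPerCue must be at least 1");
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    const start = toSrtTime(group[0].startMs);
    const end = toSrtTime(group[group.length - 1].endMs);
    const text = group.map((w) => w.text).join(" ");
    cues.push(String(cues.length + 1) + "\n" + start + " --> " + end + "\n" + text);
  }
  return cues.join("\n\n") + (cues.length > 0 ? "\n" : "");
}
```

Each cue spans from the start of its first word to the end of its last, so highlighting or lip sync could reuse the same per-word data without regrouping.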

Multiple formats

Export as MP3, WAV, OGG, FLAC, or raw PCM audio.
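
Raw PCM has no header, so most players cannot open it directly. The sketch below wraps PCM bytes in a standard 44-byte RIFF/WAVE header. It assumes 16-bit signed little-endian samples; the sample rate and channel count are parameters because this page does not state what Octave's PCM output uses.

```typescript
// Wrap raw PCM bytes in a 44-byte RIFF/WAVE header.
// Assumes 16-bit signed little-endian samples; sampleRate and channels must
// match whatever the PCM data was actually generated with.
function pcm16ToWav(pcm: Uint8Array, sampleRate: number, channels: number = 1): Uint8Array {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  // Write a 4-character ASCII tag at the given offset.
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < 4; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);             // bits per sample
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data
  out.set(pcm, 44);
  return out;
}
```

If the sample rate passed in does not match the data, the file still opens but plays at the wrong pitch and speed.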

Contextual continuation

Natural flow across requests with smart heteronym disambiguation.

Streaming output

Start playback immediately with chunked audio delivery.

Speed control

Adjust speaking rate from 0.25x to 4x speed.
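
A client may want to keep user-supplied rates inside the range quoted above before sending a request. A small sketch, in which `clampSpeed` is a hypothetical helper and not part of any SDK; how Octave itself handles out-of-range values is not stated here.

```typescript
// Range quoted on this page: 0.25x to 4x.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4;

// Clamp a requested speaking rate into the supported range.
// Non-finite input (NaN, Infinity) falls back to normal speed, 1x.
function clampSpeed(requested: number): number {
  if (!isFinite(requested)) return 1;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, requested));
}
```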

Audio normalization

Consistent volume levels across all generated audio.

Voice presets

Save and reuse voice configurations for consistency.

For Developers

Generate speech in seconds

Integrate Octave into your application with just a few lines of code. Full SDK support for Python, TypeScript, .NET, Swift, and more.

Case Studies

See what others are building

Niantic Spatial × Hume AI: Creating Interactive & Spatially Aware AI Companions

In partnership with Snap Inc. (hardware) and Hume AI (voice), Niantic Spatial has developed location-aware companions for Spectacles, blending Snap Inc.’s AR glasses, Niantic Spatial’s Large Geospatial Model, and Hume’s Empathic Voice Interface (EVI) for natural, emotionally intelligent conversation. Niantic Spatial, the team pioneering AI that understands the physical world, is showcasing a compelling glimpse of what can happen when spatial intelligence and augmented reality meet.

GAF Powers Professional Training with Hume’s Text-to-Speech

To support its extensive training programs and marketing initiatives, GAF uses Hume's text-to-speech technology to produce internal training videos and marketing voiceovers. Our partnership addresses several key needs:

Professional training content: Delivering consistent, high-quality audio for thousands of contractors and employees.

Marketing collateral: Producing engaging voiceovers for promotional content and product demonstrations.

Scalable production: Generating content without the logistics and cost of traditional voice recording.

Hume's voice design also proved ideal for GAF. The platform's natural, expressive voices maintain the authoritative yet approachable tone that GAF needs to communicate with contractors, retailers, and customers. Unlike synthetic voices that can sound robotic or overly casual, Hume's TTS technology delivers the polished, trustworthy quality expected from an industry leader.

Hume AI powers conversational learning with Coconote

While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI. Coconote’s voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations. Students can:

Ask natural questions about their lecture content.

Receive contextual explanations referencing specific notes.

Engage in quiz-style conversations for active learning.

All of this happens through natural voice interaction.

Enterprise Ready

Built for business

Deploy with confidence knowing Octave meets the security, compliance, and scale requirements of enterprise applications.

Contact sales

SOC 2 Type II

Enterprise-grade security with industry-leading practices.

HIPAA compliant

Build healthcare applications with confidence.

Enterprise plans

Custom SLAs, dedicated support, and volume pricing.

FAQ

Frequently asked questions

What is Octave?

Octave is Hume's text-to-speech API with emotional intelligence. It generates expressive, natural-sounding speech that can convey the full range of human emotion, going beyond flat, robotic voices to deliver lifelike audio.

Start generating with Octave

Create expressive, natural speech in seconds. Free to start, scales with you.
