
Introducing Hume’s Empathic Voice Interface (EVI) API

Published on Apr 18, 2024


Integrate emotionally intelligent voice experiences into any application with our EVI API.

Introduction

Last month, we released the demo of our Empathic Voice Interface (EVI) API. The first emotionally intelligent voice AI API is finally here!

EVI does a lot more than stitch together transcription, LLMs, and text-to-speech. With a new empathic LLM (eLLM) that processes your tone of voice, EVI unlocks new capabilities like knowing when to speak, generating more empathic language, and intelligently modulating its own tune, rhythm, and timbre. 

EVI is the first voice AI that really sounds like it understands you. By adapting its tone of voice, it emulates the way humans convey meaning beyond words, unlocking more efficient, smooth, and satisfying AI interactions.

Accessing EVI

The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue where users speak, EVI listens and analyzes their voice, and EVI generates emotionally intelligent responses. You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI.

As the user speaks to EVI, the client can also send EVI text to speak aloud, which EVI intelligently integrates into the conversation.
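To make the two kinds of client input described above more concrete, here is a minimal Python sketch of how a client might frame them before sending them over the WebSocket. The message types (`audio_input`, `assistant_input`), the field names, and the chunk size are illustrative assumptions rather than details taken from this post; the documentation is the authority on the actual message schema.

```python
import base64
import json


def chunk_audio(audio: bytes, chunk_size: int = 3200):
    """Yield fixed-size slices so audio can be streamed incrementally.

    The default chunk size is an arbitrary illustrative value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def audio_message(chunk: bytes) -> str:
    """Wrap one audio chunk in a JSON text frame (assumed schema)."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio_input", "data": encoded})


def speak_text_message(text: str) -> str:
    """Build a request for EVI to speak `text` aloud (assumed schema)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps({"type": "assistant_input", "text": text})
```

In a real client, each returned string would be passed to the send method of whichever WebSocket library you use, while a separate task reads EVI's responses from the same connection.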

See our documentation for more information on how to integrate EVI into your application. A great way to get started is on our voice playground, which allows developers to interactively configure custom system prompts and voices.

Empathic AI (eLLM) features

  • Responds at the right time: Uses your tone of voice for state-of-the-art end-of-turn detection — the true bottleneck to responding rapidly without interrupting you.
  • Understands users’ prosody: Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.
  • Forms its own natural tone of voice: Guided by the users’ prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the users’ nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.
  • Responds to expression: Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.
  • Always interruptible: Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.
  • Aligned with well-being: Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI continuously learns from users’ reactions.

Configurability 

With the general release of EVI, we’re also releasing our Configuration API, which will enable developers to customize their EVI: the system prompt, the LLM, the tools that EVI can use, the context to use during the conversation, and more. You can configure EVI through either the API or the UI. The configurable elements are listed below.

  • System prompt: Customize EVI’s personality, response style, and the content of its speech through prompt engineering. Use our guidelines for prompting EVI to improve performance, or try out our sample prompts on the voice playground.

  • Inject other LLM responses into our model: Hume’s empathic large language model (eLLM) always generates the first response to a query, but you can configure other LLMs to formulate longer responses. 

    • Integrate another LLM API: Currently we support Fireworks Mixtral8x7b, all OpenAI models, and all Anthropic models.

    • Bring your own LLM or generate text another way: Connect our WebSocket to your own server with your own tool or text generation, allowing you to determine all EVI messages in the conversation. 

  • Bring your own model: Rather than using our LLMs, connect our WebSocket to your own server to generate your own text, allowing you to determine exactly how EVI responds in the conversation. 

  • TTS: Use just EVI’s expressive voice by sending our API text to be spoken aloud. 
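As a rough illustration of the options above, the following Python sketch models a configuration as plain data and shows a stand-in for the "bring your own" text generation path. Every name here (`EviConfig`, `LanguageModel`, the `prompt` and `language_model` keys, the provider strings, and `custom_reply`) is hypothetical and chosen only for illustration; the Configuration API's real fields may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LanguageModel:
    provider: str  # hypothetical identifier, e.g. "openai" or "anthropic"
    model: str     # hypothetical model name


@dataclass
class EviConfig:
    name: str
    system_prompt: str
    supplemental_llm: Optional[LanguageModel] = None

    def to_payload(self) -> str:
        """Serialize the configuration to JSON (assumed key names)."""
        if not self.system_prompt.strip():
            raise ValueError("system_prompt must not be empty")
        body = {"name": self.name, "prompt": self.system_prompt}
        if self.supplemental_llm is not None:
            body["language_model"] = asdict(self.supplemental_llm)
        return json.dumps(body)


def custom_reply(history: list[dict]) -> str:
    """Stand-in for your own text generation.

    Takes the conversation so far and returns the text EVI should speak.
    A real server would call your own model or tools here.
    """
    last_user = next(
        (m["text"] for m in reversed(history) if m["role"] == "user"), ""
    )
    return f"You said: {last_user}"
```

Keeping the configuration as data makes it easy to version prompts alongside application code, and isolating text generation in one function keeps the choice of model separate from the voice layer.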

We plan to add more configuration options soon, allowing EVI to use tools, change its speaking style, and more. Join our Discord for product updates and technical support. 
