Hume AI creates emotionally intelligent voice interactions with Claude
Published on Nov 22, 2024
Hume AI trained its speech-language foundation model to verbalize Claude responses, powering natural, empathic voice conversations that help developers build trust with users in healthcare, customer service, and consumer applications.
Key impact metrics:
- Over 2 million minutes of AI voice conversations completed
- 36% of users choose Claude, higher than any external LLM integrated with Hume’s speech-language foundation model
- 80% reduction in costs and 10% decrease in latency through prompt caching
From emotion research to AI innovation
Alan Cowen's journey to founding Hume AI began over a decade ago during his PhD research in Psychology at Berkeley, where he pioneered using AI to map the complex structure of human emotional expressions. Cowen said, "About 15 years ago, people were interested in understanding expressive behavior, voice and facial expression, and were classifying it in a reductive way." His groundbreaking research revealed that traditional approaches overlook more than 70% of the information present in human expressions.
This work led to further research at Google and Facebook, where Cowen advocated for building emotional intelligence into AI models. Cowen said, "I was an early advocate within those organizations for building empathy into AI models." Recognizing the need for more comprehensive data and research, he founded Hume AI in 2021 to advance the field of emotionally intelligent AI.
A vision for emotionally intelligent AI
Hume AI was founded to build foundational AI models optimized for human wellbeing. "We want the AI to understand what frustrates and confuses you, by understanding your voice and not just what you're saying. It can learn from those signals to better understand your personal preferences," said Cowen.
Giving Claude a voice: Hume’s Empathic Voice Interface (EVI)
At the heart of Hume's technology is EVI, their flagship speech-language foundation model. EVI represents a breakthrough in conversational AI, capable of understanding and responding to users with natural, emotionally intelligent voice interactions. The model’s voice-to-voice generation capabilities allow it to intelligently adapt the tone of its voice based on the user’s measured expression and the context of the language being spoken. It can take on a wide range of personalities, accents, and speaking styles. It can also integrate external outputs from language models like Claude with its empathic voice generation capabilities, reading them as an actor would read lines.
"By integrating Claude with EVI, we've created something truly special. Claude's frontier natural language capabilities and personality complement EVI's expression understanding and empathy, so EVI can “act out” Claude’s responses and generate fluid, context-aware conversations that feel remarkably human,” said Cowen.
How Claude complements Hume’s voice interaction capabilities
After evaluating multiple AI models, Hume made Claude the default supplemental LLM for EVI. The decision came down to Claude's unique combination of capabilities, including its vibe and natural conversational abilities. "Claude is very eloquent," said Cowen. "It has a really good personality that people enjoy talking to."
Claude's reliability in following complex prompts was crucial for diverse use cases, while its outputs complemented those of Hume’s speech-language foundation model, which adds multimodal expression understanding and emotional intelligence. Claude’s extensive context handling for long conversations and built-in safety features also reduced development overhead.
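In practice, a developer selects Claude for a given voice application by naming it as the language model in an EVI configuration. The Python sketch below shows the general shape of such a request; the endpoint path, field names, and model identifier are assumptions based on Hume's public EVI configuration API at the time and may differ from current versions.

```python
import os

import requests

# Hypothetical sketch: create an EVI configuration that uses Claude as the
# supplemental LLM. Endpoint path, field names, and model identifier are
# assumptions and may differ from the current Hume API.
HUME_API_KEY = os.environ["HUME_API_KEY"]

config = {
    "name": "claude-voice-assistant",
    "language_model": {
        "model_provider": "ANTHROPIC",
        "model_resource": "claude-3-5-sonnet-20241022",
    },
}

response = requests.post(
    "https://api.hume.ai/v0/evi/configs",
    headers={
        "X-Hume-Api-Key": HUME_API_KEY,
        "Content-Type": "application/json",
    },
    json=config,
)
response.raise_for_status()

# The returned config id is then referenced when opening an EVI voice session,
# so Claude generates the replies that EVI verbalizes.
print(response.json())
```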
Combining Claude with Hume’s speech-language foundation model enables many compelling use cases, including:
- Customer service interactions
- AI tutoring
- Personal digital assistants
- Practice sessions for difficult emotional interactions
- Mental health support conversations
Measurable impact for customers
Integrating Claude with EVI has driven remarkable adoption and engagement across Hume's platform. Users have conducted over 1 million distinct conversations totaling nearly 2 million minutes of interaction time, with many conversations extending beyond 30 minutes. Developers have created hundreds of thousands of custom EVI configurations, with Claude models being the most popular choice, commanding 36% of all specified AI model selections.
"Our users say that they love Claude 3.5 Sonnet, and find its personality fits really well with EVI. One customer is using EVI for immersive coaching simulations, helping managers practice delivering feedback to a defensive direct report. They've found Sonnet through EVI adapts to complex personality traits throughout a long conversation,” said Cowen.
The technical benefits have been equally impressive. Prompt caching has helped Hume reduce costs by 80% and decrease latency by more than 10%. Hume encourages their customers to look beyond traditional metrics to measure impact through the lens of user wellbeing. They track not just customer satisfaction but also how interactions affect users' overall experience over time.
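Prompt caching cuts cost and latency by letting Claude reuse the processed form of a long, unchanging prompt prefix, such as a persona and instruction block that is resent on every conversational turn. Below is a minimal sketch using the Anthropic Python SDK; the persona text is a placeholder, and the model choice is illustrative. (When this case study was published, prompt caching was in beta and required an `anthropic-beta: prompt-caching-2024-07-31` request header; the call below reflects the generally available API.)

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for a long, stable block of persona and style instructions that is
# identical on every turn of a voice conversation (it must exceed the minimum
# cacheable length, roughly 1,024 tokens for Claude 3.5 Sonnet).
PERSONA_PROMPT = "You are an empathic voice assistant. " + (
    "Detailed persona, style, and safety guidance goes here. " * 300
)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": PERSONA_PROMPT,
            # Mark the stable prefix so later calls reuse the cached prefix
            # instead of reprocessing it on every turn.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Transcribed user turn goes here."}],
)
print(response.content[0].text)
```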
Looking ahead to a voice-first future
Hume envisions a future where voice becomes the primary interface for human-AI interaction, driven by its inherent advantages in speed, ease, and emotional expressiveness. Cowen said, "In a few years, voice AI will be omnipresent, serving as the primary interface for human-AI interactions. Voice AI will enable human-level emotional intelligence, which is necessary to ensure that AI continues to prioritize our preferences as it gets smarter. It will speak all languages and be embedded in almost every software product, smartphone, and wearable."
As this future approaches, Hume sees personalization as key to building trust. "People will need their own personal AIs," said Cowen. "From a psychological perspective, it just makes sense for this to have recognizable voices. You recognize this AI as your personal AI based on its voice, and that has a big psychological impact." This vision of personal, trusted AI assistants will be crucial as AI-powered interactions become more prevalent in daily life.
The alignment between Hume's and Anthropic's core values and long-term vision makes their partnership powerful. Both are committed to research-driven development and responsible AI that prioritizes human wellbeing. "Hume and Anthropic are mission-driven, research-based companies with strong scientific cultures and a long-term focus on AI alignment," said Cowen. Together, the two companies aim to ensure that as voice AI becomes ubiquitous, it optimizes for human wellbeing and builds genuine trust with users.