Hume AI creates emotionally intelligent voice interactions with Claude
Published on Nov 22, 2024
Hume AI trained its speech-language foundation model to verbalize Claude responses, powering natural, empathic voice conversations that help developers build trust with users in healthcare, customer service, and consumer applications.
Key impact metrics:
- Over 2 million minutes of AI voice conversations completed
- 36% of users choose Claude, higher than any external LLM integrated with Hume’s speech-language foundation model
- 80% reduction in costs and 10% decrease in latency through prompt caching
From emotion research to AI innovation
Alan Cowen's journey to founding Hume AI began over a decade ago during his PhD research in Psychology at Berkeley, where he pioneered using AI to map the complex structure of human emotional expressions. Cowen said, "About 15 years ago, people were interested in understanding expressive behavior, voice and facial expression, and were classifying it in a reductive way." His groundbreaking research revealed that traditional approaches overlook more than 70% of the information present in human expressions.
This work led to further research at Google and Facebook, where Cowen advocated for building emotional intelligence into AI models. Cowen said, "I was an early advocate within those organizations for building empathy into AI models." Recognizing the need for more comprehensive data and research, he founded Hume AI in 2021 to advance the field of emotionally intelligent AI.
A vision for emotionally intelligent AI
Hume AI was founded to build foundational AI models optimized for human wellbeing. "We want the AI to understand what frustrates and confuses you, by understanding your voice and not just what you're saying. It can learn from those signals to better understand your personal preferences," said Cowen.
Giving Claude a voice: Hume’s Empathic Voice Interface (EVI)
At the heart of Hume's technology is EVI, their flagship speech-language foundation model. EVI represents a breakthrough in conversational AI, capable of understanding and responding to users with natural, emotionally intelligent voice interactions. The model’s voice-to-voice generation capabilities allow it to intelligently adapt the tone of its voice based on the user’s measured expression and the context of the language being spoken. It can take on a wide range of personalities, accents, and speaking styles. It can also integrate external outputs from language models like Claude with its empathic voice generation capabilities, reading them as an actor would read lines.
"By integrating Claude with EVI, we've created something truly special. Claude's frontier natural language capabilities and personality complement EVI's expression understanding and empathy, so EVI can “act out” Claude’s responses and generate fluid, context-aware conversations that feel remarkably human,” said Cowen.
How Claude complements Hume’s voice interaction capabilities
After evaluating multiple AI models, Hume made Claude the default supplemental LLM for EVI. The decision came down to Claude's unique combination of capabilities, including its vibe and natural conversational abilities. "Claude is very eloquent," said Cowen. "It has a really good personality that people enjoy talking to."
Claude's reliability in following complex prompts was crucial for diverse use cases, while its outputs complemented those of Hume’s speech-language foundation model, which adds multimodal expression understanding and emotional intelligence. Claude’s extensive context handling for long conversations and built-in safety features also reduced development overhead.
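In practice, a developer selects Claude for a given voice application by naming it as the language model in an EVI configuration. The Python sketch below shows the general shape of such a request; the endpoint path, field names, and model identifier are assumptions based on Hume's public EVI configuration API at the time and may differ from current versions.

```python
import os

import requests

# Hypothetical sketch: create an EVI configuration that uses Claude as the
# supplemental LLM. Endpoint path, field names, and model identifier are
# assumptions and may differ from the current Hume API.
HUME_API_KEY = os.environ["HUME_API_KEY"]

config = {
    "name": "claude-voice-assistant",
    "language_model": {
        "model_provider": "ANTHROPIC",
        "model_resource": "claude-3-5-sonnet-20241022",
    },
}

response = requests.post(
    "https://api.hume.ai/v0/evi/configs",
    headers={
        "X-Hume-Api-Key": HUME_API_KEY,
        "Content-Type": "application/json",
    },
    json=config,
)
response.raise_for_status()

# The returned config id is then referenced when opening an EVI voice session,
# so Claude generates the replies that EVI verbalizes.
print(response.json())
```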
Combining Claude with Hume’s speech-language foundation model enables many compelling use cases, including:
- Customer service interactions
- AI tutoring
- Personal digital assistants
- Practice sessions for difficult emotional interactions
- Mental health support conversations
Measurable impact for customers
Integrating Claude with EVI has driven remarkable adoption and engagement across Hume's platform. Users have conducted over 1 million distinct conversations totaling nearly 2 million minutes of interaction time, with many conversations extending beyond 30 minutes. Developers have created hundreds of thousands of custom EVI configurations, with Claude models being the most popular choice, commanding 36% of all specified AI model selections.
"Our users say that they love Claude 3.5 Sonnet, and find its personality fits really well with EVI. One customer is using EVI for immersive coaching simulations, helping managers practice delivering feedback to a defensive direct report. They've found Sonnet through EVI adapts to complex personality traits throughout a long conversation,” said Cowen.
The technical benefits have been equally impressive. Prompt caching has helped Hume reduce costs by 80% and decrease latency by more than 10%. Hume encourages their customers to look beyond traditional metrics to measure impact through the lens of user wellbeing. They track not just customer satisfaction but also how interactions affect users' overall experience over time.
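Prompt caching cuts cost and latency by letting Claude reuse the processed form of a long, unchanging prompt prefix, such as a persona and instruction block that is resent on every conversational turn. Below is a minimal sketch using the Anthropic Python SDK; the persona text is a placeholder, and the model choice is illustrative. (When this case study was published, prompt caching was in beta and required an `anthropic-beta: prompt-caching-2024-07-31` request header; the call below reflects the generally available API.)

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for a long, stable block of persona and style instructions that is
# identical on every turn of a voice conversation (it must exceed the minimum
# cacheable length, roughly 1,024 tokens for Claude 3.5 Sonnet).
PERSONA_PROMPT = "You are an empathic voice assistant. " + (
    "Detailed persona, style, and safety guidance goes here. " * 300
)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": PERSONA_PROMPT,
            # Mark the stable prefix so later calls reuse the cached prefix
            # instead of reprocessing it on every turn.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Transcribed user turn goes here."}],
)
print(response.content[0].text)
```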
Looking ahead to a voice-first future
Hume envisions a future where voice becomes the primary interface for human-AI interaction, driven by its inherent advantages in speed, ease, and emotional expressiveness. Cowen said, "In a few years, voice AI will be omnipresent, serving as the primary interface for human-AI interactions. Voice AI will enable human-level emotional intelligence, which is necessary to ensure that AI continues to prioritize our preferences as it gets smarter. It will speak all languages and be embedded in almost every software product, smartphone, and wearable."
As this future approaches, Hume sees personalization as key to building trust. "People will need their own personal AIs," said Cowen. "From a psychological perspective, it just makes sense for this to have recognizable voices. You recognize this AI as your personal AI based on its voice, and that has a big psychological impact." This vision of personal, trusted AI assistants will be crucial as AI-powered interactions become more prevalent in daily life.
The alignment between Hume's and Anthropic's core values and long-term vision makes their partnership powerful. Both are committed to research-driven development and responsible AI that prioritizes human wellbeing. "Hume and Anthropic are mission-driven, research-based companies with strong scientific cultures and a long-term focus on AI alignment," said Cowen. Together, the two companies aim to ensure that as voice AI becomes ubiquitous, it optimizes for human wellbeing and builds genuine trust with users.