Introducing EVI 2, our new foundational voice-to-voice model
By Alan Cowen on Sep 11, 2024
EVI 2 is our new voice-to-voice foundation model, and one of the first AI models with which you can have remarkably human-like voice conversations. It converses rapidly and fluently with subsecond response times, understands a user's tone of voice, can generate any tone of voice, and even handles more niche requests like changing its speaking rate or rapping. It can emulate a wide range of personalities, accents, and speaking styles, and it possesses emergent multilingual capabilities.
At a higher level, EVI 2 excels at anticipating and adapting to users' preferences, a capability made possible by its specialized training for emotional intelligence. It is also trained to maintain characters and personalities that are fun and interesting to interact with. Put together, EVI 2 is designed to emulate the ideal AI personality for each application it is built into, and for each user.
Getting started with EVI 2
Today, EVI 2 is available in beta for anyone to use. You can talk to it via our app, and developers can build it into applications via our API (in keeping with our guidelines).
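For developers, a conversation with EVI 2 runs as a streaming session. The snippet below is a minimal sketch of what that might look like in Python, not official SDK usage: the WebSocket URL, api_key query parameter, and JSON message types are illustrative assumptions, so check the API reference for the exact endpoint, authentication scheme, and message schema.

```python
# Minimal sketch of a streaming EVI 2 session over WebSocket.
# Assumptions (verify against the official API reference): the endpoint
# URL, the api_key query parameter, and the JSON message types used
# here are illustrative and may not match the production API.
import asyncio
import base64
import json

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint
API_KEY = "your-api-key-here"


def play(pcm: bytes) -> None:
    """Placeholder: route decoded audio to your output device."""


async def chat(audio_chunk: bytes) -> None:
    async with websockets.connect(f"{EVI_URL}?api_key={API_KEY}") as ws:
        # Send one chunk of user audio, base64-encoded (schema assumed).
        await ws.send(json.dumps({
            "type": "audio_input",
            "data": base64.b64encode(audio_chunk).decode("ascii"),
        }))
        # Read streamed events until the assistant finishes its turn.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "audio_output":
                play(base64.b64decode(event["data"]))
            elif event.get("type") == "assistant_end":
                break


if __name__ == "__main__":
    asyncio.run(chat(b""))  # pass real microphone audio here
```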
Importantly, EVI 2 cannot clone voices without modifications to its code. This is by design: we believe voice cloning carries unique risks. By controlling identity-related voice characteristics at the model architecture level, we force the model to adopt one voice identity at a time and maintain it across sessions.
But we still wanted to give users and developers the ability to adapt EVI 2’s voice to their unique preferences and requirements. To that end, we developed an experimental voice modulation approach that allows anyone to create synthetic voices and personalities. Developers can adjust EVI 2’s base voices along a number of continuous scales, including gender, nasality, pitch, and more. This first-of-its-kind feature allows you to create tailored voices for specific apps and users without the risks of voice cloning.
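In practice, this kind of modulation can be exposed as a handful of continuous parameters attached to a session configuration. The sketch below illustrates the idea; the field names, the base-voice identifier, and the [-1.0, 1.0] ranges are hypothetical assumptions for illustration, not the published configuration schema.

```python
# Hypothetical voice-modulation settings for an EVI 2 session.
# All field names, ranges, and the base-voice identifier are
# illustrative assumptions; consult the docs for the real schema.
from dataclasses import dataclass, asdict


@dataclass
class VoiceModulation:
    base_voice: str = "default"  # one of EVI 2's preset base voices
    gender: float = 0.0          # -1.0 (more masculine) .. 1.0 (more feminine)
    nasality: float = 0.0        # -1.0 (less nasal) .. 1.0 (more nasal)
    pitch: float = 0.0           # -1.0 (lower) .. 1.0 (higher)

    def __post_init__(self) -> None:
        # Continuous scales: reject values outside the assumed range.
        for name in ("gender", "nasality", "pitch"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1.0, 1.0], got {value}")


# Example: a slightly higher-pitched, less nasal variant of a base voice.
config = VoiceModulation(pitch=0.3, nasality=-0.2)
print(asdict(config))  # ready to embed in a session configuration payload
```

Because each scale is continuous rather than prompt-driven, the same settings reproduce the same voice across sessions, which is what makes tailored voices possible without cloning.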
What's next?
The model that we’re releasing today is EVI-2-small. We are still making improvements to this model—in the coming weeks, it will become more reliable, learn more languages, follow more complex instructions, and use a wider range of tools. We’re also fine-tuning EVI-2-large, which we will be announcing soon.
EVI 2 represents a critical step forward in our mission to optimize AI for human well-being. We focused on making its voice and personality highly adaptable, giving it more affordances to optimize for users' happiness and satisfaction. After all, a personality is the amalgamation of many subtle, subsecond decisions made during our interactions. EVI 2 demonstrates that AI optimized for well-being will have a particularly pleasant and fun personality as a result of its deeper alignment with users' goals. Our ongoing research focuses on optimizing for each user's preferences automatically, using methods that fine-tune the model to generate responses aligned with signs of happiness and satisfaction during everyday use of an application.