How can emotionally intelligent voice AI support our mental health?
By Jeffrey Brooks on Oct 22, 2024
- Recent advances in voice-to-voice AI, like EVI 2, offer emotionally intelligent interactions, picking up on vocal cues related to mental and physical health, which could enhance both clinical care and daily well-being.
- Voice AI has the potential to address the loneliness epidemic by offering conversational engagement, improving well-being through empathy and emotional support, especially among socially isolated groups.
- AI can analyze vocal markers such as speech rate and pitch variation to help identify early signs of mental health conditions like depression and schizophrenia, improving screening and monitoring.
- Voice AI can guide users through mindfulness exercises, behavioral nudging, and therapeutic conversations, helping users manage emotions, achieve goals, and regulate impulsive behaviors.
In the U.S. alone, tens of millions of people struggle with anxiety, depression, addiction, and personality disorders. However, for multiple reasons—from personal financial and social considerations to a scarcity of clinical resources—seeking and following through with treatment for mental health issues can be very difficult, leading to an enormous amount of preventable suffering.
For a long time, researchers and clinicians have recognized the potential of AI to fill some of the gaps in our current approach to mental healthcare, from supporting diagnostics and administrative tasks (through capabilities like transcription) to interacting directly with patients. You could even say that the AI assistant space started with mental health: the first chatbot, ELIZA, created in the 1960s, was designed to simulate a psychotherapist. But until recently, these tools lacked any real ability to engage with and understand signs of well-being in users, limiting their therapeutic value.
For the first time, voice-driven AI technologies and our understanding of the voice as a signal for mental health are powerful enough to provide empathetic, non-judgmental, and human-like support to the many people who have found conventional methods less effective or accessible. This is a breakthrough moment: the long-standing goal of integrating AI into mental healthcare, dating back to ELIZA and the origins of chatbots, is finally being realized in consumer applications.
Voice assistants are now being integrated into a wide range of mental healthcare applications, from reducing loneliness and tracking moods through daily conversations and journaling to performing diagnostics, screening, and behavioral interventions. There are even efforts to use empathic AI assistants as virtual therapists, supporting users in ways that far exceed the capabilities of text-based chatbots. These assistants are being built not just for individuals with diagnosed issues but for everyday wellness challenges, such as regulating emotions, sticking to goals, navigating career changes, and maintaining focus on tasks we care about.
Historical approaches to AI for mental health
Experiments to build chatbots for mental healthcare began long before humanlike voice AI was feasible. The first-ever chatbot was ELIZA, a system designed by Joseph Weizenbaum in 1966 to simulate a psychotherapist. ELIZA was powered by simple pattern matching, giving it the illusion of understanding. Though rudimentary, ELIZA captivated many users, illustrating the potential of human-computer interaction in therapy and people's capacity to form emotional connections with automated systems. Following ELIZA, early mental health chatbots remained rule-based, offering scripted responses designed to codify the expertise of human practitioners. With advances in natural language processing and large language models (LLMs), modern chatbots like Wysa and Woebot now offer more fluid conversations and can incorporate CBT techniques and coping strategies.
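To make the pattern-matching mechanism concrete, here is a minimal ELIZA-style sketch in Python. The rules and reflections are illustrative stand-ins, not Weizenbaum's original script:

```python
import random
import re

# Minimal ELIZA-style responder: rule-based pattern matching that
# reflects the user's words back as a question, creating the illusion
# of understanding. These rules are illustrative stand-ins, not
# Weizenbaum's original script.
RULES = [
    (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"i am (.*)", ["Why do you say you are {0}?", "How does being {0} make you feel?"]),
    (r"because (.*)", ["Is that the real reason?"]),
    (r".* mother .*", ["Tell me more about your family."]),
]
FALLBACKS = ["Please go on.", "Can you elaborate on that?"]

# Swap first and second person so reflected phrases read naturally.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

def reflect(text: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in text.split())

def respond(utterance: str) -> str:
    cleaned = utterance.lower().strip(".!?")
    for pattern, responses in RULES:
        match = re.match(pattern, cleaned)
        if match:
            groups = [reflect(g) for g in match.groups()]
            return random.choice(responses).format(*groups)
    return random.choice(FALLBACKS)

print(respond("I feel anxious about my job"))
# -> e.g. "Why do you feel anxious about your job?"
```

Everything a system like this "knows" lives in its rule table, which is why early chatbots felt scripted: there is no model of the user, only reflected text.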
But by relying on text, chatbots miss a crucial dimension of human communication, paralinguistics: not what we say, but how we say it. The voice is an incredibly rich signal communicating information about our emotional state, as well as our overall mental and physical health.
Voice-to-voice AI: a breakthrough moment
Voice-to-voice AI models like EVI 2, which directly process audio and understand users’ voice modulations, enable two fundamental advances in technology for health and wellness. First, advances in machine learning for audio enable these systems to pick up on emotional expressions in a user’s voice: subtle cues of well-being, changes over time, and vocal characteristics that can inform screening and symptom tracking, and even provide an indication of changes in physical health. Second, these systems generate voice responses with a tone, rhythm, and timbre that convey important social cues, drive conversational immersion, and reflect an understanding of the user’s preferences and intentions.
These capabilities enable a wide range of mental health applications that were previously impossible. Here are just a few of the mental health use cases where we see voice-to-voice AI being deployed. Over the coming months and years, we believe many of these applications will become ubiquitous.
- Reducing loneliness by offering conversational companionship. We are currently living through what is estimated to be an epidemic of loneliness and social disconnection. Loneliness is more than an unpleasant emotion: it negatively impacts both individual and societal health, and has been linked to increased risk of cardiovascular disease, dementia, stroke, depression, anxiety, and early mortality. Voice-to-voice models like EVI 2 can support socially isolated individuals by seamlessly integrating into their lives, offering conversational engagement, warmth, support, and empathy. Voice assistants are already being explored among older adults, with controlled studies suggesting that they can mitigate loneliness and provide effective companionship. Advanced voice-to-voice AI, which combines an understanding of emotional behavior with advanced linguistic and vocal capabilities, only stands to further improve effectiveness in this area.
- Tracking moods through daily conversations and journaling. A key feature of EVI 2, our flagship voice-to-voice model, is its ability to interpret and measure vocal modulations associated with emotion. This lets it pick up on when you’re feeling down or annoyed and adjust the conversation accordingly, but it also provides a powerful foundation for longitudinal tracking of emotional and mood states. Millions of people currently use digital journaling apps. When users converse or journal with EVI 2 regularly, these apps can measure the frequency and regularity of a wide range of positive and negative emotions, supporting greater self-awareness, symptom tracking for individuals with depression and anxiety, and monitoring of progress in mental health treatment (see the mood-tracking sketch after this list).
- Supporting diagnostics and screening for mental health issues. A growing body of research shows that mental health conditions can be distinguished and characterized through markers in vocal features. These vocal cues range from conversational features like speech rate, speaking time, and pause duration to more subtle aspects of the voice itself like pitch variation and tone. In clinical laboratories, vocal features are being used to distinguish depression from other conditions like schizophrenia and bipolar disorder. For instance, someone speaking more slowly or with reduced pitch variation might be showing early signs of depression, whereas someone with reduced pitch variation and blunted affect could be exhibiting symptoms of schizophrenia. Voice AI can track these patterns over time, providing data that could be used for diagnostics or to gauge the effectiveness of interventions (a sketch of this kind of feature extraction appears after this list). EVI is being integrated into apps that screen for these conditions and can prompt users when variations in their behavior, including language and vocal cues, suggest that they should consider seeking treatment.
- Conducting interventions, such as behavioral “nudging” and guiding users through mindfulness exercises. The desire to change our behavior to be more in line with our goals is a common human experience, not restricted to those with mental health conditions; this is one of the ways voice-to-voice AI can raise the baseline level of mental health in the population at large. Apps are using EVI 2 to understand users’ goals and then prompt them to make more sustainable shopping decisions, remind them to reach out to friends, and help them maintain focus on tasks that are important to them. Other AI solutions for these kinds of behavioral “nudges” are already widely adopted, but with an empathic voice seamlessly integrated into users’ interactions with technology, applications aim to use EVI 2 to ascertain individuals’ goals for behavior change and track their progress much more directly. For instance, through its ability to interpret expressions, EVI 2 drives applications that seek to prevent individuals from sending angry emails or making impulsive online shopping decisions when sad (a simple rule-based nudge policy is sketched after this list). It is also being used to engage and guide users more directly through helpful exercises drawn from traditions like CBT and mindfulness-based stress reduction.
- Acting as virtual therapists for ongoing support and care. Effective talk therapy requires many of the capabilities outlined thus far: interpreting emotional behaviors in context and across time, picking up on changes in mood and aspects of the voice that could point to positive or negative changes in symptoms, and providing proactive support that helps us change our behavior. Demand for therapists is massive, but a shortage of available clinicians has strained telehealth solutions and contributed to widespread burnout among practitioners. Empathic voice-to-voice AI promises to fill this widening gap by picking up on key features of the way we speak, interpreting cues of our feelings, and engaging with us accordingly in a warm and encouraging voice. While challenges remain on the path to widespread adoption of voice-to-voice AI therapists, this is one of the key areas where our flagship model EVI 2 could shine, finally reaching the potential anticipated by historical approaches in this space.
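To make the mood-tracking idea concrete, here is a minimal sketch of the longitudinal bookkeeping involved, assuming hypothetical per-day emotion scores in the 0-1 range. The labels, aggregation, and threshold are illustrative, not Hume's actual score format:

```python
from datetime import date
from statistics import mean

# Minimal sketch of longitudinal mood tracking. The per-day emotion
# scores are hypothetical inputs; a real journaling app would obtain
# them from a voice AI's expression measurements after each session.
class MoodTracker:
    POSITIVE = ("joy", "calmness", "contentment")
    NEGATIVE = ("sadness", "anxiety", "anger")

    def __init__(self) -> None:
        self.days: list[tuple[date, float, float]] = []

    def log_day(self, day: date, scores: dict[str, float]) -> None:
        """scores maps emotion labels to 0-1 intensities for that day."""
        pos = mean(scores.get(k, 0.0) for k in self.POSITIVE)
        neg = mean(scores.get(k, 0.0) for k in self.NEGATIVE)
        self.days.append((day, pos, neg))

    def downward_trend(self, window: int = 7) -> bool:
        """Flag a sustained drop: the latest window's average positivity
        falls well below the prior window's. The 20% threshold is
        illustrative, not a clinical cutoff."""
        if len(self.days) < 2 * window:
            return False
        pos = [p for _, p, _ in self.days]
        return mean(pos[-window:]) < 0.8 * mean(pos[-2 * window:-window])

tracker = MoodTracker()
tracker.log_day(date(2024, 10, 22), {"joy": 0.4, "sadness": 0.3, "calmness": 0.5})
```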
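For the screening use case, the kinds of vocal markers mentioned above can be approximated with open-source tools. The sketch below uses the librosa audio library to estimate pitch variation, pause durations, and speaking time from a recording; these are illustrative proxies, not validated clinical measures:

```python
import librosa
import numpy as np

# Minimal sketch of extracting simple vocal markers from a recording.
# These are illustrative proxies, not validated clinical measures.
def vocal_markers(path: str) -> dict[str, float]:
    y, sr = librosa.load(path, sr=16000, mono=True)

    # Pitch variation: track fundamental frequency (F0) with pYIN and
    # take its standard deviation over voiced frames. Lower values
    # correspond to a flatter, more monotone delivery.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    pitch_sd = float(np.nanstd(f0))

    # Pauses and speaking time: split the signal into non-silent
    # intervals; the gaps between intervals approximate pause durations.
    intervals = librosa.effects.split(y, top_db=30)
    gaps = [
        (start - prev_end) / sr
        for (_, prev_end), (start, _) in zip(intervals[:-1], intervals[1:])
    ]
    speaking_time = sum(float(end - start) / sr for start, end in intervals)

    return {
        "pitch_sd_hz": pitch_sd,
        "mean_pause_s": float(np.mean(gaps)) if gaps else 0.0,
        "speaking_time_s": speaking_time,
    }
```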
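And for behavioral nudging, the policy layer itself can be quite simple once expression measures are available. Below is a minimal rule-based sketch; the contexts, emotion labels, and thresholds are hypothetical placeholders:

```python
from dataclasses import dataclass

# Minimal sketch of a rule-based nudge policy. Emotion scores (0-1)
# would come from a voice AI's expression measurements; the contexts
# and thresholds here are hypothetical placeholders, not tuned values.
@dataclass
class Moment:
    context: str              # e.g. "composing_email", "online_checkout"
    scores: dict[str, float]  # emotion label -> measured intensity

def nudge(moment: Moment) -> str | None:
    anger = moment.scores.get("anger", 0.0)
    sadness = moment.scores.get("sadness", 0.0)

    if moment.context == "composing_email" and anger > 0.7:
        return "You sound upset. Save this as a draft and revisit it later?"
    if moment.context == "online_checkout" and sadness > 0.6:
        return "Quick check-in: is this purchase part of your plan for this month?"
    return None  # default to staying out of the user's way

print(nudge(Moment("composing_email", {"anger": 0.8})))
```

Defaulting to no nudge matters in practice: interventions that fire too often erode trust and get disabled.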
Conclusion
The use of AI in mental healthcare is not a new idea, but historical approaches, starting with early chatbot systems like ELIZA, lacked dynamism, empathy, and, most importantly, the capacity for authentic voice interaction. Empathic voice-to-voice AI represents a major step forward, combining the power of the voice as a rich emotional signal with advances in AI to create systems that can support mental health in ways that could revolutionize clinical care, as well as raise the baseline level of well-being in the general population. As these technologies evolve, they hold the potential to transform mental healthcare by making it more accessible, personalized, and responsive to the emotional needs of individuals.