
Best ways to talk to AI with voice: NotebookLM, Advanced Voice Mode, and EVI 2

Published on January 28, 2025

Artificial intelligence (AI) is revolutionizing how we interact with technology, and voice-based communication is at the forefront of this transformation. From personalized AI assistants to emotionally intelligent conversational agents, voice-enabled AI is becoming more sophisticated and human-like. In this article, we’ll explore three cutting-edge voice AI technologies: Google’s NotebookLM, OpenAI’s Advanced Voice Mode, and Hume AI’s EVI 2. Each of these platforms offers unique capabilities that redefine how we communicate with AI.


1. Google’s NotebookLM: Your AI-Powered Research Assistant

Google’s NotebookLM is an AI-powered notebook with an interactive podcast feature, designed to help users organize, analyze, and interact with their notes and documents. Unlike general-purpose chatbots, NotebookLM is tailored to your own sources, making it a powerful tool for researchers, students, and professionals.

Key Features:

  • Source-Grounded Responses: NotebookLM grounds its responses in the documents you provide, ensuring accuracy and relevance. This makes it ideal for tasks like summarizing research papers, extracting key insights, or answering specific questions about your notes.

  • Audio Overviews: NotebookLM’s signature voice feature turns your sources into a podcast-style discussion between two AI hosts, and its interactive mode lets you join the conversation and ask questions aloud. This is particularly useful for hands-free review or multitasking.

  • Personalization: The AI adapts to your writing style and preferences, offering tailored suggestions and insights based on your unique data.

Use Cases:

  • Academic Research: Quickly summarize lengthy articles or generate citations from your notes.

  • Content Creation: Brainstorm ideas or draft content by conversing with your stored documents.

  • Knowledge Management: Organize and retrieve information from large datasets using natural language queries.

NotebookLM represents Google’s vision of a future where AI seamlessly integrates with personal workflows, making information retrieval and analysis more intuitive and efficient.


2. OpenAI’s Advanced Voice Mode: The Voice of ChatGPT

OpenAI has been a pioneer in natural language processing (NLP) with models like GPT-4 and GPT-4o, and its Advanced Voice Mode takes conversational AI to the next level. Built into ChatGPT and powered by the natively multimodal GPT-4o, the feature lets users hold real-time, spoken conversations with the AI.

Key Features:

  • Real-Time Dialogue: OpenAI’s voice mode allows for fluid, back-and-forth conversations with minimal latency, making interactions feel more natural.

  • Multimodal Capabilities: The AI can process both voice and text inputs, enabling seamless transitions between speaking and typing.

  • Emotional Nuance: OpenAI is working to incorporate richer emotional intelligence into its voice mode, allowing the AI to detect and respond to a speaker’s tone and sentiment; this capability is still under development.

Use Cases:

  • Personal Assistants: Use voice commands to schedule appointments, set reminders, or draft emails.

  • Language Learning: Practice speaking and listening in a new language with real-time feedback.

  • Accessibility: Assist users with disabilities by providing voice-based access to information and services.

OpenAI’s Advanced Voice Mode is a significant step toward creating AI systems that can engage in human-like conversations, bridging the gap between technology and natural communication.
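
Advanced Voice Mode itself lives inside the ChatGPT apps, but developers who want to build comparable real-time voice experiences can do so through OpenAI’s Realtime API, which streams audio and text events over a WebSocket. The snippet below is a minimal sketch rather than a reference implementation: the endpoint, model name, beta header, and event types reflect OpenAI’s public documentation at the time of writing and may change, so verify them against the current docs.

```python
# Minimal sketch of one voice-enabled turn over OpenAI's Realtime API.
# Assumptions: the endpoint, beta header, and event names follow OpenAI's
# public Realtime API docs at the time of writing; verify before relying on them.
import asyncio
import json
import os

import websockets  # pip install websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # beta header required at time of writing
    }
    # On websockets < 14, pass extra_headers=headers instead.
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # Ask the model for a spoken response; audio and a live transcript stream back.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user and ask how you can help today.",
            },
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.audio_transcript.delta":
                print(event["delta"], end="", flush=True)  # transcript of the speech
            elif event["type"] == "response.done":
                break  # the assistant has finished this turn


asyncio.run(main())
```

A full voice application would also stream microphone audio to the server and play back the returned audio chunks as they arrive; this sketch keeps to a single text-prompted turn for brevity.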


3. Hume AI’s EVI 2: Emotionally Intelligent Voice AI

Hume AI is pushing the boundaries of voice AI with its Empathic Voice Interface (EVI) 2, a foundation model designed to understand and respond to human emotions. EVI 2 is built to make interactions with AI more empathetic and personalized, setting it apart from other voice technologies.

Key Features:

  • Emotional Intelligence: EVI 2 can detect subtle emotional cues in a user’s voice, such as tone and pitch, and respond with appropriate empathy and understanding.

  • Rapid Response Times: The model delivers subsecond responses, making conversations feel fluid and natural.

  • Customizable Voices: Developers can adjust voice attributes like gender, pitch, and accent to create unique AI personalities.

  • Personality Emulation: EVI 2 can emulate a wide range of personalities, making interactions more engaging and tailored to individual preferences.

Use Cases:

  • Customer Service: EVI 2 can power empathetic chatbots that handle customer inquiries with a human touch.

  • Mental Health Support: The AI can provide emotional support and companionship, offering a safe space for users to express their feelings.

  • Interactive Storytelling: EVI 2 can bring characters to life in games or educational apps, creating immersive experiences.

Hume AI’s EVI 2 represents a new era of emotionally aware AI, where technology not only understands what we say but also how we feel.
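
For developers, EVI 2 is available through Hume’s Empathic Voice Interface API, which runs over a WebSocket: you send audio or text in and receive expressive speech, transcripts, and emotion measures back, with the voice and personality controlled by a configuration created in Hume’s platform. The sketch below is a minimal, hedged example of a single text-in, audio-out turn: the endpoint, authentication query parameters, and message types follow Hume’s public EVI documentation at the time of writing and should be checked against the current reference; `HUME_CONFIG_ID` is an optional placeholder for a custom voice/persona configuration.

```python
# Minimal sketch of a single text-in, audio-out turn with Hume's EVI over WebSocket.
# Assumptions: endpoint, auth query parameters, and message schema follow Hume's
# public EVI docs at the time of writing; check the current reference before use.
import asyncio
import base64
import json
import os

import websockets  # pip install websockets

API_KEY = os.environ["HUME_API_KEY"]
CONFIG_ID = os.environ.get("HUME_CONFIG_ID")  # optional custom voice/persona config

EVI_URL = f"wss://api.hume.ai/v0/evi/chat?api_key={API_KEY}"
if CONFIG_ID:
    EVI_URL += f"&config_id={CONFIG_ID}"


async def main() -> None:
    async with websockets.connect(EVI_URL) as ws:
        # Send a text turn; EVI replies with expressive speech plus a transcript.
        await ws.send(json.dumps({"type": "user_input", "text": "Hi! How are you today?"}))

        audio_chunks: list[bytes] = []
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "assistant_message":
                print("EVI:", event["message"]["content"])  # transcript of the reply
            elif event["type"] == "audio_output":
                audio_chunks.append(base64.b64decode(event["data"]))  # audio chunk
            elif event["type"] == "assistant_end":
                break  # EVI has finished speaking this turn

        with open("evi_reply.wav", "wb") as f:
            f.write(b"".join(audio_chunks))


asyncio.run(main())
```

A production client would stream microphone audio instead of sending text, play audio chunks as they arrive, and handle interruptions; Hume also provides SDKs that wrap this WebSocket protocol.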

The Future of Voice AI: OCTAVE and Beyond

Hume's upcoming model, OCTAVE, represents the cutting edge of speech-language models, enabling on-the-fly creation of voices and personalities that you can interact with. With just a short audio clip or description, it can generate a unique personality complete with a new voice, emotional expressiveness, and distinct traits.

OCTAVE promises to enable richer, more realistic, and more multifaceted conversational experiences than EVI 2. For example, users will be able to craft personas for AI agents, personalize them for individual users, create them on the fly to answer a particular question, or take part in real-time group conversations involving multiple users and AIs.

Conclusion

Voice AI is transforming how we interact with technology, and platforms like Google’s NotebookLM, OpenAI’s Advanced Voice Mode, and Hume AI’s EVI 2 are leading the charge. Whether it’s organizing research, engaging in natural conversations, or experiencing emotionally intelligent interactions, these technologies are redefining the possibilities of human-computer communication.

As we look to the future, initiatives like OCTAVE and ongoing advancements in emotional and contextual awareness promise to make voice AI even more intuitive, inclusive, and impactful. The era of talking to AI is just beginning, and the possibilities are limitless.
