We research and optimize audio datasets for other frontier voice AI labs

Who we are

We are a frontier voice AI lab

As a frontier research lab, we know what it takes to curate data at the scale required to build successful, interactive, and empathic multimodal models. Data scaling for voice is at an inflection point, and we're excited to help labs scale audio pre-training and post-training for speech-language models.

We provide the data and research tooling to help scale model capabilities.

EVERYTHING YOUR MODEL NEEDS

Capabilities

Teach your model to speak 50+ languages, generate voices from prompts, code-switch, adopt specific emotions, and more.

Request samples

Explore samples that align with your intended languages, use cases, and model goals.

License access

Scale up with access to our large-scale off-the-shelf data, evaluation pipelines, and voice gym.

Iterate

Collaborate with our researchers to diagnose remaining areas of improvement for your model.

Recent Publications

Peer-reviewed insights

View all
arXiv·Feb 2026

TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (Under Review)

Trung Dang, Sharath Rao, Ananya Gupta and 6 more

Modern Text-to-Speech (TTS) systems increasingly leverage Large Language Model (LLM) architectures to achieve scalable, high-fidelity, zero-shot generation. However, these systems typically rely on fixed-frame-rate acoustic tokenization, resulting in speech sequences that are significantly longer than, and asynchronous with, their corresponding text. Beyond computational inefficiency, this sequence length disparity often triggers hallucinations in TTS and amplifies the modality gap in spoken language modeling (SLM). In this paper, we propose a novel tokenization scheme that establishes one-to-one synchronization between continuous acoustic features and text tokens, enabling unified, single-stream modeling within an LLM. We demonstrate that these synchronous tokens maintain high-fidelity audio reconstruction and can be effectively modeled in a latent space by a large language model with a flow matching head. Moreover, the ability to seamlessly toggle speech modality within the context enables text-only guidance: a technique that blends logits from text-only and text-speech modes to flexibly bridge the gap toward text-only LLM intelligence. Experimental results indicate that our approach achieves performance competitive with state-of-the-art TTS and SLM systems while virtually eliminating content hallucinations and preserving linguistic integrity, all at a significantly reduced inference cost.
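The text-only guidance idea is the most transferable piece: blend next-token logits from a text-only forward pass with logits from the text-speech forward pass. Here is a minimal sketch of that blend, assuming a simple linear interpolation with a weight alpha; the abstract does not specify the exact formulation or weighting, so both are our assumptions:

    import torch

    def text_only_guidance(logits_text_speech: torch.Tensor,
                           logits_text_only: torch.Tensor,
                           alpha: float = 0.5) -> torch.Tensor:
        # Linear blend in logit space (assumed form): alpha = 0 keeps the
        # text-speech distribution, alpha = 1 defers fully to the
        # text-only LLM's behavior.
        return (1.0 - alpha) * logits_text_speech + alpha * logits_text_only

    # Toy usage with random logits over a 32k-token vocabulary.
    lts = torch.randn(1, 32_000)  # logits given the text + speech context
    lt = torch.randn(1, 32_000)   # logits given the text-only context
    next_token = torch.argmax(text_only_guidance(lts, lt, alpha=0.3), dim=-1)

Sampling from the blended distribution is what lets the system lean on text-only LLM intelligence without leaving the speech context.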

Frontiers in Psychology·May 2024

How emotion is experienced and expressed in multiple cultures: a large-scale experiment across North America, Europe, and Japan

Alan Cowen, Jeffrey Brooks, Gautam Prasad and 13 more

Core to understanding emotion are subjective experiences and their expression in facial behavior. Past studies have largely focused on six emotions and prototypical facial poses, reflecting limitations in scale and narrow assumptions about the variety of emotions and their patterns of expression.

iScience·Feb 2024

Deep learning reveals what facial expressions mean to people in different cultures

Jeffrey Brooks, Lauren Kim, Michael Opara and 10 more

Cross-cultural studies of the meaning of facial expressions have largely focused on judgments of small sets of stereotypical images by small numbers of people. Here, we used large-scale data collection and machine learning to map what facial expressions convey in six countries.


Why Our Datasets

World-class data for pre-training and fine-tuning your emotion AI models, backed by years of scientific research.

Contact us

Ethically Sourced

All data collected with informed consent and rigorous privacy protections.

Globally Diverse

Representative samples across cultures, ages, genders, and demographics.

Expert Annotated

Labeled by trained researchers using validated scientific frameworks.

Research Ready

Clean, structured formats optimized for modern ML pipelines.
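To make "clean, structured formats" concrete, here is a minimal sketch of reading one record from a hypothetical JSONL manifest; the field names (audio_path, transcript, language, emotion_labels) are illustrative assumptions, not Hume's actual schema:

    import json

    # One line of a hypothetical JSONL manifest; field names are
    # illustrative, not Hume's actual schema.
    line = ('{"audio_path": "clips/0001.wav", "transcript": "hello there", '
            '"language": "en", "emotion_labels": {"joy": 0.82, "calmness": 0.11}}')
    record = json.loads(line)

    def top_emotion(rec: dict) -> str:
        # Return the highest-scoring emotion label for a record.
        return max(rec["emotion_labels"], key=rec["emotion_labels"].get)

    print(top_emotion(record))  # -> joy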

Research Areas

Where Hume enables research

From fundamental affective computing to applied behavioral research, our tools power studies across the full spectrum of emotion science.

Affective Computing

Study how AI systems can recognize, interpret, and respond to human emotions across modalities.

Human-AI Interaction

Research the dynamics of emotional exchange between humans and AI systems.

Psychology & Behavior

Use emotion recognition to study human behavior, mental health, and psychological phenomena.

Speech & Language

Analyze prosodic features, sentiment, and emotional expression in human communication.

Multimodal Learning

Explore how emotion manifests simultaneously across face, voice, and language.

Ethics & AI Safety

Study the ethical implications of emotionally aware AI systems and develop guidelines.

From the Blog

Latest research updates

View all

Stay in the loop

Get the latest on empathic AI research, product updates, and company news.

Join the community

Connect with other developers, share projects, and get help from the team.

Join our Discord