Announcing our Custom Model API
Published on Dec 14, 2023
Meet our Custom Model API: a cutting-edge AI tool that integrates language, voice, and/or facial movement to predict human preferences and needs more accurately than any LLM.
You can now use our Custom Model API to predict well-being, satisfaction, mental health, and more. Using a few labeled examples, our API integrates dynamic patterns of language, vocal expression, and/or facial expression into a custom multimodal model.
Leveraging Hume’s AI models pretrained on millions of videos and audio files, our API can usually predict your labels accurately after seeing just a few dozen examples. That means that with just a few labeled examples and a few clicks, you can deploy powerful AI models that predict the outcomes your users care about most. Of course, the models you train using our API are yours alone to deploy and share.
Visit dev.hume.ai/docs/custom-models or log in to beta.hume.ai to get started!
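To make the workflow concrete, here is a minimal Python sketch of the training step. The endpoint path and payload fields are hypothetical placeholders rather than the documented API (see dev.hume.ai/docs/custom-models for the real interface); only the overall shape of the flow, a few labeled files in and a trained model ID out, reflects the description above.

```python
import requests

API_KEY = "your-api-key"         # issued at beta.hume.ai
BASE = "https://api.hume.ai/v0"  # hypothetical base path for this sketch

# A few dozen labeled examples: media files paired with the outcome you
# want the custom model to predict (here, whether a call went well).
examples = [
    {"url": "https://example.com/calls/call_001.mp4", "label": "good"},
    {"url": "https://example.com/calls/call_002.mp4", "label": "poor"},
    # ...more labeled calls
]

# Submit a training job (hypothetical endpoint and payload shape).
resp = requests.post(
    f"{BASE}/custom-models/train",
    headers={"X-Hume-Api-Key": API_KEY},
    json={"task": "call_quality", "examples": examples},
)
resp.raise_for_status()
model_id = resp.json()["model_id"]
print(f"Training started for model {model_id}")
```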
Our new API translates nuanced multimodal measures into personalized insights more accurately than any LLM
For instance, we partnered with Lawyer.com to predict the quality of customer support calls. Using just 73 calls, we trained a model that predicted expert ratings of whether a call went well or poorly with 97.3% accuracy. By contrast, using language models alone (including one of the world's most capable language models along with our in-house language emotion model) resulted in a 3x higher error rate: roughly 8% of calls misclassified, versus 2.7%.
Leveraging dynamic patterns of language, vocal expression, and/or facial movement
Our Custom Model API works by integrating complex patterns of language, vocal expression, and/or facial movement captured using Hume’s expression AI models.
To combine these signals with language, we feed the expression measures extracted by each of our expression models, together with transcribed language, into a novel empathic large language model (eLLM), which we pretrained on millions of human interactions.
When you train a custom model using our Custom Model API, you are leveraging the joint language-expression embeddings extracted by our eLLM to predict your own labels.
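As a loose conceptual sketch (this is not Hume's actual architecture), you can think of a custom model as a lightweight predictor fit on top of frozen joint embeddings. In the toy Python below, the embedding dimensions, the random stand-in data, and the concatenation-plus-linear-head fusion are all illustrative assumptions; the point is only that a small head trained on rich pretrained embeddings can fit a few dozen labeled examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_examples = 73  # e.g. 73 labeled support calls

# Stand-ins for what a pretrained model might emit per example:
lang_emb = rng.normal(size=(n_examples, 768))  # language embedding
expr_emb = rng.normal(size=(n_examples, 48))   # vocal/facial expression measures

# Joint language-expression embedding (naive concatenation here; the eLLM
# learns this fusion during pretraining on millions of interactions).
joint_emb = np.concatenate([lang_emb, expr_emb], axis=1)

# With frozen embeddings, a simple linear head is often enough to learn
# custom labels from a handful of examples.
labels = rng.integers(0, 2, size=n_examples)   # 0 = poor call, 1 = good call
head = LogisticRegression(max_iter=1000).fit(joint_emb, labels)
print(head.predict(joint_emb[:5]))
```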
Pricing
There are two steps to using our Custom Model API: (1) Training and (2) Inference.
1. Training: During our beta release, the model training process is completely free. This includes uploading data, training, evaluating results, and retraining.
2. Inference: When deploying your custom model in your application, a fee is charged for each file processed by your model. Detailed pricing can be found on our pricing page.
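As a hypothetical illustration of the inference step (again, the endpoint path and response shape below are assumptions, not the documented API), deploying the model comes down to one metered call per file:

```python
import requests

API_KEY = "your-api-key"
BASE = "https://api.hume.ai/v0"    # hypothetical base path
MODEL_ID = "your-custom-model-id"  # returned once training completes

# Score a new file with your trained model; each processed file is billed.
resp = requests.post(
    f"{BASE}/custom-models/{MODEL_ID}/predict",
    headers={"X-Hume-Api-Key": API_KEY},
    json={"url": "https://example.com/calls/new_call.mp4"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"label": "good", "confidence": 0.97}
```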
We hope you enjoy using our Custom Model API! If you have any questions, post them in our Discord channel. We look forward to seeing what you build.