Tutorial: Hands-on with Hume AI’s API
By Vineet Tiruvadi on Sep 9, 2022
Welcome to our first Hume AI Platform product walk-through.
A major goal of our platform is to provide developers with new tools to understand users' expressions and wellness, beyond engagement (read more). One of the important ways we do this is through our platform's application programming interface (API).
In this post we’re going to walk you through the main steps of working with our API so you can start integrating our models into creative, scientific, and empathic applications of your own.
We'll cover:
1. Finding Your API Access Key
2. Deciding Between Batch and Streaming APIs
3. The API Call
4. Checking Out Our Results
Ready? Let's get into it.
This is Dr. Dacher Keltner. He’s written extensively on happiness and compassion, and he’s our Chief Scientific Advisor.
Let’s use the Hume AI Platform to measure the facial expression Dr. Keltner is forming in the image here.
To do so, we’ll need to find our API access key.
Step 1: Finding Your API Access Key
This is a key specific to your account that lets you authenticate with our platform. To retrieve your API key, visit beta.hume.ai, click on your profile icon in the upper right corner, and choose Settings. Your key is listed as part of your Profile. For a more detailed tutorial on accessing the API, check out our help page.
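To keep the key out of your code and shell history, you can store it in an environment variable and read it at runtime. Here's a minimal sketch in Python; the variable name HUME_API_KEY is just a convention we use in this tutorial, not something the platform requires:

import os

# Assumes you've already run something like:
#   export HUME_API_KEY="<YOUR_API_KEY>"
# Reading the key from the environment keeps it out of source control.
HUME_API_KEY = os.environ["HUME_API_KEY"]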
Next, we need to decide how we want to feed our data into the platform.
Step 2: Deciding Between Batch and Streaming APIs
There are two APIs we can choose from: a batch API and a streaming API.
The batch API can process a single media file or multiple files in parallel. It measures all of the expressions found in each file and notifies you when the results are ready, usually within a few minutes.
The streaming API can be connected to a live webcam or microphone input and returns measures of expressive behavior in real time.
For a saved image, the batch API is the best fit. We’ll work with the batch API in this tutorial, but spend more time with the streaming API in future tutorials.
So, now that we’ve got an API key and we’ve decided to go with the batch API, we can tell the Hume AI Platform where our data is and how we want to analyze it.
Step 3: The API Call
We’ll be using curl to call the API. We’ll first need to specify the URL of the data we want to analyze and the model(s) we want to use (to explore the models we have available, see our Products page).
We’ll also need a link to our data that our API can access (our example uses a publicly available URL), along with our API key.
Once we have those ready, we can package up the information into a format our APIs can understand:
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"urls": ["<YOUR_URL>"], "models": {"<THE_MODEL_TO_USE>": {}}}' \
  "https://api.hume.ai/v0/batch/jobs?apikey=<YOUR_API_KEY>"
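If you prefer to make the same request from Python, the sketch below mirrors the curl template: same endpoint, same JSON body, and the API key passed as the apikey query parameter. The requests library and the variable names are our own choices here, not part of the platform:

import requests

API_KEY = "<YOUR_API_KEY>"

payload = {
    "urls": ["<YOUR_URL>"],                # publicly accessible link(s) to your media
    "models": {"<THE_MODEL_TO_USE>": {}},  # e.g. {"face": {}} for facial expressions
}

# Submit a batch job, mirroring the curl command above.
response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",
    params={"apikey": API_KEY},
    json=payload,
)
response.raise_for_status()
print(response.json())  # details of the newly created job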
To measure the facial expression Dr. Keltner is forming, we’ll send the URL of the picture above to our facial expression model (“face”), using our API key for authorization:
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"urls": ["https://assets.nationbuilder.com/mysticartists/pages/315/attachments/original/1462271161/DacherKeltner.jpg"], "models": {"face": {}}}' \
  "https://api.hume.ai/v0/batch/jobs?apikey=<YOUR_API_KEY>"
This command typically takes just a second or two to process, but in rare cases may take up to two minutes as we scale up our platform access.
Step 4: Checking Out Our Results
The response to our curl request includes a URL where we can find our results once the models are done processing our data. By default, as soon as that URL is populated, we’ll receive an email from the platform with a link to the results in a JSON-formatted file similar to the example shown below. The JSON contains some additional metadata, as well as a breakdown of the scores for dimensions of expression, labeled with emotion categories, that we’ve found are largely consistent in meaning across cultures (read more here).
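Once you have the results link (from the email or from the job response), downloading the JSON takes only a few lines with any HTTP client. A minimal sketch, assuming results_url holds whichever link you received; the exact field that carries this link in the job response may differ:

import json
import requests

# The link to the finished results, taken from the email or the job response.
results_url = "<LINK_TO_YOUR_RESULTS>"

resp = requests.get(results_url)
resp.raise_for_status()

results = resp.json()
with open("predictions.json", "w") as f:
    json.dump(results, f, indent=2)  # keep a local copy for inspection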
Once the models are done, we're provided with measures of the facial expression that Dr. Keltner is forming. The output indicates that Dr. Keltner’s expression loads highly on dimensions of “Joy” and “Amusement.” Note that these labels are just shorthands for underlying patterns of facial movement, not readouts of what Dr. Keltner is feeling (which would be impossible). We use emotion labels because they capture the subtlety of human expression; unfortunately, coarse descriptors like “smile” and “scowl” do not.
{ "bbox": { "x": 94.045, "y": 38.421, "w": 66.237, "h": 86.245 }, "emotions": [ { "name": "Calmness", "score": 0.220 }, { "name": "Boredom", "score": 0.198 }, { "name": "Interest", "score": 0.185 } # ... More emotions ] }
And that’s it! We’ve gotten ourselves set up with the Hume AI Platform. Now that we can use Hume’s APIs, we have everything we need to start integrating cutting-edge models of expressive communication into bigger projects.
In Closing
The Hume AI Platform strives to be the only toolkit developers need to measure verbal and nonverbal cues in audio, video, or images, based on rigorous scientific studies of human expressive behavior. Our API is the simplest programmatic endpoint for working with our models. You can also explore the outputs of our models interactively on our Playground.
In this post we walked through the basics of using our API to measure expressive behavior in an example file. For more details and the latest documentation, be sure to bookmark our tutorials. And if you plan to develop an application using our API, note that it will need to adhere to the ethical guidelines of The Hume Initiative.
We’re excited to expand our private beta over the next few months and look forward to seeing what you’ll build with it! If you have any questions, please reach out to [email protected].
Connect With Us
Follow us on Twitter @hume_ai or reach out to us directly. If you’re interested in beta access, feel free to sign up.