This New Tech Puts AI In Touch With Its Emotions—and Yours

Hume AI, a startup founded by a psychologist who specializes in measuring emotion, gives some top large language models a realistic human voice.

A new “empathic voice interface” launched today by Hume AI, a New York–based startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI—portending an era when AI helpers may more routinely get all gushy on us.

“We specialize in building empathic personalities that speak in ways people would speak, rather than stereotypes of AI assistants,” says Hume AI cofounder Alan Cowen, a psychologist who has coauthored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook.

WIRED tested Hume’s latest voice technology, called EVI 2, and found its output to be similar to that developed by OpenAI for ChatGPT. (When OpenAI gave ChatGPT a flirtatious voice in May, company CEO Sam Altman touted the interface as feeling “like AI from the movies.” Later, a real movie star, Scarlett Johansson, claimed OpenAI had ripped off her voice.)

Like ChatGPT, Hume is far more emotionally expressive than most conventional voice interfaces. If you tell it that your pet has died, for example, it will adopt a suitably somber and sympathetic tone. (Also, as with ChatGPT, you can interrupt Hume mid-flow, and it will pause and adapt with a new response.)

OpenAI has not said how much its voice interface tries to measure the emotions of users, but Hume’s is expressly designed to do that. During interactions, Hume’s developer interface shows values indicating a measure of things like “determination,” “anxiety,” and “happiness” in the user’s voice. If you talk to Hume in a sad tone, it will also pick up on that, something that ChatGPT does not seem to do.
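To make the idea concrete, here is a minimal sketch of how an application might consume per-utterance emotion scores like the ones Hume’s developer interface displays. The score dictionary below is invented for illustration; it is not real Hume output, and the function is not part of any Hume SDK.

```python
# Hypothetical example: per-utterance emotion scores of the kind Hume's
# developer interface shows, and a helper that surfaces the strongest ones.
# The numbers are made up for illustration.
scores = {"determination": 0.12, "anxiety": 0.61, "happiness": 0.08, "sadness": 0.44}

def top_emotions(scores: dict, n: int = 2) -> list:
    """Return the n highest-scoring emotion labels, strongest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(top_emotions(scores))  # ['anxiety', 'sadness']
```

An assistant built on such scores could, for instance, switch to a gentler register whenever “sadness” or “anxiety” tops the list.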

Hume also makes it easy to deploy a voice with specific emotions by adding a prompt in its UI. Here it is when I asked it to be “sexy and flirtatious”:

Hume AI's “sexy and flirtatious” message

And when told to be “sad and morose”:

Hume AI's “sad and morose” message

And here’s the particularly nasty message when asked to be “angry and rude”:

Hume AI's “angry and rude” message

The technology did not always seem as polished and smooth as OpenAI’s, and it occasionally behaved in odd ways. For example, at one point the voice suddenly sped up and spewed gibberish. But if the voice can be refined and made more reliable, it has the potential to help make humanlike voice interfaces more common and varied.

The idea of recognizing, measuring, and simulating human emotion in technological systems goes back decades and is studied in a field known as “affective computing,” a term introduced by Rosalind Picard, a professor at the MIT Media Lab, in the 1990s.

Albert Salah, a professor at Utrecht University in the Netherlands who studies affective computing, is impressed with Hume AI’s technology and recently demonstrated it to his students. “What EVI seems to be doing is assigning emotional valence and arousal values [to the user], and then modulating the speech of the agent accordingly,” he says. “It is a very interesting twist on LLMs.”
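Salah’s description can be sketched in code. The toy function below (not Hume’s actual implementation, which is not public) maps valence and arousal estimates for the user to prosody multipliers a speech synthesizer might accept; the coefficients are arbitrary and chosen only to show the direction of the effect.

```python
# Illustrative only: modulate an agent's speech from the user's estimated
# valence and arousal, both in [-1, 1]. Coefficients are arbitrary.
def prosody_from_affect(valence: float, arousal: float) -> dict:
    """Map affect estimates to speech-rate, pitch, and energy multipliers.

    Higher arousal -> faster, higher-pitched, more energetic delivery;
    lower valence nudges pitch slightly down for a flatter tone.
    """
    rate = 1.0 + 0.25 * arousal
    pitch = 1.0 + 0.15 * arousal + 0.05 * valence
    energy = 0.5 + 0.25 * (arousal + 1.0)
    return {"rate": round(rate, 3), "pitch": round(pitch, 3), "energy": round(energy, 3)}

# A sad, subdued user gets a slower, lower-pitched reply profile than an
# excited one.
sad = prosody_from_affect(valence=-0.8, arousal=-0.4)
excited = prosody_from_affect(valence=0.7, arousal=0.8)
print(sad, excited)
```

In a real system the output would feed a controllable text-to-speech model rather than a dictionary, but the “twist on LLMs” Salah describes is exactly this loop: estimate the user’s affect, then condition the agent’s voice on it.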

Salah says Hume’s technology could prove useful in marketing, as well as for mental health therapy. However, he notes that people often disguise their true emotions or change their affect during an interaction, making it difficult for AI systems to read their feelings accurately. He also wonders how well the technology works for non-English languages, and notes that subtle biases could cause it to treat different accents differently, something Hume says it has addressed with a diversity of training data.

Cowen envisions a time when voice assistants are far more attuned to your feelings, responding with what appears to be genuine empathy when you are frustrated. As AI-powered voice assistants multiply, Cowen believes that each will need to exhibit a consistent personality and emotional tone to build trust with users. “We’ll have so many different AIs that we talk to,” he says. “Just being able to recognize one by voice, I think, is hugely important for this future.”

Jess Hoey, a professor at the University of Waterloo who studies affective computing, says it is important to note that LLMs can only mimic human emotion because they do not, in fact, experience any emotions. “AI helpers will appear to be more empathic in the near future, but I do not think they will actually be more empathic,” he says. “And I think most humans will see through this thin disguise.”

Even if there is no real feeling behind the bot, there may be risks to playing with users’ emotions. OpenAI has said it is proceeding carefully with ChatGPT’s voice interface, conducting research to determine how addictive or persuasive the interface might turn out to be. Hume has established the Hume Initiative, which brings in outside experts to provide ethical guidelines and oversight as it develops and deploys its technology.

Danielle Krettek-Cobb, an advisor to Hume who previously worked with Cowen at Google, says tech companies have been relatively slow in tapping into the emotional potential of technology but will need to be more ambitious in order to build machines that are more intelligent. “I believe the most important aspect of human intelligence is social and emotional,” she says. “It is how we understand and relate to the world—it is our original interface.”