EB-5: The SuperPersuader
building the foundation model for human emotion
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Who
Alan Cowen (LinkedIn), Founder, Chief Scientist, and CEO of Hume AI, a company building empathic AI to serve human well-being.
Alan is an emotion scientist who previously worked as a researcher at Google and UC Berkeley.
He holds a Ph.D. in Psychology from UC Berkeley and has conducted research on how emotional behaviors can be measured, predicted, and translated using computational methods.
What
Hume has developed an "empathic large language model" (eLLM) that is trained on millions of human interactions to measure and understand nuanced emotional expressions in audio, video, and images.
Goal is to optimize AI for human well-being by making it more emotionally fluent
Hume provides an emotion interface layer for developers to customize on top of language AIs
Released voice-based Empathic AI API that links language understanding, expression measurement, and text-to-speech
First product: an emotive voice API (Demo) that can both detect the human speaker's emotions and respond empathically, in a voice and with context that acknowledge the speaker's emotions (a sketch of such a loop follows this list)
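A minimal sketch of what such a voice loop could look like, under the assumption of four stages (transcription, expression measurement, reply generation, expressive text-to-speech). Every function below is a hypothetical placeholder stub, not Hume's actual API.

```python
# Hypothetical sketch of an empathic voice loop (not Hume's actual API).
# Each stage is a stub standing in for a real model or service.

def transcribe(audio: bytes) -> str:
    """Stub for speech-to-text."""
    return "I've been on hold for an hour and I'm really frustrated."

def measure_expression(audio: bytes) -> dict[str, float]:
    """Stub for a vocal-prosody model returning emotion scores in [0, 1]."""
    return {"frustration": 0.8, "calmness": 0.1, "amusement": 0.05}

def generate_reply(transcript: str, expression: dict[str, float]) -> str:
    """Stub for a language model that is given the expression measurements."""
    top = max(expression, key=expression.get)
    return f"I can hear the {top} in your voice. Let's sort this out together."

def speak(text: str, style: dict[str, float]) -> bytes:
    """Stub for expressive text-to-speech conditioned on a target style."""
    return text.encode()

def handle_turn(audio: bytes) -> bytes:
    transcript = transcribe(audio)
    expression = measure_expression(audio)
    reply = generate_reply(transcript, expression)
    return speak(reply, style={"calmness": 0.9})

print(handle_turn(b"...caller audio..."))
```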
When
April 1, 2024
Highlights
On Hume's empathic AI: "We build AI models that understand more than just language. They understand the voice, they understand facial expression, and they can learn from those signals."
On optimizing for speed and reasoning: "We have a fast and a slow system. So we can return a fast response and it carries on the conversation. It's a very conversationally intelligent small model... And then they have a slow response which can integrate tools."
On why AI needs to understand human emotion: "When you're angry about something, it means that you've been treated unfairly. If that motivates the right thought process that allows you to identify how you've been treated unfairly and to remedy the situation, it is the best emotion to be feeling at that time."
On how emotions reverberate through logic: "Somebody dips a cockroach into your orange juice and takes it out. Totally sterilized cockroach, not dangerous at all. But now the orange juice is disgusting, right? It's viscerally disgusting... emotions reverberate across that line. And I think that's what makes them different from perceptions."
On why women like men who make them laugh: "If you meet somebody for the first time and laugh together, you're much more likely to be friends later. It predicts things like divorce rates and stuff."
On nuanced vocal patterns vs. simplistic labels: "Sometimes it'll say it's angry, but actually that just means that it's using a pattern of speech that people might associate with anger."
On how incentives shape scientific theories: "If you go study other cultures and they're exactly the same as your culture, it's not a very interesting finding, right? So they tend to have publication bias toward things that have cultural diversity."
On proving emotional expressions are universal: "We analyzed over 20 million videos and found 16 facial expression dimensions were associated with similar contexts across cultures."
On overturning the flawed conclusions of the 20th century: "Psychologists thought 80% of emotional expression was cultural, based on small samples and culturally-biased images. People made wild arguments from noisy datasets."
On bypassing the domain-specific journals to publish in Nature directly: “You've had many decades of people expounding theories. And because they don't have a data-led orientation, their approach to their career is, I'm going to defend this theory. That is my career. So those people do not review papers well when they contradict that theory.”
On the impact of Transformers in affective computing: "Transformers started affecting other areas very rapidly because the data was already there. In affective computing, you only had really small scale measures of expressive behavior, which is what we had to correct."
On autism and emotional intelligence: "People with ASD can understand expressions, but don't naturally pay attention to them… a different kind of intelligence that's less focused on social interactions."
On the double-edged sword of emotional AI: "If an AI is optimized for your well-being, I think the future will be very bright. If it's optimized for you to buy what somebody is selling you or spend all your time in an app, then that could be dark."
On setting ethical guidelines for emotion AI: "One of the things that was top of mind when I left Google was how are people going to misuse this?"
Listen On
EB-5, the fifth episode of our podcast, dropped last week. Before I continue, the rules of the game are:
Pods that CHART stay alive
Pods that get a Follow on Apple Podcasts CHART
So FIRST, CLICK on the link below (opens up your Apple Podcasts app), and click “+Follow” (in the upper right-hand corner)
Then go ahead and listen to the podcast any way you want to on your preferred app, capiche mon ami?
Why
Enabling emotionally intelligent AI could unlock applications in education, health, and customer service
Aims to expand emotional palette AI can recognize to improve human-AI interaction
Non-empathic AIs exhaust humans quickly; sustained engagement requires emotion
Am I the only one that is having an allergic reaction to these AI-generated videos and songs?
This feels like borderline spam. It's not that different from the bots on X; it's just more annoying at some level.
Unless AI can create a hit show, a viral tiktok, or top the Grammy… twitter.com/i/web/status/1…
— Bindu Reddy (@bindureddy)
4:06 PM • Apr 5, 2024
How
Built foundation model for human emotion
Collected psychology data at the scale needed to train ML models (100,000s of hours)
Built large-scale dataset labeling millions of videos/images with nuanced emotion tags
Designed studies to isolate expressive behaviors from contextual noise/bias
Built emotion detection models for facial expressions, vocal prosody, laughs, sighs, etc.
Built language models on expression data to make them multimodal (a simple fusion sketch follows this list)
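As a rough illustration of that last step, here is one simple way expression data could be fused with a language model's text representations. The dimensions and architecture below are assumptions for the sketch, not Hume's design.

```python
# Sketch (assumed architecture, not Hume's): fuse a text embedding with a
# vector of expression scores so downstream layers see both modalities.
import torch
import torch.nn as nn

TEXT_DIM = 768   # hypothetical text-encoder embedding size
EXPR_DIM = 48    # hypothetical number of expression dimensions (face + voice)
HIDDEN = 256

class ExpressionFusion(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.expr_proj = nn.Linear(EXPR_DIM, TEXT_DIM)   # lift expression scores into text space
        self.mlp = nn.Sequential(
            nn.Linear(TEXT_DIM * 2, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, TEXT_DIM),                 # fused representation for the LLM
        )

    def forward(self, text_emb: torch.Tensor, expr_scores: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([text_emb, self.expr_proj(expr_scores)], dim=-1)
        return self.mlp(fused)

fusion = ExpressionFusion()
out = fusion(torch.randn(4, TEXT_DIM), torch.rand(4, EXPR_DIM))
print(out.shape)  # torch.Size([4, 768])
```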
Limitations and Next Steps
Current models primarily trained on English/Western data, need to expand globally
Video generation of emotive faces still in early stages, rollout planned in the future
Challenges in responsibly deploying emotion AI to avoid manipulation/addiction
Risks of AI engagement hacking and kids spending 10+ hours/day on apps
Why It Matters
Emotions influence decision-making as much as logic, but AI has been emotionally illiterate
Personalized AI tutors could detect confusion and optimize learning for each student
Analyzing emotions could enable AIs to have more natural, engaging conversations
But emotional AI optimized for manipulation rather than well-being poses an existential risk
Hume aims to be an ethical leader, publishing first guidelines for empathic AI development
Additional Notes
Cowen's work overturned decades of flawed, small sample-size emotion research
He aims to clean up mistakes of 20th-century psychology with large-scale AI-driven science
GPU availability was a major bottleneck in 2022 but has since improved
Transcripts
Background Research
Hume investor USV:
The company aims to do for AI technology what Bob Dylan did for music: endow it with EQ and concern for human well-being. Dr. Alan Cowen, who leads Hume AI’s fantastic team of engineers, AI scientists, and psychologists, developed a novel approach to emotion science called semantic space theory. This theory is what’s behind the data-driven methods that Hume AI uses to capture and understand complex, subtle nuances of human expression and communication—tones of language, facial and bodily expressions, the tune, rhythm, and timbre of speech, “umms” and “ahhs.”
Hume AI has productized this research into an expressive communication toolkit for software developers to build their applications with the guidance of human emotional expression. The toolkit contains a comprehensive set of AI tools for understanding vocal and nonverbal communication – models that capture and integrate hundreds of expressive signals in the face and body, language and voice. The company also provides transfer learning tools for adapting these models of expression to drive specific metrics in any application. Its technology is being explored for applications including healthcare, education, and robotics. With that early focus on healthcare, the company has partnerships with Mt. Sinai, Boston University Medical Center, and Harvard Medical School.
Empathic AI such as this could pose risks; for example, interpreting emotional behaviors in ways that are not conducive to well-being. It could surface and reinforce unhealthy temptations when we are most vulnerable to them, help create more convincing deepfakes, or exacerbate harmful stereotypes. Hume AI has established the Hume Initiative with a set of six guiding principles: beneficence, emotional primacy, scientific legitimacy, inclusivity, transparency and consent. As part of those principles and guidelines developed by a panel of experts (including AI ethicists, cyberlaw experts, and social scientists), Hume AI has committed to specific use cases that they will never support.
Potential use cases
The digital applications for that work are, at least in theory, vast. Cowen foresees digital-assistant algorithms recognizing emotions on our faces and in our voices and then making recommendations accordingly, like suggesting soothing sounds for the stressed.
He also imagines crisis call-center operators employing Hume-based AI to help diagnose the seriousness of a person’s depression from their vocalizations.
Keltner noted further rich possibilities, including customizing educational plans based on children’s particular emotional responses and judging the real-world risks of incendiary political content.
Cowen said he even envisions social media companies using Hume’s platform to gauge a user’s mood — then algorithmically adjusting served posts to improve it.
“Is it increasing people’s sadness when they see a post? Is it making them sad a day or two later even when the post is gone?” Cowen said. “Companies have been lacking objective measures of emotions, multifaceted and nuanced measures of people’s negative and positive experiences. And now they can’t say that anymore.”
The Landmark 2020 Nature Paper summary:
Universal Context-Expression Associations: Using deep neural networks and a dataset of 6 million videos from 144 countries, the study identifies 16 types of facial expressions that have distinct associations with specific contexts (e.g., weddings, sports competitions) that are preserved across 12 world regions. Roughly 70% of facial expression-context associations were preserved globally, indicating a degree of universality in how facial expressions are used in different social situations (a toy illustration follows this list).
Regional Variations in Expression Frequency: While certain expressions are universally associated with specific contexts, regions differ in the frequency of these expressions based on the salience of those contexts locally. This finding highlights both universal patterns in facial expressions and regional variations influenced by cultural context.
Globally Preserved Facial Expressions: The research confirms that all 16 examined facial expressions have distinct meanings that are understood across different cultures. This supports the idea that human facial expressions and their interpretations have a common foundation worldwide.
Technological Methodology: The study leverages advanced machine learning techniques, specifically deep neural networks (DNNs), to analyze facial expressions and their context associations across a vast dataset of videos. This methodological approach represents a significant advancement in emotion research, allowing for the analysis of naturalistic behavior on a scale previously unachievable.
Implications for Emotional Theory and Technology: The findings have broad implications for theories of emotion, suggesting that facial expressions are part of a universal language of emotion. Additionally, the results are relevant for the development of technology that relies on facial expression recognition, such as affective computing applications, ensuring these technologies are applicable across different cultural contexts.
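As a toy illustration of the preservation idea (synthetic numbers only; the paper's data and exact statistics are not reproduced here), each region's expression-by-context association matrix can be correlated with the average of the other regions:

```python
# Illustration only: synthetic expression-by-context association matrices for
# 12 regions, correlated region-by-region against the rest-of-world average.
import numpy as np

N_REGIONS, N_EXPRESSIONS, N_CONTEXTS = 12, 16, 30   # 30 contexts is a placeholder
rng = np.random.default_rng(0)
assoc = rng.random((N_REGIONS, N_EXPRESSIONS, N_CONTEXTS))

for r in range(N_REGIONS):
    rest = np.delete(assoc, r, axis=0).mean(axis=0)           # average of the other 11 regions
    corr = np.corrcoef(assoc[r].ravel(), rest.ravel())[0, 1]  # similarity of association patterns
    print(f"region {r:2d}: correlation with rest of world = {corr:+.2f}")
```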
Press on the Nature paper
To teach the network, Cowen and his colleagues needed a large repository of videos rated by human viewers. A team of English-speaking raters in India completed this task, making 273,599 ratings of 186,744 YouTube video clips lasting one to three seconds each. The research team used the results to train the neural network to classify patterns of facial movement with one of 16 emotion-related labels, such as pain, doubt or surprise.
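A toy version of that training setup might look like the sketch below; the feature representation, architecture, and hyperparameters are placeholders rather than the study's actual pipeline.

```python
# Minimal sketch (assumptions, not the paper's code): train a classifier that
# maps a facial-movement feature vector from a 1-3 s clip to one of 16
# emotion-related labels (pain, doubt, surprise, ...), as rated by annotators.
import torch
import torch.nn as nn

NUM_LABELS = 16    # emotion-related categories from the study
FEATURE_DIM = 512  # hypothetical per-clip facial-movement embedding size

model = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_LABELS),  # logits over the 16 labels
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for batches of (clip embedding, rater label) pairs.
features = torch.randn(32, FEATURE_DIM)
labels = torch.randint(0, NUM_LABELS, (32,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
```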
Semantic Space Theory
Schematic representation of a semantic space of emotion. A semantic space of emotion is described by (a) its dimensionality, or the number of varieties of emotion within the space; (b) the conceptualization of the emotional states in the space in terms of categories and more general affective features, that is, the concepts that tile the space; and (c) the distribution of emotional states within the space. A semantic space is the subspace that is captured by concepts (e.g., categories, affective features). For example, a semantic space of expressive signals of the face is a space of facial movements along dimensions that can be conceptualized in terms of emotion categories and affective features, and that involves clusters or gradients of expression along those dimensions.
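The three ingredients (dimensionality, conceptualization, distribution) map onto a simple analysis of a ratings matrix. The sketch below uses synthetic data and a plain SVD as a stand-in for the statistical methods used in the actual research.

```python
# Sketch (assumption): estimate the three components of a semantic space from
# a hypothetical matrix of emotion ratings using a plain SVD.
import numpy as np

rng = np.random.default_rng(1)
# ratings[i, j] = how strongly stimulus i was rated as emotion concept j.
ratings = rng.random((1000, 28))   # 1,000 stimuli, 28 emotion concepts (placeholders)

centered = ratings - ratings.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / (singular_values**2).sum()

# (a) dimensionality: components needed to explain 90% of the variance
dimensionality = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
# (b) conceptualization: how each emotion concept loads on those dimensions
loadings = components[:dimensionality]
# (c) distribution: where each stimulus sits within the reduced space
states = centered @ loadings.T
print(dimensionality, loadings.shape, states.shape)
```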
AI measurement of dynamic emotional reaction in real time
Dynamic Reaction - Measure dynamic patterns in facial expression over time that are correlated with over 20 distinct reported emotions
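A bare-bones sketch of what measuring dynamic patterns over time can mean in practice, using synthetic per-frame scores rather than output from any real model:

```python
# Hypothetical example: smooth per-frame expression scores from a video into
# time series so a dynamic reaction can be read over time.
import numpy as np

rng = np.random.default_rng(2)
N_FRAMES, N_EMOTIONS = 300, 20   # e.g. 10 s of video at 30 fps, 20 reported emotions
frame_scores = rng.random((N_FRAMES, N_EMOTIONS))   # placeholder per-frame model outputs

def smooth(scores: np.ndarray, window: int = 15) -> np.ndarray:
    """Moving average over frames, one column per emotion."""
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(scores[:, j], kernel, mode="same") for j in range(scores.shape[1])]
    )

trajectory = smooth(frame_scores)
peak_frame = trajectory.max(axis=1).argmax()
print(f"strongest dynamic reaction near frame {peak_frame}")
```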
The Demo
The Voice Behind the Demo
The secret behind our Empathic Voice Interface (EVI) demo... #AprilFools
— Hume (@hume_ai)
9:15 PM • Apr 1, 2024