2024-06-21-EB-15: World Builder

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

2024-06-21-EB-15: World Builder

Who:

  • Michael Lingelbach (LinkedIn), co-founder of Hedra, an AI startup focused on creating highly controllable video generation models. Michael's passion for storytelling and filmmaking drives Hedra's mission to empower creators with cutting-edge AI tools.

What:

  • The interview centers on Hedra's launch of Character One, its initial AI model capable of generating expressive talking-head videos from a single image and audio input.

Highlights:

  • On the issues with video generation models. "What's always disappointed me about these video models is that...You generate a character and that character's face moves randomly, their lips move randomly, their hair kind of phases in and out of existence."

  • On how Hedra is different. “Think of it as a spatial-temporal control net: you apply a signal on the video that guides the generation process. If you really want to cross that uncanny valley, you need to co-generate the character's expressions, how the character moves, even how they interact with the scene in line with the audio cues.”

  • On replicating nuanced human behavior. "One of the first things I was struck by when we made the very first version of the model, even in the earliest days, you could see that characters would take a breath kind of whenever you heard that in the audio track."

  • On users' creative interpretations. "Not in distribution, but people are making stuff."

  • On capturing the nuances of different animation styles. "We know the domain of how an anime character should talk. So if we run our model with the appropriate settings, you can reproduce something that's going to look like it came out of a show."

  • On automating the tedious parts of animation. "That's why there's like a lot of promise in building these world models, right? Is because you start with a keyframe and then you can provide some type of control signal to get you to the next keyframe."

  • On removing limitations and sparking creativity. "Can we just let people not be restricted by the thought of, 'My God, this is going to take so long to generate and it might not look good'... but get people into a flow where they're operating in real-time?"

  • On the freedom of fast generation speeds. "Our model's so fast, you have to reroll it."

  • On the unexpected challenges of animating horses. "If you ever watch a video model, this is one of my litmus tests, watch the legs of the horse as they pass by each other. In a lot of generations, the legs will just switch places."

  • On the evolving relationship between creators and AI. "I don't see the role of generative media as being a replacement tool, but as a creativity augmentation tool."

  • On navigating the startup tightrope. "We had our internal roadmap and like how we want to scale and build out these models... And now we are also trying to figure out to what extent do we optimize the current versions of the models to serve all these requests while we're simultaneously scaling up the team, scaling up compute, and moving on to these more ambitious versions of our model..."

  • On the importance of artistic considerations. "Oftentimes it's not just like the capabilities or diversity of what the model can handle, but also a matter of like, What are you actually rendering out? How are you color grading? All this other stuff that goes into making content that aligns with what people's expectations are based on what they've seen."

  • On the potential for AI to unlock new creative workflows. "I think that the ability to rewind and generate counterfactuals is an extremely powerful creative aid."

  • On sweating the small stuff to achieve realism. "One thing that's not commented on yet... is the lighting and the hair... if we situate a character... and there's a spotlight shine on that character, is that going to accurately model on their face, or is the character going to be totally lit in a different way?"

  • On prioritizing latency for user experience. "One of our advantages was going to be speed... we knew that one of the huge off-putting elements for consumer AI is the wait times."

  • On reimagining storytelling through shared creation. "I'm very excited to empower collaborative storytelling... You should be able to create that with your friends."

  • On democratizing access to powerful storytelling tools. "You can generate characters now and you can tell them to do things like you just couldn't do... I'm really excited to see what people can make."

  • On UI features that still need research. “We had a version where you can actually position the character in 3D space, move it forward and back. We are planning to put this in a subsequent product that we'll be releasing.”

  • On the separation between China and the US. “These products just remain [in] the Chinese ecosystem of apps and they're not really brought to the US or Western markets. But I do think China has been leading in deploying generative media solutions.”

  • On the compromises in diffusion models. “Historically there's been a trade-off when you inject something like a control net into a diffusion model. Oftentimes, by constraining the model, you get a less aesthetically satisfying result.”

  • On the end of an era in social media. “But I believe that the day in which you need to pick up a camera if you want to make some sort of content for social media is coming to a close.”

Listen Here

EB-15, the fifteenth episode of our podcast, dropped this week. Before I continue, the rules of the game are:

  • Pods that CHART stay alive

  • Pods that get a Follow on Apple Podcasts CHART

So FIRST, CLICK on the link below (opens up your Apple Podcasts app) and click “+Follow” (in the upper right-hand corner)

Then go ahead and listen to the podcast any way you want to on your preferred app, capiche mon ami?


Why:

  • Michael sees a vast untapped potential for AI to democratize content creation by providing intuitive tools that simplify animation and video production.

  • He aims to empower storytellers and creators of all skill levels, allowing them to bring their ideas to life without needing extensive technical expertise.

How:

  • Hedra leverages a novel architecture that incorporates audio as a core control signal, driving the generation of facial expressions, lip movements, and even subtle body language.

  • While they are currently using a diffusion-based model, Michael emphasizes their willingness to explore different architectures to optimize speed and quality.

  • Hedra prioritizes building a user-friendly platform that allows for easy experimentation and iteration, enabling a creative flow for users.

What are the limitations and what's next:

  • The current model is limited to talking-head videos with a fixed resolution.

  • Future development will focus on expanding to full-body animation, environments, and higher resolutions.

  • Michael envisions a future where creators can seamlessly blend real and generated content to produce high-quality videos with unprecedented ease.

Why It Matters:

  • Hedra's work represents a significant step toward making AI video generation more accessible and user-friendly.

  • By incorporating elements like audio conditioning and focusing on controllability, they address some of the key limitations that have hindered the broader adoption of these tools.

  • If successful, Hedra's technology could revolutionize video production, empowering a new generation of creators and fundamentally changing how we tell stories.

Additional Notes:

  • The interview highlights the team's excitement about the creative potential of their technology, illustrated by their internal use of the model to generate humorous content like talking potato memes.

  • Michael's passion for storytelling shines through, emphasizing his desire to build tools that unlock creativity and allow everyone to share their unique voice with the world.

🌠 Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!

🖼️ AI Artwork Of The Day

Star Wars-themed coffee shops, by frontbackend (Midjourney)


That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
