Sora Roundup
Reviewing the leap forward in video models
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
📚 Sora Roundup
On Thursday, Feb 15th, OpenAI unveiled their text-to-video model, Sora, and the world moved forward again. It was not completely unexpected, as many, many teams across the industry were working on individual aspects of video generation. But still… it was a great leap forward. It is very hard to generate video for more than 2 seconds, let alone up to one minute, without weird morphing artifacts and features appearing and disappearing.
Comparisons
.@OpenAI SORA vs @pika_labs vs @runwayml vs @StabilityAI Video.
I gave the other models SORA's starting frame. I tried my best prompting and camera motion techniques to get the other models to output something similar to SORA.
SORA's just much better at longer scenes.
— Gabor Cselle (@gabor)
1:02 AM • Feb 16, 2024
Capabilities
I just want to take a moment to explore the capabilities of the Sora model. It shows:
Clear signs of having been trained on the output of a 3D engine
here is a better one:
— Sam Altman (@sama)
7:09 PM • Feb 15, 2024
It can generate multiple videos in the same “world” at the same time. This means that eventually, you can just imagine a scene from every possible angle, without needing cameras everywhere.
Sora can generate multiple videos side-by-side simultaneously.
This is a single video sample from Sora. We didn't stitch this together; Sora decided it wanted to have five different viewpoints all at once!
— Bill Peebles (@billpeeb)
9:05 PM • Feb 17, 2024
Sequential scene changes in the same story world
welcome to bling zoo! this is a single video generated by sora, shot changes and all.
— Bill Peebles (@billpeeb)
8:16 PM • Feb 15, 2024
Storytelling
Sora can also generate stories involving a sequence of events, although it's far from perfect.
For this video, I asked that a golden retriever and samoyed should walk through NYC, then a taxi should stop to let the dogs pass a crosswalk, then they should walk past a pretzel and… twitter.com/i/web/status/1…
— Bill Peebles (@billpeeb)
9:27 PM • Feb 17, 2024
Worryingly realistic-looking humans
Sora allows video-to-video editing
Video Editing will never be the same.
"Sora" by OpenAI doesn't just generate simple AI videos, it can also edit videos. Video-to-video editing using diffusion models allows Sora to transform styles and environments in input videos based on text prompts.
Here are 12 examples:… twitter.com/i/web/status/1…
— Endrit (@EndritRestelica)
4:16 PM • Feb 18, 2024
Same Data Source
The comparisons between Sora and Midjourney revealed that they seemed to have been trained on the same data. When we dream in latent space, we have similar dreams.
An extreme close-up of a gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat...
— Nick St. Pierre (@nickfloats)
2:22 PM • Feb 16, 2024
In effect, the similarity in training data causes convergence to the same region of latent space. Another example below:
An adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style. --ar 16:9 --style raw
— Nick St. Pierre (@nickfloats)
2:22 PM • Feb 16, 2024
We Don’t Know How To Do This
Meanwhile, Yann LeCun, Meta's chief AI scientist, had declared just days prior, at the World Governments Summit in Dubai, that generative AI could not yet reach this milestone:
Yann LeCun, a few days ago at the World Governments summit, on AI video:
“We don’t know how to do this”
— Ric Burton (@ricburton)
6:32 AM • Feb 16, 2024
Yann was out and about on Twitter defending his statements, and to be honest, he may still be right in the end, but still, the juxtaposition is a tad embarrassing.
In any case, there was an incredible amount of cope among real-world animators.
The reason I'm not scared (yet) of the Sora vids as an animator is that animation is an iterative process, especially when working for a client
Here's a bunch of notes to improve one of the anims, which a human could address, but AI would just start over
What client wants that?
— Owen Fern (@owenferny)
1:26 PM • Feb 16, 2024
Though everyone should know better at this point.
Build Alpha
The best information on the Sora build came from the co-author of the underlying paper, Saining Xie:
He goes on to speculate that Sora might be only a 3-billion-parameter model, which implies:
not that many GPUs needed for generation
fast inference
cheap serving
lots more runway to improve, and quickly
There are real questions on how closely Sora is simulating reality, with some converting Sora video into 3D scrollable representations known as radiance fields:
Sora unlocks generative large scale radiance fields and this is such a massive deal
radiancefields.com/openai-launche…
— Radiance Fields (@RadianceFields)
8:23 PM • Feb 15, 2024
OpenAI’s first intern, Dr. Jim Fan, was roundly shouted down but persisted in arguing that Sora must be performing both world and physics modeling.
Poor Google
Meanwhile, poor Google achieved 5-second videos in late January and has still not released the model to the public. Compare:
"Lumiere", announced by Google three weeks ago
vs.
"Sora", announced by OpenAI today — Matsuda Takumi (@matsuda_tkm)
8:38 AM • Feb 16, 2024
The final Sora rundown
Leveraging spacetime patches, Sora offers a unified representation for large-scale training across various durations, resolutions, and aspect ratios.
It generates high-definition content, showcasing its prowess in handling videos and images with dynamic aspect ratios.
It excels in framing and composition, outperforming traditional square-cropped training methods.
Utilizing descriptive video captions, Sora achieves higher text fidelity, making it adept at following detailed user prompts for video generation.
From animating static images to extending videos, Sora showcases a wide range of editing capabilities.
Sora's training reveals emergent properties like 3D consistency and long-range coherence, hinting at its potential as a simulator for the physical and digital world.
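The spacetime-patch idea in the rundown above can be made concrete with a short sketch. OpenAI has not published Sora's patchification code (and the real model patchifies in a compressed latent space after a video encoder), so the patch sizes and the function below are purely illustrative of how a video of any duration, resolution, or aspect ratio becomes one variable-length sequence of fixed-size tokens:

```python
import numpy as np

def spacetime_patches(video, t_p=4, h_p=16, w_p=16):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch spans t_p frames by h_p x w_p pixels, so clips of any length
    or aspect ratio all map to a sequence of identically sized tokens;
    only the sequence length varies. Patch sizes here are illustrative.
    """
    T, H, W, C = video.shape
    # Truncate to whole patches (a real pipeline might pad instead).
    video = video[: T - T % t_p, : H - H % h_p, : W - W % w_p]
    t = video.shape[0] // t_p
    h = video.shape[1] // h_p
    w = video.shape[2] // w_p
    # Carve the video into a (t, h, w) grid of (t_p, h_p, w_p, C) blocks.
    patches = video.reshape(t, t_p, h, h_p, w, w_p, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten each block into one token vector.
    return patches.reshape(t * h * w, t_p * h_p * w_p * C)

# A 16-frame 64x48 RGB clip becomes 4 * 4 * 3 = 48 tokens of 3072 values each.
tokens = spacetime_patches(np.zeros((16, 64, 48, 3)))
print(tokens.shape)  # (48, 3072)
```

Because every token has the same dimensionality regardless of the source clip's shape, a single transformer can be trained across mixed durations, resolutions, and aspect ratios, which is the unification the rundown describes.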
All Known Soras
A supercut of all known and confirmed Sora videos with their associated prompts.
🌠 Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!
Or send them the below subscription link:
🗞️ Things Happen
Legendary chip architect Jim Keller responds to Sam Altman's plan to raise $7 trillion to make AI chips: "I can do it for less than $1 trillion." Everyone is targeting chips at this point.
Geoffrey Hinton: 200,000 people a year die of incorrect medical diagnoses in the United States. AI will fix that in the next 10 years.
🖼️ AI Artwork Of The Day
The Simpsons Remade as K-drama - u/No-Neighborhood-2119 in r/MidJourney
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: