Sora Roundup
Comparisons, capabilities and rundown
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
On Jerk Day, Thursday, Feb 15th, OpenAI unveiled its text-to-video model, Sora, and the world moved forward again. It was not completely unexpected, as many, many teams across the industry were working on individual aspects of video. But still… it was a great leap forward. It is very hard to generate any video longer than 2 seconds, let alone up to one minute, without weird morph artifacts and missing or disappearing features.
Comparisons
.@OpenAI SORA vs @pika_labs vs @runwayml vs @StabilityAI Video.
I gave the other models SORA's starting frame. I tried my best prompting and camera motion techniques to get the other models to output something similar to SORA.
SORA's just much better at longer scenes.
— Gabor Cselle (@gabor)
1:02 AM • Feb 16, 2024
Capabilities
I just want to take a moment to explore the capabilities of the Sora model. It shows:
Clear signs of having been trained on the output of a 3D engine
here is a better one:
— Sam Altman (@sama)
7:09 PM • Feb 15, 2024
It can generate multiple videos in the same “world” at the same time. This means that eventually, you can just imagine a scene from every possible angle, without needing cameras everywhere.
Sora can generate multiple videos side-by-side simultaneously.
This is a single video sample from Sora. We didn't stitch this together; Sora decided it wanted to have five different viewpoints all at once!
— Bill Peebles (@billpeeb)
9:05 PM • Feb 17, 2024
Sequential scene changes in the same story world
welcome to bling zoo! this is a single video generated by sora, shot changes and all.
— Bill Peebles (@billpeeb)
8:16 PM • Feb 15, 2024
Storytelling
Sora can also generate stories involving a sequence of events, although it's far from perfect.
For this video, I asked that a golden retriever and samoyed should walk through NYC, then a taxi should stop to let the dogs pass a crosswalk, then they should walk past a pretzel and… twitter.com/i/web/status/1…
— Bill Peebles (@billpeeb)
9:27 PM • Feb 17, 2024
Worryingly realistic-looking humans
Sora allows video-to-video editing
Video Editing will never be the same.
"Sora" by OpenAI doesn't just generate simple AI videos, it can also edit videos. Video-to-video editing using diffusion models allows Sora to transform styles and environments in input videos based on text prompts.
Here are 12 examples:… twitter.com/i/web/status/1…
— Endrit (@EndritRestelica)
4:16 PM • Feb 18, 2024
Same Data Source
Side-by-side comparisons of Sora and Midjourney suggest that the two may have been trained on much the same data. When we dream in latent space, we have similar dreams.
An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat...
— Nick St. Pierre (@nickfloats)
2:22 PM • Feb 16, 2024
In effect, similar training data pulls both models toward the same region of latent space. Another example below:
An adorable happy otter confidently stands on a surfboard wearing a yellow lifejacket, riding along turquoise tropical waters near lush tropical islands, 3D digital render art style. --ar 16:9 --style raw
— Nick St. Pierre (@nickfloats)
2:22 PM • Feb 16, 2024
We Don’t Know How To Do This
Meanwhile, Yann LeCun, Meta’s chief AI scientist, said at the World Governments Summit in the Middle East just days prior that we do not yet know how to do this kind of video generation:
Yann LeCun, a few days ago at the World Governments summit, on AI video:
“We don’t know how to do this”
— Ric Burton (@ricburton)
6:32 AM • Feb 16, 2024
Yann was out and about on Twitter defending his statements and, to be honest, he may yet be right in the end. Still, the juxtaposition is a tad embarrassing.
In any case, there was an incredible amount of cope among real-world animators.
The reason I'm not scared (yet) of the Sora vids as an animator is that animation is an iterative process, especially when working for a client
Here's a bunch of notes to improve one of the anims, which a human could address, but AI would just start over
What client wants that?
— Owen Fern (@owenferny)
1:26 PM • Feb 16, 2024
Though everyone should know better at this point.
Build Alpha
The best information on the Sora build came from the co-author of the underlying paper, Saining Xie:
He goes on to speculate that Sora might be only a 3-billion-parameter model, which implies:
not that many GPUs utilized for generation
fast inference
cheap
lots more runway to improve
and quickly
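To put the speculated size in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone would take at common numeric precisions. The 3 billion figure is Xie's speculation, not a confirmed number, and the helper name `weight_memory_gb` is made up for this example. It counts weights only; the activations needed to process a long video could easily dominate in practice.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed to hold the model weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Speculated parameter count (an assumption, not confirmed by OpenAI).
SPECULATED_PARAMS = 3e9

# Bytes per parameter for a few common storage formats.
PRECISIONS = [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]

for label, nbytes in PRECISIONS:
    gb = weight_memory_gb(SPECULATED_PARAMS, nbytes)
    print(f"{label}: {gb:.1f} GB of weights")
```

At 16-bit precision that is about 6 GB of weights, which would fit on a single modern GPU. That is the intuition behind "not that many GPUs" and "cheap", if the speculation holds.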
There are real questions on how closely Sora is simulating reality, with some converting Sora video into 3D scrollable representations known as radiance fields:
Sora unlocks generative large scale radiance fields and this is such a massive deal
radiancefields.com/openai-launche…
— Radiance Fields (@RadianceFields)
8:23 PM • Feb 15, 2024
OpenAI’s first intern, Dr. Jim Fan, was roundly shouted down, but persisted in arguing that Sora must be performing both world and physics modeling.
Poor Google
Meanwhile, poor Google achieved 5-second videos in late January and has still not released the model to the public. Compare:
"Lumiere", announced by Google three weeks ago
vs.
"Sora", announced by OpenAI today
— Matsuda Takumi (@matsuda_tkm)
8:38 AM • Feb 16, 2024
The final Sora rundown
Leveraging spacetime patches, Sora offers a unified representation for large-scale training across various durations, resolutions, and aspect ratios.
It generates high-definition content, showcasing its prowess in handling videos and images with dynamic aspect ratios.
It excels in framing and composition, outperforming traditional square-cropped training methods.
Utilizing descriptive video captions, Sora achieves higher text fidelity, making it adept at following detailed user prompts for video generation.
From animating static images to extending videos, Sora showcases a wide range of editing capabilities.
Sora's training reveals emergent properties like 3D consistency and long-range coherence, hinting at its potential as a simulator for the physical and digital world.
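The spacetime patches idea from the rundown above is easier to see with numbers. Below is a minimal sketch, not OpenAI's implementation: it cuts a clip into blocks along time, height and width, and counts how many patches (tokens) come out. The patch size `(4, 16, 16)`, the clip dimensions, the function names and the round-up padding are all assumptions for illustration; Sora's actual patch sizes have not been published.

```python
from math import ceil


def spacetime_patch_grid(frames, height, width, patch=(4, 16, 16)):
    """Number of patches along (time, height, width).

    Rounds up, i.e. assumes the clip is padded to a whole number of
    patches. That padding rule is an assumption of this sketch.
    """
    pt, ph, pw = patch
    return (ceil(frames / pt), ceil(height / ph), ceil(width / pw))


def token_count(frames, height, width, patch=(4, 16, 16)):
    """Total number of spacetime patches (tokens) for one clip."""
    nt, nh, nw = spacetime_patch_grid(frames, height, width, patch)
    return nt * nh * nw


# Hypothetical clips: (frames, height, width).
clips = {
    "square, short": (16, 256, 256),
    "widescreen, longer": (64, 256, 448),
    "vertical": (32, 448, 256),
}

for name, (t, h, w) in clips.items():
    print(f"{name}: {token_count(t, h, w)} tokens")
```

The point is that nothing gets cropped to a square: a longer or wider clip simply becomes a longer sequence of tokens, which is what lets one model train across durations, resolutions and aspect ratios.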
All Known Soras
A supercut of all known and confirmed Sora videos with their associated prompts.