2024-08-28: Strawberry Summer

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.


Where is OpenAI now?

My overall sense of where OpenAI is has not changed since the board struggle last year. Noam Brown (the guy who solved poker and Diplomacy) and a bunch of folks working on reasoning came up with a fundamental breakthrough: they used inference-time compute to get a small model to reason through grade-school math problems. Adding this to the existing scaling curve means a step up in capability that existing benchmarks cannot accurately capture.
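For the unfamiliar, “inference-time compute” just means spending more computation at answer time instead of at training time. Here is a minimal sketch of one well-known flavor of the idea, self-consistency voting over sampled reasoning chains; whether this is what OpenAI actually did is not something the leaks settle, and `sample_chain` below is a made-up stand-in, not a real API:

```python
# Minimal sketch of inference-time compute via self-consistency:
# sample several reasoning chains for one problem, then majority-vote
# over the final answers. `sample_chain` is a made-up stand-in for a
# real model call, not an OpenAI API.
import random
from collections import Counter

def sample_chain(problem: str) -> str:
    # A real system would decode a chain of thought at nonzero temperature
    # and parse out the final answer; here we just return noisy answers.
    return random.choice(["42", "42", "42", "41"])

def self_consistency(problem: str, n_samples: int = 16) -> str:
    answers = [sample_chain(problem) for _ in range(n_samples)]
    # More samples = more compute spent at inference time = a more reliable
    # majority answer, without touching the model's weights.
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("A grade-school word problem goes here..."))
```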

The board struggle last year showed that the foundation of the company was weak, and what followed was spring cleaning. Successively:

  • Jan Leike, Ilya Sutskever and most of the superalignment team part ways with the company

  • John Schulman, who led post-training, also quits, heading to Anthropic

  • Former NSA Director Paul Nakasone joins the board

  • The rest of the board is rounded out with tech, AI and business luminaries

  • Greg Brockman takes a sabbatical till the end of the year

This leaves Wojciech Zaremba as the only co-founder besides Altman still active at the firm, nearly a decade after founding. Which is not unusual for the Valley. What is surprising is how a bunch of alpha nerds worth a hundred million dollars apiece managed to stay together for a decade while being actively recruited by everyone in the Valley. That’s a tribute to Sam’s management, but now comes the time when difficult decisions have to be made, and rules must be made, and followed. Bureaucracy must be built: release approval pipelines, org charts, and all those MBA things that allow groups of people who don’t necessarily like each other all that much to cohere around an end goal.

The Election

I have also been pretty clear that I think the main event in AI this year is the US Presidential Election. I expect the uncanny valley to be crossed in multiple AI domains over the next six months:

  • self-driving with both Waymo and Tesla

  • image generation with MidJourney, Black Forest Labs, Ideogram

  • character video with Hedra, Heygen

  • voice generation with OpenAI, Elevenlabs, Hume

OpenAI, of course, must have crossed the valley internally a long time ago. But then came the long hold while they built up internal and external systems and support networks.

What comes next is fairly significant. To keep up the pace of revenue doubling every six months, the target for next year would be ~$25 billion ARR. That’s roughly $2 billion per month by Dec 2025.
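Back of the envelope, assuming a mid-2024 base of roughly $3.4 billion ARR (my read of press reports, not a disclosed figure), the doubling cadence compounds like this:

```python
# Rough compounding check of the "doubling every 6 months" pace.
# The $3.4B mid-2024 starting point is an assumption, not an OpenAI figure.
arr = 3.4  # $B annualized, mid-2024 (assumed)
for _ in range(3):  # mid-2024 to the end of 2025 is roughly three doublings
    arr *= 2
print(f"Implied ARR by Dec 2025: ~${arr:.0f}B, or ~${arr / 12:.1f}B per month")
```

Which lands in the same ballpark as the ~$25 billion figure above.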

That’s going to be disruptive. OpenAI will start drinking the milkshakes of a number of other firms in the market, in odd places. Medical professionals complain about “Dr. Google” now; wait till they meet Dr. ChatGPT with Advanced Voice. Lawyers, accountants, most intellectual labor will feel an impact.

And that’s the reason why I believe the US Presidential Election is the next milestone in AI development. It’s the kickoff point for the race detailed in Situational Awareness. It’s when things get real.

Summer

One of the main problems with actually believing in OpenAI is that they are so irreparably Gen-Z coded that they simply must post vigorously on x.com. And so we hang onto small signs until we get actual leaks (The Information is not averse to paying its sources, I hear).

OpenAI also gave demonstrations of Strawberry to national security officials this summer, said a person with direct knowledge of those meetings.

“We feel like we have enough [data] for this next model,” Altman said at an event in May, likely referring to Orion. “We have done all sorts of experiments including generating synthetic data.”

(...)

What Ilya Saw

Strawberry has its roots in research. It was started years ago by Ilya Sutskever, then OpenAI's chief scientist. He recently left to start a competing AI lab. Before he left, OpenAI researchers Jakub Pachocki and Szymon Sidor built on Sutskever's work by developing a new math-solving model, Q*, alarming some researchers focused on AI safety.

The breakthrough and safety conflicts at OpenAI came just before OpenAI board directors—led by Sutskever—fired Altman before quickly rehiring him. Last year, in the leadup to Q*, OpenAI researchers developed a variation of a concept known as test-time computation, meant to boost LLMs’ problem-solving abilities.

(...)

In internal demonstrations, Strawberry reportedly solved the New York Times word puzzle "Connections." The model could also serve as a foundation for more advanced AI systems capable of not just generating content, but taking action.

Reuters reported that OpenAI has already tested an AI internally that scored over 90 percent on the MATH benchmark, a collection of competition math problems. This is likely Strawberry, which has also been presented to national security officials, according to The Information.

Internal OpenAI documents describe plans to use Strawberry models for autonomous internet searches, enabling the AI to plan ahead and conduct in-depth research.
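What “autonomous internet searches” with planning would look like is guesswork from the outside. Here is a speculative sketch of the plan-search-synthesize loop such a description implies; every function is a made-up stand-in, not anything OpenAI has shown:

```python
# Speculative sketch of an "autonomous research" loop: plan queries, search,
# collect notes, then synthesize. All functions are illustrative stand-ins.
def plan_queries(question: str) -> list[str]:
    # Stand-in: a real system would ask the model to decompose the question.
    return [f"{question} overview", f"{question} recent developments"]

def web_search(query: str) -> list[str]:
    # Stand-in for a search API call returning text snippets.
    return [f"snippet about '{query}'"]

def synthesize(question: str, notes: list[str]) -> str:
    # Stand-in: a real system would have the model write up its findings.
    return f"Report on '{question}' based on {len(notes)} notes."

def research(question: str, max_rounds: int = 2) -> str:
    notes: list[str] = []
    for _ in range(max_rounds):
        for query in plan_queries(question):
            notes.extend(web_search(query))
        # A real agent would decide here whether it needs another round
        # of searching, which is where the "plan ahead" part comes in.
    return synthesize(question, notes)

print(research("test-time compute for LLM reasoning"))
```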

The Information notes that it's uncertain whether Strawberry will launch this year. If released, it would be a distilled version of the original model, delivering similar performance with less computational power – a technique OpenAI has also used for GPT-4 variants since the original model was released in March 2023.
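Distillation as a general technique means training a smaller “student” model to match a larger “teacher” model’s output distribution. A minimal sketch of the generic recipe, with toy tensors and no claim that this mirrors OpenAI’s actual setup:

```python
# Minimal sketch of knowledge distillation: push a student's predictions
# toward a teacher's softened output distribution. Generic recipe only,
# not OpenAI's training pipeline.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence from teacher to student. Scaling by T^2 keeps gradient
    # magnitudes comparable across temperatures.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 examples over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
print(distillation_loss(student_logits, teacher_logits))
```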

OpenAI's approach reportedly resembles the "Self-Taught Reasoner" (STaR) method introduced by Stanford researchers, which aims to improve AI systems' reasoning abilities.
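For reference, the STaR loop described by the Stanford researchers is a bootstrap: sample a rationale for each question, keep only the rationales whose final answer matches the gold answer, fine-tune on those, and repeat. A toy sketch, with the model and fine-tuning calls as stand-ins:

```python
# Toy sketch of one STaR iteration. The model and fine-tuning functions
# are stand-ins; a real run would use an actual LLM and training job.
def toy_model(question: str) -> tuple[str, str]:
    # Stand-in for sampling a chain of thought plus a final answer.
    return ("3 + 4 = 7, so the answer is 7", "7")

def toy_fine_tune(model, examples):
    # Stand-in for fine-tuning on the kept (question, rationale, answer) triples.
    print(f"fine-tuning on {len(examples)} kept rationale(s)")
    return model

def star_iteration(model, dataset, fine_tune):
    kept = []
    for question, gold_answer in dataset:
        rationale, answer = model(question)
        if answer == gold_answer:
            # Only reasoning traces that reach the right answer are kept,
            # so the model bootstraps better reasoning from its own outputs.
            kept.append((question, rationale, answer))
    return fine_tune(model, kept)

model = star_iteration(toy_model,
                       [("What is 3 + 4?", "7"), ("What is 2 + 2?", "4")],
                       toy_fine_tune)
```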

This is New York Times Connections, a simple game of hypothesis testing and inference.
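For anyone who hasn’t played: the puzzle presents 16 words hiding four groups of four, and the trick is that several words plausibly fit more than one group. A toy sketch of the format, with made-up categories rather than the puzzle from the demo:

```python
# Toy sketch of the Connections format: four hidden groups of four words.
# Categories are invented for illustration, not the reported demo puzzle.
SOLUTION = {
    "shades of pink": {"rose", "salmon", "coral", "blush"},
    "fish": {"bass", "sole", "pike", "carp"},
    "units of time": {"second", "minute", "hour", "day"},
    "keyboard keys": {"shift", "tab", "escape", "enter"},
}

def check_guess(guess: set[str]) -> str:
    for category, members in SOLUTION.items():
        overlap = len(guess & members)
        if overlap == 4:
            return f"correct: {category}"
        if overlap == 3:
            return "one away"  # the game's only hint, which drives the hypothesis testing
    return "wrong"

print(check_guess({"rose", "salmon", "coral", "blush"}))  # salmon is also a fish: the usual misdirection
print(check_guess({"bass", "sole", "pike", "salmon"}))    # one away
```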

This was the final answer that presumably Strawberry got to:

And here’s the comparison to Claude 3.5

They are also claiming State of the Art on the MATH benchmark, with a score of 90.

Now we have three different vectors of capabilities:

  • humanlike persuasion - the Advanced Voice model

  • step-by-step reasoning - the Strawberry model

  • memorization/knowledge - GPT-5

Capability release schedules are tied to revenue goals, which are tied to datacenter construction targets. Meanwhile, they also have to watch the progress of other firms in the sector: too much revenue bleeding to other parties will mean capabilities advancing elsewhere, which will mean a lack of control over safety.

🌠 Enjoying this edition of Emergent Behavior? Share this web link with a friend to help spread the word of technological progress and positive AI to the world!


🖼️ AI Artwork Of The Day

Cavepools - puredigitaldreams from midjourney

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
