Emergent Behavior
Posts
2024-07-19-Four Omni Mini

2024-07-19-Four Omni Mini

Prakash Ate-A-Pi
July 19, 2024

🔷Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

Four Omni Mini
AI artwork of the day

Four Omni Mini

OpenAI announced GPT-4o-mini Thursday in a rather quiet announcement. Quick facts:

It’s a small, fast, cheap model
It’s better than GPT-4-turbo, Clause Haiku, and Gemini Flash on most benchmarks but behind GPT-4o
It replaced GPT-3.5-turbo in the API

OpenAI’s strategy of competing wherever there is a critical mass of AI dollars, including at the low end means that it was inevitable that they would come after this market. Though Sam’s spin is slightly more sophisticated:

towards intelligence too cheap to meter:
15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast.
most importantly, we think people will really, really like using the new model.
— Sam Altman (@sama)
5:09 PM • Jul 18, 2024

The model is at the quality/cost frontier

GPT-4o Mini, announced today, is very impressive for how cheap it is being offered 👀
With a MMLU score of 82% (reported by TechCrunch), it surpasses the quality of other smaller models including Gemini 1.5 Flash (79%) and Claude 3 Haiku (75%). What is particularly exciting is… x.com/i/web/status/1…
— Artificial Analysis (@ArtificialAnlys)
4:35 PM • Jul 18, 2024

Andrej Karpathy’s take was that we are in the process of distilling the rules for thought out of bigger and bigger models, separating knowledge base from thinking:

LLM model size competition is intensifying… backwards!

My bet is that we'll see models that "think" very well and reliably that are very very small. There is most likely a setting even of GPT-2 parameters for which most people will consider GPT-2 "smart". The reason current models are so large is because we're still being very wasteful during training - we're asking them to memorize the internet and, remarkably, they do and can e.g. recite SHA hashes of common numbers, or recall really esoteric facts. (Actually LLMs are really good at memorization, qualitatively a lot better than humans, sometimes needing just a single update to remember a lot of detail for a long time). But imagine if you were going to be tested, closed book, on reciting arbitrary passages of the internet given the first few words. This is the standard (pre)training objective for models today. The reason doing better is hard is because demonstrations of thinking are "entangled" with knowledge, in the training data.

Therefore, the models have to first get larger before they can get smaller, because we need their (automated) help to refactor and mold the training data into ideal, synthetic formats.

It's a staircase of improvement - of one model helping to generate the training data for next, until we're left with "perfect training set". When you train GPT-2 on it, it will be a really strong / smart model by today's standards. Maybe the MMLU will be a bit lower because it won't remember all of its chemistry perfectly. Maybe it needs to look something up once in a while to make sure.

This is not very different from Tesla with self-driving networks. What is the "offline tracker" (presented in AI day)? It is a synthetic data generating process, taking the previous, weaker (or e.g. singleframe, or bounding box only) models, running them over clips in an offline 3D+time reconstruction process, and generating cleaner training data, at scale, directly for the 3D multicam video networks. The same has to play out in LLMs.

@karpathy

How cheap is it?

$0.15/$0.60 per million input/output tokens, and half that for batched requests. That’s a 99% price drop from last year’s GPT4 pricing. More use cases will become feasible at this point.

Structured Output

It's also very good at structured output, which will impact a lot of parsing use cases. This is probably the most important industrial use case for small models.

4o-mini passes all the simple instructor evals for structured output. damn
— jason liu (@jxnlco)
5:27 PM • Jul 18, 2024

And it can be fine-tuned…

Constitution Testing

Finally, they’re testing role-based access controls in production:

GPT-4o mini in the API is the first model to apply our instruction hierarchy(opens in a new window) method, which helps to improve the model’s ability to resist jailbreaks, prompt injections, and system prompt extractions.

OpenAI

This is pretty exciting as the learnings from a production run will definitely be used in GPT-5 construction later this year.

Summary

All in all, it’s an exciting update on the path forward toward greater intelligence

Share this story

🌠Enjoying this edition of Emergent Behavior? Send this web link with a friend to help spread the word of technological progress and positive AI to the world!

Or send them the below subscription link:

🖼️ AI Artwork Of The Day

Alternate Universe DC movies - Slippin_Jimm from midjourney

That’s it for today! Become a subscriberfor daily breakdowns of what’s happening in the AI world:

Reply

or to participate.