2024-04-22: Zuck on Dwarkesh
🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.
Here’s today at a glance:
❄️ Zuck on Dwarkesh
Meta CEO Mark Zuckerberg launched Llama 3 on the Dwarkesh Patel podcast. A lot has been said on social media (yes, I admit some of it by me) about both Llama 3 and the podcast, but I’m going to cover the critical bits of narrative that were surprising to me.
TLDR: AI winter is here. Zuck is a realist and believes progress will be incremental from here on. No AGI for you in 2025.
Zuck is essentially a real-world growth pessimist. He thinks energy bottlenecks will start appearing soon, and that they will take decades to resolve. AI growth will thus be gated by real-world constraints.
I actually think before we hit that, you're going to run into energy constraints. I don't think anyone's built a gigawatt single training cluster yet. You run into these things that just end up being slower in the world.
I just think that there's all these physical constraints that make that unlikely to happen. I just don't really see that playing out. I think we'll have time to acclimate a bit.
Zuck, while leaning toward open source, is not committed to it. He would stop open-sourcing if:
The model becomes the product itself, or
Safety issues emerge
Maybe the model ends up being more of the product itself. I think it's a trickier economic calculation then, whether you open source that.
If at some point however there's some qualitative change in what the thing is capable of, and we feel like it's not responsible to open source it, then we won't.
He believes Meta will be able to move from Nvidia GPUs to custom silicon soon.
When we were able to move that to our own silicon, we're now able to use the more expensive NVIDIA GPUs only for training. At some point we will hopefully have silicon ourselves that we can be using for at first training some of the simpler things, then eventually training these really large models.
The models have gotten much more intelligent while not getting much bigger. This is a measure of compression: how much additional intelligence has been crammed into the same set of numbers, obtainable at the same energy cost at inference.
The smallest Llama-3 is basically as powerful as the biggest Llama-2
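A rough back-of-envelope of mine, using the standard ~2N FLOPs-per-token approximation for a dense N-parameter transformer (and treating energy as roughly proportional to FLOPs), shows what that compression buys at inference:

```python
# Back-of-envelope sketch: per-token inference cost for a dense
# transformer is roughly 2 FLOPs per parameter, so energy per token
# scales ~linearly with parameter count (assuming energy ~ FLOPs).

PARAMS = {"llama2-70b": 70e9, "llama3-8b": 8e9}

def flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2N)."""
    return 2 * n_params

ratio = flops_per_token(PARAMS["llama2-70b"]) / flops_per_token(PARAMS["llama3-8b"])
print(f"Llama-2 70B costs ~{ratio:.1f}x the per-token energy of Llama-3 8B")
# => ~8.8x: comparable capability at roughly an order of magnitude
# less inference energy, which is what "compression" means here.
```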
Meta over-purchased GPUs in 2022 because it was building a recommender for out-of-friend-network Instagram short videos, copying TikTok’s For You page. Zuck doubled the order just in case.
hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that.
The future is parasocial. He envisions creators running bots to interact with their fans.
If you could create something where that creator can basically own the AI, train it in the way they want, and engage their community, I think that's going to be super powerful.
Zuck believes cleaning up the data for pre-training will be the bulk of the work going forward. This ties in with the news that Llama 3 was trained on data that was cleaned up by Llama 2. In effect, one generation of intelligence builds the next.
in the future, more of what we call training for these big models is actually more along the lines of inference generating synthetic data to then go feed into the model
To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. We found that previous generations of Llama are surprisingly good at identifying high-quality data, hence we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.
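To make that concrete, here is a minimal sketch of what such a filtering stage might look like. The heuristics, thresholds, and the quality_score stub are my own illustrative assumptions, not Meta’s actual pipeline:

```python
import hashlib

def heuristic_ok(doc: str) -> bool:
    """Cheap heuristic filters: drop very short docs and symbol-heavy junk.
    (Thresholds here are illustrative assumptions.)"""
    if len(doc) < 200:
        return False
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in doc) / len(doc)
    return alpha_ratio > 0.8

def quality_score(doc: str) -> float:
    """Stub for a text-quality classifier. Per Meta's description, the
    classifier's training labels came from Llama 2 judging documents;
    here we just return a placeholder score."""
    return 1.0  # hypothetical: replace with a trained classifier

def filter_corpus(docs):
    seen = set()
    for doc in docs:
        # Exact dedup via hashing; real pipelines use *semantic* dedup,
        # e.g. near-duplicate detection over embeddings.
        h = hashlib.sha256(doc.encode()).hexdigest()
        if h in seen:
            continue
        seen.add(h)
        if heuristic_ok(doc) and quality_score(doc) > 0.5:
            yield doc
```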
Zuck believes intelligence is separable from consciousness.
The current incarnation of all this stuff feels like it's going in a direction where intelligence can be pretty separated from consciousness, agency, and things like that, which I think just makes it a super valuable tool.
Zuck draws a distinction between adversarial and non-adversarial systems. For non-adversarial systems, the AI will get better over time as the long tail of exceptions is nailed down. For adversarial systems, it will continue to be a never-ending game of cat and mouse.
Hate speech is not super adversarial in the sense that people aren't getting better at being racist. That's one where I think the AIs are generally getting way more sophisticated faster than people are at those issues.
Nation states trying to interfere in elections. That's an example where they absolutely have cutting edge technology and absolutely get better each year. So we block some technique, they learn what we did and come at us with a different technique. It's not like a person trying to say mean things. They have a goal. They're sophisticated. They have a lot of technology.
Conclusion
Overall, I was surprised by how negative the interview was.
A) Energy
Zuck is pessimistic about the real-world growth necessary to support the increase in compute. Meanwhile, raw compute per unit of energy has doubled every 2-3 years for the last decade. Jensen is also aware of this, and it beggars belief that he is not thinking about paths forward where he continues this ramp.
Meanwhile, at x.ai
At xAI, we have made maximizing useful compute per watt the key focus of our efforts. Over the past few months, our infrastructure has enabled us to minimize downtime and maintain a high Model Flop Utilization (MFU) even in the presence of unreliable hardware.
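For reference, MFU is simply achieved training throughput as a fraction of the hardware’s theoretical peak. A minimal sketch, with illustrative numbers of mine rather than xAI’s:

```python
def mfu(tokens_per_sec: float, n_params: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s over hardware peak.
    Uses the standard ~6N FLOPs per token (forward + backward pass)
    for a dense N-parameter model."""
    achieved = 6 * n_params * tokens_per_sec
    return achieved / peak_flops

# Illustrative: an 8B model at 1M tokens/s on a cluster whose
# aggregate peak is 100 PFLOP/s.
print(f"MFU = {mfu(1e6, 8e9, 100e15):.0%}")  # => 48%
```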
So energy efficiency, algorithmic and otherwise, is an obvious area where firms will focus. Zuck, meanwhile, is planning to move off Nvidia chips soon, basically believing that the returns to having a share of the most advanced compute cluster are diminishing.
Jensen’s presentation at Nvidia GTC last month revealed a 4x improvement in training energy efficiency over a 2-year period on a (wink, wink) GPT-4 class model.
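Quick arithmetic (mine) on how far ahead of the historical trend that is:

```python
# Historical trend: compute-per-joule doubles every 2-3 years.
# Over a 2-year window that implies only a ~1.6x-2x gain, so a
# claimed 4x in 2 years is well ahead of the curve.
for doubling_years in (2, 3):
    gain = 2 ** (2 / doubling_years)
    print(f"doubling every {doubling_years}y -> {gain:.1f}x over 2 years")
print("Jensen's claim: 4.0x over 2 years")
```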
So there you have it, the first Zuck excuse debunked: the inability of energy production to keep pace with compute demands. Just get more efficient, bro. That’s what everyone else is doing.
B) AGI Negative
Zuck fundamentally does not believe the model—the AI itself—will be the product. In his view, it is the context, the network graph of friendships per user, the moderation, the memory, and the infrastructure that is the product. This belief also allows him to freely release open-source models because he has all of the rest of the pieces of user-facing scaffolding already done.
Actual AGI, where a small model learns and accompanies the user for long periods while maintaining its own state, with a constitution of what it can or cannot do rather than frequent updates from a central server, would be detrimental to Meta’s business and would cause a re-evaluation of what they are doing. We are far away from this ideal, though the design trajectory is clear.
Notably, the OpenAI view has also evolved from just providing the API in the GPT-3 days, to ChatGPT as a product, and now the GPT Store and the Assistants API. The original view that all you need is a smart generalist agent has faded, and OpenAI has received criticism from some quarters for drifting away from the original mission in search of revenues.
I cannot tell to what extent the siloed top-secret nature of OpenAI’s work means that the product team is essentially running on much the same capability guesses as the rest of us. I actually suspect Greg and Sam make product choices with a view to future capability that the rest of the product organization does not know. I also sense that Satya knows.
We will see how well-founded my suspicions are.
C) Summary
Zuck has essentially settled into the trap of believing in incrementalism. He’s advised by the smartest people in the world. He’s technically competent. But he does not believe in states of the world where a 100x improvement from GPT-4 is possible, or that AGI is possible within a short timeframe.
But… he’s not raising capital.
The three people who are raising capital (Sam Altman, Elon Musk, and Dario Amodei) are all on record expecting dramatic increases in capability. They could be hyping because they need higher valuations. I don’t know.
They surely are hyping.
And so again, we wait for GPT-5. If it is 10-100x as good as GPT-4, the current benchmarks won’t even work (how does one measure 100x as good on an MMLU scale of 1-100?).
If the models deliver value, and that value exceeds many times over the capital deployed to develop them, then progress will continue. If not, not.
For me, the most exciting part of all of this is… 🍿🍿🍿 drama. You get to see who is bluffing, with billions of dollars on the line, in a fairly short timeframe. And I, for the most part, am in it for the sheer entertainment value of spectating potentially the greatest game mankind has ever played.
Cheers 🥂
I’d like to congratulate Dwarkesh for an excellent interview… and for some great ad reads as well 😜
Postscript
The great thing about Twitter is instant feedback. So since I posted a teaser to the above on Friday, there has been simply enormous online discussion, pulling in many people in tech and AI. Gathered below are the most interesting threads.
The model is not the product - net positive for AI wrapper startups
If the product is not the LLM itself, and instead all the things you need to make it actually usable by humans, then literally 10,000,000 startup flowers will bloom
NVIDIA - Friday also happened to be a huge downmove day for NVIDIA and across the entire semiconductor sector. I hit publish at 5:30 pm-ish PST and was aware of the down news prior, which plausibly could have swung my TLDR caption (I usually go with the most viral vibe from the universe). So I was not responsible for Friday…
Is Nvidia down today because a 8B model is almost on par with a 70B model or did something else happen today?
— Dhaval Shroff (@dhaval_shroff)
12:29 AM • Apr 20, 2024
After my post went out, opinion started to solidify. So Monday, if it is a bloody… well…
Okay so if we are going to have an ugly over reaction this is about to be a fucking awesome period of time for semi stocks.
I’ve been so bored because of the factor and semi thing, it’s time to have some fun ;)
— Fabricated Knowledge (@_fabknowledge_)
4:37 PM • Apr 20, 2024
NVIDIA folks did push back
AI winter? No. Even if GPT-5 plateaus. Robotics hasn’t even started to scale yet.
Embodied intelligence in the physical world will be a powerhouse for economic value. Friendly reminder to everyone that LLM is not all of AI. It is just one piece of a bigger puzzle.
AI Winter - There was plenty of pushback on the “AI Winter“ characterization and whether the interview as a whole was pessimistic. On a personal basis, I’m a believer in Kurzweil, but I’m also practical, and I’ve been trying to connect AGI 2029 and Singularity 2045 with the concrete steps that need to happen to get there.
Zuck’s view that even the energy buildout would take a couple of decades is essentially pessimistic. In fact, to go further, it is a Malthusian Limits to Growth view that ignores potential innovations to get around constraints… for example, x.ai redefining compute as “useful compute“, thereby presumably reducing wasteful compute cycles (of which there are many).
It surprises me that someone watches this podcast & concludes that Zuck was being pessimistic. Zuck never said an AI winter is here. I watched the full thing and left with a completely different impression, didn't seem pessimistic at all and these quotes are very out of context.
Energy - Broad acknowledgment of the issues around energy consumption, and thoughts and comments about ways around it.
This was my position before Zuck said it.
Changed my predictions when I learned that there is a point where the physical constraints require Manhattan Project level buy in from national governments for progress to continue.
Noah, looking on the bright side: AI won’t consume all the energy, leaving nothing for us.
I've seen some folks like @8teAPi post this graph as evidence of energy constraints on AI.
And in one sense, it is. If our ability to turn watts of power into FLOPS of compute stagnates, that does mean that AI will be constrained by a lack of energy.
But it doesn't just mean that. It means that there's also a constraint on compute itself.
Imagine if our ability to turn energy into compute just kept improving and improving. Since AI improves with more compute, that means AI keeps becoming a more and more valuable use of energy. At some point, AI starts competing for energy use with agriculture, home heating, transportation, etc.
This can lead to the scenario in which AI gets so good that it impoverishes (much of) humanity.
BUT, if there's some limitation on our ability to turn energy into compute, then it won't make sense to keep turning more and more and more energy into AI. Eventually, the economic value you get from adding more compute isn't worth the energy cost. At that point, a stable equilibrium is reached, in which energy is left over for human uses.
In other words, if this graph flatlines, I don't think we have to worry about AI turning humans into horses.
Ways to overcome the bottleneck:
Starting to feel the top of an s-curve for the current tech tree.
We're working on the next few S-curves, let us cook.
Disgruntlement - Those who accepted my characterization of Zuck’s essential views, but thought he was wrong (I myself am in this category)
I love when people who started thinking about AI around 1.5 years ago come onto the scene and say stuff like yeah it’s all incremental progress from here we hit the top
— roon (@tszzl)
8:28 PM • Apr 20, 2024
In essence, I think Zuck, amazingly, has just missed the progress in the field. His distance from the rank-and-file machine learning engineering teams is showing. His focus on software engineering over training models is evident. I foresee a future where Meta experiences what I call a difficulty collapse: when the moat that surrounds your company, engagement, becomes exceedingly easy to surpass due to a sudden change in technological capability.
📣 Announcement
From this week on, Emergent Behavior will move to a two- to three-times-a-week schedule, as I try to focus more on in-depth writing on truly interesting issues in AI.
🌠 Enjoying this edition of Emergent Behavior? Send this web link to a friend to help spread the word of technological progress and positive AI to the world!
🖼️ AI Artwork Of The Day
Countries as anime characters - u/prompted_ from r/midjourney
That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world: