EB-1: SemiAnalysis - Transcript - Part 2

Transcript: SemiAnalysis - Part 2



[32:32.6]πŸ’Ό Dylan Patel: It wasn't a technological showcase, right? Like people have done more impressive stuff with these models before that, right? But it was like a, oh wow, this changes advertising, right? But then, you know, at the same time, like, I don't know if you watched the Super Bowl, but Kanye West did a fricking ad where he recorded it with his iPhone camera like four feet away from his face. So like, you know, is the face of advertising changing? And is it changing so much that like Kanye West had such a successful ad because he did that? So I don't know, right?

[32:37.253]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right. Mm-hmm.

[32:41.839]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[32:50.368]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[32:55.619]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[32:59.883]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah, yeah, indeed, indeed. I mean, it boils down to authenticity, right? Like Kanye, like, you know, he's a genius, right? He understands what he's getting himself into. So I thought that was like, you know, $7 million well spent, and, you know, he had more traffic out of that than, you know, I think anything else he could have done. It was amazing, right? So.
Let me diverge a little bit. So the first big news of this week was probably the $7 trillion number, which kind of like seized everyone's minds for a few days, probably caused Nvidia stock to jump, like I don't know, 10%, 20%. And the reporting on it is very sparse. From what I could gather. And you know.
Oftentimes in these business things, Sam might have said something in a meeting, but what the other guy caught is not what Sam meant. You could be like, oh yeah, I think you could spend $7 trillion on chips. That would be great. And then the other guy is like, Sam Altman wants to spend $7 trillion on chips. So I don't know to what extent there's this kind of like a two-way mutual misunderstanding.
But it seems like Sam was, or at least was reported to be, looking at, and not raising, I don't think he was looking at raising, he was looking at organizing a scheme where OpenAI would be willing to provide purchase guarantees to a bunch of suppliers who would set up factories, TSMC would run and operate the factories, and these Asian and Gulf investors would invest in them.
And he would basically get a lot of chips. And it was a mixture of debt and equity. And I think, you know, the SF VC space doesn't really understand debt that much, but that would have meant, out of maybe $7 trillion, maybe 80% of it would be debt. There's only like $1.4 trillion of equity. So, you know, the returns on that, and, you know, Gulf investors don't require

[35:24.547]πŸ‘©β€πŸŽ€ Ate-A-Pi: you know, 100X returns, they're looking at like, you know, if they can get something like 15% to 18%, they're okay. So I think it's not very well understood because people are like, oh, it's equivalent to like venture investment on seven trillion, it's not, it's equivalent to like, kind of like, you know, solar investment of like, you know, a trillion, right? Like it's 18% returns. So like, what do you think about that? Like, what is your guess on that? Like we're all speculating. So what do you think on that one?

[35:54.036]πŸ’Ό Dylan Patel: I think you kind of put a little bit more intelligence to what the reporting has been. But I kind of want to take a step back first and see what exactly, people are like, what the fuck, $7 trillion, it's so stupid. And then you see so many hot takes about, oh, this doesn't make sense, blah, blah. And it's like, you have to believe what exactly is happening.

[36:17.913]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[36:18.348]πŸ’Ό Dylan Patel: And for context, right? Like you look at like OpenAI's history, of course, they've done other stuff. They did the Dota bot and they've done like Whisper and DALL-E and all this sort of stuff, but really like, you know, to me, it seems like GPT-2, right? You know, cost millions to train and you know, GPT-3, you know, at the time, probably tens of millions. Nowadays you could do a GPT-3 run for like, you know, I think less than 1.5 million.
Maybe even less than 1 million for GPT-3, like that many parameters and tokens in the public cloud, like a CoreWeave or whatever, or a Lambda, or a Fluidstack, Voltage Park, et cetera, go on and on, like one of these folks. SF Compute, yeah, those are good friends. But you could do that kind of run, that many, 175 billion parameters and whatever, 300-something billion tokens. You could do that for less than a million dollars probably.
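The "GPT-3 for under a million dollars" claim can be sanity-checked with the standard 6ND FLOPs approximation for dense-transformer training. The GPU throughput, utilization, and rental price below are my assumptions, not figures from the conversation:

```python
# Back-of-envelope check of training cost using the common 6*N*D FLOPs
# approximation (N = parameters, D = training tokens).
# Throughput, MFU, and $/GPU-hour are illustrative assumptions.

def training_cost_usd(params, tokens, peak_flops, mfu, dollars_per_gpu_hour):
    """Estimate rental cost of a dense-transformer training run."""
    total_flops = 6 * params * tokens          # forward + backward passes
    effective_flops = peak_flops * mfu         # realized throughput per GPU
    gpu_hours = total_flops / effective_flops / 3600
    return gpu_hours * dollars_per_gpu_hour

# GPT-3-scale run: 175B params, ~300B tokens, on an assumed modern GPU
# at ~1e15 FLOP/s peak, 40% utilization, $2/GPU-hour rental.
cost = training_cost_usd(175e9, 300e9, 1e15, 0.40, 2.00)
print(f"~${cost:,.0f}")  # → ~$437,500 under these assumptions
```

Under those assumptions the run lands well under $1 million, consistent with the point being made; the exact figure moves with utilization and rental pricing.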

[36:42.058]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[36:45.691]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[36:50.447]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[37:06.219]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm hmm. Yeah.
All right.

[37:11.236]πŸ’Ό Dylan Patel: you know, at the time it probably cost them 10 million, right? And then GPT-3.5 probably cost them in the, you know, $100 million range, right? And 4 was definitely around $500 million, right? And it's like, well, what's 5, right? As soon as they had 4 completed, they had, I don't think they had released it yet before Microsoft made the deal, but it was a $10 billion investment, right? So, so we can go right there and say, oh, okay. So GPT-5 is going to cost $5 billion to train, right? You know, that's not all compute. There's, there's people, there's

[37:22.523]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[37:29.883]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[37:35.867]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[37:38.28]πŸ’Ό Dylan Patel: Tons of experimentation on architecture and things like that. It's not all just the pre-training, right? But you know, hey, $5 billion, right? OpenAI is gonna drop GPT-5, or they might just show it to some people and not even release it publicly, and they're gonna be able to raise a fuckload more money. Right, like the people are gonna look at it and be like, wow, you know, between this and Sora and whatever else, right, like they come up with DALL-E 4, you know, whatever they can come up with in the meantime, they're gonna be able to raise, you know, not $10 billion for 49% of the company, but maybe 50 or a hundred billion dollars.

[37:43.796]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[37:51.063]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[38:01.9]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[38:05.975]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[38:06.972]πŸ’Ό Dylan Patel: for a smaller chunk of the company, right? With Microsoft still contributing and keeping their share and then, you know, just like laying out like the potential path, right? In addition to that, people are gonna wanna invest with them. It's like, hey, OpenAI invested in this company, right? Or OpenAI's agreeing to purchase from this company, right? You know, they have a lot of ambitions that the media have reported, right? There's the client device with Jony Ive and there's the chips. There's the capacity corridors. There's all these sorts of things

[38:15.333]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[38:21.017]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.
All right.
All right.

[38:31.086]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-mm.

[38:35.713]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[38:36.992]πŸ’Ό Dylan Patel: seen, there's the search, I think, I think just this week, someone, you know, I think someone, someone leaked that they're doing a search product. That's not just Bing. It might incorporate some of that, but like kind of probably thinking like perplexity style, I don't know. Better than, you know, current GPD search. I don't know, right. What, what, what exactly that's going to be. But like, there's all these ambitions and, and they're going to, you know, some of them are going to be fully funded by them, right. And their investments and some of them are going to be like partnerships, right. You know, like, hey, you know, we work with this company to do that. Right. And so I think

[38:41.803]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yes. Right.

[38:48.947]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[38:57.716]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[39:03.716]πŸ’Ό Dylan Patel: I think that's going to happen more and more, right? Like, you know, there's the soft bank, you know, on the, on the consumer device, right? Like what exactly happens there? I got no clue. Um, on the semi side, I think there's, there's a bit more clarity about what's going on, uh, but I don't want to say, you know, so much, I'm not saying I know it much, right? Just people have murmured stuff to me. And so I want to be respectful of that. Um, so I don't want to, I don't want to, you know, again, like I said, I don't know much, uh, but, um, a little bit more than was out there, so I don't want to say too much, but like, it just makes sense, right? You go GPD five, 50, a hundred billion dollars, right

[39:08.507]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[39:18.043]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[39:21.211]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[39:32.824]πŸ’Ό Dylan Patel: after they show everyone GPT-5 and agents and stuff, right? Like, and then they showed GPT-6 and it's like, oh my God, this is either AGI or this is like really freaking close in like many, many tasks and then the agent tasks, right? And it's like, yeah, we've already ingested the entirety of video content and Unreal Engine, it understands physics and it understands, you know, oh, we can, you know, next version we can put on a robot, right? Like, and it'll be autonomously functioning, right? GPT-7, all right, yeah, cool, a trillion dollars, right? Like there's a path, maybe there's a couple more steps in the middle, right?

[39:33.967]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[39:44.811]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[39:55.756]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[40:01.68]πŸ’Ό Dylan Patel: But it's very clear to me that they can continue to just YOLO, right? Like just fucking like YOLO everything they have on making the next step, ingesting more data, more larger model size, and all the architectural gains they get as well, right? And on a simplistic term, that's going to be able to enable them to raise more and more money from the world, right? Now, I think there's certain things like, you can't just raise money from the Middle East.

[40:02.02]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[40:14.843]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.
Mm-hmm.

[40:30.904]πŸ’Ό Dylan Patel: willy-nilly and expect the US government, SIFIUS, not to step in, right? So I think there's a lot more here than like, you know, like just YOLO, like there is a lot of hard work, but like OpenAI is just killer at that, right? Like they're gonna execute, but at the same time, like they're an existential threat, right? Like, you know, for at least the next two years, right? Google just has more compute, right? And this is the whole like GPU poor thing, right? It's like, yeah, everyone's GPU poor when you look at Google.

[40:36.891]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[40:56.078]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[40:58.912]πŸ’Ό Dylan Patel: Right? And up until like very recently, it was like, Google, what the hell are you doing with all this? Oh, this compute. And now like, you know, they released Gemini one and then they released 1.5 pro today with a 10 million context length. Right? And it's like, okay, that's, and it's MOE. So it's, it's Gemini two basically, but they, I guess they call it 1.5, uh, for branding purposes. Um, and they're probably going to have a next version. And they're going to have the Gemini 1.5 ultra, and then they're going to have, you know, the next version after that all out this year, probably. Right? So they're fucking chippin.

[41:03.803]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[41:09.531]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[41:23.6]πŸ’Ό Dylan Patel: Right. Um, so we'll see, right. It's, it's, there is a tremendous benefit to like being able to run, you know, everyone's experiments done at a larger scale slash able to do more experiments. Right. And, and, and open AI kind of, you know, in the meantime, they don't have as much, but they're, they're rapidly catching up in terms of what they'll be able to do and throw at the problem on a training standpoint, and maybe that only takes a year or two, but so they have to YOLO and they have to be very concentrated in their bets and, and Hey, they, they have such a good team, they'll be able to pull it off, probably.

[41:24.923]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[41:42.616]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[41:52.539]πŸ‘©β€πŸŽ€ Ate-A-Pi: So let me give you this scenario. Let's say you have a researcher who is, and he's sitting in Google, and he has an idea, and he tries it. And as always, you try it at small scale, and he tries it at small scale, and bigger and bigger, and it doesn't work. But he knows he's going to get promoted if it works. So he goes out and he says, look, scaling will work. Scale is all you need.
So this idea will work if you give me the chips, right? If you give me the compute time, this idea will work. Now, how does someone like a senior engineering manager at Google, how do they make those decisions you think? Because I could see, for example, every single researcher in there thinks their idea is gonna work if they had the compute. Like I could see, if I was there and I had like...
like, you know, 10 or 20 guys, and every guy is like, hey, you know, if you gave me the compute, this is gonna work. And how do you make those, how do you think they can make those decisions? Or is it like, if I don't give it to the guy, he's gonna leave, right? Like, and that's really why.

[43:06.304]πŸ’Ό Dylan Patel: I think it's probably a bit of both, right? It is hard to allocate resources and figure out stuff like that, right? Because I think a lot of people have left, right? I mean, and you go back to even like, all the way back to like, Noam Shazeer wanted to do some badass shit, right? And he figured out everything with LLMs and he wrote the Meena Eats the World internal memo, and...

[43:32.943]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[43:33.508]πŸ’Ό Dylan Patel: you know, he, I guess, I guess he just felt like no one cared. And so he went out and built a character instead, and which was very different from like ML research, but it was still, um, you know, a very, um, interesting sort of, uh, you know, path to go down, right. And at the same time, like this is constantly happening where so many, you know, Google people are, are leaving, right. Um, and Laurent, you know, just left recently from Google and he's great. And.

[43:41.259]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[43:51.566]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[44:00.832]πŸ’Ό Dylan Patel: And maybe he felt the same way. I don't know him personally. I just know he left and doing new company, right? Mistral, some folks like that, right? That was meta plus there, but there's a lot of people leaving Google, but at the same time, there's tremendous talent there, right? But how do they allocate resources? That's a wild bet. There's clues to this on Twitter, of people talking about this, murmurs about like...

[44:20.973]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[44:28.08]πŸ’Ό Dylan Patel: this and that, but it's like, you know, there's been people who just like, yeah, Sam's great because he just know, he just has like this intuition on where to allocate compute, right? Like, or like, you know, this other person is great because they just have a intuition on where to allocate compute or, you know, vice versa. It's like, I've seen someone be like, yeah, Sam promised compute here and Sam promised compute there. It's just not enough to go around. And so this person left, right? Like I've seen the same happen to OpenAI too, not just like Google and not necessarily just Sam. I used him as a placeholder because there's a lot of other people there who make those sorts of decisions, but it's like, I've seen stuff like that.

[44:37.332]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[44:51.416]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right, right.

[44:58.116]πŸ’Ό Dylan Patel: you know, whether it's talking to people or on Twitter, right? And it's, and then it happens at Google happens at opening. I guess, obviously, you know, it's, it's going to happen at Google the most given it is the biggest organization. Um, but like, you know, at the end of the day, like that's, there's always going to be corporate politics, right? But yeah, and scale is important, but like scaling laws are more important, right? Like, you know, why, why is, why is, uh, you know, whatever architecture change I make, it has to be better at, at every point along the curve.

[45:20.6]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[45:28.016]πŸ’Ό Dylan Patel: or at least better at far out on the curve. And so that's the other problem, is I don't think people like, I don't think scale is all you need. It's scale with a better architecture. Transformers alone will scale like crazy, but if you try and do what Google did, a 10 million sequence length, that's N squared on the attention. You cannot do that. That is not physically possible because that would balloon your memory requirements for your KV cache to terabytes. Not possible to do that.

[45:31.413]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[45:57.434]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[45:58.356]πŸ’Ό Dylan Patel: So Google did something intelligent there. They did something architecturally, which scales. And then they applied scale to it. So now we're going to see that scale up to 10 million sequence length, or we're going to see some evolution of that 100 million, a billion. How long can the sequence length get without exploding your memory size? What's going to happen? But it's not just scale. It has to be a relevant architecture. Scale can band-aid over a lot of things, which is

[46:04.012]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[46:12.73]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[46:19.579]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[46:26.096]πŸ’Ό Dylan Patel: probably why Gemini 1 was a dense model, but now Gemini 2 has got all these cool things, right? Or 1.5, right? Has the 10 million sequence length, it's MOE, and all these other things that they're probably doing that are cool on the backend that we just don't know about.

[46:37.531]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yep.

[46:41.995]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right, right. So going back a second here, like, you know, I'm diverting again, but what do you think is the likelihood that we're going to see a pretty, like,
vibrant and competition for NVIDIA's business? Like, is AMD gonna get AMD funded, an open source kind of CUDA, and I think they funded one, I don't know whether the library's been released. Or what do you think they're gonna see, significant competition to NVIDIA's position?
From all these like grocs or a brass and you know, these other guys or do you think you know, it's You know, it's not gonna happen



[48:29.727]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh, there you are. Yep, yep, that's good. Yeah. So as I said, what do you think? You have Groq, Cerebras. What do you think, is there going to be competition to Nvidia's market position?


[48:43.705]πŸ’Ό Dylan Patel: Yeah. So, I mean, Nvidia is sitting on the throne, right? So, they're obviously having a ton of competition try and come up, right? And obviously, there's AI hardware startups, there's established players in the semiconductor industry, and then there's the hyperscalers, right? And so, kind of taking it through from each of those, right? On the startup side, right? Of course, there's startups like Grok and Cerebris, which have found moderate success. I will say moderate because they're still in business.

[49:08.741]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[49:11.173]πŸ’Ό Dylan Patel: And I foresee them continuing to remain in business, at least for the next year or two, minimum. Whereas there's other firms like Graphcore and Samanova and so on and so forth that are going to fail. They're just not doing well. They're not selling anything. And so I just expect those companies to fail. Maybe I'm wrong, but I think that's the case. And those are competition, but I don't think those are the ones that Nvidia is like sweating

[49:12.824]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh
Brutal. Yeah.

[49:40.513]πŸ’Ό Dylan Patel: their mind off of on, right? Of course, there's new AI hardware startups, right? Like, Maddox, Google folks, right? Leaving, you know, Positron, Etched, all have interesting ideas, but we'll see. Those are kind of further out, if you will, just because they're newer. Maybe a year old, a little bit over a year old. But then there's the next step, which is like, you know, hey, there's the existing players in the semi-interact industry, right?

[49:42.395]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.


[50:10.809]πŸ’Ό Dylan Patel: They're mostly picking backing off of PyTorch and Triton, but of course they have some of their own kernels and stuff like that. Software is fine. It's not the best thing there, but it's not horrendous for MI300 as it used to be in the past. And they're picking up business between Meta and Microsoft and Oracle and a few other players. There's something like $5 to $6 billion of purchases of MI300 this year. So that's not insignificant.

[50:25.54]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.


[50:40.293]πŸ’Ό Dylan Patel: Nvidia is going to sell $100 billion. So, it's kind of like, what's the scale? More than $100 billion, actually, but this year. And then you have the next step, which is the hyperscalers. And so I would group, Google has got their own chip, obviously the most developed TPU, spending $10 billion on it this year. And they work with established semiconductor companies like Broadcom.

[50:43.208]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh, wow.


[51:07.757]πŸ’Ό Dylan Patel: Amazon, they have their Inferentia and Tranium much further behind, but they're catching up and the new one looks a lot better. That is also, they're working with Alchip and Marvell, established semiconductor players. Then there's Microsoft and they have their chip and Meta. They have their chip and is working with Broadcom as well, established semiconductor players. Those chips are generally a bit...

[51:23.663]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah. All right.

[51:35.433]πŸ’Ό Dylan Patel: behind except for in the case of the TPU where there's an argument there. Um, but you know, there's always, you know, the argument of like, Hey, the chip is good, but it's like, well, I don't get to use any of Google's fancy tools, so why would I ever use it externally? Right. The only people that use TPUs externally seems to be, is people who span out of Google and started a startup, right? Character mid journey, you know, uh, you know, some of these, um, you know, various places, there's very few people who did not spin out of Google and use the TPU still, um, externally.

[51:39.471]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[51:47.62]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[51:52.479]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[52:00.956]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[52:01.737]πŸ’Ό Dylan Patel: But even then, there's some folks, like Stability had some TPUs. I don't think they have a ton anymore, but they had some TPUs. There's some other folks that use TPUs, like Apple rented some. But it's not quite challenging in video yet until they figure out how to release deep minds, like software tooling.
Why can't I just use, you know, why can't I just point my model file, you know, my, and, and in my weights and have the inference be hyper efficient, right? Um, no, that doesn't, that doesn't, that's not how it works at, uh, you know, Google deep mind either, right? There's specific people who like optimize the fuck out of it, but the starting state through the compiler and all that is actually quite good and the load balancing and all this sort of stuff that it does on the back end, there's a lot of software on the training side that is also like just taken care of. And like the line researcher doesn't have to worry about it all. And like none of those tools are released externally.

[52:33.241]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[52:41.047]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[52:52.539]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[52:53.653]πŸ’Ό Dylan Patel: So until that stuff happens, why would anyone use it? And Nvidia doesn't have those kind of tools always either, but there's companies out there making tooling or helping with that. Together we'll help you train a model. Databricks will help you train a model. And you go down the list. These companies will help you train a model or they'll help you fine tune or you go on and on and on. An existing open source model. And Google doesn't have any of that. And then Amazon and Microsoft and Meta certainly don't. That doesn't necessarily mean those chips won't be relevant. Their internal use cases can be large enough.

[53:03.063]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[53:11.32]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yep.

[53:21.525]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[53:23.933]πŸ’Ό Dylan Patel: or internal plus like, hey, Anthropic is like really closely tied, right? Like with Amazon now, you know, and they also use TPUs, but you know, with Amazon, especially with Tranium and such, with regards to... There's also like the sort of the next line, which is like, OpenAI and their chips, right? Like I would kind of classify that as on the hyperscale side, right? Because they have a captive customer in themselves, right? If the chip is good enough, whereas like, you know, the startups, even though

[53:24.568]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[53:29.878]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[53:37.615]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh.

[53:49.691]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[53:52.421]πŸ’Ό Dylan Patel: in a sense, right? They kind of at least have a customer and they have money, right? Whereas those startups have to fight for their lives to get there. So I think that's the landscape. And as far as like, is there a competitor? I left out Intel too, but I don't think they're going to be too successful. But as far as the landscape, like Nvidia definitely remains on top, but they can't just sit idle because there's so many people gunning for them, which is why they've done something that's unprecedented, right? They're moving to releasing a new GPU every year.

[53:57.313]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.
Right.

[54:07.041]πŸ‘©β€πŸŽ€ Ate-A-Pi: Hahaha.

[54:14.522]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[54:19.745]πŸ’Ό Dylan Patel: Historically, they did not do that. They did it every two years, right? Even when you look at the most successful semiconductor franchises in the history, right? They were more like 16 to 18 months, right? Release schedules, right? Like Intel, when they were on the top of their game was still like 18 months, right? AMD, the early 2000s when they were top of their game, still was 18 months, right? Like it wasn't annual, right? And AMD on CPUs is 18 months, right? Like, you look back to IBM when they were killing it, it was like 18 months for mainframes, right? It was like...

[54:20.324]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[54:26.68]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[54:48.217]πŸ’Ό Dylan Patel: So Nvidia is trying to accelerate that to annual releases of brand new architectures is pretty crazy, right? Like ampere to hopper to blackwell level jumps, but on an annual basis. So we'll see if they're successful, but that's, you know, they recognize that there's this massive competition and they have to stay ahead, right?

[54:59.791]πŸ‘©β€πŸŽ€ Ate-A-Pi: So the next one will be the H200, right?

[55:06.223]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right. So the next one, the next jump is the H200, right? From the H100. Is that the next jump?

[55:12.993]πŸ’Ό Dylan Patel: H200 is still the same chip. It's just faster memory and more memory. It's still the same chip though, right? And architecture and you know, the next jump, sorry, H200 is coming out. But then the next jump really is B100, which I suspect will be announced next month at GTC, Nvidia's conference. And so I think everyone will want to watch that because that'll be so much fun. But that's the next Blackwell, right? And then after that is a.

[55:17.303]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right. I see.

[55:28.567]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right, right, right.

[55:38.349]πŸ’Ό Dylan Patel: is they call it X100 externally, but I think it's called R100 internally. I don't know. Ruby, they kind of name every generation after a new scientist or a scientist. So this one's Ruby, I think. Not 100% sure. But anyways, they're going to move to an annual release schedule instead of every two years, right? Because Hopper was two years ago. I guess call it three years ago now, but two years ago, right?

[55:49.187]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[56:03.045]πŸ’Ό Dylan Patel: And Ampere was two years before that and V100, Volta was two years before that, and Pascal P100 was two years before that, kind of. So they're kind of trying to move to a one year schedule and that's gonna be exciting to see. And that would maybe make it so they can defend their moat, right? But otherwise, yeah, absolutely. Google's only gonna buy more percentage TPU than they did GPU. They still buy GPUs, but their percentage is only gonna skew more towards TPU. And same with Amazon and same with Meta and same with...

[56:15.483]πŸ‘©β€πŸŽ€ Ate-A-Pi: So.

[56:30.173]πŸ’Ό Dylan Patel: You know, you got Microsoft, you go down the list and they're of course going to try and buy AMD and so on and so forth, right? Other people are gunning. So the only way Nvidia can keep up, you know, accelerate is if they create these new customers like in all these new clouds like Core Weaves and Voltage Park and Fluidstack and you just go to Lambda Labs and all these sorts of folks, right? These secondary clouds or they create and they create like software modes, right? Around like, you know, they're trying to do really cool stuff in various verticals of software and they're trying in robotics and drug discovery. I don't know how successful they're being quite yet.

[56:33.389]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[56:44.966]πŸ‘©β€πŸŽ€ Ate-A-Pi: Mm-hmm.

[56:56.268]πŸ‘©β€πŸŽ€ Ate-A-Pi: it.

[56:59.873]πŸ’Ό Dylan Patel: to have captive markets. They're trying to do model training services for companies, right? You know, see how successful that is, I don't know. But, you know, they're trying to go down all these avenues to create modes, but the number one way to create a mode is just still continue to have the best software and hardware. And, you know, the thing about it is Nvidia charges makes so much money per GPU that others can have a worse chip, but it still makes more sense to buy that chip, right?

[57:14.153]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.

[57:19.608]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[57:23.513]πŸ’Ό Dylan Patel: And that's sort of the case where AMD is at, right? Their chip costs twice as much to manufacture, right? MI300 versus H100, a little bit more than twice as much, but they're selling it for less, right? And they're still making money, right? And that's just because that's a testament to how much Nvidia is making.

[57:23.759]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right, all right. All right.

[57:30.113]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[57:34.599]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[57:38.339]πŸ‘©β€πŸŽ€ Ate-A-Pi: So there is a conference that Jensen went to in Taiwan last year where he has this kind of graph of the T-flops increase over the last 10 years. And he's like, it's a million-fold increase that they've done. And he's like, we're going to do another million-fold in the next 10 years. And he has this graph. I have it on my Twitter somewhere. And he's not very clear on, is it T-flops? Is it T-flops per?
per like a dollar or he's not very, very clear on the graph. It's one of those like marketing graphs without like clear axes. But my take on it was basically it's just kind of T-flops increase overall and he's kind of like agnostic about the pricing. But basically a one million fold increase over 10 years is kind of like a Forex increase every year, right? It's a compounding Forex increase every year. And I'm just like wondering.
Is that like a something that people can expect? Is that something that you would say someone would be like, you know what, we're gonna get a per chip kind of like TFLOPs increase like 4x per year for the next 10 years. Is that something that you kind of capacity plan for? Is that something that you kind of think about or do you kind of think about like.
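The compounding in that question checks out as stated:

```python
# A million-fold improvement over 10 years implies roughly 4x per year:
# solve annual**10 == 1_000_000 for the annual multiplier.
annual = 1_000_000 ** (1 / 10)
print(f"{annual:.2f}x per year")  # → 3.98x per year
```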

[58:56.145]πŸ’Ό Dylan Patel: I think the pace will be slower than that. I think part of the number that they showed is Jensen Math, which is always promotional. To be fair, fantastic marketer, fantastic technical person, but he knows how to say things to get people to think things.

[59:01.996]πŸ‘©β€πŸŽ€ Ate-A-Pi: I see.

[59:10.267]πŸ‘©β€πŸŽ€ Ate-A-Pi: All right.
All right.

[59:16.329]πŸ’Ό Dylan Patel: So the math he did is sort of like, yes, can video, you know, officially, what is the performance increase from A100 to H100? And they will say, you know, what do they say? 2000 divided by 312, effectively is what they say, which is 6.4x, right? But everyone knows it's like three to 3.5x, right? In performance, right? And so that's sort of like that gate there, right? But I think that you'll see solid performance increases, but you won't see 4x, especially not annually.

[59:16.597]πŸ‘©β€πŸŽ€ Ate-A-Pi: Right.

[59:24.206]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.

[59:32.857]πŸ‘©β€πŸŽ€ Ate-A-Pi: Yeah.
Right.

[59:39.732]πŸ‘©β€πŸŽ€ Ate-A-Pi: I see.

[59:43.435]πŸ‘©β€πŸŽ€ Ate-A-Pi: I see. I see. Right.

[59:44.405]πŸ’Ό Dylan Patel: I don't even think you see 4x every two years, in terms of just chip performance. Now, workload performance can differ significantly from chip performance.

[59:54.999]πŸ‘©β€πŸŽ€ Ate-A-Pi: So another question for you. When people budget for these chips, do you assume it's just going to depreciate 75% per year? Because if the new chip has a 4x performance increase and you're on par, it's the same price or equivalent, do you budget for a 75% depreciation per year? It's worth 1 fourth. If you buy a chip today, is it worth 1 fourth in a year, in 12 months?

[01:00:25.449]πŸ’Ό Dylan Patel: Yeah, but like, you can't do the work you want to do if you don't buy the chip today, right? Like that's the problem, right? If I don't buy the chip today, then other people are gonna race ahead and implement. And like, you know, the feeling that everyone working in AI kind of has is like it's now or never, right? Like it is my time to like make it or I'll just be like ruled by the AI overlords, right? Like kind of is like what a lot of people seem to think in AI, right? You know, I have to do what I'm gonna do otherwise.

[01:00:32.476]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh my gosh. That's a, that's crazy.

[01:00:37.917]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh, that's crazy.

[01:00:42.943]πŸ‘©β€πŸŽ€ Ate-A-Pi: Oh, wow. Wow.

[01:00:51.106]πŸ‘©β€πŸŽ€ Ate-A-Pi: Wow.
