NCI: Nvidia and GPUs: Are Datacenters the new compute unit?

Guests:
Ram Ahluwalia & Doug O'Laughlin
Date:
05/07/24

Thank you for listening to this episode!

Support our podcast by spreading the word to new listeners. We deeply appreciate your support!

Episode Description

In this episode, we delve into the semiconductor and AI industries with Doug O'Laughlin, a renowned expert and the mastermind behind the Fabricated Knowledge newsletter. He shares his profound insights with Non-Consensus Investing host Ram Ahluwalia, providing a unique perspective on the industry. The discussions cover the challenges and opportunities in the semiconductor space post-Moore's Law, the strategic investment approaches towards picks and shovels plays, and the anticipated trends within the PC and automotive sectors. They also explore the semiconductor super cycle, the significance of High Bandwidth Memory for AI and data centers, and the roles of key players such as Micron, SK Hynix, and Samsung. The conversations broaden to cover investment strategies, the importance of diversification, and the emerging technologies that could reshape the future tech landscape, including small modular reactors and EDA software, framing a comprehensive view of the evolving semiconductor market and its broader implications.

Episode Transcript

Speaker 1 [00:00:00] All right. Welcome. I am pleased to host Doug O'Laughlin from Fabricated Knowledge, one of my favorite Substack columns discussing all things semiconductors. So welcome to the next episode of Non-Consensus Investing. I'm Ram Ahluwalia, your host and chief investment officer at Lumida Wealth, where we specialize in the craft of investing in alternatives. In this podcast, we're going to draw back the curtain to reveal the strategies employed by the best in the business, so that you can also invest beyond the ordinary. Doug is the mastermind behind Fabricated Knowledge, a top-rated tech newsletter where he decodes the semiconductor industry. We've been very much focused on semiconductors because we believe it's one of the best ways to play the rise of AI. You've got a bunch of players at the application layer, whether it's Meta, Google, once Inflection, now Microsoft, and others, spending on compute, and that flows down to semiconductors. We love picks-and-shovels plays. Doug, after a career at a Texas firm, Bowie Capital, where he focused on long-term investing, developed an interest in semiconductors, especially the changing landscape with the end of Moore's Law. That inspired him to launch Fabricated Knowledge to dive deeper. The rest is history. So, Doug, thank you so much for joining us today. 

Speaker 2 [00:01:35] I'm happy to be here. Happy to talk about semiconductors, especially in the context of investing. I think that's really where I shine: making it make sense simplistically, because it's one thing to understand all the technical details, but it's another thing to understand, hey, this is really the one KPI for this business that matters; this is how it becomes a business model. There are a lot of places we can start, but honestly, I want to start with one of my first posts ever, the one that truly got me into this. I was working at Bowie Capital, learning about semiconductors on my own, really pushing into the whole thing, and I kept coming back to Moore's Law ending as this giant insight that I thought was inevitable. Let's make it extremely simple: it's really just price and quantity, or rather supply and demand. Demand was growing exponentially, and supply was going to start slowing because of the end of Moore's Law. What does that imply for the price of semiconductors? It goes up. Maybe it isn't price per se; that's a very high-level simplification, but I do think it's true. And what we've seen so far is that semiconductors, after being a very mature industry for over 40 years, all of a sudden has gotten a whole new life, a whole new breath of fresh air, and that's mostly driven by AI. So just starting there: that's the big, top-down way to think about semiconductors and AI, with Moore's Law ending and semiconductors becoming the strategically important piece that enables the whole future of AI. 
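To make that supply-and-demand framing concrete, here is a minimal Python sketch of the toy model Doug describes; the growth rates are illustrative assumptions, not figures from the conversation.

```python
# Toy model of the framing above: demand for compute grows exponentially while
# supply growth slows as Moore's Law ends, so the implied "price" of compute
# stops falling and turns upward. All growth rates are made-up, illustrative numbers.

demand_growth = 1.40         # assumed 40%/yr demand growth (AI workloads)
supply_growth_early = 1.40   # supply kept pace while Moore's Law held
supply_growth_late = 1.10    # assumed slower supply growth post-Moore

demand, supply = 1.0, 1.0
for year in range(1, 11):
    demand *= demand_growth
    supply *= supply_growth_early if year <= 5 else supply_growth_late
    implied_price = demand / supply  # crude stand-in for pricing power
    print(f"year {year:2d}: implied price index = {implied_price:.2f}")
```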

Speaker 1 [00:03:21] So now we have a new kind of law. Maybe it's Jensen's law, and it's even faster than Moore's Law. Do you want to double-click on that? 

Speaker 2 [00:03:30] Yeah, sure. More than happy to. I don't quite know what to call it, but essentially, if you think about it, we've been trying to scale these systems out bigger and bigger. One of the ways we've been doing that, going back to the beginning, is something called parallel computing. The original thesis Jensen had was that essentially all of our workloads, the way they're done on CPUs, run like a single track. But what if we could split the task into many, many different tracks all at once? That would be a much faster way to speed up any type of big workload. This is what Jensen means by accelerated compute. So now what we've done is use GPUs, which are highly parallel, and then we try to scale out as many GPUs, as many parallel threads at once, in order to attack these huge, almost infinitely large problems. One of the things that's been really interesting is we're starting to see that happen outside of just the chip. In the past, there was a pretty simple solution for making a better, faster chip, and that was shrinking it. If you shrink it, the electrons don't get pushed back and forth as far, so it takes less energy and it's faster. But the problem is we started to run into limits, these asymptotes of scaling. So what we're starting to do is scale outside of just the chip. And I think Nvidia's presentation at GTC really showed system-level scaling in a way we've never seen before, and especially intelligent levels of design scaling that are much larger than just a single chip. 
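As a rough illustration of the "single track versus many tracks" idea described above, here is a minimal Python sketch that splits one workload into independent chunks. It uses Python threads purely for illustration; real accelerated compute would run this as GPU kernels.

```python
# CPU-style serial pass vs. splitting the same work into independent chunks
# that can run concurrently -- a toy stand-in for what a GPU does with
# thousands of parallel threads.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

data = np.random.rand(8_000_000)

# Serial: one "track" walks the whole problem.
serial_result = float(np.sum(data * data))

# Parallel: split the problem into independent tiles and combine the results.
def partial_sum(chunk: np.ndarray) -> float:
    return float(np.sum(chunk * chunk))

chunks = np.array_split(data, 8)              # 8 "tracks" instead of 1
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel_result = sum(pool.map(partial_sum, chunks))

# Same answer either way; only the execution strategy changed.
assert np.isclose(serial_result, parallel_result)
```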

Speaker 3 [00:05:04] So great. 

Speaker 2 [00:05:05] That's been this huge new vector, and it's been really exciting and awesome to follow along. We can dive into the specifics of the GB200, but I also want to give you an opportunity to take this wherever you want to. 

Speaker 1 [00:05:18] Yeah, let's get to Nvidia. But to put a bow on it: look, we're seeing GPU compute productivity-versus-cost curves improving at a faster rate than Moore's Law. 

Speaker 3 [00:05:32] Yes, we're getting. 

Speaker 1 [00:05:33] More compute per GPU per dollar; the cost is declining rapidly. It used to take an incredible sum of money, hundreds of millions of dollars, to build an LLM. That cost is dropping remarkably. And I think, to your point, compute is a kind of currency, and compute also is intelligence. So the new form of value is access to compute and how we organize and make decisions. You had this really beautiful article shortly after the Nvidia GTC conference. I've got it shared on my screen, and I'm going to read out some of the highlights. You say: imagine the data center as a giant chip. It is just a scaled-out advanced package of the transistors of memory and logic for your problem. Instead of a multi-chip SoC, each GPU board is a tile in your chip. They're connected with copper and packaged together as closely as possible, so there's more performance and speed. You go on to say that in the case of the data center as a chip, you would try to package everything together as closely as possible, for as cheap as possible. You get into high-bandwidth memory, which we'll talk about as well. So can you lay out your thesis, the data center as a giant chip, and what that means for Nvidia and the industry? 
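A minimal sketch of the "faster than Moore's Law" cost-curve point above; the doubling periods are assumed purely for illustration, not measured figures.

```python
# Compare two improvement cadences: the classic Moore's Law transistor cadence
# vs. an assumed faster system-level improvement in GPU compute per dollar.
# Both doubling periods are hypothetical inputs for illustration.

moore_doubling_years = 2.0   # classic Moore's Law: ~2x every two years
gpu_doubling_years = 1.0     # assumed: system-level perf/$ doubling roughly yearly

def improvement(years: float, doubling_period: float) -> float:
    """Cumulative improvement factor after `years` at a given doubling period."""
    return 2 ** (years / doubling_period)

for horizon in (2, 4, 8):
    print(f"{horizon} yrs: Moore {improvement(horizon, moore_doubling_years):.0f}x, "
          f"GPU perf/$ {improvement(horizon, gpu_doubling_years):.0f}x")
```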

Speaker 2 [00:06:58] Yeah. What's really great about semiconductors and technology is, frankly, Jensen has been saying this for years; it feels like it finally hit me across the head, and I really understood it this time. Pretty much, if you think about it, the same problem that happens at the nanometer level is also happening at the data center level. It's all the same trade-off of performance to power and, essentially, moving information around. So if we're going to scale it up an order of magnitude, the first order of magnitude is advanced packaging, right? Putting multiple chips together as close as possible. In the beginning, we used to have them just on a piece of PCB, pretty far apart, and they didn't really talk to each other as fast. So we thought, well, why don't we just shove them all closer? That's the example of how AMD won client share from Intel. But Jensen and Nvidia are taking this one step further, not just by pushing two chips together, but by pushing as many chips as possible into a single rack, and then, within that rack, making sure it's connected with copper specifically, because copper is extremely cheap, fast, and pretty much doesn't need more power to transmit signals, whereas if we push to optics, we need a lot more power to transmit a signal. So think about it this way: that's just another level of advanced packaging. Instead of having two chips together as close as possible, which is advanced packaging, we're putting as many racks as we can as close together as possible. And by doing that, we essentially get the same benefits we did with Moore's Law: less energy, cheaper, and faster. This might not be the perfect example, but at the keynote he talked about how in the past it would take something like 8,000, or maybe 16,000, GPUs to train a GPT-4-level model, but with the GB200 it would take one fourth the number of GPUs. That's the example: by pushing all these things together, we get so much more power and performance that we're having essentially another level of scaling that's outside of just the chip. 
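For the keynote claim Doug paraphrases here, a back-of-the-envelope sketch; the inputs are just the figures quoted in the conversation, treated as rough illustrative numbers rather than official specs.

```python
# If tighter, copper-connected packaging makes the system roughly 4x more
# effective for training, a GPT-4-class run needs about a quarter of the GPUs.

prior_gen_gpus = 8_000        # "something like 8,000, maybe 16,000" GPUs previously
system_level_speedup = 4      # "one fourth the number of GPUs" with GB200-class racks

gpus_needed_now = prior_gen_gpus / system_level_speedup
print(f"GPUs for the same training run: {gpus_needed_now:,.0f}")  # -> 2,000
```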

Speaker 1 [00:09:09] Yeah, I think it's kind of remarkable. Most semiconductor firms sell chips or some kind of equipment, like chip testing equipment. Nvidia is selling data centers; that's the way to think about this. They're playing the game at a different level. AMD is trying to sell graphics cards and GPUs, and Nvidia has delivered an integrated system where, if you're using the Nvidia ecosystem, you get more out of it. For example, things like InfiniBand, which are the light-speed ways to communicate information across chips, and even the integration of the cooling. They have the digital twin: you step into the equivalent of a multiverse to see your data center arranged. And it takes time to build data centers; there's a lot of risk in being late, and risk of cost overruns. So if you can buy a data center at a click, it adds a lot of value. 

Speaker 2 [00:10:09] Yeah, and I think there are also a lot of examples from history that make this much more compelling. If you think about it, this is actually not anything new. Let's go back and talk about the big waves of compute. The last wave was obviously the smartphone; the wave before that was the PC. But there was a wave before that which people forget about quite a bit: the mainframe. And the mainframe, in a lot of ways, was the product that Nvidia is pushing today. It's a highly integrated, solution-oriented product that at the time was the absolute best in class. So in the same way, Nvidia is selling its GB200, or really an opinionated data center, and pursuing this old, old semiconductor concept of deep integration and specialization. I'm sure you've heard the concept that everything is just bundling and unbundling, a pendulum between the two. This is an example where I think we're starting to see Nvidia push a really hard bundling that is really clever against the whole ecosystem. Because up until now, if you think about the last paradigm, say CPUs in a data center, the vendors selling CPUs to the cloud guys were pretty much selling merchant, off-the-shelf products, and the whole goal was not to have the most performant product, but the best performance versus price. Nvidia has been able to essentially circumvent everyone by pushing their own proprietary standards and an opinionated version of the world. Maybe it is, or maybe it isn't, the best possible data center we could make, but right now, by being opinionated and pushing all these custom integrations together as quickly as possible, they're able to offer the best possible compute today. Meanwhile, everyone else is scrambling in these consortiums where it takes multiple players, and they're kind of fighting it out: are we going to use our AMD GPU with the Broadcom switch, and whose fabric are we going to use, are we going to use Ultra Ethernet, and so on. Versus Nvidia, which says: no, none of that. We just come to you with the solution. It's already done. We've already figured out all the engineering, so you don't even have to think about it. You don't have to get your vendors to fight against each other, or to come together in an open ecosystem, to give you a solution. Nvidia just gives you the solution. 

Speaker 1 [00:12:46] I agree, it's a fantastic point. Nvidia has a view of the world. They have a thesis on how you execute compute at scale, and the answer is the Nvidia framework, their components, and how they interact. And it delivers a lower total cost of ownership versus, say, the Broadcom view of the world, another really compelling company. We'll get to custom silicon. I love your mainframe metaphor, too, because now we also have the rise of edge computing, which people really are not talking about: moving compute and AI to, say, mobile devices. But the other vision of the world, say the Broadcom view, is more networking-driven. Do you want to draw a contrast between those two? 

Speaker 2 [00:13:27] Yeah, let's talk about Broadcom really quickly. Before we even begin, let's talk about the network architecture of a data center. Maybe a little heavy a topic to begin with, but I think of it as a giant tree; the typical topology is a fat tree, meaning that at the top there are these switches, and at the bottom there are leaves. The leaves are the individual pieces of compute, like a rack in a data center. One of the problems is that in making these bigger and bigger models, we're not able to do it on one GPU anymore; we actually need to scale out over multiple racks. And so there are two visions of the world. One is from the bottom up, which is what Nvidia is doing. Nvidia is essentially making concentric circles around their GPU: maybe starting with CUDA, the software ecosystem, then NVLink, which is like a coherent domain of copper at the rack, and then InfiniBand, which is fighting against Ethernet, though I think Nvidia has bets in both places. So they're coming from the bottom, from the leaf of the tree, and they're trying to go up and essentially create bigger and fatter leaves so that you don't need to use the whole tree. The other way is Broadcom. Broadcom is the historical incumbent in switching, and, to be clear, Hock Tan is no joke. 

Speaker 1 [00:14:49] He's the CEO of Broadcom. 

Speaker 2 [00:14:51] Yeah, the CEO of Broadcom. Hock Tan is a killer. There are so many anecdotal stories about how much he raises prices, and everyone always thinks Broadcom's moat, or rather the fact that they essentially have a monopoly on switches, is going to be broken one day because Hock Tan is just a very cost-driven guy. But the reality is Broadcom is an organization that has done a lot with very, very little. So the Broadcom view of the world is: hey, we have the switches, we control the trunk of the tree, all the top-level switches. And then they're starting to sell custom compute; they're trying to sell the leaves. In this version of the world, Broadcom gets to sell the leaf and gets to sell the spine, but importantly, they want to spread out as much as possible. So there's this balance between literally a top-down versus a bottom-up solution, the top-down being Broadcom and the bottom-up being Nvidia. And to be clear, Broadcom has an amazing, very well-articulated strategy here. But at the same time, I still think Nvidia is really the company to beat. 
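To make the leaf-and-spine picture in this exchange concrete, here is a minimal Python sketch of why "bigger and fatter leaves" matter in Nvidia's bottom-up approach: the larger the copper-connected domain at the leaf, the fewer leaves a big training job has to span, and the less traffic has to cross the upper tiers of the tree. All sizes are hypothetical, illustrative numbers.

```python
# Fat-tree / leaf-spine sketch: leaves at the bottom (rack-scale GPU domains),
# switches above them. Fatter leaves mean a job spans fewer leaves, so less
# of its communication has to climb to the spine.

def leaves_needed(total_gpus: int, gpus_per_leaf: int) -> int:
    """How many leaf domains a training job has to span (ceiling division)."""
    return -(-total_gpus // gpus_per_leaf)

job_gpus = 16_384                       # hypothetical training-job size

for gpus_per_leaf in (8, 72, 576):      # e.g. single server vs. rack-scale vs. larger copper domain
    n_leaves = leaves_needed(job_gpus, gpus_per_leaf)
    print(f"{gpus_per_leaf:4d} GPUs per leaf -> job spans {n_leaves:4d} leaves "
          f"({'more' if n_leaves > 32 else 'less'} spine traffic)")
```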

Speaker 1 [00:16:02] I agree, although I do think Broadcom is winning position number two in the battle for custom silicon, and then in the race for number three, I guess, AMD, and then there's all the rest, like Intel. And obviously there's the wafer side too; we'll come back to all of that, like edge compute as well. But look, you have this other quote: you describe Nvidia as a kingmaker and the three-headed hydra. Nvidia really is a kingmaker, and they're also creating their own demand by funding startups like Inflection and CoreWeave. Who they deliver GPUs to can influence product releases, which drives market cap. Do you want to elaborate a little more on that? I'm going to share this graphic that you have, too. 

Speaker 2 [00:16:50] Yeah, let's do that, and I'm actually going to credit my friend Dylan Patel for some of this analysis, because he was really the first person to talk about the Amazon aspect in particular. Nvidia kind of cut off Amazon's access to GPUs, and in the H100 era they really didn't get much, which put them in a laggard position. If we're talking about the hyperscalers today and their cloud strategies: Google has the vertically integrated TPU plus Gemini product, Microsoft has the big partnership with OpenAI, and Meta is pursuing the Llama open-source version of this and has a lot of compute to be able to do it. So who is the only one of the big hyperscalers that doesn't have its own kind of frontier model? Well, I guess Apple, but Apple's a consumer hardware company, and I think they're actually going to be folded into Google, though that's a rumor and we'll see; I believe they're going to be buying Google compute. But it's actually Amazon. Where's Amazon's... 

Speaker 3 [00:17:58] The. 

Speaker 1 [00:17:59] The... what is it, Inferentia, Trainium? 

Speaker 2 [00:18:02] Oh, the Trainium and Inferentia products. Those are pretty bad, to be honest with you. I don't think you can train a frontier model on them, let's put it that way. Yeah. 

Speaker 1 [00:18:13] Right. And I think part of it also goes into the custom silicon topic. There's a question around: hey, look, these hyperscalers, these big tech companies, are spending 100 billion dollars plus each on GPU compute. They know they're spending a lot of money, they know Nvidia is getting an 80% profit margin, and they're trying to approach that thoughtfully. So we're seeing players like Google and Meta working with Broadcom to develop custom silicon. And there's reportedly a third player, rumored to be perhaps Apple; we'll find out, perhaps from Broadcom's reports. 

Speaker 2 [00:18:48] I think, I believe that's the MTIA at Facebook. 

Speaker 1 [00:18:53] Oh, is that right? Okay. Interesting. So does custom silicon represent a threat to Nvidia GPU compute, or is it that custom silicon is focused on lower end compute activities that are more highly specialized, for example, used by meta for like ad targeting, or use by Google to enhance search as opposed to do high ROI compute.