00:00 The most commonly used chips today are AI accelerators. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? How do you see this industry moving forward? That's a great question. Moore's Law is actually still, as of today, alive and kicking. Power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing.
00:27 In 2011, Marc Andreessen said software is eating the world, and the decade that followed just solidified this notion, with software infiltrating nearly every aspect of our lives. The last year in particular introduced a new wave of generative AI, with some apps becoming some of the most swiftly adopted software products of all time.
00:51 And just like all the other software that came before it, AI software is fundamentally underpinned by the hardware that runs the underlying computation. So if software is becoming more important than ever, then hardware is following suit. Plus, the world is constantly generating more data, and unlocking the full potential of these technologies, from longer context windows to multimodality, means a constant need for faster and more resilient hardware. It's equally important for us to understand who builds and controls the supply of this resource, especially since many of even the most established AI companies are now hardware constrained, with some reputable sources indicating that demand for AI hardware outstrips supply by a factor of 10. That is exactly why we've
created this miniseries on AI hardware. We'll take you on a journey through understanding the hardware that has long powered our computers, but is now the backbone of these AI models absolutely taking the world by storm. In this first segment, we dive into the terminology and technology, from GPU to TPU: what they are, how they work, the key players like Nvidia competing for chip dominance, and we also address the question, is Moore's Law dead? But make sure to look out for the rest of our series, where we dive even deeper, covering supply and demand mechanics, including why we can't just print our way out of a shortage, how founders can get access to inventory, whether they should think about owning or renting, where open source plays a role, and of course how much all of this truly costs. And across
all three videos, we explore with the help of a16z special advisor Guido Appenzeller, someone who is truly uniquely suited for this deep dive as a storied infrastructure expert. I spent my last couple of years mostly in software, but most recently, before joining Andreessen Horowitz, I was actually CTO for Intel's data center group, dealing a lot with hardware and the low-level components. So I think it's given me good insight into how large data centers work and what the basic components are that make all of this AI boom possible today, and that really underpin this great technological ecosystem. Guido has also spent time at Yubico, VMware, Big Switch Networks, and more. But let's get
into it. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com.
03:51 We are increasingly hearing terms like chips, semiconductors, servers, and compute, but are all of these the same thing, and what role do they play in our AI future? If you're running any kind of AI algorithm, that algorithm runs on a chip, and the most commonly used chips today are AI accelerators, which are, in terms of how they're built, actually very close to graphics chips. The cards that these chips are on, in these servers, are often referred to as GPUs, which stands for graphics processing unit. Which is kind of funny: they're not doing graphics, obviously, but it's a very similar type of technology. If you look inside of them, they basically are very good at processing a very large number of math operations per cycle, in a very short period of time. Very classically, an old-fashioned CPU would run one instruction every cycle; then they had multiple cores, so maybe a modern CPU can do a couple of tens of instructions. But these modern AI cards can do more than a hundred thousand instructions per cycle, so they're extremely performant. So this is a GPU. These GPUs run inside of servers; think of them as big boxes with a power plug on the outside and a networking plug. And these servers sit in data centers, where you have racks and racks of them that do the actual compute.
05:05 Let's quickly recap: CPU is central processing unit, and GPU is graphics processing unit. And while both CPUs and GPUs today can perform parallel processing, the degree of parallelization is what sets GPUs apart for certain workloads. So, for example, CPUs can actually do tens or even thousands of floating point operations per cycle, but a GPU can now do over a hundred thousand. The basic idea of a GPU is that instead of just working with individual values, it works with vectors, or even matrices, or tensors more generally. The TPU, for example, is Google's name for these kinds of chips: they call them tensor processing units, which is actually a pretty good name for them. The cores in these modern GPUs, often called tensor cores, operate on tensors, and basically the core of their value proposition is that they can do matrix multiplication. So if you remember, a matrix is rows and columns of numbers; these chips can, for example, multiply two matrices in a single cycle, a very, very fast operation. And that's really what gives us the speed that's necessary to run the incredibly large language and image models that make up generative AI today.
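To make that vector-and-tensor point concrete, here is a small sketch of our own (not from the episode), in Python with NumPy, contrasting a one-multiply-at-a-time loop, roughly how a single old-fashioned CPU core would work through the job, with a single vectorized matrix multiply:

```python
import numpy as np

def matmul_scalar(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply two matrices one scalar multiply-add at a time."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=np.float64)
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i, j] += a[i, p] * b[p, j]
    return c

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = rng.random((64, 64))

# ~260,000 Python-level multiply-adds, one after another.
slow = matmul_scalar(a, b)

# One call into optimized parallel kernels (BLAS on a CPU; the same
# expression dispatches to tensor cores via GPU libraries like PyTorch).
fast = a @ b
```

Both paths compute the same product; the vectorized call is dramatically faster because the whole matrix is handed to parallel hardware instead of being processed one value at a time.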
06:16 Today's GPUs are far more powerful than their ancestors, whether we're comparing to the earliest graphics cards of the arcade gaming days 50 years ago, or to the GeForce 256, the first personal computer GPU, unveiled by Nvidia in 1999. But is it surprising that we're seeing this chip design applied so readily to the emerging space of AI, or should we expect a new architecture to evolve and be more performant in the future? In one way, I think it's very surprising. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? At the same time, what all of these problems have in common is that you want to execute many operations in parallel. You can think of a GPU as something purpose-built for graphics, but you can also think of it as something that's very good at performing the same operation on a very large number of parallel inputs: a very large vector, or a very large matrix. All right, so perhaps it's not
so surprising that Nvidia's prized GPUs are aligned with this AI wave, but they're also not the only company participating. Here is Guido breaking down the hardware ecosystem. The ecosystem comes in many layers, so let's start with the chips. Nvidia is king of the hill at the moment: their A100 is the workhorse that powers the current AI revolution, and they're coming out with a new one now called the H100, which is the next generation. There are a couple of other vendors in the space. Intel has something called Gaudi, Gaudi 2, and as well as their graphics cards with Arc, they're seeing some usage. AMD has a chip in this space. And then we have the large clouds that are starting to build, or in some cases have been building for some time, their own chips: Google with the TPU you mentioned before, which is quite popular, and Amazon has a chip called Trainium for training and Inferentia for inference. We'll probably see more of those in the future from some of these vendors, but at the moment Nvidia still has a very, very strong position, as the vast majority of training is going on on their chips. When
we think about the different chips that you mentioned, like the A100s being the strongest, with maybe the most demand for them, how do they compare to some of these chips created by other companies? Is it, you know, double the performance, or is there some other metric or factor that may make some much more performant? That's a great question. If you look at the pure hardware statistics, how many floating point operations per second these chips can do, there are others that are very competitive with what Nvidia has. Nvidia's big advantage is that they have a very mature software ecosystem. Imagine you are an artificial intelligence developer or engineer or researcher: you're often using a model that's open source, that somebody else developed, and how fast that model runs in many cases depends on how well it is optimized for a particular chip. And so the big advantage that Nvidia has today is that their software ecosystem is so much more mature. I can grab a model, and it has all the necessary optimizations for Nvidia to run out of the box; I don't have to do anything. But with some of these other chips, I may have to do a lot more of these optimizations myself. And that's what gives them the strategic advantage.
09:20 So, as we've touched on, AI software is heavily dependent on hardware, but what Guido is pointing towards here is the performance of hardware being heavily integrated with software: Nvidia's CUDA system makes it easier for engineers to plug in and make optimizations, like running with lower precision numbers. Here is Guido speaking to the kinds of optimizations that do exist. It happens at all layers of the stack. Some of it is coming from academia, some of it is done by the large companies that operate in the space, and some of it, frankly, is by enthusiasts who just want to see their
model run faster. But to give an idea of how this works: typically, a floating point number is represented in 32 bits, and some people figured out how to reduce that to 16 bits, and then somebody said, well, actually, we can do it in eight bits. You have to be really careful how you do it, you have to normalize to make sure it doesn't overflow or underflow, but if you normalize everything, you can use much, much shorter integers for these calculations. There are many tricks like that that really good AI developers use to squeeze more performance out of the chips that they have. So to reiterate Guido's
point: floating point numbers are typically represented in 32 bits. That's 32 zeros and ones, or binary digits, with the first bit being for the sign, the next eight for the exponent, and the next 23 for the fraction. This gives a fairly large range between the smallest possible value and the largest possible value, but also allows many steps in between.
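The bit layout just described, and Guido's normalize-then-shrink trick, can both be sketched in a few lines of Python. The helper names and sample values here are our own illustration, not from the episode:

```python
import struct
import numpy as np

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split a float32 into its sign (1 bit), exponent (8 bits),
    and fraction (23 bits)."""
    # Pack the value as a 32-bit IEEE 754 float, then reread the same
    # 4 bytes as an unsigned integer so the fields can be masked apart.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# -6.5 is -1.625 * 2^2: sign bit 1, stored exponent 2 + 127 = 129,
# fraction 0.625 * 2^23 = 5242880.
sign, exponent, fraction = float32_fields(-6.5)

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Normalize by the largest magnitude so every value fits into an
    8-bit integer without overflowing, as Guido describes."""
    scale = float(np.max(np.abs(x))) / 127.0
    return np.round(x / scale).astype(np.int8), scale

weights = np.array([0.02, -1.3, 0.7, 0.5], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = q.astype(np.float32) * scale  # dequantize: close, not exact
```

The quantized values take a quarter of the memory of float32, at the cost of a small rounding error; that trade-off is exactly why the normalization step matters.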
Now, when many people think of semiconductors, they naturally think of Moore's Law. That's the term that describes the phenomenon observed by Gordon Moore way back in 1965, where the number of transistors in an integrated circuit doubles every two years. But despite our collective success for decades in continuing to push more computation onto smaller chips, are we now at the limits of lithography? For example, an Apple M1 chip from 2022 has 116 billion, that's billion with a B, transistors, and we can compare that to the ARM 1 processor from 1985, which had 25,000. And by the way, the Apple M1 chip is not even the highest transistor count today; I believe that belongs to the Wafer Scale Engine 2 by Cerebras, with 2.6 trillion transistors.
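As a quick sanity check, those two transistor counts alone let you back out the implied doubling period, a back-of-the-envelope calculation using only the numbers quoted above:

```python
import math

# Transistor counts quoted above.
arm1_1985 = 25_000
apple_m1_2022 = 116_000_000_000
years = 2022 - 1985  # 37 years

# log2 of the ratio gives the number of doublings: about 22.
doublings = math.log2(apple_m1_2022 / arm1_1985)

# 37 years / ~22 doublings: a doubling roughly every 1.7 years.
years_per_doubling = years / doublings
```

That works out to a doubling about every 1.7 years, close to the two-year cadence the narration attributes to Moore.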
So, looking ahead, are we at the point where we really don't see the same kind of advancement, at least in the physical architecture of chips? And if so, where do we see advancements moving forward? Is it in the software? Is it in the specialization of these chips? How do you see this industry moving forward? Yeah, great question. There are two things to tease apart there. Moore's Law is actually still, as of today, alive and kicking. Moore's Law talks about the density of transistors on a chip, and we're still increasing that; the scale of transistors keeps going down. Now, whether it's at exactly the same speed, I don't know, but as of today, if you plot the curve, it seems to be holding.
12:26 There's a second thing called Dennard scaling, which used to basically say that just as the number of transistors I can squeeze onto a chip doubles every 18 months or so, it essentially meant that the power at the same time would decrease by the same factor. It says something about frequency, but that's the net outcome, and for the last 10 to 15 years or so it's no longer true: if you look at the frequency of a CPU, it hasn't moved much over the past 10, 12, 15 years. The net result of this is we're getting chips that have more transistors, but each individual core doesn't actually run faster. And what this means is we have to have lots and lots more parallel cores, and this is why these tensor operations are so attractive: on a single core, I can't add numbers more quickly, but I can do a matrix operation instead, and especially do many of them in parallel at the same time.
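The arithmetic behind that point is simple; here is a toy throughput model (all numbers are illustrative assumptions of ours, not measurements from the episode):

```python
# With clock frequency roughly flat, per-unit throughput is fixed, so
# the only way to get more operations per second is more parallel units.
FREQ_HZ = 3.0e9              # clocks have hovered around a few GHz for years
OPS_PER_UNIT_PER_CYCLE = 2   # assume one fused multiply-add = 2 ops

def peak_ops_per_second(parallel_units: int) -> float:
    return parallel_units * OPS_PER_UNIT_PER_CYCLE * FREQ_HZ

one_core = peak_ops_per_second(1)        # a single core is stuck at this rate
wide_chip = peak_ops_per_second(10_000)  # gains now come only from going wide
```

Under these assumptions, throughput scales linearly with the number of parallel units and not at all with time, which is exactly why tensor-style, massively parallel designs took over.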
The second big consequence of that is that our chips are getting more and more power hungry. If you look at even a graphics card for a gaming PC today, these graphics cards draw hundreds of watts of power, up to 500-watt cards, which is much, much more than they used to be, and that trend is going to continue. And we're seeing what's happening in data centers: more and more things like liquid cooling are at least being experimented with, or in some cases getting deployed, because the energy densities for these AI chips are getting so high that we need novel cooling solutions to make them happen. So Moore's Law, yes, but power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing. So
it sounds like Moore's Law is indeed not quite dead, but perhaps a little more complex than it once was. Performance increases continue as we integrate parallel cores, but we're also seeing chips become a lot more power hungry. All of this will continue being dynamic as demand continues to outpace supply for high-performance chips. So as we look ahead, what does all this mean for competition and cost? You'll learn a lot more about that in the rest of our AI hardware series, tackling the questions that everybody is asking, including: we currently don't have as many AI chips or servers as we'd like to have, so how do you think about the relationship between compute, capital, and the technology that we have today? Yeah, that's the million dollar question, or maybe the trillion dollar question, I don't know.
14:53 Thank you so much for listening to the a16z podcast. What we're trying to do here is provide an informed, clear-eyed, but also optimistic take on technology and its future, and we're trying to do that by featuring some of the most inspiring people and the things that they're building. So if that is interesting to you and you'd like to join us on this journey, go ahead and click subscribe, and make sure to let us know in the comments below what you'd like to see us cover next. Thank you so much for listening, and we'll see you next time.