00:00 The most commonly used chips today are AI accelerators. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? How do you see this industry moving forward? That's a great question. Moore's Law is actually still, as of today, alive and kicking. Power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing.
00:27 In 2011, Marc Andreessen said software is eating the world, and the decade that followed just solidified this notion, with software infiltrating nearly every aspect of our lives. The last year in particular introduced a new wave of generative AI, with some apps becoming some of the most swiftly adopted software products of all time.
00:51 And just like all the other software that came before it, AI software is fundamentally underpinned by the hardware that runs the underlying computation. So if software is becoming more important than ever, then hardware is following suit. Plus, the world is constantly generating more data, and unlocking the full potential of these technologies, from longer context windows to multimodality, means a constant need for faster and more resilient hardware. It's equally important for us to understand who builds and controls the supply of this resource, especially since many of even the most established AI companies are now hardware constrained, with some reputable sources indicating that demand for AI hardware outstrips supply by a factor of 10. That is exactly why we've
created this miniseries on AI hardware. We'll take you on a journey through understanding the hardware that has long powered our computers, but is now the backbone of these AI models absolutely taking the world by storm. In this first segment, we dive into the terminology and technology, from GPU to TPU: what they are, how they work, the key players like Nvidia competing for chip dominance, and we also address the question, is Moore's Law dead? But make sure to look out for the rest of our series, where we dive even deeper, covering supply and demand mechanics, including why we can't just print our way out of a shortage, how founders can get access to inventory, whether they should think about owning or renting, where open source plays a role, and of course how much all of this truly costs. And across
all three videos, we explore with the help of a16z special advisor Guido Appenzeller, someone who is truly uniquely suited for this deep dive as a storied infrastructure expert. I spent my last couple of years mostly in software, but most recently, before joining Andreessen Horowitz, I was actually CTO for Intel's data center group, dealing a lot with hardware and the low-level components. So I think it's given me good insight into how large data centers work and what the basic components are that make all of this AI boom possible today, and that really underpin this great technological ecosystem. Guido has also spent time at Yubico, VMware, Big Switch Networks, and more. But let's get
into it. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com.
03:51 We are increasingly hearing terms like chips, semiconductors, servers, and compute, but are all of these the same thing, and what role do they play in our AI future? If you're running any kind of AI algorithm, that algorithm runs on a chip, and the most commonly used chips today are AI accelerators, which are, in terms of how they're built, actually very close to graphics chips. The cards that these chips are on, in these servers, are often referred to as GPUs, which stands for graphics processing unit. Which is kind of funny: they're not doing graphics, obviously, but it's a very similar type of technology. If you look inside of them, they basically are very good at processing a very large number of math operations per cycle, in a very short period of time. Very classically, an old-fashioned CPU would run one instruction every cycle; then they had multiple cores, so maybe a modern CPU can do a couple of tens of instructions. But these modern AI cards can do more than a hundred thousand instructions per cycle, so they're extremely performant. So this is a GPU. These GPUs run inside of servers; think of them as big boxes with a power plug on the outside and a networking plug. And these servers sit in data centers, where you have racks and racks of them that do the actual compute.
05:05 Let's quickly recap: CPU is central processing unit, and GPU is graphics processing unit. And while both CPUs and GPUs today can perform parallel processing, the degree of parallelization is what sets GPUs apart for certain workloads. So, for example, CPUs can actually do tens or even thousands of floating point operations per cycle, but a GPU can now do over a hundred thousand. The basic idea of a GPU is that instead of just working with individual values, it works with vectors, or even matrices, or tensors more generally. The TPU, for example, is Google's name for these kinds of chips: they call them tensor processing units, which is actually a pretty good name for them. The cores in these modern GPUs, often called tensor cores, operate on tensors, and basically the core of their value proposition is that they can do matrix multiplication. So if you remember, a matrix is rows and columns of numbers; these chips can, for example, multiply two matrices in a single cycle, a very, very fast operation. And that's really what gives us the speed that's necessary to run the incredibly large language and image models that make up generative AI today.
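To make that vector-and-tensor point concrete, here is a small sketch of our own (not from the episode), in Python with NumPy, contrasting a one-multiply-at-a-time loop, roughly how a single old-fashioned CPU core would work through the job, with a single vectorized matrix multiply:

```python
import numpy as np

def matmul_scalar(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply two matrices one scalar multiply-add at a time."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=np.float64)
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i, j] += a[i, p] * b[p, j]
    return c

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = rng.random((64, 64))

# ~260,000 Python-level multiply-adds, one after another.
slow = matmul_scalar(a, b)

# One call into optimized parallel kernels (BLAS on a CPU; the same
# expression dispatches to tensor cores via GPU libraries like PyTorch).
fast = a @ b
```

Both paths compute the same product; the vectorized call is dramatically faster because the whole matrix is handed to parallel hardware instead of being processed one value at a time.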
06:16 Today's GPUs are far more powerful than their ancestors, whether we're comparing to the earliest graphics cards of the arcade gaming days 50 years ago, or to the GeForce 256, the first personal computer GPU, unveiled by Nvidia in 1999. But is it surprising that we're seeing this chip design applied so readily to the emerging space of AI, or should we expect a new architecture to evolve and be more performant in the future? In one way, I think it's very surprising. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? At the same time, what all of these problems have in common is that you want to execute many operations in parallel. You can think of a GPU as something purpose-built for graphics, but you can also think of it as something that's very good at performing the same operation on a very large number of parallel inputs: a very large vector, or a very large matrix. All right, so perhaps it's not
so surprising that Nvidia's prized GPUs are aligned with this AI wave, but they're also not the only company participating. Here is Guido breaking down the hardware ecosystem. The ecosystem comes in many layers, so let's start with the chips. Nvidia is king of the hill at the moment: their A100 is the workhorse that powers the current AI revolution, and they're coming out with a new one now called the H100, which is the next generation. There are a couple of other vendors in the space. Intel has something called Gaudi, Gaudi 2, and as well as their graphics cards with Arc, they're seeing some usage. AMD has a chip in this space. And then we have the large clouds that are starting to build, or in some cases have been building for some time, their own chips: Google with the TPU you mentioned before, which is quite popular, and Amazon has a chip called Trainium for training and Inferentia for inference. We'll probably see more of those in the future from some of these vendors, but at the moment Nvidia still has a very, very strong position, as the vast majority of training is going on on their chips. When
we think about the different chips that you mentioned, like the A100s being the strongest, with maybe the most demand for them, how do they compare to some of these chips created by other companies? Is it, you know, double the performance, or is there some other metric or factor that may make some much more performant? That's a great question. If you look at the pure hardware statistics, how many floating point operations per second these chips can do, there are others that are very competitive with what Nvidia has. Nvidia's big advantage is that they have a very mature software ecosystem. Imagine you are an artificial intelligence developer or engineer or researcher: you're often using a model that's open source, that somebody else developed, and how fast that model runs in many cases depends on how well it is optimized for a particular chip. And so the big advantage that Nvidia has today is that their software ecosystem is so much more mature. I can grab a model, and it has all the necessary optimizations for Nvidia to run out of the box; I don't have to do anything. But with some of these other chips, I may have to do a lot more of these optimizations myself. And that's what gives them the strategic advantage.
09:20 So, as we've touched on, AI software is heavily dependent on hardware, but what Guido is pointing towards here is the performance of hardware being heavily integrated with software: Nvidia's CUDA system makes it easier for engineers to plug in and make optimizations, like running with lower precision numbers. Here is Guido speaking to the kinds of optimizations that do exist. It happens at all layers of the stack. Some of it is coming from academia, some of it is done by the large companies that operate in the space, and some of it, frankly, is by enthusiasts who just want to see their
model run faster. But to give an idea of how this works: typically, a floating point number is represented in 32 bits, and some people figured out how to reduce that to 16 bits, and then somebody said, well, actually, we can do it in eight bits. You have to be really careful how you do it, you have to normalize to make sure it doesn't overflow or underflow, but if you normalize everything, you can use much, much shorter integers for these calculations. There are many tricks like that that really good AI developers use to squeeze more performance out of the chips that they have. So to reiterate Guido's
point: floating point numbers are typically represented in 32 bits. That's 32 zeros and ones, or binary digits, with the first bit being for the sign, the next eight for the exponent, and the next 23 for the fraction. This gives a fairly large range between the smallest possible value and the largest possible value, but also allows many steps in between.
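The bit layout just described, and Guido's normalize-then-shrink trick, can both be sketched in a few lines of Python. The helper names and sample values here are our own illustration, not from the episode:

```python
import struct
import numpy as np

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split a float32 into its sign (1 bit), exponent (8 bits),
    and fraction (23 bits)."""
    # Pack the value as a 32-bit IEEE 754 float, then reread the same
    # 4 bytes as an unsigned integer so the fields can be masked apart.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# -6.5 is -1.625 * 2^2: sign bit 1, stored exponent 2 + 127 = 129,
# fraction 0.625 * 2^23 = 5242880.
sign, exponent, fraction = float32_fields(-6.5)

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Normalize by the largest magnitude so every value fits into an
    8-bit integer without overflowing, as Guido describes."""
    scale = float(np.max(np.abs(x))) / 127.0
    return np.round(x / scale).astype(np.int8), scale

weights = np.array([0.02, -1.3, 0.7, 0.5], dtype=np.float32)
q, scale = quantize_int8(weights)
approx = q.astype(np.float32) * scale  # dequantize: close, not exact
```

The quantized values take a quarter of the memory of float32, at the cost of a small rounding error; that trade-off is exactly why the normalization step matters.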
Now, when many people think of semiconductors, they naturally think of Moore's Law. That's the term that describes the phenomenon observed by Gordon Moore way back in 1965, where the number of transistors in an integrated circuit doubles every two years. But despite our collective success for decades in continuing to push more computation onto smaller chips, are we now at the limits of lithography? For example, an Apple M1 chip from 2022 has 116 billion, that's billion with a B, transistors, and we can compare that to the ARM 1 processor from 1985, which had 25,000. And by the way, the Apple M1 chip is not even the highest transistor count today; I believe that belongs to the Wafer Scale Engine 2 by Cerebras, with 2.6 trillion transistors.
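As a quick sanity check, those two transistor counts alone let you back out the implied doubling period, a back-of-the-envelope calculation using only the numbers quoted above:

```python
import math

# Transistor counts quoted above.
arm1_1985 = 25_000
apple_m1_2022 = 116_000_000_000
years = 2022 - 1985  # 37 years

# log2 of the ratio gives the number of doublings: about 22.
doublings = math.log2(apple_m1_2022 / arm1_1985)

# 37 years / ~22 doublings: a doubling roughly every 1.7 years.
years_per_doubling = years / doublings
```

That works out to a doubling about every 1.7 years, close to the two-year cadence the narration attributes to Moore.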
So, looking ahead, are we at the point where we really don't see the same kind of advancement, at least in the physical architecture of chips? And if so, where do we see advancements moving forward? Is it in the software? Is it in the specialization of these chips? How do you see this industry moving forward? Yeah, great question. There are two things to tease apart there. Moore's Law is actually still, as of today, alive and kicking. Moore's Law talks about the density of transistors on a chip, and we're still increasing that; the scale of transistors keeps going down. Now, whether it's at exactly the same speed, I don't know, but as of today, if you plot the curve, it seems to be holding.
12:26 There's a second thing called Dennard scaling, which used to basically say that just as the number of transistors I can squeeze onto a chip doubles every 18 months or so, it essentially meant that the power at the same time would decrease by the same factor. It says something about frequency, but that's the net outcome, and for the last 10 to 15 years or so it's no longer true: if you look at the frequency of a CPU, it hasn't moved much over the past 10, 12, 15 years. The net result of this is we're getting chips that have more transistors, but each individual core doesn't actually run faster. And what this means is we have to have lots and lots more parallel cores, and this is why these tensor operations are so attractive: on a single core, I can't add numbers more quickly, but I can do a matrix operation instead, and especially do many of them in parallel at the same time.
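The arithmetic behind that point is simple; here is a toy throughput model (all numbers are illustrative assumptions of ours, not measurements from the episode):

```python
# With clock frequency roughly flat, per-unit throughput is fixed, so
# the only way to get more operations per second is more parallel units.
FREQ_HZ = 3.0e9              # clocks have hovered around a few GHz for years
OPS_PER_UNIT_PER_CYCLE = 2   # assume one fused multiply-add = 2 ops

def peak_ops_per_second(parallel_units: int) -> float:
    return parallel_units * OPS_PER_UNIT_PER_CYCLE * FREQ_HZ

one_core = peak_ops_per_second(1)        # a single core is stuck at this rate
wide_chip = peak_ops_per_second(10_000)  # gains now come only from going wide
```

Under these assumptions, throughput scales linearly with the number of parallel units and not at all with time, which is exactly why tensor-style, massively parallel designs took over.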
The second big consequence of that is that our chips are getting more and more power hungry. If you look at even a graphics card for a gaming PC today, these graphics cards draw hundreds of watts of power, up to 500-watt cards, which is much, much more than they used to be, and that trend is going to continue. And we're seeing what's happening in data centers: more and more things like liquid cooling are at least being experimented with, or in some cases getting deployed, because the energy densities for these AI chips are getting so high that we need novel cooling solutions to make them happen. So Moore's Law, yes, but power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing. So
it sounds like Moore's Law is indeed not quite dead, but perhaps a little more complex than it once was. Performance increases continue as we integrate parallel cores, but we're also seeing chips become a lot more power hungry. All of this will continue being dynamic as demand continues to outpace supply for high-performance chips. So as we look ahead, what does all this mean for competition and cost? You'll learn a lot more about that in the rest of our AI hardware series, tackling the questions that everybody is asking, including: we currently don't have as many AI chips or servers as we'd like to have, so how do you think about the relationship between compute, capital, and the technology that we have today? Yeah, that's the million dollar question, or maybe the trillion dollar question, I don't know.
14:53 Thank you so much for listening to the a16z podcast. What we're trying to do here is provide an informed, clear-eyed, but also optimistic take on technology and its future, and we're trying to do that by featuring some of the most inspiring people and the things that they're building. So if that is interesting to you and you'd like to join us on this journey, go ahead and click subscribe, and make sure to let us know in the comments below what you'd like to see us cover next. Thank you so much for listening, and we'll see you next time.