Precision delivery of medicine. Entertainment franchise games absolutely exploding. Small modular reactors and the nuclear renaissance. Plus, AI moving into very complex workflows. Those were just a few of the major tech innovations that partners at a16z predicted last year, and our partners are back: we just dropped our list of more than 40 big ideas for 2024, a compilation of critical advancements across all our verticals, from smart energy grids, to crime-detecting computer vision, to democratizing miracle drugs like GLP-1s, or even AI moving from black box to clear box. You can find the full list of 40-plus builder-worthy pursuits at a16z.com/bigideas2024, or you can click the link in our description below. But on deck today, you'll hear directly from one of our partners as we dive even more deeply into their big idea: what's the "why now," what opportunities and challenges are on the horizon, and how can you get involved? Let's dive in.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
My name is Anjney Midha. I'm a general partner here at a16z, and I'm talking to you today about AI interpretability, which is just a complex way of saying reverse engineering AI models. Over the last few years, AI has been dominated by scaling, a quest to see what was possible if you threw a ton of compute and data at training these large models. But now, as these models begin to be deployed in real-world situations, the big question on everyone's mind is why. Why do these models say the things they do? Why do some prompts produce better results than others? And, perhaps most importantly, how do we control them?

I feel like most people don't need convincing that it's a worthwhile endeavor for us to understand these models a little better, but maybe you could share where we're at in that journey. What do we or don't we understand about these LLM black boxes and their interpretability?
You know, it might help to reason by analogy here. Pretend one of these AI models is like a big kitchen with hundreds of cooks. Each cook knows how to make certain foods, and when you give the kitchen ingredients and ask it to cook something, all the different cooks debate about what to make, and eventually they come to an agreement on a meal to prepare based on those ingredients. Now, the problem with where we are as an industry right now is that, from the outside, we can't really see what's happening in these kitchens, so you have no idea how they made that decision on the meal. You just get the cake or the taco or whatever it might be. So if you ask the kitchen, "Hey, why did you choose to make lasagna?", it's really hard to get a straight answer, because the individual cooks don't actually represent a clear concept like a dish or a cuisine.

And so the big idea here is: what if you could train a team of head chefs to oversee these groups of cooks, and each head chef would specialize in one cuisine? You'd have the Italian head chef who controls all the pasta and pizza cooks, and then you'd have the baking head chef in charge of cakes and pies. Now, when you ask "why lasagna?", the Italian head chef raises his hand and says, "I instructed the cooks to make a hearty Italian meal." These head chefs represent clear, interpretable concepts inside the neural network. This breakthrough is like finally understanding all the cooks in that messy kitchen by training head chefs to organize them into cuisine categories. We can't control every individual cook, but now we can get insight into the bigger, more meaningful decisions that determine what meal the AI chooses to make. Does that make sense?

It does, but are you saying that we do actually have a sense now of those head chefs, the ones responsible for parts of what might be happening within the AI? Obviously it's not people in this case, but have we actually unlocked some of that information with some of the new releases or papers that have come out?
We have, we have. You can break the world of interpretability down into a pre-2023 and a post-2023 world, in my opinion, because there's been such a massive breakthrough in that specific domain of understanding which cook is doing what.

More specifically, these models are made up of neurons. A neuron refers to an individual node in the neural network; it's just a single computational unit. Historically, the industry tried to analyze, interpret, and explain these models by trying to understand what each neuron was doing (what each cook was doing, in that analogy). A feature, on the other hand, which is the new atomic unit the industry is now proposing as an alternative to the neuron, refers to a specific pattern of activations across multiple neurons. So while a single neuron might activate in all kinds of unrelated contexts, whether you're asking for lasagna or asking for a pastry, a feature represents a specific concept that consistently activates a particular set of neurons.

To explain the difference using the cooking analogy: a neuron is like an individual cook in the kitchen; each one knows how to make certain dishes but doesn't represent a clear concept. A feature would be like a cuisine specialty controlled by a head chef. For example, the Italian cuisine feature is active whenever the Italian head chef and all the cooks they oversee are working on an Italian dish, and that feature has a consistent interpretation, in this case Italian food, while individual cooks do not. So, in summary: neurons are individual computational units that don't map neatly to concepts; features are patterns of activations across multiple neurons that do represent clear, interpretable concepts. The breakthrough here was that we've now learned how to decompose a neural network into these interpretable features, where previous approaches focused on interpreting single neurons. So the short answer is yes: we have a massive breakthrough where we now actually know how to trace what's happening in the kitchen.
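To make the neuron-versus-feature distinction concrete, here is a minimal sketch, with entirely made-up numbers, of how the same activation vector looks dense and hard to read neuron by neuron, but sparse and concept-like in a feature basis. The tiny dictionary of feature directions and the choice of which features are "Italian cuisine" or "baking" are invented for illustration; in practice the dictionary would be learned, not hand-specified.

```python
# Toy contrast between the "neuron view" and the "feature view" of one activation vector.
import numpy as np

rng = np.random.default_rng(0)

d_model = 8          # number of neurons in one layer of a toy model
n_features = 16      # number of interpretable features (the "head chefs")

# Hypothetical dictionary: each row is the direction one feature writes into neuron space.
feature_directions = rng.normal(size=(n_features, d_model))
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

# Suppose only two features are active (say, "Italian cuisine" and "baking").
feature_activations = np.zeros(n_features)
feature_activations[[3, 11]] = [2.0, 0.7]

# The neuron activations are a dense mixture of those features.
neuron_activations = feature_activations @ feature_directions

print("neuron view (dense, every neuron is a bit of everything):")
print(np.round(neuron_activations, 2))

print("feature view (sparse, one concept per active entry):")
print(np.flatnonzero(feature_activations))   # -> [ 3 11 ]
```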
And maybe could you give an example that's specific to these LLMs, when we're talking about a feature? I know there's still so much research to be done, but what's an example of a feature that you actually see represented in the outputs from an LLM?
Yeah, that's a great question. If you look at the paper that moved the industry forward a bunch earlier this year, a paper called "Decomposing Language Models With Dictionary Learning" that came out of Anthropic: interpretability is a large field, but this paper took a specific approach called mechanistic interpretability. The paper has a number of examples of features they discovered in a very small, almost toy-like model, because smaller models prove to be very useful petri dishes for these experiments. One example was a "God" feature: when you talked to the model about religious concepts, a specific God feature fired over and over again. And they found that when they talked to the model about a different type of concept, like biology or DNA, a different feature, unrelated to the God feature, fired, even though the same neurons were firing for both of those concepts. So the feature-level analysis allowed them to decompose and break apart the idea of religion from biology, which is something that wasn't possible to tease apart at the neuron level.
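For readers curious what "dictionary learning" looks like in code, here is a hedged sketch of a sparse autoencoder of the kind used for this sort of decomposition. This is not Anthropic's code: the layer sizes, the sparsity coefficient, and the random stand-in activations are all assumptions for illustration.

```python
# Minimal sparse-autoencoder sketch: reconstruct model activations through an
# overcomplete, sparsity-penalized hidden layer whose units act as "features".
import torch
import torch.nn as nn

d_model, d_features = 512, 4096          # 8x expansion factor (assumed)
l1_coeff = 1e-3                          # sparsity penalty weight (assumed)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # feature activations, mostly zero
        x_hat = self.decoder(f)           # reconstruction of the original activations
        return x_hat, f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for step in range(10):
    # In practice these would be activations collected from the language model;
    # random data stands in for them here.
    acts = torch.randn(256, d_model)
    x_hat, f = sae(acts)
    loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```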
Yeah. And maybe you could speak a little more to why this is helpful. Maybe it's obvious for folks listening, but now that we have these concepts that we can see, and can link pretty intuitively, like, okay, I understand biology, I understand religion as a concept coming out of these LLMs, now that we understand these linkages a little more, what does that mean? Why does this open things up? Are we in a new environment? You said we were pre some of these unlocks; what does post look like?
Yeah, this is a great question. I think there are three big takeaways from this breakthrough.

The first is that interpretability is now an engineering problem, as opposed to an open-ended research problem, and that's a huge sea change for the industry. Up until now there were a number of hypotheses on how to interpret how these models were behaving and explain why, but it wasn't quite concrete; it wasn't understood which of those approaches would work better to actually explain how these models work at very large scale, at frontier-model scale. This mechanistic interpretability approach, and the paper that came out earlier this year, shows that the relationships are so easily observable at a small scale that the bulk of the challenge now is to scale up the approach, which is an engineering challenge. I think that's massive, because engineering is largely a function of the resources and investment that go into scaling these models, whereas research can be fairly open-ended. So one big takeaway from 2023 is that interpretability has gone from being a research area to being an engineering area.

The second is that if we can actually get this approach to work at scale, then we can control these models, in the same way that if you understood how a kitchen made a dish and you wanted a different outcome, you could go to the Italian chef and say, "Could you please make that change next time around?" That controllability is really important, because as these models get deployed in mission-critical situations, like healthcare, finance, and defense applications, you need to be able to control them very precisely, which unfortunately just isn't the case today. We have very blunt tools to control these models, but nothing precise enough for those mission-critical situations. So I think controllability is a big piece that this unlocks.
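As one illustration of what "going to the Italian chef" might look like in code, here is a toy sketch of steering a model by nudging a hidden layer along a feature direction. The tiny model and the random "feature direction" are invented stand-ins; real feature-based steering would use directions recovered from a trained model, not this toy.

```python
# Toy steering example: add a feature direction to one layer's output via a hook
# and compare the result against the unsteered forward pass.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_hidden = 16
toy_model = nn.Sequential(nn.Linear(8, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 4))

feature_direction = torch.randn(d_hidden)          # pretend this came from a learned dictionary
feature_direction /= feature_direction.norm()
steering_strength = 3.0

def steer(module, inputs, output):
    # Nudge this layer's output along the feature direction
    # ("ask the Italian chef to lean harder into Italian dishes").
    return output + steering_strength * feature_direction

handle = toy_model[1].register_forward_hook(steer)  # hook the hidden layer's output

x = torch.randn(1, 8)
steered_logits = toy_model(x)
handle.remove()
baseline_logits = toy_model(x)
print(steered_logits - baseline_logits)             # the intervention's effect on the output
```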
And the third is a byproduct of having controllability: once you can control these models, you can rely on them more, and that's huge. Increased reliability means not only good things for the customers, users, and developers using these models, but also, from a policy and regulatory perspective, we can now have a very concrete, grounded debate about which models are safe and which are not, how to govern them, and how to make sure the space develops in a concrete, empirically grounded way, as opposed to reasoning about these models in the abstract without a lot of evidence. One of the problems we've had as an industry is that because there hasn't been a concrete way to show or demonstrate that we understand these black boxes and how they work, a lot of the policy work and policy thinking is worst-case analysis, and worst-case analysis can be fairly open to fear-mongering and a ton of FUD. Instead, we now have an empirical basis to say, here are the real risks of these models and here's how policy should address them. I think that's a big improvement, a big advance, for us all.
Totally. I mean, it's huge, and it's kind of interesting, because we don't know every little piece of physics, but we're able to deploy it in extremely effective ways and build all of the things around us through that understanding, which has grown over time. So it's really exciting that these early building blocks are getting in place. Maybe you can speak to that engineering challenge, or the flipping you said happened, where we previously had a research challenge that was somewhat TBD, when and how is this going to be unlocked, and now we have those early building blocks and we're talking about scale. I'll just read out a quick tweet from Chris, who I believe is on the Anthropic team. He said: "If you asked me a year ago, superposition would have been by far the reason I was most worried that mechanistic interpretability would hit a dead end. I'm now very optimistic. I'd go as far as saying it's now primarily an engineering problem. Hard, but less fundamental risk." I think that captures what you were just mentioning, but maybe you can speak a little more to the papers and the scope within which they've done this feature analysis, and what the steps would be to do this when we're talking about those much, much larger foundation models.
Yeah. Stepping back, the way science in this space is often done is that you start with a small, almost toy-like model of your problem, see if some solution is promising, and then you decide to scale it up to a bigger and bigger level, because if you can't get something to work at a really small scale, it rarely works at large scale. While the holy-grail challenge in interpretability is of course explaining how frontier models work, the GPT-4s, Claude 2s, and Bards of the world, which are several hundred billion parameters in scale, one of the challenges of attacking interpretability of those models directly is that they're so large and such complex systems that it's very intractable to try to tease apart all the different neurons at that scale. Now, I should be clear, it's not easy, and there are a ton of unsolved problems in the scaling part of this journey as well.
If I could just interrupt real quick: you mentioned the scaling laws, and those have continued to hold, but we didn't necessarily know that would be the case; it has, of course, proven to be the case as we've moved forward. What are the challenges you see that might be outstanding as we look to scale up some of this mechanistic interpretability research? What open challenges do you see on that path?
Okay, so yes. To borrow our kitchen analogy from earlier: I think we as an industry now have a model of what's going on, and some proof of what's going on with these features, for a kitchen that has, let's say, three or four chefs. To figure out whether this would work at frontier scale, where you have thousands and thousands of chefs in each kitchen, and in the case of a model you have billions of parameters, I think there are two big open problems that need to be solved for this approach to work at scale.

The first is scaling up the autoencoder, which you can conceptually think of as the model that makes sense of what's going on with each feature. The autoencoder in the paper that came out in October is pretty small, so there's a big challenge where researchers in the space have to figure out how to scale it up, on the order of almost a 100x expansion factor. That's a lot, and it's pretty difficult, because training the underlying base model itself often requires hundreds of millions, and sometimes billions, of dollars' worth of compute, so scaling the autoencoder is a fairly difficult and compute-intensive challenge. There are a ton of promising approaches for doing that scaling without needing tons and tons of compute, but those are pretty open-ended engineering problems.
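To give a rough sense of why that expansion is expensive, here is a back-of-the-envelope sketch. Only the roughly 100x expansion factor comes from the discussion above; the frontier-model hidden width is an assumption, and the arithmetic counts just the autoencoder's two weight matrices for a single layer.

```python
# Back-of-the-envelope sizing for a sparse autoencoder at frontier scale.
d_model = 12288                    # assumed hidden width of a large frontier model
expansion_factor = 100             # order of magnitude discussed above
d_features = d_model * expansion_factor

# Encoder and decoder weight matrices dominate the parameter count.
sae_params = 2 * d_model * d_features
print(f"features per layer: {d_features:,}")             # ~1.2 million
print(f"autoencoder params per layer: {sae_params:,}")   # ~30 billion
```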
The second is to actually scale the interpretation of these networks. As an example, say you find all the neurons and all the features related to pasta, or Italian cuisine, and then you have a separate set of features that map to pastries. The question is, how do you answer a complex query? Say you ask the AI a provocative question about whether people of a certain ethnicity enjoy Italian cuisine or not. You need to figure out how those two features actually interact with each other at some meaningful scale, and that's a pretty difficult challenge to reason about too. That's the second big open problem the researchers call out in their work: the combinatorial complexity of those sets of features interacting with each other at increasing scale is a nonlinear increase in complexity that has to be interpreted. So these are, at least at the moment, the two clear engineering problems that need to be solved, scaling up the autoencoder and scaling up the interpretation. There's probably a long tail of other questions I'm not addressing here, but those are the two big ones.
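As a very rough illustration of what "scaling the interpretation" could involve, here is a toy sketch that counts how often pairs of features fire together across a batch of prompts. The activation matrix is random placeholder data and the two feature indices are hypothetical; real interaction analysis would go far beyond simple co-activation counts. The point is only to show how quickly the pairwise analysis blows up.

```python
# Toy co-activation count: which feature pairs tend to fire on the same prompts?
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_features = 1000, 1024

# Sparse, nonnegative feature activations for each prompt (placeholder data).
acts = np.maximum(rng.normal(-1.5, 1.0, size=(n_prompts, n_features)), 0.0)

fired = (acts > 0).astype(np.float32)        # which features fired on each prompt
co_fire = fired.T @ fired                    # (n_features, n_features) fire-together counts

i, j = 7, 42                                 # two hypothetical features of interest
print(f"features {i} and {j} fired together on {int(co_fire[i, j])} prompts")
print(f"distinct feature pairs to examine: {n_features * (n_features - 1) // 2:,}")
```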
How does this change the game? And maybe you could speak to what you're excited for, specifically coming into 2024, as it relates to mechanistic interpretability.
Yeah. To be clear, I'm excited about all kinds of interpretability, or explainability; I'm broadly very excited about 2024 as the year when the most interest and attention is being paid to explainability. The last few years, the attention was all on the how and the what. People were just incredulous at the capabilities of these models: can we get them to be smarter, can we get them to reason about entirely new topics that maybe weren't in the original pre-training dataset? And that's been totally reasonable. But the lack of attention on the why of these models, on explaining how they work, has been the big blocker on these models getting deployed outside of just a few consumer use cases, where the costs of the model not being reliable or steerable are low. Low-precision environments, consumer use cases where people are more forgiving and tolerant of mistakes by the model, are largely where the bulk of the value has been generated in AI today. But if you want to see these models take over some of the most impactful parts of our lives that they currently aren't deployed in, things like healthcare, those mission-critical situations require a lot more reliability and predictability, and that's what interpretability ultimately unlocks. If you can explain why the kitchen does something, then you can control what it does, and that makes it much more reliable, and therefore it's going to be used in more situations, more use cases, and more impactful customer journeys where today a lot of models don't actually make the cut.
Yeah, it's so true. Actually, something that just dawned on me as you were talking: almost everything in this world has a margin for error, right? There is error inherent in most things. However, if you can understand and explain that error, and constrain it to something that other people can get behind, it's just much more likely that people will want to engage with that thing, because they can at least understand what is coming out of it. So that picture is very compelling, and I hope we can get there.
I hope so too. To be clear, we're not there yet, but we've now got glimmers of approaches that might work, and what I'm excited about for 2024 is a lot more investment, a lot more energy, and a lot more of the best researchers in this space spending their time on it.
Well, we have some of the smartest people in the world working on AI, and we saw how quickly things moved in 2022 and 2023, so hopefully in 2024 some of this interpretability work moves just as quickly.

I hope so. I've got my fingers crossed.
All right, I hope you enjoyed this big idea. We do have a lot more on the way, including a new age of maritime exploration that takes advantage of AI and computer vision, plus AI-first games that never end, and whether voice-first apps may finally be having their moment. By the way, if you want to see our full list of 40-plus big ideas, you can head on over to a16z.com/bigideas.