00:05 What would the world look like if we could create biological software that allows us to compile RNA? That's the big question this week on the podcast. Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive. Jakob spent more than a decade at Google, where he co-authored the "Attention Is All You Need" paper and several other papers that set the foundation for today's AI revolution. He has also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive he builds biological software with the aim to make widely accessible medicines and biotechnologies. Jakob, welcome to No Priors.

00:38 Thank you, thank you for having me.

00:42 You worked at Google for more than a decade, on many leading research teams, and you were really seminal in the original Transformer paper. When I talk to the other authors of the Transformer paper, and people at Google, you're widely credited with coming up with the idea of focusing on attention, which was the basis for the "Attention Is All You Need" paper. Could you talk a little bit about how you came up with that, how the team started working on it, and the origins of that pretty foundational breakthrough, in terms of the Transformer?

01:09 It's really not that simple, right? It's also really important to keep in mind that in deep learning you can't make something, in quotes, "really work" that sits maybe pretty far toward, I would say, the theoretical or formal end, without really going deep on the engineering and implementation side; it just has to be efficient at the end of the day. In my mind, the one and only thing we know really works, if you want to push deep learning forward, is to make it faster, more effective, and more efficient on the given piece of hardware.

01:37 There's a lot of evidence that the way we actually understand language, and that's something that then shapes language in terms of its statistical properties, is actually somewhat hierarchical. The best piece of circumstantial or anecdotal evidence for that is just looking at what linguists do, right? They draw these trees, and while I don't think that they're ever really true, they're also definitely not always false, so they do capture some of the statistics that are inherent in language, and language probably evolved this way in order to exploit our cognitive capacities in a fairly optimal way. So you can safely assume that it is not necessary to go through the entirety of a sequential signal, beginning to end, and maybe also end to beginning, simultaneously, in order to understand it; actually, you can gain a lot of the understanding, in air quotes, by looking at individual groups of, say, your signal.

02:36 Ultimately, if you're now given a piece of hardware whose very key strength is doing lots and lots of simple computations in parallel, as opposed to complicated, structured computations sequentially, then that's actually a kind of statistical property you really want to exploit. You want to, in parallel, understand pieces of an image first; maybe that's not possible in its entirety, but you can actually get a lot of it, and only once you've done some of that do you put these incomplete understandings, or representations, together. As you put them together more and more, that's when you get rid of the last remaining ambiguity at the end of the day. And when you think about what that process looks like, it's a tree.

03:21 And when you think about how you would actually run something that evaluates all possible trees, then a reasonable approximation is that you repeat an operation where you look at all combinations of things. That's the quadratic step, right, that ultimately is the core of this attention step: you effectively pull in information, for a given representation of a given piece, from the representations of all the other pieces, and rinse and repeat. It seems intuitive, and it also seems intuitively clear, that that's a really good fit for the kind of accelerators that we had at the time, and that we still have today. So that's really where that idea came from. If you look at, say, the biggest differences between the Transformer as it was described in the "Attention Is All You Need" paper and some of its ancestors, like the decomposable attention model, the big difference is just that the Transformer was implemented, by folks on the team, in a way that's such an excellent fit for the accelerators that we had at the time.
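The repeated all-pairs operation he describes is, in essence, scaled dot-product attention. A minimal NumPy sketch, purely illustrative rather than the paper's implementation (no multi-head projections, masking, or learned weight matrices):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over n pieces of a signal.

    scores is n x n: every piece compared against every other piece,
    the quadratic step; each output row then pulls in a weighted mix
    of all the other representations."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) all-pairs comparison
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n, d) updated representations

n, d = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
out = attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)          # (5, 8)
```

"Rinse and repeat" is then stacking this operation in layers, each round further disambiguating the partial representations, and every step is dense matrix multiplication, which is exactly what the accelerators do in parallel.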

04:16 One question that I've heard people bring up is that a lot of the behavior we've seen in Transformers is, to some extent, most interesting at scale, right? You get interesting emergent properties. And there may be other architectures that have equally interesting, or perhaps more interesting, properties at scale, but there are sort of two impediments: number one, people just aren't throwing a lot of money and compute at them; and two, because the accelerator architecture fits the Transformer so well, it is dramatically less performant to run other architectures, and therefore you may never actually test them. Do you think that's a true statement?

04:46 I think the big question is: does it matter? It would be really interesting to evaluate, especially if we can make them simpler to evaluate, combinations of different hardware and then models or architectures that fit those like gloves. And I feel, at the moment, given where GPUs came from, they weren't built for this, so why would it be that they are anywhere near optimal? At best they were engineered toward this purpose after the fact, with lots of people basically banging their heads against walls until they had things somewhat optimized, but that's not how the basic architecture came to be. So you can talk a lot and reason a lot about, and I think some of that is true, the generality of really fast, scalable matrix multipliers and how that just does everything in scientific computing really well. Sure, but there are still lots of bells and whistles, and there are lots of specific trade-offs, say, for example, things like memory bandwidth, and ultimately inherent parallelism versus latency. I don't think GPUs are at the sweet spot when it comes to large-scale deep learning with respect to exactly those trade-offs, and so it may very well be that if we actually tried these combinations, we might quite quickly find something that's better.

06:01 When you think about how we get progress from here: usually people think of software as driving the hardware, right? Do you think we get accelerators designed for the large-scale Transformer architectures we already have, or new hardware designs? It's chicken-or-egg a little bit here.

06:19 It is a chicken and egg, and if you look at the newest accelerator designs, they are taking this into account to a significant extent, actually increasingly so. There are a couple of interesting examples: we had a computer vision architecture that really was just an MLP, called Mixer, and while it wasn't significantly better, it also wasn't significantly worse than the Vision Transformers, right? I think that already goes to show it's not that difficult, and especially if you simplify along the way, it might really be a possibility.

06:50 I will say one other thing, aside from efficiency, really just raw efficiency in terms of this specific architecture's fit to the accelerator: the other main contributor, I think, to the success of this architecture is optimism and hope. Suddenly you were in a situation where, for whatever reason, a bunch of things that people tried with this started to work, and then more started to work. And that's not coincidence; it's really just because, ultimately, the human cycles invested in getting all these diverse things to work are fueled by suspension of disbelief, a.k.a. hope, or whatever you want to call it. The community became so energized so quickly and just tried everything under the sun, because the prior was now a different one; the prior now was, "oh look, we have this thing where it just works," which is just not true. The reality is that you try something the first time and you really have to work hard for a long period of time, and then, lo and behold, sometimes it works. And if you do that many more times, then it will work many more times. I think that's really what we're seeing.

07:58 Where do you think people should invest that sort of optimism going forward? What are the big areas people need to work on to increase the performance of these systems, or add memory, or do the other things you feel are needed? If you were to paint the road map ahead, in terms of making these really valuable, performant systems, what would you focus on?

08:14 I think there's one thing that still boggles my mind, because just from first principles it can't be optimal. If you think about it, the way you today scale the compute that's invested in a given problem, let's say the problem is "what's the response to a prompt in some large language model", ultimately depends on the prompt and how long it is: the longer the prompt, the more compute you get. And it depends, and there are of course many different screws to tweak here, on the length of the response. There are many very hard problems where the response is short, and you can in many cases actually formulate those problems very, very succinctly, so you're not going to be using a lot of compute even though we know the problem is really, really difficult. Say, I don't know, prime factorization: a problem like that is simply stated, with big potential impact. And right now there's no knob that you can easily tweak as a user, but also really no knob that the architecture can tweak itself, when it comes to basically deciding, "oh, this is hard, I actually need to use more compute."

09:23 And ironically, and this comes back to a question that many people ask, around whether it makes any sense to train on model-generated data: founding information theory very clearly says nope, you're not going to get more information out of it; you can do it all you want. But there is an artifact, or maybe even an omission, in that flavor of information theory, which is that it doesn't take compute into account; it doesn't take into account the energy expenditure necessary to generate that data. So if you now think back to these problems: if you were to just let LLMs run, generate stuff, and then train new LLMs, or even the same LLM, on that output, what you do is amortize compute that was expended at some point in time. And so now, suddenly, you actually have models that, if you retrain them over and over again, start to spend more compute on the same problems, but amortized over all of these iterations of the system.

10:21 And that seems clunky. That just seems so clunky that ultimately it should be something where, at inference time, at runtime, the model effectively decides per problem, or maybe even per query. There's this notion of anytime algorithms, where it might just depend on your resources: if you have more time or more money, then let it run longer. But you don't want that to happen in cases where the answer, or the problem in question, is simple; you only want to do that in cases where it's actually hard. And that right now doesn't work, because if you pose a very, very simple problem, like two plus two, to GPT-4 right now, and you write it in a very long-winded way in a prompt, and you ask GPT-4 to generate a very complicated answer, then it will actually expend a ton of compute to add two to two, which makes no sense. So out of all the different problems that I currently see, at a high level, because it's not clear how you would exactly address it, that is maybe the one that boggles my mind most.
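The anytime idea he references can be illustrated with a classic non-neural example: a Newton-iteration square root that refines only until the answer is good enough, so easy inputs cost fewer steps than hard ones. This is exactly the per-problem compute knob being described; the function and thresholds here are illustrative, not anything from an existing system:

```python
def anytime_sqrt(x, tol=1e-12, max_steps=100):
    """Anytime algorithm: refine until good enough or budget exhausted.

    Easy inputs (initial guess already close) halt after a few steps;
    hard inputs automatically get more compute, up to max_steps."""
    guess, steps = max(x, 1.0), 0
    while steps < max_steps and abs(guess * guess - x) > tol * x:
        guess = 0.5 * (guess + x / guess)  # one Newton refinement step
        steps += 1
    return guess, steps

_, easy_steps = anytime_sqrt(4.0)   # converges in a handful of steps
_, hard_steps = anytime_sqrt(1e12)  # needs many more refinement steps
print(easy_steps, hard_steps)
```

The contrast with today's LLMs is the point: here, the compute spent is a function of how hard the instance is, not of how verbosely the question was phrased.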

11:26 Yeah. Are there other big research areas that you're excited about right now, or areas where you see enormous progress being made?

11:32 In terms of foundations, I think different flavors of elasticity are really interesting. You could actually claim that a lot of these questions boil down to the problem that I just described, that compute is, in a certain sense, very crudely allocated, but you can look at different incarnations of this problem. Another one would be: why don't we have models that in an elegant way manage to consume, say, visual sensor output of different resolutions, different sampling rates, different durations? Right now it's actually quite tricky to have, other than maybe recurrent architectures, a model that takes videos of different lengths, different image resolutions, or ultimately different densities, if you wish, in different sizes, and really elegantly adjusts compute to what you really want to know, or to how difficult it really has to be to generate the representations that you need in order to do whatever you want to do. An example that makes this pretty clear, I think: you can take a video, you can scale it up, you can reframe it with trivial algorithms, and then run it again, and if the problem you're trying to solve, conditioned on that video, is the same, then I wouldn't want more compute to be used. But right now that's what's going to happen: you're going to use a ton more compute. So effectively these types of, in a certain sense, elasticity, or flexibility, of these models, or really our lack of techniques addressing them, are ultimately incredibly wasteful.

13:10 I've seen increasing attention around two different concepts in these general directions. One is, I think it was some people at Meta who did depth-adaptive Transformers, right, just adjusting the amount of computation for each input, with a prediction of how much is needed; I don't know how much more work has gone in that direction. And then I think a number of people are more excited about doing test-time search, especially for problems like code generation, where you can evaluate with compilation or something, to get a loop of success into the model itself.

13:43 I think it's super effective. I do think it's clunky, because it's not something that you can easily enter into and optimize. This is basically also what I was trying to get at a little bit with saying that some of these efficiency improvements we're not yet really harnessing would, I believe, dramatically affect training; and if you look at how test-time search actually affects training, it's just clunky, and I don't think we'll be able to optimize it as well. Although, as an engineering, I don't know, "hack" could sound negative, and that's not what I mean, I think it's an awesome hack: as an engineering hack around this problem it's really, really effective. It basically comes back to this whole idea of amortizing compute, in a certain sense, with the stuff you already have lying around and memorized, even though it was the humans that actually put it there in many cases.
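The test-time-search loop with compilation feedback mentioned above can be sketched in a few lines. The candidate generator here is a hypothetical stub standing in for samples drawn from a language model:

```python
# Toy test-time search: sample candidate programs and keep the first one
# that survives an executable check, instead of trusting a single sample.
def candidate_programs():
    """Stub for an LLM's sampled drafts (hypothetical outputs)."""
    yield "def add(a, b): return a - b"   # buggy draft
    yield "def add(a, b): return a + b"   # correct draft

def passes_tests(src):
    """Compile and run a candidate against known test cases."""
    try:
        scope = {}
        exec(src, scope)                  # "compilation" feedback
        return scope["add"](2, 3) == 5    # execution feedback
    except Exception:
        return False

def search():
    for src in candidate_programs():
        if passes_tests(src):
            return src
    return None

best = search()
print(best)
```

The extra compute happens entirely outside the model, in the sample-and-check loop, which is the "clunky, hard to optimize end to end" property being discussed.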

14:33 In terms of adaptive-computation-time Transformers, etc.: we actually tried this Universal Transformer thing a long time ago. It just hasn't caught on, and that's because it just doesn't work well enough at this point. It's not that it doesn't work at all, but if it worked really well, then, because compute right now is this incredibly scarce resource, we would see it everywhere. And I think what that tells us, and I don't think here it's really just a lack of trying or a couple years too little experimentation, is that at least those known or proposed methods just don't work well enough yet.

15:12 So one thing that you've been working on for the last few years is Inceptive, which is really starting to focus on how you can apply machine learning, and different aspects of software, to biology. Could you share a little bit about the company, how you got interested in bio, and what you view as some of the interesting problems there?

15:29 Yeah, so basically I've always been interested in bio and knew nothing about it, and that's a conundrum, because it's difficult to learn a lot about biology when you're not in school, and I didn't want to go back to school. But at the same time, it always felt like an area where there's a lot of headroom in terms of efficiency, and actually also where there's maybe a dire need for alternative approaches, at least if what you are interested in is really solving acute problems. Alternative, that is, to how biology works by design: trying to develop a complete conceptual understanding of how life works. I don't have very high hopes for humanity developing that complete conceptual understanding to the level that we would need in order to do all the interventions we want to do. We don't really have great tools in our toolbox, or we didn't have them until somewhat recently, as alternatives to understanding how it works and then, based on that understanding, fixing it if it needs fixing.

16:25 I think now we have an alternative that's an extremely good match, and that's deep learning at scale. We can potentially, to a pretty large extent if not entirely, whatever that means, work around the following two problems. Number one: we don't know all the stuff that's going on in life; we still just don't even have a complete inventory, let alone really understand all the mechanisms. And number two: even for the stuff that we do know, in many cases we so far haven't really been able to come up with sufficiently predictive theories to make that understanding useful. A concrete example here is protein folding, where, even if you just act as if there are no chaperones and no other stuff in the environment in which folding, or whatever you want to call it, in which that process, the earliest kinetic steering of translation, happens, even if you make that massively simplifying assumption, the theory just wasn't practical. And it seems like deep learning is at least potentially a really good answer to both of those aspects, because you can basically treat everything, in quotes, as a black box, and as long as you are able to observe that black box, in terms of input and output, fast enough and at sufficient scale, you might go somewhere with that.

17:46 So Inceptive is pretty stealthy. Is there anything you can share in terms of how you're applying deep learning, or other techniques, to biology in the context of the company?

17:54 Yep. My daughter was born, my first child, and just that entire process gave me a really fundamentally different appreciation for the fragility of life, a really wonderful one, but also a pretty fundamentally different one. And so here we are: we have this new tool, namely AlphaFold 2, that solves one of these fundamental problems in structural biology; we have instances of a macromolecule family that's basically about to save the world; and I basically want to fix life, because I now have this wonderful daughter. It became clear that taking the exact tools we had been working on at Google before, and applying them to this neglected stepchild, namely RNA, or more specifically, at first, mRNA, could have a massive impact on the world.

18:40 What we're trying to do is design better RNA, and at first mRNA, molecules for a pretty broad variety of applications. Infectious disease vaccines are, I guess, maybe the obvious first example, but if you look at the pipelines of Moderna and BioNTech and all those companies, the at least potential applicability of RNA, mRNA more specifically, is near limitless. There are already now hundreds of programs underway, in different stages of development, and that number is expected to climb, hitting high triple digits before the end of the decade. Now we're talking about a modality that might, before the end of the decade, end up being the second or third biggest modality in terms of revenue, and potentially also in terms of impact. And if you take that trajectory, and look at how suboptimal, in a certain sense, the mRNA vaccines were when you compare them to what's possible using RNA, just looking around in nature; at how severe the side effects were for some fraction of the patients that ultimately received the vaccines; at how few people, comparatively, really had access to any of those vaccines when they were really necessary and needed: it seems like, if we look around in our toolkit, the only tool we have to potentially change that quickly is deep learning.

20:09 So at Inceptive we think of this now as something that you could call biological software, where mRNA, and RNA in general, is maybe the equivalent of bytecode that then forms the substrate, the actual stuff that the software is made of. And what you do is you learn models that allow you to translate programs, which might look like some bit of Python code specifying what you want a certain medicine to do inside yourself, inside your cells, and compile those programs into descriptions of RNA molecules that then hopefully actually do what you wrote, what you programmed them to do.
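A toy version of such a compiler, for the simplest imaginable program, "print this protein", is reverse translation: map each amino acid to one of its codons. The function name, the truncated codon table, and the pick-the-first-codon rule are all illustrative assumptions; real mRNA design is precisely about choosing well among the astronomical number of synonymous options, not taking the first one:

```python
# Hypothetical, heavily simplified "compiler" sketch.
CODONS = {  # truncated standard-code table, illustration only
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "L": ["CUG", "CUC", "CUA", "CUU", "UUA", "UUG"],
    "S": ["AGC", "UCU", "UCC", "UCA", "UCG", "AGU"],
    "*": ["UAA", "UAG", "UGA"],  # stop
}

def compile_print_protein(peptide):
    """Return one mRNA coding sequence for the peptide, plus a stop codon."""
    return "".join(CODONS[aa][0] for aa in peptide + "*")

print(compile_print_protein("MFLS"))  # AUGUUUCUGAGCUAA
```

Every synonymous codon choice changes the molecule's structure, stability, and expression without changing the encoded protein, which is where the learned models come in.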

20:55 And ultimately, right now, if you look at mRNA vaccines, our programming language is just a print statement: just "print this protein." But you can easily imagine that with self-amplifying RNA, as one example, and with riboswitches, which are basically RNAs that change dramatically in structure, or self-destruct, in the presence of, say, a given small molecule, you can effectively have conditionals, you can have recursion. And as a computer scientist you squint and you're like, oh wow, okay, this is basically Turing complete. You kind of have all sorts of tools now at your disposal to really build very, very complex medicines, ultimately, that then might also be produced, manufactured, and distributed in a way that is much more scalable than anything that we've been able to do so far. Protein-based biologics oftentimes don't make it to the market because it's just not possible to manufacture them at scale; if we wanted to medicate everybody in the world with all the protein-based biologics that they should actually receive, the real estate on the planet wouldn't be enough to make all the stuff.

21:57 But right now, if you look at RNA manufacturing and distribution infrastructure, we're going to have six to eight billion doses, two years from now, manufacturable and distributable across the globe, and that number is going to go up really, really quickly. At Inceptive, right now, in our lab, we can actually print pretty much any given RNA, and that's just something you can't do with small molecules, and can't easily do with proteins, certainly not at scale. And that's not something that only matters when you have a product in your hand: if you want to treat this as a machine learning problem, you need to generate training data, because it doesn't already exist, and so you also really want to have scalable synthesis and manufacturing, which is an unprecedented constellation.

22:36 So your view is that you can actually search for the program that codes for, let's say, the spike protein, at a certain amount, with different stability characteristics, with different immune reaction characteristics, that doesn't need cold-chain logistics, that's conditional on whatever cell type. I'm describing the future, right, not Inceptive today, but that's the goal, out of all of the 10^630 variants?

That's right, yeah.

23:04 And it's not certain; I mean, ultimately it's not going to be a search, right? Just like today the output of an LLM isn't coming out of a proper search procedure; it has to be a generation procedure, exactly in the same way, and for the same reason, as you see in large language models or image generation models. But yeah, that's exactly the goal, because screening is just not going to cut it at 10^630.
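Where a figure like 10^630 can come from: the synonymous coding space for a single protein grows exponentially with its length. A hypothetical back-of-the-envelope, with assumed numbers (roughly 1,270 residues for a spike-sized protein, about 3.1 synonymous codons per residue on average):

```python
import math

# Degeneracy of the standard genetic code: amino acid -> codon count.
DEGENERACY = {"L": 6, "S": 6, "R": 6, "A": 4, "G": 4, "P": 4, "T": 4,
              "V": 4, "I": 3, "*": 3, "C": 2, "D": 2, "E": 2, "F": 2,
              "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2, "M": 1, "W": 1}

def log10_encodings(mean_degeneracy, n_residues):
    """log10 of how many distinct mRNAs encode an n-residue protein."""
    return n_residues * math.log10(mean_degeneracy)

# ~1,270 residues at ~3.1 synonymous codons each lands in the 10^600s,
# the order of magnitude mentioned in the conversation.
print(round(log10_encodings(3.1, 1270)))  # 624
```

The exact exponent depends on the protein's composition, but any spike-sized coding sequence has vastly more synonymous variants than could ever be screened physically, which is the argument for generative design.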

23:28 And that's really just one antigen that we're coding for there; we actually want to code for many, and update those for any given...

23:35 Yep, exactly. When you do personalized medicine, it is going to be many antigens, for each patient, over time, right? And there's just no hope of basically tackling this with screening approaches at all.

23:47 Yeah, I'm excited to just get to the right answer without having to understand or discover every single mechanism, and without doing the massively expensive screens we have today.

23:57 I mean, that's really the big question: are we here maybe at a crossroads where discovery and understanding, the hope to discover and really get how this works, is actually a hindrance? It might actually be holding us back. And there is a pretty direct analogy to language understanding: computational linguistics, and linguistics in general, tried for a while to develop a sufficiently accurate and complete theory of language to make this really actionable.

24:25 Yeah. When you talked about how the Transformer model works, for example, I was actually thinking about genomic sequencing, where you used to do the sequencing contig by contig: you'd have these big chunks of chromosomes, each sequenced through sequentially, and then eventually you moved into an era where you just broke it up into tons and tons and tons of tiny little sequences that were randomly generated, and then you'd reassemble them with a machine, right? That felt like a very interesting parallel, or analog, to what you were talking about from a language perspective; it's effectively the same thing.

24:51 It is, exactly, and the parallels are so striking, and they don't end there, so yeah, it's really, really interesting to see. The invariant that I feel just holds true across the board is that these formalisms we make up in order to communicate our conceptual understanding, our intuitive understanding: conceptualizing it and making it explicit is great for education, and it's also great for many other purposes, maybe for reasoning about these systems, but it might, because of our limited cognitive capabilities, really not be the right tool to actually predict what's going to happen with a given intervention.

25:29 The other point that really resonated, I think, in terms of what you mentioned, is that if you look at drugs, especially traditionally, we actually didn't understand how most drugs worked until very recently. So with aspirin, we had no idea how it worked when it was, you know, taken out of the bark of a yew tree or whatever in the 1800s, and it was fine; people were fine taking these things, at minimal side effects. There are very popular drugs on the market, like metformin, that bind to multiple targets, where we still aren't sure exactly how they work. And so a lot of the emphasis right now, from a regulatory-pathway perspective for drugs, is: oh, you need a mechanism of action, or you need a proven pathway, and all these things create hurdles that don't necessarily help with drug efficacy.

26:03 And some of them might actually also be, in a certain sense... I mean, it's a waste of time and money. If the thing works, it works.

26:10 Yes, it's a waste of time and money, and it might not even be true, and we have no way of telling.

26:15 Yeah, because in the end the ground truth is, right: does it work, and does it actually do more good than harm? It's empirical, and maybe that should really just be the focus, and everything else should be treated as something that we should at least do.

26:32I'm gonna take

26:35the first step in that historical frame

26:43actually where we've discovered their

26:44mechanisms after the fact you know the

26:47end-to-end like black box like deep

26:50learning pipeline approach seems a

26:52little more rational a little less

26:53heretical which I think upon first

26:56blush it certainly is controversial

26:59yeah I mean the the part that one can

27:01look at as Blasphemous is that now

27:03suddenly you don't know the theory

27:04anymore that you're testing right and

27:06you might never because it's not clear

27:08to us today as far as I can tell that if

27:11there is a theory in that black box

27:13today that we could get it out

27:15there are people trying I think it's

27:16worth trying I I'm not super optimistic

27:19about that I think it'll work for some

27:21cases right where it's simple enough

27:23that we can get it I think there are

27:25many cases where it just isn't all right

27:27let's say climate and weather

27:28forecasting I just don't think we're

27:31gonna get it we're gonna get it in the

27:32sense that we understand and I think we

27:34understand the Schrodinger equation and

27:36how that could be used

27:38at least in theory to just solve all these

27:41things but that's not practical and to

27:44develop a theory that is both the

27:47predictive and practical here might just

27:50not be something we can put in our heads

27:52yeah this is kind of interesting because

27:53I actually feel like this again is the

27:55basis of a lot of traditional drug

27:56Discovery from way back when as well as

27:57just the basis for how you think about

27:59genetic screens right you basically do

28:01functional screens so you'd mutagenize a

28:03bunch of organisms you'd look for output

28:05and then you say okay I've identified

28:07genes that are part of this pathway or

28:09output and I can map them and see how

28:10they're interacting with each other but

28:11before molecular biology we actually

28:13didn't understand anything

28:14functionally we just understood

28:16sequence and output right and so

28:18it feels like deep learning is really

28:20just a throwback to other forms of

28:22biology that have been incredibly

28:23fruitful but just with a new sort of

28:25technology and modality to interrogate

28:27these systems exactly so how do you

28:28think about human augmentation in the

28:30context of all this stuff you know how

28:32bullish are you on human augmentation

28:33and what forms do you think it'll take

28:34in the near term I'm very bullish on

28:37human augmentation in the very long term

28:38but it's one that I don't see

28:40intuitively I think looking at our

28:44brains even just physically they seem to

28:47be very focused and this is not

28:49surprising on our IO

28:53why would there somewhere in there be

28:56some kind of computational capacity that

28:59if we just boosted our IO by a few

29:01orders of magnitude could still cope why

29:04would evolution prepare for that I don't know

29:06why and and so yes you could argue you

29:08know maybe to do long-term planning

29:10tasks and so on and so forth but sure

29:11right let's bound it at a lifetime so right

29:14it's just not so clear whether there

29:16would have been any evolutionary

29:18pressures to really make our capacity

29:20there much bigger than say some

29:23multiplier basically times our I/O

29:26capacity if you look at the number of

29:28tokens that you use to train an LLM

29:31and then you look at the number of

29:33tokens or words that are used to train a

29:35kid right a child a human baby or a

29:37human toddler I mean a human toddler is

29:40probably exposed to what hundreds of

29:42thousands maybe millions of words before

29:43they can speak fluently but I think

29:48we confuse fine-tuning and

29:49pre-training pre-training is all of

29:51evolution sure and then basically you

29:54arrive at this thing that

29:55it's maybe doing something that's

29:57completely in a certain sense a

29:59completely irrelevant task at first but

30:01it has all the capacity in there to then

30:03with a comparatively small amount of data

30:04maybe it's something in between right

30:06then be fine-tuned towards something

30:08that we would regard as oh so advanced

30:10cognitively the compute has been

30:12amortized over the last several

30:14Millennia that's right yeah 10 millennia

30:17humans and we come pre-wired for

30:19language and so it only takes a million

30:21tokens at the end exactly and now
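The token-count comparison above can be sanity-checked with rough arithmetic. This is a hypothetical back-of-envelope sketch: the daily word rate, age of fluency, and LLM pre-training token count are illustrative assumptions, not figures from the conversation.

```python
# Back-of-envelope: words a toddler hears before speaking fluently
# vs. tokens used to pre-train a large language model.
# All three constants are rough assumptions for illustration.

WORDS_PER_DAY = 10_000                       # assumed words heard per day
YEARS_TO_FLUENCY = 3                         # assumed age of fluent speech
LLM_PRETRAINING_TOKENS = 1_000_000_000_000   # assumed ~1 trillion tokens

toddler_words = WORDS_PER_DAY * 365 * YEARS_TO_FLUENCY  # ~11 million
ratio = LLM_PRETRAINING_TOKENS / toddler_words

print(f"toddler hears roughly {toddler_words:,} words")
print(f"the LLM sees roughly {ratio:,.0f}x more tokens")
```

Even under generous assumptions, the child's direct linguistic "training set" lands in the millions, orders of magnitude below LLM pre-training corpora, which is the gap the evolution-as-pre-training analogy is meant to explain.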

30:23the thing is that you can now say okay

30:25great so we come pre-wired let's

30:27look at our wiring and try to find it

30:30but it might not be that

30:32simple right because of course it's this

30:34co-evolution and it's all fuzzy and so

30:36how much we're pre-wired for it how much

30:38language is in a certain sense also

30:39pre-wired for us it might be it might be

30:43the case that it's maybe even

30:44impossible right to actually yeah

30:47read out what it's pre-wired for from

30:50just looking at the wiring yeah you can

30:52see circumstances where people are

30:55literally born without a hemisphere of

30:57their brain or there's other sort of

30:58mass scale deficiencies brain wise and

31:01then things just rewire to effectively

31:03compensate and so you have parts of the

31:05brain taking over other

31:06functionality that they're normally not

31:08designed for which is also fascinating

31:09because it seems like certain parts are

31:12extremely specialized visual cortex et

31:13cetera and then other parts are

31:14basically almost general purpose

31:16machines that can be reallocated I

31:18completely agree with what you're saying

31:19I feel general purpose machines is a

31:21really tricky term because right I mean

31:24could the brain after massive

31:27trauma rewire to do something very

31:28different I'm unclear right so it

31:31could be that it's actually still

31:32specific but it is in a certain sense

31:35General namely preparing for a certain

31:38flavor of redundancy and this is also

31:41why I find AGI as a term particularly

31:43problematic because I don't know what

31:45the general means I think they're

31:46referring to General Tso's chicken

31:48no I'm just kidding sorry really dumb

31:55what's the like theory of data

31:58generation at Inceptive right I feel

32:00like I understand the mission you

32:02describe and then like you need to go do

32:04wet lab experiments with observation to

32:07understand all the properties of these

32:10sequences and you have to figure

32:13out how to do that efficiently right as

32:14a young company with all your pedigree

32:16and resources so yes would love any

32:18intuition on that yeah so let me try to

32:19get across how we think about this so

32:21number one we look at ourselves actually

32:23as one anti-disciplinary team so it's

32:26not quite anti-disciplinary although

32:27there is a correlation maybe with a lack

32:29of discipline or disregard for

32:32fundamental discipline or disciplines

32:34and being anti-disciplinary but we think

32:37we're really in the sense pioneers of a

32:39new discipline doesn't have a name yet

32:41but it draws a lot from Deep learning

32:43and draws a lot from biology we think

32:46designing the experiments or assays that

32:49we're using to generate the data that we

32:51need to then train the models is

32:55at the core of this discipline if you

32:58look at the experiments or assays that we're

33:00running they use the models that we're

33:02training on the data that their

33:04predecessors actually produced and so

33:06really if you squint then in a certain

33:09sense I guess there was always this

33:11dream of and I think it's a pipe dream

33:13of having the cycle between

33:15experimentation and then you put that

33:17into something in silico something

33:19running on computers and then that

33:20informs the experiments and then you

33:22kind of iterate that cycle

33:24I think that's just it would be

33:27beautiful and simple and nice I don't

33:28think it's really that easy so what you

33:30see at Inceptive is actually there's not

33:33that one cycle although maybe now

33:35somewhere hazily there actually is that

33:37cycle too but by Design actually there

33:40are tons of little Cycles so right you

33:42start an assay and the first thing you

33:44do is actually you query a neural

33:45network and then you do some stuff and

33:48then you get certain readouts and those

33:49you then together with some other stuff

33:51feed into yet another model and then

33:53that actually gives you parameters for

33:54some instrument and then you run that

33:56instrument on the stuff that you've

33:58created and so it's really just this

34:00kind of giant mess where the boundary

34:03actually is increasingly blurry and so

34:06we actually think that our work happens

34:07on the beach because that's where the

34:09wet and the dry meet in harmony

34:12and so initially folks join Inceptive

34:15and usually most of them come

34:18from say either in quotes side right

34:20they've spent most of their careers

34:21working on deep learning or maybe

34:26But ultimately it doesn't take them that

34:28long to start speaking some weird kind

34:32of Creole of all of these languages and

34:34also think in these ways and what then

34:36happens is Magic it's really amazing

34:39because then you suddenly find solutions

34:42to problems that say the biologists they

34:45were two years ago just wouldn't even

34:49and they work together with folks they

34:51would have otherwise maybe never even

34:52met and the results sometimes don't work

34:56at all but sometimes they really are

34:57magic that's a that's a really inspiring

34:59note to end on thanks Jacob thank you

35:03find us on Twitter at no priors pod

35:06subscribe to our YouTube channel if you

35:08want to see our faces follow the show on

35:10Apple podcasts Spotify or wherever you

35:13listen that way you get a new episode

35:14every week and sign up for emails or

35:16find transcripts for every episode at no