00:05 What would the world look like if we could create biological software that allows us to compile RNA? That's the big question this week on the podcast. Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive. Jakob spent more than a decade at Google, where he co-authored the "Attention Is All You Need" paper and several other papers that set the foundation for today's AI revolution. He has also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive he builds biological software with the aim to make widely accessible medicines and biotechnologies. Jakob, welcome to No Priors.

00:38 Thank you, thank you for having me.

00:42 You worked at Google for more than a decade, on many leading research teams, and you were really seminal in the original Transformer paper. When I talk to the other authors of the Transformer paper, and people at Google, you're widely credited with coming up with the idea of focusing on attention, which was the basis for the "Attention Is All You Need" paper. Could you talk a little bit about how you came up with that, how the team started working on it, and the origins of that pretty foundational breakthrough, in terms of the Transformer?

01:09 It's really not that simple, right? It's also really important to keep in mind that in deep learning you can't make something, in quotes, "really work" that sits maybe pretty far toward, I would say, the theoretical or formal end, without really going deep on the engineering and implementation side; it just has to be efficient at the end of the day. In my mind, the one and only thing we know really works, if you want to push deep learning forward, is to make it faster, more effective, and more efficient on the given piece of hardware.

01:37 There's a lot of evidence that the way we actually understand language, and that's something that then shapes language in terms of its statistical properties, is actually somewhat hierarchical. The best piece of circumstantial or anecdotal evidence for that is just looking at what linguists do, right? They draw these trees, and while I don't think that they're ever really true, they're also definitely not always false, so they do capture some of the statistics that are inherent in language, and language probably evolved this way in order to exploit our cognitive capacities in a fairly optimal way. So you can safely assume that it is not necessary to go through the entirety of a sequential signal, beginning to end, and maybe also end to beginning, simultaneously, in order to understand it; actually, you can gain a lot of the understanding, in air quotes, by looking at individual groups of, say, your signal.

02:36 Ultimately, if you're now given a piece of hardware whose very key strength is doing lots and lots of simple computations in parallel, as opposed to complicated, structured computations sequentially, then that's actually a kind of statistical property you really want to exploit. You want to, in parallel, understand pieces of an image first; maybe that's not possible in its entirety, but you can actually get a lot of it, and only once you've done some of that do you put these incomplete understandings, or representations, together. As you put them together more and more, that's when you get rid of the last remaining ambiguity at the end of the day. And when you think about what that process looks like, it's a tree.

03:21 And when you think about how you would actually run something that evaluates all possible trees, then a reasonable approximation is that you repeat an operation where you look at all combinations of things. That's the quadratic step, right, that ultimately is the core of this attention step: you effectively pull in information, for a given representation of a given piece, from the representations of all the other pieces, and rinse and repeat. It seems intuitive, and it also seems intuitively clear, that that's a really good fit for the kind of accelerators that we had at the time, and that we still have today. So that's really where that idea came from. If you look at, say, the biggest differences between the Transformer as it was described in the "Attention Is All You Need" paper and some of its ancestors, like the decomposable attention model, the big difference is just that the Transformer was implemented, by folks on the team, in a way that's such an excellent fit for the accelerators that we had at the time.
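The repeated all-pairs operation he describes is, in essence, scaled dot-product attention. A minimal NumPy sketch, purely illustrative rather than the paper's implementation (no multi-head projections, masking, or learned weight matrices):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over n pieces of a signal.

    scores is n x n: every piece compared against every other piece,
    the quadratic step; each output row then pulls in a weighted mix
    of all the other representations."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) all-pairs comparison
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n, d) updated representations

n, d = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
out = attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)          # (5, 8)
```

"Rinse and repeat" is then stacking this operation in layers, each round further disambiguating the partial representations, and every step is dense matrix multiplication, which is exactly what the accelerators do in parallel.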

04:16 One question that I've heard people bring up is that a lot of the behavior we've seen in Transformers is, to some extent, most interesting at scale, right? You get interesting emergent properties. And there may be other architectures that have equally interesting, or perhaps more interesting, properties at scale, but there are sort of two impediments: number one, people just aren't throwing a lot of money and compute at them; and two, because the accelerator architecture fits the Transformer so well, it is dramatically less performant to run other architectures, and therefore you may never actually test them. Do you think that's a true statement?

04:46 I think the big question is: does it matter? It would be really interesting to evaluate, especially if we can make them simpler to evaluate, combinations of different hardware and then models or architectures that fit those like gloves. And I feel, at the moment, given where GPUs came from, they weren't built for this, so why would it be that they are anywhere near optimal? At best they were engineered toward this purpose after the fact, with lots of people basically banging their heads against walls until they had things somewhat optimized, but that's not how the basic architecture came to be. So you can talk a lot and reason a lot about, and I think some of that is true, the generality of really fast, scalable matrix multipliers and how that just does everything in scientific computing really well. Sure, but there are still lots of bells and whistles, and there are lots of specific trade-offs, say, for example, things like memory bandwidth, and ultimately inherent parallelism versus latency. I don't think GPUs are at the sweet spot when it comes to large-scale deep learning with respect to exactly those trade-offs, and so it may very well be that if we actually tried these combinations, we might quite quickly find something that's better.

06:01 When you think about how we get progress from here: usually people think of software as driving the hardware, right? Do you think we get accelerators designed for the large-scale Transformer architectures we already have, or new hardware designs? It's chicken-or-egg a little bit here.

06:19 It is a chicken and egg, and if you look at the newest accelerator designs, they are taking this into account to a significant extent, actually increasingly so. There are a couple of interesting examples: we had a computer vision architecture that really was just an MLP, called Mixer, and while it wasn't significantly better, it also wasn't significantly worse than the Vision Transformers, right? I think that already goes to show it's not that difficult, and especially if you simplify along the way, it might really be a possibility.

06:50 I will say one other thing, aside from efficiency, really just raw efficiency in terms of this specific architecture's fit to the accelerator: the other main contributor, I think, to the success of this architecture is optimism and hope. Suddenly you were in a situation where, for whatever reason, a bunch of things that people tried with this started to work, and then more started to work. And that's not coincidence; it's really just because, ultimately, the human cycles invested in getting all these diverse things to work are fueled by suspension of disbelief, a.k.a. hope, or whatever you want to call it. The community became so energized so quickly and just tried everything under the sun, because the prior was now a different one; the prior now was, "oh look, we have this thing where it just works," which is just not true. The reality is that you try something the first time and you really have to work hard for a long period of time, and then, lo and behold, sometimes it works. And if you do that many more times, then it will work many more times. I think that's really what we're seeing.

07:58 Where do you think people should invest that sort of optimism going forward? What are the big areas people need to work on to increase the performance of these systems, or add memory, or do the other things you feel are needed? If you were to paint the road map ahead, in terms of making these really valuable, performant systems, what would you focus on?

08:14 I think there's one thing that still boggles my mind, because just from first principles it can't be optimal. If you think about it, the way you today scale the compute that's invested in a given problem, let's say the problem is "what's the response to a prompt in some large language model", ultimately depends on the prompt and how long it is: the longer the prompt, the more compute you get. And it depends, and there are of course many different screws to tweak here, on the length of the response. There are many very hard problems where the response is short, and you can in many cases actually formulate those problems very, very succinctly, so you're not going to be using a lot of compute even though we know the problem is really, really difficult. Say, I don't know, prime factorization: a problem like that is simply stated, with big potential impact. And right now there's no knob that you can easily tweak as a user, but also really no knob that the architecture can tweak itself, when it comes to basically deciding, "oh, this is hard, I actually need to use more compute."

09:23 And ironically, and this comes back to a question that many people ask, around whether it makes any sense to train on model-generated data: founding information theory very clearly says nope, you're not going to get more information out of it; you can do it all you want. But there is an artifact, or maybe even an omission, in that flavor of information theory, which is that it doesn't take compute into account; it doesn't take into account the energy expenditure necessary to generate that data. So if you now think back to these problems: if you were to just let LLMs run, generate stuff, and then train new LLMs, or even the same LLM, on that output, what you do is amortize compute that was expended at some point in time. And so now, suddenly, you actually have models that, if you retrain them over and over again, start to spend more compute on the same problems, but amortized over all of these iterations of the system.

10:21 And that seems clunky. That just seems so clunky that ultimately it should be something where, at inference time, at runtime, the model effectively decides per problem, or maybe even per query. There's this notion of anytime algorithms, where it might just depend on your resources: if you have more time or more money, then let it run longer. But you don't want that to happen in cases where the answer, or the problem in question, is simple; you only want to do that in cases where it's actually hard. And that right now doesn't work, because if you pose a very, very simple problem, like two plus two, to GPT-4 right now, and you write it in a very long-winded way in a prompt, and you ask GPT-4 to generate a very complicated answer, then it will actually expend a ton of compute to add two to two, which makes no sense. So out of all the different problems that I currently see, at a high level, because it's not clear how you would exactly address it, that is maybe the one that boggles my mind most.
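The anytime idea he references can be illustrated with a classic non-neural example: a Newton-iteration square root that refines only until the answer is good enough, so easy inputs cost fewer steps than hard ones. This is exactly the per-problem compute knob being described; the function and thresholds here are illustrative, not anything from an existing system:

```python
def anytime_sqrt(x, tol=1e-12, max_steps=100):
    """Anytime algorithm: refine until good enough or budget exhausted.

    Easy inputs (initial guess already close) halt after a few steps;
    hard inputs automatically get more compute, up to max_steps."""
    guess, steps = max(x, 1.0), 0
    while steps < max_steps and abs(guess * guess - x) > tol * x:
        guess = 0.5 * (guess + x / guess)  # one Newton refinement step
        steps += 1
    return guess, steps

_, easy_steps = anytime_sqrt(4.0)   # converges in a handful of steps
_, hard_steps = anytime_sqrt(1e12)  # needs many more refinement steps
print(easy_steps, hard_steps)
```

The contrast with today's LLMs is the point: here, the compute spent is a function of how hard the instance is, not of how verbosely the question was phrased.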

11:26 Yeah. Are there other big research areas that you're excited about right now, or areas where you see enormous progress being made?

11:32 In terms of foundations, I think different flavors of elasticity are really interesting. You could actually claim that a lot of these questions boil down to the problem that I just described, that compute is, in a certain sense, very crudely allocated, but you can look at different incarnations of this problem. Another one would be: why don't we have models that in an elegant way manage to consume, say, visual sensor output of different resolutions, different sampling rates, different durations? Right now it's actually quite tricky to have, other than maybe recurrent architectures, a model that takes videos of different lengths, different image resolutions, or ultimately different densities, if you wish, in different sizes, and really elegantly adjusts compute to what you really want to know, or to how difficult it really has to be to generate the representations that you need in order to do whatever you want to do. An example that makes this pretty clear, I think: you can take a video, you can scale it up, you can reframe it with trivial algorithms, and then run it again, and if the problem you're trying to solve, conditioned on that video, is the same, then I wouldn't want more compute to be used. But right now that's what's going to happen: you're going to use a ton more compute. So effectively these types of, in a certain sense, elasticity, or flexibility, of these models, or really our lack of techniques addressing them, are ultimately incredibly wasteful.

13:10 I've seen increasing attention around two different concepts in these general directions. One is, I think it was some people at Meta who did depth-adaptive Transformers, right, just adjusting the amount of computation for each input, with a prediction of how much is needed; I don't know how much more work has gone in that direction. And then I think a number of people are more excited about doing test-time search, especially for problems like code generation, where you can evaluate with compilation or something, to get a loop of success into the model itself.

13:43 I think it's super effective. I do think it's clunky, because it's not something that you can easily enter into and optimize. This is basically also what I was trying to get at a little bit with saying that some of these efficiency improvements we're not yet really harnessing would, I believe, dramatically affect training; and if you look at how test-time search actually affects training, it's just clunky, and I don't think we'll be able to optimize it as well. Although, as an engineering, I don't know, "hack" could sound negative, and that's not what I mean, I think it's an awesome hack: as an engineering hack around this problem it's really, really effective. It basically comes back to this whole idea of amortizing compute, in a certain sense, with the stuff you already have lying around and memorized, even though it was the humans that actually put it there in many cases.
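The test-time-search loop with compilation feedback mentioned above can be sketched in a few lines. The candidate generator here is a hypothetical stub standing in for samples drawn from a language model:

```python
# Toy test-time search: sample candidate programs and keep the first one
# that survives an executable check, instead of trusting a single sample.
def candidate_programs():
    """Stub for an LLM's sampled drafts (hypothetical outputs)."""
    yield "def add(a, b): return a - b"   # buggy draft
    yield "def add(a, b): return a + b"   # correct draft

def passes_tests(src):
    """Compile and run a candidate against known test cases."""
    try:
        scope = {}
        exec(src, scope)                  # "compilation" feedback
        return scope["add"](2, 3) == 5    # execution feedback
    except Exception:
        return False

def search():
    for src in candidate_programs():
        if passes_tests(src):
            return src
    return None

best = search()
print(best)
```

The extra compute happens entirely outside the model, in the sample-and-check loop, which is the "clunky, hard to optimize end to end" property being discussed.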

14:33 In terms of adaptive-computation-time Transformers, etc.: we actually tried this Universal Transformer thing a long time ago. It just hasn't caught on, and that's because it just doesn't work well enough at this point. It's not that it doesn't work at all, but if it worked really well, then, because compute right now is this incredibly scarce resource, we would see it everywhere. And I think what that tells us, and I don't think here it's really just a lack of trying or a couple years too little experimentation, is that at least those known or proposed methods just don't work well enough yet.

15:12 So one thing that you've been working on for the last few years is Inceptive, which is really starting to focus on how you can apply machine learning, and different aspects of software, to biology. Could you share a little bit about the company, how you got interested in bio, and what you view as some of the interesting problems there?

15:29 Yeah, so basically I've always been interested in bio and knew nothing about it, and that's a conundrum, because it's difficult to learn a lot about biology when you're not in school, and I didn't want to go back to school. But at the same time, it always felt like an area where there's a lot of headroom in terms of efficiency, and actually also where there's maybe a dire need for alternative approaches, at least if what you are interested in is really solving acute problems. Alternative, that is, to how biology works by design: trying to develop a complete conceptual understanding of how life works. I don't have very high hopes for humanity developing that complete conceptual understanding to the level that we would need in order to do all the interventions we want to do. We don't really have great tools in our toolbox, or we didn't have them until somewhat recently, as alternatives to understanding how it works and then, based on that understanding, fixing it if it needs fixing.

16:25 I think now we have an alternative that's an extremely good match, and that's deep learning at scale. We can potentially, to a pretty large extent if not entirely, whatever that means, work around the following two problems. Number one: we don't know all the stuff that's going on in life; we still just don't even have a complete inventory, let alone really understand all the mechanisms. And number two: even for the stuff that we do know, in many cases we so far haven't really been able to come up with sufficiently predictive theories to make that understanding useful. A concrete example here is protein folding, where, even if you just act as if there are no chaperones and no other stuff in the environment in which folding, or whatever you want to call it, in which that process, the earliest kinetic steering of translation, happens, even if you make that massively simplifying assumption, the theory just wasn't practical. And it seems like deep learning is at least potentially a really good answer to both of those aspects, because you can basically treat everything, in quotes, as a black box, and as long as you are able to observe that black box, in terms of input and output, fast enough and at sufficient scale, you might go somewhere with that.

17:46 So Inceptive is pretty stealthy. Is there anything you can share in terms of how you're applying deep learning, or other techniques, to biology in the context of the company?

17:54 Yep. My daughter was born, my first child, and just that entire process gave me a really fundamentally different appreciation for the fragility of life, a really wonderful one, but also a pretty fundamentally different one. And so here we are: we have this new tool, namely AlphaFold 2, that solves one of these fundamental problems in structural biology; we have instances of a macromolecule family that's basically about to save the world; and I basically want to fix life, because I now have this wonderful daughter. It became clear that taking the exact tools we had been working on at Google before, and applying them to this neglected stepchild, namely RNA, or more specifically, at first, mRNA, could have a massive impact on the world.

18:40 What we're trying to do is design better RNA, and at first mRNA, molecules for a pretty broad variety of applications. Infectious disease vaccines are, I guess, maybe the obvious first example, but if you look at the pipelines of Moderna and BioNTech and all those companies, the at least potential applicability of RNA, mRNA more specifically, is near limitless. There are already now hundreds of programs underway, in different stages of development, and that number is expected to climb, hitting high triple digits before the end of the decade. Now we're talking about a modality that might, before the end of the decade, end up being the second or third biggest modality in terms of revenue, and potentially also in terms of impact. And if you take that trajectory, and look at how suboptimal, in a certain sense, the mRNA vaccines were when you compare them to what's possible using RNA, just looking around in nature; at how severe the side effects were for some fraction of the patients that ultimately received the vaccines; at how few people, comparatively, really had access to any of those vaccines when they were really necessary and needed: it seems like, if we look around in our toolkit, the only tool we have to potentially change that quickly is deep learning.

20:09 So at Inceptive we think of this now as something that you could call biological software, where mRNA, and RNA in general, is maybe the equivalent of bytecode that then forms the substrate, the actual stuff that the software is made of. And what you do is you learn models that allow you to translate programs, which might look like some bit of Python code specifying what you want a certain medicine to do inside yourself, inside your cells, and compile those programs into descriptions of RNA molecules that then hopefully actually do what you wrote, what you programmed them to do.
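A toy version of such a compiler, for the simplest imaginable program, "print this protein", is reverse translation: map each amino acid to one of its codons. The function name, the truncated codon table, and the pick-the-first-codon rule are all illustrative assumptions; real mRNA design is precisely about choosing well among the astronomical number of synonymous options, not taking the first one:

```python
# Hypothetical, heavily simplified "compiler" sketch.
CODONS = {  # truncated standard-code table, illustration only
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "L": ["CUG", "CUC", "CUA", "CUU", "UUA", "UUG"],
    "S": ["AGC", "UCU", "UCC", "UCA", "UCG", "AGU"],
    "*": ["UAA", "UAG", "UGA"],  # stop
}

def compile_print_protein(peptide):
    """Return one mRNA coding sequence for the peptide, plus a stop codon."""
    return "".join(CODONS[aa][0] for aa in peptide + "*")

print(compile_print_protein("MFLS"))  # AUGUUUCUGAGCUAA
```

Every synonymous codon choice changes the molecule's structure, stability, and expression without changing the encoded protein, which is where the learned models come in.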

20:55 And ultimately, right now, if you look at mRNA vaccines, our programming language is just a print statement: just "print this protein." But you can easily imagine that with self-amplifying RNA, as one example, and with riboswitches, which are basically RNAs that change dramatically in structure, or self-destruct, in the presence of, say, a given small molecule, you can effectively have conditionals, you can have recursion. And as a computer scientist you squint and you're like, oh wow, okay, this is basically Turing complete. You kind of have all sorts of tools now at your disposal to really build very, very complex medicines, ultimately, that then might also be produced, manufactured, and distributed in a way that is much more scalable than anything that we've been able to do so far. Protein-based biologics oftentimes don't make it to the market because it's just not possible to manufacture them at scale; if we wanted to medicate everybody in the world with all the protein-based biologics that they should actually receive, the real estate on the planet wouldn't be enough to make all the stuff.

21:57 But right now, if you look at RNA manufacturing and distribution infrastructure, we're going to have six to eight billion doses, two years from now, manufacturable and distributable across the globe, and that number is going to go up really, really quickly. At Inceptive, right now, in our lab, we can actually print pretty much any given RNA, and that's just something you can't do with small molecules, and can't easily do with proteins, certainly not at scale. And that's not something that only matters when you have a product in your hand: if you want to treat this as a machine learning problem, you need to generate training data, because it doesn't already exist, and so you also really want to have scalable synthesis and manufacturing, which is an unprecedented constellation.

22:36 So your view is that you can actually search for the program that codes for, let's say, the spike protein, at a certain amount, with different stability characteristics, with different immune reaction characteristics, that doesn't need cold-chain logistics, that's conditional on whatever cell type. I'm describing the future, right, not Inceptive today, but that's the goal, out of all of the 10^630 variants?

That's right, yeah.

23:04 And it's not certain; I mean, ultimately it's not going to be a search, right? Just like today the output of an LLM isn't coming out of a proper search procedure; it has to be a generation procedure, exactly in the same way, and for the same reason, as you see in large language models or image generation models. But yeah, that's exactly the goal, because screening is just not going to cut it at 10^630.
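Where a figure like 10^630 can come from: the synonymous coding space for a single protein grows exponentially with its length. A hypothetical back-of-the-envelope, with assumed numbers (roughly 1,270 residues for a spike-sized protein, about 3.1 synonymous codons per residue on average):

```python
import math

# Degeneracy of the standard genetic code: amino acid -> codon count.
DEGENERACY = {"L": 6, "S": 6, "R": 6, "A": 4, "G": 4, "P": 4, "T": 4,
              "V": 4, "I": 3, "*": 3, "C": 2, "D": 2, "E": 2, "F": 2,
              "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2, "M": 1, "W": 1}

def log10_encodings(mean_degeneracy, n_residues):
    """log10 of how many distinct mRNAs encode an n-residue protein."""
    return n_residues * math.log10(mean_degeneracy)

# ~1,270 residues at ~3.1 synonymous codons each lands in the 10^600s,
# the order of magnitude mentioned in the conversation.
print(round(log10_encodings(3.1, 1270)))  # 624
```

The exact exponent depends on the protein's composition, but any spike-sized coding sequence has vastly more synonymous variants than could ever be screened physically, which is the argument for generative design.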

23:28 And that's really just one antigen that we're coding for there; we actually want to code for many, and update those for any given...

23:35 Yep, exactly. When you do personalized medicine, it is going to be many antigens, for each patient, over time, right? And there's just no hope of basically tackling this with screening approaches at all.

23:47 Yeah, I'm excited to just get to the right answer without having to understand or discover every single mechanism, and without doing the massively expensive screens we have today.

23:57 I mean, that's really the big question: are we here maybe at a crossroads where discovery and understanding, the hope to discover and really get how this works, is actually a hindrance? It might actually be holding us back. And there is a pretty direct analogy to language understanding: computational linguistics, and linguistics in general, tried for a while to develop a sufficiently accurate and complete theory of language to make this really actionable.

24:25 Yeah. When you talked about how the Transformer model works, for example, I was actually thinking about genomic sequencing, where you used to do the sequencing contig by contig: you'd have these big chunks of chromosomes, each sequenced through sequentially, and then eventually you moved into an era where you just broke it up into tons and tons and tons of tiny little sequences that were randomly generated, and then you'd reassemble them with a machine, right? That felt like a very interesting parallel, or analog, to what you were talking about from a language perspective; it's effectively the same thing.

24:51 It is, exactly, and the parallels are so striking, and they don't end there, so yeah, it's really, really interesting to see. The invariant that I feel just holds true across the board is that these formalisms we make up in order to communicate our conceptual understanding, our intuitive understanding: conceptualizing it and making it explicit is great for education, and it's also great for many other purposes, maybe for reasoning about these systems, but it might, because of our limited cognitive capabilities, really not be the right tool to actually predict what's going to happen with a given intervention.

25:29 The other point that really resonated, I think, in terms of what you mentioned, is that if you look at drugs, especially traditionally, we actually didn't understand how most drugs worked until very recently. So with aspirin, we had no idea how it worked when it was, you know, taken out of the bark of a yew tree or whatever in the 1800s, and it was fine; people were fine taking these things, at minimal side effects. There are very popular drugs on the market, like metformin, that bind to multiple targets, where we still aren't sure exactly how they work. And so a lot of the emphasis right now, from a regulatory-pathway perspective for drugs, is: oh, you need a mechanism of action, or you need a proven pathway, and all these things create hurdles that don't necessarily help with drug efficacy.

26:03 And some of them might actually also be, in a certain sense... I mean, it's a waste of time and money. If the thing works, it works.

26:10 Yes, it's a waste of time and money, and it might not even be true, and we have no way of telling.

26:15 Yeah, because in the end the ground truth is, right: does it work, and does it actually do more good than harm? It's empirical, and maybe that should really just be the focus, and everything else should be treated as something that we should at least do.

26:32I'm gonna take

26:35the first step in that historical frame

26:43actually where we've discovered their

26:44mechanisms after the fact you know the

26:47end-to-end like black box like deep

26:50learning pipeline approach seems a

26:52little more rational a little less

26:53heretical which I think upon first

26:56blush it certainly is controversial

26:59yeah I mean the the part that one can

27:01look at as Blasphemous is that now

27:03suddenly you don't know the theory

27:04anymore that you're testing right and

27:06you might never because it's not clear

27:08to us today as far as I can tell that if

27:11there is a theory in that black box

27:13today that we could get it out

27:15there are people trying I think it's

27:16worth trying I I'm not super optimistic

27:19about that I think it'll work for some

27:21cases right where it's simple enough

27:23that we can get it I think there are

27:25many cases where it just isn't all right

27:27let's say climate and weather

27:28forecasting I just don't think we're

27:31gonna get it we're gonna get it in the

27:32sense that we understand and I think we

27:34understand the Schrodinger equation and

27:36how that could be used

27:38at least in theory to just solve all these

27:41things but that's not practical and to

27:44develop a theory that is both the

27:47predictive and practical here might just

27:50not be something we can put in our heads

27:52yeah this is kind of interesting because

27:53I actually feel like this again is the

27:55basis of a lot of traditional drug

27:56Discovery from way back when as well as

27:57just the basis for how you think about

27:59genetic screens right you basically do

28:01functional screens so you'd mutagenize a

28:03bunch of organisms you'd look for output

28:05and then you say okay I've identified

28:07genes that are part of this pathway or

28:09output and I can map them and see how

28:10they're interacting with each other but

28:11before molecular biology we actually

28:13didn't understand anything

28:14functionally we just understood

28:16sequence and output right and so

28:18it feels like deep learning is really

28:20just a throwback to other forms of

28:22biology that have been incredibly

28:23fruitful but just with a new sort of

28:25technology and modality to interrogate

28:27these systems exactly so how do you

28:28think about human augmentation in the

28:30context of all this stuff you know how

28:32bullish are you on human augmentation

28:33and what forms do you think it'll take

28:34in the near term I'm very bullish on

28:37human augmentation in the very long term

28:38but it's one that I don't see

28:40intuitively I think looking at our

28:44brains even just physically they seem to

28:47be very focused and this is not

28:49surprising on our IO

28:53why would there somewhere in there be

28:56some kind of computational capacity that

28:59if we just boosted our IO by a few

29:01orders of magnitude could still cope why

29:04would evolution prepare for that I don't know

29:06why and and so yes you could argue you

29:08know maybe to do long-term planning

29:10tasks and so on and so forth but sure

29:11right let's bound it at a lifetime so right

29:14it's just not so clear whether there

29:16would have been any evolutionary

29:18pressures to really make our capacity

29:20there much bigger than say some

29:23multiplier basically times our I/O

29:26capacity if you look at the number of

29:28tokens that you use to train an LLM

29:31and then you look at the number of

29:33tokens or words that are used to train a

29:35kid right a child a human baby or a

29:37human toddler I mean a human toddler is

29:40probably exposed to what hundreds of

29:42thousands maybe millions of words before

29:43they can speak fluently but I think

29:48we confuse fine-tuning and

29:49pre-training pre-training is all of

29:51evolution sure and then basically you

29:54arrive at this thing that

29:55it's maybe doing something that's

29:57completely in a certain sense a

29:59completely irrelevant task at first but

30:01it has all the capacity in there to then

30:03with a comparatively small amount of data

30:04maybe it's something in between right

30:06then be fine-tuned towards something

30:08that we would regard as oh so advanced

30:10cognitively the compute has been

30:12amortized over the last several

30:14Millennia that's right yeah 10 millennia

30:17humans and we come pre-wired for

30:19language and so it only takes a million

30:21tokens at the end exactly and now
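The token-count comparison above can be sanity-checked with rough arithmetic. This is a hypothetical back-of-envelope sketch: the daily word rate, age of fluency, and LLM pre-training token count are illustrative assumptions, not figures from the conversation.

```python
# Back-of-envelope: words a toddler hears before speaking fluently
# vs. tokens used to pre-train a large language model.
# All three constants are rough assumptions for illustration.

WORDS_PER_DAY = 10_000                       # assumed words heard per day
YEARS_TO_FLUENCY = 3                         # assumed age of fluent speech
LLM_PRETRAINING_TOKENS = 1_000_000_000_000   # assumed ~1 trillion tokens

toddler_words = WORDS_PER_DAY * 365 * YEARS_TO_FLUENCY  # ~11 million
ratio = LLM_PRETRAINING_TOKENS / toddler_words

print(f"toddler hears roughly {toddler_words:,} words")
print(f"the LLM sees roughly {ratio:,.0f}x more tokens")
```

Even under generous assumptions, the child's direct linguistic "training set" lands in the millions, orders of magnitude below LLM pre-training corpora, which is the gap the evolution-as-pre-training analogy is meant to explain.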

30:23the thing is that you can now say okay

30:25great so we come pre-wired let's

30:27look at our wiring and try to find it

30:30but it might not be that

30:32simple right because of course it's this

30:34co-evolution and it's all fuzzy and so

30:36how much we're pre-wired for it how much

30:38language is in a certain sense also

30:39pre-wired for us it might be it might be

30:43the case that it's maybe even

30:44impossible right to actually yeah

30:47read out what it's pre-wired for from

30:50just looking at the wiring yeah you can

30:52see circumstances where people are

30:55literally born without a hemisphere of

30:57their brain or there's other sort of

30:58mass scale deficiencies brain wise and

31:01then things just rewire to effectively

31:03compensate and so you have parts of the

31:05brain taking over other

31:06functionality that they're normally not

31:08designed for which is also fascinating

31:09because it seems like certain parts are

31:12extremely specialized visual cortex et

31:13cetera and then other parts are

31:14basically almost general purpose

31:16machines that can be reallocated I

31:18completely agree with what you're saying

31:19I feel general purpose machines is a

31:21really tricky term because right I mean

31:24could the brain after massive

31:27trauma rewire to do something very

31:28different I'm unclear right so it

31:31could be that it's actually still

31:32specific but it is in a certain sense

31:35General namely preparing for a certain

31:38flavor of redundancy and this is also

31:41why I find AGI as a term particularly

31:43problematic because I don't know what

31:45the general means I think they're

31:46referring to General Tso's chicken

31:48no I'm just kidding sorry really dumb

31:55what's the like theory of data

31:58generation at Inceptive right I feel

32:00like I understand the mission you

32:02describe and then like you need to go do

32:04wet lab experiments with observation to

32:07understand all the properties of these

32:10sequences and you have to figure

32:13out how to do that efficiently right as

32:14a young company with all your pedigree

32:16and resources so yes would love any

32:18intuition on that yeah so let me try to

32:19get across how we think about this so

32:21number one we look at ourselves actually

32:23as one anti-disciplinary team so it's

32:26not quite anti-disciplinary although

32:27there is a correlation maybe with a lack

32:29of discipline or disregard for

32:32fundamental discipline or disciplines

32:34and being anti-disciplinary but we think

32:37we're really in the sense pioneers of a

32:39new discipline doesn't have a name yet

32:41but it draws a lot from Deep learning

32:43and draws a lot from biology we think

32:46designing the experiments or assays that

32:49we're using to generate the data that we

32:51need to then train the models is

32:55at the core of this discipline if you

32:58look at the experiments or assays that we're

33:00running they use the models that we're

33:02training on the data that their

33:04predecessors actually produced and so

33:06really if you squint then in a certain

33:09sense I guess there was always this

33:11dream of and I think it's a pipe dream

33:13of having the cycle between

33:15experimentation and then you put that

33:17into something in silico something

33:19running on computers and then that

33:20informs the experiments and then you

33:22kind of iterate that cycle

33:24I think that's just it would be

33:27beautiful and simple and nice I don't

33:28think it's really that easy so what you

33:30see at Inceptive is actually there's not

33:33that one cycle although maybe now

33:35somewhere hazily there actually is that

33:37cycle too but by Design actually there

33:40are tons of little Cycles so right you

33:42start an assay and the first thing you

33:44do is actually you query a neural

33:45network and then you do some stuff and

33:48then you get certain readouts and those

33:49you then together with some other stuff

33:51feed into yet another model and then

33:53that actually gives you parameters for

33:54some instrument and then you run that

33:56instrument on the stuff that you've

33:58created and so it's really just this

34:00kind of giant mess where the boundary

34:03actually is increasingly blurry and so

34:06we actually think that our work happens

34:07on the beach because that's where the

34:09wet and the dry meet in harmony

34:12and so initially folks join Inceptive

34:15and usually most of them come

34:18from say either in quotes side right

34:20they've spent most of their careers

34:21working on deep learning or maybe

34:26But ultimately it doesn't take them that

34:28long to start speaking some weird kind

34:32of Creole of all of these languages and

34:34also think in these ways and what then

34:36happens is Magic it's really amazing

34:39because then you suddenly find solutions

34:42to problems that say the biologists they

34:45were two years ago just wouldn't even

34:49and they work together with folks they

34:51would have otherwise maybe never even

34:52met and the results sometimes don't work

34:56at all but sometimes they really are

34:57magic that's a that's a really inspiring

34:59note to end on thanks Jacob thank you

35:03find us on Twitter at no priors pod

35:06subscribe to our YouTube channel if you

35:08want to see our faces follow the show on

35:10Apple podcasts Spotify or wherever you

35:13listen that way you get a new episode

35:14every week and sign up for emails or

35:16find transcripts for every episode at no