00:05 What would the world look like if we could create biological software that allows us to compile RNA? That's the big question this week on the podcast. Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive. Jakob spent more than a decade at Google, where he co-authored the "Attention Is All You Need" paper and several other papers that set the foundation for today's AI revolution. He has also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive he builds biological software with the aim to make widely accessible medicines and biotechnologies. Jakob, welcome to No Priors.

00:38 Thank you, thank you for having me.

00:42 You worked at Google for more than a decade, on many leading research teams, and you were really seminal in the original Transformer paper. When I talk to the other authors of the Transformer paper, and people at Google, you're widely credited with coming up with the idea of focusing on attention, which was the basis for the "Attention Is All You Need" paper. Could you talk a little bit about how you came up with that, how the team started working on it, and the origins of that pretty foundational breakthrough, in terms of the Transformer?

01:09 It's really not that simple, right? It's also really important to keep in mind that in deep learning you can't make something, in quotes, "really work" that sits maybe pretty far toward, I would say, the theoretical or formal end, without really going deep on the engineering and implementation side; it just has to be efficient at the end of the day. In my mind, the one and only thing we know really works, if you want to push deep learning forward, is to make it faster, more effective, and more efficient on the given piece of hardware.

01:37 There's a lot of evidence that the way we actually understand language, and that's something that then shapes language in terms of its statistical properties, is actually somewhat hierarchical. The best piece of circumstantial or anecdotal evidence for that is just looking at what linguists do, right? They draw these trees, and while I don't think that they're ever really true, they're also definitely not always false, so they do capture some of the statistics that are inherent in language, and language probably evolved this way in order to exploit our cognitive capacities in a fairly optimal way. So you can safely assume that it is not necessary to go through the entirety of a sequential signal, beginning to end, and maybe also end to beginning, simultaneously, in order to understand it; actually, you can gain a lot of the understanding, in air quotes, by looking at individual groups of, say, your signal.

02:36 Ultimately, if you're now given a piece of hardware whose very key strength is doing lots and lots of simple computations in parallel, as opposed to complicated, structured computations sequentially, then that's actually a kind of statistical property you really want to exploit. You want to, in parallel, understand pieces of an image first; maybe that's not possible in its entirety, but you can actually get a lot of it, and only once you've done some of that do you put these incomplete understandings, or representations, together. As you put them together more and more, that's when you get rid of the last remaining ambiguity at the end of the day. And when you think about what that process looks like, it's a tree.

03:21 And when you think about how you would actually run something that evaluates all possible trees, then a reasonable approximation is that you repeat an operation where you look at all combinations of things. That's the quadratic step, right, that ultimately is the core of this attention step: you effectively pull in information, for a given representation of a given piece, from the representations of all the other pieces, and rinse and repeat. It seems intuitive, and it also seems intuitively clear, that that's a really good fit for the kind of accelerators that we had at the time, and that we still have today. So that's really where that idea came from. If you look at, say, the biggest differences between the Transformer as it was described in the "Attention Is All You Need" paper and some of its ancestors, like the decomposable attention model, the big difference is just that the Transformer was implemented, by folks on the team, in a way that's such an excellent fit for the accelerators that we had at the time.
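The repeated all-pairs operation he describes is, in essence, scaled dot-product attention. A minimal NumPy sketch, purely illustrative rather than the paper's implementation (no multi-head projections, masking, or learned weight matrices):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over n pieces of a signal.

    scores is n x n: every piece compared against every other piece,
    the quadratic step; each output row then pulls in a weighted mix
    of all the other representations."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) all-pairs comparison
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n, d) updated representations

n, d = 5, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
out = attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)          # (5, 8)
```

"Rinse and repeat" is then stacking this operation in layers, each round further disambiguating the partial representations, and every step is dense matrix multiplication, which is exactly what the accelerators do in parallel.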

04:16 One question that I've heard people bring up is that a lot of the behavior we've seen in Transformers is, to some extent, most interesting at scale, right? You get interesting emergent properties. And there may be other architectures that have equally interesting, or perhaps more interesting, properties at scale, but there are sort of two impediments: number one, people just aren't throwing a lot of money and compute at them; and two, because the accelerator architecture fits the Transformer so well, it is dramatically less performant to run other architectures, and therefore you may never actually test them. Do you think that's a true statement?

04:46 I think the big question is: does it matter? It would be really interesting to evaluate, especially if we can make them simpler to evaluate, combinations of different hardware and then models or architectures that fit those like gloves. And I feel, at the moment, given where GPUs came from, they weren't built for this, so why would it be that they are anywhere near optimal? At best they were engineered toward this purpose after the fact, with lots of people basically banging their heads against walls until they had things somewhat optimized, but that's not how the basic architecture came to be. So you can talk a lot and reason a lot about, and I think some of that is true, the generality of really fast, scalable matrix multipliers and how that just does everything in scientific computing really well. Sure, but there are still lots of bells and whistles, and there are lots of specific trade-offs, say, for example, things like memory bandwidth, and ultimately inherent parallelism versus latency. I don't think GPUs are at the sweet spot when it comes to large-scale deep learning with respect to exactly those trade-offs, and so it may very well be that if we actually tried these combinations, we might quite quickly find something that's better.

06:01 When you think about how we get progress from here: usually people think of software as driving the hardware, right? Do you think we get accelerators designed for the large-scale Transformer architectures we already have, or new hardware designs? It's chicken-or-egg a little bit here.

06:19 It is a chicken and egg, and if you look at the newest accelerator designs, they are taking this into account to a significant extent, actually increasingly so. There are a couple of interesting examples: we had a computer vision architecture that really was just an MLP, called Mixer, and while it wasn't significantly better, it also wasn't significantly worse than the Vision Transformers, right? I think that already goes to show it's not that difficult, and especially if you simplify along the way, it might really be a possibility.

06:50 I will say one other thing, aside from efficiency, really just raw efficiency in terms of this specific architecture's fit to the accelerator: the other main contributor, I think, to the success of this architecture is optimism and hope. Suddenly you were in a situation where, for whatever reason, a bunch of things that people tried with this started to work, and then more started to work. And that's not coincidence; it's really just because, ultimately, the human cycles invested in getting all these diverse things to work are fueled by suspension of disbelief, a.k.a. hope, or whatever you want to call it. The community became so energized so quickly and just tried everything under the sun, because the prior was now a different one; the prior now was, "oh look, we have this thing where it just works," which is just not true. The reality is that you try something the first time and you really have to work hard for a long period of time, and then, lo and behold, sometimes it works. And if you do that many more times, then it will work many more times. I think that's really what we're seeing.

07:58 Where do you think people should invest that sort of optimism going forward? What are the big areas people need to work on to increase the performance of these systems, or add memory, or do the other things you feel are needed? If you were to paint the road map ahead, in terms of making these really valuable, performant systems, what would you focus on?

08:14 I think there's one thing that still boggles my mind, because just from first principles it can't be optimal. If you think about it, the way you today scale the compute that's invested in a given problem, let's say the problem is "what's the response to a prompt in some large language model", ultimately depends on the prompt and how long it is: the longer the prompt, the more compute you get. And it depends, and there are of course many different screws to tweak here, on the length of the response. There are many very hard problems where the response is short, and you can in many cases actually formulate those problems very, very succinctly, so you're not going to be using a lot of compute even though we know the problem is really, really difficult. Say, I don't know, prime factorization: a problem like that is simply stated, with big potential impact. And right now there's no knob that you can easily tweak as a user, but also really no knob that the architecture can tweak itself, when it comes to basically deciding, "oh, this is hard, I actually need to use more compute."

09:23 And ironically, and this comes back to a question that many people ask, around whether it makes any sense to train on model-generated data: founding information theory very clearly says nope, you're not going to get more information out of it; you can do it all you want. But there is an artifact, or maybe even an omission, in that flavor of information theory, which is that it doesn't take compute into account; it doesn't take into account the energy expenditure necessary to generate that data. So if you now think back to these problems: if you were to just let LLMs run, generate stuff, and then train new LLMs, or even the same LLM, on that output, what you do is amortize compute that was expended at some point in time. And so now, suddenly, you actually have models that, if you retrain them over and over again, start to spend more compute on the same problems, but amortized over all of these iterations of the system.

10:21 And that seems clunky. That just seems so clunky that ultimately it should be something where, at inference time, at runtime, the model effectively decides per problem, or maybe even per query. There's this notion of anytime algorithms, where it might just depend on your resources: if you have more time or more money, then let it run longer. But you don't want that to happen in cases where the answer, or the problem in question, is simple; you only want to do that in cases where it's actually hard. And that right now doesn't work, because if you pose a very, very simple problem, like two plus two, to GPT-4 right now, and you write it in a very long-winded way in a prompt, and you ask GPT-4 to generate a very complicated answer, then it will actually expend a ton of compute to add two to two, which makes no sense. So out of all the different problems that I currently see, at a high level, because it's not clear how you would exactly address it, that is maybe the one that boggles my mind most.
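The anytime idea he references can be illustrated with a classic non-neural example: a Newton-iteration square root that refines only until the answer is good enough, so easy inputs cost fewer steps than hard ones. This is exactly the per-problem compute knob being described; the function and thresholds here are illustrative, not anything from an existing system:

```python
def anytime_sqrt(x, tol=1e-12, max_steps=100):
    """Anytime algorithm: refine until good enough or budget exhausted.

    Easy inputs (initial guess already close) halt after a few steps;
    hard inputs automatically get more compute, up to max_steps."""
    guess, steps = max(x, 1.0), 0
    while steps < max_steps and abs(guess * guess - x) > tol * x:
        guess = 0.5 * (guess + x / guess)  # one Newton refinement step
        steps += 1
    return guess, steps

_, easy_steps = anytime_sqrt(4.0)   # converges in a handful of steps
_, hard_steps = anytime_sqrt(1e12)  # needs many more refinement steps
print(easy_steps, hard_steps)
```

The contrast with today's LLMs is the point: here, the compute spent is a function of how hard the instance is, not of how verbosely the question was phrased.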

11:26 Yeah. Are there other big research areas that you're excited about right now, or areas where you see enormous progress being made?

11:32 In terms of foundations, I think different flavors of elasticity are really interesting. You could actually claim that a lot of these questions boil down to the problem that I just described, that compute is, in a certain sense, very crudely allocated, but you can look at different incarnations of this problem. Another one would be: why don't we have models that in an elegant way manage to consume, say, visual sensor output of different resolutions, different sampling rates, different durations? Right now it's actually quite tricky to have, other than maybe recurrent architectures, a model that takes videos of different lengths, different image resolutions, or ultimately different densities, if you wish, in different sizes, and really elegantly adjusts compute to what you really want to know, or to how difficult it really has to be to generate the representations that you need in order to do whatever you want to do. An example that makes this pretty clear, I think: you can take a video, you can scale it up, you can reframe it with trivial algorithms, and then run it again, and if the problem you're trying to solve, conditioned on that video, is the same, then I wouldn't want more compute to be used. But right now that's what's going to happen: you're going to use a ton more compute. So effectively these types of, in a certain sense, elasticity, or flexibility, of these models, or really our lack of techniques addressing them, are ultimately incredibly wasteful.

13:10 I've seen increasing attention around two different concepts in these general directions. One is, I think it was some people at Meta who did depth-adaptive Transformers, right, just adjusting the amount of computation for each input, with a prediction of how much is needed; I don't know how much more work has gone in that direction. And then I think a number of people are more excited about doing test-time search, especially for problems like code generation, where you can evaluate with compilation or something, to get a loop of success into the model itself.

13:43 I think it's super effective. I do think it's clunky, because it's not something that you can easily enter into and optimize. This is basically also what I was trying to get at a little bit with saying that some of these efficiency improvements we're not yet really harnessing would, I believe, dramatically affect training; and if you look at how test-time search actually affects training, it's just clunky, and I don't think we'll be able to optimize it as well. Although, as an engineering, I don't know, "hack" could sound negative, and that's not what I mean, I think it's an awesome hack: as an engineering hack around this problem it's really, really effective. It basically comes back to this whole idea of amortizing compute, in a certain sense, with the stuff you already have lying around and memorized, even though it was the humans that actually put it there in many cases.
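The test-time-search loop with compilation feedback mentioned above can be sketched in a few lines. The candidate generator here is a hypothetical stub standing in for samples drawn from a language model:

```python
# Toy test-time search: sample candidate programs and keep the first one
# that survives an executable check, instead of trusting a single sample.
def candidate_programs():
    """Stub for an LLM's sampled drafts (hypothetical outputs)."""
    yield "def add(a, b): return a - b"   # buggy draft
    yield "def add(a, b): return a + b"   # correct draft

def passes_tests(src):
    """Compile and run a candidate against known test cases."""
    try:
        scope = {}
        exec(src, scope)                  # "compilation" feedback
        return scope["add"](2, 3) == 5    # execution feedback
    except Exception:
        return False

def search():
    for src in candidate_programs():
        if passes_tests(src):
            return src
    return None

best = search()
print(best)
```

The extra compute happens entirely outside the model, in the sample-and-check loop, which is the "clunky, hard to optimize end to end" property being discussed.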

14:33 In terms of adaptive-computation-time Transformers, etc.: we actually tried this Universal Transformer thing a long time ago. It just hasn't caught on, and that's because it just doesn't work well enough at this point. It's not that it doesn't work at all, but if it worked really well, then, because compute right now is this incredibly scarce resource, we would see it everywhere. And I think what that tells us, and I don't think here it's really just a lack of trying or a couple years too little experimentation, is that at least those known or proposed methods just don't work well enough yet.

15:12 So one thing that you've been working on for the last few years is Inceptive, which is really starting to focus on how you can apply machine learning, and different aspects of software, to biology. Could you share a little bit about the company, how you got interested in bio, and what you view as some of the interesting problems there?

15:29 Yeah, so basically I've always been interested in bio and knew nothing about it, and that's a conundrum, because it's difficult to learn a lot about biology when you're not in school, and I didn't want to go back to school. But at the same time, it always felt like an area where there's a lot of headroom in terms of efficiency, and actually also where there's maybe a dire need for alternative approaches, at least if what you are interested in is really solving acute problems. Alternative, that is, to how biology works by design: trying to develop a complete conceptual understanding of how life works. I don't have very high hopes for humanity developing that complete conceptual understanding to the level that we would need in order to do all the interventions we want to do. We don't really have great tools in our toolbox, or we didn't have them until somewhat recently, as alternatives to understanding how it works and then, based on that understanding, fixing it if it needs fixing.

16:25 I think now we have an alternative that's an extremely good match, and that's deep learning at scale. We can potentially, to a pretty large extent if not entirely, whatever that means, work around the following two problems. Number one: we don't know all the stuff that's going on in life; we still just don't even have a complete inventory, let alone really understand all the mechanisms. And number two: even for the stuff that we do know, in many cases we so far haven't really been able to come up with sufficiently predictive theories to make that understanding useful. A concrete example here is protein folding, where, even if you just act as if there are no chaperones and no other stuff in the environment in which folding, or whatever you want to call it, in which that process, the earliest kinetic steering of translation, happens, even if you make that massively simplifying assumption, the theory just wasn't practical. And it seems like deep learning is at least potentially a really good answer to both of those aspects, because you can basically treat everything, in quotes, as a black box, and as long as you are able to observe that black box, in terms of input and output, fast enough and at sufficient scale, you might go somewhere with that.

17:46 So Inceptive is pretty stealthy. Is there anything you can share in terms of how you're applying deep learning, or other techniques, to biology in the context of the company?

17:54 Yep. My daughter was born, my first child, and just that entire process gave me a really fundamentally different appreciation for the fragility of life, a really wonderful one, but also a pretty fundamentally different one. And so here we are: we have this new tool, namely AlphaFold 2, that solves one of these fundamental problems in structural biology; we have instances of a macromolecule family that's basically about to save the world; and I basically want to fix life, because I now have this wonderful daughter. It became clear that taking the exact tools we had been working on at Google before, and applying them to this neglected stepchild, namely RNA, or more specifically, at first, mRNA, could have a massive impact on the world.

18:40 What we're trying to do is design better RNA, and at first mRNA, molecules for a pretty broad variety of applications. Infectious disease vaccines are, I guess, maybe the obvious first example, but if you look at the pipelines of Moderna and BioNTech and all those companies, the at least potential applicability of RNA, mRNA more specifically, is near limitless. There are already now hundreds of programs underway, in different stages of development, and that number is expected to climb, hitting high triple digits before the end of the decade. Now we're talking about a modality that might, before the end of the decade, end up being the second or third biggest modality in terms of revenue, and potentially also in terms of impact. And if you take that trajectory, and look at how suboptimal, in a certain sense, the mRNA vaccines were when you compare them to what's possible using RNA, just looking around in nature; at how severe the side effects were for some fraction of the patients that ultimately received the vaccines; at how few people, comparatively, really had access to any of those vaccines when they were really necessary and needed: it seems like, if we look around in our toolkit, the only tool we have to potentially change that quickly is deep learning.

20:09 So at Inceptive we think of this now as something that you could call biological software, where mRNA, and RNA in general, is maybe the equivalent of bytecode that then forms the substrate, the actual stuff that the software is made of. And what you do is you learn models that allow you to translate programs, which might look like some bit of Python code specifying what you want a certain medicine to do inside yourself, inside your cells, and compile those programs into descriptions of RNA molecules that then hopefully actually do what you wrote, what you programmed them to do.
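A toy version of such a compiler, for the simplest imaginable program, "print this protein", is reverse translation: map each amino acid to one of its codons. The function name, the truncated codon table, and the pick-the-first-codon rule are all illustrative assumptions; real mRNA design is precisely about choosing well among the astronomical number of synonymous options, not taking the first one:

```python
# Hypothetical, heavily simplified "compiler" sketch.
CODONS = {  # truncated standard-code table, illustration only
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "L": ["CUG", "CUC", "CUA", "CUU", "UUA", "UUG"],
    "S": ["AGC", "UCU", "UCC", "UCA", "UCG", "AGU"],
    "*": ["UAA", "UAG", "UGA"],  # stop
}

def compile_print_protein(peptide):
    """Return one mRNA coding sequence for the peptide, plus a stop codon."""
    return "".join(CODONS[aa][0] for aa in peptide + "*")

print(compile_print_protein("MFLS"))  # AUGUUUCUGAGCUAA
```

Every synonymous codon choice changes the molecule's structure, stability, and expression without changing the encoded protein, which is where the learned models come in.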

20:55 And ultimately, right now, if you look at mRNA vaccines, our programming language is just a print statement: just "print this protein." But you can easily imagine that with self-amplifying RNA, as one example, and with riboswitches, which are basically RNAs that change dramatically in structure, or self-destruct, in the presence of, say, a given small molecule, you can effectively have conditionals, you can have recursion. And as a computer scientist you squint and you're like, oh wow, okay, this is basically Turing complete. You kind of have all sorts of tools now at your disposal to really build very, very complex medicines, ultimately, that then might also be produced, manufactured, and distributed in a way that is much more scalable than anything that we've been able to do so far. Protein-based biologics oftentimes don't make it to the market because it's just not possible to manufacture them at scale; if we wanted to medicate everybody in the world with all the protein-based biologics that they should actually receive, the real estate on the planet wouldn't be enough to make all the stuff.

21:57 But right now, if you look at RNA manufacturing and distribution infrastructure, we're going to have six to eight billion doses, two years from now, manufacturable and distributable across the globe, and that number is going to go up really, really quickly. At Inceptive, right now, in our lab, we can actually print pretty much any given RNA, and that's just something you can't do with small molecules, and can't easily do with proteins, certainly not at scale. And that's not something that only matters when you have a product in your hand: if you want to treat this as a machine learning problem, you need to generate training data, because it doesn't already exist, and so you also really want to have scalable synthesis and manufacturing, which is an unprecedented constellation.

22:36 So your view is that you can actually search for the program that codes for, let's say, the spike protein, at a certain amount, with different stability characteristics, with different immune reaction characteristics, that doesn't need cold-chain logistics, that's conditional on whatever cell type. I'm describing the future, right, not Inceptive today, but that's the goal, out of all of the 10^630 variants?

That's right, yeah.

23:04 And it's not certain; I mean, ultimately it's not going to be a search, right? Just like today the output of an LLM isn't coming out of a proper search procedure; it has to be a generation procedure, exactly in the same way, and for the same reason, as you see in large language models or image generation models. But yeah, that's exactly the goal, because screening is just not going to cut it at 10^630.
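Where a figure like 10^630 can come from: the synonymous coding space for a single protein grows exponentially with its length. A hypothetical back-of-the-envelope, with assumed numbers (roughly 1,270 residues for a spike-sized protein, about 3.1 synonymous codons per residue on average):

```python
import math

# Degeneracy of the standard genetic code: amino acid -> codon count.
DEGENERACY = {"L": 6, "S": 6, "R": 6, "A": 4, "G": 4, "P": 4, "T": 4,
              "V": 4, "I": 3, "*": 3, "C": 2, "D": 2, "E": 2, "F": 2,
              "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2, "M": 1, "W": 1}

def log10_encodings(mean_degeneracy, n_residues):
    """log10 of how many distinct mRNAs encode an n-residue protein."""
    return n_residues * math.log10(mean_degeneracy)

# ~1,270 residues at ~3.1 synonymous codons each lands in the 10^600s,
# the order of magnitude mentioned in the conversation.
print(round(log10_encodings(3.1, 1270)))  # 624
```

The exact exponent depends on the protein's composition, but any spike-sized coding sequence has vastly more synonymous variants than could ever be screened physically, which is the argument for generative design.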

23:28 And that's really just one antigen that we're coding for there; we actually want to code for many, and update those for any given...

23:35 Yep, exactly. When you do personalized medicine, it is going to be many antigens, for each patient, over time, right? And there's just no hope of basically tackling this with screening approaches at all.

23:47 Yeah, I'm excited to just get to the right answer without having to understand or discover every single mechanism, and without doing the massively expensive screens we have today.

23:57 I mean, that's really the big question: are we here maybe at a crossroads where discovery and understanding, the hope to discover and really get how this works, is actually a hindrance? It might actually be holding us back. And there is a pretty direct analogy to language understanding: computational linguistics, and linguistics in general, tried for a while to develop a sufficiently accurate and complete theory of language to make this really actionable.

24:25 Yeah. When you talked about how the Transformer model works, for example, I was actually thinking about genomic sequencing, where you used to do the sequencing contig by contig: you'd have these big chunks of chromosomes, each sequenced through sequentially, and then eventually you moved into an era where you just broke it up into tons and tons and tons of tiny little sequences that were randomly generated, and then you'd reassemble them with a machine, right? That felt like a very interesting parallel, or analog, to what you were talking about from a language perspective; it's effectively the same thing.

24:51 It is, exactly, and the parallels are so striking, and they don't end there, so yeah, it's really, really interesting to see. The invariant that I feel just holds true across the board is that these formalisms we make up in order to communicate our conceptual understanding, our intuitive understanding: conceptualizing it and making it explicit is great for education, and it's also great for many other purposes, maybe for reasoning about these systems, but it might, because of our limited cognitive capabilities, really not be the right tool to actually predict what's going to happen with a given intervention.

25:29 The other point that really resonated, I think, in terms of what you mentioned, is that if you look at drugs, especially traditionally, we actually didn't understand how most drugs worked until very recently. So with aspirin, we had no idea how it worked when it was, you know, taken out of the bark of a yew tree or whatever in the 1800s, and it was fine; people were fine taking these things, at minimal side effects. There are very popular drugs on the market, like metformin, that bind to multiple targets, where we still aren't sure exactly how they work. And so a lot of the emphasis right now, from a regulatory-pathway perspective for drugs, is: oh, you need a mechanism of action, or you need a proven pathway, and all these things create hurdles that don't necessarily help with drug efficacy.

26:03 And some of them might actually also be, in a certain sense... I mean, it's a waste of time and money. If the thing works, it works.

26:10 Yes, it's a waste of time and money, and it might not even be true, and we have no way of telling.

26:15 Yeah, because in the end the ground truth is, right: does it work, and does it actually do more good than harm? It's empirical, and maybe that should really just be the focus, and everything else should be treated as something that we should at least do.

26:32I'm gonna take

26:35the first step in that historical frame

26:43actually where we've discovered their

26:44mechanisms after the fact you know the

26:47end-to-end like black box like deep

26:50learning pipeline approach seems a

26:52little more rational a little less

26:53heretical which I think upon first

26:56blush it certainly is controversial

26:59yeah I mean the the part that one can

27:01look at as Blasphemous is that now

27:03suddenly you don't know the theory

27:04anymore that you're testing right and

27:06you might never because it's not clear

27:08to us today as far as I can tell that if

27:11there is a theory in that black box

27:13today that we could get it out

27:15there are people trying I think it's

27:16worth trying I I'm not super optimistic

27:19about that I think it'll work for some

27:21cases right where it's simple enough

27:23that we can get it I think there are

27:25many cases where it just isn't all right

27:27let's say climate and weather

27:28forecasting I just don't think we're

27:31gonna get it we're gonna get it in the

27:32sense that we understand and I think we

27:34understand the Schrodinger equation and

27:36how that could be used

27:38at least in theory to just solve all these

27:41things but that's not practical and to

27:44develop a theory that is both the

27:47predictive and practical here might just

27:50not be something we can put in our heads

27:52yeah this is kind of interesting because

27:53I actually feel like this again is the

27:55basis of a lot of traditional drug

27:56Discovery from way back when as well as

27:57just the basis for how you think about

27:59genetic screens right you basically do

28:01functional screens so you'd mutagenize a

28:03bunch of organisms you'd look for output

28:05and then you say okay I've identified

28:07genes that are part of this pathway or

28:09output and I can map them and see how

28:10they're interacting with each other but

28:11before molecular biology we actually

28:13didn't understand anything

28:14functionally we just understood

28:16sequence and output right and so

28:18it feels like deep learning is really

28:20just a throwback to other forms of

28:22biology that have been incredibly

28:23fruitful but just with a new sort of

28:25technology and modality to interrogate

28:27these systems exactly so how do you

28:28think about human augmentation in the

28:30context of all this stuff you know how

28:32bullish are you on human augmentation

28:33and what forms do you think it'll take

28:34in the near term I'm very bullish on

28:37human augmentation in the very long term

28:38but it's one that I don't see

28:40intuitively I think looking at our

28:44brains even just physically they seem to

28:47be very focused and this is not

28:49surprising on our IO

28:53why would there somewhere in there be

28:56some kind of computational capacity that

28:59if we just boosted our IO by a few

29:01orders of magnitude could still cope why

29:04would evolution prepare for that I don't know

29:06why and and so yes you could argue you

29:08know maybe to do long-term planning

29:10tasks and so on and so forth but sure

29:11right let's bound it at a lifetime so right

29:14it's just not so clear whether there

29:16would have been any evolutionary

29:18pressures to really make our capacity

29:20there much bigger than say some

29:23multiplier basically times our I/O

29:26capacity if you look at the number of

29:28tokens that you use to train an LLM

29:31and then you look at the number of

29:33tokens or words that are used to train a

29:35kid right a child a human baby or a

29:37human toddler I mean a human toddler is

29:40probably exposed to what hundreds of

29:42thousands maybe millions of words before

29:43they can speak fluently but I think

29:48we confuse fine-tuning and

29:49pre-training pre-training is all of

29:51evolution sure and then basically you

29:54arrive at this thing that

29:55it's maybe doing something that's

29:57completely in a certain sense a

29:59completely irrelevant task at first but

30:01it has all the capacity in there to then

30:03with a comparatively small amount of data

30:04maybe it's something in between right

30:06then be fine-tuned towards something

30:08that we would regard as oh so advanced

30:10cognitively the compute has been

30:12amortized over the last several

30:14Millennia that's right yeah 10 millennia

30:17humans and we come pre-wired for

30:19language and so it only takes a million

30:21tokens at the end exactly and now
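The token-count comparison above can be sanity-checked with rough arithmetic. This is a hypothetical back-of-envelope sketch: the daily word rate, age of fluency, and LLM pre-training token count are illustrative assumptions, not figures from the conversation.

```python
# Back-of-envelope: words a toddler hears before speaking fluently
# vs. tokens used to pre-train a large language model.
# All three constants are rough assumptions for illustration.

WORDS_PER_DAY = 10_000                       # assumed words heard per day
YEARS_TO_FLUENCY = 3                         # assumed age of fluent speech
LLM_PRETRAINING_TOKENS = 1_000_000_000_000   # assumed ~1 trillion tokens

toddler_words = WORDS_PER_DAY * 365 * YEARS_TO_FLUENCY  # ~11 million
ratio = LLM_PRETRAINING_TOKENS / toddler_words

print(f"toddler hears roughly {toddler_words:,} words")
print(f"the LLM sees roughly {ratio:,.0f}x more tokens")
```

Even under generous assumptions, the child's direct linguistic "training set" lands in the millions, orders of magnitude below LLM pre-training corpora, which is the gap the evolution-as-pre-training analogy is meant to explain.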

30:23the thing is that you can now say okay

30:25great so we come pre-wired let's

30:27look at our wiring and try to find it

30:30but it might not be that

30:32simple right because of course it's this

30:34co-evolution and it's all fuzzy and so

30:36how much we're pre-wired for it how much

30:38language is in a certain sense also

30:39pre-wired for us it might be it might be

30:43the case that it's maybe even

30:44impossible right to actually yeah

30:47read out what it's pre-wired for from

30:50just looking at the wiring yeah you can

30:52see circumstances where people are

30:55literally born without a hemisphere of

30:57their brain or there's other sort of

30:58mass scale deficiencies brain wise and

31:01then things just rewire to effectively

31:03compensate and so you have parts of the

31:05brain taking over other

31:06functionality that they're normally not

31:08designed for which is also fascinating

31:09because it seems like certain parts are

31:12extremely specialized visual cortex et

31:13cetera and then other parts are

31:14basically almost general purpose

31:16machines that can be reallocated I

31:18completely agree with what you're saying

31:19I feel general purpose machines is a

31:21really tricky term because right I mean

31:24could the brain after massive

31:27trauma rewire to do something very

31:28different I'm unclear right so it

31:31could be that it's actually still

31:32specific but it is in a certain sense

31:35General namely preparing for a certain

31:38flavor of redundancy and this is also

31:41why I find AGI as a term particularly

31:43problematic because I don't know what

31:45the general means I think they're

31:46referring to General Tso's chicken

31:48no I'm just kidding sorry really dumb

31:55what's the like theory of data

31:58generation at Inceptive right I feel

32:00like I understand the mission you

32:02describe and then like you need to go do

32:04wet lab experiments with observation to

32:07understand all the properties of these

32:10sequences and you have to figure

32:13out how to do that efficiently right as

32:14a young company with all your pedigree

32:16and resources so yes would love any

32:18intuition on that yeah so let me try to

32:19get across how we think about this so

32:21number one we look at ourselves actually

32:23as one anti-disciplinary team so it's

32:26not quite anti-disciplinary although

32:27there is a correlation maybe with a lack

32:29of discipline or disregard for

32:32fundamental discipline or disciplines

32:34and being anti-disciplinary but we think

32:37we're really in the sense pioneers of a

32:39new discipline doesn't have a name yet

32:41but it draws a lot from Deep learning

32:43and draws a lot from biology we think

32:46designing the experiments or assays that

32:49we're using to generate the data that we

32:51need to then train the models is

32:55at the core of this discipline if you

32:58look at the experiments or assays that we're

33:00running they use the models that we're

33:02training on the data that their

33:04predecessors actually produced and so

33:06really if you squint then in a certain

33:09sense I guess there was always this

33:11dream of and I think it's a pipe dream

33:13of having the cycle between

33:15experimentation and then you put that

33:17into something in silico something

33:19running on computers and then that

33:20informs the experiments and then you

33:22kind of iterate that cycle

33:24I think that's just it would be

33:27beautiful and simple and nice I don't

33:28think it's really that easy so what you

33:30see at Inceptive is actually there's not

33:33that one cycle although maybe now

33:35somewhere hazily there actually is that

33:37cycle too but by Design actually there

33:40are tons of little Cycles so right you

33:42start an assay and the first thing you

33:44do is actually you query a neural

33:45network and then you do some stuff and

33:48then you get certain readouts and those

33:49you then together with some other stuff

33:51feed into yet another model and then

33:53that actually gives you parameters for

33:54some instrument and then you run that

33:56instrument on the stuff that you've

33:58created and so it's really just this

34:00kind of giant mess where the boundary

34:03actually is increasingly blurry and so

34:06we actually think that our work happens

34:07on the beach because that's where the

34:09wet and the dry meet in harmony

34:12and so initially folks join Inceptive

34:15and usually most of them come

34:18from say either in quotes side right

34:20they've spent most of their careers

34:21working on deep learning or maybe

34:26But ultimately it doesn't take them that

34:28long to start speaking some weird kind

34:32of Creole of all of these languages and

34:34also think in these ways and what then

34:36happens is Magic it's really amazing

34:39because then you suddenly find solutions

34:42to problems that say the biologists they

34:45were two years ago just wouldn't even

34:49and they work together with folks they

34:51would have otherwise maybe never even

34:52met and the results sometimes don't work

34:56at all but sometimes they really are

34:57magic that's a that's a really inspiring

34:59note to end on thanks Jacob thank you

35:03find us on Twitter at no priors pod

35:06subscribe to our YouTube channel if you

35:08want to see our faces follow the show on

35:10Apple podcasts Spotify or wherever you

35:13listen that way you get a new episode

35:14every week and sign up for emails or

35:16find transcripts for every episode at no