00:08let's start with the personal story you
00:11have a background in computer science
00:12and you were working in the hedge fund
00:13world uh that's a hard left turn or it
00:18um that world to being a driving force
00:20in the AI state of the art how did you
00:22end up working in this field
00:24uh yeah I've always been interested in
00:26kind of AI and technology so on the
00:29hedge fund I was one of the largest
00:30investors in video games and artificial
00:32intelligence but then my real interest
00:34came when my son was diagnosed with
00:35autism and I was told there was no cure or
00:39treatment and I was like ah well let's
00:41try and see what we can do so I built up
00:43a team and did AI based literature
00:45review this was about 12 years ago of
00:49treatments and papers to try and figure
00:51out commonalities and then did some uh
00:55kind of biomolecular pathway analysis of
00:57neurotransmitters for drug repurposing
00:59and came down to a few different things
01:02that could be causing it you know worked
01:04with doctors to treat him and he went to
01:05mainstream school and that was fantastic
01:07went back to running a hedge fund won
01:09some awards and then I was like let's
01:11try and make the world better and so the
01:13first one was uh non-ai enhanced
01:16education tablets for refugees and
01:18others and that's Imagine Worldwide my
01:20co-founder's charity and then in 2020
01:23covid came and I saw something like
01:25autism a multi-systemic condition that
01:28existing mechanisms that extrapolated
01:31the future from the past wouldn't be
01:33able to keep up with and thought could
01:34we use AI to make this understandable
01:37and so I set up an AI initiative with
01:39the World Bank UNESCO and others to try
01:42and understand what caused covid
01:46um and try and make that available to
01:47everyone then I hit the institutional
01:49wall in a variety of places and realized
01:53that the models and technologies that
01:55had evolved were far beyond anything
01:57that happened before and there were some
01:59interesting arbitrage opportunities from
02:01a business perspective and more than that
02:03a bit of a moral imperative to make this
02:05technology available to everyone because
02:07we're now going to very narrow
02:10superhuman performance and everyone
02:13should have access to that
02:15it's an amazing journey and
02:17congratulations on all the impact you've
02:22um or as you imply the AI field in
02:24recent years has been increasingly
02:25driven by labs and private companies and
02:28and one of the most obvious paths to
02:32performance progress is to just make
02:34models bigger right scaling data
02:36parameters GPUs which is very expensive
02:39um and then in reaction just to set the
02:41stage a little bit there's been some
02:43efforts over the previous years to be
02:46more community driven and open and build
02:48alternatives like EleutherAI how did you
02:51start engaging in that and how did
02:54Stability change the game here
02:56yeah so when I was doing the COVID work
02:59um you know we tried to get access to
03:00various models in some cases the
03:02companies blew up other cases we weren't
03:03given access despite it being a high
03:05profile project and so I started
03:07supporting EleutherAI as part of the
03:11um so you know Stella and Connor and
03:13others kind of led it on the language
03:16model side but really one of my main
03:17interests was the image model side I
03:20have aphantasia so I can't visualize
03:22anything in my mind which is more
03:25common than people would think in fact a
03:26lot of the developers in this space have
03:28that like we've got nothing in our brain
03:30you just see words what are you what's
03:32in there just feelings
03:35so like again I thought it was a
03:37metaphor imagine yourself on a beach I
03:39was like okay I feel a beach no
03:40apparently you guys have pictures in
03:42your heads it must be like just
03:46um but then with the arrival of CLIP
03:49released by OpenAI a couple of years
03:52um you could suddenly take generative
03:54models and guide them to text prompts so
03:57it's VQGAN which is kind of the
03:59slightly mushy more abstract version
04:00first but I built a model for my
04:02daughter while I was recovering
04:03ironically from covid and then she took
04:07the output and sold it as an nft for
04:09three and a half thousand dollars and
04:11donated to India COVID relief and I was
04:12like wow that's crazy
04:14uh so I started supporting the whole
04:16space at EleutherAI and beyond giving jobs
04:18to the developers compute for the model
04:21creators funding the various notebooks
04:24from Disco Diffusion to these other
04:26things you know giving grants to people
04:28like Midjourney that were kind of
04:30kicking this off just personally
04:32just personally they were doing all the
04:34hard work and I was like can I
04:35catalyze this because it's good for
04:37uh then about 15 months ago I was like
04:39well these communities are growing it'd
04:41be great if we could create this as a
04:43common good and originally I thought you
04:45got communities you got to make them
04:47kind of coordinated could a DAO work or
04:49a DAO of DAOs and that's how Stability
04:51started after about a week I realized
04:54that was not going to work and it was
04:56incredibly difficult so then I figured
04:58out commercial open source software
05:01um to create aligned technology not just
05:05that would potentially change the game
05:08by making this stuff accessible because
05:10as you said one of the key things this
05:11is in the State of AI Report this is
05:14in the AI Index as well is that most
05:17research has been subject to scaling
05:19laws and other things Transformers seem
05:21to work for everything and so it was
05:23moving more and more towards private
05:24companies but the power of this
05:26technology is double edged one is that
05:28there are fears about what could go
05:29wrong so it's not released and the other
05:31one is why not keep it for excess
05:35um so you've had this massive brain
05:37drain occurring and no real option
05:39you work in an academic lab you have a
05:41couple of GPUs or you go and work at big
05:43tech or slash OpenAI or you set up your
05:46own startup which is very very difficult
05:49as you guys know so I wanted to create
05:51another option and that's what we did
05:53with EleutherAI and Stability and the
05:55other communities that we have grown and
05:58could you talk more broadly about why
06:01you think it's important for there to be
06:03open source efforts in AI and what your
06:06view of the world is because I think
06:09um Stability has really helped create
06:11this alternative to a lot of the closed
06:12ecosystems particularly around image gen
06:15protein folding a variety of different
06:16areas and those are incredibly important
06:17efforts I'd just love to hear more about
06:19your thoughts on you know why is this
06:21important how you all view the
06:22participation of the industry over time
06:23and also what you think the world looks
06:25like in you know five years ten years et
06:27cetera in terms of closed versus open
06:30so I think there's a fundamental
06:31misunderstanding about this technology
06:33because it's a very new thing right
06:36classical open source there's lots of
06:38people working together with a bit of
06:39direction is a bit chaotic but then
06:41you've seen Red Hat and other things
06:44there aren't many people that train
06:45these models right like we don't invite
06:48the whole community and you have a
06:49hundred people training a model it's
06:50usually five to ten plus a supercomputer
06:53and a data team and things like that and
06:55the models when they come out are a new
06:57type of programming primitive
06:59infrastructure because you can have a
07:01Stable Diffusion that's two gigabytes
07:02that deterministically converts a string
07:07that's a bit insane and that's what's
07:08led to the adoption here you know on
07:10GitHub stars we've overtaken Ethereum
07:12and Bitcoin cumulatively it took them 10
07:14years we got there in like three four
07:16months if you look at the whole
07:17ecosystem it's the most popular open
07:19source software ever not just AI why
07:21because again is this new
07:23translation file and you do the
07:26pre-compute as it were on these big
07:28supercomputers which means the inference
07:30required to create an image is very low
07:32and that's not what people would have
07:33expected five years ago or to create a
07:37so as infrastructure
07:39I think that's how it should be viewed
07:41and so my take was that what would
07:43happen is everyone would be closed
07:44because you needed Talent data and
07:47supercompute and those would be lacking
07:49as it were so it'd be the big companies
07:51only they would go four or five years
07:53and then someone would defect and go
07:55open source and it would collapse the
07:57market as they would commoditize everyone
07:58else's complement so similar to Google
08:01offering free Gmail and all sorts of
08:03stuff around their Core Business
08:05but more than that I realized that
08:07governments and others would need this
08:09infrastructure because if a company has
08:12it privately they will sell to business
08:13to business so maybe a bit of b2c but
08:15we've seen the Cambrian explosion of
08:17people building around this technology
08:19but who's building the Japan model or
08:21the India model or others well we are
08:23and then that means that you can tap
08:24into infrastructure spending which is
08:26very important because it needs billions
08:28but the reality is that's actually a
08:30small drop in the ocean
08:31self-driving cars got 100 billion of
08:33investment we have three hundreds of
08:37trillions and for me this is 5G level so
08:41from an ethical moral perspective I was
08:44like we've got to make this as equitably
08:46available as possible so a business
08:48model perspective I thought was a good
08:49idea as well but I thought we were headed
08:50here inevitably so I decided to create
08:54Stability to help coordinate and drive
08:57in what's hopefully a moral and
08:59reasonable way like you know the
09:01decisions that we make have a lot of
09:03input and they're not easy but we are
09:05trying to be kind of Switzerland in the
09:06middle of all of this and provide
09:08infrastructure that will uplift everyone
09:09here what do you think this world looks
09:12like in five years or ten years do you
09:14think there's a mix of closed and open source
09:16do you think the most cutting edge
09:17models the giant language models are
09:20going to be both or do you think like
09:21capital will eventually become such a
09:23large obstacle that it'll make um the
09:25private world more likely to drive
09:27progress forward and I know you have
09:29plans in terms of how to offset that but
09:30I'd just love to hear about those
09:32the reality is we have more compute
09:34available to us than Microsoft or Google
09:36so I have access to National
09:38supercomputers and I'm helping multiple
09:40Nations build exascale computers
09:42so to give you an example we just got
09:44a seven million hour grant on Summit one
09:46of the fastest supercomputers in the US
09:48and like I said we're building exascale
09:51literally the fastest in the world
09:52private companies don't have access to
09:54that infrastructure because governments
09:57thanks to ours are realizing that this
09:59is infrastructure of the future so we
10:00have more compute access we have more
10:02cooperation from the whole of Academia
10:04than all of them do because their
10:06agreements tend to be commercial
10:08there's no way that private Enterprise
10:11and our costs are zero as well
10:13when you actually consider that whereas
10:15they have to ramp up tens of billions of
10:17dollars of compute so my take is that
10:19foundation models will all be open
10:21source for the deep learning phase
10:23because we're actually about multiple
10:25phases now the first stage is deep
10:26learning that's creating of these large
10:28models and we will be the coordinator of
10:31the open source the next stage is the
10:32reinforcement learning the instruct
10:34models Flan-PaLM or InstructGPT or
10:36others that requires very specified
10:39annotation and that's something that
10:40private companies can excel in
10:42the next stage beyond that is fine
10:43tuning so actually let's give a
10:46practical example PaLM is a 540 billion
10:48parameter model it achieves about 50 percent on
10:51medical answers right
10:53Flan-PaLM is the instructed version of
10:57that and that achieves 70 percent Med-PaLM they
11:01took medical information they fed it in
11:03this is a recent paper from a few weeks
11:05ago achieved 92 percent which is human
11:07level on the answers
11:10and the final stage for that is you take
11:11this Med-PaLM and you put it into
11:13clinical practice with human in the loop
11:15for me the private sector will be
11:17focused on the instruct to human in the
11:20loop area and the base models will be
11:23infrastructure available to everyone on
11:25an international generalized and
11:28particularly because when you combine
11:30models together I think that's Superior
11:32to creating multilingual models so
11:34that's quite a bit there and I'm sure
11:35you want to unpack that yeah that's very
11:38exciting yeah could you actually talk
11:39about the range of things or efforts
11:41that are going on at Stability right now
11:42I know that you've done everything from
11:45these foundation models on the
11:47language side protein folding image
11:49gen et cetera if you could just
11:52what is the spectrum of stuff that
11:53Stability does and supports and works
11:56with and then what are the areas that
11:58you're putting the most emphasis behind
12:01yes I think we are the only independent
12:03multimodal AI company in the world so
12:05you have amazing research labs like FAIR
12:07at Meta and others and DeepMind doing
12:10everything from protein folding to
12:12language to image and there are
12:13cross-learnings from all of these
12:16um basically we do yeah everything from
12:18audio uh to language coding models
12:24any kind of almost private model we are
12:27looking at what the open equivalent
12:28looks like and that's not always a
12:29replication right so with Stable
12:31Diffusion for example we optimized it
12:33for a 24 gigabyte VRAM GPU now as of the
12:38release of distilled Stable Diffusion it
12:40will run in a couple of seconds on an
12:41iPhone and we have Neural Engine access
12:43because our view of the future is
12:46creating models that aren't necessarily
12:47bigger but that are customizable and
12:52so this is a bit of a different emphasis
12:54and we think that's a superior thing for
12:56scale than scaling I think things like
12:58the Chinchilla paper that's the 67
13:00billion parameter model that's as
13:01performant as GPT-3 at 175 billion are
13:05important in that because it said that
13:06training more is important and actually
13:08when you dig into it it actually said
13:09data quality is important because now
13:12we're seeing that the first stage the DL
13:14stage as it were the deep learning
13:16stage is let's use all the tokens on the
13:18internet you know but maybe we can use
13:21better tokens that's what we see when we
13:22instruct and use reinforcement learning
13:24with human feedback and we've also been
13:25releasing technology around that so our
13:27Carper lab representation learning we
13:30released our instruct framework that
13:32allows you to instruct these big models
13:33to be more human the way I kind of put
13:35it is that though our focus is thinking
13:39what are the foundation models that will
13:40advance Humanity be it commercial or not
13:43what needs to be there and what's very
13:45susceptible to this transformer-based
13:47architecture that takes about 80 percent of all
13:49research in the space
13:51making that compute and knowledge and
13:53understanding of how to build these
13:54models available to Academia independent
13:56research and our own researchers and
13:59then from a business perspective really
14:00focusing on where our edge is and our
14:02edge is in two areas one is media and so
14:05this is why image models video models
14:07and Audio models have been a focus 3D
14:09soon as well and the other area is
14:12private and regulated data
14:14because what's the probability that a
14:17GPT-3 model weight or a PaLM model weight
14:20will be put on-prem it's very low versus
14:24an open model it's very high and there's
14:26a lot more valuable private data than
14:28there is public data
14:29so it is a bit of everything but like I
14:32said there are certain focuses on the
14:34business side on media and then I think
14:37on a breakthrough side computational
14:38biology will be the biggest one
14:41that's really cool and on the
14:42computational biology side I guess
14:43there's a few different areas there's
14:45things like protein folding and then to
14:46your point there's things like Med Palm
14:47are you thinking of playing a role in
14:49both of those types of models in terms
14:50of both the medical information yes we
14:52will release an open Med-PaLM model well
14:58um and then protein folding we are
15:00one of the key drivers of OpenFold
15:01right now so we just released a paper on
15:03that much faster ablations than AlphaFold
15:06we're doing as well
15:08um DNA-Diffusion uh for predicting the
15:11outcome of DNA sequences we have
15:13BioLM around taking language models
15:16for chemical reactions and that's an
15:18area that we will aggressively build
15:20because there's a lot of demand from the
15:22computational biology side for some
15:24level of standardization there there
15:25have been initiatives like MELLODDY and
15:27others looking at federated learning but
15:29there is a misalignment of incentives in
15:31that space that I think we could come in
15:33and fix and I think that's where we
15:34really view ourselves
15:36how can you really align incentive
15:38structures and create a foundational
15:39element that brings people together and
15:42I think that's where we are most
15:43valuable because private sector can't do
15:45it that well public sector can't do it
15:47that well a mission oriented private
15:49company that has this broad base and all
15:52these areas could potentially
15:54yeah I think also the global nature
15:56of your focus is really exciting because
15:57when I look at things like medical
15:59information or medical models
16:02um you know ultimately the big Vision
16:04there which a number of people have
16:05talked about for decades at this point
16:07is that you'd have a machine that would
16:09allow you to have very high access to
16:12care and medical information no matter
16:13where you're in the world and especially
16:15since you can take images with your
16:16phone and then interpret them with
16:18different types of models and then have
16:19like an output you know you should if
16:21you have a cardiac issue you should have
16:23care equivalent to the world's best
16:25cardiologist from Stanford or you know
16:28Center of Excellence available to
16:30anybody in the world whether they're
16:32rich poor developing country not et
16:35you know it's very compelling to see
16:37this big wave of technology and sort of
16:39the things that may be able to enable
16:40including some of the things that you
16:41mentioned around AI medicine so
16:43I think it's very interesting as well
16:45because this technology is being adopted
16:47so fast I mean let's face it Microsoft
16:49and Google two trillion dollar companies
16:51have made it a core of their strategy which
16:54is crazy insane for technology that's
16:56basically five years old let's say two
16:57years old really breaking through
17:00because it can adapt to existing
17:02infrastructure you know like it sits
17:05there and it absorbs knowledge when you
17:06fine-tune it but then my thing is I look
17:09to the future and I'm like that best
17:13which bits of that should be
17:15infrastructure for everyone and which
17:17bits of that should be private and so
17:19that's how I kind of oriented my
17:21business I looked at the future I come
17:22back and I think what should be public
17:24infrastructure and how can I help build
17:26that and coordinate that and that's
17:27valuable and then everything else other
17:30people can build around how do you think
17:32about the traditional pushback that's
17:34existed in the medical world around some
17:35of these Technologies so for example you
17:37know the first time an expert system or
17:39a computer could actually
17:41outperform Stanford University
17:44physicians at predicting infectious
17:46diseases was in the 1970s with this
17:48MYCIN project where they literally
17:49trained an expert system or designed an
17:51expert system to be able to predict
17:53infectious disease but here we are
17:55almost 50 years later
17:57with none of that technology adopted
17:59and so do you think it's just we have to
18:01do a lot of human in the loop things and
18:02it's a doctor's assistant and that'll be
18:04good enough do you think it's just a sea
18:06change they're not in Physicians like
18:07what's the what do you think is the
18:09driver for the technological adoption
18:10and something so important today
18:12so I think the infrastructural barriers
18:15are huge for adoption of Technologies
18:17particularly in private sector I think
18:19there is a new space of Open Source
18:21technology adoption that could be very
18:23interesting and a willingness now that
18:25people kind of understand this which
18:27wasn't there even 10 years ago you know
18:29the nature open source now it runs the
18:31world's servers and databases and I think
18:33there's another level of Open Source
18:34which is open source complex systems as
18:38um previously in other discussions I've
18:40talked about our education work so right
18:42now we're deploying four million tablets
18:44to every child in Malawi by next year
18:46we'll have hundreds of millions of kids
18:47hopefully that we deploy to it's not
18:49just education it's Healthcare and it's
18:51working with the governments it's
18:54working with multilaterals to say can we
18:56build a healthcare system from the
18:58bottom up that can do all of these
19:01without an existing infrastructure
19:03because they don't have an existing
19:04infrastructure it's one doctor per
19:06thousand kids ten thousand kids one
19:08teacher for 400 kids
19:10I am certain that system will outperform
19:12anything in the west within five years
19:14which is crazy to say but then our
19:17Western systems can then take bits of
19:19that and adapt to it because I think
19:21competitive pressure is required
19:22because Western systems are very hard to
19:24change and in the UK we've done that
19:26with HDR UK the genomic Banks and others
19:28and that was a massive uphill battle as
19:31you know to get these Technologies
19:32adopted because we should there should
19:34be barriers to adoption of this
19:35technology when it comes to things as
19:36important as Healthcare but at the same
19:39time I think now is the time to open it
19:41yeah I think there is an interesting
19:43loose analogy to different pace of
19:46adoption of different technologies in
19:49different geos in the past right so one
19:51that comes to mind is
19:53um today I think it's very commonplace
19:55amongst uh consumer internet investors
19:57to look at what's happened with Mobile
19:59in East Asia as a precursor to
20:02interactions that might happen here and
20:05you know mobile technology advanced much
20:07more rapidly in China Korea many other
20:10places one because of
20:12private public partnership and two
20:15because you know there was
20:17more I guess greenfield in terms of
20:19access to information and different
20:21infrastructure that supported mobile as
20:23the primary communication medium and I I
20:26could certainly see that happening with
20:28some AI-native products I think
20:32that's an excellent point I agree 100 percent I
20:34think just as they leapfrogged to mobile a
20:37lot of the emerging markets Asia in
20:38particular will leapfrog to generative
20:41AI or personalized AI and I can see this
20:44because I'm having discussions with the
20:45governments right now
20:47um like what is the reaction over the
20:49Christmas holiday I was getting a few
20:50hours of sleep finally I got like six
20:52calls from headmasters of UK schools
20:55saying Emad what is our generative AI
20:57strategy I was like you what and they
20:59were like all our kids are using ChatGPT
21:01to do their homework
21:03and so it's kind of one of the first
21:05little moments an amazing interface that
21:07OpenAI built it's going mainstream
21:09and I was like well get good you know
21:11stop assigning essays so now in some of
21:14the top private schools in the UK they
21:15actually have to write the essays during
21:17the lessons without computers which I
21:18think is wrong because my discussions in
21:21an Asian context for example with
21:23certain leading governments that are
21:24about to put tens of billions into this
21:25space they're embracing their technology
21:27and they're like how can we have our own
21:29versions of this and how can we
21:31implement this to help our students get
21:33even better right because also it's very
21:36even though there might be bureaucracy
21:37in some of these nations if they want to
21:39get something done they get it done
21:41and this technology is very different in
21:44that the costs are not continuous like a
21:475G network uh like the capex profile and
21:50other things are very different like you
21:52know you can say it costs 10 million
21:53dollars to train a GPT it doesn't cost
21:55that much anymore that's really valuable
21:57if you can have a ChatGPT for everyone
21:59like the ROIs are huge so yeah I do
22:02think that a lot of these nations like
22:03the African context is one that we're
22:05driving forward with education as a core
22:07piece and right now we're teaching kids
22:09with the most basic AI in the world
22:11literacy and numeracy in 13 months on
22:13one hour a day in refugee camps
22:15that's insane that's already better it's
22:17going to get even better but I think
22:19Asia in particular they're going to go
22:21directly to this technology and embrace
22:24it fully and then we have to have a
22:26question if you're not embracing this in
22:29in America in the UK you're going to
22:31fall behind because ultimately this can
22:33translate between structured and
22:34unstructured data quicker than anything
22:36I'd like to see what uh you know pace of
22:39adoption we can have the United States
22:41that's its technology as well but um but
22:44I can see the prediction coming
22:46true if we just go back to the most
22:49advanced like mature use case
22:52within Stability and as you said media as an
22:54advantage what does the future of media
22:56look like and actually if even if we go
22:58back before that you know you're
23:00involved in um sort of early ecosystem
23:03efforts uh with EleutherAI and such how did
23:06you even identify that this was an area
23:08of interest for you versus everything
23:09else going on across modalities
23:12so you know I've always been interested
23:13in meaning like uh semantic is even part
23:16of my email address and that's my
23:18religious studies as well around
23:19epistemology and ethics ironically
23:23um the way that I viewed it is that the
23:25easiest way for us to communicate is
23:26what we're doing right now via words
23:28right and that's held constant but now
23:30we can communicate via phones and
23:31podcasts or whatever and it's nice
23:33writing was more difficult and the
23:35Gutenberg Press made it easier but
23:37visual communication is incredibly
23:38difficult be it a PowerPoint which is
23:40visual communication or art which is
23:43visual communication and then you have
23:45video and things like that which is just
23:46impossible now you have TikToks and
23:48others making it easier I saw this
23:50technology and I was like if the pace of
23:51acceleration continues
23:53visual communication becomes easy for
23:56like my mom sending me memes every day
23:58telling me to call more or kind of
23:59whatever and I'm like that's amazing
24:03because that creation will make Humanity
24:05happier like you see art therapy that's
24:07visual communication and it's the most
24:10effective form of therapy what if you
24:12could give that to everyone so there was
24:13that aspect to it but then I saw movie
24:15creation and things like that so my
24:17first job was actually organizing the
24:19British Independent Film Awards and
24:20being a reviewer for the Raindance Film
24:22Festival so uh you know every year I put
24:25a movie on for my birthday and we give
24:26the proceeds to charity you can see my
24:28favorite movie with my friends it's
24:29pretty cool and then I was the biggest
24:32video game investor in the world at one
24:33point so these types of communication
24:35interaction really interesting and I
24:37thought that people really misunderstood
24:38the metaverse ugc and the nature of what
24:41could happen if anyone could create
24:43anything instantly it's not going to be
24:45a world for everyone or a world that
24:47everyone visits it's going to be
24:48everyone sharing their own worlds and
24:51seeing the richness of humanity and
24:52again I thought that was an amazing
24:54ethical slash moral imperative for
24:57making Humanity better but also an
24:59amazing business opportunity because the
25:02nature and way that we create media will
25:04transform as a result of this technology
25:05and we're seeing it right now we have
25:07amazing apps like Descript right where
25:10you could take this podcast and you can
25:11edit it with your words live you know
25:13you have amazing kind of gaming things
25:16come out where you create assets and
25:17instances or you know some of this new
25:203D NeRF technology where you can reshoot
25:21stuff we are working with multiple movie
25:24studios at the moment who are saving
25:25millions of dollars just implementing
25:27Stable Diffusion by itself let alone
25:29these other Technologies and that was
25:30for me tremendously exciting to allow
25:33anyone not to be creative because people
25:35are creative but to access creativity
25:37and then allow the creatives to be even
25:40more creative and tell even better
25:46OpenAI's and they don't think image
25:48generation is kind of like core on the
25:51path to AGI it's obviously really
25:53important to you personally and to
25:58tell us about your stance on AGI and if
26:00that's part of the stability Mission
26:01yeah I don't care about AGI except for
26:04it not killing us I mean like they can
26:08um my thing what I care about is
26:10intelligence augmentation
26:12you know this is the classic kind of
26:14memex type of thing how can we make
26:16humans better like our mission is to
26:17build the foundation to activate
26:18Humanity's potential
26:20um so look AGI is fine
26:22um again we have to have some things
26:24around that I do believe that they are
26:27incorrect around multi-modality being or
26:29images being a core component of that
26:32um but like I think there are two
26:33paradigms here one is stack more layers
26:35and I'm sure GPT-4 and PaLM 18 and all
26:38these things will be amazing stacking
26:40more layers and having better data as
26:43but like one of the things we saw for
26:45example with Stable Diffusion we kind of
26:48put it out together and then people
26:50trained hundreds of different models
26:52when you combine those models it learns
26:55all sorts of features like perfect faces
26:58and perfect fingers and other things and
27:01this kind of is related to the work that
27:03DeepMind did with Gato and others that
27:05show that autoregression of these
27:06models in the latent spaces becomes
27:09really really interesting so what if the
27:11route to AGI is not one big model to
27:14rule them all trained on the whole
27:15internet and then narrowed down to human
27:18preferences but instead millions of
27:21models that reflect the diversity of
27:22humanity that are then brought together
27:25I think that is an interesting way to
27:27kind of look at it because that will
27:28also be more likely to be a human-aligned
27:30AGI rather than trying to make this
27:34elder god of weirdness bow to your will
27:37you know uh which is what it feels like
27:41yes we're gonna have a hive of elder gods
27:43instead you've you've mentioned uh that
27:46stability is still working on language
27:48uh the application of diffusion models
27:51to images is a really unique
27:52breakthrough and it's not as
27:54computationally intensive as like the
27:56known approaches to language so far I
27:58think you've said that the core training
27:59run for the original Stable Diffusion
28:01was 150,000 A100 hours which is like
28:03not that huge in the grander scheme of
28:05things what can you tell us about your
28:07approach to language
28:09um so yeah so via the kind of EleutherAI
28:12side of things and our team there you
28:14know we released GPT-Neo, J and NeoX which
28:17have been downloaded 20 million times so
28:18the most popular language models in the
28:22um you kind of basically either use GPT-3
28:24or use that they go up to about 20
28:25billion parameters and like I said we've
28:27released our trlX from the CarperAI lab
28:29which is the instruct framework they're
28:31training you know multiple models in the
28:34up to 100 billion parameters now and I
28:37think you need more uh Chinchilla
28:39optimal to enable an open ChatGPT
28:41equivalent you know enable an open
28:43Claude equivalent I think that will be
28:46an amazing foundation from which to
28:47train sector specific and other models
28:49that then again can be autoregressed
28:51and there will be very interesting
28:52things around that language requires
28:56um not necessarily because of the
28:57approach and diffusion breakthroughs
28:59like uh recently Google had their
29:01new paper where they showed a
29:02transformer actually can replace the VAE
29:05um so you don't necessarily need
29:06diffusion for great images
29:08um it's more because language is
29:09semantically dense I think versus images
29:11and there's a lot more accuracy that's
29:14required for these things
29:16um that I think there are various
29:17breakthroughs that can occur like we
29:19have an attention-free transformer model
29:21basis in RWKV that we've been funding
29:24we've got a 14 billion parameter version
29:25of that coming out that is showing
29:27amazing kind of progress
29:29but I think that the way to kind of look
29:32at this is we haven't gone through the
29:34optimization cycle of language yet
29:36so OpenAI again amazing work they do
29:39they announced InstructGPT their 1.3
29:41billion parameter version outperformed
29:44175 billion parameter GPT-3 you look at
29:48um kind of Flan-T5 the instruct version
29:52of the T5-XL model from Google
29:56the three billion parameter version
29:58outperforms GPT at 175 billion
30:01in certain cases you know these are very
30:04interesting results and it's one of
30:05those things that as these things get
30:06released it gets optimized so like with
30:09Stable Diffusion leave aside the
30:10architecture day one 5.6 seconds for an
30:13image generation using an A100 now 0.9
30:17seconds with the additional
30:18breakthroughs that are coming through
30:19will be 25 frame images a second that's
30:22100 times speed up over 100 times just
30:25from getting it out there and people
30:26interested in doing that I think
30:27language models will be similar and I
30:30don't think that you need to have
30:31ridiculous scale when you can understand
30:32how humans interact with their models
30:34and when you can learn from the
30:36collective of humanity
30:37so like I said a very different approach
30:40small language models or medium ones
30:43versus let's train a trillion parameters
30:46and I think there will be room for both
30:48I think it will be use these amazingly
30:51packaged services from Microsoft and
30:54if you just want something out the box
30:55or if you need something trained on your
30:57own data with privacy and things like
30:58that that may not be as good but maybe
31:00better for you use an open source base
31:03and work with our partners at SageMaker
31:04or whoever else you know can you talk
31:06more about that in the context of your
31:08business model and in your approach you
31:10mentioned that you think that some of
31:12the areas of stability will be focused
31:14on is media and then proprietary and
31:16regulated data sets so and if there's
31:18things you can share right now in that
31:19area if not no worries but it'd kind
31:20of be interesting to learn more about
31:22you know how you view the business
31:23evolving sure so like now we're training
31:26on hundreds and soon thousands of
31:28Bollywood movies to create Bollywood
31:30video models with our partnership with
31:33um you know and that is exclusively
31:34licensed we'll have audio models coming
31:36as well as the Eros Now model or whatever
31:40um you know we're talking to various
31:41other entities as well and this is why
31:43we have the partnership with Amazon and
31:45sagemaker so there'll be additional
31:46services that can train models on your
31:48behalf for most people our focus is on
31:51the big models for kind of nations the
31:54big models for the largest corporates
31:55who will need to train their own models
31:57one day and that's really difficult
31:58there's only like 100 people who can
32:00train models in the world like it's not
32:02really a science it's more an art like
32:04losses explode all over the place when
32:05you try to do something and so we're
32:07going to make it easy for them and we're
32:09going to be inside the data centers
32:10training their own models that they
32:12control and our open source models then
32:14become the benchmark models for everyone
32:16like again we have access to the neural
32:17engine dedicated teams at Intel and
32:19others kind of working on optimizing
32:22that is the model of the framework and
32:24the open model is optimized and then we
32:26take and create private models and again
32:28I think that's complementary to the APIs
32:30and other things you will see from
32:31Microsoft Google etc. because yeah you
32:35yeah some of the other areas that you've
32:37talked about I think in interesting ways
32:39is about how AI can be used to make our
32:42democracy more direct and digital a
32:44little bit more about
32:46um you know broader Global impact could
32:47you could you extrapolate a bit more
32:49yeah so I think you know if you have to
32:52look at intelligence augmentation right
32:54like information theory in the classical
32:56Shannon way information is valuable in
32:58as much as it changes the state and
33:00we've obviously seen political
33:01information become more and more
33:04like manipulation of stories and things
33:07like that so the divide has grown
33:08what if we could create an AI that could
33:10translate between various things make
33:11things easier to understand and make
33:14people more informed I think that would
33:16be ideal with some of these national and
33:18public models and interfaces being
33:19provided to people and then that can be
33:22very positive for democracy and allowing
33:24people to really understand the needs
33:26like you can already with ChatGPT when
33:29you train it on nature of yourself it
33:31can summarize for your perspective
33:33you know that's an amazing thing right you
33:36can tell it to talk like a five-year-old or a
33:38six year old or an eight-year-old or a
33:3910 year old once it starts understanding
33:41Sarah and Elad that will be even
33:42better and again you don't need
33:44open source to do that the OpenAI
33:46embeddings API is fantastic but I think
33:48there'll be more and more of these
33:49services that allow there to be that
33:50filter layer between us and this mass of
33:53information on the internet that will be
33:55amazing I think if we build the
33:56education systems and other things
33:58correctly as well this Young Lady's
33:59Illustrated Primer that we're going to
34:01give to all the kids in Africa and
34:02beyond like again let's really blue sky
34:05think how can we get people engaged with
34:07their communities and societies because
34:09it will be a full open source stack not
34:11only education and healthcare and beyond
34:12that's super exciting I think again
34:15that's the future of how we come
34:17you want to come together to form a
34:19human colossus like in the Wait But
34:20Why style where you get done on my
34:23language and I think this is one of the
34:24best ways for us to do that leveraging
34:26these technologies it's okay we don't
34:28have commercial sponsors
34:30there's actually a book called Lady of
34:33Mazes that's an AGI-centric book from
34:36like 10 years ago and basically the idea
34:37is sort of what you mentioned where as
34:40different AGIs gain models of how a
34:43subset of the population thinks about
34:45certain issues it substantiates into a
34:48virtual person who's basically
34:49representing them in some House of
34:51Representatives equivalent so you don't
34:52actually have to vote
34:54the AGI just kind of synthesizes group
34:56opinions and then turns it into
34:57Representatives yeah and you have to
34:59think about you know with the advances
35:01like Meta's amazing work on Cicero for
35:03example you know beating humans on
35:06Diplomacy they used eight different
35:07language models combined like I think
35:08this is the future not just zero-shot
35:10multiple models interacting with each
35:12other is the way full stop
35:14like one of the issues from a mechanism
35:18design perspective of kind of the game
35:20theory of our current economy is that
35:22there is no central organizing factor
35:23that we trust like what is the trust in
35:25Congress like I think they trust
35:26Congress less than cockroaches no
35:28offense to Congress please don't bring
35:29me up like it's just a poll right
35:34people will turn towards trusting
35:36machines as it were and machines are
35:39capable of load balancing now they're
35:40capable of load balancing facts and
35:42things and so we have to be super
35:43careful as we integrate these things
35:45what that looks like because they will
35:46make more and more decisions for us
35:49um that could be for our benefit you
35:50know like I said as you said having an
35:52AI that speaks on our behalf and
35:54amalgamates but then we need to make
35:56sure that these aren't too profile and
35:58fragile as we cede more and more of our
36:00own personal authority to them because
36:02they are optimizing this is also one of
36:04the dangers on the alignment side like
36:06you know as we introduce RLHF into some
36:08of these large models there are very
36:10weird instances of mode collapse and how
36:14um I do say these large models as well
36:16should be viewed as fiction creative
36:18models not fact models because
36:19otherwise we've created the most
36:21efficient compression in the world does
36:23it make sense you can take terabytes of
36:24data and compress it down to a few
36:26gigabytes with no loss now of course you
36:28lose something you lose the factualness
36:30of them but you keep the principle-based
36:33analysis of them so we have to be very
36:35clear about what these models can and
36:36can't do because I think we will cede
36:38more and more of our authority
36:39individually as a society to the
36:42coordinators of these models could you
36:45talk more about that in the context of
36:46safety because ultimately one of the
36:48concerns that sort of increased in the
36:50AI Community is AI safety and there's
36:52sort of three or four components of that
36:54there's alignment you know will Bots
36:56kill humans or whatever form you want to
36:58put it in there's um not kill us but
37:01yeah they'll just have a giant RLHF farm
37:04on top of us or something
37:06um there's the concern around certain
37:08types of content pedophilia et cetera
37:11um people don't want to have exist in
37:13society for all sorts of positive
37:14reasons uh there's politics you know
37:17there's concerns for example that
37:20um AI may become the next big
37:22battleground after social media in terms
37:25of political viewpoints being
37:26represented in these models with the
37:27claims that they're not political
37:28viewpoints and so I'm sort of curious
37:30how you think about AI
37:32safety more broadly particularly when
37:34you talk about trust of models to your
37:36point part of it is fact versus fiction
37:38but part of it may also be well it looks
37:39like it's political and so therefore
37:41maybe I can't trust it at all yeah I
37:43don't think technology is neutral
37:45so I'm not one of the people that
37:47adheres to that especially with the way
37:48we build it it does reflect the biases
37:51and other things that we have in there I
37:53did kind of follow the open source thing
37:55because I think we can adjust that on
37:56the alignment side you know it was
37:57interesting Eleuther basically split into
37:59two part of it is Stability and the
38:01people who work here on capabilities the
38:03other part is Conjecture that does
38:05specific work on alignment
38:07um and they're also based here in London
38:09and I think it's not easy right I think
38:12that everyone is ramping up at the same
38:14time and we don't really understand how
38:16this technology works but we're doing
38:18our best you know yeah people like kind
38:20of Riley Goodside and others prompt
38:21whisperers who are like wait like what on
38:23earth it can do all these kind of things
38:27um I think that there needs to be more
38:29formalized work and I actually do think we
38:30need some regulation around this
38:33we are dealing with an unknown unknown
38:36and I don't think we're doing good
38:38enough kind of tying things together
38:39particularly as we stack more layers and
38:41we get bigger and bigger and bigger I
38:43think small models are less dangerous
38:44but then the combination of them may not
38:47you've mentioned before like this um
38:50support for the idea of regulation of
38:52large models what would be a productive
38:55outcome of that regulation that you can
38:57imagine I think that a productive outcome of
39:00that regulation is anything above a
39:01certain level of FLOPs needs to be
39:02registered similar to
39:04well bioweapons and things that have
39:07the potential for dual use I think there
39:09needs to be a dedicated international
39:10team of people who can actually
39:12understand and put in place some
39:14regulations on how do we test these
39:16models for things like you know the
39:17amazing work Anthropic recently did with
39:19constitutional models and other things
39:21like that we need to start pooling this
39:22knowledge as opposed to keeping it
39:24secret but there is this game theoretic
39:26thing of one of the optimal ways to stop
39:28AGI happening is to build your own AGI
39:30first and so I'm not sure if that will
39:33ever happen but we're in a bit of a bind
39:35right now which means that everyone's
39:36having their own arms race when
39:38governments decide and I don't
39:39believe they've decided yet that having
39:41an AGI is a number one thing
39:43tens of billions hundreds of billions
39:45will go into building bigger models and
39:47again this is very dangerous I think
39:49from a variety of different perspectives
39:50so I prefer multilateral action right
39:52now as opposed to in the future
39:54so I put that out there I can't really
39:56drive it I'm already dying from all the
39:58other work as it is but I do believe
39:59that should be the case
40:01um I think going on to kind of the next
40:03one as you said the political biases and
40:05things like that we can use this as
40:06filters in various ways
40:08um and I think one of the interesting
40:10things and the other thing I've called for
40:11regulation of uh maybe I should do a bit
40:13more loudly is you have a lot of
40:15companies that have ads as one of their
40:18and ads are basically manipulation and these
40:20models are really convincing they can
40:22write really great prose my sister-in-law
40:24created a company Sonantic they can do
40:26human realistic emotional voices she did
40:29like Val Kilmer's voice for his
40:30documentary and stuff like that before
40:33it's going to be crazy the types of ads
40:35that you see and we need to have
40:37regulation about those soon because
40:38you're going to see Meta and Google and
40:40others trying to optimize for engagement
40:44like manipulation fundamentally and I
40:47think that those can then be co-opted by
40:50various other parties as well on the
40:51political spectrum so we need to start
40:52building some sort of protections around
40:56um what was the final one sorry Elad uh
40:59I was just asking about uh and I think
41:01Sarah asked the question around
41:03um you know where do you think
41:04regulation should be applied in what are
41:06the what would be positive outcomes of
41:08that versus negative outcomes yeah so I
41:10think you know there should be
41:11um these elements around identification
41:14of AI especially on advertising I think
41:16that there should be regulation on very
41:18large models in particular
41:20um European Union introducing a CE mark
41:25generative AI restrictions where the
41:28creators are responsible for the outputs
41:30I think is the wrong way but there are
41:32other ones as well like I would call for
41:34opt-out mechanisms and I think we're the
41:37only ones building those for data sets
41:39um because we're also building some of
41:40these data sets and trying to figure out
41:42attribution mechanisms for opt-in as
41:44well on the other side like right now
41:46the only thing that is really kind of
41:50checked is robots.txt which is kind of
41:52thing on scraping but I think again it's
41:54evolving so fast that people might be
41:56okay with scraping but they may not be
41:57okay with this legally it's fine but
42:00then I think we should make this more
42:02and more inclusive as things go forward
42:03so that's for example if an artist
42:05doesn't want their work represented in
42:07the Corpus that a machine is trained on
42:09for example yes and it's difficult it
42:11isn't just a case of you know don't look
42:13at DeviantArt or my website like what if
42:15your picture is on the BBC or CNN with a
42:18label it will pick that up
42:20you know so it's a lot harder this is
42:22why like we trained our own OpenCLIP
42:24model we have the new CLIP G landing
42:26this week that's even better on zero
42:30um because we need to know what data was
42:31on the generative and the guidance side
42:34so that we could start offering opt out
42:36and opt-in and these other things yeah
42:38and then I guess uh one other area that
42:40people often talk about safety is more
42:42on defense applications and the ethics
42:44of using some of these models in the
42:46context of defense or offense from a
42:48national perspective
42:49what's your view on that
42:51I think the bad guys I'm gonna put that
42:53in quotes uh have access to these models
42:55already and thousands and thousands of
42:57A100s I think you have to start building
42:58defense but it's a very difficult one
43:00like we were going to do a 200,000 deepfake
43:03detector prize but then it was
43:04pointed out quite reasonably that if you
43:06create a prize for a detector then
43:09well a balancing effect where you have a
43:11generator and a detector and they bounce
43:12off each other and you just get better
43:14and better and better so now we're
43:15trying to rethink that maybe we'll offer
43:17a prize for the best suggestion of how to
43:19kind of do this similar to you know
43:21ChatGPT is detectable but not really
43:23um so I think the defense implications
43:26of this it's largely around kind of
43:27misinformation disinformation this is an
43:29area that I have advised multiple
43:30governments on with my work on counter
43:32extremism and others it's a very
43:34difficult one to unpick but I think one
43:35of the key things here is having
43:37attribution-based mechanisms and other
43:38things for curation because our networks
43:41are curated and so this is where we've
43:43teamed up with like Adobe on
43:45contentauthenticity.org and others I think that
43:47metadata element is probably the winning
43:49one here but we have to standardize as
43:51quickly as possible around trusted
43:52sources I think people already don't
43:54believe what they see though which is a
43:57good thing and a bad thing if you want
43:58to have those trusted coordinators
44:00around this uh beyond that and some of
44:03the more severe kind of things around
44:05drones and slaughterbots and things
44:07like that I I don't know how to stop
44:11and I think that's a very complicated
44:13thing but we need an international
44:14compact on that because again this
44:16technology is incredibly dangerous when
44:19used in those areas and I don't think
44:20there's enough discussion at the highest
44:22levels on this given the pace of
44:26I think that's all we have time for
44:27today so one last important question for
44:29you what controversial prediction you
44:32seem like an optimist but uh good or bad
44:34about AI do you have over the next five
44:37um I think that small models will
44:39outperform large models massively likes
44:41of the hive model aspect and you will
44:44see ChatGPT level models running on the
44:46edge on smartphones in five years which
44:50great thanks so much for joining us
44:52amazing conversation as usual it's my