Thanks, Percy. Great to have you.

To start, can you tell us a little bit about how you got into the machine learning research field, and about your personal background?

Yeah, so I've been in the field of machine learning and natural language processing for over 20 years. I started getting into it as an undergrad at MIT. I liked theory, and I had a fascination with languages. I was fascinated by how humans could be exposed to just strings of text, or speech, and somehow acquire a very sophisticated understanding of the world, and also of syntax, and learn all of that in a fairly unsupervised way. My dream was to get computers to do the same. So I went to grad school, and after that started at Stanford, and ever since I've been in pursuit of developing systems that could really, truly understand natural language.
And of course, in the last four years, this once-upon-a-time dream has really taken off, maybe not in a way I would necessarily have expected, with the coming of large language models such as GPT-3. It's truly astonishing how much of the structure of language and the world these models can capture. In some ways it harkens back to when I first started in NLP: I was training language models then too, but of a very different type, based on hidden Markov models. There the goal was to discover hidden structure in text, and I was very excited that the model could tease apart which words were, say, city names versus days of the week. But now it's on a completely different level.

Since
you've worked on multiple generations of NLP at this point, pushing the forefront of semantic parsing: was there a moment at which you decided you were going to focus on foundation models and large language models?

Yeah, there was a very decisive moment, and that moment was when GPT-3 came out, in the middle of the pandemic.
It wasn't so much the capabilities of the model that shocked me; it was the way the model was trained, which was basically taking a massive amount of text and asking the model to predict the next word, over and over again, billions of times. Just that simple objective, a very simple principle.
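That objective can be sketched in a few lines: the average negative log-likelihood of each next token under the model's predicted distribution. This is a toy illustration only; the hand-written bigram table standing in for the "model" here is hypothetical, not how GPT-3 works internally.

```python
import math

def next_token_loss(model_probs, tokens):
    """Average negative log-likelihood of each next token,
    given the model's predicted distribution at every step."""
    total = 0.0
    for t in range(len(tokens) - 1):
        context = tuple(tokens[: t + 1])
        # Probability the model assigns to the actual next token.
        p = model_probs(context).get(tokens[t + 1], 1e-12)  # floor avoids log(0)
        total += -math.log(p)
    return total / (len(tokens) - 1)

# Toy stand-in for a real network: a hand-written bigram table (hypothetical).
BIGRAMS = {"the": {"cat": 0.5, "dog": 0.5}, "cat": {"sat": 1.0}}

def toy_model(context):
    return BIGRAMS.get(context[-1], {})

loss = next_token_loss(toy_model, ["the", "cat", "sat"])
```

Training a real model means adjusting its parameters to push this loss down over billions of tokens.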
What arose from it was not only a model that could generate fluent text, but also a model that could do in-context learning, which means you can prompt the language model with instructions, for example "summarize this document," give it some examples, and have the model, on the fly, in context, figure out what the task is. This was a paradigm shift, in my opinion, because it changed the way we conceptualize machine learning and NLP systems: from bespoke systems, where one is trained to do question answering and another is trained to do something else, to a single substrate where you can ask the model to do various things. The idea of a task, which is so central to AI, begins to dissolve.
And that's the reason that later, in 2021, we founded the Center for Research on Foundation Models. We coined the term "foundation models" because we thought something was happening in the world whose significance "large language models" didn't really capture: it was not just about language but about images and multimodality, a more general phenomenon. So we coined the term, the center started, and it's been a kind of roller-coaster ride ever since.

We're going to be talking today about both your experiences in research in academia, and then, separately, about Together, a company you're involved with now. Can you tell us a little more about what the center does and what you're focused on?
Yes. The Center for Research on Foundation Models started two years ago under the Human-Centered AI Institute at Stanford, and its main mission, I would say, is to increase the transparency and accessibility of foundation models. Foundation models are becoming more and more ubiquitous, but at the same time, one thing we have noticed is a lack of transparency and accessibility. If you think about the last decade of deep learning, it has profited a lot from a culture of openness: tools like PyTorch and TensorFlow, datasets that are open, people publishing their research openly. That has led to a lot of community and progress, not just in academia but also in industry, with startups and hobbyists and whoever else getting involved. What we're seeing now is a retreat from that open culture: models are accessible only via APIs, we don't really know the secret sauce behind them, and access is limited.

What's your diagnosis of why that's happening?
I think it's very natural, because these models take a lot of capital to train, they can generate a lot of value, and they're a competitive advantage, so the incentives are to keep them under control. There's also another factor, which is safety. These models are extremely powerful. Maybe the models right now would be okay out in the open, but in the future they could be extremely capable, and making them available on an anything-goes basis is something we might have to think about a little more carefully.
How do you think all this evolves? If you look at the history of ML or NLP or AI, we've had waves of innovation in academia and waves of innovation and implementation in industry. In some cases both happened simultaneously, but it feels a bit like it has ping-ponged over time. Now that industry is becoming more closed about some of these models, publishing less and being less open, how do you view the roles of academia and industry diverging, if at all? Do you think each type of institution will tackle different types of research, or that there will be overlap? I'm curious how you view all that evolving.
I mean, I think industry and academia have very distinctive and important functions; otherwise, as I tell my students, we should be working on things that lean on academia's competitive advantage. Historically this has meant different things. Before ML was that big, a lot of academic research was really about developing the tools to make these models work at all. I remember working on systems and building ML models back in grad school, and basically nothing was working: computer vision wasn't working, question answering wasn't working. The goal of academia then was to make things work, and a lot of advances born out of academia influenced other ideas, which influenced other ideas, before it all started clicking. Now we're seeing the fruits of both academic and industry research fueling the industry drive you see today.
And today, I think the dynamic is quite different, because academia's job is no longer just to get things to work; that can be done in other ways, with a lot of resources going into tech companies, where if you have data and compute you can scale and blast through a lot of barriers. I think a lot of the role of academia now is understanding, because for all their impressive feats, we just don't understand how these models work or what the principles are. How do the training data and the model architecture affect different behaviors? What is the best way to weight data? What is the right training objective? Many of these questions benefit from more rigorous analysis. The other piece, which is a different type of understanding, is understanding social impact, and this goes back to the question about what the center does.
It's a center with over 30 faculty across 10 different departments at Stanford, so it's quite interdisciplinary. We're looking at foundation models not just from the technical perspective of how you get these models to work, but also thinking about their economic impact, and the challenges when it comes to copyright and legality; we're working on a paper that explores some of those questions. We're looking at questions of social bias, and thinking carefully about the impact these models have on issues like homogenization, where a single model might be making decisions for a user across all the different aspects of their life. There are also people at the center looking at the risks of disinformation, monitoring the extent to which these tools are persuasive, which they are becoming increasingly, and what the actual risks are when it comes to, say, foreign state actors leveraging this technology. And there are people at the center in medicine who are exploring ways of deploying foundation models in actual clinical practice. That's very exciting, because it's something where we benefit from having a hospital attached to Stanford.
Do you think some of those deployments are close? If you go back to the 1970s, there was the MYCIN project here at Stanford, an expert system that outperformed Stanford Medical School staff at predicting what infectious disease somebody had. That was almost 50 years ago, and it never really got implemented in the real world. One of my concerns about the impact of some of these things is whether there are industries that are resistant to adoption or resistant to change, so it's exciting to hear that Stanford is actually starting to look at how to integrate these things into real clinical care. Do you view those things as very far out on the healthcare side, or as nearer? I know that isn't the main topic we're going to cover, but I'm a little curious given how close you are to all this.

I think there are a bunch of different issues that need to be resolved. For example, foundation models are trained on a lot of data: how do you deal with privacy? How do you deal with robustness? Once you're talking about the healthcare space especially, there are cases where we know these models can still hallucinate facts, and sound very confident in doing so.
But you've also taken the point of view that we should expect superhuman performance from these models, that holding them to the standard of a human doctor is actually insufficient as well, right?

Yeah, I think that's a great point. For ages, human-level performance has been the target for AI, a north star that has fueled many dreams and efforts over the decades. But I think we're getting to a point where, along many axes, it's superhuman, or should be superhuman, and we should define a more objective measure of what we actually want. We want something that's very reliable and grounded. I often want more statistical evidence when I speak to doctors, and sometimes fail to get it; I'd want something a lot more principled and rational. This is more of a general statement about how we should think about technology: not just chasing after mimicking a human, because we already have a lot of humans.
That's an interesting point, because if you're pushing a lot of metrics around what actually works from an adoption perspective, that's something certain aspects of healthcare do extremely well and certain areas are still deficient in. It will be interesting to see how you have to change certain aspects of culture in order to measure, when you adopt a new technology, its impact in that specific area. It's really fascinating to watch all this evolve right now. Now, you've done extensive research on natural language processing and computational semantics. Can you explain what those terms mean and how they're relevant?
So computational semantics is the process where you take language, text, and compute, quote-unquote, "meaning" from it. "Meaning" is something I'm not going to attempt to define here; there's a huge literature in linguistics and philosophy about what meaning is. I would say that a lot of my research in the past, maybe five to ten years ago, adopted the view that language is like a programming language: you can give orders, you can instruct, you can do things with language. It was therefore natural to model natural language as a formal language, so a lot of semantic parsing is about mapping natural language into a formal space that machines can execute. One concrete application I worked on for a while is mapping natural language questions into, essentially, SQL queries, which obviously has many different applications as well.
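As a toy illustration of that kind of mapping, here is a hypothetical pattern-based parser. Real semantic parsers learn the mapping from data rather than hard-coding templates, and the table and column names below are made up:

```python
import re

# Hypothetical templates; a learned semantic parser induces these from data.
PATTERNS = [
    (re.compile(r"how many (\w+) are in (\w+)", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)} WHERE city = '{m.group(2)}'"),
    (re.compile(r"list all (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)}"),
]

def parse_to_sql(question):
    """Return a SQL query for the question, or None if no template matches."""
    for pattern, template in PATTERNS:
        m = pattern.search(question)
        if m:
            return template(m)
    return None

sql = parse_to_sql("How many users are in Boston?")
```

The payoff of mapping to a formal language is that the resulting query can be executed against a database, so the answer is computed rather than guessed.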
What was nice about this framework is that to really do this, you have to understand how the words contribute to the different parts of the SQL query, and then you get a program that you can execute and whose results you can deliver. That's as opposed to many question answering systems, where you ask a question, maybe retrieve some documents, and either retrieve the answer or make something up, rather than computing it rigorously. So that was the paradigm I was working in maybe five or ten years ago.
But the main problem is that the world isn't a database. A small part of the world is a database, but most of the world is unstructured. So I started thinking about question answering in general, and we developed the SQuAD question answering benchmark to fuel progress in open-domain question answering. That, along with many other datasets developed both at Stanford and elsewhere, I think led to the development of powerful language models like BERT, RoBERTa, and ELMo back around 2018 — many years ago, ancient history now — and then to the 2020-era generation of large foundation models.
There are cases where you want to map natural language into what people call tool use: if you ask a question that requires calculation, the model should just use a calculator rather than trying to, quote, do it in the Transformer's head. But there are also a lot of aspects of reasoning that are not quite formal — we do this all the time — and a lot of that happens natively in the language model.
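A minimal sketch of that kind of routing: if the query parses as pure arithmetic, hand it to an exact calculator tool; otherwise fall back to the language model. The `llm` stub here is a hypothetical placeholder, not a real model call.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr):
    """Exactly evaluate a basic arithmetic expression via the AST."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def answer(query, llm=lambda q: "<model answer>"):
    # Route arithmetic to the exact tool; everything else to the model (stub).
    try:
        return calculator(query)
    except (ValueError, SyntaxError):
        return llm(query)

result = answer("37 * 412 + 5")
```

The calculator is exact where the model would only approximate; the open question the discussion raises is how to decide, in general, what belongs in the tool and what belongs "in the head."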
And I think it's still an interesting question how to marry the two. I feel like they are still jammed together, and maybe that's natural, because there are certain things you can do in your head and certain things for which you invoke a tool. This has also been one of the classic debates in AI: neural versus symbolic. For a while symbolic AI was dominant; now neural AI has really taken off and become dominant. But some of those central problems — how do you do planning, how do you do reasoning — which were the focus of study in symbolic AI, are again really relevant, because we've moved past simple classification and entity extraction to more ambitious tasks.

What do you think are some of the more interesting research programs right now in that area?
It's interesting to remark on what's happening, because to a first approximation, larger models trained on the relevant data seem to do well on various benchmarks. But I think maybe there isn't enough attention to data efficiency, and to how quickly and how robustly you can get to these points, because it has been well documented that benchmarks can be gamed: doing well on a benchmark doesn't mean you've necessarily solved the problem. So I think one has to be a little bit careful.
So obviously scale and more data is one clear direction, but in terms of orthogonal directions, what are the methods?

Several things have to happen. One is that we have to be able to handle greater context lengths: if you think about a long reasoning chain, Transformers have a fixed context, and there are ways to extend it, but fundamentally it's a fixed model. Another is, let's say, advanced problem solving. If you want to solve a math problem or prove something, the language model thinks out loud in a chain of thought, generating token by token, and then it produces an answer. But we know that when humans solve a problem, they try different things and backtrack; it's much more flexible and iterative, and it can last a lot longer than a few passes. What architecture can handle that level of complexity is, I think, still an outstanding question.
Are there any aspects of foundation models or large language models that are emergent, that you didn't anticipate?

Going back to GPT-3, I think in-context learning is something that surprised many people, including me. You prompt a language model with an instruction and input-output examples — here's a sentence, it's classified positive; here's another sentence, negative — and the model is somehow able to latch on to these examples, figure out what you're trying to do, and solve the task. This is really intriguing because it's emergent: it wasn't hand-coded by the designers, who never said "I want to do in-context learning this way." Of course, you could have built it in, but the real magic is that you didn't have to, and yet it still does something. It's not completely reliable, but it can get better with better models and better data.
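Mechanically, in-context learning is driven entirely by how the prompt is laid out. A minimal sketch of assembling such a few-shot prompt (the `Input:`/`Label:` format here is illustrative, not a requirement of any particular model):

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, labeled examples,
    and finally the new input for the model to complete."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is asked to continue from the dangling "Label:".
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    [("I loved this movie.", "positive"),
     ("The food was awful.", "negative")],
    "The concert was fantastic.",
)
```

Nothing in the model's weights changes; the task is specified entirely by the text it is asked to continue.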
Then there's chain of thought, which emerges at a certain scale.

Do you want to explain what that is?

The idea is that if I present a question to a language model, the model could just answer, and it will maybe get it right or wrong. But if you ask the language model to generate an explanation of how it would solve the problem — kind of thinking out loud — then it's much more likely to get the answer right. It's very natural that this would be the case for humans as well, but again, the chain-of-thought capability is something that emerges. The other thing I think is really wild, and maybe a general principle, is the ability to mix and match. You can ask the model to explain the quicksort algorithm in the style of Shakespeare, and it will actually construct something that is semantically pretty on point but also stylistically much better than what many people could come up with. That means it has learned distinct concepts of what Shakespeare is and what quicksort is, and is able to fuse them. If you think about creativity, this is an example of creative use. People sometimes say that language models just memorize, because they're so big and trained on so much text, but examples like this really indicate that these models are not just memorizing: that text simply doesn't exist anywhere, so the model has to have some creative juice and invent something.
And to riff on that a little: I think the creative aspects of these language models, with the potential for scientific discovery, or doing research, or pushing the boundaries beyond what humans can do, are really fascinating, because up until now, remember, the AI dream topped out at humans. Now we can actually go beyond, in many ways, and I think that unlocks a lot of possibilities.

Yeah, there are a lot of really interesting examples. You could argue that connecting concepts in any novel way is creativity, but I love the example of discovering new tactics in Go that humans hadn't found after thousands of years of play. Actually, I'll ask if you'll risk making a prediction: what emergent behaviors might appear at the next level of scale? What capabilities might emerge that we wouldn't have predicted, the way we wouldn't have predicted chain of thought or in-context learning?
I can give you an example of something I think is emerging, and an example of a hope, though I don't know that I would call either a prediction. What we're seeing today is the ability to instruct a model, using natural language, to do certain things. You see a lot of this online with ChatGPT and Bing Chat, and in some of Anthropic's work as well: you can instruct a model to be succinct, to generate three paragraphs in the style of so-and-so; you can lay out these guidelines and have the model actually follow them. This instruction-following ability is getting extremely good. Now, how much of it is emergent is hard to tell, because many of these models are not just language models trained to predict the next word; there's a lot of secret sauce under the hood. So if you define emergence as "not intended by the designers," I don't know how much of it is emergent, but at least it's a capability that is clearly improving.

The hope concerns hallucination. Language models currently make stuff up, and this is clearly a problem, in some ways a very difficult one to crack. The hope is that as models get better, some of this will actually go away. I don't know if that will happen, but it would be extremely nice. The way I think about these models is that predicting the next word seems very simple, but to do it well you have to really internalize a lot of what is going on: the context, the previous words, the syntax, who's saying them. All of that information and context has to get compressed, and that allows you to predict the next word.
And if you're able to do this extremely well, then you have a model of what's happening in the world — at least the world as captured in text. While the notion of truth might be ambiguous in many cases, I think the model can get an idea of which parts of the internet are reliable and which are not, and of entities, dates, locations, and activities, and that will maybe become more salient in the model. If you think of a language model that's only trained to predict the next word, and you ask it about, say, Elad, of course it's going to mix something up without further context. But if it has a better understanding of what's happening, and of course more context, then maybe it can use that context to recognize: okay, I don't know this; maybe I should ask rather than make up words.

So scale is basically increasing the statistical accuracy of the prediction of the next word, because you have more context and more data by which to infer what's coming, and therefore it will reduce hallucinations, because you're increasing accuracy?
Yeah, I think there's pre-training, which is predicting the next word and developing a world model, so to speak. With those capabilities you still have to tell the model not to hallucinate, but it will be much easier to control the model if it has a notion of what hallucination even is.
I was talking to somebody who was close to the development of the Transformer model, and his claim was that one of the reasons it has done so well is, to your point, scale: eventually you hit enough scale that you see it clearly has these really interesting emergent properties, so you keep scaling it up and growing it, and it becomes a self-reinforcing loop to keep using these types of models. His claim was that this sort of scale is expensive, so there may be other architectures or approaches that we've just never scaled up sufficiently to see whether they have the same emergent properties, or characteristics that might be superior. How do you think about that — going down the path of the Transformer versus other architectures that may be really interesting but neglected because we just haven't thrown enough compute at them?

Yeah, I really hope that in ten years we won't still be using the Transformer, because — I mean, it's a very good architecture, people have tried to improve it, and it's sort of good enough for people to press ahead with, but scientifically there's no reason to believe this is the one.
And there have been some efforts. One of my colleagues, Chris Ré, and his students have developed other architectures that are, at smaller scales, actually competitive with Transformers, and that don't require the central operation of attention. I would love to see much more research exploring alternatives to Transformers. This is again something academia is very well suited to do, because it involves challenging the status quo: you're not just trying to get something to work and get it out there, you're trying to reflect on the principles — what can we learn from Transformers, what is the architecture trying to do, and how can we incorporate those lessons in a much more principled way.
At some level it's still going to be about compute, right? Scaling laws for LSTMs show that if you were able to scale them, maybe they would work pretty well too, but the amount of compute required is many times more, and given a fixed compute budget — we're always in a compute-constrained environment — the question is whether an architecture is efficient enough to keep trying.

Yeah, you would not use an LSTM: the Transformer strictly dominates the LSTM given a fixed compute budget, so the question of "what if I could scale the LSTM" becomes a little bit irrelevant.

For the architectures where you do see Transformer-like performance, what sort of compute budget would you need to test them out? Is it the scale of a million dollars of compute, ten million, a hundred million? I know it changes with compute pricing; I'm just trying to get a rough sense of how expensive it is to try, and whether, if we extrapolate down a compute-cost curve three years from now, maybe it becomes tractable.
It really depends on the gaps. Right now in academia you can train one-billion-parameter models; it's not cheap by academia standards, but you can do it. Here at CRFM we're training models at six or seven billion parameters, which lets us try out some ideas. But ultimately, because of emergent properties and the importance of scale, you do need to go farther out along the curve. At smaller scales you can only form a hypothesis — "this seems promising" — and you still have to go out and test whether it really pans out.
30:59and maybe this is a good segue to talk
31:02about to compute and the the uh together
31:07we found it together on the the premise
31:10that compute was is a central bottleneck
31:13in Foundation models
31:15on the other hand there's a lot of
31:18compute that's decentralized that's
31:20maybe underutilized or idle and if we
31:24could harness that compute and bring it to bear
31:29for both you know research and also
31:31commercial purposes then we could
31:34actually do a lot more there are
31:37some you know pretty hefty technical
31:40challenges around doing that because
31:42Foundation models are typically trained
31:45in very high-end data center environments
31:48where the interconnect between devices
31:50is extremely good whereas if you just
31:54grab your average desktop or home
31:57interconnect it's you know a
31:59hundred times or more you know slower
32:02but you know with uh you know Chris Ré
32:06and Ce Zhang and others they deserve
32:08sort of most of the credit for this
32:11um we've developed some techniques that
32:14allow you to leverage this weakly
32:16connected compute and actually get
32:20um you know pretty interesting training
32:22going so so hopefully with that type of
32:26infrastructure we can begin to unlock a
32:30bit more of compute both for academic
32:33research but also for you know other you
32:37know startups and so it's really cool so
32:38it sounds a little bit like earlier
32:40predecessors of this maybe things like
32:42Folding@home where people did protein
32:43folding collectively on their computers
32:46or SETI@home where there was a search
32:48through different astronomical data and
32:50now you can actually do this for
32:51training an AI system on your desktop
32:55or you know access compute that exists in
32:57data centers or in other places
32:59yeah so Folding@home is I think a
33:03great uh inspiration for a lot of this
33:05work at some point during the middle of
33:07the pandemic they actually had the
33:09world's largest supercomputer in terms
33:10of flop count because it was used to
33:15do molecular dynamics simulations for COVID-19
33:20um the main challenge with Foundation
33:22models is that there's a lot of big
33:24models and big data that needs to be
33:26shuffled around so the task
33:28decomposition is much much harder so
33:31that's why uh many of the technical
33:34things that we're doing around
33:37scheduling and compression enable
33:41us to overcome these hurdles
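The interview doesn't detail Together's actual techniques, but one standard trick for training over slow links, top-k gradient compression, can be sketched as follows (the function names and the 1%-of-entries ratio here are illustrative assumptions, not Together's implementation):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor,
    returning (indices, values): far fewer bytes to ship over a slow
    home-network link than the full dense tensor."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |g|
    return idx, flat[idx]

def topk_densify(idx, vals, shape):
    """Rebuild a (lossy) dense gradient on the receiving side."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

grad = np.random.randn(1000, 100)        # stand-in for a real gradient
idx, vals = topk_sparsify(grad, k=1000)  # transmit only 1% of the entries
approx = topk_densify(idx, vals, grad.shape)
```

Schemes like this are typically paired with error feedback (accumulating the dropped entries locally and adding them back on the next step), which is omitted here for brevity.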
33:46um and then there's a question of
33:47incentives so I think there's two
33:50aspects of what together is building one
33:52is so sort of what I would call a
33:54research computer which is for academic
33:57you know research purposes where
34:00um people can contribute compute
34:03um and in the process of
34:05contributing compute they're able to use
34:09um the sort of the decentralized
34:12cloud for doing training when they're
34:16not using it and when they are using it
34:19they can use much more of it so the
34:21hope is that it provides a much more
34:23efficient use of the compute because
34:26you're spreading it across a larger set
34:30and then you know on the commercial side
34:33the hope is that for the open models that
34:36are developed in the
34:41open source ecosystem the Together
34:45platform can allow people to fine-tune
34:49and adapt these models to various
34:51different uh use cases
34:54um one thing I think is noteworthy is
34:56that you know we think of foundation
34:59models today as you know maybe there's a
35:01few Foundation models that are you know
35:04very good and exist but I think in the
35:07future there's going to be many
35:09different ones for different kind of use
35:12cases as the space takes off many of
35:15them will be derived from maybe existing
35:18Foundation models but many of them will
35:20also be perhaps trained on from from
35:23scratch as well I think this is actually
35:25a pretty uncommon Viewpoint right now
35:27can you talk a little bit about like
35:28where you um or you know research
35:31efforts you're associated with choose to
35:33train models like and maybe via PubMed
35:37or whatever else you think is relevant
35:39here okay so Foundation models are a broad
35:43category and the sort
35:47of the core center is you know large
35:49language models that are trained on lots
35:51of you know internet data
35:54um we've trained a model here at CRFM
35:58um in collaboration with MosaicML
36:02called BioMedLM and it's not a huge
36:06model but it's trained on PubMed
36:07articles and it exhibits
36:10um you know pretty good you know
36:12performance on various benchmarks for a
36:16while uh you know we were able to be
36:19state of the art on the U.S. medical
36:21licensing exam you know Google did come
36:24up with a model that was I think 200
36:26times larger and they beat that
36:28model so you know scale does matter
36:30but I think there are many cases
36:32where for efficiency reasons maybe
36:35you do want a smaller model since cost
36:43matters could you detect scientific fraud using this I'm just
36:45wondering if effectively you could
36:46screen all the papers and see which ones
36:48appear to be off relative to the
36:50literature or reuse of images it's just
36:53some interesting things that you could
36:54potentially surface through the use
36:57of this corpus of information
36:58so I think uh you know stepping back
37:01I alluded to how these models can be
37:04misused you know for fraud spam
37:07disinformation but also plagiarism you
37:09know a lot of students are using
37:12ChatGPT to basically do their homework
37:14um and you know I think there are
37:17you know several things that one can do
37:20so I was excited I was actually
37:22thinking about it the other way can you use
37:23the model to detect fraud
37:25um given that you understand the Corpus
37:26of biomedical information you should be
37:29able to say well this is inconsistent or
37:31this is a result that is somehow
37:33duplicative or plagiarized or
37:36yeah so definitely I think you can
37:42um you can well I'm gonna try this
37:43tonight yeah I'm actually thinking about
37:45that it sounds really interesting well you
37:46can uh you can review papers I mean
37:49I think that one has to be a little
37:53careful um when uh you know doing these things
37:56um especially for more consequential
38:00decisions um but in principle you know if we think
38:03about these models as truly capturing
38:05enough knowledge about a field
38:09um at least they can flag certain things
38:12yeah I don't know if you know finding
38:14plagiarism is necessarily the
38:16ultimate
38:18application maybe there's some pseudo
38:19form of peer review that it helps with
38:21before you do open publication or I'm
38:23just brainstorming right but it just
38:24seems like a really interesting area
38:25that yeah I haven't heard a lot of
38:27discussion around so I was just curious
38:29um about it yeah I think there is a
38:31problem right now where there's just so
38:34many papers that are generated I mean
38:41um and for a researcher it's actually
38:45becoming hard to you know really
38:49distinguish the signal from the noise so
38:50having tools that could do literature
38:55review and really summarize and allow
38:58you to ask questions and search for
39:00things I think would be a really
39:02important part of um you know the
39:07research process you know I know Elicit is a company
39:09that builds these tools based on
39:11language models that can Aid in some of
39:15these processes yeah it seems like
39:17there's also work being done on the
39:18embedding side in similar ways to
39:21just you know have a mini corpus that
39:22you're synthesizing or looking over or
39:24interacting with so it seems like a
39:26really exciting area yeah I think you
39:29know one of my you know dreams now is if
39:31you could really have
39:33um a system that could
39:36really do research in the sense of
39:39reading it has already read the
39:42literature can it generate hypotheses
39:44can it generate interesting questions
39:46can it propose experiments can it write
39:48code can it actually run the
39:51experiments and use the results to
39:54revise its understanding of the
39:57world it's sort of like a scientist and
40:01I think you know obviously having a
40:04human in the loop um to you know
40:07guide it to say okay I think these are
40:09the right questions I think that would
40:11really accelerate the the pace of
40:13scientific progress how far away do you
40:15think we are from that is that something
40:16we can do today is that two years away
40:18is that five years away
40:20you know these um projection questions
40:22are extremely hard these days
40:27it can do some limited things
40:30um already in terms of
40:33doing literature research
40:38and I think we're at the level where it
40:41could probably generate things and you
40:43know I think it would still be a lot of
40:45you know human in the loop but it could
40:46probably do uh let's say you know
40:49a class project uh type of project
40:53um could it really do something
40:54completely like a breakthrough that
40:57seems maybe harder but on the other hand
41:00AlphaGo was able to discover
41:02completely kind of alien different
41:05strategies and with the right setup you have
41:09to set it up correctly I don't think you
41:11can just generate from a language model
41:12but if you set it up properly maybe
41:13these models can actually discover new things
41:17I remember reading a paper maybe it
41:21was like five years ago where a bunch of
41:23materials scientists used
41:25um just word2vec which is just word
41:27vectors from over 10 years ago and they
41:31were able to discover new
41:33um you know thermodynamic properties of
41:35you know materials and I imagine that
41:38today with much more powerful models
41:41you should be able to do you know a lot more
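The study being recalled here is likely Tshitoyan et al. (Nature, 2019), which ranked materials by the similarity of their word2vec vectors to the vector for "thermoelectric" and surfaced promising, previously untested candidates. The core ranking step is just cosine similarity; the 4-dimensional vectors below are made-up toy values (the real study used ~200-dimensional embeddings trained on millions of abstracts):

```python
import numpy as np

# Toy embeddings for illustration only; not real learned vectors
emb = {
    "thermoelectric": np.array([0.9, 0.1, 0.3, 0.0]),
    "Bi2Te3":         np.array([0.8, 0.2, 0.4, 0.1]),  # well-known thermoelectric
    "CuGaTe2":        np.array([0.7, 0.1, 0.5, 0.2]),  # candidate to be ranked
    "NaCl":           np.array([0.0, 0.9, 0.1, 0.8]),  # unrelated material
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate materials by similarity to the property word
candidates = ["Bi2Te3", "CuGaTe2", "NaCl"]
ranked = sorted(candidates,
                key=lambda m: cosine(emb["thermoelectric"], emb[m]),
                reverse=True)
```

The striking part of the original result was that this ranking, computed from text alone, correlated with properties later confirmed experimentally.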
41:45yeah I mean we just did talk to um
41:47Daphne Koller who I'm sure you know very
41:49well about what insitro is doing and so
41:51you know some heavily AI-assisted
41:54version of data generation and you know
41:57better search and optimization like I
41:58think that's one example of that sort of
42:01effort yeah I think that aspect is
42:03really exciting yeah I want to talk about
42:06um some of the I think like most
42:08important or hopefully most important
42:10work that the center's done so far can
42:12you explain what HELM is and what the
42:14goal has been yeah so HELM stands for
42:17Holistic Evaluation of Language Models
42:20which is this project that happened over
42:23the last year and the goal is to
42:26evaluate language models so the trouble
42:30is that a language model is a very general
42:37thing it's like saying evaluate the
42:40um what does that even mean the
42:42language model takes text in and text
42:43out and one of the features of a
42:47language model is that it can be used
42:49for a myriad of different applications
42:55um and so what we did in that paper is
42:59to be as systematic and as rigorous
43:01as we could in laying out the different
43:03scenarios in which language models could
43:05be used and also measure aspects of
43:10these uses which include not just
43:13accuracy which a lot of benchmarks focus
43:16on but also issues of how robust it is
43:18how well it's calibrated meaning
43:21whether the model knows what it
43:26doesn't know whether the models are
43:29um you know fair according to you
43:32know some definition of fairness
43:34whether they're biased whether
43:36they spew out toxic content how
43:39efficient they are and then we go and we
43:42basically grab every language model
43:44that's prominent that we could access
43:46which includes open source models like
43:49OPT and BLOOM but also getting access to
43:53APIs from Cohere AI21 OpenAI and also
43:58Anthropic and you know Microsoft so
44:01overall there were 30 different models
44:0342 scenarios and seven metrics and we
44:07ran the same evaluations on all of
44:11them we've put all the results on
44:15the HELM website so that you could see
44:18the top level statistics and accuracies
44:21but also you can drill down into on this
44:25particular Benchmark what are the
44:26instances what are the predictions that
44:27these models are making all the way down
44:30to what prompts are you using for the
44:32language models so the idea here is that
44:36we're trying to provide transparency to
44:38the space right we know that these
44:40models are powerful they have some
44:45um and we're trying to lay that all out
44:49in a kind of a scientific uh manner so
44:53I'm pretty excited about this project
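The structure described above, 30 models by 42 scenarios by 7 metrics, is essentially a dense evaluation grid, which is what makes the results comparable cell by cell. A minimal sketch of that shape follows (the model, scenario, and metric names and the placeholder scorer are illustrative, not HELM's actual code):

```python
# Illustrative stand-ins; HELM's real run covered 30 models,
# 42 scenarios, and 7 metrics (where each metric applies)
MODELS    = ["model-a", "model-b"]
SCENARIOS = ["question-answering", "summarization", "toxicity"]
METRICS   = ["accuracy", "calibration", "robustness"]

def evaluate(model, scenario, metric):
    """Placeholder for prompting the model on a scenario's instances
    and scoring one metric; returns a dummy score here."""
    return 0.5

# One score per (model, scenario, metric) cell of the grid
results = {
    (m, s, k): evaluate(m, s, k)
    for m in MODELS
    for s in SCENARIOS
    for k in METRICS
}
```

The transparency point in the interview corresponds to keeping every cell inspectable, down to the individual prompts and predictions behind each score.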
44:54the challenging thing about this project
44:56is since we put out the paper maybe
44:58three months ago a bunch of different
45:01models have come out including ChatGPT
45:03LLaMA you know Cohere and AI21 have
45:07updated their models
45:09um GPT-4 might come out at some point
45:12um so what this project has evolved
45:15into is this dynamically updating benchmark where
45:19every two weeks we refresh it with new
45:24um models that are coming out as well as
45:27um scenarios because one thing we also
45:30realize which was made clear by
45:32ChatGPT is that the type of things
45:35that we ask of a language model is
45:37changing we don't ask it just to do
45:38question answering as they
45:40increase in
45:42capability they can now do a lot more
45:44they can you know write an email or
45:46um give you you know life advice on
45:50XYZ if you put in a scenario
45:53or write a you know an essay about something
45:58and I think what we need to do with the
46:01Benchmark is also add the scenarios that
46:04capture these capabilities as well as
46:07kind of new uh risks so we're definitely
46:11um benchmarking how persuasive these
46:14language models are which governs you
46:16know what are the risks that someone is
46:18going to be using them to manipulate people
46:21um and also how secure they are
46:23one thing I'm actually also worried
46:26about is given all the jailbreaking
46:28that is extremely common with these
46:30models where you can get the models to
46:32basically bypass safety controls
46:37if these models start interacting with
46:40the world and accepting external inputs
46:42now you can not only just sort of
46:46jailbreak your own model but you can
46:48jailbreak other people's models and get
46:49them to do various things and then so
46:52that could lead to sort of a cascade of
46:56failures um some of these are the concerns that
46:58we hope to also capture with the benchmark I
47:01should also mention we're also trying to
47:03look at multimodal models which I think
47:05is going to be pretty pertinent so lots
47:08to do a bunch of the things that you've
47:11described as uh sort of the role you see
47:14for the center or even like Academia in
47:17the age of foundation models broadly
47:19like they have more of an intersection
47:21with policy than traditionally like
47:23machine learning research like how do
47:25you think about that yeah actually
47:27I'm glad you asked that because we've
47:30been thinking about the social implications of these
47:33models and sort of
47:35not the models themselves which we focus
47:38a lot on talking about but the
47:40environment in which these models are developed
47:46I think it's interesting to think about
47:49there are a few players in the space
47:52um with different opinions about how the
47:56models should be built some are more
47:58closed some are more open
48:01um and there's also again this sort of
48:05lack of transparency where we have a
48:09model that's produced and it's aligned
48:14um apparently to human values but then
48:16once you start kind of questioning you
48:18can ask okay well you know
48:21which values which humans are we
48:23talking about who determines these
48:26values what legitimacy does that have
48:29um and what's the sort of accountability
48:31then you start noticing that well a lot
48:34of this is just kind of completely a
48:37black box so one thing that we've been
48:39working on at the center is developing
48:44um starting with transparency I think
48:46transparency is necessary but not
48:48sufficient you need some level of
48:50transparency to even have a conversation
48:52about any of the policy issues
48:56um so making sure that uh the public can
49:00understand how these models are built
49:05um what's at least some notion of like
49:08what the data is what are the
49:11instructions that are given to
49:14um to align the models
49:16um we're trying to advocate for
49:19greater you know transparency there
49:23um and I think this will be really
49:26important as these models really get
49:29deployed at scale and start impacting
49:33um you know our lives
49:34um you know the kind of analogy I
49:37like to think about is you know
49:38nutrition labels or any sort of
49:40specifications you see on electronic
49:42devices there's some sort of uh
49:45obligation I think that um you know
49:48producers of some products should have
49:50to make sure that their product is used
49:56safely you know with some bounds on it
49:59I I guess I'll ask two questions
50:01um one is if people wanted to
50:03participate in Together is there a
50:03client they can download and install or
50:05use or how can people help support the
50:07together efforts yeah so we are
50:11working on one that will be made available both from
50:15the perspective of joining the Together
50:17cloud so that you can contribute your
50:19compute but also we have an API
50:22that we're developing so that people can
50:26use the Together infrastructure to do
50:29inference and fine-tuning of models we are
50:33also training some open models so we
50:35have this um something called OpenChat
50:40that uh we're releasing soon and this is
50:43built on top of EleutherAI's GPT-NeoX
50:47model but um you know improved to include
50:51various different types of capabilities
50:54um it's still you should think about
50:56it as really a work in progress what
50:59we're trying to do is open it up so that
51:03people can play with it give feedback and have
51:05the community improve this
51:10um rather than us trying to produce some
51:13finished product and putting it out
51:14there this goes back to the point about
51:17you know the spirit of Open
51:20Source and involving the community to
51:22build these Foundation models together
51:25as opposed to someone doing it unilaterally
51:29while we're talking uh timelines and
51:31predictions that you don't uh quite feel
51:33comfortable making how do you think as a
51:36rigorous scientist about AGI
51:39I must say that my opinions about AGI
51:41have changed over time I think that for
51:49a long time it was you know perceived by most of
51:53the field as laughable yeah I will say
51:55that uh in the last 10 years I have been
51:59aware of you know there's a kind of a
52:02certain community of uh
52:04people who think about AGI and also
52:06existential risk and things like that and I've
52:10been in touch with people who think
52:13about these issues I think I see the world
52:15maybe differently I think perhaps um
52:19certainly these are powerful
52:21Technologies that could have extreme
52:23social consequences but there's a
52:25lot of more near-term issues I focused a
52:28lot on kind of robustness of ML systems
52:31um in the last you know five years but
52:34you know one thing I've learned about
52:37Foundation models because of their
52:39emergent qualities I've learned to be
52:43um uh open-minded I would say I was
52:46asking earlier about No Priors and
52:47where that comes from and I think
52:49it's a fitting way to think about
52:52um you know the world because I think
52:54even you know everyone including
52:57scientists often get sort of drawn into
53:00a particular world view and Paradigm
53:03and I think that you know the world
53:06is changing both on the technical side
53:10how we conceive of AI and you know maybe
53:15even humans at some level and I think we
53:18have to be open-minded to you know how
53:21that's going to evolve over the next few years
53:25awesome thanks for doing this
53:26conversation yeah thank you very much