00:05 Noam welcome to No Priors well thank
00:07you for having me yeah thanks a lot for
00:09joining so you know I think in the world
00:11today when a lot of people think about
00:12AI they think about it is basically you
00:13put a you put a couple words into a
00:15prompt and then you get out an image or
00:17 you have uh ChatGPT summarize James
00:21Burnham's professional managerial class
00:22for you in a rhyming essay in the voice
00:24of a cat or something and I think you've
00:26pushed in really interesting uh
00:28directions uh that are very different in
00:31some ways from what a lot of people have
00:32been focused on and you've been more
00:33focused on game theoretic actors
00:34interacting with humans and with each
00:36other and in parallel you're kind of
00:37known as um as Sarah mentioned as sort
00:39of one of these true 10x engineers and
00:41 researchers pushing the boundaries
00:42 on AI and so I'm sort of curious
00:44like what first sparked your interest in
00:46games and researching AI to defeat games
00:48like poker and diplomacy well I think uh
00:51you know my journey is a bit uh
00:52non-traditional I mean I started out in
00:53finance actually towards the end of my
00:55undergrad career and also like right
00:57after undergrad I worked in algorithmic
00:58trading for a couple years and I I kind
01:00of realized that while it's it's fun and
01:04it's you know exciting it's kind of like
01:06a game you know you got to score at the
01:07end of the day which is how much money
01:08you made or lost uh it's not really the
01:10most fulfilling thing that I want to do
01:11with my life uh and so I decided that I
01:14wanted to do research and it wasn't
01:16really clear to me in what area I was
01:18originally planning to do economics
01:19actually and so I went to the Federal
01:20Reserve I worked there for two years
01:22honestly I wanted to figure out how to
01:23structure financial markets better to
01:25encourage more pro-social behavior and
01:27so in the process I became interested in
01:29in Game Theory and I thought I wanted to
01:30pursue a PhD like in economics focused
01:34on Game Theory two things happened so
01:35first of all I became a bit a bit jaded
01:38with the pace of progress in economics
01:40because if you come up with an idea you
01:42have to get it passed through
01:43legislation and it's a very long process
01:45computer science is much more exciting
01:46in that way because you can just build
01:48something you don't really need
01:48permission to do it and then the other
01:50thing I figured out was that a lot of
01:52the most exciting work in game theory
01:53was actually happening in computer
01:54science it wasn't happening in economics
01:56and so I applied for grad schools with
01:59 the intention of studying algorithmic game
02:01theory in a computer science department
02:02and when I got to grad school there was
02:05conveniently a professor that was
02:06looking for somebody to do research on
02:08AI for poker and I thought this was like
02:11the perfect intersection of everything
02:12that I wanted to do I was interested in
02:13Game Theory I was interested in you know
02:15 making something I was interested in AI I had
02:18played poker when I was in high school
02:20and college and you know never for high
02:22stakes but always just kind of
02:23interested in the strategy of the game I
02:24actually tried to make a poker bot when
02:27I was an undergrad and it it did
02:28terribly but it was a lot of fun and so
02:31to be able to do that for research in
02:34grad school I thought this was like the
02:36the perfect thing for me to work on and
02:37also I felt like there was an
02:39opportunity here because it felt doable
02:41and I I kind of recognized that if you
02:44 succeed in making an AI that can play
02:46poker you're going to learn really
02:48valuable things along the way and and
02:50that could have like major implications
02:51for the future so that's kind of how I
02:53 got started in that area did you have
02:54 a specific end goal
02:56of your work when you started it or was
02:57it just interest in other words some you
02:59know you talk to a lot of people in the
03:00 field and they say oh our end goal is
03:01 AGI and it's always been and I think
03:03sometimes that's sort of invented later
03:04as sort of an interesting story for what
03:06 they're doing versus viewing this as just
03:08doing primary research and it's just
03:10personal interest did you view it as
03:11like there's a path leading to agents
03:13that function on behalf of people or was
03:15there some other sort of driving
03:16motivator well so I started grad school
03:18in 2012 and it was a very different time
03:21in 2012 uh you know the idea of AGI was
03:24was really science fiction um there were
03:26there were some people that were you
03:28know uh serious about it but but very
03:29few the majority opinion was that AI was
03:31if anything it was kind of a dead field
03:33um I actually remember like emailing a
03:36professor and having this conversation
03:37where I was like look I'm really
03:38interested in AI but I'm kind of worried
03:40to pursue a PhD in this because you know
03:43I I get the impression that it's just a
03:45 dead field and I'm worried if I'll
03:46 be able to get a
03:48job afterwards conveniently like a
03:49couple years into grad school things
03:51changed pretty drastically and um and I
03:54happen to be in the right place at the
03:55right time I think is really fortunate
03:57in that respect so the original
03:58intention wasn't to to pursue AGI the
04:01original intention was you know you
04:03learn interesting things about
04:05um Ai and Game Theory and you build
04:07slowly and it was really only a couple
04:10years into grad school that it became
04:12clear that the pace of progress was was
04:14quite dramatic was there a specific
04:16moment that really drove that home for
04:18you I know for some people they
04:19 mentioned oh AlexNet came out or oh
04:21 you know some of the early GAN work was
04:23like a wake-up call I'm just sort of
04:24curious that there's a specific
04:25technology or paper or something else
04:27that came out or was it just kind of a
04:28Continuum I think it was a slow drip I
04:30mean I think for me especially it was it
04:33was the alphago moment you know like
04:34when when you see that it's just very
04:38 clear I mean AlexNet too I mean I
04:40remember taking an AI class when I was
04:42um before before I started grad school
04:44actually I took a computer vision class
04:45and they were talking about like you
04:47 know SIFT and all this stuff then you get
04:48 something like AlexNet and it just
04:50like throws all that out the window and
04:51it's just like mind-boggling how
04:52 effective that could be Noam can you
04:54 explain actually like why AlphaGo
04:57is so important and like just size of
04:58search space and how you might contrast
05:00that to previous games yeah so there was
05:03you know a big milestone in AI was deep
05:06 blue beating Garry Kasparov in chess in
05:07 1997 and that was a big deal
05:10 um and I think it's kind
05:12 of downplayed today by a
05:14 lot of machine learning researchers but
05:16 we learned a lot from that
05:17we learned that scale really does work
05:19and in that case it wasn't scaling you
05:21 know training of neural nets it was
05:23 scaling search but the techniques
05:25that were used in Deep Blue
05:26they didn't work in a game like go
05:28because the pattern matching was just
05:31 not there a big
05:33 challenge in go was figuring out like how
05:35do you even evaluate the state of a
05:36board how do you tell who's winning
05:38 um in chess it's like difficult but you
05:40 can handcraft a function to estimate that
05:43 right like you calculate oh each piece
05:45 is worth this many points and you add it
05:47 together and you can kind of get a sense
05:48 of who's winning and who's losing and in
05:49 go that's just almost impossible to do
05:51 by hand it's essentially too big
05:53 it's too subtle
05:56 it's just too complicated
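As an aside, the kind of handcrafted evaluation being described here can be sketched in a few lines. This is an illustrative toy using the conventional piece values, not Deep Blue's actual evaluation function:

```python
# Toy material-count evaluation of the kind described above: each piece gets
# a fixed point value, and summing over the board gives a rough sense of who
# is winning. Uppercase letters are White's pieces, lowercase are Black's.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_score(board):
    """Positive favors White, negative favors Black; 0 is roughly even."""
    score = 0
    for square in board:
        value = PIECE_VALUES.get(square.upper(), 0)  # kings/empty squares count 0
        score += value if square.isupper() else -value
    return score

# White is up a rook: 5 - 5 + 5 - 1 + 1 = 5
print(material_score(["R", "r", "R", "p", "P"]))
```

For go there is no comparably simple handcrafted function, which is exactly the gap that learned evaluation closed.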
05:58there's too much nuance and if you ask
05:59you know the difference is also if you
06:00asked a human who's winning they could
06:02tell you who's winning but they couldn't
06:03tell you why you know one of the things
06:05 that people assumed was that you know
06:06 humans are just better at pattern matching
06:09um and to have an AI come along and like
06:11demonstrate that it can do this pattern
06:12matching better than a human can and
06:14even if it's in this constrained game
06:16um that was that was a big deal I think
06:18that was a wake-up call to a lot of
06:19people not not just me but I think uh
06:22across the world I remember as a former
06:24like go nerd looking uh just trying to
06:27 understand the moves that AlphaGo made to
06:29try to figure out how to play better
06:30because it was like a such a
06:32mind-blowing moment yeah and you know if
06:35 any of your listeners
06:36 haven't seen the AlphaGo documentary I
06:38highly recommend watching it you can I
06:40 think it's on Netflix or YouTube
06:42and it's it really just like you can see
06:44just how significant this was to a lot
06:46of the world um when you watch that how
06:48did you end up choosing diplomacy as the
06:49next thing to work on after poker
06:51there's obviously like a wide space of a
06:53variety of different types of games and
06:54so what what drove your selection
06:55criteria there and how did you think
06:56about choosing that as the the next sort
06:58of interesting research problem so
07:01 basically what happened was we succeeded in
07:03poker and um when we were trying to pick
07:06the the next Direction it became clear
07:08that AI was progressing very quickly
07:10like much quicker than I think a lot of
07:12people appreciated and there were a lot
07:14of conversations about like what should
07:15the next Benchmark be a lot of people
07:16were throwing around these games like uh
07:18Hanabi was one uh somebody was talking
07:20 about like Werewolf or Settlers of Catan these
07:22kinds of things and I just felt like you
07:24know this was 2019 and in 2019 you had
07:27 GPT-2 come out which was just
07:29mind-blowing and then you also had you
07:32 know DeepMind reaching Grandmaster level in StarCraft
07:34 2 you had OpenAI beating human experts
07:36in Dota 2 and that was just after like a
07:38couple years of work of research and
07:40 Engineering to then go to a game
07:42 like that it just felt too easy
07:45like you could just take a team of five
07:47people spend a year on that and you'd
07:48have it cracked and so we wanted to pick
07:50something that would be truly impressive
07:52like that would require fundamentally
07:55new techniques in order to succeed not
07:56just scaling up something that already
07:57exists and we were trying to think of
08:00 what would be the hardest game to make an AI
08:01for and we landed on diplomacy the idea
08:04that you could have an AI that
08:07negotiates in natural language with
08:08humans it like strategizes with them it
08:10really just felt like science fiction
08:12and even in 2019 knowing all the success
08:14that was happening and it still felt
08:16like science fiction and so that's why
08:17we aimed for it I think that was the
08:19right call I mean I'm really glad that
08:20we we aimed high at that point I was a
08:22little afraid to do that to be honest
08:23 um it's a high risk thing to aim for but
08:27 all research is high risk high reward or at
08:29least it should be what was the most
08:31unexpected thing to come out of working
08:32on diplomacy in terms of what Cicero
08:35could do I mean I think the most
08:36unexpected thing was just honestly how
08:38it didn't get detected as a bot we were
08:40really worried about this leading into
08:41into the human human competitions
08:44first of all there's no way to like
08:45really test this ahead of time like we
08:47can play with the bot but we know that
08:49it's a bot and we can't really like
08:50gather a bunch of people together and
08:51stick them in a game and you know have
08:53them play with a bot without them
08:55without telling them or having them
08:56realize that something's up right like
08:58if if this company's hiring them to like
09:01play a game like and they know that
09:03we're working on diplomacy like clearly
09:04they're going to be playing with a bot
09:05and when people know that they're
09:07playing with a bot they behave very
09:08differently right we didn't want this to
09:09turn this into a Turing test and so we
09:11had to enter the bot into these games
09:13where players did not know that there
09:15was a bot in the mix that was the only
09:17 way that we could get like meaningful results
09:20um and just to be clear like for those
09:22of you that are not familiar with
09:23diplomacy the reason for this is because
09:24like diplomacy is a natural language
09:26negotiation game and so you're having
09:27these like really complicated long
09:30conversations with these with these
09:32people and um it's kind of hard to get
09:35 away with that as a bot and not be
09:38detected and so our big concern was like
09:40 we stick the bot in a game and within
09:43five games maybe even two games
09:45um they figure out it's a bot word gets
09:47 out the diplomacy
09:48Community is pretty small so they all
09:49talk to each other and then in all the
09:51future games everybody's like asking you
09:54 know Turing test questions trying to
09:55figure out who the bot is and our
09:57results our experiments are just like
09:59meaningless and so we figured like okay
10:01maybe we get lucky and we managed to get
10:03like 10 games in before they figure this
10:05out but at least we have like 10 games
10:06worth of data but surprisingly we
10:08managed to go like the full 40 games
10:09without being detected as a bot that was
10:11surprising to me and I think I think
10:13that's a testament to the progress of
10:15 language models um in the past
10:17couple years especially and also that
10:20maybe humans aren't as good at talking
10:22as we might think like it made me
10:25appreciate also that you know
10:27if somebody's saying something a little
10:29weird because they'll say weird things
10:30every once in a while their first
10:32instinct is not going to be like Oh I'm
10:34talking to a bot their first instinct is
10:35going to be like oh this person is like
10:37dumb or distracted or like you know
10:39they're drunk or something and then way
10:41down on the list is like oh this person
10:43is a bot uh so I think
10:45we got pretty lucky in that respect I
10:47mean but but also I mean the bot did
10:49manage to like actually go these 40
10:51games without being detected and so I
10:52think that is a testament to to the
10:53quality of the language model I think
10:55 Meta is actually uh planning to release
10:57the data which is going to be so
10:58interesting but can you um can you just
11:01like describe like a an interaction from
11:03the bot you thought was interesting in
11:05 these negotiations oh yeah I mean
11:08 one of the messages that was
11:10like really you know honestly kind of
11:12scary to me was just when it was it was
11:14talking to another player and the player
11:15was saying like hey you know I'm really
11:16 nervous about your units near my
11:18 border and the bot honestly was not
11:21planning to attack the player
11:22um it was planning to go in the other
11:23direction and it just it sent the player
11:25this like really empathetic message
11:27where it was like look I totally
11:28understand where you're coming from I
11:30 can assure you 100% I'm not planning
11:32to attack you I'm planning to go the
11:33other direction you have my word and um
11:35it really felt like a very
11:38um a very human-like message and um you
11:42 know 100% I would have never expected
11:45that to come from a bot and that when
11:47you see when you see stuff like that
11:48like it makes you appreciate like yeah
11:49there's something really powerful here
11:53how do you think about the Turing test
11:54in the context of all this like or
11:56what's your updated model of
11:58whether the test is still relevant or
11:59how to think about it so there was
12:01 actually a New York Times article
12:04 from Cade Metz
12:05 on the Turing test and what it
12:08means and he actually talks about Cicero
12:12um and basically his his view is that
12:14the Turing test is is kind of dead and I
12:17kind of agree with that I think the
12:18 Turing test is no longer really a
12:21um measure the way it was intended to be
12:25 certainly just because we have bots that
12:28 I don't want to say can pass the Turing test but
12:31 they're getting close
12:32enough that it's no longer that useful
12:35um it doesn't mean that we have general
12:37intelligence um I think there's still a
12:39long way to go on that there's a lot of
12:41things that these Bots can't do well but
12:43yeah I think my view now is that the
12:45Turing test is not that useful of a
12:47measure anymore it doesn't necessarily
12:49mean that it was always a useless
12:50measure I think it just shows like how
12:51much progress we've made uh we're not
12:53 100% there but you know the progress
12:55really has been staggering especially in
12:57the past few years what measure do you
12:59think or measures do you think makes
13:00sense to use and then also what do you
13:02think is missing on sort of the road to
13:03general intelligence
13:04um I think there's a few things that are
13:05missing the big the big thing that I'm
13:07interested in particular is reasoning
13:09capabilities you have these Bots and
13:12they basically they're all doing next
13:14 word prediction right Cicero is a
13:15bit different actually in that it's
13:16actually conditioning its dialogue
13:17generation on a plan
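To make the contrast concrete, the plan-conditioned setup being described can be sketched roughly as follows. The class and method names are hypothetical stand-ins with stubbed models, not the actual Cicero code:

```python
# Sketch of "condition the dialogue on a plan": a strategic reasoning module
# first picks an intended action, and only then does the language model
# generate a message consistent with that intent.

class StubPlanner:
    def best_action(self, game_state):
        # A real planner would search over candidate moves here.
        return "hold position and support France"

class StubDialogueModel:
    def generate(self, history, intent):
        # A real model would be a fine-tuned LM conditioned on both inputs.
        return f"After {len(history)} messages, my plan: {intent}."

def plan_then_message(game_state, history, planner, dialogue_model):
    plan = planner.best_action(game_state)               # decide first
    return plan, dialogue_model.generate(history, plan)  # then talk about it

plan, message = plan_then_message({}, ["hi"], StubPlanner(), StubDialogueModel())
print(message)
```

The design point is the ordering: the message is grounded in what the agent actually intends to do, rather than being pure next-word prediction over the conversation.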
13:19um and I think that's one of the really
13:21interesting things that distinguishes
13:22Cicero from a lot of the work that's
13:24happening in language models today but a
13:26lot of the research that is happening is
13:28is using next word prediction and when
13:29it's trying to do something that's like
13:31more sophisticated in terms of reasoning
13:33capabilities it's a lot of Chain of
13:34Thought where it's just like rolling out
13:36you know the kind of reasoning that it's
13:38 observed humans do in its
13:40training data and seeing where that
13:42leads I think there's a general
13:43recognition among AI researchers that
13:46um this is a big weakness in the Bots
13:49 today and that if we want truly general
13:52 artificial intelligence then
13:53this this needs to be addressed now
13:55there's a big question about how to
13:56address it and that's actually why I
13:57really like this direction because it's
13:59still an open question about how how to
14:01actually fix this problem there's been
14:02some progress but I think there's a lot
14:03of room for improvement what do you
14:05think are the most uh promising possible
14:07directions uh that is that is the
14:10trillion dollar question
14:11 um yeah the trillion dollar question
14:14 I think there's like clear
14:17 baselines I mean like first of all Chain of
14:19Thought really was like a big step and
14:20it's kind of shocking just like how
14:21effective that was given how simple of
14:23an idea it is I tell myself every day
14:25 when I wake up now let's think step by step
14:27yeah so for those of you that don't know
14:28it's just like you add to the prompt
14:30like oh let's think through this step by
14:32step and then the the AI will like
14:34actually generate uh a longer like
14:37thought process about how it reaches its
14:39conclusion and then that
14:40um actually leads to better conclusions
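As a concrete sketch, zero-shot chain-of-thought is literally just prompt construction; `call_model` below is a placeholder for whatever LLM API you use, not a specific library call:

```python
# Zero-shot chain-of-thought prompting: append a cue that makes the model
# write out its reasoning before the final answer. Only the prompt
# construction is the point here; the model call is a stand-in.

COT_SUFFIX = "Let's think step by step."

def build_cot_prompt(question):
    return f"Q: {question}\nA: {COT_SUFFIX}"

def answer_with_cot(question, call_model):
    # call_model(prompt) -> generated text, e.g. an API client you provide
    return call_model(build_cot_prompt(question))

print(build_cot_prompt("I had 5 apples and ate 2; how many are left?"))
```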
14:42but you can you can kind of see that as
14:44like just rolling out the
14:47the thought process that it's observed
14:49in human data and so there's a question
14:50of like okay well instead of just
14:51rolling that out could you actually
14:52 improve it as it's going through each step
14:56um and so I think things like that I
14:58mean I'm kind of keeping it like very
14:59abstract because it's it's an important
15:01 question and also I think there's not a
15:04 clear answer so I don't want to speculate too much
15:06but I think that there is like room for
15:08improvement in this direction what was
15:10the actual data set that was necessary
15:13in order for the training here and and
15:15sort of maybe to take a step back you
15:17know I've been having a series of
15:18conversations with people about data and
15:19sort of like when do we run out of data
15:22that's easily available and when do we
15:24have to start creating either large
15:27uh synthetic data or
15:30 um human RLHF data or um you know do you
15:33literally pay bounties do people just
15:35record themselves all day so you can
15:36start collecting interesting data off of
15:37them to do different things with over
15:39time right as these models scale to a
15:41certain point where you know you've used
15:42up the internet and you start using up
15:44all the video content you start running
15:45out of stuff I'm just sort of curious
15:47like how you thought about data in this
15:48context and what's necessary to really
15:49take things to the next level from a
15:51self-driven agent perspective like this
15:53it's not clear that data really is the
15:55the bottleneck on performance here I've
15:58talked to AI researchers about this and
16:00there isn't as much of a worry
16:03um about this as people might think
16:04partly that's because there's a lot more
16:06 data that's out there than people might think
16:08 and that people are using
16:09right now and also it's because I think
16:12there are going to be improvements to
16:14um sample efficiency as the research
16:17progresses I think we'll be able to
16:18stretch the data more what do you think
16:20 is the bottleneck I think the bottleneck
16:22is going to be scaling I mean so you
16:24look at at the models that exist today
16:25 like they probably cost 50 million dollars
16:30you can probably easily 10x that you
16:32know I wouldn't be surprised if there's
16:33a 500 million dollar model that's
16:34trained in the next year or two
16:37um you can maybe even go another order
16:39of magnitude and like train a five
16:41 billion dollar model if you're like the US
16:42 government or something or like a really
16:46 big company um but what do you do beyond that do you
16:47do you train a 100 billion dollar model
16:50um you'll probably see some improvement
16:51but at some point it just becomes like
16:53not realistic anymore and so that's
16:54that's going to be the bottleneck like
16:55 we maybe get like two orders of magnitude
16:57more scaling and and then we have a big
16:59problem and people are
17:01focused on like okay how do we how do we
17:03make this more efficient how do we train
17:05 this uh cheaper more parallelized but you
17:09can only squeeze so much out of that I
17:10think we've squeezed a lot already this
17:12is why I'm interested in the reasoning
17:13Direction because I think
17:15there is this whole other dimension that
17:17people are not scaling right now which
17:19 is the amount of compute at inference time
17:24you know you can spend
17:2650 million dollars training this model
17:29um ahead of time like pre-training this
17:31model and then when it comes to actual
17:32inference it costs like a penny
17:34and you know what happens if instead of
17:37it returning an answer in a second it
17:40returns an answer in like an hour or
17:42even five seconds or ten seconds you
17:44know sometimes if people want to give a
17:47 better answer they'll sit there
17:49 and they'll think about it and that leads
17:50to a better outcome and I think that
17:52that's one of the things that's missing
17:53from these models so I think that that's
17:55one of the ways to overcome the scaling
17:58Challenge and that's probably why I'm
17:59interested in working on that going back
18:01 to what Elad said the
18:04diplomacy problems specifically didn't
18:06have like you know internet scale data
18:09right as you mentioned it's like a
18:11relatively small community can you talk
18:13about what you guys did in terms of
18:14self-play and the data that actually was
18:16involved so diplomacy the problem was
18:19interesting because yes there's actually
18:21not a ton of data out there I mean we
18:22 had a relatively good data set of
18:24 about 50,000 games with dialogue
18:26 um we got it from like webDiplomacy yeah
18:29 this is from a site called
18:30 webDiplomacy.net it's been around for like
18:32almost 20 years where people play uh
18:35diplomacy casually on this site we were
18:37very lucky to get this data set I mean
18:41scouring the internet trying to find
18:42like all the sites that have available
18:43data and this was basically the only
18:46 site that had a meaningful amount of
18:48data like there was another popular site
18:50but they periodically deleted their data
18:52which was you know just mind-boggling to
18:54me it's just you're sitting on a gold
18:55mine here and you're just deleting it
18:56because it's taking up server space
18:58um I guess they didn't appreciate that
18:59 like AI researchers would one day be
19:02 interested in it um and then other sites just refused
19:04to hand over their data and so I'm
19:06really glad that we managed to like work
19:07 out a deal with webDiplomacy.net because
19:09otherwise uh the project would have just
19:11 never happened now that's about 50,000
19:13games of diplomacy about 13 million
19:15messages and that is a good sized data
19:18set but it's not it's not enough to
19:19train a bot from scratch
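One standard way around a too-small corpus is to pretrain on broad text and then fine-tune on the in-domain data. Here is a toy bigram-model version of that recipe; real systems fine-tune neural language models, this just mirrors the shape:

```python
from collections import defaultdict, Counter

# Toy illustration of pretrain-then-fine-tune: a bigram "language model"
# first learns next-word counts from a large general corpus, then continues
# training on a small in-domain corpus, which shifts its predictions toward
# domain-specific usage.

class BigramLM:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, corpus):
        for sentence in corpus:
            words = sentence.split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1

    def predict_next(self, word):
        return self.counts[word].most_common(1)[0][0]

lm = BigramLM()
lm.train(["i will go home", "i will go out"])   # "pretraining" on general text
lm.train(["i will support your army"] * 3)      # "fine-tuning" on domain text
print(lm.predict_next("will"))  # the domain usage now dominates: support
```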
19:21 uh fortunately we're able to
19:23leverage like you know a wider data set
19:25from the internet so you kind of like
19:27have a pre-trained language model and
19:28then you fine-tune it on the diplomacy
19:30data and you get a bot that can actually
19:32communicate pretty well in the game of
19:34diplomacy now that helps with the
19:36dialogue um but there's still a problem
19:38which is that the strategy isn't
19:41 going to be up to par and that's partly
19:44 because you can't do that well with just
19:46 supervised learning you can't learn like
19:47a really really good strategy in these
19:49kinds of games just with supervised
19:50 learning and it's also because the
19:52 people that are playing these
19:54 games are not very good at the game like
19:55 most of the data set is from
19:57 fairly weak players you know that's just
19:59 a reality you have a bell
20:00curve the actual strong players are like
20:02a relatively small fraction of any data
20:04set you have and I should say that
20:05 this is not limited to diplomacy like we
20:08 also found in chess and go that if you do
20:13 just pure supervised learning on a giant
20:15data set of human uh chess and go games
20:19the bot that you get out from that is
20:22not an expert chess or go player even if
20:24it's like conditioned to like behave
20:27like a chess Grandmaster it's not going
20:29 to be able to match that kind of performance
20:32 um because it's not
20:34doing any planning that's really what's
20:36 missing so
20:39 in order to get a strategy that was able
20:42to go beyond just like average human
20:43performance or even like you know strong
20:45Human Performance to something that's
20:46like much better we had to do self-play
20:49 and this is like how all these
20:51 previous game AIs have been trained
20:52right like you look at alphago you look
20:55 um especially AlphaZero the latest
20:56 version of AlphaGo and you look at the
20:58you know the Dota 2 bot the way they're
21:01trained is by playing against themselves
21:03 for millions or billions of games
21:07 um that's also how our poker bot was
21:09 trained for two-player and six-player poker
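The self-play loop behind these game AIs can be sketched schematically. Everything below is stubbed with a trivial random game so it runs; a real system would update the policy from the collected data each iteration:

```python
import random

# Schematic self-play training loop: the current policy plays against a copy
# of itself, and each game's outcome labels the states visited, producing
# training data. The "game" here is a trivial stub.

def play_game(policy, num_moves=5):
    trajectory = [policy(turn) for turn in range(num_moves)]
    winner = random.choice([+1, -1])  # stub outcome
    return trajectory, winner

def self_play(policy, num_games):
    dataset = []
    for _ in range(num_games):
        trajectory, winner = play_game(policy)
        # Label every visited state with the final result, as in
        # AlphaZero-style value-network training.
        dataset.extend((state, winner) for state in trajectory)
        # A real loop would fit the policy/value nets to `dataset` here.
    return dataset

data = self_play(lambda turn: f"state-{turn}", num_games=4)
print(len(data))  # 4 games x 5 states = 20
```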
21:12 now the difference is like when you go
21:15 from those games to diplomacy suddenly
21:17there's this Cooperative aspect to the
21:19game like you can't just assume that
21:22everybody else is going to behave like a
21:24machine like identically it's the way
21:26you're going to behave and so in order
21:27 to overcome that we had to combine
21:33 self-play with a recognition that humans are going
21:34to behave a lot like how our data
21:37suggests and so using the data set that
21:40 we have we're able to build up
21:42 a rough model of how humans
21:45behave and then we can improve on that
21:48um using self-play we're figuring out a
21:50good strategy but basically a strategy
21:52that's compatible with how humans are
21:54playing the game so to give some
21:56 intuition for this like you know
21:58 it's not obvious why this
22:00changes when you go from a two-player
22:02 zero-sum game like chess to a
22:03Cooperative game like diplomacy I mean
22:05also I should say like diplomacy is both
22:07cooperative and competitive but there is
22:08a Cooperative a big Cooperative
22:09component like let's say you're trying
22:12to develop a bot that negotiates
22:14 if you train that bot from scratch with
22:18 self-play it could learn to
22:20negotiate but it could learn to
22:21negotiate in a language that's not
22:23English it could learn to negotiate in
22:24some like gibberish robot language and
22:27 then when you stick it into a game with
22:28 six humans in a negotiation task
22:30 like diplomacy it's not going to
22:32 be able to
22:34communicate with them and they're just
22:36going to all work with each other
22:37instead of with the bot
22:39um that same Dynamic happens even in the
22:43strategy game the moves in the game the
22:44non-verbal communication aspect like
22:47the bot will develop these like norms
22:50and and expectations around like what
22:52its Ally should be doing this turn like
22:55I'm going to support my Ally into this
22:57territory because I'm expecting them to
22:58go into this territory and I don't even
23:00have to talk to them about this because
23:01 it's just so obvious that they should be
23:03 doing that um but the humans have their own
23:05metagame where like oh it's actually
23:06really obvious that I should be
23:07supporting you into this territory if
23:09you don't understand the human norms and
23:11conventions then you're not going to be
23:13able to cooperate well with humans and
23:15they're just gonna not work with you and
23:16work with somebody else instead so so
23:18that's what we really had to overcome uh
23:20in Cicero and we managed to do that by
23:23using the human data to build this model
23:24of how humans behave and then adding
23:26 self-play on top of that as kind of like
23:28a modifier to to the human data set that
23:31actually has some really interesting
23:32implications right like if you believe
23:34in the long term we are going to have
23:36Bots that take action in the real world
23:38interacting with humans and humans are
23:40perhaps not very good at optimal play in
23:43the game of life and you're interacting
23:45with them like you know it sort of just
23:49brings home the point of how important
23:51reasoning could be versus learning
23:53learning pattern recognition I think
23:55you're absolutely right that like this
23:57 matters a lot if you want to make AIs that
24:00 interact with humans in the real world
24:01right like if you have a car driving on
24:04the road a self-driving car you don't
24:06want it to assume that all the other
24:07drivers are machines that are going to
24:10act like perfectly optimally in every
24:11step of the way like you want
24:14the self-driving car to recognize that
24:16these other drivers are humans and
24:18humans make mistakes and somebody could
24:20like swerve into my lane and um
24:24yeah and also like you know just like
24:26day-to-day interactions
24:29understanding like the non-verbal cues
24:32of humans and like what that means these
24:34are things that um or even the verbal
24:36cues these are things that an AI has to
24:38be able to to cope with if it's going to
24:39like really be useful to humans in the
24:41real world and not just beating them at
24:43chess games have been used um for a
24:45while now as a way to measure AI
24:46progress and you've worked on poker
24:48variants and diplomacy variants and you
24:49mentioned before uh other work people
24:51have done in terms of Chess and go and
24:53things like that what do you think is
24:55the next Frontier in terms of games and
24:57sort of research on on them in the lens
24:58of AI there's a long history of games as
25:01benchmarks for AI and this goes all the
25:03way back to like the very
25:05foundations of AI back in like the 50s
25:08um like chess in particular was held up
25:10as this like Grand Challenge for AI
25:12because if we can make an AI that was
25:14like as smart as a human chess Grand
25:17Master then like imagine all the other
25:18smart things you could do
25:20um of course that turned out to be like
25:22kind of a false promise right like you
25:23get an AI out that plays chess and it turns
25:24out it doesn't really do anything else
25:25but but we've learned a lot along the
25:27way and and games are useful as a
25:29benchmark because you can compare very
25:32objectively to top Human Performance
25:34like it becomes very clear
25:36um when you're surpassing human ability
25:38in this domain even if it's a restricted
25:40domain you also have this Benchmark
25:41that's existed before the AI researchers
25:43came along like AI researchers it's
25:46really easy for them to come up with a
25:47benchmark once they have the technique
25:50already created you know you come up
25:52with the technique
25:53and then you're saying like Okay well
25:54now uh it's really easy to come up with
25:56a benchmark that this technique will
25:57work for and you don't want that you
25:59want the uh the problem to come first
26:01and games give you that but I think
26:04we're reaching a point now where
26:06individual recreational games
26:08um are just no longer that interesting
26:11of a of a challenge I think
26:14you know I said earlier we chose diplomacy
26:17because we thought it would be the
26:18hardest game to make an AI for and I
26:23I can't think of any other game out
26:25there where like if somebody made an AI
26:27they could play that game I would be
26:28like wow that's super impressive and I
26:30did not think that that was possible and
26:32so I think going forward the field needs
26:35to move Beyond looking at individual
26:37games and starting to look at first of
26:40all going Beyond games but also looking
26:41at generality the approach that we've
26:43used in diplomacy is very different from
26:45what we previously did in poker and what
26:48others have done in chess and go and
26:50Starcraft and now there's a question of
26:53like okay well if we really want a
26:54general system a general AI
26:58play all these games at a superhuman
27:01level and also able to do things like
27:04you know image generation and uh
27:06question answering and like all these
27:08tasks and if we could accomplish that
27:09then that becomes incredibly impressive
27:12and so I think games will will continue
27:14to serve as this benchmark but
27:16instead of serving as a benchmark that
27:18the research kind of overfits to my hope
27:21is that it will serve as a benchmark
27:22that we use alongside other benchmarks
27:25um outside of games like you know image
27:28generation benchmarks and and language q
27:30a benchmarks and these kinds of things
27:31do you think the thing that you just
27:34described that um two-player games now
27:36multiplayer games of negotiation
27:38cooperation it seems clear that if you
27:42can beat diplomacy you can beat most
27:44games and given that the AI has already
27:47won in these restricted domains that are
27:49challenging in specific ways like how do
27:52you think about the domains that are um
27:54that are going to be human skill
27:55dominant like are there going to be
27:57domains like that well certainly
27:59anything in the physical world I mean
28:00humans still dominate I mean when it
28:01comes to actually like you know
28:03manipulation tasks these kinds of things
28:05robotics is really lagging behind I'm
28:07trying to avoid doing anything in the
28:10physical world for that reason
28:12um software is just so much nicer to work with
28:14there's still things that
28:16humans are definitely better at even
28:18in restricted domains um you look at
28:19something like writing a novel
28:23I don't think you can get an AI to
28:25Output like the next Harry Potter just
28:28yet that might not be that far off maybe
28:30it's like five years away or something
28:32um but I don't think it's it's happening
28:34just yet it's kind of scary that
28:36I'm really struggling to come up with
28:37domains where I'm like oh yeah uh AI is
28:39not gonna be able to surpass humans in
28:40this I feel like people often talk about
28:42areas where humans will always have an
28:44advantage just because they're humans
28:45they want to feel good about the future
28:46uh versus because there's necessarily
28:49something that shouldn't be tractable
28:50from at least a logical perspective
28:52right yeah uh it certainly is I mean I
28:55think that the big advantage that humans
28:56have and it's not clear when AI will
28:58surpass humans in this is generality
29:01um the ability to learn from a small
29:03number of samples to be able to like you
29:05know be useful across a wide variety of
29:07domains but is generality
29:08overstated because I feel like in the
29:10examples that you mentioned you said
29:11everything from like image gen to
29:13diplomacy in like a single architecture
29:16or AI or something and often it seems
29:18like you know if you look at the average
29:19person if they're very good at one thing
29:21they're usually not good at everything
29:22right and so I kind of feel like the bar
29:25in terms of generality for AI sometimes
29:27is higher than the bar we'd use for
29:29generality for people
29:30in some sense or is that not a true
29:32statement I think it's it's not just
29:34about generality it's really about
29:35sample efficiency like how many games
29:37does it take for an a for a human to
29:38become a good chess player or a good
29:40diplomacy player or a good artist the
29:42answer is orders of magnitude less than
29:44it takes for an AI and
29:47that is going to pose a problem when
29:49you're in domains where
29:50there isn't much data now that seems
29:52like a problem that could be overcome
29:53I'm just saying that's a problem
29:55that hasn't been overcome yet and I
29:56think that that's one of the clear
29:57advantages that humans have today over
30:01AIs when do you think we'll
30:03see the emergence of AIS in financial or
30:05economic systems and obviously we have
30:07like algorithmic trading and other
30:08things like that and then we have things
30:10like crypto where you effectively have
30:11programmatic approaches to effectively
30:13money wrapped as code right and the
30:15ability to interact with those things in
30:16reasonably rich ways through smart contracts
30:19you know do you think we're
30:21um there's any sort of near-term horizon
30:22of people experimenting with that or
30:24just interesting research being done in
30:25terms of the actual interaction of a bot
30:26with a financial system I think it's
30:28already being done if you look at
30:29financial markets I'm sure there's tons
30:31of uh trading powered by Deep learning
30:34um I've actually talked to a lot of uh
30:36finance companies about this there's a
30:38lot of I used to work in finance and
30:39also like a lot of finance companies
30:40love poker and so I've given a few talks
30:43at like various places on AI for poker
30:45and I've talked to a few places about
30:47like is reinforcement learning actually
30:48useful for financial markets for trading
30:51and the answer I get is usually no I
30:54think the major challenge with uh using
30:57things like reinforcement learning for
30:59uh trading is that it's a non-stationary problem
31:04so you can have all this historical data
31:08it's it's not a it's not a stationary
31:10system and the markets
31:12respond to world events these kinds of
31:13things so you need a technique ideally
31:18that really understands the world not
31:20just treating everything like a black
31:21box but could that change you know
31:24what you were saying about spending um
31:25more compute on inference versus
31:27training in other words incorporating
31:28real-time signals at the point of inference
31:30or did you mean something else by that
31:32in terms of model architecture that
31:34would enable you to update weights in
31:35certain ways or things like that over
31:36time well I think I think it goes back
31:38to the sample efficiency problem that
31:40humans are pretty good at adapting to
31:41novel situations and you run into these
31:44like novel situations pretty frequently
31:45in financial markets yeah I I think it's
31:48also a problem of of generality that
31:51um you need to understand so much about
31:52the world to really succeed now that
31:54said I mean I think that the AIS are
31:55successful in financial markets in
31:59some ways uh certainly if you want to like
32:01break up big orders these kinds of
32:02things also I should say like I'm not an
32:03expert in this like this this is kind of
32:05outdated knowledge for me
32:07um because I I'm sure like there's a lot
32:09of cutting edge stuff that's happening
32:10that people are not telling me about
32:11because it's making money
32:12um but I can tell you this is like kind
32:14of the perspective as of like maybe five
32:16years ago it's being used in limited
32:17ways but it's not fully
32:18replacing humans yet do you think we're
32:20going to get bots that um negotiate with
32:23humans soon well the one promise is we
32:25are eventually going to get them what do
32:27you think the timeline is or the use
32:28case that that seems doable it depends
32:32how constrained the domain is I think if
32:34you were to look at constrained domains
32:36uh certain negotiation tasks I think that
32:38ai's could probably do better than
32:40humans in that today I mean I'm trying
32:42to think of like specific examples but
32:46you know if you wanted to negotiate over
32:49um it could probably do better than than
32:52a human a lot of those in a lot of those
32:54situations I think there's things
33:00it might do better than humans at that
33:03um I think it depends on how much you
33:04need to know about the world I think
33:06contract negotiations for example would
33:08still be difficult because there's so
33:10much subtlety there's so much Nuance to
33:11like every contract and it's not going
33:13to replace a professional negotiator for
33:15that kind of task just yet
33:17um but kind of the things that are more
33:19constrained don't require as much like
33:21outside knowledge about the world I
33:24think AIS are probably up to the task
33:25already so a friend of mine um who used
33:27to work with you says that one of the
33:29things you're really exceptional at is
33:30you tend to pick a neglected research
33:32domain with lots of promise you commit
33:34to it long term and then you become the
33:35best at it and many people in the world
33:38kind of get attracted to shiny things
33:40instead and kind of distracted by you
33:42know whatever is in Vogue but then it
33:43turns out to be less interesting
33:46um what are you thinking about working
33:47on next or what interests you is sort of
33:49the next wave of stuff to do
33:50I think the big thing I'm interested in
33:53is the reasoning problem and this is
33:55kind of motivated by my experience in
33:57this game space you look at
34:01AlphaZero uh the latest version of AlphaGo
34:06I think that's held up like alphago in
34:08particular is held up as this like
34:11this big milestone in deep learning and
34:14to some extent it is like it was not
34:16doable without deep learning but it
34:18wasn't deep learning alone that enabled
34:20that if you take out
34:22the planning that's being done in
34:25alphago and just use the raw policy
34:28Network the raw neural network it's
34:30actually substantially below top human
34:34and with just like raw neural Nets we
34:36have all these things that are
34:37incredibly powerful like you know
34:41um image generation software but
34:44the raw neural net itself still can't
34:46play go it requires this extra planning
34:48algorithm on top of it to to achieve top
34:50Human Performance and that planning
34:52algorithm that's used in alphago Monte
34:54Carlo tree search is very domain
34:58um I think people don't appreciate just
35:00how domain specific it is because it
35:03works in chess it works in go and these
35:05have been like the classic domains that
35:07people have cared about for
35:09investigating these kinds of techniques
35:10um it doesn't work in poker it doesn't
35:12work in diplomacy because I've worked in
35:14those domains I kind of like recognize
35:15that this is this is a major weakness of
35:17these kinds of algorithms and so I think
35:19there's a big question of like okay how
35:20do we get these models to be able to do
35:24these like complex reasoning planning
35:28um with a more General system that can
35:29work across a wide variety of domains
35:32and if you can enable that if you can if
35:35you can do that if you can succeed in
35:36that task then it enables a lot of
35:38really powerful things
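[Editor's note] The contrast drawn above, a raw policy network versus the same network plus a planning loop, hinges on a search algorithm like Monte Carlo tree search. Here is a minimal, illustrative sketch of generic MCTS on a toy Nim game, with a uniform prior standing in for a learned network; none of this is AlphaGo's actual code, and all names are made up:

```python
import math
import random

# Toy Nim: players alternate taking 1-3 stones; taking the last stone wins.
TAKE = (1, 2, 3)

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones      # state: stones left, from the mover's view
        self.parent = parent
        self.children = {}        # action (stones taken) -> child Node
        self.visits = 0
        self.total = 0.0          # summed outcomes for the player to move here

def legal(stones):
    return [t for t in TAKE if t <= stones]

def uct(parent, child, c=1.4):
    # Child outcomes are stored from the opponent's view, hence the negation.
    return (-child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def rollout(stones):
    # Random playout; returns +1 if the player to move at `stones` wins.
    sign = 1
    while True:
        stones -= random.choice(legal(stones))
        if stones == 0:
            return sign
        sign = -sign

def mcts(root_stones, n_sims=5000):
    root = Node(root_stones)
    for _ in range(n_sims):
        node = root
        # 1. Select: descend through fully-expanded nodes via UCT.
        while node.stones > 0 and len(node.children) == len(legal(node.stones)):
            node = max(node.children.values(), key=lambda ch: uct(node, ch))
        # 2. Expand: add one untried action.
        if node.stones > 0:
            a = random.choice([t for t in legal(node.stones) if t not in node.children])
            node.children[a] = Node(node.stones - a, parent=node)
            node = node.children[a]
        # 3. Simulate: random rollout (a terminal node means the mover lost).
        outcome = rollout(node.stones) if node.stones > 0 else -1
        # 4. Backpropagate, flipping the sign each ply.
        while node is not None:
            node.visits += 1
            node.total += outcome
            outcome = -outcome
            node = node.parent
    # The most-visited root action is the chosen move.
    return max(root.children, key=lambda a: root.children[a].visits)
```

With enough simulations the most-visited move converges toward optimal play (in Nim, leaving the opponent a multiple of four stones). Roughly speaking, AlphaGo's version additionally biases selection with a policy network and evaluates positions with a value network rather than purely random rollouts.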
35:39like one of the domains that I'm excited about
35:42is theorem proving you know it doesn't
35:45seem crazy to me that you could have a
35:49model that can prove the Riemann
35:50hypothesis within the next five years
35:54um if you can if you can solve the
35:56reasoning problem in a truly General way
35:58and yeah you know maybe maybe the
36:00inference cost is huge like maybe it
36:01costs a million dollars per token to
36:04generate that proof but that seems
36:06totally worth it if you can pull it off
36:07and maybe you can do other things with
36:09it too like maybe maybe that's the move
36:11that allows you to like you know write
36:14a prize-winning novel maybe that enables
36:17you to come up with like life-saving
36:18drugs just for context the Riemann
36:20hypothesis is like considered the the
36:22most important unsolved problem in math
36:24where I don't know the first X
36:28zeros have been checked but we we
36:29don't know for sure yet yeah and I I
36:31think the the key is that
36:34that I'm really interested in is the
36:35generality like we can
36:38solve this problem domain by domain
36:42but then it always ends up like kind of
36:43overfit to that domain and so I think
36:45what we need is something
36:47um as general as what we're seeing with large language models
36:51um where you just throw it at any sort
36:53of problem and it works surprisingly
36:55well and um I guess you're implying that
36:59um there are ways to there are ways to
37:01frame the problem to make progress that
37:04are more General but really interesting
37:06to making progress in reasoning and that
37:07could be around math or possibly code is
37:11is that the right understanding my hope
37:14is that the techniques are General I
37:15mean I think it's important to also look
37:16at a wide variety of domains in order to
37:18like prevent you from overfitting
37:20um and yeah one of the domains that I
37:21think would also be a good fit is code
37:22generation because I think to write good
37:24code like next token prediction is
37:27getting you surprisingly far
37:29um but I don't think it's gonna get you
37:30all the way there to like replacing
37:33um engineers at big companies yeah
37:36maybe one piece of just context for
37:38listeners is um copilot is amazing right
37:42but what we're doing with code
37:43generation today is very local context
37:45specific yeah and so if you want to like
37:48plan out like a whole a whole product
37:50like that doesn't seem doable with
37:52existing technology and you know I think
37:54I think the perspective of a lot of
37:55people when they when they hear me say
37:56this is like well you know but you just
37:58scale it you know you scale up the
37:59models you scale the training and that's
38:00always worked in the past
38:02um and the example I like to give is you
38:04look at okay you look at alphago you
38:06could just in theory scale up the
38:08training scale up the model capacity and
38:10you don't need planning then you just
38:12have like a really large you run this
38:14reinforcement learning algorithm for a
38:15really long time you have this really
38:16big Network and it will eventually learn
38:18in theory at least how to beat uh
38:21expert humans at Go but there's a
38:23question of like okay well how much
38:24would you have to scale it up how much
38:25would you have to scale up this raw
38:27neural net the capacity and the training
38:29in order to match the performance that
38:31it achieves with Monte Carlo tree search
38:33and if you crunch the numbers it
38:35ends up being a hundred thousand X
38:37now these models are already costing
38:40like 50 million dollars like clearly
38:41you're not going to scale them
38:43um by a hundred thousand X and and
38:47um and so then there's a question like
38:48okay well what do you do
38:51and the answer in alphago is like well
38:55instead of having all that computation be during training
38:59you also have it spend like 30 seconds
39:02to figure out what move to make next
39:03when it's actually playing the game and
39:06that shifts the cost burden from having
39:09to like pre-compute everything to then
39:11be able to think on the Fly
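[Editor's note] The back-of-envelope argument above can be made concrete with toy numbers. Only the $50M training figure and the 100,000x scale-up gap come from the conversation; every other number below (moves, games, per-move search cost) is made up purely for illustration:

```python
# Toy re-creation of the training-vs-inference tradeoff. The $50M cost and the
# 100,000x factor are the figures mentioned; everything else is hypothetical.

train_cost = 50e6            # dollars for one big training run
scale_factor = 100_000       # extra scale needed to match search-level play

# Route 1: pay for all the "thinking" up front, at training time.
cost_scaled_up = train_cost * scale_factor

# Route 2: keep the small model, but spend ~30s of search on every move.
moves_per_game = 200           # hypothetical
games_played = 1_000_000       # hypothetical
cost_per_searched_move = 0.01  # hypothetical dollars per searched move

cost_with_search = train_cost + moves_per_game * games_played * cost_per_searched_move

print(f"scale-up route: ${cost_scaled_up:,.0f}")
print(f"search route:   ${cost_with_search:,.0f}")
```

The point is not the specific totals: search moves spending from a one-time training bill to a per-decision bill, which pays off when individual decisions are valuable enough to be worth thinking about on the fly.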
39:14and so that's why I think that Avenue
39:16seems like um the the piece that's
39:19missing a really random question because
39:21if you look at the human brain you have
39:22these various specialized modules with
39:25very specific functions right you have
39:26the visual cortex for visual processing
39:28you have like different things for
39:29emotion in terms of specific modules
39:31like there's specific um parts of the
39:33brain that if you ablate you remove
39:35um certain emotive or other capabilities
39:37right there have been accidents where
39:39like poles have gone through people's
39:40heads and ablated a very specific place
39:41and then people have survived and so you
39:43see this sort of very specific ablation
39:45of function through the ablation of
39:46specific modules why is it the correct
39:48assumption to think that there should be
39:49a generalizable architecture versus you
39:51just have a bunch of sub models that are
39:52all running together
39:54that collectively enable a wide range of
39:56behavior which is effectively what we
39:58see in the brain that's a good question
40:02I don't think that we need to be tied to a specific
40:04technique and the answer might be that
40:05we need to have like more specialized
40:07systems instead of just like one truly
40:09General architecture
40:10I think what I'm thinking about is is
40:12more the goal rather than the approach
40:13we want something that's able to succeed
40:15across a wide variety of domains rather than
40:17having to come up with like a unique
40:19approach to every single domain
40:20that gets you part of the way but I
40:22think that eventually that will be
40:24superseded by something that is really
40:26General yeah that makes sense and I
40:28guess you know one big domain is just
40:29reasoning right so I didn't mean to
40:31imply that it's different subtypes of
40:32reasoning will require different
40:34approaches but more that the brain may really
40:37uh fundamentally function in a very different way
40:40um and again that may be incorrect right
40:41the brain is a evolved system which
40:44means it has enormous limitations in
40:45terms of where it came from and how it
40:47got created and you often end up with
40:50local Maxima when you evolve a system
40:52right um I was sort of curious about how
40:54you thought about that
40:55yeah there's certainly a risk always
40:56with research that you you could end up
40:57in a local minimum and it's like hard to
41:00know people can overfit to that and
41:02I think I think actually like machine
41:04learning is an example of this like deep
41:06learning not many people were focused on
41:08this because they kind of assumed it was
41:09this dead end there were only a few
41:10people out in the you know like Canadian
41:13wilderness that were working on this
41:16um and that ended up like being
41:17tremendously successful
41:20there's value in diversity um there's
41:22value in diversity of approaches and um
41:25I think I think it does help to try to
41:27think outside the box and try to do
41:28something that looks a little bit
41:29different than what everybody else is
41:30doing uh Noam you are gonna go work on
41:33this really interesting area
41:35um I'm sure there are other problems you
41:37think are interesting especially given
41:41I don't know practical limits of how
41:43much money we're willing to spend on
41:45scaling up beyond another order of magnitude or
41:47two what do you think other researchers
41:50or teams should be working on that
41:51they're not paying enough attention to
41:53well I think we're in an interesting
41:55place now in AI where
41:58there is a huge opportunity given where
42:04things are at now to build out products
42:08um that can have a big impact on the
42:09world it's great to see that there are
42:13companies going in that direction and trying to
42:15like bring this research into the
42:17real world and have a big impact there
42:18make people's lives better for what it's
42:21worth both Elad and I got emails from
42:23multiple people telling us that they're
42:25building price negotiation agents as
42:28we speak well like I
42:31said I think it's doable so I think I
42:34think it's the right call
42:35I think on the research side there's
42:36there's still a lot of interesting
42:38questions about like how do we make
42:39these things more efficient
42:40um are there are there better
42:42architectures we can use I mean I think
42:43there's just so many questions across
42:45um that are interesting
42:47I think that the big thing I would I
42:48would recommend to researchers is not
42:51about like which area to focus on but
42:53just like the style of research I think
42:54there's a tendency to play it safe and
42:58to not take big risks and I think it's
43:00important to recognize that
43:02research is an inherently risky field
43:04you know there's a high probability that
43:07what you're working on is not going
43:10to be useful in the long run
43:12you have to kind of accept that and be
43:14willing to take that risk anyway I mean
43:15this happened to me with my early work
43:19in the grand scheme of things really
43:22um it didn't make
43:24as much impact in the long term as I as
43:26I would have hoped and
43:27um and that's okay because
43:29you know I had one thing that ended up
43:31being quite impactful
43:36it's uh it's important to be able to
43:39um kind of go into the field
43:40recognizing that you are taking a
43:42risk already by going into research
43:45you heard it here first be like Noam
43:47work on things that make you nervous
43:49do you want to give a quick um minute
43:51overview of diplomacy so people can
43:52understand what it is and why the
43:54research was such a breakthrough yeah
43:55diplomacy is this game
43:57um it was developed in the 50s
43:59um it was actually developed by this guy who
44:02saw what happened in World War One and
44:04kind of viewed this as a diplomatic
44:05failure and so he wanted to create this
44:07game that would teach people how to be
44:09better diplomats essentially and so it
44:12takes place at the onset of World War
44:14One there's seven player powers that you
44:16can play as England France Germany Italy
44:19Russia Turkey and Austria-Hungary and
44:22you engage in these like complex
44:24negotiations every turn and your goal is
44:27to try to control as much of the map as
44:28possible and the way you win is by
44:30controlling the majority of the map it's
44:31kind of like Hunger Games where even
44:34though only one person can win at the
44:35end of the day there's still this like
44:37incentive to be able to work together
44:38especially early on because you can both
44:40benefit and have a better chance of
44:42winning in the end if you work together
44:43and so you have these like really
44:45complex negotiations that happen
44:47um players and all the communication is
44:49done in private so yeah unlike a game
44:51like risk for example or similar to
44:54Katan where like all the negotiation is
44:55done in front of everybody else and
44:58diplomacy you will actually like pull
44:59somebody aside go into a corner like
45:01scheme about who you're going to attack
45:03together this turn who's going to
45:07after you've negotiated with everybody
45:08you write down what your moves are for the turn
45:11and so then all the moves are read off
45:13at the same time and you can see if
45:15people like actually
45:16um follow through on their promises
45:17about like helping you and or maybe they
45:20lied to you and they're just going to
45:21attack you this turn
45:23um so it has it has like some elements
45:25of Risk poker and Survivor because
45:30um there's a big trust component
45:32and that's really the the essence of the
45:34game like can you build trust with
45:37others because the only way to succeed
45:39in this game is by working together even though
45:43um you know you always have an incentive
45:46to attack somebody and um grow at their expense
45:49so yeah that's the game it's been around
45:50for a long time like I said since the
45:5250s it was JFK and Kissinger's favorite
45:53game there's research
45:56on this game from an AI angle going back
46:00um but the idea that you could play this
46:02game in natural language with humans and
46:04beat them was just complete science
46:07fiction um until a few years ago like it was
46:10still science fiction but we at least
46:11thought it was like worth pursuing it
46:13um and research really took off in 2019
46:15when researchers started using deep
46:17learning to make very big Bots for this
46:19game that could play the non-language
46:21version so there's no communication you
46:23just write down your moves and um and
46:26you kind of have to communicate
46:27non-verbally through the actions that
46:28you take we were doing research on this
46:30deepmind was doing research on this and
46:32then also University of Montreal and a
46:35couple other places as well and there
46:38was there was a lot of interest and
46:43we decided to take the risky bet of just like jumping
46:45to the end point and instead of taking
46:46an incremental approach aiming for full
46:48natural language diplomacy
46:50um and I'm glad that we aim for that
46:52yeah it seems like one of the pretty
46:54amazing things about what you all did is
46:55you basically created Bots that
46:58um humans thought were
47:00other people and therefore they had to
47:03learn how to collaborate with each other
47:04or how to sometimes lie or deceive how
47:06to sometimes think through sort of
47:08multiple moves from a game theoretic
47:10perspective and so it's a radically
47:11different thing than playing chess or or
47:13playing go against another person and
47:15then just having almost a probabilistic
47:17tree of moves or something yeah you run
47:20into this like human element
47:22um you really have to understand the
47:24um what's really interesting about
47:25diplomacy aside from just the natural
47:27language component is that it really is
47:29the first major game AI breakthrough in
47:32a game that involves cooperation that's
47:34really important because you know at the
47:35end of the day when we make these AIS
47:36that play chess and go
47:38um we we're not developing them with the
47:40purpose of beating humans at games
47:42we want to you know have them be useful
47:44in the real world and if you want to
47:45have these AIS be useful in the real
47:46world then they have to understand how
47:48to cooperate with humans as well Elad
47:50and I were talking about
47:51um Centaur play and uh whether or not
47:54that would persist as an idea at all
47:56given you know uh like we've accepted
47:59that AIs are going to win games at this point
48:04you know the idea that AIS are going to
48:07take action by cooperating with humans
48:09that needs to be a core capability seems
48:11obvious and I am perhaps this is the um
48:15making myself feel better story but I am
48:18hopeful that that is a human skill that
48:20remains quite important being able to
48:22cooperate with AIS well from what I hear
48:24Centaur play is like AIs have
48:27gotten so strong in games like chess
48:29that it's not clear if the human is
48:30really adding that much uh these days
48:32that's what I told Sarah too yeah I'm
48:35crying I get it I get it I accept it
48:36yeah um I don't I think I think the
48:39humans are still useful in a game like
48:40go because like the AIS are super strong
48:42but they will also sometimes like a few
48:44times in each game make these like
48:46really weird blunders
48:48um and in diplomacy I think yeah it's
48:51super helpful to have like an
48:53experienced human in addition to the AI
48:55though like it you know eventually I'd
48:57imagine that these systems become so
48:58strong that like it kind of goes the way
49:00of Chess where like the humans just kind
49:01of like adding a marginal difference at
49:03the end yeah I'm I'm actually just you
49:05know wondering how long that window is
49:07for humans and Centaur play in the game
49:09of life right but it's okay it's okay I
49:11got it Elad was right uh hopefully yeah
49:14hopefully forever but no we'll see yeah
49:17so do you mind explaining the work that
49:19you've done in poker and some of the
49:21breakthroughs that you made there as
49:22well yeah my PhD research was really
49:25focused on how do you get an AI to beat
49:27top humans in the game of No Limit Texas
49:29Hold'em poker specifically
49:30during my PhD it was on heads-up no
49:32limit Texas Hold'em poker
49:36and this was a long-standing challenge
49:39problem actually if you go back to the
49:41original Papers written on Game Theory
49:43by John Nash the only application that's
49:45discussed in the paper is poker he
49:48actually analyzes this like simple
49:49three-player poker game uh in the paper
49:52and works out the national equilibrium
49:55um and then actually at the end he says
49:56like oh yeah it'd be really interesting
49:58to analyze a much more complex poker
50:01game using uh using this approach
50:03um so I'm glad we finally got a chance
50:04to do that you know 60 years later
50:09and it's it's interesting I think
50:12especially after alphago this became a
50:16very uh popular problem because after alphago
50:21there was a big question of like okay
50:22well AIS can now beat humans at chess
50:24they can be humans that go what can't
50:25they do and the big thing that they
50:27couldn't do was be able to reason about
50:29hidden information be able to understand
50:32that okay this other player knows things
50:34that I don't know and I know things that
50:36they don't know and being able to
50:37overcome that problem in a strategic
50:40setting was a big unanswered question
50:42and yeah so that was the focus of my
50:44research for basically
50:45my whole grad school experience
50:49there were a few different research Labs
50:50that were working on this and what we
50:53would do is every year we would all make a poker
50:55bot and we would play them against each
50:57other in this competition called the
50:59annual computer poker competition
51:00basically what happened is when I
51:02started my PhD there had already been
51:04like some progress in AI for poker and
51:08so the competition really turned into a
51:09competition of scaling there's about
51:11like 2.5 billion different hands that
51:13you could have on the river the
51:15last betting round in Texas Hold'em
51:19what we would do is cluster those hands
51:23using k-means clustering and
51:25treat similar hands
51:27identically and that allows you to
51:29compute a policy for poker because now
51:33instead of having to worry about 2.5
51:34billion hands and like having to uh come
51:37up with a policy for each one of those
51:38you can now like bucket them together
51:40and now you have like 5 000 buckets or
51:42something and you can actually compute a
51:44policy for that many buckets
51:46and so this was before neural nets
51:48that's why we were doing this
51:49k-means clustering thing instead of
51:50deep neural nets you can kind of think of
51:52it as the number of buckets that
51:53you have is like the number of
51:55parameters that you have in your model
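The bucketing he describes can be sketched in a few lines: run k-means over a per-hand strength feature, then keep one policy entry per cluster rather than per hand. This is a toy illustration, not the actual abstraction: the feature here (a single random "equity" number per hand), the hand count, and the bucket count are all invented, and the real systems clustered billions of hands on richer features.

```python
import random

random.seed(0)

# Stand-in for real hand features: one "equity" number per hand.
# (Invented for illustration; real abstractions used richer features.)
hands = [random.random() for _ in range(1000)]

def kmeans_1d(points, k, iters=20):
    """Plain k-means on scalar features."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

centers = kmeans_1d(hands, k=5)

def bucket(hand_feature):
    """Map a hand to its nearest cluster; similar hands share a bucket."""
    return min(range(len(centers)), key=lambda i: abs(hand_feature - centers[i]))

# One policy entry per bucket instead of one per hand,
# e.g. fold/call/raise probabilities.
policy = {b: [1/3, 1/3, 1/3] for b in range(5)}
```

With 5 buckets you store 5 policy entries instead of 1000 (or 2.5 billion); more buckets mean a finer-grained, more expensive policy, which is the scaling knob described next.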
51:57and so in grad school it kind of turned
52:00into a competition of scaling
52:02um how many buckets could you have in
52:04your bot my first year it was like five
52:06thousand buckets then we got up to
52:0830,000 buckets and then 90,000 buckets
52:10um every year we would have these bigger
52:13and bigger models we would train them
52:14for longer parallelize them and they
52:16would always beat the previous year's bots
52:20we actually won the annual computer
52:22poker competition and after that we
52:24decided to take our bot
52:26and play it against expert human players
52:28and so this was the first
52:31um what was called the brains versus AI
52:32poker competition where we invited like
52:34these four top heads-up no-limit Texas
52:36Hold'em poker pros and we had them play
52:3880,000 hands of poker against our bot
52:41and the bot actually lost by a pretty wide margin
52:46and it occurred to me like during this
52:48competition that the way the humans were
52:50approaching the game was actually very
52:51different from how our bot was approaching it
52:55we would train our bot for like two
52:57months leading up to this competition
52:59you know on a thousand CPUs
53:02um but then when it came time to
53:03actually play the game it would act instantly
53:06and the humans would would do something
53:08different like they would uh you know
53:10obviously they would practice ahead of
53:11time they would develop an intuition for
53:13the game but when they were playing the
53:15game against the bot and they were in a
53:16difficult spot they would sit there and think
53:19and sometimes it was like five seconds
53:21sometimes it was like a minute but
53:22that thinking would allow
53:24them to come up with a better strategy
53:26and it occurred to me that this might be
53:28like something that we're missing from
53:30our bot and so I did this analysis after
53:32the competition to figure out okay if we
53:34were to add this search this planning
53:36algorithm that would come up with a
53:39better strategy when it's actually in the hand how
53:42much better could it do
53:44and the answer was it improved the
53:47performance uh by about a hundred
53:49thousand X it was the equivalent of
53:52like scaling the number of parameters
53:54scaling the training by a hundred thousand x
53:59over the three years of my PhD up to that point
54:01I had managed to scale things by about
54:04100x and you know that's like quite good
54:06I was very proud of that but
54:09when I saw that result it made me
54:10appreciate that everything I had done in
54:12my PhD up until that point was just a
54:14footnote compared to adding search and planning
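The gap he's describing, between a precomputed policy that acts instantly and a system that spends compute when a decision comes up, can be sketched generically with rollouts. This shows only the general idea of decision-time search, not the actual poker algorithm (which re-solves subgames with game-theoretic guarantees); the toy simulator and its payoffs are made up for illustration.

```python
import random

random.seed(1)

def rollout_value(state, action, simulate, n=500):
    """Estimate an action's value by averaging n simulated playouts."""
    return sum(simulate(state, action) for _ in range(n)) / n

def decide_with_search(state, actions, simulate, n=500):
    """Spend compute at decision time: pick the action with the best
    rollout estimate, instead of looking up a precomputed policy."""
    return max(actions, key=lambda a: rollout_value(state, a, simulate, n))

# Toy simulator (invented): action "b" wins 60% of playouts, "a" wins 40%.
def simulate(state, action):
    return 1.0 if random.random() < (0.6 if action == "b" else 0.4) else 0.0

best = decide_with_search(state=None, actions=["a", "b"], simulate=simulate)
```

The point of the anecdote is the trade-off this sketch makes explicit: the precomputed policy pays all its compute up front, while search pays per decision, and the latter turned out to be worth orders of magnitude of training scale.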
54:18uh and so for the next year
54:21I just worked basically non-stop like
54:23100 hour weeks trying to scale up search
54:25and throw as much
54:27computation at the problem at inference
54:29time as possible and then we did another
54:32competition in January 2017
54:34where we played against four top expert
54:38players with two hundred thousand dollars in prize
54:40money to incentivize them to play their
54:41best and this time we completely crushed
54:43them people were literally telling us
54:45like poker players were literally
54:47telling us they did not think it was
54:49possible to beat uh expert poker players
54:52by that kind of margin yeah and so
54:56the story of like you know my grad
54:58school experience working on poker AI
54:59that was for two player poker
55:01um we ended up after that working on
55:03multiplayer poker on six player poker
55:05again the big breakthrough there was
55:07that we developed a more scalable search
55:09technique so instead of always having to
55:10search to the end of the game it could
55:12search just a couple moves ahead
55:14um and what was really interesting there
55:16is we did another competition
55:18that bot won and that bot cost under
55:22$150 to train if you were to run it on
55:24like a cloud computing service
55:27and I think that shows that this wasn't
55:33about compute it really was an algorithmic
55:36breakthrough and this kind of
55:37result would have been doable 20 years
55:39ago if people knew the approach to take
55:43how about if you look at a lot of other
55:44games those sorts of big shifts
55:48in performance from a bot relative to
55:50people then shifts how people play right
55:52they learn from the bot or they adapt
55:54their game from watching games that the
55:56bots play how did that play out in terms
55:58of Poker oh that's yeah that's a great
56:00question so you know in the competition
56:03it was really interesting because
56:05you know kind of as a last
56:07minute thing we added this ability
56:08so okay the way the bot works
56:11we give it different bet sizes that it
56:14can use in the game that we
56:16were playing there's 20,000 chips
56:20with 50/100 blinds actually
56:22um and so it can bet any
56:24amount it wants from like a hundred
56:25dollars up to twenty thousand dollars
56:28there's not much value in like being
56:30able to bet both five thousand dollars
56:32and five thousand and one dollars and so we
56:33would discretize that action space to
56:35constrain it to like only considering a
56:37few different options and so there's a
56:38question of like okay well what sizes do
56:39you give it the choice between
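One way to sketch that discretization: take a few pot-relative fractions, convert them to chip amounts, clamp to the legal range, and drop duplicates. The helper and its fraction list are invented for illustration; the stack and blind figures follow the game described in the conversation.

```python
def abstract_bet_sizes(pot, stack, min_bet,
                       fractions=(0.5, 1.0, 2.0, 4.0, 10.0)):
    """Reduce the continuous bet range to a few pot-relative options."""
    sizes = []
    for f in fractions:
        bet = round(f * pot)
        bet = max(min_bet, min(bet, stack))  # clamp to the legal range
        if bet not in sizes:                 # drop duplicates after clamping
            sizes.append(bet)
    if stack not in sizes:
        sizes.append(stack)                  # always keep all-in available
    return sizes

# 20,000-chip stacks, 100-chip minimum bet, a $100 pot.
options = abstract_bet_sizes(pot=100, stack=20000, min_bet=100)
# options is now [100, 200, 400, 1000, 20000]
```

Note how the large fractions (4x, 10x) and the all-in entry survive into the menu: those are exactly the cheap-to-add overbet options discussed next.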
56:41you know towards the end when we were
56:43developing the bot we just had
56:44room for extra computation so we just
56:46threw in some extra sizes
56:48um like 4x the pot 10x the pot
56:50it doesn't cost that much more
56:52so why not just give it the option
56:55um I didn't think it would actually use
56:56those sizes and then during the
56:57competition it it actually ended up
56:59using those sizes a lot
57:02um and it would sometimes bet like you
57:03know twenty thousand dollars into a $100
57:05pot which was completely unheard of in
57:09professional poker play and um
57:12you know I was a little worried about
57:13this because I thought it was a mistake
57:14at first and I think the players
57:16it was playing against also thought
57:18it was a mistake at first
57:19um but then they they found that they
57:20kept ending up in these like really
57:22tricky situations and you know they
57:25would just really struggle with like
57:26whether to call or fold and that's
57:28that's how you know you're playing good
57:29poker if you see the other person like
57:31really struggling with the decision that
57:33is a sign that you're doing something right
57:36and at the end they told us like yeah
57:38that's the one thing that we're gonna
57:39try to incorporate into our own play
57:41adding these like what are called over
57:43bets into our strategy so whereas
57:48typically the strategy was like oh
57:50you bet between a quarter of the size of
57:52the pot and one times the pot and now in
57:55professional poker play it's actually
57:56really I want to say common but
57:59it's part of the strategy to
58:02bet sometimes like 5x the pot 10x the pot
58:05um if you can pull it off in the right
58:07way it can be a very powerful strategy
58:09and I should also say like
58:12the way professional poker players train
58:15they all use Bots to assist them it's a
58:17lot like chess where
58:19um you play the game and then you have a
58:22bot analyze your play afterwards
58:23and see like okay did you make mistakes
58:25where did you make mistakes how could you
58:28improve um the game really has been demystified
58:30and become a lot like chess I kind of
58:34describe poker as essentially High
58:36dimensional chess it's like
58:38chess where you have to reason about
58:39like a probability distribution over
58:42actions instead of just like uh discrete
58:44actions yeah it's really it's really
58:46interesting because I don't think people
58:47really believe there was fully optimal
58:49play in poker before like they
58:51understood the probability distribution
58:52but uh if you're playing live poker like
58:55there's social cues right and social
59:00um that has clearly been swept out not
59:02as an activity of
59:05enjoyment but in terms of a
59:08strategy that actually wins yeah I think
59:10that's surprising to a lot of people
59:11that this idea that there is an optimal strategy
59:16you know there's this thing called the
59:18Nash equilibrium where if you're playing
59:19that strategy you always win yeah well
59:22you'll never lose so there's a subtlety there
59:27it guarantees that in the long run you
59:29will not lose in expectation and the
59:30reason for that is because if
59:31you're playing against somebody else
59:31that's also playing the Nash
59:32equilibrium obviously you can't both win
59:35one of you is going to lose or you're going to
59:37tie and so in expectation if you're
59:39playing against
59:40somebody else that's playing the
59:41Nash equilibrium
59:42um you're gonna end up tying but in
59:44practice what ends up happening is if
59:45you're playing the Nash equilibrium
59:46in a complicated game like poker the
59:48other person is going to make these
59:50small mistakes over time and every
59:51mistake that they make is money into your pocket
59:54um and so you just play the Nash
59:55equilibrium wait for them to make
59:57mistakes and you end up winning and that
59:59is now the conventional wisdom among
01:00:01poker players that you start by playing
01:00:03the Nash equilibrium
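The poker bots he's describing computed approximate Nash equilibria with counterfactual regret minimization (CFR); below is a minimal sketch of its core building block, regret matching, run in self-play on rock-paper-scissors, where the average strategies converge toward the uniform Nash equilibrium (1/3 each). The starting regrets and iteration count are arbitrary choices for the illustration.

```python
# Regret matching in self-play on rock-paper-scissors. In two-player
# zero-sum games the *average* strategies of no-regret learners converge
# to a Nash equilibrium -- for RPS that's (1/3, 1/3, 1/3).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row payoff for R, P, S
T = 50000

def strategy_from(regrets):
    """Play actions in proportion to their positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1/3, 1/3, 1/3]

# Arbitrary asymmetric starting regrets so the dynamics are non-trivial.
regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
strategy_sums = [[0.0] * 3, [0.0] * 3]

for _ in range(T):
    strats = [strategy_from(r) for r in regrets]
    for p in range(2):
        me, opp = strats[p], strats[1 - p]
        # Expected value of each pure action against the opponent's mix
        # (RPS is symmetric, so the same payoff table serves both seats).
        action_vals = [sum(PAYOFF[a][b] * opp[b] for b in range(3))
                       for a in range(3)]
        ev = sum(me[a] * action_vals[a] for a in range(3))
        for a in range(3):
            regrets[p][a] += action_vals[a] - ev  # accumulate regret
            strategy_sums[p][a] += me[a]

avg = [s / T for s in strategy_sums[0]]  # average strategy, near (1/3, 1/3, 1/3)
```

The current strategies cycle, but the averages settle down, which is exactly the "play the equilibrium and let mistakes pay you" property: the averaged strategy cannot be exploited in expectation.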
01:00:05if you're really good you can
01:00:07look at the other players see how
01:00:10they're deviating from the Nash
01:00:11equilibrium playing suboptimally and
01:00:12maybe you can like deviate yourself to
01:00:14capitalize on those mistakes but really
01:00:15the safe thing to do is play the
01:00:17Nash equilibrium let them make
01:00:19mistakes and every mistake that they
01:00:21make costs them money and put the money
01:00:22in your pocket I think that's all we
01:00:24have time for uh thank you so much for
01:00:26joining us on the podcast Noam yeah thank
01:00:28you very much for having me