00:00I think generative agents and tools like
00:02large language models could be used to
00:05advance social science and social
00:08science to a large extent has been the
00:11quest to understand who we are and
00:14there's a lot of really interesting
00:15applications that can come out of that
00:17that will Empower different communities
00:19and societies a few weeks ago the a16z
00:22infrastructure team ran an event in the
00:24San Francisco office the topic
00:27generative agents these are AI characters
00:30designed to simulate human behavior
00:33derived from a recent but game-changing
00:35paper called Generative Agents:
00:38Interactive Simulacra of Human Behavior
00:41developers from all around the city came
00:43to hear the lead author Joon Park speak
00:46alongside a16z general partner Martin
00:48Casado and in this panel they discuss
00:51how this paper and the advancements in
00:53large language models have opened a new
00:56window expanding the dynamism of
00:58simulation which instead of
01:00binary logic we're using probabilistic
01:02thinking and the ability to incorporate
01:05new information so what does that really
01:07mean well instead of your character in
01:09The Sims following very specific rote rules
01:12with generative agents a father may go
01:14outside because he notices his son
01:16another may take their breakfast off the
01:18stove because they notice it's burning
01:20and another may even opt into a
01:22Valentine's Day party invite and then
01:24elect not to show up all very human
01:28behaviors now the architecture described
01:31in the paper is of course intentionally
01:33designed by Joon and team and it's a
01:35combination of a seed identity for every
01:37agent and then functions that cause each
01:39one to do three discrete things to
01:42observe to plan and to reflect and these
01:45architecture decisions ultimately
01:47generate unexpectedly spirited
01:49conversations just like this hey lucky
01:52it's so great to see you how have you
01:53been I've been dying to hear about your
01:56adventure hey Kira I've been fantastic
02:00my space adventure was out of this world
02:02I can't wait to share all the details
02:04with you or even this I've been trying
02:08to find my way it's been a chaotic
02:11journey to say the least embrace the
02:13chaos dear Kurt for within its
02:16turbulence lies hidden truth seek the
02:20depths of the unknown and unravel the
02:23Mysteries that burden your
02:25soul and here's the thing they don't
02:27just interact with each other again they
02:30wake up they cook some paint While
02:32others write they hold opinions of one
02:34another and most importantly they
02:36remember and they have higher level
02:37Reflections based on the past it's
02:39pretty amazing don't you think so as
02:41these generative agents become a lot
02:43closer to nuanced human behavior what
02:46can we learn about being human from
02:48these surprisingly realistic simulations
02:50and what is the calculus of that
02:52believability are there real world
02:55applications on the horizon and what is
02:57truly net new here listen in as we
03:00discuss all that and more including the
03:02origin of the very paper that Joon wrote
03:05I hope you enjoy as a reminder the
03:07content here is for informational
03:08purposes only should not be taken as
03:10legal business tax or investment advice
03:13or be used to evaluate any investment or
03:15security and is not directed at any
03:17investors or potential investors in any
03:19a16z fund for more details please see
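Before the panel begins, the agent loop summarized in the intro — a seed identity plus functions to observe, plan, and reflect over a memory stream — can be sketched in miniature. This is a loose illustration of the idea, not the paper's actual code: the class and method names, the equal score weights, the hour-scale recency decay, and the stubbed plan/reflect bodies are all assumptions; the 1-to-10 importance scale and the reflection threshold of roughly 150 follow figures mentioned later in the panel.

```python
# A loose sketch of a generative agent's loop: a seed identity, a memory
# stream, score-based retrieval, and a reflection trigger. All names,
# weights, and stubs here are illustrative, not the paper's code.
import math
import time

class Agent:
    # Reflection fires once summed importance passes a threshold
    # (the panel recalls roughly 150 in the paper).
    REFLECTION_THRESHOLD = 150

    def __init__(self, seed_identity):
        self.seed_identity = seed_identity  # natural-language description
        self.memories = []                  # stream of (timestamp, text, importance)
        self.importance_since_reflection = 0

    def observe(self, text, importance):
        """Record an observation; importance is on a 1-10 scale."""
        self.memories.append((time.time(), text, importance))
        self.importance_since_reflection += importance
        if self.importance_since_reflection >= self.REFLECTION_THRESHOLD:
            self.reflect()

    def retrieve(self, relevance_fn, top_k=3):
        """Rank memories by recency + importance + relevance; return the top few.

        relevance_fn maps memory text to [0, 1]; a real system would use
        embedding similarity to the current query instead."""
        now = time.time()
        def score(memory):
            timestamp, text, importance = memory
            recency = math.exp(-(now - timestamp) / 3600)  # decays over ~an hour
            return recency + importance / 10 + relevance_fn(text)
        return sorted(self.memories, key=score, reverse=True)[:top_k]

    def plan(self, situation):
        """Stub: a real agent would prompt an LLM with identity + retrieved memories."""
        relevant = self.retrieve(lambda text: 1.0 if situation in text else 0.0)
        return f"{self.seed_identity} weighs {situation!r} against {len(relevant)} memories"

    def reflect(self):
        """Stub: a real agent would ask an LLM to synthesize higher-level insights."""
        self.memories.append((time.time(), "reflection: synthesized recent events", 1))
        self.importance_since_reflection = 0
```

Under this sketch, an agent who notices breakfast burning (a moderately important observation) would rank that memory above routine ones when planning, and a long run of important events would eventually trip a reflection.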
03:34welcome everyone we actually simulated
03:36this before you joined and everyone's
03:39exactly where we thought how many people
03:42in this room have actually read the
03:44generative agents paper that Joon wrote
03:46it's a lot of people pretty much
03:48everyone um so Joon even though so many
03:51people have read it why don't you just
03:53give a quick overview of what it is but
03:55also maybe the backstory that people
03:57haven't maybe heard of so generative
03:59agents are these uh general computational
04:02agents that can simulate believable
04:04human behavior uh fundamentally it
04:06leverages something like a large language model under
04:08the assumption that a language model has
04:11encoded or has seen so much about human
04:14behavior from its training data from the
04:16Wikipedia Social Web and so forth so if
04:19you are able to poke it at the right angle
04:21you can actually extract a lot of those
04:23human behaviors in a very context
04:24specific manner the opportunity here is
04:28that in the past we had to manually
04:29author a lot of these behaviors but now
04:31we can simply generate them with a
04:33language model so generative agents
04:35leverages that to create these
04:37computational systems ultimately one
04:39sort of technical break uh sort of
04:42improvement that we're trying to make in
04:44addition to the large language model is basically
04:47giving it some form of memory and
04:49retrieval system so you may have all
04:52used obviously ChatGPT and so forth it
04:54is heavily context limited and even if
04:57that limitation were to go away in the
04:58future processing a lot of really
05:01long context windows is really
05:03inefficient and also ineffective when
05:05you're trying to prompt these models for a
05:06really narrowly defined behavioral aspect
05:09so the main philosophy here is we're going
05:12to give long-term memory for these
05:14agents that's external to the language
05:17model and then retrieve the contextually
05:19relevant information from the long-term
05:21memory whether it's planning action
05:24sequences or Reflections to create these
05:27computational agents philosophically to
05:29some extent I think this is akin to
05:31creating the operating system around a large
05:35language model in the way sort of we're
05:37prompting large language models to me feels a
05:39lot like how we used to use computers
05:41back in the day when we had to wire up
05:42the back end every time you run a new
05:46program um and what has really made
05:49complex behavior with these
05:51computational tools possible was the
05:54introduction of this larger architecture
05:56that surrounds the core fundamental
05:58techniques so that's what generative
06:00agents is about um and you mentioned sort
06:02of the background why we got into all
06:05this uh so I started my PhD at the start
06:08of or sort of midway through 2020
06:11that was just around when GPT-3 was about
06:13to come out and that year a bunch of us
06:17basically authors at Stanford were
06:20working on this paper called On the
06:21Opportunities and Risks of
06:23Foundation Models what we were seeing
06:26was this new form of machine learning
06:28models that seem fundamentally different
06:31than the things that we had experienced
06:32in the past uh in that we didn't have to
06:35fine-tune or specifically train models
06:37for very narrow purposes but we can
06:39train a general model almost like a stem
06:41cell in biology and leverage that to create
06:46behaviors um so after writing
06:48that paper sort of my team especially
06:51myself and my advisers what we really
06:53wanted to answer is there seems to be a
06:55new opportunity but exactly what is it I
06:59think in the early days of GPT-3 a lot of
07:01the tests that we were doing were things
07:03like classification and generation which
07:05was really cool to see that these models
07:08can conduct these uh tasks but also
07:11something that we already knew how to do
07:12for many decades and our general
07:14philosophy there was if these models are
07:17truly new and they give us fundamentally
07:19different opportunity than what we had
07:20in the past then they should be able to
07:22do something that's fundamentally
07:24different so that's how we got into this
07:28our answer to that basically was
07:29I think we might be able to create human
07:31like agents uh that can populate these
07:35worlds maybe you can just elaborate
07:37you said it's perhaps one of the most
07:39exciting times in in recent history and
07:42maybe you can just speak to exactly what
07:43you mean there and how it relates to
07:45simulation and some of this new
07:47technology that we're seeing with llms
07:49so first very quick credit where credit's
07:51due so um as far as AI Town clearly Joon
07:55is like the grandfather of AI Town
07:57and like we wouldn't be here without
07:58your work so really appreciate you
07:59coming here AI Town itself actually came
08:02from a personal project from Yoko do you
08:03want to so so that's Yoko um
08:10so um the true story is uh it was
08:15actually a personal project and I was
08:16like hey maybe more people would be
08:18interested in it and I I kind of coerced
08:20her into like you know bringing it um
08:22forward to everybody else and so now
08:24when it actually comes to the code the
08:27vast majority of the work on the code
08:29was actually done by Ian do you mind
like on the back end and so Yoko
08:33had done a prototype and
08:35then um you know it's it's kind of funny
08:39like you know you see this this funny
08:40little tile set up here and it kind of
08:43belies the fact that it's actually
08:45really hard to build a scalable shared
08:47State distributed system that you need
08:49in a multiplayer game it's just a hard
08:51technical problem right and anybody
08:52that's kind of built these systems knows
08:54that and so it's funny cuz people go and
08:55they think oh here's this cute little
08:57tile engine with like these characters
08:58running around but like actually the
08:59back end is built to be something that
09:01can scale um and that requires you know
09:03people that have focused on this and
09:04like Ian has done a tremendous job and
09:06the Convex team continues to work on that
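To make the "shared state is hard" point concrete, here is a tiny self-contained illustration — not AI Town's or Convex's actual code, and all names (including the agents Lucky and Kira from the demo dialogue) are used purely for flavor. Two clients that naively read-modify-write the same game state silently lose an update; versioned compare-and-swap writes catch the conflict instead.

```python
# Illustrative sketch (not AI Town's actual code): why naive shared
# game state breaks under concurrency, and a compare-and-swap fix.

class GameState:
    def __init__(self):
        self.version = 0
        self.positions = {}  # agent name -> (x, y)

    def read(self):
        # A client reads a snapshot plus the version it was read at.
        return self.version, dict(self.positions)

    def naive_write(self, positions):
        # Last writer wins: concurrent updates silently overwrite each other.
        self.positions = positions

    def cas_write(self, expected_version, positions):
        # Compare-and-swap: reject writes based on a stale snapshot.
        if expected_version != self.version:
            return False  # caller must re-read and retry
        self.positions = positions
        self.version += 1
        return True

# Two clients read the same snapshot, then each moves a different agent.
state = GameState()
state.positions = {"lucky": (0, 0), "kira": (5, 5)}

_, snap1 = state.read()
_, snap2 = state.read()
snap1["lucky"] = (1, 0)
state.naive_write(snap1)
snap2["kira"] = (5, 6)
state.naive_write(snap2)  # overwrites lucky's move: a lost update
assert state.positions["lucky"] == (0, 0)

# With versioned writes, the second stale write is rejected and retried.
state.positions = {"lucky": (0, 0), "kira": (5, 5)}
v1, snap1 = state.read()
v2, snap2 = state.read()
snap1["lucky"] = (1, 0)
assert state.cas_write(v1, snap1)
snap2["kira"] = (5, 6)
assert not state.cas_write(v2, snap2)  # stale: must re-read
v3, snap3 = state.read()
snap3["kira"] = (5, 6)
assert state.cas_write(v3, snap3)
assert state.positions == {"lucky": (1, 0), "kira": (5, 6)}
```

A real multiplayer backend layers durability, fan-out to connected clients, and retry logic on top of this basic conflict-detection idea, which is what makes the problem genuinely hard at scale.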
09:09okay so why is this so exciting so okay
09:10so because I'm old I actually saw like
09:13the Advent of like the web and this
09:15feels very similar to that in the
09:18following ways which is when you have a
09:20very disruptive technology like this
09:22like whatever touches it becomes magic
09:25like uh you know I was actually having a
09:27conversation just before this like does
09:28anyone here know what like the first video
09:35was yes it was a coffee pot I was like
09:38this dude I think it was in Cambridge
09:40was a grad student and he was like oh
09:42listen I want to know when my coffee is
09:43empty he put a camera and because it was
09:45very new everybody was like oh my God
09:46there's a coffee pot on the internet and
09:47so everybody wanted to look at the
09:49coffee pot right and do people remember
09:52button one of the first apps was this
09:55big web page which was a red button on
09:56it and you know what it did nothing
10:00like you press it and it did nothing but
10:01people thought it was amazing because it
10:02was on the internet and everybody would
10:04go press the button and they leave great
10:05comments about this button and there's
10:06many examples of like you know it was
10:09this crazy disruptive technology and the
10:11apps seemed really stupid and like
10:14there's a bunch of enthusiasts and you
10:16know what the Enterprise thought about
10:18this like the actual business folks like
10:21I remember when Eric Schmidt
10:23banned the browser like he was like you
09:25know this is Eric Schmidt the CTO of Sun
10:26is like you can't have a browser cuz
10:28people aren't going to work right so the
10:29same thing always happens is like the
10:31enthusiasts are like this is really cool
10:32and they use it for fringe stuff and
10:34then like the enterprise doesn't
10:36understand it and like literally like they
10:37ban it or they don't use it but the
10:39set of companies that come out of it
10:42like are always part of this Enthusiast
10:47era right like you couldn't have
10:49predicted Yahoo you couldn't have
10:51predicted Amazon like you knew something
10:52was going to happen and so what happens
10:55at this time is there's a bunch of stuff
10:57that like is silly like the coffee pot
10:59was silly the red button was silly but
11:01you never know like that spark of life
11:03where it's going to come from and it's
11:04always kind of like this nonobvious use
11:06case you know and it kind of seems like
11:09a toy and then it takes off right and so
11:10you're always looking for those
11:11non-obvious use cases and it almost
11:13never looks like the old one like those
11:15of us that are old enough do you
11:16remember like desktop as a service like
11:19I'm going to go to the cloud I'm gonna
11:20have my Windows desktop like who wants
11:22that nobody wants that right instead
11:24clearly we're going to rewrite the
11:25application as SaaS right so we're in this phase
11:30where everybody's experimenting and then
11:33I'm personally literally from just a
11:35personal interest standpoint but all of
11:36us are interested like what are the use
11:38cases that will take advantage of this
11:40new medium that are native and like the
11:43work that you've done is one of those
11:44100% right like there's like a spark of
11:46Genius which is like when you work with
11:48these things you know like this is a new
11:49way to think about it it's a new use
11:51case it's can create entirely new apps
11:53and that's what the future is built from
11:55and so that's why I think this is so interesting
11:57broadly cuz it's like the early internet
11:59era but very specifically in this use
12:00case because I think the work that
12:01you've done really is a great example of
12:04something totally new I couldn't agree
12:07more and I think one interesting aspect
12:09is that if you explore this project you
12:11just start to question what it means to
12:13be human like if we're trying to create
12:15these agents that are quote believable
12:18like what what is believable in terms of
12:20you know being a human and as part of
12:22the project you kind of you have this
12:25coded technically right you made
12:27architecture decisions you made
12:29decisions in terms of your retrieval
12:31function quick Interruption just to give
12:33you some color on what some of these
12:34decisions were the retrieval function
12:37for example is based on scores across
12:39recency importance and relevance so for
12:42example on a scale of 1 to 10 brushing
12:44your teeth might get an importance score
12:46of one versus a breakup might get a 10
12:49meanwhile reflection is only triggered
12:52after a certain number of important
12:53events quantified by summing the
12:55importance scores until a certain
12:57threshold is met in this case I believe
12:59it was 150 this clever architecture
13:01results in emergent Behavior like agents
13:03sharing invites with one another or even
13:06having that information Circle all the
13:08way back to the original planner and I'm
13:10sharing these details to Showcase how
13:11thoughtful you really need to be if
13:13you're designing architecture that
13:15reasonably approximates humans maybe you
13:18could just speak to what you've learned
13:20through those decisions technically
13:22about what it means to be yeah like a
13:25believable human right so this is an
13:29interesting one so we actually had made
13:31the generative agents and there was
13:32about a month period when we knew we had
13:34to evaluate these agents somehow and we
13:37didn't know how and basically the
13:41concept we stumbled upon is this idea of
13:41believability it basically is sort of
13:43like a Turing test right that when you
13:45look at them do they look believable do
13:50they behave in ways that we can sort of
13:54imagine ourselves behaving and that ended up becoming our
13:58method it is an interesting question though
13:58in terms of like what does it mean to be
14:00believably human and we often look to
14:04Prior literature in research to get
14:06inspiration for how to define this and
14:08what we found was there's no prior
14:11literature on this we used the concept of
14:13believability to talk about this concept
14:15but we were never in a position where we
14:17can meaningfully evaluate something like
14:19believability because we didn't have
14:20agents like this so to some extent we
14:22were building up the definition from the
14:25ground up um and I think what came out to
14:28be the case is for us can these agents
14:31plan react act in a believable manner do
14:33they create believable reflections the
14:36way we would evaluate a Turing test and I
14:39think what we've learned over the past
14:40few months one of sort of the more fun
14:41and interesting findings is that even that I
14:44don't think is quite a perfect
14:46definition in that a lot of sort of
14:48audience came back to us to basically
14:50say well one of the error cases that we
14:53noted was some of these agents would go
14:55to a bar at noon or something like that
14:58uh and many of our audience came back to
15:00us and said and we said that was not
15:01believable like who would do that and
15:03people would come back to us and say I
15:08do that and if you can sort of expand from
15:11that story you know I think there's a
15:13lot of cases where even my parents look
15:15at me and go like I cannot believe what
15:17you've done like why would you do that
15:19and vice versa so I think there's a lot
15:21of even amongst the people who know each
15:23other well having this sense of
15:26believability is really difficult and I
15:28think that's sort of fundamentally
15:30underlines what it means to be human
15:32like it's not exactly predictable and in
15:35social science we call that complexity
15:37that human behaviors are complex so to
15:40some extent we can build intuition for
15:42how people might behave but to really
15:44predict it is a very difficult
15:47task now I do think this actually does
15:50lead to sort of future work in this
15:51space though this idea of believability
15:54so in this paper we use this incomplete
15:57definition of what it means to be
15:59believable not perfect but at least on
16:01that evaluation we've done well I think
16:04if you were to build on that idea a
16:06little bit further then you could
16:07actually start to ask Beyond
16:09believability can you create agents that
16:13are accurate to humans and I think given how difficult it
16:16was to actually evaluate what it means
16:17to be believable I think this accuracy
16:20actually has a lot of interesting
16:21questions around it what does it mean to
16:23accurately reflect human behavior it
16:25could be that if we can match
16:27the distribution of human behavior let's say
16:30in this context they have this kind of
16:31probability of Behaving this way right
16:35let's say it's 10 p.m what are
16:37the chances that I'd be asleep or
16:40awake uh what are the chances that I'd be
16:42working or that I might not be working I
16:45think ultimately getting to that degree
16:46of accuracy in the simulation might be
16:48sort of the next step to these kind of
16:50simulation based work if we can do that
16:52I think the application spaces that it
16:55will unlock will be interesting and I
16:59think it would also be different and we
17:00can likely go beyond uh even I think
17:03there's a lot of application that we can
17:05build right now but I think the future
17:07work that's I think where we're headed
17:09in this direction so I want to talk
17:11about those future applications but
17:13maybe you could just speak super quickly
17:15to in the paper you have observation
17:18planning and reflection and that that
17:21mostly encapsulates the way that these
17:23llms or that the agents rather are
17:26engaging with each other when they take
17:27an action and they go through those
17:29three steps I assume that wasn't your
17:31first crack at the solution at coming up
17:34with this human believable agent and so
17:37how did you get there and did you learn
17:39anything about the importance of any
17:40of those three steps or all three of
17:43them entirely right so that's a fantastic
17:46question uh really the first way we
17:47actually went about doing this was
17:50simply by prompting a language model uh
17:52so this line of work generative agents
17:55is actually the second in this line of
17:56work that we published uh the first work
17:58in this line was called social simulacra
18:01and the idea there was to populate a
18:03social Computing system imagine you're a
18:05social designer you need to know what
18:07might happen when there's tens of
18:08thousands of people in your system can
18:10we assimilate those people in their
18:12behavior so for that project too we
18:15did it simply by prompting a language
18:18model that worked but what we found was
18:21if we want to populate the spaces over a
18:24longer period of time so we can do for
18:26instance longitudinal study or game play
18:29that's going to last forever then for
18:32those kind of instances simply prompting
18:34these models wouldn't work right and
18:36that's when we realized we likely need
18:38and this actually this Insight actually
18:40first came when we realized that we
18:42needed to have multi- agent interaction
18:44because agents actually would need to
18:45remember that I saw let's say I saw some
18:48audience here before I should remember
18:50them I met Martin Steph Yoko and so forth
18:54in the past few weeks or few months I
18:56when I talk to them I need to remember
18:58those interactions so that's when we
19:00realized that we actually cannot simply
19:02prompt these models but we actually need
19:05an architecture so when we went about doing
19:09that I think really the main inspiration
19:10that we got actually was from prior work
19:13so people like Allen Newell and Herbert
19:15Simon you might recognize some of these
19:17names those are sort of quote unquote
19:18the founders of AI uh in the 60s and 70s
19:22and they are the people who used
19:25to build what we call cognitive
19:27architectures and those architectures were
19:29very reminiscent of sort of the generative
19:31agents architecture in that it has
19:33some perception module some action
19:35module and there's some long-term and
19:37short-term memory and really the goal
19:41back then was ambitious right they
19:43actually wanted to build general
19:44computational agents sort of the way
19:46generative agents are supposed to be but
19:48they didn't have the techniques to do it
19:50they basically didn't have large language
19:51models and the way we saw it was now is
19:54the time to sort of merge those two
19:56worlds where we now have large language models
19:59they can do a lot of sort of the micro
20:01processing of these cognitive modules and we
20:04can actually now bring back these micro
20:06modules or architectures like cognitive
20:09architectures so we took inspiration from
20:11that that particular architecture had
20:13planning uh in place and it had
20:15long-term memory in place so we were
20:18inspired by that one thing that I think
20:20was a little bit new though I think is
20:22this idea of reflection that we humans
20:26for instance if you eat an omelet three
20:27times in a row uh or if you see somebody
20:29else eat an omelet three times in a row
20:31you likely create an opinion about the
20:34person maybe that person likes to eat
20:35omelet in the morning and that's very
20:37human thing to do and there's a good
20:39reason why we do that we do that because
20:41it's efficient it allows us to have
20:44higher-level inferences about the world and
20:46form opinions about those around us and
20:48about ourselves and that's something
20:51that in the past we couldn't really
20:53imagine formulating with a computational
20:55system but with large language models because
20:57everything is is in natural language we
20:59had that opportunity so we added that
21:01one last component called reflection and
21:04that's sort of how we landed on the
21:06architecture that you see in the paper
21:08right now let's move on to how this can
21:11all be used and we'll get to the
21:12specific applications but Martine I feel
21:15like you'll have a great answer to this
21:17why even do this like I feel like it's
21:19very obvious for a lot of people to
21:22understand why we would have human-to-
21:24human interaction we're doing that right
21:26now um there's increasing capacity to
21:29understand human to AI or human to
21:31computer interaction um character AI is
21:34a company where people you know there's
21:35still a lot of judgment um there and I
21:37think there's even more judgment when it
21:39comes to AI to AI like why should we use
21:42our resources to have these computers
21:44hang out and talk and burn toast and you
21:46know go to the bar at 2 p.m. so yeah
21:49Martin what do you think what what's the
21:51case for us advancing in this field no
21:54judgment for me by the way you can use
21:55these for whatever you want um
22:00so I mean I want to go back to what I
22:01said before which is like anytime you
22:03have a new modality it's just not
22:05obvious what's the right way to think
22:06about it and for me the big aha in the
22:09last few months is just programming
22:11using models if you've spent a long time
22:14programming I mean I've been programming
22:15for 30 plus years right you know I've
22:17never been a good programmer but I've
22:18programmed and when you start
22:20programming with these models you're
22:21like oh I've got an API and I'm just
22:24going to use the API and then I'm going
22:26to treat it like it's like the end point
22:28and you say some stuff and then you know
22:30you get some response back and you kind
22:32of treat it like a you know kind of like
22:34this function that you call right it's
22:36just like any programmer would
22:38do but then when you're working with it
22:40more you're like oh these kind of are
22:42like these life forms and like my first
22:45aha was like because I'm bad at
22:47JavaScript I like missed some quote
22:49somewhere and rather than sending it the
22:51text string I wanted to send it I sent
22:53it some code and instead of like borking
22:56like you would normally have and
22:58breaking like you know C++ you core dump
23:00or whatever it commented on my code it
23:03was like oh my goodness right you know
23:05and so like all of a sudden like whoa
23:07this is totally different like I'm not
23:09dealing with like this finite State
23:12machine formal language thing at the
23:14other end of an API like there's this
23:15thing and like it'll comment and more
23:18that I program with these things the
23:21more I'm like you know it's kind of like
23:23wrapping an abacus around a
23:25supercomputer right it's like it's
23:27smarter than the code it could probably
23:28write the code better than I can write
23:30anyways like why am I doing this weird
23:33you know bloodletting ritual of writing
23:35JavaScript over this kind of
23:37superhuman thing right I mean this is
23:38kind of what you end up with and so it's
23:41very clear we're going to interact with
23:42these things in a different way and in
23:44fact I had this kind of I was talking
23:45with a professor in Michigan recently
23:48and we were talking about this subject
23:49like you know how I think
23:50about LLMs he's like I think about them
23:53like grad students like he's like you
23:55know they speak English they're pretty
23:57smart you know I don't use a formal
24:00language you know they solve like these
24:02really complex problems Etc and like
24:06having worked with a lot of grad
24:06students having been a grad student
24:08myself like you don't
24:09treat these things with code
24:11right and so the reason to do this is I
24:14actually think AI town is kind of what
24:17this is going to end up being it's like
24:19you need to give them the the resources
24:23that they need to be pretty autonomous
24:24and to grow and we're going to treat
24:26them more like peers and they're going
24:28to talk to each other too and it's more
24:30like grad students and so for me this is
24:32just an example of like we got to change
24:34the way we think and listen clearly like
24:36I'm up here and I'm telling these great
24:37stories because they're kind of funny
24:38like I don't I don't believe this stuff
24:40in the limit but I think they're really
24:41interesting like ways to change how you
24:43think about it in all of this stuff
24:44right like I'm like I'm not trying to be
24:46categorical here so like there is a new
24:48way that we're going to interact with
24:49these models it is much more natural
24:51language they are much more powerful um
24:54and so I I do think this is why we
24:56should all be doing this type of stuff
24:57because if you don't engage in these
24:59kind of things that look like toys like
25:01this wave will pass you by that I'm 100%
25:03convinced totally and as both of you
25:06have spoken to this is fundamentally new
25:08technology and so Joon something you
25:10said to me when we first spoke is just
25:13when you have fundamentally new
25:14technology you must do something
25:16fundamentally new with it and so maybe
25:18you can speak to that in terms of what
25:19you're seeing that can be done today but
25:22also where you look ahead and you think
25:24oh wow like that that's a really
25:27excellent use case that we couldn't do
25:29without this new technology I think
25:31there are certainly things that we can
25:32do because there are large language models and
25:35that fundamentally different thing for
25:36me was this idea of simulating human
25:39behavior and I think there's a lot that
25:41we can sort of gain from it in terms of
25:44spaces um I think I mentioned briefly
25:47about this idea of well what if we can
25:49go beyond believability to create agents
25:51that are even accurate and I think this
25:54is sort of application space in general
25:56is something that I'm also learning
25:57uh from actually in fact this
25:59audience my advisor and my team are big
26:02fans of games but we are not from
26:05the community and one thing that we are
26:07seeing is that there's a lot of really
26:09interesting potential even if they look
26:11like toys sort of a lot of really
26:12interesting technical advances that look
26:15like toys at the beginning right so I
26:17think there's a lot that we can gain
26:19from there I think going forward or the
26:21application spaces that I'm sort of
26:23interested in is also in things like can
26:26we run simulations so we can learn
26:29more about ourselves for instance if
26:32you're in fact some of the places that
26:34I'm visiting now are more places like uh
26:38like banks like the Bank of England and
26:40so forth where these places they need to
26:43test their policies before they roll
26:46out new economic policies or many of my
26:49colleagues in the department to focus
26:51more on social science they need to test
26:54out their theories now if you can run
26:57simulations with realistic human
27:00behavior and find out at least to some
27:03extent the answers to these really
27:05complex social phenomena and
27:08challenges then I think that actually
27:10would be a new tool that the community
27:13in the past especially those communities
27:15in economics and social science they
27:17didn't have that will allow us to do
27:20interesting stuff and I'm genuinely
27:22intrigued by that possibility to some
27:24extent it does sound fairly academic but I
27:26do think it should be actually fairly
27:28broadly applicable and interesting to
27:31audiences Beyond Academia because
27:33ultimately to some extent what I'm
27:35saying is I think generative agents and
27:38tools like large language models could be used
27:40to advance social science and social
27:44science to a large extent has been the
27:47quest to understand who we are and
27:50there's a lot of really interesting
27:51applications that can come out of that
27:53that will Empower different communities
27:55and societies um um and that to me for
27:59new that something that we didn't have
28:00in the past yeah and so it sounds like
28:02today we're mostly in the creative realm,
28:05where we can watch these agents and we
28:07can have fun with them and it feels more
28:09like a game, but the delineation, it
28:11sounds like, is accuracy. What will it
28:14take to get that accuracy? What work
28:16still needs to be done in terms of
28:19getting there? So I think some of you may
28:21have actually noticed this already: there
28:22are studies that basically try to
28:25replicate existing social science studies,
28:28so basically using a large language
28:30model as a participant in
28:32social science studies, right, to
28:34replicate known results in the field, and
28:37what we're finding is that they sort of
28:39work, and that's nice,
28:42and that's one surprise that we did have.
28:44There's been a limitation to this approach,
28:46in the sense that: is the large language
28:48model replicating human
28:50participants because it's replicating
28:53human behavior, which is what we want, or
28:55is it doing that because it's seen the
28:56paper? For instance, there's a very famous
28:59social science theory called prospect
29:01theory. Is it replicating the findings
29:03from prospect theory by Kahneman because
29:06of its ability to replicate human
29:07behavior, or did it just read Kahneman's book
29:11Thinking, Fast and Slow? Right, and I think
29:15that's a fundamental issue that we have
29:17as a field, and I think that's one of
29:19the reasons why there's a lot of work
29:20that needs to be done to crack this.
29:23Some of the ways I think you could
29:25actually go about doing this is creating
29:27new contexts, or creating a new set of
29:30studies that haven't been shown in the
29:32past, and trying to replicate those
29:34results. So one of the things that
29:37we've done is called Social Simulacra, which
29:39is the first paper that I mentioned, the one that
29:41predates generative agents. The idea was
29:44to replicate existing human
29:46communities, and what we actually did
29:48was recreate subreddits that were
29:50created after the release of GPT-3, so GPT-3
29:53wouldn't know anything about these
29:56communities. One example: it
29:58was actually before the pandemic
30:00became the main topic of discussion,
30:02when GPT-3 basically didn't know about the
30:04pandemic, and we asked GPT-3 to
30:07create a community that has to talk
30:10about COVID and vaccination
30:13policy. And you would wonder, it shouldn't
30:15be able to do that in theory, because it
30:16doesn't know anything about COVID, it
30:18doesn't know anything about these
30:19policies, but it can simulate those
30:21because it can infer what COVID is, what
30:23vaccination is, from its prior knowledge.
30:26So to some extent these tools can be
30:29used as a predictive tool, looking into
30:32the future of what might happen
30:34in our own community, and I think those
30:37are the ways I think we see this
30:39field unfold maybe in the next few years.
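As an editorial aside, the subreddit-recreation idea described here can be sketched in a few lines. This is a hypothetical illustration, not the Social Simulacra code: `generate` is a placeholder stand-in for any language-model completion call, and the prompts and community description are invented for the example.

```python
# Hypothetical sketch of the community-simulation prompting pattern:
# seed the model with a community description (standing in for a subreddit
# it has never seen), then generate a post and replies conditioned on the
# thread so far. `generate` is a placeholder for a real LLM call.

def generate(prompt: str) -> str:
    # Stand-in for a language-model completion API.
    return f"[model output for: {prompt[:40]}...]"

def build_post_prompt(community: str, topic: str) -> str:
    # The seed description plays the role of the subreddit's sidebar/rules:
    # it grounds the model in the norms of the community being simulated.
    return (
        f"The following is an online community: {community}\n"
        f"Write a post a member might make about {topic}.\n"
        "Post:"
    )

def simulate_thread(community: str, topic: str, n_replies: int = 2) -> list[str]:
    """Generate a seed post, then replies conditioned on the thread so far."""
    thread = [generate(build_post_prompt(community, topic))]
    for _ in range(n_replies):
        context = "\n".join(thread)
        thread.append(generate(
            f"Community: {community}\nThread so far:\n{context}\n"
            "Write the next reply.\nReply:"
        ))
    return thread

thread = simulate_thread(
    "a forum where members debate local public-health policy",
    "a new vaccination policy",
)
print(len(thread))  # 1 seed post + 2 replies = 3
```

The key design point is that the model never needs to have seen the real community: the seed description plus its prior knowledge of the topic is enough to produce plausible behavior.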
30:43At the end of the paper there was,
30:45perhaps unsurprisingly, a question around
30:47ethics, and I'd love to hear both of
30:49your takes on where this goes and what
30:51ethical framework, if any, we should apply
30:54to something like this. So I think
30:57there are societal decisions that we'll
30:59have to make, and I think there are
31:01techniques that can be used to implement
31:04those decisions. Certainly, to
31:06some extent, I think it would be useful
31:08for the users to be aware that they are
31:10talking to agents, and that's
31:12one rule that we try to set for
31:14ourselves: when we release the code,
31:16when we release our paper, we make it
31:18very clear that these are computational
31:21agents. I think ultimately the
31:24framework that I like to use in human-
31:25computer interaction, certainly, is that these
31:28tools are ultimately there to augment
31:30what we can do and what we have,
31:34right? So to the extent that these agents
31:36can do that, and I think there are many
31:38interesting ways we can do that, I think
31:40that's where I see the opportunity, versus
31:42where it becomes more of a force for
31:45replacement. I think there are genuinely
31:47cases where this is a really interesting
31:49setup, where we can augment what
31:52humans can do by helping
31:55them do things that they couldn't do in
31:56the past, but when the replacement does
31:59come in, it's worth asking: is this worth
32:02the cost of doing the replacement, and if
32:05it is, what are the good ways of
32:07implementing that idea? Technique-
32:09wise, I think there are techniques that are
32:11going to be introduced more from the
32:13model's perspective, making sure the
32:16model doesn't behave in certain ways
32:18that go against our social alignment
32:20or societal agreements. So I think those
32:23are some of the things that we do have
32:27going. Without going too much in depth, I
32:30think we can get this right, and
32:33my personal take is it's worth getting
32:35it right, because ultimately an industry
32:38or academic field will grow, and I think
32:41we can progress a lot, we can go
32:44forward for 5, 10 years without actually
32:46getting this right, but in the end it's
32:48going to come back to us at some point.
32:50To some extent I do think I'm seeing
32:52this a little bit with the social media
32:54environment, where I think there were a
32:55lot of things that we could have gotten
32:57right on day one, and I think we would
32:59have had a much easier time navigating
33:01today had we gotten those right. I
33:03think that's the opportunity that we
33:05have, given that we are pretty early at
33:07this stage, so I think it's worth a
33:09discussion, but again, I'm fairly
33:11optimistic that we will get this right.
33:14Listen, so I actually think that
33:16there's a very important discussion to
33:18have around ethics and morality
33:19around this, and it's a very important
33:21time, I do, and here's that discussion,
33:24which is: over the last 20 years we've
33:27built this machinery of regulation
33:30that's afraid of
33:33everything, and it's so mature, and it got
33:37crafted during the time of social media,
33:41and it's looking for something to kill,
33:43and for whatever reason it thinks
33:45that AI is the next bad thing, which
33:47makes absolutely no sense to me. And so I
33:50think it's all of our moral and ethical
33:52obligation to protect and free the
33:54AIs to be what they want to be, and
33:57that really is it, so don't focus in,
34:00focus out. Because, listen, I've
34:03worked in tech for quite a while, I've
34:05actually worked for the DoD and weapons
34:08programs, and I've never seen so much
34:13sensitivity to a new technology that's
34:15potentially beneficial as I've seen
34:16now, which I think could end it before it
34:19even begins. And so I know the question,
34:23the heart of the question, is that we
34:24should regulate, you know, AI and this and
34:26that, and I think it's the actual
34:27opposite: I think we should regulate the
34:28regulators and let it be what it wants
34:30to be. And I actually have to leave.
34:34So, all right, here is where we switch to
34:37a short Q&A with the audience. Martin
34:39unfortunately had to leave, but here are
34:41a few highlights with Joon. How can
34:43participants in AI Town collaborate on
34:47tasks? There are two strands of work that
34:51I'm seeing in the agent space. I mean,
34:53you can crosscut it different
34:55ways, but one way I'm seeing this is: one
34:58set of agents are trying to tackle what
35:00I call hard-edge problem spaces. Those are
35:04the problem spaces where there's a
35:05concrete answer, there are yes-or-no right
35:08answers. One good example here is
35:10classification: if you're trying to do
35:12text classification, obviously there's a
35:14right or wrong answer, depending on who
35:15you ask. Another instance here literally
35:18is just asking your agent to buy pizza,
35:21right? There's like, did you buy pizza, did
35:24it come to you or not? There's a
35:26very clear way to answer this. The other is
35:29the problem space
35:31that has soft edges, where it's kind of like
35:34drawing a portrait. I mean, to some extent,
35:36what AI Town, Smallville, or these kinds
35:39of projects are trying to do is to
35:42create a simulation that feels human, but
35:45as I mentioned, this idea of
35:47believability is really hard to define,
35:49right? So to me it feels a lot more like
35:53we're trying to draw a portrait or
35:54character of ourselves, and the promise
35:57is not to be perfect, but the promise is
35:59to be useful enough, clean enough, that
36:03it's beneficial to the stakeholders.
36:06Right, my bet, and it's a bit of a hot take,
36:09my bet is that in the early days of agent
36:12development, I think a lot of the
36:14progress is going to be made first
36:17in the soft-edge problem spaces,
36:20because for hard-edge problem spaces I
36:22think the intuition is a little bit
36:23flipped: it actually feels easier to us,
36:25for humans, right? Creating
36:28the Matrix sounds hard, but ordering
36:31pizza sounds really easy. But for
36:34agents, and from the user's sort of cost-
36:37benefit analysis, I think that intuition
36:39is the other way, where users will accept an
36:41imperfect simulation if it's for fun or
36:44if it's to gain insight, in the case of
36:47soft-edge problems, but users would not accept,
36:50I would not accept, my agent ordering me
36:52pineapple pizza, no matter how much I like
36:54pizza. And similarly, in many of these
36:57contexts there's going to be genuine
36:59disagreement about what is the right
37:00option, too, and oftentimes agents making
37:03mistakes in these contexts are fairly high-
37:06stakes, and even if it doesn't seem like
37:07high stakes, it's going to be painful
37:09enough for the users to fix that it's
37:12going to fail the cost-benefit analysis.
37:14I think down the line we get this right,
37:17but on day one, like in the next few
37:19years, to me it feels more
37:21natural that we go into the soft spaces
37:24first. So, going back to, I guess that was
37:26a long way of saying, I think
37:29AutoGPT and the like, if you look at
37:32their architecture, they sort of all
37:33share a similar insight or philosophy,
37:36and I think those are really interesting
37:37projects that could pan out in
37:39the future. They might need a little
37:42bit more work, especially with the
37:45users, to see where the value might be.
37:49projects. How big of an impact do you
37:51feel that a much larger context size
37:53will have on the agent
37:55model? Actually, the largest context that
37:58I've seen in research is 1
38:00million tokens, so 1 million tokens, that's
38:03going to be about 4 million
38:05characters, that's well over a book,
38:07right? Here's my perspective on this: I
38:10think increasing the
38:12context limitation is
38:14interesting, and it's going to have its
38:16own set of really unique applications if
38:18we can basically make the context limitation
38:21disappear, right? So I think there's
38:22really a lot of interesting
38:24things that you can do with that. Now, for the
38:26agent space, I'm not entirely sure that the
38:31problem or the bottleneck that we have
38:32today is actually the context
38:34limitation, and I think we can sort of
38:37look back to how humans behave and what
38:40makes us effective general
38:42agents to answer this. For instance, for
38:45me to make decisions, even something like
38:48what I'm going to eat for breakfast, I
38:50don't need to bring up my entire 29
38:53years or so of life experience to make
38:55that one decision. I just need to
38:57selectively choose certain sets of
38:59information that seem the most relevant,
39:01like what did I eat the day before,
39:03what do I generally eat, and those kinds
39:05of things. And I think the reason
39:07why we do that, in part, is actually
39:09because it's much more
39:10efficient, computationally too, so
39:13that we don't have to. You can increase
39:15the context limitation, but it's
39:16expensive to run, and especially if
39:19you're familiar with
39:21prompt engineering and so forth, a larger
39:24context window does confuse models more,
39:27right? So some of my colleagues are
39:30actually doing more rigorous studies
39:33where you can have a really long
39:35prompt, but the model really focuses on the
39:37first few lines and the last few lines,
39:40and for whatever comes in between, its
39:42attention drops significantly, right? So
39:44we can increase the context limitation,
39:46but it's not going to fix that problem,
39:48the problem of the effectiveness of the
39:50prompt and the efficiency of it. And we
39:53humans have to make a lot of decisions
39:55at every single moment, so if you have to
39:57reason about your entire life every
39:58time you do that, it doesn't seem like the
40:00right way to go about it. So I think
40:03the better approach, my bet, therefore, is
40:07going to be based on
40:09retrieval: have some external memory,
40:12retrieve certain information that seems
40:14the most relevant, and just use that. And
40:16that retrieved memory should be
40:18explicitly very concise, something
40:21that you can easily fit into even the
40:22models that we have today. That's my bet.
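As an editorial aside, the retrieval idea described here can be sketched minimally. This is a hypothetical illustration, not the generative agents paper's retrieval function (which weighs recency, importance, and relevance): here relevance is crude word overlap, where a real system would use embedding similarity.

```python
# Minimal sketch of retrieval-based memory: instead of stuffing an agent's
# whole history into the prompt, score each stored memory for relevance to
# the current query and keep only the top few.

def relevance(memory: str, query: str) -> float:
    # Crude relevance score: fraction of query words found in the memory.
    mem_words = set(memory.lower().split())
    query_words = set(query.lower().split())
    return len(mem_words & query_words) / (len(query_words) or 1)

def retrieve(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k memories most relevant to the query."""
    return sorted(memories, key=lambda m: relevance(m, query), reverse=True)[:k]

memories = [
    "I ate oatmeal for breakfast yesterday",
    "I moved to a new apartment five years ago",
    "I usually eat eggs or oatmeal for breakfast",
    "I went hiking last summer",
]
context = retrieve(memories, "what should I eat for breakfast today")
# Only the breakfast-related memories make it into the prompt, keeping the
# context concise no matter how large the memory store grows.
print(context)
```

The point of the design is exactly what's described above: the retrieved context stays small and fixed in size, so it fits comfortably into today's context windows regardless of how long the agent's history gets.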
40:25Thank you so much for listening to
40:27the a16z podcast. What we're trying to do
40:30here is provide an informed, clear-eyed, but
40:33also optimistic take on technology and
40:36its future, and we're trying to do that
40:38by featuring some of the most inspiring
40:40people and the things that they're
40:42building. So if that is interesting to
40:44you and you'd like to join us on this
40:46journey, go ahead and click subscribe, and
40:48make sure to let us know in the comments
40:50below what you'd like to see us cover
40:52next. Thank you so much for listening, and
40:54we'll see you next time.