Imbue is a company developing AI agents that can reason and code. Today, Elad and I sit down with Kanjun Qiu and Josh Albrecht, co-founders of Imbue, to discuss training large foundation models for high-level reasoning, why agents require architectures different from large language models (or language token prediction models), and how current computers are getting in the way of their users. Kanjun, Josh, welcome to No Priors. Thank you. Thanks. So perhaps you can start by just telling us the story of how you two know each other and where the idea for Imbue came from.
Josh and I met at a conference and then started a big house together; it was a big house, a 20-person house, and we also started our first company around the same time. I've always been really interested in agency, in how we enable humans to have more agency, and Josh has always been really interested in AI, so it kind of made sense. At that time we talked about how someday we're going to be able to have AI systems that give humans a lot more agency. Fast forward to 2018 or so: we were running an AI recruiting company called Sorceress, and that was actually kind of the first AI agent that we built. It wasn't Transformer models, it was more old-school NLP, but it was a system that recruiters used that automatically got candidates into their inbox, and we learned a lot about what you actually need to make an autonomous system like that work. Around that time some of our housemates were building GPT-3, and we were seeing that scaling works: if we just keep scaling, you can actually get pretty far with a lot of these language models. So our question at that time was, how far can we get with language models? Does this kind of self-supervised learning, which is working so well on language, work in other modalities as well? In early 2020 we first started seeing self-supervised learning working across video and images and language, and we thought, huh, there's something really interesting here. Maybe machines are learning the same kinds of representations, or similar representations, to what humans are learning, and maybe they can get to a point where they can actually do the types of things that humans are able to do. That's when we first started Imbue, or started talking about Imbue.
You clearly know a bunch of people working at the large language model research labs. When you looked at what they were doing, how did the focus come to be on agents in particular, and how is that different from a general language model? Yeah, I think we've always been interested in agents: not just recommender systems or classifiers or things like that, but systems that are going to go do real work for us, that are going to actually be useful in the real world. Right now you can ask some kind of chatbot something and it'll give you back a response, but the burden is on you to go do something with that, to verify whether it's correct or not. I think the real promise of AI is systems that can actually act on our behalf, accomplish goals, do these larger things, and free us up to focus on the things we're interested in.
Yeah, one thing that I think we often forget, because we're in it every day, is that our computers actually need to be micromanaged. The reason we're in front of our computers every day is that nothing really happens otherwise; they can't make decisions on their own. Nothing really happens unless I'm in front of the machine doing all this really detailed stuff, kind of like operating a factory machine with all of these little knobs that are really specific. And there is a future where computers don't need to be micromanaged, where I can give a computer an instruction, whether in natural language or some other kind of instruction, and it can go off, understand what I'm trying to do, and help me do it. The diff between where we are today and that is kind of like the diff between the first calculator and where computers are today. The very first digital computer was a room-sized calculator; all it did was calculate Fourier transforms and things like that. I think that's the potential for where AI can be, given where the technology is going. It's very possible.
So when I look at technologies, there are almost three types, right? There are things that are just never going to work, and maybe some aspects of Theranos were that; there were questions whether the physics of Theranos would ever work as you miniaturized things sufficiently. There are things that can work immediately today, or where with a little bit of work or engineering you can get there. And then there are things that are clearly going to happen at some point in the future. For example, in the '90s people had figured out almost everything that cell phones would do, and then eventually we got there once you had better processors on phones and more bandwidth in terms of cellular networks; you needed to build a bunch of infrastructure, but it was clear what was going to happen. What do you think is missing, if anything, technologically, to start to build real-world performing agents?
The way we think about agent tasks is that it's a spectrum of difficulty. Some agents are very possible today; we see a lot of them. There are these conversational bots that take over some of the customer success workflows, and they'll fail over to real customer success people if the agent doesn't know how to deal with something. What we see inside companies is that these actually have pretty complex reasoning workflows. They're somewhat hardcoded, so they're not general; they don't generalize to other companies' workflows. But we are already seeing agents. And then maybe there are two spectra. There's specific to general: today we have very specific agents, and over time, if they get better at reasoning and better at certain other things, like interacting with your computer, they become more general; you'll be able to use the same agent and it'll learn something new. And then there's also a spectrum from copilot to more autonomous: today we see a lot of copilots with a human in the loop, and over time it becomes incrementally more autonomous. So I don't see it as so binary, like there is one technology missing for agents, but rather that as capabilities improve, we're going to see more and more of these use cases be eaten up by more general, more autonomous agents.
There are a few categories of things that are in the way today. I would say where we are today, we're kind of in the era of, say, lossy Ethernet with no error correction, or analog computing, something like that, where we have these models and they don't work reliably. When we talk to founders building agents, that's really the biggest thing: it's really hard to get these systems to work reliably, to output exactly what you wanted them to output, and to do the right thing at every step, all the time. So the question is, okay, how do you get it to work more reliably? That's a lot of why we work on reasoning, and when we say reasoning, it's kind of all of the things around getting tasks done in the world: when does a system come back to you, how does it know it's not certain about its output, can it think through different action plans and figure out, okay, this plan is the better plan and we should try going down this path first. Reasoning is one big piece of improving reliability. A second chunk of things is all of this error correction, and I think chain of thought, tree of thought, these are error correction techniques; we have a lot of other techniques internally, and that also helps improve reliability. So if we think about this problem as a reliability problem, then you can incrementally make a lot of progress on it.
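To make that reliability framing concrete, here is a minimal sketch of a generate-check-retry loop of the kind the error-correction discussion points at; the `llm` and `verify` functions are hypothetical stand-ins for a model call and an output checker, not Imbue's internal techniques.

```python
def reliable_call(prompt: str, llm, verify, max_retries: int = 3) -> str:
    """Generate, check the output against an explicit verifier, and
    retry with the failure fed back in, instead of trusting one shot."""
    feedback = ""
    for _ in range(max_retries):
        output = llm(prompt + feedback)
        ok, reason = verify(output)  # e.g. schema check or unit test
        if ok:
            return output
        # Error correction: tell the model what went wrong and try again.
        feedback = f"\nYour last answer failed because: {reason}. Fix it."
    raise RuntimeError("Could not produce a verified output; escalate.")
```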
I loved your framework of generalizability, and that two-by-two you had. If I look at a lot of the language models today, what I'm observing a lot of people doing is that they basically start off prototyping something, say on GPT-4, because it's the most advanced model. They see if it works or not, and if it works and they have any sort of scale, in some cases they move to GPT-3.5, and sometimes, whether or not they've thought about fine-tuning, they'll move to an open source model, which in some cases works dramatically less well, but then they'll fine-tune it for a high-volume use case. It's all because of cost optimization: as you know, if you have a really big model, it costs a lot more for inference in terms of compute than a smaller model. How do you think about that relative to generalizability? Because I guess if you make something really generalizable, my assumption, which may be incorrect, is that it's more expensive. You'll need some forms of memory for it, you need some broader logical capabilities, versus just saying, I'm going to do the thing that orders flights really well, or whatever it may be in terms of agents. So I'm a little bit curious about that framework.
I think that's the right way of thinking about it. When Kanjun was describing the spectrum from more specialized to more generalizable, I think we're talking about the ability to solve more general problems, the ability to do these problems you've only seen once or twice. Even as that ability goes up, we're still going to see a force coming behind it that takes each of those things and specializes it. Maybe you start out by doing your plane booking with GPT-4, but eventually you realize, oh, this is so expensive and slow; I just want the thing to be really good at this one task. What you can do, and this is part of the reason we're interested in agents that code, is apply those agents to the original general system to have it go make a more specialized version of itself. So it's kind of specializing the things that you're doing a lot. You can look at each of those things and ask, okay, I'm making 10,000 calls to this, this is super expensive, can I just write a piece of Python code that does this? As you have more general capabilities, you can actually use those more general capabilities to do that specialization.
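A sketch of that specialization move: route a high-volume request to plain Python once a specialized handler exists, and fall back to the expensive general model otherwise. The function names and the handler registry are illustrative assumptions.

```python
def handle_request(request: dict, general_agent, specialized_handlers: dict):
    """If an agent has already written a specialized handler for this
    task type, use it (cheap, fast, deterministic); otherwise pay for
    the general model, which can handle anything."""
    handler = specialized_handlers.get(request["task_type"])
    if handler is not None:
        return handler(request)        # the 10,000-calls-a-day fast path
    return general_agent.run(request)  # rare or novel requests
```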
What we see with a lot of the agent builders today is that an agent workflow is complicated, it has lots of different pieces, and so they may use a specialized model for parts of it and a general model for other parts. The way we think about it is quite pragmatic: as capabilities increase, what we want is minimal viable models for each capability, and so a lot of the models are much smaller, very specific, and pretty specifically trained. In the personal computer revolution there was a kind of branching: some people built supercomputers, saying, we're going to make the most powerful computer, and other people built personal computers. It turns out personal computers were a much bigger market; with supercomputers, not that many people needed that much computing power. I suspect we're going to see something similar, where a lot of use cases can be addressed by something pretty pragmatic and relatively small. We're definitely not pushing the limits of what we can do with data today on small models, and so, yeah, smaller things can work well.
I want to go back to what I think is a really deep topic of discussion at Imbue, in terms of how you define reasoning and this being an area of differentiation in your research efforts. We have a bunch of friends at OpenAI and other labs working now, publicly, on multi-step reasoning and more process supervision. As you were describing it, what makes you excited and confident that there needs to be a different approach, versus just general language models, to make the reasoning you need for agents work?
I think there's a different process. Language models are great: they're really good at predicting the next word, they're good at making a very easy classifier, they're good at all sorts of things. But there are obvious limits. We know, even in a theoretical sense, that they cannot learn to do multiplication in the general case, because it literally doesn't fit in the context window. They can learn to do addition in a modular sense, and they can learn to do it almost perfectly if you train them in the proper way, but they're not learning the general algorithm for addition. If you want something to actually execute the general algorithm for addition, you need a thing that works in a different way, that has some sort of outer loop about what step to take next. That's just a definitional thing: there has to be some other sort of wrapper, a different sort of outside process. Everyone at OpenAI and at Imbue and at Anthropic knows how this works; I don't think anyone is proposing that you just shove it all in the language model. You can get really far that way, but we're interested in what that other, higher-level system is: how do we decide what the right next step is, when should I go collect more information, am I certain about this, all these kinds of things. Those are the questions that are much more interesting, and there's actually a lot of work to be done there; we're still very early in the days of creating these systems. Natural language is not a bad medium for it; code is another example of a medium for it. Language is pretty compressed, and that's helpful for dealing with these situations.
12:31with these situations is that one of the
12:33reasons you all um decided to focus on
12:35code is one of the first types of agents
12:37that you have started with or could you
12:39explain more about the logic behind that
12:41yeah I think for us code is useful when
12:46we're thinking about reasoning one way
12:48that we are one way that we're sort of
12:49making you know collectively reasoning
12:51agents today is Founders are just
12:53hardcoding the reasoning process of like
12:55okay if there's a customer support
12:57complaint about this thing then I do
12:58this if it's like this then I do that
13:00and so you have this like very special
13:01case version of the thing right and
13:02there's a spectrum between code and
13:05language or more kind of General
13:06reasoning abilities but it's a spectrum
13:07it's not a binary thing I think and so
13:09you can have code now that we have these
13:11language models that kind of mixes the
13:13language models and the code layer right
13:15where it's like sometimes you're using
13:16the language model to decide what to do
13:17sometimes you're using an if statement
13:19and so it's more about like a fusing or
13:21like melding of these two different
13:22things and being able to like be in the
13:24right place on that spectrum and so code
13:26is actually like a really important part
13:28of this and as you do things that you
13:30want to do more robustly and you want to
13:31do in a more repeatable way then you
13:33want to move it more towards code right
13:35and so to the extent that you've never
13:37seen this task before maybe you should
13:38be doing it in this more kind of
13:39nebulous intuitive sense and then over
13:41time get better at it critique it and
13:42turn it more into code
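A minimal sketch of that fusing, using the customer-support example from above; the `llm` function and the routing categories are hypothetical stand-ins, not Imbue's actual stack.

```python
def handle_complaint(ticket: str, llm) -> str:
    """Mix hardcoded branches (cheap, reliable, auditable) with model
    calls (flexible) at the points where each is strongest."""
    # Deterministic code path: rules we trust and want repeatable.
    if "refund" in ticket.lower():
        return "route:billing_team"
    if "password" in ticket.lower():
        return "send:password_reset_link"
    # The model decides only the genuinely ambiguous cases.
    category = llm(
        "Classify this support ticket as one of "
        "[bug_report, feature_request, other]: " + ticket
    )
    if category == "bug_report":
        return "route:engineering_triage"
    # Unknown or low-confidence cases fail over to a human.
    return "route:human_agent"
```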
Yeah, and when we see founders, and ourselves, building these agents, and people shipping them into production, and us shipping them internally for ourselves: the agent loop can be very complex, and it breaks down into different chunks, and we can turn certain chunks into code. It really feels like programming in a lot of ways, so there's something kind of interesting there. Can you talk a little bit about where you begin in terms of how to structure the research effort? Are there certain tasks you work on? Do you start by working on policy or reinforcement learning on certain tasks, or is there data you want to collect? How do you start? Yeah, so we have this idea we call serious use, where basically we should be building agents that we want to use every day. This is actually one of the biggest blockers: it's really hard to build agents we want to use every day because of the reliability issues. A lot of what we work on is coding agents, but we also work on agents for other operational business processes, and that helps drive, oh, okay, these parts of the agent loop are really complicated; can we simplify them, can we make them more reliable? In a lot of ways it is an incremental set of work that helps us get from, say, 60% reliable to 70% to 80%, and that's what forces the development of new techniques. It's not that we train a giant magical model, stick everything into it, and then magically it works. That would get better at random parts of the agent loop, but that's not what we want.
And is the premise here that you start with a serious-use smaller task in code, or with something like a recruiting communication automation task? How do you choose? Yeah, we pick tasks depending on a bunch of different factors: how useful it is, how frequent, how possible it is going to be to do right, how generally applicable it is, how much it's going to help push the techniques that we want to push forward, and whether it scales to more complex versions of the task. We're purposely trying to pick tasks with some diversity. We have one agent that will just go do a random TODO in your code base; this can be super general, and it can take a really long time to get right. And at the opposite end of that spectrum, we have an agent that will look at every single pull request, run a linter against it, and ask: okay, are there any type errors, how do I fix them, all right, great, here's a PR with me fixing the type errors for you. That's very, very specific. But you can imagine how you can invoke the TODO agent to fix a specific type error, and how you can expand the type error fixer to do unit tests and security flaws and renaming variables, and they sort of meet in the middle as you make both of these things more capable. So they're just different ways of looking at the problem of how we make a useful coding agent.
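A sketch of that kind of lint-and-fix loop over a checked-out PR branch; the tool choice here (mypy as the type checker) and the hypothetical `llm_rewrite_file` helper are illustrative assumptions, not a description of Imbue's agent.

```python
import subprocess

def fix_type_errors(repo_dir: str, llm_rewrite_file, max_rounds: int = 3) -> bool:
    """Run a type checker, ask the model to rewrite each failing file,
    and re-check until clean or out of retries."""
    for _ in range(max_rounds):
        result = subprocess.run(
            ["mypy", "."], cwd=repo_dir, capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # clean: ready to open a PR with the fixes
        failing = {
            line.split(":", 1)[0]  # "app/util.py:42: error: ..." -> path
            for line in result.stdout.splitlines()
            if ": error:" in line
        }
        for path in failing:
            # Assumed helper: the model patches the file in place,
            # given the full checker output as context.
            llm_rewrite_file(repo_dir, path, result.stdout)
    return False  # still failing: hand off to a human reviewer
```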
And Elad, to your point about the specialized-versus-general dichotomy, one thing that's kind of interesting that we're seeing in agents is that agents can call sub-agents. Our TODO agent can figure out, oh, there's already a sub-agent for this thing, for this function you're trying to write; let me call that sub-agent, because it seems likely to succeed, and if I try it and it doesn't succeed, I'll do something else. So you can have this more general reasoning layer plus a bunch of sub-agents, where that general reasoning layer is actually very specific: it's a specific planner, and it's not that good at, say, browsing the web. But the system altogether is more general as a result.
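A minimal sketch of a planner delegating to registered sub-agents and falling back when one fails; the registry and the agent interfaces (`can_handle`, `run`) are hypothetical.

```python
class SubAgentFailure(Exception):
    pass

def plan_and_delegate(task: str, registry: dict, general_agent) -> str:
    """A narrow planner: look for a specialized sub-agent that claims
    this task; if it fails (or none exists), fall back to the general,
    slower agent rather than giving up."""
    for name, agent in registry.items():
        if agent.can_handle(task):     # e.g. "write this function"
            try:
                return agent.run(task)  # specialized: cheap and fast
            except SubAgentFailure:
                break                   # try something else below
    return general_agent.run(task)      # general: expensive fallback
```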
How do you do evaluation for both of these categories of agents that you're working with today, from the, I assume, closer-to-production-grade TODO agent to the broader coding agents? Yeah, the evaluations are actually one of the most important parts, and one of the places where we spend the most time and think the hardest. There's a lot of work in specifying exactly what you want from the TODO agent, for example. How do you know? It gives you back some code; okay, is that good? There's a spectrum: if it's faster, that's better; if it gives you less code, that's better; but if there are bugs, that's not good. So you really need to break down what you actually wanted to happen. When you start to break this down, you find there are some things that are kind of qualitative: do I trust it, did it come back with tests, can I run this code immediately, the feel of it. There are other things that are about the code itself, different attributes: is it in the same style, does it have good variable names, is it a minimal change, or did it change all sorts of stuff that it didn't really need to change? Each of those things is something you can measure a little more easily than the overall task. So you can make another kind of metric: okay, how good are the variable names? Well, how similar are they to the existing ones? You can keep breaking it down until you get to a point where a regular language model, or even just a person looking at it, can give an objective answer. One of the reasons we work on code is that there are objective answers to a lot of these questions: either the tests pass or they don't, either the function is correct or it isn't. Those kinds of things are much easier to evaluate, so a lot more of our tests start in that zone, as we build up eventually to the ones that are more qualitative, because the evaluation is so much harder there. The whole strategy is breaking these things down: basically, we take the output, or the answer, and we ask a bunch of questions about the output, and then we evaluate those questions, and we also evaluate the output. The interesting thing is that this scales pretty well to non-code tasks; for our recruiting tasks we can do a very similar process. I think part of why a lot of teams try to work on just math or code reasoning is that those are the easiest to evaluate, with the clearest answers. But just relying on whether the output is correct or not loses a lot of information in the evaluation.
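A sketch of that break-it-down strategy: score a code output on several narrow questions rather than one pass/fail check. The specific questions and the hypothetical `llm_judge` grader (assumed here to return 1.0 for yes and 0.0 for no) are illustrative, not Imbue's actual metrics.

```python
def evaluate_output(code: str, tests_pass: bool, diff_size: int, llm_judge) -> dict:
    """Combine objective checks with narrow, judgeable questions so a
    failure says *what* went wrong, not just that something did."""
    scores = {
        # Objective signals first: cheap and unambiguous.
        "tests_pass": 1.0 if tests_pass else 0.0,
        "minimal_change": 1.0 if diff_size < 50 else 0.0,
        # Narrow qualitative questions, each easy to judge on its own.
        "style": llm_judge("Does this match the surrounding style?\n" + code),
        "naming": llm_judge("Are the variable names descriptive?\n" + code),
    }
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores
```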
Yeah, I think it's likely to be a pretty rich space. I'm curious for your point of view: we've looked at a lot of startups building, let's say, interesting AI development tools, and one of the things we've spent a bunch of time thinking about is what makes for a good, scalable eval loop. That could range from objective and easy to test, like, it doesn't compile, to things that, as you said, might be richer in data: how easy is it to check the functionality of something, do you have to do static analysis, is the performance better? Are there examples if you want to focus on a particular problem, say Python 2 to 3 upgrades, or something like that? I think one of the things that's most attractive about this domain is that there are lots of ways to evaluate, even beyond the contributions to reasoning that you describe, and it's just going to be productive. Maybe on that topic: do you think of yourselves as a product company? Is it important to get this functionality in front of users, or do you just focus on research? How do you think about that sequencing?
Yeah, of course we're a product company. But I think, looking at the history of computing, there is a right time for a technology, and today, I think what you both see is that it's pretty hard to make agents that work and to productionize them. Productionizing is kind of a bucket term: we've described a little bit of all the nuances of what we're actually trying to do to get agents to work, and we lump that all under the term reasoning, because it's easier for people to conceptualize. But the reality is that what we're trying to do is make a system, a set of tools, and maybe frameworks, that actually make it possible to build reliable agents really fast and really easily. Today, writing agents feels like writing code in assembly, and that really limits the types of agents we can build and the number of people who can build them. Where we're going is toward programming languages that are a little more ergonomic, where we can build agents much more easily, where they can work much better, and where a lot more people can build them. Whatever it is that we release, that's what we hope it will enable. That's why we work on different parts of the stack: we work on the underlying models, because there need to be more specific underlying models that work for specific things, and that's what allows a lot of these capabilities and agents to be more reliable. We also work on other pieces of it as well.
Maybe if we just project forward a little bit: what are you most excited about? You want to be a tools-at-different-levels-of-the-stack company. What are you imagining people build, or already seeing people build, that you think is going to be useful a year from now, and useful five years from now? Yeah, I think a year from now we're going to start to see some of these use cases actually work. Today we have the capabilities: you can make some kind of agent to triage your email, or do scheduling, or many of these workflows. Why don't we have that today? It definitely can be done; there's nothing stopping us. And I think five years from now we're going to have something where it's not just, okay, we have a scheduling bot and we have this other thing, but where we really have these more general, more robust systems, where each of us can individually say: I want a thing that does this, I want to do this particular weird research workflow, and I want it to work like this, and just specify it in language. Personalized agents. Yeah, one thing that our recruiter mentioned yesterday, which I thought was kind of funny, is how she's been describing us to Kanjun: we're actually sort of a software dev tooling company, but the idea is that in the future, as we make these things easier and easier to program, everyone is going to be a software engineer in that sense. We'll all be able to make our own agents just by working in natural language, describing what we want done and how we want it done, and interacting at that level. Since we're all going to be working with these agents, we're trying to move towards that kind of tooling. So I think the goal in five years is for people to be able to specify some huge range of possible agents that do exactly what they want, so they can interact with their computer in whatever way they want. I think specifically what she said is: we're a software dev tooling company, but in the future everyone will be a software engineer, and so everyone will need dev tools. And we think of agents this way: agent is a very technical term, it's like the specific memory architecture of the computer, but what agents enable is a natural-language programming language. A way to think about the problem with computers today is that it's really not very intuitive to get our computers to do what we want them to do, and computers have been becoming more and more intuitive over time; the best tools are very intuitive. Language is very intuitive to us; vision, seeing and understanding things that way, is very intuitive to us. Our computers will become much more intuitive, so that more people can make them do what they want.
One major milestone you had recently: you announced a $200 million fundraise from Astera, Nvidia, and a variety of other folks. How do you think about what proportion of that will go to things like compute versus team, and how, in general, should AI companies think about the capital they raise and how to deploy it relative to different potential objectives and outcomes? I mean, I think a significant fraction of that is going to go to compute. I can't speak to how other companies should deploy it, but for us, given that our goal is to make agents, what we really want as a company is not to become a huge company. We don't want tens of thousands of people; we want to make our product actually work so that we can make AI agents and have some huge impact, with a relatively small, close-knit team where the communication is much easier. It's really hard to communicate with 10,000 people; it's much easier to get 100 people in a room who know what the heck they want to do and can agree on things. So we're trying to leverage ourselves, ideally, and we're already starting to do that today, and what that looks like is spending a bunch on compute. Today we don't have AI agents that are running off and doing all sorts of things on their own, but we do have the beginnings of those. We have our internal hyperparameter optimizer, for example, which saves us a ton of time: instead of our researchers manually deciding, oh, for this learning rate I should run this experiment, we just let it go, come back in the morning, and, oh great, everything is optimized. That's really nice, but it used a lot of compute. We're using a huge amount of compute relative to each person.
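A minimal sketch of the kind of overnight hyperparameter search described here, in this case plain random search over learning rates and batch sizes; the `train_and_eval` function is a hypothetical stand-in for a real training run, and the specifics of Imbue's optimizer are not public.

```python
import random

def overnight_search(train_and_eval, n_trials: int = 50) -> dict:
    """Trade compute for researcher time: sample configs, run them
    unattended, and keep the best by validation loss."""
    best = {"loss": float("inf"), "config": None}
    for _ in range(n_trials):
        config = {
            # Log-uniform sampling is the usual choice for learning rates.
            "lr": 10 ** random.uniform(-5, -2),
            "batch_size": random.choice([32, 64, 128]),
        }
        loss = train_and_eval(config)  # one (expensive) training run
        if loss < best["loss"]:
            best = {"loss": loss, "config": config}
    return best
```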
Yeah, we're training state-of-the-art models with something like 13 or 14 people, and most of us are not even working on training the model. The total team size is very small for what we're able to do, because of the way we think about our infrastructure: it's a very agentic kind of approach to infrastructure.
26:17will not be a fully monolithic
26:19architecture for lots of useful models
26:22and people have like mixture of experts
26:23and such given what you want to do with
26:26agents with like planning and
26:28reinforcement learning and more test
26:30time compute like I think it's sort of a
26:33uh belief among the largest research
26:37under 5,000 gpus like under some um
26:41reasonable level like you cannot compete
26:44on state-of-the-art reasoning at least
26:47as the core llms describe it today
26:50obviously that that bar keeps moving um
26:53does that number apply to you do you
26:54think the architecture is just very
We have a lot of GPUs, so the number may or may not apply, but we do have a lot of GPUs. We have enough compute to train models as large as the largest models that have been trained to date. So we have a ton of compute and can train these really large models, but it may not be the best use of our time and resources, because, just as with computers, things get more efficient. What we see is that things are getting more efficient in training: learning how to use data more effectively, so that models get much better performance with less data; learning how to do training runs so that things don't diverge and we're not having to rerun the same thing again; a bunch of hyperparameters to set, and tooling to build around them, and monitoring, and things like that, which just make it more efficient to train these things. And then the data piece is just so big and so underexplored. We all know that data is what matters, and I think a lot of the efficiency gains are going to come from better data, so that's actually quite a bit of what we work on.
Could you tell us a little bit more about why you decided to focus on coding, and what types of systems you're really focused on building? Yeah, there are a bunch of different reasons for focusing on coding. One is that, as we talked about before, the evaluations are much easier to do, and they're objective. Another is that coding is part of reasoning. Another is that coding really helps us accelerate both our own work and the agents that we end up building. As we make the tools for ourselves, we're already starting to see this kind of leverage from the systems that we've built. I think within the next year we'll probably not be hiring as many recruiting coordinators, because we're going to do some of the scheduling with the agent we've built, and we can do the same thing on the software engineering side: we're writing unit tests, literally right now, automatically, and that's helping accelerate us and remove the bugs. It's additive, it's incremental: okay, we get a 5% gain here, a 10% gain there, but as we make more and more tools, those things compound, and over time it's going to be possible to make much more robust systems much more quickly. We're using these coding agents to write the coding agents, and I think this is the sort of recursive self-improvement that people have always been worried about, or excited about, in AI. But what it really looks like in practice is not this scary thing where you leave your computer on overnight and all of a sudden it's a god the next day. Instead it's a slow grind of making things a little bit better every day; and a 1% improvement every day, over a year, is huge. So that's the kind of thing we're really excited about with code: not only can we apply it to our own workflows, but as we start to get coding agents that can really write code, we're in a very interesting space. Right now the bottleneck for most companies is the ability to hire software engineers who can write really robust code; if you can just turn compute into really good code, that's a totally different world. Then there's none of this, oh, well, Imbue is so much smaller than this other company; it's, no, we can write way more code than anyone else. So I think that's a pretty interesting thing we'd like to work towards over time, and that's another reason for code as well.
30:19acting um and today you know even uh the
30:22models can like do really simple things
30:24like write code to write Integrations uh
30:26like API Integrations and so that saves
30:27us a lot of time writing API
30:29Integrations which is super annoying um
30:31also think like software is just
30:32dramatically underwritten because it's
30:34so hard to write code today so you know
30:36as we said in the future like computers
30:38will be able to be programmed by Regular
30:39People what that means is like we're
30:40going to write way way way more software
30:42all the time um and like people will
30:44write software but maybe not by having
30:46to write code the agents write the code
30:48I think it's not only just more software
30:50but also better software right if like
30:52already we're having our agents kind of
30:53look at our po request you know fix the
30:54type error is okay but we can extend
30:56this to adding new unit tests to fixing
30:58the existing unit tests to looking for
31:00security flaws like I'm very excited
31:02about agents that can go out and help
31:04all sorts of organizations improve the
31:06quality of their codebase how can we
31:08simplify this refactor it fixed security
31:09flaws I think there'll just be a huge
31:11flourishing of much higher quality
31:13better software as a result not just
31:14more software but just taking the
31:16existing software and making it so much
31:17better which will make it so much nicer
31:19and more fun to interact with as
31:20programmers as well also uh much more
Also, much more custom software. Something we do some of is generating interfaces, and it's pretty interesting: if I can have a custom interface for whatever it is that I'm trying to do, it has exactly the right form fields, which is kind of nice, and then I can cache that interface and reuse it. Pragmatic, but pretty interesting. Yeah, I did this over the weekend, actually, for Midjourney. I got really sick of typing out Midjourney prompts on my phone in Discord, where you can't keep iterating on the prompts, so I just made a little thing that interacts with it via the API, well, my version of the API. But I think everyone will be able to do this; it didn't actually take that much code. When we have agents that can write code, someone else who wants to use it in a different way can just ask the agent to do that, come back five minutes later, and have their own perfect way of interacting with it. I think that's just going to make our computers feel so much nicer to interact with. That, I think, is an inspiring note to end on: we're going to have 25-person companies that can change the world, and we're going to have more software, more custom software, and higher quality software for us all to use. Thanks so much for doing this, Josh and Kanjun. Yeah, thanks for joining us. Thanks.
Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen; that way you get a new episode every week. And sign up for emails or find transcripts for every episode at