Hi, this is Lance from LangChain. I want to talk about using LangGraph for code generation. Code generation is one of the really interesting applications of LLMs; we've seen projects like GitHub Copilot become extremely popular. A few weeks ago, a paper came out from the folks at Codium AI called AlphaCodium. It was a really cool paper, in particular because it introduced the idea of doing code generation using what you can think of as flow engineering. Instead of just an LLM, a coding prompt like "solve this problem," and a solution, it generates a set of solutions and ranks them. That part is fairly standard prompt-response flow, but here's what I want to draw your attention to: it actually tests the code in a few different ways, on public tests and on AI-generated tests, and, crucially, it iterates and tries to improve the solution based on those test results. That was really interesting. A tweet from Karpathy on this theme made the point that flow engineering is a really nice paradigm, moving away from naive prompting toward building up an answer iteratively over time using testing. So it's a really nice idea.
What's kind of cool is that a few weeks ago we introduced LangGraph as a way to build arbitrary graphs that can represent different kinds of flows. I've done some videos on this previously, talking about LangGraph for things like RAG, where you do retrieval, then a retrieval-quality check (grade the documents), and if they're not good you retrieve again or fall back to a web search. It's a way to represent arbitrary logical flows with LLMs, much like we do with agents, but the benefit of graphs is that the flow you outline is more constrained. It's like an agent with guardrails: you define the steps in a very particular order, and every time you run the graph it executes in that order. So what I want to do is implement some of the ideas from AlphaCodium using LangGraph, and we're going to do that right now.
In particular, let's say we want to answer coding questions about some part of the LangChain documentation. For this I'm going to choose the LangChain Expression Language (LCEL) docs. It's a subset of our docs, around 60,000 tokens, and it focuses only on LCEL, which is basically a way to represent chains in LangChain; we'll talk about that a bit. I want to do a few simple things. First, one node in our graph takes a question and outputs an answer, using the LCEL docs as a reference. Then, given that answer, I want to parse out its components: the preamble (what is this answering?), the imports specifically, and then the code. To do this I'll use a Pydantic object, so the output is formatted. With that in place, I can really easily implement tests: check that the imports work, and check that the code executes. If either of those fails, I loop back to my generation node and say, "Hey, try again; here's the error trace." Again, what they're doing in AlphaCodium is far more sophisticated, and I don't mean to suggest we're implementing it as-is; it actually works on a bunch of public coding challenges, with tests for each question that are both AI-generated and publicly available. We're doing something much simpler, but I want to show how you can implement these kinds of ideas, and you can make it arbitrarily complex if you want.
So I'm going to copy some code into a notebook I have running. All I've done is a few pip installs, and I've defined a few environment variables for LangSmith, which we'll see later is pretty useful. I'm going to call this cell "docs": this is where I ingest the docs related to LangChain Expression Language, and I'm kicking it off right now. It uses a URL loader to grab all the docs, sort them, and clean them up a little. And here we go: these are all the docs related to LCEL, around 60,000 tokens (I've measured it in the past). So there are our docs.
Now I want to show you something that's very useful; I'll call it tool use. This is with OpenAI models, though other LLMs have similar functionality. What I'm going to do here is show how to build a chain with structured output. Remember, in our diagram we want three things for every solution: a preamble, imports, and code, as a structured object we can work with individually. We import BaseModel and Field from Pydantic and define a data model for our output: a prefix, which is the plain-language setup to the problem; the import statements; and the code. I want those as three distinct things I can work with later.
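As a rough sketch of that data model (using a stdlib dataclass as a stand-in; the actual code uses Pydantic's BaseModel with Field descriptions, and the field names here follow the walkthrough):

```python
from dataclasses import dataclass

@dataclass
class CodeSolution:
    """Structured output for one solution attempt."""
    prefix: str   # plain-language setup / description of the approach
    imports: str  # import statements only
    code: str     # code body, excluding imports

# The three distinct pieces we want to work with later.
solution = CodeSolution(
    prefix="Build a simple chain that pulls a key out of the input.",
    imports="from operator import itemgetter",
    code="get_text = itemgetter('text')",
)
print(solution.prefix)
print(solution.imports)
print(solution.code)
```

With Pydantic, each field would also carry a `description` so the model knows what belongs in each slot when it fills the function call.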
I'll use GPT-4 Turbo (the 0125 preview, a 128k-context-window model). Then I take this data model, turn it into a tool, and bind it to my model. Basically, what's happening is that the model will always perform a function call and attempt to output in the format I specify. That's all that's happening. I define a prompt that says: here are all the LCEL docs (LangChain Expression Language, abbreviated LCEL); answer the question and structure your output. What's cool is that we're always forcing that function call, so the model tries to output a Pydantic object.
Now what's nice is that I can just invoke this with a question. Let's try it: I'll set the question to "How do I create a RAG chain in LCEL?"... okay, the input needs to be a dict; there we go, it's running. You can see we passed in all those docs we previously loaded, so it's about 60,000 tokens of context. When you think about newer long-context LLMs like Gemini, it becomes more and more feasible to do things like this: take a whole code base or a whole set of documentation, load it, stuff it into a model, and say, "Answer questions about this." It's still running; the latency is definitely higher because the context is very large, but that's fine, we have a little bit of time, and we can go over to LangSmith while it runs and have a look. We can see here was our prompt.
There you go: 63,000 tokens. You can see it's a lot of context, and we can actually see it all in LangSmith. We don't want to scroll through all that, but you can see we've asked a question, we're grounding the response in all of the LCEL docs, and we're hopefully going to get the response back as a Pydantic object we can play with. Let's see... okay, nice, it's done. You can see our object has a prefix, and it also has our imports; we can see that in LangSmith too. The answer is here: there are your imports, your code, and your prefix, and these can all be extracted from that object really easily. The result is basically a list containing a Pydantic object, and you can extract each field: answer.prefix, answer.imports, answer.code, or whatever your keys are. So that's great: it shows how tool use works and how we can get structured output out of our generation.
Now that we've established we can do that, I'm going to start setting up our graph. First, I define our graph state. This is just a dictionary containing things relevant to our problem: it'll contain our code solution, it'll contain any errors, and that's all we need. Next comes all the code related to my graph; we're going to walk through it, so don't worry too much just yet.
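A minimal sketch of that state (key names follow the walkthrough; the real code defines a state schema that LangGraph passes between nodes):

```python
from typing import Optional, TypedDict

class GraphState(TypedDict):
    """Dictionary of everything the nodes read and write."""
    question: str                 # the user's coding question
    generation: Optional[object]  # the structured code solution, once produced
    error: Optional[str]          # error trace from a failed import/execution check
    iterations: int               # number of attempts so far

state: GraphState = {
    "question": "How do I build a RAG chain in LCEL?",
    "generation": None,
    "error": None,
    "iterations": 0,
}
print(sorted(state))
```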
So here's our code. The way to think about this is simple; let me go back to the diagram. Every node in our graph has a corresponding function, and that function modifies the state in some way. Our generation node works with the question and the iteration count; those are the parts of state we want as inputs. You can see it maps to the diagram: you have the question, and the iteration count just tracks how many times we've tried; we'll see why that's interesting later. The rest is exactly what we saw before: data model, LLM, tool use, the same prompt template.
Now here's where it gets interesting. If our state contains an error under the error key, it means we've fed back from one of our tests: an error has already been generated and we're retrying. And here's why that's interesting: if we're retrying, we append to our prompt, just like we saw above, something that says, "You tried this before; here was your solution (we saved that under the generation key in our state, the code solution); here is your error; please retry and answer this." So it's inducing a reflection based on the error before the retry. That's a very important point, because it gives us feedback: if there's a mistake in either the imports or the execution, we feed it back to generation, and generation retries with that information present. That's all that's happening: we add the error to the prompt, invoke the chain with it, and get a new code solution. Again, that's if "error" is in our state dict; if it isn't, we just generate our solution like we did before.
Easy. One little thing: every time we return, we write the output back to the state and increment our iteration count, recording how many times we've tried to answer this question. That's really it; you can see that's all we do: return the generation, the question, and the number of iterations.
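The generation node's control flow can be sketched like this (pure Python; `call_chain` is a hypothetical stand-in for the structured-output chain from earlier):

```python
def call_chain(prompt: str) -> str:
    # Hypothetical stand-in for invoking the LLM chain; returns a fake solution.
    return f"solution for: {prompt[:40]}"

def generate(state: dict) -> dict:
    """If an error is in state, append it to the prompt so the model can
    reflect on its prior attempt; always increment the iteration count."""
    prompt = state["question"]
    if state.get("error"):
        prompt += (
            f"\nYou tried this before. Prior solution: {state['generation']}."
            f"\nIt produced this error: {state['error']}. Please retry."
        )
    return {
        "question": state["question"],
        "generation": call_chain(prompt),
        "error": state.get("error"),
        "iterations": state["iterations"] + 1,
    }

first = generate({"question": "How do I map over inputs?", "iterations": 0})
retry = generate({**first, "error": "TypeError: unsupported operand"})
print(retry["iterations"])  # 2
```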
Now here's what's nice. We talked about having two checks: one for imports and one for execution. Our import-check node is really simple. From the solution we can get the imports out, just like we showed above; code_solution.imports comes from our Pydantic object (I'll move the diagram over so you can see: the Pydantic object has imports). All we do is attempt to execute the imports. If that fails, we alert "import check failed," and here's the key point: we create a new key, "error", in our dict, identifying that an error is present, that something failed here. You'll see we use that later. One other little trick: if there was a prior error in our state, we append to it. We want to maintain the accumulation of errors as we run multiple iterations, so we don't revert and repeat a mistake we already made on a future iteration. So we maintain our set of errors. If there's no error, we write None: we're good, keep going. Code execution is basically the same thing: we extract our code and our imports, create a code block of imports plus code, and try to execute it. If it fails, we write our error and append all prior errors; if it doesn't, we return None. That's all you really need to know.
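Both checks boil down to exec-ing a string and recording any failure under the error key, roughly like this (a runnable sketch; `SimpleNamespace` stands in for the Pydantic solution object):

```python
from types import SimpleNamespace

def check_code_execution(state: dict) -> dict:
    """Exec imports + code; on failure, write the error to state,
    appending to any prior error so mistakes accumulate across retries."""
    solution = state["generation"]
    code_block = solution.imports + "\n" + solution.code
    try:
        exec(code_block)
        error = None
        print("---CODE EXECUTION: PASSED---")
    except Exception as e:
        print("---CODE EXECUTION: FAILED---")
        error = f"Execution error: {e}"
        if state.get("error"):       # keep prior errors too
            error = state["error"] + "\n" + error
    return {**state, "error": error}

# A deliberately broken solution: adding a dict and a string raises TypeError.
bad = SimpleNamespace(imports="import math", code="result = {} + 'x'")
out = check_code_execution({"generation": bad, "error": None})
print(out["error"])
```

The import-check node is the same pattern, exec-ing only `solution.imports`.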
Now, note that we have two gates: we want to know whether either of those tests failed, and all we need to do is grab the error from state; remember, if the error is None, keep going. The first gate is the decision point before code execution: do we go on to code execution, or do we revert and retry? If there's no error when we get to this point, meaning the import check passed, we keep going to code execution; you can see we return the name of the node we want to go to. If there is an error, we return to the generate node. So really, these functions are conditional edges: they do a conditional check based on the output state. If there's no error, the function says go to this node; if there is an error, it says go back to the generate node. That's it. Same deal with deciding to finish: if there's no error, end. And here's where the iteration count comes in: for the sake of simplicity, I give it three tries, because I don't want it to run arbitrarily long. If there's no error, or if we've tried three times, just end; otherwise, go back to generate. So again, it's the same kind of thing: decide to finish based on whether or not there's an error from code execution. That's really it; that's all we're doing.
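The two conditional edges reduce to small decision functions that return the name of the next node (a sketch; node names follow the walkthrough):

```python
def decide_to_check_code_execution(state: dict) -> str:
    """After the import check: proceed if clean, otherwise regenerate."""
    return "check_code_execution" if state["error"] is None else "generate"

def decide_to_finish(state: dict) -> str:
    """After the execution check: end on success, or after three tries,
    so the graph never runs arbitrarily long; otherwise retry."""
    if state["error"] is None or state["iterations"] >= 3:
        return "end"
    return "generate"

print(decide_to_check_code_execution({"error": None}))       # check_code_execution
print(decide_to_finish({"error": "boom", "iterations": 3}))  # end
print(decide_to_finish({"error": "boom", "iterations": 1}))  # generate
```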
We can go down; we've already covered all of this. Here is where we actually define what we call our workflow. This is where we register all our nodes and edges as those functions and stitch them together. It's actually pretty straightforward; it follows exactly the diagram we showed above. We add all of our nodes and build the graph following the diagram, so you can follow along: set the entry point to generate; add an edge from generate to the import check; then our conditional edge, the decide-to-check-code-execution function. Depending on that function's output, we decide the next node to go to: if it says check code execution, we go to that node; if it says generate, we go back to generate. These mappings are where you specify the logic of the next node to visit, and the same applies for the finish decision. That's all we do; compile it, done. It maps to the diagram roughly one to one, so it's actually pretty straightforward.
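To make the wiring concrete without the library, here's a pure-Python stand-in for the compiled graph. In the real code this flow is declared with LangGraph's StateGraph (add_node, add_edge, add_conditional_edges, set_entry_point, compile), but the control flow it produces is essentially this loop:

```python
def generate(state: dict) -> dict:
    # Toy model: the first attempt has a syntax error; once an error trace
    # is in state, the "model" produces a fixed version.
    code = "x = 1" if state["error"] else "x = 1 +"
    return {**state, "generation": code, "iterations": state["iterations"] + 1}

def check_code(state: dict) -> dict:
    try:
        exec(state["generation"])
        return {**state, "error": None}
    except Exception as e:
        return {**state, "error": str(e)}

def decide_to_finish(state: dict) -> str:
    if state["error"] is None or state["iterations"] >= 3:
        return "end"
    return "generate"

state = {"generation": None, "error": None, "iterations": 0}
node = "generate"                       # entry point
while node != "end":
    if node == "generate":
        state = generate(state)
        node = "check_code"             # fixed edge: generate -> check
    else:
        state = check_code(state)
        node = decide_to_finish(state)  # conditional edge: end or retry

print(state["iterations"], state["error"])  # 2 None
```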
There's just one little thing we now need to do: pass in a question. Here's one; I've run a bunch of these tests already. It seems kind of random, but we actually built an eval set, and this is a question we found some problems with, so I want to show you why this is pretty cool. The question: I'm passing a text key into my prompt and I want to process it with some function, process_text; how do I do this using LangChain Expression Language? It's a weird question, but you'll see why it's fun in a little bit. Now I'm just going to run my graph. Because we print out what happens at every step, we can follow along and see what's happening.
So it's generating a solution. This may take a little while because it's the same kind of long-context generation we saw previously. While it's running, we can go to LangSmith and check this LangGraph run: it's loading up, and we're at generate, so it's doing the generation; this is still pending. Here are all our input docs, so you can see we passed this very large context to our LLM. Okay, this is interesting: it's going through the checks. The code import check worked; decide-to-check-code-execution made its decision; it's testing the code, and here's an interesting one: the code block failed, and the decision is to retry. So it's retrying.
Okay, it looks like it came to an answer. Let's go look at what happened in our LangGraph run to understand it. Let me pull up the error response; what I want to show you is the error that we appended to our prompt. Let me make this a bit bigger and scroll, because this is the crux of what I want to show you. Here it is. What's cool is that our initial attempt to solve this problem introduced an error: an execution error, "unsupported operand types for dict and str." Basically, it did something wrong, and we passed that into the prompt to the LLM when it performed the retry. Our initial solution was here, and it had a coding error as noted, but here you can see we provide that error and say, "Please try to re-answer this; structure your output the same way," with the same instructions and the question. And we can see that the code execution test now works: previously, when we tried this, it failed, and that error was passed along in the prompt, like we just saw; the new test indeed passes, and our final solution comes out as code. That's it. You can get some intuition for the fact that with this retry loop, you can recover from errors using a little bit of reflection. That's really the big idea.
And again, you get your answer out here. There are a bunch of keys in the state; I'll quickly show the keys, and then we can just look at the generation. It's a list, so let's break it out: there's our code object. We can see the prefix, the imports, and let's try the code; let's convince ourselves it actually works by just exec-ing it. It's doing something... there, it tells a joke. Great. So this is pretty cool: the first attempt to answer this question produced an error, and it then retried by passing that error back into the context, just like we outlined in our graph, and on the second try it got it correct. That's a nice example of how you can do this feedback and reflection. Now, we've actually done quite a bit more work on this.
I built an eval set of 20 questions related to LangChain Expression Language and evaluated them all using this approach, relative to not using LangGraph, and here are the results; I want to draw your attention to them because they're pretty interesting. For the import check, without LangGraph versus with LangGraph, it's about the same: imports weren't really a problem even before this retry-and-reflection machinery; they were okay on our eval set of 20 questions. I should note that we actually ran this four times; the chart shows standard errors. I accumulated the results and computed standard errors, so there's some degree of statistical soundness to these results. In any case, import checks were fine without it, but here's the big difference: code execution performance with and without LangGraph. Without LangGraph, with just single-shot answer generation, the success rate was around 55%; in many cases we saw code execution fail. With LangGraph and this retry-and-reflection machinery, the success rate goes up to around, I believe, 80%: almost a 50% relative improvement in performance. That was really impressive, and it shows the power of a very simple idea: attempting code generation with these very simple checks and reflection can significantly bump up your performance. The AlphaCodium paper shows this in a much more sophisticated context, but what's cool is that this is a very simple idea you can implement yourself in not much time.
We have all of this available as a notebook, and you can run it on any piece of code you want: just take whatever documents you like, plumb them in, and test this out for yourself. I've been really impressed; I think it's pretty cool. In general, I think LangGraph is a really nice way to build these kinds of reflective or self-reflective applications, where you build feedback loops: you do a check, and if the check fails, you try again with that feedback present in the retry. I'll also mention we have a blog coming out; I'm not sure there's anything in it I haven't already shown you. Nothing else to highlight: these were our results again, maybe a little clearer to see here, but again, a pretty significant improvement in performance from a simple idea. I definitely encourage you to experiment with this, and of course all this code will be available for you, so feel free to experiment and let us know how it goes.