00:22 Hey Chris, you hear about the new LlamaIndex v0.10 release and that new LlamaParse library? You hear about this? Yes, I sure did hear about that, Greg. Yeah man, it says that it can actually parse embedded tables and figures. You ready to check this out today and see if it does what it says on the tin? I did hear it does that, yeah, and absolutely, I can't wait to dig in. Yeah man, well, let's get right into it today. We'll see you back for the results and conclusions on exactly what this thing is doing for us.
01:01 Welcome, everybody! My name's Greg, that's Chris, AKA Dr. Greg and The Wiz. We are co-founders of AI Makerspace, and today we're going to look at one of the newest tools to hit the open-source AI LLM builders' market: LlamaParse. We want to take a close look at this and see if it actually improves on the RAG for complex PDFs that we've looked at previously, and that the entire industry — really, all industries — will continue to look at. If you've got questions that pop up throughout today's demo, please drop them in the Slido link that we'll throw in the chat now. But with that, let's go ahead and get
01:45 right into it. We want to cover a few things, because there's been a lot of new stuff released from LlamaIndex — including LlamaParse, but also even a little bit more — so we're going to see if all this comes together to give us a really superior, production-grade RAG experience. As with all sessions, we're going to align our aim today and figure out exactly what you're going to get if you stick around with us for the hour: we're going to do an overview of v0.10, we're going to understand LlamaParse's performance on embedded tables and figures — this is what we set out to take a really close look at — and we're going to see exactly how to build a query engine using LlamaParse for your documents that you can leverage in your RAG applications. So first, we're going to go ahead and check out LlamaIndex v0.10 and LlamaParse. We're going to sort of
02:42 review LlamaIndex RAG just to contextualize this a little bit. A lot of the docs have been changing with a lot of these tools, and we want to keep you updated on the latest and greatest in the way people are communicating these tools and their capabilities. So of course we start with the new release, LlamaIndex v0.10. This is the sort of next step — along with the LlamaCloud platform that was released — towards making LlamaIndex a real next-generation, production-ready (and we still see this keyword again here) data framework for LLM applications.
03:17 Similar to other new releases we've seen, we see that llama-index-core is going to contain the main abstractions that we have talked about previously and that many of you are probably familiar with already, and the separation between the core constructs and the third-party integrations is the key aspect of LlamaHub. Additionally, the service context object — which, if you've been building with LlamaIndex, you're familiar with — has become cumbersome over time and increasingly difficult to use. It was meant to be used as an intermediate user-facing layer to let you define parameters, but it's sort of become not really the best solution given this new core-versus-LlamaHub differentiation. So service context is no longer going to be part of your build if you start upgrading to the new version. And of course the number of third-party integrations is growing — lots and lots and lots of them, many hundreds at this point — which is really, really cool.
04:39 This is kind of from their blog: the core package underlies everything, and then we have the integrations, we have all the Llama Packs, and then there's some experimental and fine-tuning stuff happening as well, so keep an eye out for that. But the big takeaways from v0.10 are the service context removal and the core-versus-LlamaHub split.
05:04 Right, so let's talk a little bit about LlamaIndex in general. They've updated all of their docs as they came out with v0.10, but they still are very much a data framework — this is something that's unique in the industry — and they're focused on helping you build LLM applications that can benefit from, quote, "context augmentation." This is the big idea behind LlamaIndex: context augmentation. Let's demystify this idea of context augmentation for a second. What are we talking about? We're talking about augmenting the prompt in the context window of the LLM. That's it. We're talking about RAG. That's what we're talking about.
06:02 Why RAG? Well, RAG because we don't like confident responses that are false — hallucinations, fake news; nobody likes it. We need to be able to fact-check with reference material that we can add to our prompt, augment our prompt with it, and then we can generate better answers. Now, when we talk about context augmentation, one way to think about RAG — one that we've been communicating to our audience, and that we encourage everybody to break down into its core component pieces — is as dense vector retrieval plus in-context learning, and we sort of see this context augmentation
06:51 here. So we're going to walk through it. When you ask a question, we're going to send that question to an embedding model, which is going to create a vector format of that question. After the tokenization and the embedding process, we're going to then look in our vector store — our vector store being made up of our documents — and we're going to look for stuff similar to the question using a simple similarity metric. We'll set up a prompt template that we can use to provide context, and it'll say something like "use the provided context to answer the user's query; don't answer if you don't know," and we can then take the materials that we find that are similar and shove those into the prompt, into the context window.
07:42 This process is dense vector retrieval. We're simply using a dense vector representation — or sometimes it's a sparse-dense hybrid, which can be a little more computationally efficient, and if you're using something like Pinecone it's available out of the box — but let's say it's dense vector retrieval, in general, in a naive way, and we're returning natural language context to the prompt. Now, as we set up our prompt template and we're giving it more context, this is the in-context learning piece. This is the big idea from the GPT-3 paper, "Language Models are Few-Shot Learners."
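The flow just described — embed the question, find similar chunks in the vector store, and stuff them into a prompt template — can be sketched in plain Python. This is a toy illustration: a bag-of-words "embedding" stands in for a real embedding model, and none of this is LlamaIndex's API.

```python
from collections import Counter
from math import sqrt

# Toy "embedding": a bag-of-words count vector. A real system would call
# an embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Our "vector store": documents stored alongside their embeddings.
documents = [
    "NVIDIA reported revenue growth in its 10-K filing.",
    "The report discusses AI in teaching and learning.",
]
store = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Dense-vector retrieval: rank stored chunks by similarity to the query.
    q = embed(question)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Augment the prompt with the retrieved context -- the "context augmentation"
# that gets sent to the LLM.
PROMPT = (
    "Use the provided context to answer the user's query. "
    "If you don't know, don't answer.\n\nContext: {context}\n\nQuery: {query}"
)

query = "What revenue did NVIDIA report?"
context = " ".join(retrieve(query))
prompt = PROMPT.format(context=context, query=query)
```

Swapping the toy `embed` for a real embedding model and sending `prompt` to an LLM gives you exactly the naive dense-retrieval RAG loop being described here.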
08:25 Now, together, these two things are RAG — they are context augmentation. The important thing to note about this setup is that it is completely independent of the LLM that you put it all into when done; the LLM just provides the response in the end. So we're augmenting the context window; we're augmenting the prompt.
08:56 Why are we doing this? Well, we're doing this because when we prototype, we want to make sure that we're going through the same industry-standard process that everybody goes through: we start with prompting, we move to RAG, and then generally we're thinking about fine-tuning. It's not always linear, but this sort of mental model, shown here, is to start with prompt engineering; you can think about optimizing the context — what the model needs to know — through RAG, and you can also think about optimizing the LLM — the way the model needs to act — through fine-tuning. Eventually you'll probably do both, and end up fine-tuning both your embedding model and your chat model as you try to reach human-level performance in your application. Here's an example from OpenAI DevDay: generally we're seeing an order of operations where we do RAG before fine-tuning. RAG is generally a cheaper first step, it's going to be easier to update with the latest information, and it's going to give us that fact-checking ability.
10:38 So what you can take away from this is that, again, RAG is context augmentation. RAG is context augmentation — and note this is all independent of the LLM. The cool thing about LlamaIndex as a data framework is that LlamaIndex, and RAG in general, really pose no restriction on how you use the LLMs. You can of course go to fine-tuning, you can put different things into different LLMs, but it's really, really focused on the data piece, on being data-centric — because the data-centric paradigm hasn't gone anywhere. And as we heard in LlamaIndex's v0.10 release, RAG is only as good as your data.
11:26 I like the sort of mental model — the framework — that they provided in that blog of the RAG data stack, which is different from classic ETL: we're going to load the language data, we're going to process the language data, we're going to embed the language data, and then we set up our vector DB. What's interesting about this is that as we're processing the data, we're chunking, we're tokenizing, we're deciding on chunk sizes, and we're trying to figure out: do we need any sort of metadata or hierarchy, and how exactly are we setting up the way to think about this, either short form or long form? As we do embeddings, there are obviously many different embedding models, and you can fine-tune them as well. And then the actual vector database setup — or the setup of many indices, many vector databases, and how exactly to move between them — is sort of an art unto itself, especially depending on your use case.
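That load → process → embed → store stack can be sketched end to end in a few lines. Everything here is illustrative: a hash stands in for a real embedding model, and a list of dicts stands in for a vector DB.

```python
import hashlib

# 1. Load: pretend we've read a long document from disk.
raw_document = "Page one text. " * 40

# 2. Process: chunk the text -- chunk size and overlap are two of the
# decisions mentioned above.
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# 3. Embed: a stand-in embedding (hash -> small float vector); a real stack
# would call an embedding model here.
def embed(text: str, dim: int = 8) -> list[float]:
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# 4. Store: the "vector DB" is just a list of (chunk, vector, metadata) records.
vector_db = [
    {"chunk": c, "vector": embed(c), "metadata": {"chunk_index": i}}
    for i, c in enumerate(chunk(raw_document))
]
```

Each stage is a decision point — chunking strategy, embedding model, index layout — which is exactly why the RAG data stack differs from classic ETL.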
12:37 So, in contrast to classic ETL, all of the decisions that we're making in the RAG stack affect the ultimate application. And the key pain points for people building today — this is again from LlamaIndex, which we generally absolutely agree with, and this is what we're hearing from folks out there in the marketplace as well — are that results are sort of not right: they're not accurate, they're not good enough a lot of the time. And there are just too many things to think about, from chunk sizing to hybrid retrieval to exactly which model to use to whether I should fine-tune — all of this stuff — and it's just a lot to deal with. And then of course everybody has just ridiculous amounts of portable-document-format PDFs sitting around that they'd love to be able to use, and PDFs are famously hard for us as humans to deal with, and they've been famously hard for LLMs to deal with as well. So that's where we are today as we try to tackle LlamaParse: we're looking at these pain points — not accurate, too many parameters, and PDFs. The data-syncing issue — sort of live data as it's changing and updating — is a separate issue that we're not going to cover
14:02 today. So let's talk about LlamaIndex constructs for a second. We've covered this previously — we're going to link a few events where we have covered it — but we essentially have a way to ingest data (those are data connectors); we have a way to structure the data (those are indices, and vector databases are the simplest type of index); and then there are different engines. We're going to build a query engine today, and as many of you have heard us say before, query engines are to LlamaIndex sort of as chains are to LangChain — the query engine is really at the heart of LlamaIndex. Now we're seeing this chat engine emerge from LlamaIndex as well, which is cool, and that is going to dovetail directly into this idea of agents — you know, data-framework agents — because with the chat interface you're able to go back and forth a little bit better as you engage with and interact with your applications, with reasoning and different cycles of decision-making.
15:20 Again, if you would like to know more about the constructs, we've covered this in previous events, digging in deeper to the core LlamaIndex package that has been put together in v0.10 — definitely check those out. But that's enough background for
15:37 today. We're here to talk about LlamaParse. It's in public preview mode, and we want to understand exactly what it's doing, what it's doing well, what it's not doing so well, and what to expect in the future. Really, LlamaParse at the highest level is proprietary, and it's a parsing algorithm for documents that have embedded objects. We read that it had embedded table and figure capability, and we wanted to check that out. It's also allowing us to build retrieval over more complex documents — sort of semi-structured, meaning they have tabular data, and unstructured, meaning language data — so documents with both tables and prose. And this is all in the spirit of going towards production-grade context augmentation.
16:34 Now, LlamaParse is built on the retrieval algorithms and work that LlamaIndex has done previously, so it's sort of a first step, and that step is to parse out tables and text in markdown format — because they've built a lot of tools already that integrate very, very well with that markdown format, so you can build more complex RAG systems with more complex data. Now, the sort of flagship example here from their release is the Apple 10-K filings: they did a comparison of LlamaParse versus PyPDF over these 10-K filings, and they also compared PyMuPDF, Textract, and PDFMiner. You'll notice that the red is where the information was not extracted very well — so, a lot of red. PyPDF was the least red amongst the baseline comparisons, so it was the second best, and then you'll notice there are a few red pieces in the LlamaParse one — I know you probably have to squint to look at this, but like here and here. So it's not perfect today, but it is an improvement over the standard tooling. So let's get
18:07 into our testing. What we did is we said: OK, well, we want to test if this can work on, of course, the classics, right? So we picked up an NVIDIA 10-K filing. We were also very interested in whether this could work on infographics — could this work on more complex figures embedded in documents? At first I was throwing out a couple of image ideas, but images aren't really the same as PDFs, so in addition to the NVIDIA filings we found a great document that's related in many ways — in sort of meta ways — to what we're doing now: "AI and the Future of Teaching and Learning," from the Office of Educational Technology, May 2023. It's a long PDF document, 70-plus pages; the NVIDIA 10-K filing is 90-plus pages. So: long, chunky documents, with lots of infographic-esque figures. We're wondering, you know, can it extract the text, can it extract the numbers, what's going on? Now, drum roll please.
19:17 Well, here's what we found, in conclusion — and we'll walk you through how we got here, but we'll give you the conclusions kind of up front. When it came to parsing, there was very inconsistent speed, and especially with the recursive retriever that we built — which was the one that they recommended — it actually took minutes to run requests. Now, the tabular extraction — the tabular data — was very good, and when it worked, it worked very, very well, so definitely that was the shining highlight. Although, there was no figure extraction, and I think this was perhaps something that we misread, because as we double-clicked in and looked specifically at the release blogs, we noticed that they're actually still in the process of building out better support for figures for other document types. And of course this is the natural progression, right? As you get into this figure space, you're kind of getting into the image space, so I can imagine how challenging a problem this is — and of course this is the kind of feedback that they're getting from lots of folks. I'm sure that you're very interested in the day when we can do figure extraction, but that day is still not
20:48 today. So how did we figure this out? Well, let's go through specifically and exactly how LlamaParse is working, end to end. 21:02 We used some simple models: OpenAI's text-embedding-3-small — the latest and greatest from them, but the small one — and OpenAI's GPT-3.5 Turbo. We built a recursive query engine according to their recommendation, and we used the BAAI bge-reranker-large. This is kind of the old two-step, right? You do embedding-based retrieval to get the docs, then you rerank them, and that's it.
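That "old two-step" — embedding-based retrieval to get candidate docs, then a reranker to reorder them — looks roughly like this in plain Python. Toy scores and a word-overlap "reranker" stand in for real embedding similarity and the BGE cross-encoder; none of the names below are from LlamaIndex.

```python
# Candidate documents with precomputed first-stage retrieval scores
# (in a real pipeline these come from embedding similarity).
candidates = [
    ("NVIDIA's revenue grew year over year.", 0.71),
    ("The 10-K filing lists revenue by segment.", 0.69),
    ("Unrelated text about office furniture.", 0.40),
]

def first_stage(query: str, k: int = 3):
    # Step 1: embedding-based retrieval -- take the top-k by vector score.
    return sorted(candidates, key=lambda d: d[1], reverse=True)[:k]

def rerank(query: str, docs, k: int = 2):
    # Step 2: a cross-encoder-style reranker scores each (query, doc) pair
    # jointly. Toy version: count shared words.
    def score(doc):
        return len(set(query.lower().split()) & set(doc[0].lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

query = "revenue by segment"
results = rerank(query, first_stage(query))
```

The reranker is slower but more accurate per pair, which is why it only runs over the small candidate set the first stage returns.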
21:36 So, you know: we had the NVIDIA 10-K filings and the Department of Ed report; we used OpenAI models, the recommended recursive retriever, LlamaParse, and v0.10. And with that, I'm going to kick it over to the Wiz to show you exactly how this looks in code, and to give you some more nuanced vibes and information related to what you might be able to expect in your application. Wiz, over to you,
22:05 man! Thank you, Greg! Yes, OK, so we're going to go ahead and drop this notebook into the chat so you can follow along, and we're just going to go through a couple of things. We're going to start with a straightforward portion of this, which is getting LlamaParse to work, and then we'll move on to creating those retrieval pipelines we saw — those query engines that Greg was describing.
22:38 So, first things first: the LlamaParse release comes along with LlamaCloud. LlamaCloud has more than just LlamaParse, but for right now that's what we're going to leverage it for. Basically, LlamaParse is exactly as was described: it is a proprietary algorithm that they're using, and it is behind an API, so we don't have any access to exactly what's happening behind the scenes — we can kind of infer what might be happening — but the idea is that it's an API that accepts PDFs and returns documents, and it can return those documents in multiple formats. One of those formats is markdown, and the power of having it return markdown is that markdown can help us capture these kinds of structural relationships within our documentation. We can use LlamaIndex's markdown node parser from there to help us really understand what's going on in these documents, so that's great.
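What "capturing structural relationships" buys you is easy to see with a few lines of plain Python: once the output is markdown, headings let you split the document into labeled sections, which is conceptually what a markdown node parser does. This sketch is ours, not LlamaIndex's implementation.

```python
def split_by_headings(markdown: str) -> list[dict]:
    """Split a markdown document into sections keyed by their heading."""
    sections, current = [], {"heading": None, "lines": []}
    for line in markdown.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous section (if it has content).
            if current["lines"] or current["heading"]:
                sections.append(current)
            current = {"heading": line.lstrip("#").strip(), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)
    return sections

doc = """# Revenue
Total revenue grew 126%.

## By Segment
Data Center led growth.
"""
sections = split_by_headings(doc)
```

With plain-text output these boundaries are gone, and every chunk boundary becomes a guess — that's the case for requesting markdown.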
23:43 We also have LlamaIndex v0.10 — a huge release. Basically, it is exactly the same kind of thing we've seen recently from other libraries, including LangChain, which is this idea that things were getting kind of bloated: we have a lot of different possibilities, a lot of different things we can do, and they were kind of glutting up the core library, so those were split. Effectively, LlamaHub is now the source of truth for everything community and integration, and llama-index-core focuses on just what LlamaIndex is supposed to do, which is awesome.
24:24 So how do we use the actual tool? Cool — well, first of all we're going to need some dependencies, so we're going to grab llama-parse from our dependencies, and then we need to create a LlamaCloud API key. We're going to do this through LlamaCloud, so I'll just zoom in some here so you guys can see it. Basically, when you arrive on this LlamaIndex page, we can go to the API Key resource in the bottom left, generate a new key, give it a name, and then store it somewhere safe — so it's as easy as it gets. You'll also notice — I'll zoom in a lot here so you can see real clearly — that you get quite a few pages per day that you can use with the PDF parser, and you'll also notice that this is a PDF parser: right now this is only something that works with PDFs; no other file types are currently accepted. As well, we can see when we look through the code that there are only two return types, which is either text or markdown. But you get 10,000 pages per day, which is pretty awesome. So let's
25:34 head back to the notebook. Once we have our API key, we can provide it here. We'll also be using OpenAI, so we're going to slap our OpenAI key in here, and then we need to do this classic cheat code for Google Colab: `import nest_asyncio` and `nest_asyncio.apply()`. We're going to be using asynchronous functions here, and there's no other way around this — we just need to run it, like boilerplate, if you're running this in a notebook. And, answering a question from the chat: that's correct, yeah, absolutely — because it only accepts PDFs, unless your file is a PDF you're going to have to find a way to get it into a PDF. Luckily, a lot of files are already PDFs, or they're easily converted, so it should be a lower burden to convert a file to PDF than it is to convert from PDF — and that's the problem that this tool is solving. So the next thing we're going to
26:31 do is just initialize LlamaParse — couldn't be easier. We set up our object, and we're going to say we want it in markdown. This is because we're very keen on that structural relationship — I'll just zoom in a little bit here. You know, we really want to know what the structured data is saying, and we want a way to interface with it that preserves that structure, right? So the plain-text option doesn't really help us do that as well, whereas markdown gives us notation that we can use with the markdown node parser to understand structural relationships. Verbose equals true or false — it's up to you how much text you want to read. Language: there are a number of languages that are supported; the default is English, and in this example we'll also be using English. And then of course we have a number of workers: we're going to go ahead and set two workers, because we're going to parse two files. That's the idea — you can have up to 10 workers at a time, so you can do this kind of in batched sets of 10.
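With a cap of 10 workers at a time, parsing a larger set of files means batching them into groups. A minimal sketch — only the 10-worker cap comes from the talk; the helper itself is ours:

```python
def batches(files: list[str], max_workers: int = 10) -> list[list[str]]:
    """Split a list of files into groups no larger than the worker cap."""
    return [files[i:i + max_workers] for i in range(0, len(files), max_workers)]

# Hypothetical file names, just to show the grouping.
files = [f"report_{n}.pdf" for n in range(23)]
groups = batches(files)
```

Each group would then be submitted as one parsing job before moving on to the next.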
27:36 Once we've done that, we're going to upload some files to our Colab instance. Pay close attention to what you're saving these as — this is the file name that you'll need to send to LlamaParse. If I look in my files here, you can see I have done this process twice, so I have two different versions of the files, and I need to make sure that I have the actual correct file when I go to send it to LlamaParse. So please pay very special attention to this: if you've named your file something else, you'll have to take the name right from here in order for this to work, if you're following along in the future. And we do the same thing for the AI report, which Greg alluded to earlier — that's the artificial-intelligence report. Pretty cool document, right? It's got sweet graphs, it's got sweet figures. We're going to see how well LlamaParse stacks
28:34 up. The next part is the actual parsing. 28:37 This part's totally opaque to us: we just send the file to an endpoint, and then at some point we get back a response, and that response is documents that can be parsed easily through LlamaIndex, as they're obviously tightly integrated. You don't have to use LlamaIndex — the documents are just markdown, so you can use whatever you'd like past this step — but obviously LlamaIndex is paying special attention to their ecosystem, so that's where we're going to stay today. Again: only PDF files. This is also a very inconsistent process, I've found — sometimes this can take a very long time, sometimes it doesn't take very long at all. Once you've done it for a file, I have found that there's likely some kind of backend caching here, because each subsequent or repeated attempt to parse these files is very quick, but that first time is very inconsistent. The AI report took quite a long time to finish, whereas the NVIDIA 10-K filing took much less time, so your mileage may vary. I would not build this into a latency-critical application at this time, but for offline or batch processing it seems super
29:55 dope. You know, it's the classic pattern: we start a job, and then at some point it returns this `documents` object, which is going to be a list, and that list is going to have objects in it. We can take a peek at them and see that this markdown most assuredly preserves some context — there's no doubt about that; this is a table structure in markdown. We definitely have some idea of structure that's being preserved, which is very important and desired, so first of all, that's awesome to see. We can also look at our AI report, which has, I believe, literally zero tables — I think there are a couple at the end that are very simple — but you can see it still correctly identifies markdown, and this is important: the idea of the markdown being preserved is huge.
30:55 And from chat, Matt is suggesting: well, you know, we also get HTML or CSV versions of earnings or filings — and that's absolutely true. If that's the information you're looking for, I think that's probably going to be best; but if you're looking for a combination of that semantic information and the actual structured data, I do think that this is an excellent resource — or in cases where you're looking at reports that don't have those formats provided, of which there are, unfortunately, quite a lot. So we can see it does the thing: it gives you markdown, and the markdown can be leveraged to understand some kind of structure about the document, so that's great. Let's build a query engine to see if this is actually useful — we can see that, yes, this is markdown, but can it be leveraged usefully?
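To see why the preserved table structure matters, here is a minimal plain-Python parser for a markdown pipe table like the one in the output (toy data, not the actual parsed filing):

```python
def parse_table(markdown: str) -> list[dict]:
    """Turn a markdown pipe table into a list of row dicts."""
    lines = [l for l in markdown.strip().splitlines() if l.strip().startswith("|")]
    rows = [[c.strip() for c in l.strip().strip("|").split("|")] for l in lines]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, r)) for r in body]

table = """
| Segment     | Revenue |
|-------------|---------|
| Data Center | $47.5B  |
| Gaming      | $10.4B  |
"""
rows = parse_table(table)
```

A plain-text extraction of the same table would typically flatten the cells into one run of words, and this row/column mapping would be unrecoverable.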
31:57 So the first thing we're going to do is talk a little bit about LlamaIndex v0.10. Now, if you're used to LlamaIndex and you've watched some of our previous events on it, you'll remember me talking about service context, global context, setting context — all of this. Gone. It's all gone. Context is dead; we're all about Settings now. We still have this idea of a global Settings object that we can set, so that we can fall back on specific user-set defaults, but we don't have this idea of a service context that we need to pass around and manage. Instead, we're just going to do that how you'd expect, normally, by passing things into their constructors. This is probably one of my favorite changes from v0.10 — just kind of normalizing this library to the rest of the ecosystem feels really good.
32:50 But we still have Settings, so we're going to set our base LLM as GPT-3.5 Turbo, and we're going to set our OpenAI embeddings as text-embedding-3-small, which is the successor to Ada — it is just as good as or better than Ada, and it costs less, so that's why we're using small today. You'll notice that we're using GPT-3.5 Turbo, so we're really not relying on the LLM's ability to understand the structure; we're really relying on the retrieval process's ability to correctly represent that structure, which is something that's a little bit different. If we used GPT-4 here, there'd be the potential to say, "oh well, GPT-4 is just really good at this, actually, guys." GPT-3.5 is good; we're going to stick with
33:48it and then we're gonna use the markdown
33:50element parser uh this is the thing
33:53that's going to uh you know really make
33:57sure that we're uh you know we're
34:01squeezing as much juice as we can from
34:03these markdown files right the markdown
34:06element node parser is specifically
34:08built to parse markdown elements right
34:11so uh we can see that we have this this
34:14entire idea of how to uh to use it their
34:18docks are are pretty good on this and
34:20the idea is simply that this is going to
34:22help us parse out this markdown into the
34:26constituent Parts uh so those cons
34:29constituent parts are uh you know what
34:32allows us to understand the structured
34:35versus unstructured nature of the data
34:37right so uh we want both we want
34:39semantic information to answer questions
34:42about semantic questions and we want
34:43structured data to answer uh those those
34:46kinds of semantic questions that rely on
34:49some context that's contained within uh
34:51tables or figures right all we've got to
34:54do is run this when we run the actual
34:58get_nodes_from_documents you'll notice
35:00that it pretty frequently fails this
35:04does not mean that the nodes are not
35:06created and it does not mean that the
35:08total process fails it just means that
35:11sometimes it's not able to understand
35:13the markdown it received from the
35:15LlamaParse endpoint so we see some
35:18errors it's not a big deal this is
35:20exactly as Greg showed we're dealing
35:23with a document that's quite long in
35:28both cases so the idea that there are
35:32going to be a few misses is totally
35:34expected this is a preview their first
35:37shot on goal but it is worth noting that
35:40there is some potential that you're
35:44going to miss and when you do miss it
35:47means we're not fully capturing the
35:49structure of our data which could lead
35:51to some potential issues but if we only
35:54miss sometimes that's obviously much
35:57better than missing all the time or
35:59missing a lot which was the case with
36:01some of the other methods we've seen in
36:03the past so this is still an improvement
36:08even if there are still some kinks to
36:09be worked out
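As a rough mental model of what this element parsing does, here's a toy sketch (not the actual llama-index implementation) that splits a markdown document into prose chunks and table chunks, the way the MarkdownElementNodeParser separates unstructured from structured content:

```python
# Toy sketch of markdown element splitting: separate prose from tables.
# This is illustrative only, not the llama-index MarkdownElementNodeParser.

def split_markdown_elements(markdown: str):
    """Return a list of ("text" | "table", content) tuples."""
    elements, buffer, table = [], [], []
    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):  # a markdown table row
            if buffer:
                elements.append(("text", "\n".join(buffer)))
                buffer = []
            table.append(line)
        else:
            if table:
                elements.append(("table", "\n".join(table)))
                table = []
            if line.strip():
                buffer.append(line)
    if buffer:
        elements.append(("text", "\n".join(buffer)))
    if table:
        elements.append(("table", "\n".join(table)))
    return elements

doc = """# 10-K excerpt
Some narrative text.

| Name | Age |
|------|-----|
| Debora Shoquist | 69 |

More narrative text."""

elements = split_markdown_elements(doc)
print([kind for kind, _ in elements])  # ['text', 'table', 'text']
```

Once the table is its own element, it can be summarized, indexed, and retrieved as a unit instead of being shredded across arbitrary text chunks.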
36:13once we have our nodes we're going to
36:14grab our nodes and objects so we can
36:16create our vector store index we're
36:18going to have our nodes be those nodes
36:20plus those objects very important page
36:24number info is just metadata so that's
36:29it there's nothing to it other than
36:33that we have metadata via the nodes
36:36because we know which page we're on so
36:39we've already captured that metadata
36:43and we have another question in the
36:45chat that I'll just answer since we're
36:48all together right now which is is
36:50there an easy way to review yes
36:53absolutely you can tell exactly which
36:56nodes failed it does tell you the node
36:59that had the failure so you're able to
37:02go in and check and make sure that that
37:04data is in a format that you want okay
37:08so we've got our nodes parsed and we
37:13accept that a couple of them didn't
37:14work out hey it happens right
37:16this is new technology once we've got
37:18our vector store set up so this is our
37:20index now we're going to create our
37:22recursive query engine we're going to
37:23use a reranking process the
37:26FlagEmbeddingReranker which is going to
37:28be powered by the BGE reranker large
37:31right and we're going to set up a
37:34recursive query engine we need to
37:37install some requirements we need the
37:38FlagEmbeddingReranker postprocessor and
37:40we also need to grab FlagEmbedding from
37:42its repo once we've installed these two
37:44requirements we can initialize our
37:47FlagEmbeddingReranker a reranker for
37:52those of you who aren't sure what that
37:57is right basically when we get a bunch
37:59of things that are likely related to our
38:03query we have the chance to reorder that
38:04list using a more compute-intensive or
38:08time-consuming process that's more
38:11accurate right so you can think of it as
38:13we very quickly cast a wide net and then
38:15we slowly look through what we've got in
38:18that net and take our time to reorder
38:20that list so in this case we're going to
38:23go from 15 retrieved contexts down to
38:29the top five of those 15 right but top
38:32five of 15 I mean even if the process
38:35takes say a millisecond per candidate
38:39we're still only talking about 15
38:40milliseconds which is not bad it's not
38:44great let's be real but it's not bad and
38:47that's the idea of a reranker when it
38:49comes to our similarity top k we're just
38:52going to grab the top 15 results and
38:55then rerank them that's it that's all
38:57there you go
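The cast-a-wide-net-then-rerank pattern can be sketched in a few lines of plain Python; the two scoring functions below are cheap stand-ins for the embedding search and the BGE cross-encoder, not the real models:

```python
# Toy retrieve-then-rerank sketch. The scoring functions are stand-ins:
# cheap_score mimics fast first-pass retrieval, expensive_score mimics a
# slower but more accurate cross-encoder reranker.

def cheap_score(query: str, doc: str) -> int:
    # fast first pass: count of shared words
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query: str, doc: str) -> float:
    # stand-in for a reranker: overlap normalized by document length,
    # which rewards short, focused matches
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)

def retrieve_and_rerank(query, corpus, wide_k=15, top_k=5):
    # stage 1: cast a wide net with the cheap score
    wide = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:wide_k]
    # stage 2: slowly reorder the net's contents with the expensive score
    return sorted(wide, key=lambda d: expensive_score(query, d), reverse=True)[:top_k]

docs = ["the annual revenue table", "revenue", "weather report", "cats and dogs"]
print(retrieve_and_rerank("annual revenue", docs, wide_k=3, top_k=2))
# ['revenue', 'the annual revenue table']
```

The design point is exactly the one described above: the expensive scorer only ever sees `wide_k` candidates, so its cost stays bounded no matter how large the corpus is.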
39:01so we've set up our retrieval we've set
39:04up our query engine we've parsed our
39:06documents they're all in this index now
39:08let's do the thing right so we can ask
39:10questions like who is the executive VP
39:12of operations and how old are they right
39:15we use the recursive retrieval engine
39:19here so we have a number of requests
39:21that kind of parse through these nodes
39:22which is pretty dope you can see here
39:24we've got that little table right huge
39:27you love to see that and eventually we
39:30wind up with this response Debora
39:33Shoquist is the Executive Vice President
39:36of Operations she is 69 years old that's
39:39exactly right I mean this information is
39:41not mentioned anywhere else in the
39:42document that's huge if you're running
39:44this in a
39:47notebook on a CPU instance you're going
39:50to notice this takes a very long time
39:51right we're using this BGE reranker
39:55which is a pretty beefy model right if
39:59you're using it on CPU this query is
40:01going to take a long time if you're
40:03using a GPU-accelerated instance which
40:04you can select through Runtime then
40:06change runtime type and then select a
40:09GPU instance you're going to notice it's
40:10a lot faster so just keep in mind that
40:12the slowness is not representative of
40:15llama index or this technology in
40:17general it's basically just dependent on
40:19which resources you select that's not
40:22true of the actual LlamaParse endpoint
40:24but it is true of this retrieval process
40:33and then we can ask questions like what
40:34is the gross carrying amount of total
40:36amortizable intangible assets for 2023
40:40what a mouthful right well we're able to
40:43extract that it's 3,539 million right
40:48which is exactly what we see that's
40:51correct and it's right next to
40:54information that would be wrong right so
40:56the power of this application is
40:59immediately apparent if you're working
41:04with that kind of structured data right
41:05we can see that this process allows us
41:07to very faithfully pull out the correct
41:10piece of information in context and
41:13again this is not information that is
41:16available just through text in the
41:22document which is important you have to
41:25come to the table to get this
41:28information so that's great let's try it
41:34on the AI education report which again
41:36doesn't have a lot of tables but does
41:40have a lot of graphs and figures right
41:41so let's ask about those all of this is
41:44just setting up the retrieval process
41:47again on our actual AI report right and
41:51then we can query it how many AI
41:53publications on pattern recognition were
41:55there in 2020 and we get this response
41:57of there were 30.07 AI publications on
41:59pattern recognition in 2020 which is
42:02definitely wrong right it should be in
42:05the mid-50s but we got 30.07 now what's
42:15interesting to me is despite 30.07 not
42:17being mentioned anywhere else in the
42:19actual document we do see that 30.07 is
42:22associated with this figure so while it
42:26didn't retrieve the correct context and
42:28didn't answer the question right we can
42:31see that it is at least able to parse
42:34some information out of this figure even
42:37if it's not there yet right I mean for
42:40me this is the signal that things are
42:42going in the right direction even if we
42:46haven't literally gotten there yet
42:47because we understand something about
42:49this figure we're landing in the right
42:52country we might not be in the right
42:56province yet but we're in the right
42:59country and that's
43:03that's good to see we ask another
43:05question right this one should be a
43:06little bit simpler can you describe what
43:08figure 14 is related to and we get the
43:10response that it's related to the long
43:12tail of learner variability in the
43:14context of AI education and it goes on
43:16and on this is not true that is figure
43:2013 figure 14 is what we see here which
43:22is unrelated to our response so again
43:25when it comes to the figures the more
43:28pictorial representations or the graphs
43:31there's still work to be done right
43:33which is expected and I think that's
43:35communicated well enough in their blog
43:40content but we would still like to see
43:43it get to the point where it can better
43:45understand more pictorial
43:47representations of data as we go forward
43:50they're clearly working on it but we're
43:52not quite there yet so I would really
43:54view this more as a tool for extracting
43:57structured kind of tabular data versus
44:01this understanding of images and
44:05everything
44:06like that so with that we're going to
44:09send you guys back to Greg who's going
44:11to close us out and take us to Q&A yeah
44:15there you go that was rocking Chris
44:19thanks so much man that was LlamaParse
44:22everybody that's where the current
44:25state
44:27of affairs is and we are going to
44:31conclude for the day that out of the box
44:35LlamaParse is a great place to start
44:37especially in lieu of a custom in-house
44:39solution you probably don't want to
44:42build it into a latency-critical
44:43application today but it definitely has
44:46very nice tabular extraction
44:49specifically for PDFs and again that
44:52too-many-parameters-to-tune issue
44:53they're trying to address they are
44:55addressing it although it is a sort of
44:57black box to us a proprietary solution
45:02which is not open source but is easier
45:04to use as is the nature of these things
45:08as they progress so if you're getting
45:11started and you're not really trying to
45:13mess with parameters it might be a good
45:15solution for you to pick up off the
45:16shelf now it could definitely be a lot
45:19faster we'd love to see figures and the
45:21pictorial representation stuff handled
45:23we'd love to see other data types of
45:25course they're working on it the data
45:26framework trying to lead the way in the
45:28industry has a lot of work to do and a
45:30lot of good stuff ahead of them so we
45:32look forward to continuing to check out
45:33the latest and greatest from llama index
45:35as they release so with that yes we do
45:38have
45:41a Slido you can go ahead and ask your
45:45question directly in the Slido I'll ask
45:47the Whiz to come back up for Q&A now and
45:50we'll keep this link on the screen if
45:52you do have questions you can also throw
45:56them in the YouTube chat
45:56so Whiz a little clarification here on
46:01this idea of RAG ETL being more dynamic
46:08and more complicated than classic ETL
46:10the question why don't ETL decisions
46:13affect the application was I think
46:15brought up when I was talking about this
46:17language from the llama index blog
46:22materials where they said processing
46:27embedding and creating the vector DB are
46:30a situation where every decision
46:35directly affects accuracy in contrast to
46:38the classic ETL stack can you explain
46:41the way you think about this and whether
46:43that's the right way we should be
46:44interpreting
46:47this I would say this is a difficult one
46:50to answer cleanly because I believe
46:52deeply in my soul that ETL decisions do
46:54affect the application I think there are
47:02kind of different axes on which they
47:06might impact things right so if we're
47:07talking about performance-related or
47:10latency-related decisions I mean
47:13ultimately it doesn't matter because
47:15we're going to wind up with a pile of
47:17data and then from there we can do the
47:19other stuff right but how we actually
47:21transform especially can have
47:24significant impacts on performance on
47:27the ability to retrieve and everything
47:30like that so I would say it's very much
47:33still worth paying attention to and very
47:37much still a key part of the plan yeah
47:44thank you for that question that was
47:46anonymous but yeah fact-checking the
47:49specific wording is very helpful for us
47:51and I'm sure for llama index as well
47:53okay so Viges asks what is the
47:57difference between LlamaParse and
47:59multimodal models can multimodal models
48:03also do this kind of OCR including
48:05tables and I think we sort of answered
48:07this throughout right it's not really
48:09doing OCR as far as we can tell is that
48:13right I mean we have no line of sight so
48:16the difference between LlamaParse and a
48:19multimodal model is not well understood
48:21they could be using a multimodal model
48:23on the backend though it seems unlikely
48:27the issue is that we're not quite sure
48:31what they wanted it to be able to do and
48:33we don't know what they did to create it
48:36so I would say at this point though it
48:39seems closer to a PyPDF or PyMuPDF tool
48:41than it does to a full multimodal model
48:45but that's based on I just thought of it
48:47right there are no facts that would
48:51lead me there
48:59and we're going to keep this discussion
49:01going with Islam here keep the questions
49:02coming guys we'd love to continue the
49:05conversation so Islam says in the AI
49:07education use case it retrieved the
49:09correct table image but not the correct
49:11info do you think we could pass the
49:14image to GPT-4 Vision or just the image
49:16for chat Q&A yeah absolutely right if we
49:22can get to the node that has the image
49:24and we can associate that node with just
49:27literally a .png or something like that
49:30right in whatever vector store we're
49:32using then yeah we can build that logic
49:35in where when there's an image we send
49:37it to some kind of process that will
49:39help us understand what the image is
49:42talking about yes I think that's a good
49:44thought
49:50okay cool let's go to this question from
49:53Cena Aizi can you explain your decision
49:54on the recursive query engine as opposed
50:03to say a different retriever recursive
50:05does what we want here very well because
50:08we're dealing with this idea of a
50:10structured piece of information right we
50:12kind of want to laser in on this
50:17particular part of the document and then
50:20make sure we're able to capture the full
50:22table now BM25 and combining these
50:25sparse search methods is still going to
50:28be useful but with the recursive
50:30approach we can more generally guarantee
50:33that we're going to get access to the
50:34full table somewhere in our context and
50:36that's what we need right and as Stephen
50:41noted earlier because we got that kind
50:43of expanded relevant context thanks to
50:45the way we set up the recursive
50:48retriever we were actually able to see
50:50that it understood that that 3,539
50:52number was meant to be in millions right
50:54so that's a piece of context it might
50:57not have retrieved or seen otherwise
51:00which is why I think the recursive
51:02approach is very helpful for this
51:03specific use case that we saw today
51:08yeah cool
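The small-to-big (parent-child) idea behind recursive retrieval can be sketched in plain Python; the nodes, the `ref` link, and the word-overlap scorer below are all illustrative stand-ins for what the llama-index recursive retriever does with real embeddings:

```python
# Toy sketch of recursive (small-to-big) retrieval: match the query against
# short summary nodes, then follow a reference back to the full parent
# object (e.g. a whole table) so no rows are lost. The overlap scorer is a
# stand-in for an embedding similarity search.

full_table = (
    "| Item | 2023 ($M) |\n"
    "|------|-----------|\n"
    "| Gross carrying amount | 3,539 |"
)

nodes = [
    {"id": "n1", "text": "Summary: table of amortizable intangible assets", "ref": "t1"},
    {"id": "n2", "text": "Narrative about data centers", "ref": None},
]
objects = {"t1": full_table}  # parent objects the summary nodes point to

def overlap(query: str, text: str) -> int:
    return len(set(query.lower().split()) & set(text.lower().split()))

def recursive_retrieve(query: str) -> str:
    best = max(nodes, key=lambda n: overlap(query, n["text"]))
    # if the matched node references a parent object, expand to the full object
    return objects[best["ref"]] if best["ref"] else best["text"]

print(recursive_retrieve("amortizable intangible assets table"))
```

Because the whole table comes back as one unit, surrounding context like a "(in millions)" header rides along with the matched cell instead of being chunked away.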
51:11combine sort of two questions here uh
51:13one from uh they're both from Anonymous
51:17how how does llama parse for PDFs
51:20compared to unstructured doio this is
51:22the last event we we did and then have
51:25you done a comparison with other open
51:27source parsers like llm Sherpa is named
51:30here but sort of talk about maybe you
51:33know your perspective on this a little
51:35deeper the real benefit of llama pars is
51:38that it's integrated into the Llama
51:40index ecosystem for me I think that
51:43there are certainly other ways or more
51:46engineering heavy ways that you could
51:48approach these problems and custom build
51:51solutions that are uh that are better
51:53for your particular use cases
51:56uh but I I feel happy to say things like
51:59uh I agree with the benchmarks that
52:01they've released uh llama par does feel
52:04better at preserving structural
52:06relationships than say the the kind of
52:09python packages that are meant to do
52:11this um that might not be true forever
52:13and so the comment might not age well
52:15into the future but as of time of
52:17recording right like uh I think it's I
52:19think it's pretty good um but the main
52:23benefit is that it's baked into that
52:24ecosystem uh I would say that uh
52:28otherwise there there are lots of
52:29options you can explore that are that
52:31are going to be useful to
52:33you so we've got a question that stacks
52:37on some of the theories we heard earlier
52:39GPT-4 Vision can we do this can we do
52:41that how do you chunk PDFs containing
52:44tabular data and maybe I can
52:46contextualize this with another comment
52:48charts are the weak link what is
52:51happening out there to see a chart and
52:53convert it into a kind of reverse prompt
52:56to unwind the data so the model can
53:00understand can you talk a little bit
53:03about how you think about the best way
53:05to chunk PDFs and what the space looks
53:09like as far as you can see yeah I think
53:11it comes down
53:14to wanting our tables preserved as some
53:17whole chunk that's just tremendously
53:20useful right LLMs are good at reasoning
53:24about charts even GPT-3.5 Turbo right so
53:28ultimately I think the way we want to
53:32think about it is that charts tables all
53:34these kind of figure-based elements are
53:36their own nodes and then our text is
53:39nodes around them and they're connected
53:41via some kind of hierarchical metadata
53:43where it's like these are figures from
53:45this section along with these passages
53:48in terms of the prompt you just feed the
53:50table in I mean if you take a markdown
53:52table and show it to GPT-3.5 Turbo it's
53:55going to be pretty good at answering
53:58questions about it and if we need say
54:01aggregate information or derived
54:03information from the chart you can hook
54:05it up to things like code executors or
54:08processes similar to GPT-4's Code
54:11Interpreter where we can actually load
54:13it into some python structure and then
54:15do operations on it but outside of that
54:18I think we just want to make sure that
54:21they're their own unit and that the
54:24relationship between them and the text
54:27around them is clear having them be
54:30their own thing by itself I think is
54:41very useful yeah and
54:43sort of this idea of if you can get it
54:45into a markdown format then you're going
54:48to be able to work with it in a lot of
54:50different ways okay so then stacking on
54:53that a little bit I know LlamaParse is
54:58still a black box for us but a question
55:01from another attendee is how effective
55:05is this I interpret this as how well
55:08does it maintain the structure of the
55:11tables row span column span I don't know
55:19if we're talking about the distance
55:23between characters here
55:26I mean it's pretty good it's just a
55:29markdown table though so it is good at
55:33preserving the structure of the table it
55:36is not good at preserving presentation
55:40you would not be able to recreate the
55:42table as it appears in the PDF from the
55:45markdown that you receive there's zero
55:47percent chance if it was formatted in a
55:51specific way things like borderless or
55:53left-oriented or right-oriented or the
55:55colors you're losing all of it it's just
55:58going to show you that there's some
56:01table and that table has values and
56:03here's how they look but that's all
56:05we're getting we're getting zero
56:07information about the way that it's
56:09visually presented which is by design
56:12frankly it's not meant to reconstitute
56:16the table it's just meant to extract it
56:18so we can turn it into something like a
56:20CSV and work with it
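That markdown-to-CSV step is easy to sketch in plain Python; this is an illustrative helper, not part of LlamaParse itself, and as noted above the layout details (borders, alignment, colors) are already gone at this point, so only the values survive:

```python
# Illustrative helper: turn a LlamaParse-style markdown table into CSV so it
# can be handed to pandas, a code executor, etc. Not part of LlamaParse.
import csv
import io

def markdown_table_to_csv(md: str) -> str:
    out = io.StringIO()
    writer = csv.writer(out)
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # skip the |---|---| separator row
        if all(set(c) <= set("-: ") for c in cells):
            continue
        writer.writerow(cells)
    return out.getvalue()

md = """| Item | 2023 ($M) |
|------|-----------|
| Gross carrying amount | 3,539 |"""

print(markdown_table_to_csv(md))
```

Values containing commas (like `3,539`) come out CSV-quoted, so downstream tools parse them back as single cells.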
56:25right that's right okay the great
56:28questions keep coming we've got a couple
56:33let's rapid fire these Chris real quick
56:37can we use llama index for retrieval and
56:40connect it to LangChain is there any
56:45reason why not LangChain can understand
56:53markdown just fine as can anything that
56:54can convert markdown into some other
56:56useful file format or chunk it and
57:00convert it LlamaParse is a great tool
57:01because it just spits out a markdown
57:03file or a text file so you can integrate
57:06it into a lot of other pipelines all
57:08right next is recursive retrieval
57:10similar to parent-child retrieval
57:16yes okay all right and then to keep up
57:25with the times here Tariq in the chat
57:31asked is it even worth implementing RAG
57:34with these large-context-window models
57:38like Gemini I mean should we just throw
57:40it out the window yes absolutely
57:44it is still worth implementing RAG
57:48period we can talk about all of the axes
57:51we can examine that make this true cost
57:55effort accuracy confabulation
58:04hallucination RAG is still a hugely
58:07powerful component because it lets us do
58:13what we want to do which is answer
58:16specific questions about specific things
58:18in specific documents the large context
58:20window is amazing and it's going to help
58:23us do a lot of really awesome things but
58:29for right now those things just don't
58:33push out RAG and you know what
58:36that reminds me and I guess it's a good
58:37spot to wrap on is this idea of context
58:39augmentation from llama index which
58:43appears to be telling us that perhaps
58:45there are other patterns beyond RAG that
58:48they may have in mind for context
58:51augmentation in the future so stay tuned
58:55for what happens next the industry is
58:57going to continue to evolve and we'll
58:59keep you up to date and up to speed on
59:01the latest and greatest Whiz thank you
59:06for the wisdom and thanks everybody for
59:09joining
59:09today if you're interested in learning
59:11when and how to do fine-tuning when
59:14you're actually done with RAG that's
59:15what we're covering next week on
59:17Wednesday live same time and of course
59:20please like subscribe and ring that bell
59:23to stay up with all events as they drop
59:27live or ones that we upload if you are
59:31seriously ready to accelerate your LLM
59:34application development like seriously
59:36ready to accelerate then check out our
59:40AI Engineering Bootcamp our
59:44industry-leading cohort-based live
59:47online course where you can fill all
59:49your skills gaps from building to
59:51operating to improving LLM applications
59:56if you just enjoyed engaging with us in
59:58the chat or even just watching the chat
01:00:00go ahead and join our Discord because we
01:00:03can keep the conversation going in there
01:00:04and get you down the path to starting to
01:00:07build ship and share with us if you're
01:00:09not really somebody who wants to engage
01:00:13but you just want to tinker on your own
01:00:14we've got a few resources to share one
01:00:16is our Awesome AIM Index which gives you
01:00:19direct access to code from all of our
01:00:23events and the other is our recently
01:00:25released open-sourced LLM Ops LLMs in
01:00:31Production cohort one materials
01:00:34including the entire GitHub repo check
01:00:38that out if you're trying to get up to
01:00:39speed that's pre-v0.1 stuff from 2023 we
01:00:43look forward to open sourcing more as we
01:00:45move forward finally any feedback you
01:00:49can provide us is great either through
01:00:50Luma or through our feedback form and as
01:00:52always thank you so much everybody until
01:00:55next time keep building shipping and
01:00:58sharing and we'll do the same we'll see
01:01:00you soon have a great week