00:22 Hey Chris, you hear about the new LlamaIndex v0.10 release and that new LlamaParse library? You hear about this? Yes, I sure did hear about that, Greg. Yeah man, it says that it can actually parse embedded tables and figures. You ready to check this out today and see if it does what it says on the tin? I did hear it does that, yeah, and absolutely, I can't wait to dig in. Yeah man, well, let's get right into it today. We'll see you back for the results and conclusions on exactly what this thing is doing for us.
01:01 Welcome, everybody! My name's Greg, that's Chris, AKA Dr. Greg and The Wiz. We are co-founders of AI Makerspace, and today we're going to look at one of the newest tools to hit the open-source AI LLM builders' market: LlamaParse. We want to take a close look at this and see if it actually improves on the RAG for complex PDFs that we've looked at previously, and that the entire industry — really, all industries — will continue to look at. If you've got questions that pop up throughout today's demo, please drop them in the Slido link that we'll throw in the chat now. But with that, let's go ahead and get
01:45 right into it. We want to cover a few things, because there's been a lot of new stuff released from LlamaIndex — including LlamaParse, but also even a little bit more — so we're going to see if all this comes together to give us a really superior, production-grade RAG experience. As with all sessions, we're going to align our aim today and figure out exactly what you're going to get if you stick around with us for the hour: we're going to do an overview of v0.10, we're going to understand LlamaParse's performance on embedded tables and figures — this is what we set out to take a really close look at — and we're going to see exactly how to build a query engine using LlamaParse for your documents that you can leverage in your RAG applications. So first, we're going to go ahead and check out LlamaIndex v0.10 and LlamaParse. We're going to sort of
02:42 review LlamaIndex RAG just to contextualize this a little bit. A lot of the docs have been changing with a lot of these tools, and we want to keep you updated on the latest and greatest in the way people are communicating these tools and their capabilities. So of course we start with the new release, LlamaIndex v0.10. This is the sort of next step — along with the LlamaCloud platform that was released — towards making LlamaIndex a real next-generation, production-ready (and we still see this keyword again here) data framework for LLM applications.
03:17 Similar to other new releases we've seen, we see that llama-index-core is going to contain the main abstractions that we have talked about previously and that many of you are probably familiar with already, and the separation between the core constructs and the third-party integrations is the key aspect of LlamaHub. Additionally, the service context object — which, if you've been building with LlamaIndex, you're familiar with — has become cumbersome over time and increasingly difficult to use. It was meant to be used as an intermediate user-facing layer to let you define parameters, but it's sort of become not really the best solution given this new core-versus-LlamaHub differentiation. So service context is no longer going to be part of your build if you start upgrading to the new version. And of course the number of third-party integrations is growing — lots and lots and lots of them, many hundreds at this point — which is really, really cool.
04:39 This is kind of from their blog: the core package underlies everything, and then we have the integrations, we have all the Llama Packs, and then there's some experimental and fine-tuning stuff happening as well, so keep an eye out for that. But the big takeaways from v0.10 are the service context removal and the core-versus-LlamaHub split.
05:04 Right, so let's talk a little bit about LlamaIndex in general. They've updated all of their docs as they came out with v0.10, but they still are very much a data framework — this is something that's unique in the industry — and they're focused on helping you build LLM applications that can benefit from, quote, "context augmentation." This is the big idea behind LlamaIndex: context augmentation. Let's demystify this idea of context augmentation for a second. What are we talking about? We're talking about augmenting the prompt in the context window of the LLM. That's it. We're talking about RAG. That's what we're talking about.
06:02 Why RAG? Well, RAG because we don't like confident responses that are false — hallucinations, fake news; nobody likes it. We need to be able to fact-check with reference material that we can add to our prompt, augment our prompt with it, and then we can generate better answers. Now, when we talk about context augmentation, one way to think about RAG — one that we've been communicating to our audience, and that we encourage everybody to break down into its core component pieces — is as dense vector retrieval plus in-context learning, and we sort of see this context augmentation
06:51 here. So we're going to walk through it. When you ask a question, we're going to send that question to an embedding model, which is going to create a vector format of that question. After the tokenization and the embedding process, we're going to then look in our vector store — our vector store being made up of our documents — and we're going to look for stuff similar to the question using a simple similarity metric. We'll set up a prompt template that we can use to provide context, and it'll say something like "use the provided context to answer the user's query; don't answer if you don't know," and we can then take the materials that we find that are similar and shove those into the prompt, into the context window.
07:42 This process is dense vector retrieval. We're simply using a dense vector representation — or sometimes it's a sparse-dense hybrid, which can be a little more computationally efficient, and if you're using something like Pinecone it's available out of the box — but let's say it's dense vector retrieval, in general, in a naive way, and we're returning natural language context to the prompt. Now, as we set up our prompt template and we're giving it more context, this is the in-context learning piece. This is the big idea from the GPT-3 paper, "Language Models are Few-Shot Learners."
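The flow just described — embed the question, find similar chunks in the vector store, and stuff them into a prompt template — can be sketched in plain Python. This is a toy illustration: a bag-of-words "embedding" stands in for a real embedding model, and none of this is LlamaIndex's API.

```python
from collections import Counter
from math import sqrt

# Toy "embedding": a bag-of-words count vector. A real system would call
# an embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Our "vector store": documents stored alongside their embeddings.
documents = [
    "NVIDIA reported revenue growth in its 10-K filing.",
    "The report discusses AI in teaching and learning.",
]
store = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Dense-vector retrieval: rank stored chunks by similarity to the query.
    q = embed(question)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Augment the prompt with the retrieved context -- the "context augmentation"
# that gets sent to the LLM.
PROMPT = (
    "Use the provided context to answer the user's query. "
    "If you don't know, don't answer.\n\nContext: {context}\n\nQuery: {query}"
)

query = "What revenue did NVIDIA report?"
context = " ".join(retrieve(query))
prompt = PROMPT.format(context=context, query=query)
```

Swapping the toy `embed` for a real embedding model and sending `prompt` to an LLM gives you exactly the naive dense-retrieval RAG loop being described here.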
08:25 Now, together, these two things are RAG — they are context augmentation. The important thing to note about this setup is that it is completely independent of the LLM that you put it all into when done; the LLM just provides the response in the end. So we're augmenting the context window; we're augmenting the prompt.
08:56 Why are we doing this? Well, we're doing this because when we prototype, we want to make sure that we're going through the same industry-standard process that everybody goes through: we start with prompting, we move to RAG, and then generally we're thinking about fine-tuning. It's not always linear, but this sort of mental model, shown here, is to start with prompt engineering; you can think about optimizing the context — what the model needs to know — through RAG, and you can also think about optimizing the LLM — the way the model needs to act — through fine-tuning. Eventually you'll probably do both, and end up fine-tuning both your embedding model and your chat model as you try to reach human-level performance in your application. Here's an example from OpenAI DevDay: generally we're seeing an order of operations where we do RAG before fine-tuning. RAG is generally a cheaper first step, it's going to be easier to update with the latest information, and it's going to give us that fact-checking ability.
10:38 So what you can take away from this is that, again, RAG is context augmentation. RAG is context augmentation — and note this is all independent of the LLM. The cool thing about LlamaIndex as a data framework is that LlamaIndex, and RAG in general, really pose no restriction on how you use the LLMs. You can of course go to fine-tuning, you can put different things into different LLMs, but it's really, really focused on the data piece, on being data-centric — because the data-centric paradigm hasn't gone anywhere. And as we heard in LlamaIndex's v0.10 release, RAG is only as good as your data.
11:26 I like the sort of mental model — the framework — that they provided in that blog of the RAG data stack, which is different from classic ETL: we're going to load the language data, we're going to process the language data, we're going to embed the language data, and then we set up our vector DB. What's interesting about this is that as we're processing the data, we're chunking, we're tokenizing, we're deciding on chunk sizes, and we're trying to figure out: do we need any sort of metadata or hierarchy, and how exactly are we setting up the way to think about this, either short form or long form? As we do embeddings, there are obviously many different embedding models, and you can fine-tune them as well. And then the actual vector database setup — or the setup of many indices, many vector databases, and how exactly to move between them — is sort of an art unto itself, especially depending on your use case.
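That load → process → embed → store stack can be sketched end to end in a few lines. Everything here is illustrative: a hash stands in for a real embedding model, and a list of dicts stands in for a vector DB.

```python
import hashlib

# 1. Load: pretend we've read a long document from disk.
raw_document = "Page one text. " * 40

# 2. Process: chunk the text -- chunk size and overlap are two of the
# decisions mentioned above.
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# 3. Embed: a stand-in embedding (hash -> small float vector); a real stack
# would call an embedding model here.
def embed(text: str, dim: int = 8) -> list[float]:
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# 4. Store: the "vector DB" is just a list of (chunk, vector, metadata) records.
vector_db = [
    {"chunk": c, "vector": embed(c), "metadata": {"chunk_index": i}}
    for i, c in enumerate(chunk(raw_document))
]
```

Each stage is a decision point — chunking strategy, embedding model, index layout — which is exactly why the RAG data stack differs from classic ETL.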
12:37 So, in contrast to classic ETL, all of the decisions that we're making in the RAG stack affect the ultimate application. And the key pain points for people building today — this is again from LlamaIndex, which we generally absolutely agree with, and this is what we're hearing from folks out there in the marketplace as well — are that results are sort of not right: they're not accurate, they're not good enough a lot of the time. And there are just too many things to think about, from chunk sizing to hybrid retrieval to exactly which model to use to whether I should fine-tune — all of this stuff — and it's just a lot to deal with. And then of course everybody has just ridiculous amounts of portable-document-format PDFs sitting around that they'd love to be able to use, and PDFs are famously hard for us as humans to deal with, and they've been famously hard for LLMs to deal with as well. So that's where we are today as we try to tackle LlamaParse: we're looking at these pain points — not accurate, too many parameters, and PDFs. The data-syncing issue — sort of live data as it's changing and updating — is a separate issue that we're not going to cover
14:02 today. So let's talk about LlamaIndex constructs for a second. We've covered this previously — we're going to link a few events where we have covered it — but we essentially have a way to ingest data (those are data connectors); we have a way to structure the data (those are indices, and vector databases are the simplest type of index); and then there are different engines. We're going to build a query engine today, and as many of you have heard us say before, query engines are to LlamaIndex sort of as chains are to LangChain — the query engine is really at the heart of LlamaIndex. Now we're seeing this chat engine emerge from LlamaIndex as well, which is cool, and that is going to dovetail directly into this idea of agents — you know, data-framework agents — because with the chat interface you're able to go back and forth a little bit better as you engage with and interact with your applications, with reasoning and different cycles of decision-making.
15:20 Again, if you would like to know more about the constructs, we've covered this in previous events, digging in deeper to the core LlamaIndex package that has been put together in v0.10 — definitely check those out. But that's enough background for
15:37 today. We're here to talk about LlamaParse. It's in public preview mode, and we want to understand exactly what it's doing, what it's doing well, what it's not doing so well, and what to expect in the future. Really, LlamaParse at the highest level is proprietary, and it's a parsing algorithm for documents that have embedded objects. We read that it had embedded table and figure capability, and we wanted to check that out. It's also allowing us to build retrieval over more complex documents — sort of semi-structured, meaning they have tabular data, and unstructured, meaning language data — so documents with both tables and prose. And this is all in the spirit of going towards production-grade context augmentation.
16:34 Now, LlamaParse is built on the retrieval algorithms and work that LlamaIndex has done previously, so it's sort of a first step, and that step is to parse out tables and text in markdown format — because they've built a lot of tools already that integrate very, very well with that markdown format, so you can build more complex RAG systems with more complex data. Now, the sort of flagship example here from their release is the Apple 10-K filings: they did a comparison of LlamaParse versus PyPDF over these 10-K filings, and they also compared PyMuPDF, Textract, and PDFMiner. You'll notice that the red is where the information was not extracted very well — so, a lot of red. PyPDF was the least red amongst the baseline comparisons, so it was the second best, and then you'll notice there are a few red pieces in the LlamaParse one — I know you probably have to squint to look at this, but like here and here. So it's not perfect today, but it is an improvement over the standard tooling. So let's get
18:07 into our testing. What we did is we said: OK, well, we want to test if this can work on, of course, the classics, right? So we picked up an NVIDIA 10-K filing. We were also very interested in whether this could work on infographics — could this work on more complex figures embedded in documents? At first I was throwing out a couple of image ideas, but images aren't really the same as PDFs, so in addition to the NVIDIA filings we found a great document that's related in many ways — in sort of meta ways — to what we're doing now: "AI and the Future of Teaching and Learning," from the Office of Educational Technology, May 2023. It's a long PDF document, 70-plus pages; the NVIDIA 10-K filing is 90-plus pages. So: long, chunky documents, with lots of infographic-esque figures. We're wondering, you know, can it extract the text, can it extract the numbers, what's going on? Now, drum roll please.
19:17 Well, here's what we found, in conclusion — and we'll walk you through how we got here, but we'll give you the conclusions kind of up front. When it came to parsing, there was very inconsistent speed, and especially with the recursive retriever that we built — which was the one that they recommended — it actually took minutes to run requests. Now, the tabular extraction — the tabular data — was very good, and when it worked, it worked very, very well, so definitely that was the shining highlight. Although, there was no figure extraction, and I think this was perhaps something that we misread, because as we double-clicked in and looked specifically at the release blogs, we noticed that they're actually still in the process of building out better support for figures for other document types. And of course this is the natural progression, right? As you get into this figure space, you're kind of getting into the image space, so I can imagine how challenging a problem this is — and of course this is the kind of feedback that they're getting from lots of folks. I'm sure that you're very interested in the day when we can do figure extraction, but that day is still not
20:48 today. So how did we figure this out? Well, let's go through specifically and exactly how LlamaParse is working, end to end. 21:02 We used some simple models: OpenAI's text-embedding-3-small — the latest and greatest from them, but the small one — and OpenAI's GPT-3.5 Turbo. We built a recursive query engine according to their recommendation, and we used the BAAI bge-reranker-large. This is kind of the old two-step, right? You do embedding-based retrieval to get the docs, then you rerank them, and that's it.
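That "old two-step" — embedding-based retrieval to get candidate docs, then a reranker to reorder them — looks roughly like this in plain Python. Toy scores and a word-overlap "reranker" stand in for real embedding similarity and the BGE cross-encoder; none of the names below are from LlamaIndex.

```python
# Candidate documents with precomputed first-stage retrieval scores
# (in a real pipeline these come from embedding similarity).
candidates = [
    ("NVIDIA's revenue grew year over year.", 0.71),
    ("The 10-K filing lists revenue by segment.", 0.69),
    ("Unrelated text about office furniture.", 0.40),
]

def first_stage(query: str, k: int = 3):
    # Step 1: embedding-based retrieval -- take the top-k by vector score.
    return sorted(candidates, key=lambda d: d[1], reverse=True)[:k]

def rerank(query: str, docs, k: int = 2):
    # Step 2: a cross-encoder-style reranker scores each (query, doc) pair
    # jointly. Toy version: count shared words.
    def score(doc):
        return len(set(query.lower().split()) & set(doc[0].lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

query = "revenue by segment"
results = rerank(query, first_stage(query))
```

The reranker is slower but more accurate per pair, which is why it only runs over the small candidate set the first stage returns.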
21:36 So, you know: we had the NVIDIA 10-K filings and the Department of Ed report; we used OpenAI models, the recommended recursive retriever, LlamaParse, and v0.10. And with that, I'm going to kick it over to the Wiz to show you exactly how this looks in code, and to give you some more nuanced vibes and information related to what you might be able to expect in your application. Wiz, over to you,
22:05 man! Thank you, Greg! Yes, OK, so we're going to go ahead and drop this notebook into the chat so you can follow along, and we're just going to go through a couple of things. We're going to start with a straightforward portion of this, which is getting LlamaParse to work, and then we'll move on to creating those retrieval pipelines we saw — those query engines that Greg was describing.
22:38 So, first things first: the LlamaParse release comes along with LlamaCloud. LlamaCloud has more than just LlamaParse, but for right now that's what we're going to leverage it for. Basically, LlamaParse is exactly as was described: it is a proprietary algorithm that they're using, and it is behind an API, so we don't have any access to exactly what's happening behind the scenes — we can kind of infer what might be happening — but the idea is that it's an API that accepts PDFs and returns documents, and it can return those documents in multiple formats. One of those formats is markdown, and the power of having it return markdown is that markdown can help us capture these kinds of structural relationships within our documentation. We can use LlamaIndex's markdown node parser from there to help us really understand what's going on in these documents, so that's great.
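What "capturing structural relationships" buys you is easy to see with a few lines of plain Python: once the output is markdown, headings let you split the document into labeled sections, which is conceptually what a markdown node parser does. This sketch is ours, not LlamaIndex's implementation.

```python
def split_by_headings(markdown: str) -> list[dict]:
    """Split a markdown document into sections keyed by their heading."""
    sections, current = [], {"heading": None, "lines": []}
    for line in markdown.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous section (if it has content).
            if current["lines"] or current["heading"]:
                sections.append(current)
            current = {"heading": line.lstrip("#").strip(), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)
    return sections

doc = """# Revenue
Total revenue grew 126%.

## By Segment
Data Center led growth.
"""
sections = split_by_headings(doc)
```

With plain-text output these boundaries are gone, and every chunk boundary becomes a guess — that's the case for requesting markdown.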
23:43 We also have LlamaIndex v0.10 — a huge release. Basically, it is exactly the same kind of thing we've seen recently from other libraries, including LangChain, which is this idea that things were getting kind of bloated: we have a lot of different possibilities, a lot of different things we can do, and they were kind of glutting up the core library, so those were split. Effectively, LlamaHub is now the source of truth for everything community and integration, and llama-index-core focuses on just what LlamaIndex is supposed to do, which is awesome.
24:24 So how do we use the actual tool? Cool — well, first of all we're going to need some dependencies, so we're going to grab llama-parse from our dependencies, and then we need to create a LlamaCloud API key. We're going to do this through LlamaCloud, so I'll just zoom in some here so you guys can see it. Basically, when you arrive on this LlamaIndex page, we can go to the API Key resource in the bottom left, generate a new key, give it a name, and then store it somewhere safe — so it's as easy as it gets. You'll also notice — I'll zoom in a lot here so you can see real clearly — that you get quite a few pages per day that you can use with the PDF parser, and you'll also notice that this is a PDF parser: right now this is only something that works with PDFs; no other file types are currently accepted. As well, we can see when we look through the code that there are only two return types, which is either text or markdown. But you get 10,000 pages per day, which is pretty awesome. So let's
25:34 head back to the notebook. Once we have our API key, we can provide it here. We'll also be using OpenAI, so we're going to slap our OpenAI key in here, and then we need to do this classic cheat code for Google Colab: `import nest_asyncio` and `nest_asyncio.apply()`. We're going to be using asynchronous functions here, and there's no other way around this — we just need to run it, like boilerplate, if you're running this in a notebook. And, answering a question from the chat: that's correct, yeah, absolutely — because it only accepts PDFs, unless your file is a PDF you're going to have to find a way to get it into a PDF. Luckily, a lot of files are already PDFs, or they're easily converted, so it should be a lower burden to convert a file to PDF than it is to convert from PDF — and that's the problem that this tool is solving. So the next thing we're going to
26:31 do is just initialize LlamaParse — couldn't be easier. We set up our object, and we're going to say we want it in markdown. This is because we're very keen on that structural relationship — I'll just zoom in a little bit here. You know, we really want to know what the structured data is saying, and we want a way to interface with it that preserves that structure, right? So the plain-text option doesn't really help us do that as well, whereas markdown gives us notation that we can use with the markdown node parser to understand structural relationships. Verbose equals true or false — it's up to you how much text you want to read. Language: there are a number of languages that are supported; the default is English, and in this example we'll also be using English. And then of course we have a number of workers: we're going to go ahead and set two workers, because we're going to parse two files. That's the idea — you can have up to 10 workers at a time, so you can do this kind of in batched sets of 10.
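With a cap of 10 workers at a time, parsing a larger set of files means batching them into groups. A minimal sketch — only the 10-worker cap comes from the talk; the helper itself is ours:

```python
def batches(files: list[str], max_workers: int = 10) -> list[list[str]]:
    """Split a list of files into groups no larger than the worker cap."""
    return [files[i:i + max_workers] for i in range(0, len(files), max_workers)]

# Hypothetical file names, just to show the grouping.
files = [f"report_{n}.pdf" for n in range(23)]
groups = batches(files)
```

Each group would then be submitted as one parsing job before moving on to the next.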
27:36 Once we've done that, we're going to upload some files to our Colab instance. Pay close attention to what you're saving these as — this is the file name that you'll need to send to LlamaParse. If I look in my files here, you can see I have done this process twice, so I have two different versions of the files, and I need to make sure that I have the actual correct file when I go to send it to LlamaParse. So please pay very special attention to this: if you've named your file something else, you'll have to take the name right from here in order for this to work, if you're following along in the future. And we do the same thing for the AI report, which Greg alluded to earlier — that's the artificial-intelligence report. Pretty cool document, right? It's got sweet graphs, it's got sweet figures. We're going to see how well LlamaParse stacks
28:34 up. The next part is the actual parsing. 28:37 This part's totally opaque to us: we just send the file to an endpoint, and then at some point we get back a response, and that response is documents that can be parsed easily through LlamaIndex, as they're obviously tightly integrated. You don't have to use LlamaIndex — the documents are just markdown, so you can use whatever you'd like past this step — but obviously LlamaIndex is paying special attention to their ecosystem, so that's where we're going to stay today. Again: only PDF files. This is also a very inconsistent process, I've found — sometimes this can take a very long time, sometimes it doesn't take very long at all. Once you've done it for a file, I have found that there's likely some kind of backend caching here, because each subsequent or repeated attempt to parse these files is very quick, but that first time is very inconsistent. The AI report took quite a long time to finish, whereas the NVIDIA 10-K filing took much less time, so your mileage may vary. I would not build this into a latency-critical application at this time, but for offline or batch processing it seems super
29:55 dope. You know, it's the classic pattern: we start a job, and then at some point it returns this `documents` object, which is going to be a list, and that list is going to have objects in it. We can take a peek at them and see that this markdown most assuredly preserves some context — there's no doubt about that; this is a table structure in markdown. We definitely have some idea of structure that's being preserved, which is very important and desired, so first of all, that's awesome to see. We can also look at our AI report, which has, I believe, literally zero tables — I think there are a couple at the end that are very simple — but you can see it still correctly identifies markdown, and this is important: the idea of the markdown being preserved is huge.
30:55 And from chat, Matt is suggesting: well, you know, we also get HTML or CSV versions of earnings or filings — and that's absolutely true. If that's the information you're looking for, I think that's probably going to be best; but if you're looking for a combination of that semantic information and the actual structured data, I do think that this is an excellent resource — or in cases where you're looking at reports that don't have those formats provided, of which there are, unfortunately, quite a lot. So we can see it does the thing: it gives you markdown, and the markdown can be leveraged to understand some kind of structure about the document, so that's great. Let's build a query engine to see if this is actually useful — we can see that, yes, this is markdown, but can it be leveraged usefully?
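To see why the preserved table structure matters, here is a minimal plain-Python parser for a markdown pipe table like the one in the output (toy data, not the actual parsed filing):

```python
def parse_table(markdown: str) -> list[dict]:
    """Turn a markdown pipe table into a list of row dicts."""
    lines = [l for l in markdown.strip().splitlines() if l.strip().startswith("|")]
    rows = [[c.strip() for c in l.strip().strip("|").split("|")] for l in lines]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, r)) for r in body]

table = """
| Segment     | Revenue |
|-------------|---------|
| Data Center | $47.5B  |
| Gaming      | $10.4B  |
"""
rows = parse_table(table)
```

A plain-text extraction of the same table would typically flatten the cells into one run of words, and this row/column mapping would be unrecoverable.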
31:57 So the first thing we're going to do is talk a little bit about LlamaIndex v0.10. Now, if you're used to LlamaIndex and you've watched some of our previous events on it, you'll remember me talking about service context, global context, setting context — all of this. Gone. It's all gone. Context is dead; we're all about Settings now. We still have this idea of a global Settings object that we can set, so that we can fall back on specific user-set defaults, but we don't have this idea of a service context that we need to pass around and manage. Instead, we're just going to do that how you'd expect, normally, by passing things into their constructors. This is probably one of my favorite changes from v0.10 — just kind of normalizing this library to the rest of the ecosystem feels really good.
32:50 But we still have Settings, so we're going to set our base LLM as GPT-3.5 Turbo, and we're going to set our OpenAI embeddings as text-embedding-3-small, which is the successor to Ada — it is just as good as or better than Ada, and it costs less, so that's why we're using small today. You'll notice that we're using GPT-3.5 Turbo, so we're really not relying on the LLM's ability to understand the structure; we're really relying on the retrieval process's ability to correctly represent that structure, which is something that's a little bit different. If we used GPT-4 here, there'd be the potential to say, "oh well, GPT-4 is just really good at this, actually, guys." GPT-3.5 is good; we're going to stick with
33:48it and then we're gonna use the markdown
33:50element parser uh this is the thing
33:53that's going to uh you know really make
33:57sure that we're uh you know we're
34:01squeezing as much juice as we can from
34:03these markdown files right the markdown
34:06element node parser is specifically
34:08built to parse markdown elements right
34:11so uh we can see that we have this this
34:14entire idea of how to uh to use it their
34:18docks are are pretty good on this and
34:20the idea is simply that this is going to
34:22help us parse out this markdown into the
34:26constituent Parts uh so those cons
34:29constituent parts are uh you know what
34:32allows us to understand the structured
34:35versus unstructured nature of the data
34:37right so uh we want both we want
34:39semantic information to answer questions
34:42about semantic questions and we want
34:43structured data to answer uh those those
34:46kinds of semantic questions that rely on
34:49some context that's contained within uh
34:51tables or figures right all we've got to
34:54do is run this when we run the actual
34:58get_nodes_from_documents you'll notice
35:00that it pretty frequently fails this
35:04does not mean that the nodes are not
35:06created and it does not mean that the
35:08total process fails it just means that
35:11sometimes it's not able to understand
35:13the markdown it received from the
35:15LlamaParse endpoint so we see some
35:18errors it's not a big deal this is
35:20exactly as Greg showed we're dealing
35:23with a document that's quite long in
35:28both cases so the idea that there are
35:32going to be a few misses is totally
35:34expected this is a preview their first
35:37shot on goal but it is worth noting that
35:40there is some potential that you're
35:44going to miss and when you do miss it
35:47means we're not fully capturing the
35:49structure of our data which could lead
35:51to some potential issues but if we only
35:54miss sometimes that's obviously much
35:57better than missing all the time or
35:59missing a lot which was the case with
36:01some of the other methods we've seen in
36:03the past so this is still an improvement
36:08even if there are still some kinks to
36:09be worked out
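As a rough mental model of what this element parsing does, here's a toy sketch (not the actual llama-index implementation) that splits a markdown document into prose chunks and table chunks, the way the MarkdownElementNodeParser separates unstructured from structured content:

```python
# Toy sketch of markdown element splitting: separate prose from tables.
# This is illustrative only, not the llama-index MarkdownElementNodeParser.

def split_markdown_elements(markdown: str):
    """Return a list of ("text" | "table", content) tuples."""
    elements, buffer, table = [], [], []
    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):  # a markdown table row
            if buffer:
                elements.append(("text", "\n".join(buffer)))
                buffer = []
            table.append(line)
        else:
            if table:
                elements.append(("table", "\n".join(table)))
                table = []
            if line.strip():
                buffer.append(line)
    if buffer:
        elements.append(("text", "\n".join(buffer)))
    if table:
        elements.append(("table", "\n".join(table)))
    return elements

doc = """# 10-K excerpt
Some narrative text.

| Name | Age |
|------|-----|
| Debora Shoquist | 69 |

More narrative text."""

elements = split_markdown_elements(doc)
print([kind for kind, _ in elements])  # ['text', 'table', 'text']
```

Once the table is its own element, it can be summarized, indexed, and retrieved as a unit instead of being shredded across arbitrary text chunks.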
36:13once we have our nodes we're going to
36:14grab our nodes and objects so we can
36:16create our vector store index we're
36:18going to have our nodes be those nodes
36:20plus those objects very important page
36:24number info is just metadata so that's
36:29it there's nothing to it other than
36:33that we have metadata via the nodes
36:36because we know which page we're on so
36:39we've already captured that metadata
36:43and we have another question in the
36:45chat that I'll just answer since we're
36:48all together right now which is is
36:50there an easy way to review yes
36:53absolutely you can tell exactly which
36:56nodes failed it does tell you the node
36:59that had the failure so you're able to
37:02go in and check and make sure that that
37:04data is in a format that you want okay
37:08so we've got our nodes parsed and we
37:13accept that a couple of them didn't
37:14work out hey it happens right
37:16this is new technology once we've got
37:18our vector store set up so this is our
37:20index now we're going to create our
37:22recursive query engine we're going to
37:23use a reranking process the
37:26FlagEmbeddingReranker which is going to
37:28be powered by the BGE reranker large
37:31right and we're going to set up a
37:34recursive query engine we need to
37:37install some requirements we need the
37:38FlagEmbeddingReranker postprocessor and
37:40we also need to grab FlagEmbedding from
37:42its repo once we've installed these two
37:44requirements we can initialize our
37:47FlagEmbeddingReranker a reranker for
37:52those of you who aren't sure what that
37:57is right basically when we get a bunch
37:59of things that are likely related to our
38:03query we have the chance to reorder that
38:04list using a more compute-intensive or
38:08time-consuming process that's more
38:11accurate right so you can think of it as
38:13we very quickly cast a wide net and then
38:15we slowly look through what we've got in
38:18that net and take our time to reorder
38:20that list so in this case we're going to
38:23go from 15 retrieved contexts down to
38:29the top five of those 15 right but top
38:32five of 15 I mean even if the process
38:35takes say a millisecond per candidate
38:39we're still only talking about 15
38:40milliseconds which is not bad it's not
38:44great let's be real but it's not bad and
38:47that's the idea of a reranker when it
38:49comes to our similarity top k we're just
38:52going to grab the top 15 results and
38:55then rerank them that's it that's all
38:57there you go
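The cast-a-wide-net-then-rerank pattern can be sketched in a few lines of plain Python; the two scoring functions below are cheap stand-ins for the embedding search and the BGE cross-encoder, not the real models:

```python
# Toy retrieve-then-rerank sketch. The scoring functions are stand-ins:
# cheap_score mimics fast first-pass retrieval, expensive_score mimics a
# slower but more accurate cross-encoder reranker.

def cheap_score(query: str, doc: str) -> int:
    # fast first pass: count of shared words
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query: str, doc: str) -> float:
    # stand-in for a reranker: overlap normalized by document length,
    # which rewards short, focused matches
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)

def retrieve_and_rerank(query, corpus, wide_k=15, top_k=5):
    # stage 1: cast a wide net with the cheap score
    wide = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:wide_k]
    # stage 2: slowly reorder the net's contents with the expensive score
    return sorted(wide, key=lambda d: expensive_score(query, d), reverse=True)[:top_k]

docs = ["the annual revenue table", "revenue", "weather report", "cats and dogs"]
print(retrieve_and_rerank("annual revenue", docs, wide_k=3, top_k=2))
# ['revenue', 'the annual revenue table']
```

The design point is exactly the one described above: the expensive scorer only ever sees `wide_k` candidates, so its cost stays bounded no matter how large the corpus is.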
39:01so we've set up our retrieval we've set
39:04up our query engine we've parsed our
39:06documents they're all in this index now
39:08let's do the thing right so we can ask
39:10questions like who is the executive VP
39:12of operations and how old are they right
39:15we use the recursive retrieval engine
39:19here so we have a number of requests
39:21that kind of parse through these nodes
39:22which is pretty dope you can see here
39:24we've got that little table right huge
39:27you love to see that and eventually we
39:30wind up with this response Debora
39:33Shoquist is the Executive Vice President
39:36of Operations she is 69 years old that's
39:39exactly right I mean this information is
39:41not mentioned anywhere else in the
39:42document that's huge if you're running
39:44this in a
39:47notebook on a CPU instance you're going
39:50to notice this takes a very long time
39:51right we're using this BGE reranker
39:55which is a pretty beefy model right if
39:59you're using it on CPU this query is
40:01going to take a long time if you're
40:03using a GPU-accelerated instance which
40:04you can select through Runtime then
40:06change runtime type and then select a
40:09GPU instance you're going to notice it's
40:10a lot faster so just keep in mind that
40:12the slowness is not representative of
40:15llama index or this technology in
40:17general it's basically just dependent on
40:19which resources you select that's not
40:22true of the actual LlamaParse endpoint
40:24but it is true of this retrieval process
40:33and then we can ask questions like what
40:34is the gross carrying amount of total
40:36amortizable intangible assets for 2023
40:40what a mouthful right well we're able to
40:43extract that it's 3,539 million right
40:48which is exactly what we see that's
40:51correct and it's right next to
40:54information that would be wrong right so
40:56the power of this application is
40:59immediately apparent if you're working
41:04with that kind of structured data right
41:05we can see that this process allows us
41:07to very faithfully pull out the correct
41:10piece of information in context and
41:13again this is not information that is
41:16available just through text in the
41:22document which is important you have to
41:25come to the table to get this
41:28information so that's great let's try it
41:34on the AI education report which again
41:36doesn't have a lot of tables but does
41:40have a lot of graphs and figures right
41:41so let's ask about those all of this is
41:44just setting up the retrieval process
41:47again on our actual AI report right and
41:51then we can query it how many AI
41:53publications on pattern recognition were
41:55there in 2020 and we get this response
41:57of there were 30.07 AI publications on
41:59pattern recognition in 2020 which is
42:02definitely wrong right it should be in
42:05the mid-50s but we got 30.07 now what's
42:15interesting to me is despite 30.07 not
42:17being mentioned anywhere else in the
42:19actual document we do see that 30.07 is
42:22associated with this figure so while it
42:26didn't retrieve the correct context and
42:28didn't answer the question right we can
42:31see that it is at least able to parse
42:34some information out of this figure even
42:37if it's not there yet right I mean for
42:40me this is the signal that things are
42:42going in the right direction even if we
42:46haven't literally gotten there yet
42:47because we understand something about
42:49this figure we're landing in the right
42:52country we might not be in the right
42:56province yet but we're in the right
42:59country and that's
43:03that's good to see we ask another
43:05question right this one should be a
43:06little bit simpler can you describe what
43:08figure 14 is related to and we get the
43:10response that it's related to the long
43:12tail of learner variability in the
43:14context of AI education and it goes on
43:16and on this is not true that is figure
43:2013 figure 14 is what we see here which
43:22is unrelated to our response so again
43:25when it comes to the figures the more
43:28pictorial representations or the graphs
43:31there's still work to be done right
43:33which is expected and I think that's
43:35communicated well enough in their blog
43:40content but we would still like to see
43:43it get to the point where it can better
43:45understand more pictorial
43:47representations of data as we go forward
43:50they're clearly working on it but we're
43:52not quite there yet so I would really
43:54view this more as a tool for extracting
43:57structured kind of tabular data versus
44:01this understanding of images and
44:05everything
44:06like that so with that we're going to
44:09send you guys back to Greg who's going
44:11to close us out and take us to Q&A yeah
44:15there you go that was rocking Chris
44:19thanks so much man that was LlamaParse
44:22everybody that's where the current
44:25state
44:27of affairs is and we are going to
44:31conclude for the day that out of the box
44:35LlamaParse is a great place to start
44:37especially in lieu of a custom in-house
44:39solution you probably don't want to
44:42build it into a latency-critical
44:43application today but it definitely has
44:46very nice tabular extraction
44:49specifically for PDFs and again that
44:52too-many-parameters-to-tune issue
44:53they're trying to address they are
44:55addressing it although it is a sort of
44:57black box to us a proprietary solution
45:02which is not open source but is easier
45:04to use as is the nature of these things
45:08as they progress so if you're getting
45:11started and you're not really trying to
45:13mess with parameters it might be a good
45:15solution for you to pick up off the
45:16shelf now it could definitely be a lot
45:19faster we'd love to see figures and the
45:21pictorial representation stuff handled
45:23we'd love to see other data types of
45:25course they're working on it the data
45:26framework trying to lead the way in the
45:28industry has a lot of work to do and a
45:30lot of good stuff ahead of them so we
45:32look forward to continuing to check out
45:33the latest and greatest from llama index
45:35as they release so with that yes we do
45:38have
45:41a Slido you can go ahead and ask your
45:45question directly in the Slido I'll ask
45:47the Whiz to come back up for Q&A now and
45:50we'll keep this link on the screen if
45:52you do have questions you can also throw
45:56them in the YouTube chat
45:56so Whiz a little clarification here on
46:01this idea of RAG ETL being more dynamic
46:08and more complicated than classic ETL
46:10the question why don't ETL decisions
46:13affect the application was I think
46:15brought up when I was talking about this
46:17language from the llama index blog
46:22materials where they said processing
46:27embedding and creating the vector DB are
46:30a situation where every decision
46:35directly affects accuracy in contrast to
46:38the classic ETL stack can you explain
46:41the way you think about this and whether
46:43that's the right way we should be
46:44interpreting
46:47this I would say this is a difficult one
46:50to answer cleanly because I believe
46:52deeply in my soul that ETL decisions do
46:54affect the application I think there are
47:02kind of different axes on which they
47:06might impact things right so if we're
47:07talking about performance-related or
47:10latency-related decisions I mean
47:13ultimately it doesn't matter because
47:15we're going to wind up with a pile of
47:17data and then from there we can do the
47:19other stuff right but how we actually
47:21transform especially can have
47:24significant impacts on performance on
47:27the ability to retrieve and everything
47:30like that so I would say it's very much
47:33still worth paying attention to and very
47:37much still a key part of the plan yeah
47:44thank you for that question that was
47:46anonymous but yeah fact-checking the
47:49specific wording is very helpful for us
47:51and I'm sure for llama index as well
47:53okay so Viges asks what is the
47:57difference between LlamaParse and
47:59multimodal models can multimodal models
48:03also do this kind of OCR including
48:05tables and I think we sort of answered
48:07this throughout right it's not really
48:09doing OCR as far as we can tell is that
48:13right I mean we have no line of sight so
48:16the difference between LlamaParse and a
48:19multimodal model is not well understood
48:21they could be using a multimodal model
48:23on the backend though it seems unlikely
48:27the issue is that we're not quite sure
48:31what they wanted it to be able to do and
48:33we don't know what they did to create it
48:36so I would say at this point though it
48:39seems closer to a PyPDF or PyMuPDF tool
48:41than it does to a full multimodal model
48:45but that's based on I just thought of it
48:47right there are no facts that would
48:51lead me there
48:59and we're going to keep this discussion
49:01going with Islam here keep the questions
49:02coming guys we'd love to continue the
49:05conversation so Islam says in the AI
49:07education use case it retrieved the
49:09correct table image but not the correct
49:11info do you think we could pass the
49:14image to GPT-4 Vision or just the image
49:16for chat Q&A yeah absolutely right if we
49:22can get to the node that has the image
49:24and we can associate that node with just
49:27literally a .png or something like that
49:30right in whatever vector store we're
49:32using then yeah we can build that logic
49:35in where when there's an image we send
49:37it to some kind of process that will
49:39help us understand what the image is
49:42talking about yes I think that's a good
49:44thought
49:50okay cool let's go to this question from
49:53Cena Aizi can you explain your decision
49:54on the recursive query engine as opposed
50:03to say a different retriever recursive
50:05does what we want here very well because
50:08we're dealing with this idea of a
50:10structured piece of information right we
50:12kind of want to laser in on this
50:17particular part of the document and then
50:20make sure we're able to capture the full
50:22table now BM25 and combining these
50:25sparse search methods is still going to
50:28be useful but with the recursive
50:30approach we can more generally guarantee
50:33that we're going to get access to the
50:34full table somewhere in our context and
50:36that's what we need right and as Stephen
50:41noted earlier because we got that kind
50:43of expanded relevant context thanks to
50:45the way we set up the recursive
50:48retriever we were actually able to see
50:50that it understood that that 3,539
50:52number was meant to be in millions right
50:54so that's a piece of context it might
50:57not have retrieved or seen otherwise
51:00which is why I think the recursive
51:02approach is very helpful for this
51:03specific use case that we saw today
51:08yeah cool
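The small-to-big (parent-child) idea behind recursive retrieval can be sketched in plain Python; the nodes, the `ref` link, and the word-overlap scorer below are all illustrative stand-ins for what the llama-index recursive retriever does with real embeddings:

```python
# Toy sketch of recursive (small-to-big) retrieval: match the query against
# short summary nodes, then follow a reference back to the full parent
# object (e.g. a whole table) so no rows are lost. The overlap scorer is a
# stand-in for an embedding similarity search.

full_table = (
    "| Item | 2023 ($M) |\n"
    "|------|-----------|\n"
    "| Gross carrying amount | 3,539 |"
)

nodes = [
    {"id": "n1", "text": "Summary: table of amortizable intangible assets", "ref": "t1"},
    {"id": "n2", "text": "Narrative about data centers", "ref": None},
]
objects = {"t1": full_table}  # parent objects the summary nodes point to

def overlap(query: str, text: str) -> int:
    return len(set(query.lower().split()) & set(text.lower().split()))

def recursive_retrieve(query: str) -> str:
    best = max(nodes, key=lambda n: overlap(query, n["text"]))
    # if the matched node references a parent object, expand to the full object
    return objects[best["ref"]] if best["ref"] else best["text"]

print(recursive_retrieve("amortizable intangible assets table"))
```

Because the whole table comes back as one unit, surrounding context like a "(in millions)" header rides along with the matched cell instead of being chunked away.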
51:11combine sort of two questions here uh
51:13one from uh they're both from Anonymous
51:17how how does llama parse for PDFs
51:20compared to unstructured doio this is
51:22the last event we we did and then have
51:25you done a comparison with other open
51:27source parsers like llm Sherpa is named
51:30here but sort of talk about maybe you
51:33know your perspective on this a little
51:35deeper the real benefit of llama pars is
51:38that it's integrated into the Llama
51:40index ecosystem for me I think that
51:43there are certainly other ways or more
51:46engineering heavy ways that you could
51:48approach these problems and custom build
51:51solutions that are uh that are better
51:53for your particular use cases
51:56uh but I I feel happy to say things like
51:59uh I agree with the benchmarks that
52:01they've released uh llama par does feel
52:04better at preserving structural
52:06relationships than say the the kind of
52:09python packages that are meant to do
52:11this um that might not be true forever
52:13and so the comment might not age well
52:15into the future but as of time of
52:17recording right like uh I think it's I
52:19think it's pretty good um but the main
52:23benefit is that it's baked into that
52:24ecosystem uh I would say that uh
52:28otherwise there there are lots of
52:29options you can explore that are that
52:31are going to be useful to
52:33you so we've got a question that stacks
52:37on some of the theories we heard earlier
52:39GPT-4 Vision can we do this can we do
52:41that how do you chunk PDFs containing
52:44tabular data and maybe I can
52:46contextualize this with another comment
52:48charts are the weak link what is
52:51happening out there to see a chart and
52:53convert it into a kind of reverse prompt
52:56to unwind the data so the model can
53:00understand can you talk a little bit
53:03about how you think about the best way
53:05to chunk PDFs and what the space looks
53:09like as far as you can see yeah I think
53:11it comes down
53:14to wanting our tables preserved as some
53:17whole chunk that's just tremendously
53:20useful right LLMs are good at reasoning
53:24about charts even GPT-3.5 Turbo right so
53:28ultimately I think the way we want to
53:32think about it is that charts tables all
53:34these kind of figure-based elements are
53:36their own nodes and then our text is
53:39nodes around them and they're connected
53:41via some kind of hierarchical metadata
53:43where it's like these are figures from
53:45this section along with these passages
53:48in terms of the prompt you just feed the
53:50table in I mean if you take a markdown
53:52table and show it to GPT-3.5 Turbo it's
53:55going to be pretty good at answering
53:58questions about it and if we need say
54:01aggregate information or derived
54:03information from the chart you can hook
54:05it up to things like code executors or
54:08processes similar to GPT-4's Code
54:11Interpreter where we can actually load
54:13it into some python structure and then
54:15do operations on it but outside of that
54:18I think we just want to make sure that
54:21they're their own unit and that the
54:24relationship between them and the text
54:27around them is clear having them be
54:30their own thing by itself I think is
54:41very useful yeah and
54:43sort of this idea of if you can get it
54:45into a markdown format then you're going
54:48to be able to work with it in a lot of
54:50different ways okay so then stacking on
54:53that a little bit I know LlamaParse is
54:58still a black box for us but a question
55:01from another attendee is how effective
55:05is this I interpret this as how well
55:08does it maintain the structure of the
55:11tables row span column span I don't know
55:19if we're talking about the distance
55:23between characters here
55:26I mean it's pretty good it's just a
55:29markdown table though so it is good at
55:33preserving the structure of the table it
55:36is not good at preserving presentation
55:40you would not be able to recreate the
55:42table as it appears in the PDF from the
55:45markdown that you receive there's zero
55:47percent chance if it was formatted in a
55:51specific way things like borderless or
55:53left-oriented or right-oriented or the
55:55colors you're losing all of it it's just
55:58going to show you that there's some
56:01table and that table has values and
56:03here's how they look but that's all
56:05we're getting we're getting zero
56:07information about the way that it's
56:09visually presented which is by design
56:12frankly it's not meant to reconstitute
56:16the table it's just meant to extract it
56:18so we can turn it into something like a
56:20CSV and work with it
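That markdown-to-CSV step is easy to sketch in plain Python; this is an illustrative helper, not part of LlamaParse itself, and as noted above the layout details (borders, alignment, colors) are already gone at this point, so only the values survive:

```python
# Illustrative helper: turn a LlamaParse-style markdown table into CSV so it
# can be handed to pandas, a code executor, etc. Not part of LlamaParse.
import csv
import io

def markdown_table_to_csv(md: str) -> str:
    out = io.StringIO()
    writer = csv.writer(out)
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # skip the |---|---| separator row
        if all(set(c) <= set("-: ") for c in cells):
            continue
        writer.writerow(cells)
    return out.getvalue()

md = """| Item | 2023 ($M) |
|------|-----------|
| Gross carrying amount | 3,539 |"""

print(markdown_table_to_csv(md))
```

Values containing commas (like `3,539`) come out CSV-quoted, so downstream tools parse them back as single cells.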
56:25right that's right okay the great
56:28questions keep coming we've got a couple
56:33let's rapid fire these Chris real quick
56:37can we use llama index for retrieval and
56:40connect it to LangChain is there any
56:45reason why not LangChain can understand
56:53markdown just fine as can anything that
56:54can convert markdown into some other
56:56useful file format or chunk it and
57:00convert it LlamaParse is a great tool
57:01because it just spits out a markdown
57:03file or a text file so you can integrate
57:06it into a lot of other pipelines all
57:08right next is recursive retrieval
57:10similar to parent-child retrieval
57:16yes okay all right and then to keep up
57:25with the times here Tariq in the chat
57:31asked is it even worth implementing RAG
57:34with these large-context-window models
57:38like Gemini I mean should we just throw
57:40it out the window yes absolutely
57:44it is still worth implementing RAG
57:48period we can talk about all of the axes
57:51we can examine that make this true cost
57:55effort accuracy confabulation
58:04hallucination RAG is still a hugely
58:07powerful component because it lets us do
58:13what we want to do which is answer
58:16specific questions about specific things
58:18in specific documents the large context
58:20window is amazing and it's going to help
58:23us do a lot of really awesome things but
58:29for right now those things just don't
58:33push out RAG and you know what
58:36that reminds me and I guess it's a good
58:37spot to wrap on is this idea of context
58:39augmentation from llama index which
58:43appears to be telling us that perhaps
58:45there are other patterns beyond RAG that
58:48they may have in mind for context
58:51augmentation in the future so stay tuned
58:55for what happens next the industry is
58:57going to continue to evolve and we'll
58:59keep you up to date and up to speed on
59:01the latest and greatest Whiz thank you
59:06for the wisdom and thanks everybody for
59:09joining
59:09today if you're interested in learning
59:11when and how to do fine-tuning when
59:14you're actually done with RAG that's
59:15what we're covering next week on
59:17Wednesday live same time and of course
59:20please like subscribe and ring that bell
59:23to stay up with all events as they drop
59:27live or ones that we upload if you are
59:31seriously ready to accelerate your LLM
59:34application development like seriously
59:36ready to accelerate then check out our
59:40AI Engineering Bootcamp our
59:44industry-leading cohort-based live
59:47online course where you can fill all
59:49your skills gaps from building to
59:51operating to improving LLM applications
59:56if you just enjoyed engaging with us in
59:58the chat or even just watching the chat
01:00:00go ahead and join our Discord because we
01:00:03can keep the conversation going in there
01:00:04and get you down the path to starting to
01:00:07build ship and share with us if you're
01:00:09not really somebody who wants to engage
01:00:13but you just want to tinker on your own
01:00:14we've got a few resources to share one
01:00:16is our Awesome AIM Index which gives you
01:00:19direct access to code from all of our
01:00:23events and the other is our recently
01:00:25released open-sourced LLM Ops LLMs in
01:00:31Production cohort one materials
01:00:34including the entire GitHub repo check
01:00:38that out if you're trying to get up to
01:00:39speed that's pre-v0.1 stuff from 2023 we
01:00:43look forward to open sourcing more as we
01:00:45move forward finally any feedback you
01:00:49can provide us is great either through
01:00:50Luma or through our feedback form and as
01:00:52always thank you so much everybody until
01:00:55next time keep building shipping and
01:00:58sharing and we'll do the same we'll see
01:01:00you soon have a great week