00:15 Hey everyone, welcome to our weekly live session. My name's Neil Kanungo, and I'm joined here by Ryan Siegler. Ryan, do you want to say hi to everyone? Hey everybody, I'm Ryan. Hey Ryan, good to have you on today.
00:26 I'm really excited for today's session. We're going to be talking about retrieval augmented generation with LangChain. Retrieval augmented generation, also known as RAG, has been a really popular topic with LLMs and vector databases. It's a way that you can extend the knowledge base of your LLM to your private data: you're able to search your data, query your data, and create a ChatGPT-like experience with your data, being very conversational with it, utilizing an LLM, a vector database for the knowledge base, and LangChain as the orchestrator. So Ryan's going to be presenting on that today.
01:10 Just a quick intro: I wanted to let everyone know that we are going to go through a sample notebook today, a Python notebook, and you're welcome to follow along with that. You can find it on our Learning Hub, and I'll show you how to get to that in a second. But first, make sure you're signed up for KDB.AI. You can go to kdb.ai in your browser and go to Sign Up. You'll get a sign-up page; put your information into a simple form, you'll get a verification email, click the link, and then it'll send you another email with your login information and a link to KDB.AI Cloud.
01:54 There you'll be presented with the KDB.AI Cloud user interface, where you can add and manage API keys. You can delete those if you need to and add new ones. I always recommend copying your API key just so you have it in case you lose it, and if you do lose it, just generate a new one. But that's how you sign up.
02:22 Today I'm going to skip the vector database introduction. I've done that introduction for the past few sessions, and I'm hoping folks have gotten a good sense of how vector databases work. If not, you can visit our Learning Hub on kdb.ai or watch one of our previous sessions. But today we're going to talk about retrieval augmented generation.
02:42 In our Learning Hub on the kdb.ai website, you go to LangChain and RAG under Samples, you'll see this page, and you can click the link to download from GitHub. It'll take you to our GitHub, where you can download the Python notebook that Ryan's going to go through today.
03:06 With that, we're going to hand it over to Ryan. One thing I did want to mention before heading over to Ryan is that retrieval augmented generation is an evolving space; there are different approaches to it, and today we're just going to do a very basic use case with simple retrieval augmented generation, simple RAG, also known as naive RAG. This is meant as an introduction to get you introduced to the concepts of RAG, but we will have a follow-up session in a couple of weeks where we're going to talk about different approaches to RAG, and Ryan's going to give you a quick overview of that. So Ryan, if you want to go into presenter mode, I will add that.
03:55 Looks good. Yeah, all right, well, thanks everyone for joining. Again, my name is Ryan Siegler, and we're going to do a quick introduction to retrieval augmented generation: talk a little bit about what it is and how it does what it does, then take a look at the high-level architecture of naive RAG, which Neil just mentioned. I'll briefly touch on some of the other RAG approaches, and like you said, those are things we're going to be jumping into down the line a little bit. Then we'll go ahead and walk through some code. So with that, let's get started.
04:36 So, retrieval augmented generation: what is it and what does it do? Really, it empowers us to use large language models to answer questions about data they weren't previously exposed to. If we use ChatGPT, for example, it's not going to be able to answer questions about our own enterprise data, our private documentation, or even recently released news articles or technical articles, because it hasn't been trained on these things. So how can we allow it to answer questions about data it hasn't seen? At the core of it, that's what RAG does: it's a pipeline, an architecture, that allows us to introduce data to the LLM and have the LLM answer questions about that data for us.
05:31 When we look at how we would set up the RAG approach, there are really two steps. The first one is retrieval, which means we are extracting the relevant data that we want the LLM to answer questions about: our private data, recent news articles, whatever it may be, whatever we want to create an application to answer questions about. We retrieve those relevant documents, those relevant chunks of data, and then feed them into the second step, which is the generation step. That's where we take the relevant data we've found, the data we've retrieved, and present it to the large language model, and then the large language model is able to generate a response based on that data. That's how we can answer questions about data it hasn't been trained on.
06:26 Today, the code that we're going to be walking through is a representation of naive RAG, and this is a high-level architecture of what that is. To get started, let's just walk through it; that will give us a bit better understanding of how this actually works.
06:44 We have an example data set, whatever it may be. I'm just using PDFs here as an example, but you can really use any file format; there are all sorts of file ingestors that will handle different file types, so whatever format your data is in, you can ingest it into this pipeline. When we're setting this up, we're using a large language model of our choice, but one thing we need to keep in mind with any of these large language models is that there is a limited context window. What that means is that we can introduce only a certain number of tokens, or in other words a certain amount of information, to a large language model at once; there's a limit on that. So it doesn't make sense, in most circumstances, to introduce our entire data set to the large language model and ask questions about it all at once. It makes more sense to solve that problem by chunking our data into smaller pieces. So after we ingest our data, we will chunk it up. We're going to do a full article on chunking, and there's a bunch of different methodologies we can use to chunk effectively, so look forward to that; it'll be coming soon.
08:08 After we have our data chunked up, the next step is to embed that data. All that means is that we're taking our raw text and formatting it into a machine-readable representation: we take the raw text, represent it as numbers, and store those numbers in a vector. So the vector represents the raw text, but it adds an additional capability: it represents the words themselves, and it also captures some context, meaning how each word relates to the others within that chunk. There's an extra contextual dimension that we capture when embedding. After we have embedded all of our chunks, we can store them in a vector database, or vector store, and KDB.AI is a great vector store to use.
09:02 When we get to this point, all of our data is stored in vector-embedded format within the vector store, and this gives us some really unique capabilities when we're trying to retrieve the relevant data we want. We have a user, and the user wants to ask a question about this data, so the user sends a prompt or a query. That query is embedded the same way the chunks were embedded, so it's also a vector embedding, and then we can do a similarity search between our vector database, which holds all of our documentation and data, and that user prompt. There are several different methods we can use here: cosine similarity, Euclidean distance, dot product, different ways to compare how similar these vectors are to each other.
09:58 After the similarity search, we extract the most similar vectors, or in other words the most relevant chunks of data, from our initial data set. Now we can introduce this relevant data to our large language model, along with the user prompt, and that gives it the capability to answer questions about data it hasn't been trained on; it's answering questions about that relevant contextual data. Then it can return that response to the user. So that's the full pipeline.
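As a quick aside, a minimal sketch of what those similarity metrics compute, using plain NumPy on made-up toy vectors (real embeddings would have hundreds or thousands of dimensions):

    import numpy as np

    # Two toy embedding vectors (made-up values, purely for illustration)
    a = np.array([0.1, 0.7, 0.2])
    b = np.array([0.2, 0.6, 0.1])

    dot_product = np.dot(a, b)         # larger = more similar
    euclidean = np.linalg.norm(a - b)  # smaller = more similar (the L2 metric used later in the notebook)
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0 = identical direction

    print(dot_product, euclidean, cosine)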
10:34 But there are some problems with this, and some of the other, more advanced methods of RAG start to address them. If we just look at this, what we see is that every time a user prompts, it's going to go do a similarity search. But what if the user is prompting about something that's not even related to our data set, not really related to anything stored in the vector store? That means any information we get back likely isn't going to be very relevant, and that could actually throw off our large language model. We just need to keep that in mind when we're using the naive approach. This isn't something you'd want to use in production at all; it's for getting your head wrapped around the concept, doing some quick POCs to understand what retrieval augmented generation is, and starting to understand how to work with some of these tools, like LangChain, for example.
11:32 But there are some other methods, and I'm just going to briefly touch on these, and then we'll get into some code. Like I said, one of the failures of the naive method is that it always goes and does that similarity search. So one approach is to use an agent on top of your large language model, which is really just a wrapper around your large language model that specializes in deciding the next steps in your pipeline based on the user prompt. With an agent, it wouldn't always go straight into that RAG process; the agent could decide, no, let's just send this query directly to the large language model. Or we can introduce a tool to an agent: if we get a mathematics question in from a user, we can just send that to a calculator tool and get a response from it. So it gives a bit of reasoning to how the bot is going to handle the prompt. It's a bit of a heavy process, though, because to do that reasoning we make an initial call to the large language model, which decides those next steps, so just be wary of that.
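As a rough illustration only, not part of this session's notebook, an agent like that could be wired up with LangChain's classic agent API roughly as follows; the tool name, its description, and the search_docs helper are assumptions for the sketch, and it leans on the KDB.AI vector store that gets built later in the walkthrough:

    from langchain.agents import AgentType, Tool, initialize_agent, load_tools
    from langchain.llms import OpenAI

    llm = OpenAI(temperature=0)

    def search_docs(query: str) -> str:
        # hypothetical helper wrapping the vector store built later in this session
        docs = vecdb_kdbai.similarity_search(query)
        return "\n".join(d.page_content for d in docs)

    tools = load_tools(["llm-math"], llm=llm)  # calculator-style tool
    tools.append(Tool(
        name="document_search",
        func=search_docs,
        description="Useful for questions about the ingested documents",
    ))

    # The agent makes an initial LLM call to decide which tool (if any) to use
    agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
    agent.run("What is 17 * 24?")  # should route to the calculator, not the retriever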
12:40 Another method we could use is called guardrails; this is a package released as part of NVIDIA's NeMo toolkit. It's an interesting method that combines a traditional chatbot design approach, where you're explicitly writing out the dialogue flow, with these more advanced RAG and large language model methods. What it does is give you the opportunity to define certain topics and then give specified responses to those topics. So if we didn't want our bot to answer questions about politics, for example, we could have a preset response built right in, in other words a guardrail, for our bot. It's not going to send a political question to the large language model; it just sends the preset response. That allows us to be a little bit safer with how we're setting up our chatbot or digital assistant.
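A deliberately simplified, framework-free sketch of that idea follows; NeMo Guardrails itself expresses topics and preset responses declaratively in its own configuration files, so the Python below is only a stand-in for the concept, and both classify_topic and run_rag_pipeline are hypothetical helpers:

    PRESET_RESPONSES = {
        "politics": "Sorry, I can't discuss political topics. Is there something else I can help with?",
    }

    def classify_topic(prompt: str) -> str:
        # placeholder classifier; a real system would use an LLM or a trained classifier
        return "politics" if "election" in prompt.lower() else "general"

    def answer(prompt: str) -> str:
        topic = classify_topic(prompt)
        if topic in PRESET_RESPONSES:
            return PRESET_RESPONSES[topic]   # guardrail: the LLM is never called
        return run_rag_pipeline(prompt)      # assumed RAG pipeline from this session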
13:42 Then one final method is using knowledge graphs. Knowledge graphs are a type of database that's growing in popularity: instead of representing data in columns and rows, it represents data as nodes and edges. It's a graph, and it's really good at representing data with highly interdependent relationships. When you have a knowledge graph, you can do RAG on it in a similar way to any other type of data: instead of chunking up your data, you take specific nodes that have a certain amount of data associated with them and embed those nodes. You do your similarity search and pull out all the relevant nodes of data in your knowledge graph. Where you get a bit of additional capability is that, because you get these relevant nodes back from your knowledge graph, you can do some extra traversals through the graph and find nearest neighbors or surrounding nodes, with their metadata and data as well. So you're able to pass the relevant nodes to the large language model, along with extra surrounding context, and the more context you can provide, the better.
14:59 These are just three approaches. It's a really fast-moving space, so I'm sure there are more out there and more to come, but we will be covering these at a later date.
15:09 So with that... Hey Ryan, we did have a question in the chat. Puru was asking, and I think you kind of just discussed this a little bit, but maybe just reiterate: given that LLMs are not rule-based, how can you guarantee that it'll know to use the augmented info you provide in order to produce the final response? How does the LLM know? I think you talked about that with some of these different approaches, but if you want to highlight that general approach again.
15:38 Exactly. So in the naive approach, you're right, it doesn't know; it will just go to the vector database and do a similarity search right away, and if you're not prompting about data you've stored in there, it's likely not going to give you very relevant data back. But with some of these other methods you actually introduce some reasoning capability, or some guardrails capability, to steer the next steps, so we don't always need to go through that retrieval process if we're not being prompted on related data. So there is that capability; it's just not in the naive approach. In the more advanced approaches, there's the ability to actually structure how we move through this pipeline: whether we go through the RAG process, or send a direct response to the user, or, if we just have more of a general question, send it to the large language model and have the language model handle it by itself.
16:41 Yeah, and just to reiterate for the audience: in two weeks, not next week but the week after, we're going to go over these RAG approaches. Ryan will be here, so join us again, and we'll go over these approaches as well as some others that aren't included on this slide, just so you get a sense of the different ways to optimize RAG in your own applications. And the week after that, we will talk about performance optimization with RAG. So we have some good sessions upcoming; just keep an eye out Wednesdays at the same time you're on today, and we'll have those sessions for you. Anyway, sorry to interrupt, Ryan, please go ahead.
17:25 Oh, no problem, thanks for the question, Puru. So let's get into some code.
17:32 All right, so we have a notebook; this is the notebook that's on the KDB.AI samples page, so feel free to follow along. Let's go ahead and get started. The first thing we need to do is install some requirements. I've already done this, and it takes a couple of minutes, but you can go ahead and install those requirements, and then we need to import our different packages. We need to import our KDB.AI client, which will allow us to connect up to the KDB.AI database, and we have the LangChain packages.
18:11 Just a quick note on LangChain: it's an open-source framework that allows us to work with and build applications around large language models. There's a ton of capability in it, and I highly recommend checking it out. They offer all sorts of packages to help us build out this retrieval augmented generation pipeline. So we'll import these, and then after we do our imports we just need to get our API keys set up. We're going to be using an OpenAI API key here, and then a Hugging Face one. I've already done this, but when you sign up for accounts with both of these providers, you can get your own API key.
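For reference, the setup cells look roughly like this; the package list and import paths reflect LangChain as it was around the time of this session and may differ in newer releases:

    # !pip install kdbai_client langchain openai huggingface_hub

    import os
    from getpass import getpass

    import kdbai_client as kdbai
    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI, HuggingFaceHub
    from langchain.chains.question_answering import load_qa_chain
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.vectorstores import KDBAI  # KDB.AI integration; import path varies by LangChain version

    # API keys for the two model providers used later in the walkthrough
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face API token: ")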
18:54 So let's get into it. We need to load our documentation. You're doing it with OpenAI and Hugging Face in parallel in this one? Exactly, yeah. When we get down into actually performing our RAG pipeline, we're going to do a bit of a comparison between the two, so we're going to use a large language model from OpenAI as well as one from Hugging Face and just see some of the differences there. So you only need one, but we'll do a comparison with both. That's great. Exactly, yeah, thank you.
19:25 So when we get into this pipeline, the first thing we need to do is just load our data, and we use TextLoader, which is something you get from LangChain. They have all sorts of different types of loaders, so you could do CSV, PDF, different things like that as well. There are tons of loaders; I was looking at the LangChain website, and there are lots of loaders in their documentation, from reading websites, to reading Wikipedia pages, to reading JSON. It looked like there are around a hundred of them, so there are tons of ways to load data into your vector database.
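The loading step itself is small; a minimal sketch, assuming the standard LangChain TextLoader and the State of the Union text file used as the sample data later in the session:

    from langchain.document_loaders import TextLoader

    # Load the raw text file into LangChain Document objects
    loader = TextLoader("state_of_the_union.txt")
    documents = loader.load()
    print(len(documents), documents[0].page_content[:200])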
20:13 Now, is this the retriever part, or does the retriever part come later? This is the beginning of the retriever part, because we need to get our documentation loaded before we can really do anything. If we just take a quick peek back at the pipeline, this is this step right here: we're just loading our data in; we haven't done anything with it yet.
20:37 So we're loading our data, and then we need to define a chunker. There are all sorts of different chunkers out there, but this one, from LangChain, is called RecursiveCharacterTextSplitter, and it allows us to define what our chunk size will be. We're going to be diving into how to decide the optimal chunk size in the article that will be published coming up. But you can also create a chunk overlap with this package, which is pretty interesting, because if there's some ongoing context between chunks it won't just cut it off; you could set, say, a 20 or 50 character chunk overlap between chunks. And that's not characters but tokens, right? In this case it's actually characters; there are other splitters that are specific to tokens as well. Okay.
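A minimal sketch of that splitter configuration; the chunk_size and chunk_overlap values here are placeholders rather than the notebook's exact numbers:

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # Character-based splitter: chunk_size and chunk_overlap are both counted in characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,    # placeholder; tune for your data and the model's context window
        chunk_overlap=50,  # carries a little trailing context into the next chunk
    )

    texts = text_splitter.split_documents(documents)  # 'documents' from the TextLoader step
    print(len(texts), texts[0].page_content[:200])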
21:34 Great. So we have this defined now, we've got our chunker, and we'll go ahead and run that. Then we need to actually do our chunking, which is just calling that text splitter on the documents we just ingested. This is where the chunking happens, so all of our text is chunked now, and we can take a quick peek at what the first chunk is. We're using the State of the Union address here, so this is just a small paragraph of a chunk that has several sentences in it.
22:17 Sorry, I was just going to point out something about this State of the Union address: this speech was given after ChatGPT was trained; ChatGPT's training data only goes up to September 2021, and this State of the Union is from after that. So, as a proof point, when we go through this sample and ask questions of it, you'll be able to see that it's in fact not relying on what's baked into ChatGPT; it's actually using the vector database with LangChain. Yeah, that's a great point.
22:52 So the next piece of this is the embedding. We've got our chunks of raw text created, but we need to embed them, to put them into that vector-embedded format, and to do that we use an embedding model. Now, there are many embedding models out there; Hugging Face has several as well, but we're going to use OpenAI's embeddings, text-embedding-ada-002. It's a really popular embedding model that's been used in many of the demos I've viewed myself. But I know this stuff is always being updated, and there are likely going to be better and better embedding models coming out in the near future, so it's something to keep your eye on as we move forward in this very fast-moving space. And that's their latest embedding model, the one they recommend using as well, right? Right, there are some older ones, but if you go to OpenAI's website and documentation, this is the embedding model they recommend.
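In LangChain that step is a one-liner; a sketch, with the model name being the one discussed above:

    from langchain.embeddings import OpenAIEmbeddings

    # text-embedding-ada-002 produces 1536-dimensional vectors,
    # which is why the table schema below uses dims=1536
    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

    sample_vector = embeddings.embed_query("What are the nation's strengths?")
    print(len(sample_vector))  # 1536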
24:04 Sorry, I keep interrupting; I was just going to say what the model is going to do: it takes your query and takes your data and actually embeds them, converting the text into vectors. We didn't cover the vector database introduction today, but for those who aren't familiar with this, the output is vectors which carry the context of the text that's being submitted. Yeah, perfect, thank you.
24:39 All right, so we've got our embedding model picked out, and now we need to set up our vector database. We're going to be using KDB.AI, and like Neil said at the beginning, you can go ahead and sign up for an account. Once you're all signed up, you'll get access to your endpoint and your API key. I've got that all here, I've already put it in, and then we instantiate the session. This just allows us to talk from this notebook to KDB.AI, so you put in your API key and your endpoint there, and that connection is set up.
25:20 Now we're going to define the schema for our table. When we're creating the vector database, it lives within a table in KDB.AI, so here we're defining a few different columns: we've got an ID column and a text column, and then we have our embeddings column. That's the one you're actually doing the similarity search against, and it's a vector index, so you define a few different things: the dimensions, as well as your search metric. There are a few different search methodologies you can use; like I mentioned, we're using Euclidean distance here, but you can also use cosine similarity or dot product, and all that information is in the KDB.AI documentation if you're interested in trying a different search metric. Then there's the type of index we're using for our vector database: we're just using a flat index for now, but there are four or five different options you can use there. For this POC and notebook, this will work just fine.
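Those cells look roughly like the following; the schema format follows the kdbai_client library from around this time and may differ in newer releases, and the endpoint value is a placeholder:

    import kdbai_client as kdbai

    # Connect this notebook to your KDB.AI Cloud instance
    KDBAI_ENDPOINT = "https://cloud.kdb.ai/instance/your-instance"  # placeholder
    KDBAI_API_KEY = os.environ["KDBAI_API_KEY"]                     # assumes the key was exported earlier
    session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)

    # Table schema: an id column, the raw text, and a 1536-dim embeddings column
    # indexed with a flat index and searched by Euclidean (L2) distance
    schema = {
        "columns": [
            {"name": "id", "pytype": "str"},
            {"name": "text", "pytype": "bytes"},
            {
                "name": "embeddings",
                "pytype": "float32",
                "vectorIndex": {"dims": 1536, "metric": "L2", "type": "flat"},
            },
        ]
    }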
26:29 So let's run this and make sure it's good; we'll set up our schema. Then what we're going to do here is just check to make sure we don't already have a table named langchain, because that's the table we want to create, and then we'll go ahead and actually create the table; that just takes a second.
27:00 Once we've created the table, the next step is actually putting our chunks into the vector database, and this is how we do that: we take the KDB.AI vector store and pass in a few different variables: our session, which is just that connection between KDB.AI and the notebook; the table we just created, langchain; the chunks we chunked up above; and the embedding model we just chose, which is the OpenAI ada-002 model. Once we run this, we'll have a vector database that contains all of our chunked-up documentation in that vector format. If we take a look back at the architecture, at this point our vector store is filled up with all of our chunked-up and embedded data.
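Roughly, those cells look like this; the KDBAI vector store wrapper's exact constructor arguments are paraphrased from the discussion, so check the current LangChain and KDB.AI docs for the precise signature:

    from langchain.vectorstores import KDBAI  # import path may differ by LangChain version

    # Recreate the table from scratch if it already exists
    if "langchain" in session.list():
        session.table("langchain").drop()
    table = session.create_table("langchain", schema)

    # Wrap the KDB.AI table as a LangChain vector store and load in the embedded chunks
    vecdb_kdbai = KDBAI(table, embeddings)
    vecdb_kdbai.add_documents(documents=texts)  # embeds each chunk and inserts it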
28:04 So now we can do some similarity searches; let's just try it out here. We'll create a query asking what the nation's strengths are, we'll run that, and then we'll go ahead and do the similarity search. This runs a Euclidean-distance similarity search between the query and our vector database, and what we get out is chunks of relevant information. This is retrieval right here: we are retrieving relevant information. In the grand RAG pipeline, this is where you retrieve your information, and then you would send this relevant information to our large language model, so this completes that first retrieval step.
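On its own, that retrieval step is a single call against the vector store; a sketch, using the same question discussed in the session:

    query = "What are the nation's strengths?"

    # Runs the Euclidean-distance search defined by the table's vector index
    # and returns the most similar chunks as LangChain Documents
    query_sim = vecdb_kdbai.similarity_search(query)
    for doc in query_sim:
        print(doc.page_content[:120])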
28:58 All right, so we have that, and now let's do retrieval augmented generation. What we're going to do here is define those two large language models: we're going to use OpenAI's text-davinci-003, which is GPT-3, and then we're going to use a Hugging Face LLM from Google called Flan-T5-XXL, and we're going to use both of them just to see if there's any difference between the two.
29:31 So let's go ahead and give it a try. We've got them both defined here, the OpenAI LLM and the Flan LLM, and now we can create a chain for each model. The way this works in LangChain is that it sets up the process for you to hand a list of documents, our relevant chunks, to the large language model, passing them in as part of the prompt that goes to that large language model. So we set up that chain here, and we have two chains: the OpenAI chain and the Hugging Face chain. And we've got our query, or rather our related chunks, that we just got from above; this is just showing them one more time.
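For reference, defining the two models and their question-answering chains looks roughly like this; the model identifiers are the ones named in the session, and the Hugging Face model is served through the hosted Inference API:

    from langchain.llms import OpenAI, HuggingFaceHub
    from langchain.chains.question_answering import load_qa_chain

    # GPT-3 via OpenAI, and Google's Flan-T5-XXL via the Hugging Face Hub
    llm_openai = OpenAI(model_name="text-davinci-003", temperature=0)
    llm_flan = HuggingFaceHub(repo_id="google/flan-t5-xxl",
                              model_kwargs={"temperature": 0.1, "max_length": 512})

    # "stuff" chains simply stuff the retrieved chunks into the prompt
    chain_openai = load_qa_chain(llm_openai, chain_type="stuff")
    chain_flan = load_qa_chain(llm_flan, chain_type="stuff")

    # Running them: pass the retrieved chunks plus the original question
    print(chain_openai.run(input_documents=query_sim, question=query))
    print(chain_flan.run(input_documents=query_sim, question=query))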
30:20 Now let's do it. This is where we actually run our RAG pipeline from start to finish: we've got our query_sim from above, which is the set of similar chunks, and we have the query, which we pass in as well, and we can run this and see what it says: the strength of the nation is the American people, who have the potential to turn every crisis into an opportunity. So it's all based on that relevant information it was passed, and it's a decent answer. But let's try the Hugging Face one and see what happens. It just says "possibilities."
31:00 Why is that? It's kind of interesting. I did a little digging, and the reason this happens is that Google's Flan-T5 is a model that specializes in short, specific answers, so when you ask a big, broad question like "what are the strengths of the nation," it's not really going to be able to answer with the same accuracy as the OpenAI models like GPT-3 or the ChatGPT models could. So in this case, if we were building an application, we would not want to use this; we would probably lean toward using one of the OpenAI options.
31:41 All right, and then we're going to try this one more time with a different retrieval method called RetrievalQA. This is sort of a wrapper around what we just did: you include the retriever within the call when you're creating your LLM chain, and the difference here is that we can specify the number of related chunks we're going to send to the large language model. In this case we're going to say: let's send 10 chunks, the 10 most related chunks, to our large language model. We have our chain type, which is just the thing that sends our documents, our related chunks, to the LLM in the prompt; we're choosing to use GPT-3.5-turbo, which is the underlying model of ChatGPT; and then for our retriever, we have of course set up our vector database within KDB.AI, and we're using that as our retriever, looking to get k chunks back from KDB.AI, so we want the 10 most related chunks.
32:53 I think temperature is something interesting to talk about too. Temperature is almost a creativity meter, or a hallucination meter: if you set that temperature to the maximum, it's going to be more creative, but as a result it'll hallucinate more; if you set that temperature to zero, it's going to be more precise in its answers and it's not going to hallucinate. Yeah, not nearly as much; it's going to attempt to give you very specific answers. Yeah, that's a great point. So if we want those specific answers, or it's more of an academic use case where we don't want things to be made up or a little more loose, then we would want to set a low temperature. In this case we're just using zero, but this is something you can experiment with; if you wanted to spice it up a little bit you could increase the temperature and see what happens.
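That cell looks roughly like this; the retriever's k value and the temperature match what's described above:

    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI

    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0),  # temperature 0 = precise, minimal hallucination
        chain_type="stuff",                                           # stuff the retrieved chunks into the prompt
        retriever=vecdb_kdbai.as_retriever(search_kwargs={"k": 10}),  # the 10 most similar chunks from KDB.AI
    )

    print(qa.run("What are the nation's strengths?"))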
33:51 So we'll run that. I think I ran this already, but let me make sure I've got everything going here. All right, now we can send it a query, which is "what are the nation's strengths," and let's see what happens. It takes a second to do this; the heavier the model, the longer it takes.
34:12 We'll give it a minute. In an application you might show it typing out the response as it's generating it, but here in this Python notebook it's just going to generate the full response and then display it. Yeah, exactly.
34:32 So here we see, oh wow, we get a big list of all the nation's strengths that it pulled from all of the relevant documentation it was provided, and it's a really good answer. In this case we see there's a pretty significant difference even between GPT-3 and GPT-3.5, so this stuff is moving quick.
35:00 Then we can try one more, just with a different query, and this will take another minute to go. But it's pretty interesting just to try some different models; you could try different temperatures, or a different number of chunks to send in, so there are all sorts of things to explore. I think it's also interesting because sometimes you may actually want naive RAG: you may want it to only talk about the topics that are in your vector database, and not reach out to the rest of the LLM's knowledge to find other responses; you want it to be specific to your data. So I think that's interesting as well.
35:54 Yeah, so here we just get a similar response. It didn't format it for some reason, but it's giving you a bunch of points about the things the country needs to protect, and if you read through it, it's a pretty good response. So this is the presentation and this is the code walkthrough. Then just the last step: if you're done experimenting with this, you can go ahead and drop the table, just so you're not using unnecessary resources.
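Cleanup is a single call on the table handle created earlier, assuming the kdbai_client API sketched above:

    # Drop the table when you're finished so the instance isn't holding unused data
    table.drop()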
36:22 But yeah, thank you all for joining, and if there are any questions, I'm happy to answer. Yeah, we had a question from Puru in the chat, and he's actually added another one, or really just an observation that I think is worth highlighting: it would be interesting to come up with a measure that gives an idea of how much RAG has helped in producing the final response. I think if you look at LangChain's website, they do offer some paid services around observability; I was looking into that a little bit. But definitely a good observation; observability is another key area that we should probably produce some sessions on, Ryan, so we'll look into that.
37:10 If you all enjoyed the session, then follow Ryan on LinkedIn; you can follow either of us, or both of us, on LinkedIn. And just to let you know, next week we are going to do sentiment analysis with a vector database. I will be out next week, but our head of evangelism, Dan Baker, will be leading the show, with Laura Kerr doing the sample exercise, so we're really excited about that. Sentiment analysis is predicting positive, negative, and neutral emotions and sentiment in different text, and how that can be done with vector databases.
37:47 And just to let you know what our upcoming sessions are: we're doing sentiment analysis on November 1st; Ryan will be back with us to do approaches to RAG on November 8th; then on November 15th, Mya is going to join us to do improving RAG performance; and on November 29th we'll do some chunking methods. We'll take the week of Thanksgiving off, with people being out for the holiday in the US, but feel free to join any of these sessions, again Wednesdays, 11:30 a.m. Eastern, 4:30 p.m. UK time. We're keeping these live streams going, and if there are any topics you all want to see, feel free to message us on LinkedIn and we can add those to our queue.
38:40 But with that, thank you all for joining today, hope you have a great week ahead, and we will see you next time. And thanks, Ryan, for your great presentation. Thank you, thanks for