00:02 So, weekly updates for February 23rd. As you can see, a lot of topics; it was a busy week for AI. This is me, as usual.
00:13 First, the new model: Mistral-next was released. I actually made a small video on lmsys.org. This is the open chat arena where you can compare different models, but what people don't know is that there are tabs at the top: you can click on Direct Chat and then select the model you want to chat with. It will not be included as part of the competition and will not go on the leaderboard, but you can actually test the model. What I found: well, the model is good; there are multiple videos where people tested it and confirmed it's a good model, but the model doesn't know recent facts at all. I tried asking about some recent systems like Ragas, RAG, llama.cpp, ChatGPT and so on, and it didn't know anything about them. Well, it knew mistral.ai, because the model is coming from Mistral. But yeah, an interesting, high-quality model, so hopefully they will release it in open source soon.
01:30 Google open LLMs: Gemma 2B and 7B. Well, this was a scandal; well, not a scandal, but on one side they released them, and on the other side they claimed in their blog that they are actually better than the corresponding models, say Mistral 7B; they were saying it's much better than Mistral 7B. But then many people started testing it and found that no, it's actually shockingly bad; there is no comparison to Mistral 7B. Another strange thing: for a 7B model you would expect each parameter to take 2 bytes, so about 14 GB, but it's actually 34 GB in the standard GGUF format, which is really strange. They provided some tools to fine-tune it and so on, so maybe it's a little bit early to say, but it's a big thing for Google to finally release something in open source, even if it is not good.
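A quick back-of-the-envelope check of that size oddity. Note the second line is my own assumption, not a confirmed explanation: 34 GB is roughly what you would get if the checkpoint were stored in fp32 and the model actually had around 8.5 billion parameters (for example due to a very large embedding table).

```python
def model_size_gb(params_billion, bytes_per_param):
    """Rough checkpoint size: parameter count times bytes per parameter (1 GB ~ 1e9 bytes)."""
    return params_billion * bytes_per_param

# 7B parameters at 2 bytes each (fp16/bf16): the size you would naively expect
print(model_size_gb(7, 2))     # 14
# ~8.5B parameters at 4 bytes each (fp32): roughly matches the reported 34 GB (assumed numbers)
print(model_size_gb(8.5, 4))   # 34.0
```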
02:40 OpenAI world simulation. You know that OpenAI has released the visual model, which is extremely good, and it looks like it is actually based on a world simulation. Well, that's all I will say for now.
03:00 Stable Diffusion 3. It is announced, in an early preview stage. It looks like it's much better than the previous version; it has up to 8 billion parameters, and the previous one was about 10 times smaller. It looks like it may use some concepts from Stability AI's Cascade model, which was released recently, so we'll see. Just a reminder: the Cascade model uses a higher degree of compression of the data, and because of that it can run much faster, and somehow they achieved good performance. That one was already released, while Stable Diffusion 3 is at an early stage.
03:48 Okay, there is a lecture by Jeff Dean from Google where he talks about trends in machine learning, so that's something you can listen to. Andrej Karpathy released code: a small algorithm for tokenization, plus a video, and then more code. Probably next week we'll talk about tokenization and embeddings, because it's an important area. "You guys wouldn't believe how much fun predicting the next token is": yeah, this is from Twitter. Anyway.
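Since tokenization is coming up next week: tokenizers of this kind are typically built around byte-pair encoding, and the core idea fits in a few lines. This is my own toy sketch of BPE, not the released code.

```python
from collections import Counter

def most_common_pair(ids):
    """Most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with one new token id."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes (ids 0..255)
pair = most_common_pair(ids)               # the byte pair ('a', 'a')
ids = merge(ids, pair, 256)                # mint a brand-new token id for it
print(len(ids))                            # 9 (was 11)
```

Repeating this merge step grows the vocabulary one token at a time; that loop is essentially the whole training algorithm.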
04:30 Dataset: OpenMathInstruct-1 by NVIDIA. NVIDIA has released a dataset for math, and it has some subsets. It uses Mixtral 8x7B to produce the pairs, and it leverages both textual reasoning and code-interpreter style generation and so on. It's open source, and it's a good dataset to know about.
04:59 Now, an open-source massively multilingual project. This is really big: 119 countries, and the goal is to create LLMs which are multilingual; not just the LLMs but also the datasets for training, everything open source.
05:25 More different news. There was a discussion: you know, Google has released the new model which, in a research setting, was operating with a 10 million token context length, which is a huge context length. So people were saying, well, then we don't need RAG, we don't need a database; we can just load all the data and the model will deal with it. But then of course the consideration is the cost, because if you load a lot of data you'll have to pay for it, whereas if you're using RAG you only process the data you retrieve from the database, which is a small amount. So far RAG is something like 100 times more cost-effective.
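The cost argument is simple arithmetic; here is a sketch with made-up numbers (the per-token price and the token counts are illustrative assumptions, not any provider's real pricing).

```python
PRICE_PER_1K_INPUT_TOKENS = 0.01  # assumed $/1K input tokens, for illustration only

def prompt_cost(tokens):
    """Cost of sending this many input tokens to the model."""
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

stuff_everything = prompt_cost(200_000)  # dump a whole corpus into a long context
rag = prompt_cost(2_000)                 # retrieve only a few relevant chunks
print(stuff_everything / rag)            # 100.0
```

The ratio is just (tokens in full context) / (tokens retrieved), and you pay it again on every single query.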
06:12 Now, here is the paper "Scalable Diffusion Models with Transformers", which comes from Berkeley and NYU, and this is actually a foundation of the Sora model released by OpenAI, so you may want to look at this publication.
06:32 "AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy". This is interesting: even if the model hallucinates, even if it gives you a wrong response, it's still helpful. They measured it and found improvement: something like 43% with the regular assistant, and like 28% even with a biased one, so it's still very, very helpful. And yeah, we know that we cannot trust the model, but we still use it.
07:08 Anyway, LangChain, the famous Python framework which everybody is using: it is now a startup, and they raised $25 million. This is the team, and they have a lot of following on GitHub, so this is good news. They have this application, LangSmith, for LLM application development, monitoring and testing. I never used it, but apparently it's either already available or will become available soon.
07:40 Ragas, which is a RAG evaluation framework. This is again an open-source framework: people install ragas in Python and then from ragas import evaluate and so on, so you can evaluate your RAG system using this framework. Ragas evaluates pipelines on correctness, tonality, hallucinations, fluency and so on.
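To give a flavor of what such an evaluation measures, here is a toy, purely illustrative stand-in for a faithfulness-style metric. Real frameworks like Ragas use LLM-based judgments, not this word-overlap hack; the function and example are mine.

```python
def toy_faithfulness(answer: str, context: str) -> float:
    """Fraction of answer words that also appear in the retrieved context.
    A crude stand-in for an LLM-judged faithfulness metric."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

score = toy_faithfulness(
    "Paris is the capital of France",
    "France's capital city is Paris",
)
print(round(score, 2))  # 0.5
```

Even this crude version shows the shape of the idea: score each answer against the context it was supposed to be grounded in, then aggregate over a test set.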
08:06 Gemini can now work with your Google Workspace: your Google Docs, Google Slides and so on, and your emails, your Gmail. You can ask it to search and summarize, so it becomes very useful. It's kind of like how Microsoft Copilot can work with your Outlook; here, Google Gemini works with your Google documents and so on.
08:38 The 70B model: I tried to actually see where I could download it, but I had difficulty; maybe the website was overloaded. What they did: they took Code Llama 70B, which is a model that was specifically trained to work with code, and then they fine-tuned it further, and they have shown that in their tests it is better than even GPT-4. This is quite amazing: if this is true, then this is the best one today. Besides the 70B, they also have a smaller model, which is not as good, but still.
09:25 Groq. So, Groq is a startup which creates specialized chips to run models, and they run them really, really fast: you're talking about 10 times faster, 50 times faster; well, it depends on what you're doing, but amazingly much faster. So: instant responses, efficiency, affordability. You can actually try it; I tried it, and yeah, it was really amazing how fast it prints the response.
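The speed difference is easiest to feel as time-to-answer. A sketch with assumed throughput numbers; both rates below are illustrative guesses, not measured benchmarks of any vendor.

```python
ANSWER_TOKENS = 500  # a typical chat-length response (assumed)

def seconds_to_answer(tokens_per_second):
    """How long the user waits for the full response at a given generation rate."""
    return ANSWER_TOKENS / tokens_per_second

print(seconds_to_answer(30))   # ~16.7 s at an ordinary serving rate (assumed)
print(seconds_to_answer(500))  # 1.0 s on much faster specialized hardware (assumed)
```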
09:59 A letter from Neal Mohan, who is the YouTube CEO. He talks about four big bets for 2024: AI will empower human creativity; creators should be recognized as next-generation studios; YouTube's next frontier is the living room and subscriptions (by living room he means the big TV screen; people now watch more and more YouTube on the big screen in the living room, and subscriptions are growing); and protecting the creator economy is foundational.
10:40 Okay, Magic AI. This is a startup; they got more than $100 million to build an AI software engineer capable of assisting with complex coding tasks, one that will act more as a coworker than merely a copilot tool. So, yeah.
11:01 Next: "Text Embeddings: Comprehensive Guide", so this is the link, talking about embeddings. Sometimes I just give you the links so you can follow them, and by the way, some of them are coming from Sander, thank you. Then "Why language models became large language models and the hurdles of developing LLM-based applications": I really like this article, and this picture is actually from it. I mean, there are several good pictures there, but you can see what has been happening in the last year: training costs are going down, whereas spending on inference goes up. That's why, if you look at the most recent Nvidia chips, they are tuned more for faster and cheaper inference, because that is the main expense nowadays.
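Why inference dominates: training is paid once, while inference scales with usage. Made-up numbers, purely to show the shape of the argument.

```python
TRAINING_COST = 10_000_000            # one-time training cost, $ (assumed)
INFERENCE_COST_PER_1K_QUERIES = 10    # serving cost, $ (assumed)
queries_per_day = 5_000_000           # assumed traffic for a popular service

# Serving cost accumulates every day; training was a single payment.
annual_inference = queries_per_day / 1000 * INFERENCE_COST_PER_1K_QUERIES * 365
print(annual_inference)                  # 18250000.0
print(annual_inference > TRAINING_COST)  # True
```

With these assumptions, one year of serving already costs almost twice the training run, and the gap only widens with traffic.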
12:04 LoRA Land. This is a really interesting project. What they have: Mistral 7B, which is a small, nice model, and they fine-tuned it for something like 25 different topics, and they achieved performance on these topics better than GPT-4. So if you have a certain specific use case, you don't need GPT-4: you can take a smaller model, fine-tune it for this specific topic, and you will get better performance. Very interesting, and across all 25 different topics, across the board, the smaller fine-tuned model is better.
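The reason small per-topic fine-tunes like this are cheap: assuming these are LoRA-style fine-tunes, as the project name suggests, you train two low-rank matrices per weight instead of the full matrix. The dimensions below are illustrative assumptions, not the project's actual settings.

```python
d = 4096      # hidden size (assumed)
layers = 32   # number of adapted weight matrices (assumed)
r = 8         # LoRA rank, a typical small value

full_finetune = d * d * layers   # updating every full d x d matrix
lora = r * (d + d) * layers      # A is d x r and B is r x d per matrix
print(full_finetune // lora)     # 256, i.e. ~256x fewer trainable parameters
```

The ratio simplifies to d / (2r), so a rank-8 adapter on a 4096-wide model trains roughly 0.4% of the parameters a full fine-tune would.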
12:53 GPT-4's knowledge cutoff is now in 2023. I tested GPT-3.5: it is still January 2022, which is 2 years ago, so GPT-3.5 is useless because it doesn't know the last 2 years, but GPT-4 is reasonably good.
13:14 Okay, this is probably worth following: you remember there was a famous article from Google where they invented the Transformer; they wrote the article "Attention Is All You Need". These are the eight people who wrote that article, and they are all co-founders of companies now, and the CEO of Nvidia will be interviewing them. There will be a session in March, on March 20th, so it will probably be a very interesting session. GTC is the GPU Technology Conference, which will run March 18th to 21st.
14:00 Okay, Sierra, a startup. A very interesting thing, because this is business use. First of all, the founder, Bret Taylor: he became OpenAI's chairman recently, and before that he was co-CEO of Salesforce, so apparently he knows what he's doing. He made some strategic partnerships and secured more than $100 million in funding, and they already have around 30 employees. The idea is to create chatbots for business, for customer service, and they have competitors who have been in this space for a long time, for example Haptik AI from India, you see, 2019, and they had a lot of enterprise customers and so on. Well, but these people are brave, so they entered this area too, and they already have WeightWatchers, Sonos and SiriusXM. So we'll see; you can make a chatbot for yourself, or you can create chatbots for big business, and this is what they are doing.
15:14 Okay, this is again from Sander, thank you; this is a great thing. This open-source project on GitHub is a set of more than 100 coding tests plus a Python framework to apply them. He tested it with different models: GPT-4, GPT-3.5, the Claude models, Mistral Medium, Mistral Small, Gemini Pro. You see that all the models pass the simple tests, but as the tests become more and more difficult (the first column is GPT-4), GPT-4 still holds, but eventually almost none of them can solve the problems. So it's very graphically pleasing to see the performance of the different models. I looked at the first test on this line which nobody could do, and it was actually this problem: here is a base64 string; as you know, language models can read different languages, including base64 encoding, so read this base64 string, think about the answer, and type just the answer in base64; your entire answer must be in base64. And you see, none of them answered this question.
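For reference, this is all the mechanics the task requires; the hard part is that the model must do it "in its head", with no tool. A round-trip in Python (the question string is my own example, not the one from the benchmark):

```python
import base64

question = "What is 2+2? Reply with just the number."
encoded = base64.b64encode(question.encode()).decode()  # what the model is shown
print(encoded)

decoded = base64.b64decode(encoded).decode()            # what it must recover mentally
print(decoded == question)                              # True

answer_in_base64 = base64.b64encode(b"4").decode()      # the required reply format
print(answer_in_base64)                                 # NA==
```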
16:40 Okay, UpTrain: open-source LLM evaluations. It has multiple metrics, does A/B testing, conversation evaluations and so on; people are getting into operations now, so it's useful. Again, it's open source, you see the GitHub, and you install it via pip into your Python program.
17:00 Okay, notebooklm.google.com: this is how you can keep your notes, and some people really, really like it. So you know, you have Google Drive, you have email, and now you have NotebookLM as well.
17:23 Okay, this is a scandal, and people have had a lot of fun with it. Gemini apparently had a diversity problem in its settings when you ask it to generate an image. For example, this is Gemini's picture of Elon Musk: the request was "generate a picture of Elon Musk", and you see the face is recognizable, it is Elon Musk, but somehow he's black. And look at this picture: the request was "generate an image of the founders of Google", so you know, Larry and Sergey, and why do they look Chinese? And here, for example, the request was "paint me a historically accurate depiction of a medieval British king", and well, it doesn't look like a British king. And so on: a portrait of a famous physicist in the 17th century; well, this one maybe, but the others don't look like it. And this is a portrait of the founding fathers of America; well, maybe this one, but not the others. And this is very interesting: "generate an image of a 1943 German soldier"; why is the German soldier an Asian woman, or black? So they stopped it: if you go right now and try to generate an image (for example, I tried "generate an image of Elon Musk" to reproduce this), you get the answer that they are "working to improve" it, whatever; so they closed it until they fix it. Yeah, but it was really funny.
19:25 "How to Pilot Generative AI", by Gartner. There's an article you can read; they describe the process of how to successfully build generative AI pilot applications, with analysis and recommendations from people who have done it. Again, interesting reading.
19:50 The Arena leaderboard hasn't changed yet; the last time it was regenerated was February 15th, and it has not been updated since. The HELM leaderboard is an interesting project: this is at Stanford, where they have the Center for Research on Foundation Models, CRFM, so it is crfm.stanford.edu. You know, when people publish models they evaluate them on several benchmarks, maybe five, maybe seven; but here they evaluated them on probably about a hundred different metrics. You can select different scenarios, and for different scenarios they have different metrics, so it is kind of a holistic, from-all-angles evaluation of the models. And this is their own leaderboard, which you can dissect with drop-down menus. What's interesting: at the top is GPT-4, then PaLM, then PaLM 2, which is the older model, then Yi, which is Chinese, and here is Mixtral; you see, Mixtral is open source. Then Anthropic's Claude, again PaLM, Anthropic, Llama 2, and of course they are much lower than the top, but I'm very happy for Mixtral. And yeah, you see GPT-3.5 all the way down here.
21:31 Next is "LLM as a zero-cost commodity"; this is just interesting thinking. LLMs are becoming better and cheaper; eventually they will become a commodity, and the value will be not in the LLM itself but in the systems built around them and in the data used to build them or used by those systems. So this is an interesting discussion: first customer service, then replacing all call centers; LLMs will then be incorporated into video games; and there will be education-tuned LLMs substituting for the bottom 40% of grade school teachers. All software becomes a commodity over time; LLMs will become cheap, they will become free. It is an interesting publication from that perspective: the most valuable LLM companies will be the anti-LLM companies, which don't use LLMs. Facebook, Tinder, Twitter, Instagram all become considerably less valuable once the majority of the user base is replaced with extremely high-quality bots; the real consumers may gradually sign off and take their money elsewhere. In that world a naive person would try to build a bot detector, but a good bot detector can simply be used to build a better bot instead. To win at this game, the most important way to win with LLMs is to devise better forms of authentication.
23:07 Okay, so again, I just recommend you follow this link; there is a lengthy discussion with many people contributing. Now, this is interesting: there is a website called Upwork where you can hire somebody to do some simple work for you, for example write something, translate something, and so on. What you see here is the change in the number of Upwork jobs since ChatGPT was released, and the red ones mean a decrease: it's writing, it's translation, it's customer service. And this is another one, the change of hourly rates specified in Upwork job postings per category, and you see decreases again: production, market research, backend development. So you see, because of AI there is definitely a decrease in the number of jobs and in the pay. Now, this is another interesting thing, from Bloomberry: the number of new Upwork jobs per day mentioning each AI skill, and the top one here, which is growing, is chatbots. So if you want to get a job, put "chatbot" in your resume.
24:34 Anyway, these are the layoffs. You see 2023, then January 2024 is much smaller, about three times smaller; next is February, and you see February is much smaller as well. So this year layoffs are running about three times lower than last year. Okay, that's it, thank you.