00:02 So, weekly updates for February 23rd. As you can see, a lot of topics; it was a busy week for AI. This is me, as usual.
00:13 First, the new model: Mistral-next was released. I actually made a small video on lmsys.org. This is the open chat arena where you can compare different models, but what people don't know is that there are tabs at the top: you can click on Direct Chat and then select the model you want to chat with. It will not be included as part of the competition and will not go on the leaderboard, but you can actually test the model. What I found: well, the model is good; there are multiple videos where people tested it and confirmed it's a good model, but the model doesn't know recent facts at all. I tried asking about some recent systems like Ragas, RAG, llama.cpp, ChatGPT and so on, and it didn't know anything about them. Well, it knew mistral.ai, because the model is coming from Mistral. But yeah, an interesting, high-quality model, so hopefully they will release it in open source soon.
01:30 Google open LLMs: Gemma 2B and 7B. Well, this was a scandal; well, not a scandal, but on one side they released them, and on the other side they claimed in their blog that they are actually better than the corresponding models, say Mistral 7B; they were saying it's much better than Mistral 7B. But then many people started testing it and found that no, it's actually shockingly bad; there is no comparison to Mistral 7B. Another strange thing: for a 7B model you would expect each parameter to take 2 bytes, so about 14 GB, but it's actually 34 GB in the standard GGUF format, which is really strange. They provided some tools to fine-tune it and so on, so maybe it's a little bit early to say, but it's a big thing for Google to finally release something in open source, even if it is not good.
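A quick back-of-the-envelope check of that size oddity. Note the second line is my own assumption, not a confirmed explanation: 34 GB is roughly what you would get if the checkpoint were stored in fp32 and the model actually had around 8.5 billion parameters (for example due to a very large embedding table).

```python
def model_size_gb(params_billion, bytes_per_param):
    """Rough checkpoint size: parameter count times bytes per parameter (1 GB ~ 1e9 bytes)."""
    return params_billion * bytes_per_param

# 7B parameters at 2 bytes each (fp16/bf16): the size you would naively expect
print(model_size_gb(7, 2))     # 14
# ~8.5B parameters at 4 bytes each (fp32): roughly matches the reported 34 GB (assumed numbers)
print(model_size_gb(8.5, 4))   # 34.0
```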
02:40 OpenAI world simulation. You know that OpenAI has released the visual model, which is extremely good, and it looks like it is actually based on a world simulation. Well, that's all I will say for now.
03:00 Stable Diffusion 3. It is announced, in an early preview stage. It looks like it's much better than the previous version; it has up to 8 billion parameters, and the previous one was about 10 times smaller. It looks like it may use some concepts from Stability AI's Cascade model, which was released recently, so we'll see. Just a reminder: the Cascade model uses a higher degree of compression of the data, and because of that it can run much faster, and somehow they achieved good performance. That one was already released, while Stable Diffusion 3 is at an early stage.
03:48 Okay, there is a lecture by Jeff Dean from Google where he talks about trends in machine learning, so that's something you can listen to. Andrej Karpathy released code: a small algorithm for tokenization, plus a video, and then more code. Probably next week we'll talk about tokenization and embeddings, because it's an important area. "You guys wouldn't believe how much fun predicting the next token is": yeah, this is from Twitter. Anyway.
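Since tokenization is coming up next week: tokenizers of this kind are typically built around byte-pair encoding, and the core idea fits in a few lines. This is my own toy sketch of BPE, not the released code.

```python
from collections import Counter

def most_common_pair(ids):
    """Most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with one new token id."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes (ids 0..255)
pair = most_common_pair(ids)               # the byte pair ('a', 'a')
ids = merge(ids, pair, 256)                # mint a brand-new token id for it
print(len(ids))                            # 9 (was 11)
```

Repeating this merge step grows the vocabulary one token at a time; that loop is essentially the whole training algorithm.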
04:30 Dataset: OpenMathInstruct-1 by NVIDIA. NVIDIA has released a dataset for math, and it has some subsets. It uses Mixtral 8x7B to produce the pairs, and it leverages both textual reasoning and code-interpreter style generation and so on. It's open source, and it's a good dataset to know about.
04:59 Now, an open-source massively multilingual project. This is really big: 119 countries, and the goal is to create LLMs which are multilingual; not just the LLMs but also the datasets for training, everything open source.
05:25 More different news. There was a discussion: you know, Google has released the new model which, in a research setting, was operating with a 10 million token context length, which is a huge context length. So people were saying, well, then we don't need RAG, we don't need a database; we can just load all the data and the model will deal with it. But then of course the consideration is the cost, because if you load a lot of data you'll have to pay for it, whereas if you're using RAG you only process the data you retrieve from the database, which is a small amount. So far RAG is something like 100 times more cost-effective.
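The cost argument is simple arithmetic; here is a sketch with made-up numbers (the per-token price and the token counts are illustrative assumptions, not any provider's real pricing).

```python
PRICE_PER_1K_INPUT_TOKENS = 0.01  # assumed $/1K input tokens, for illustration only

def prompt_cost(tokens):
    """Cost of sending this many input tokens to the model."""
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

stuff_everything = prompt_cost(200_000)  # dump a whole corpus into a long context
rag = prompt_cost(2_000)                 # retrieve only a few relevant chunks
print(stuff_everything / rag)            # 100.0
```

The ratio is just (tokens in full context) / (tokens retrieved), and you pay it again on every single query.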
06:12 Now, here is the paper "Scalable Diffusion Models with Transformers", which comes from Berkeley and NYU, and this is actually a foundation of the Sora model released by OpenAI, so you may want to look at this publication.
06:32 "AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy". This is interesting: even if the model hallucinates, even if it gives you a wrong response, it's still helpful. They measured it and found improvement: something like 43% with the regular assistant, and like 28% even with a biased one, so it's still very, very helpful. And yeah, we know that we cannot trust the model, but we still use it.
07:08 Anyway, LangChain, the famous Python framework which everybody is using: it is now a startup, and they raised $25 million. This is the team, and they have a lot of following on GitHub, so this is good news. They have this application, LangSmith, for LLM application development, monitoring and testing. I never used it, but apparently it's either already available or will become available soon.
07:40 Ragas, which is a RAG evaluation framework. This is again an open-source framework: people install ragas in Python and then from ragas import evaluate and so on, so you can evaluate your RAG system using this framework. Ragas evaluates pipelines on correctness, tonality, hallucinations, fluency and so on.
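To give a flavor of what such an evaluation measures, here is a toy, purely illustrative stand-in for a faithfulness-style metric. Real frameworks like Ragas use LLM-based judgments, not this word-overlap hack; the function and example are mine.

```python
def toy_faithfulness(answer: str, context: str) -> float:
    """Fraction of answer words that also appear in the retrieved context.
    A crude stand-in for an LLM-judged faithfulness metric."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

score = toy_faithfulness(
    "Paris is the capital of France",
    "France's capital city is Paris",
)
print(round(score, 2))  # 0.5
```

Even this crude version shows the shape of the idea: score each answer against the context it was supposed to be grounded in, then aggregate over a test set.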
08:06 Gemini can now work with your Google Workspace: your Google Docs, Google Slides and so on, and your emails, your Gmail. You can ask it to search and summarize, so it becomes very useful. It's kind of like how Microsoft Copilot can work with your Outlook; here, Google Gemini works with your Google documents and so on.
08:38 The 70B model: I tried to actually see where I could download it, but I had difficulty; maybe the website was overloaded. What they did: they took Code Llama 70B, which is a model that was specifically trained to work with code, and then they fine-tuned it further, and they have shown that in their tests it is better than even GPT-4. This is quite amazing: if this is true, then this is the best one today. Besides the 70B, they also have a smaller model, which is not as good, but still.
09:25 Groq. So, Groq is a startup which creates specialized chips to run models, and they run them really, really fast: you're talking about 10 times faster, 50 times faster; well, it depends on what you're doing, but amazingly much faster. So: instant responses, efficiency, affordability. You can actually try it; I tried it, and yeah, it was really amazing how fast it prints the response.
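The speed difference is easiest to feel as time-to-answer. A sketch with assumed throughput numbers; both rates below are illustrative guesses, not measured benchmarks of any vendor.

```python
ANSWER_TOKENS = 500  # a typical chat-length response (assumed)

def seconds_to_answer(tokens_per_second):
    """How long the user waits for the full response at a given generation rate."""
    return ANSWER_TOKENS / tokens_per_second

print(seconds_to_answer(30))   # ~16.7 s at an ordinary serving rate (assumed)
print(seconds_to_answer(500))  # 1.0 s on much faster specialized hardware (assumed)
```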
09:59 A letter from Neal Mohan, who is the YouTube CEO. He talks about four big bets for 2024: AI will empower human creativity; creators should be recognized as next-generation studios; YouTube's next frontier is the living room and subscriptions (by living room he means the big TV screen; people now watch more and more YouTube on the big screen in the living room, and subscriptions are growing); and protecting the creator economy is foundational.
10:40 Okay, Magic AI. This is a startup; they got more than $100 million to build an AI software engineer capable of assisting with complex coding tasks, one that will act more as a coworker than merely a copilot tool. So, yeah.
11:01 Next: "Text Embeddings: Comprehensive Guide", so this is the link, talking about embeddings. Sometimes I just give you the links so you can follow them, and by the way, some of them are coming from Sander, thank you. Then "Why language models became large language models and the hurdles of developing LLM-based applications": I really like this article, and this picture is actually from it. I mean, there are several good pictures there, but you can see what has been happening in the last year: training costs are going down, whereas spending on inference goes up. That's why, if you look at the most recent Nvidia chips, they are tuned more for faster and cheaper inference, because that is the main expense nowadays.
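Why inference dominates: training is paid once, while inference scales with usage. Made-up numbers, purely to show the shape of the argument.

```python
TRAINING_COST = 10_000_000            # one-time training cost, $ (assumed)
INFERENCE_COST_PER_1K_QUERIES = 10    # serving cost, $ (assumed)
queries_per_day = 5_000_000           # assumed traffic for a popular service

# Serving cost accumulates every day; training was a single payment.
annual_inference = queries_per_day / 1000 * INFERENCE_COST_PER_1K_QUERIES * 365
print(annual_inference)                  # 18250000.0
print(annual_inference > TRAINING_COST)  # True
```

With these assumptions, one year of serving already costs almost twice the training run, and the gap only widens with traffic.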
12:04 LoRA Land. This is a really interesting project. What they have: Mistral 7B, which is a small, nice model, and they fine-tuned it for something like 25 different topics, and they achieved performance on these topics better than GPT-4. So if you have a certain specific use case, you don't need GPT-4: you can take a smaller model, fine-tune it for this specific topic, and you will get better performance. Very interesting, and across all 25 different topics, across the board, the smaller fine-tuned model is better.
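The reason small per-topic fine-tunes like this are cheap: assuming these are LoRA-style fine-tunes, as the project name suggests, you train two low-rank matrices per weight instead of the full matrix. The dimensions below are illustrative assumptions, not the project's actual settings.

```python
d = 4096      # hidden size (assumed)
layers = 32   # number of adapted weight matrices (assumed)
r = 8         # LoRA rank, a typical small value

full_finetune = d * d * layers   # updating every full d x d matrix
lora = r * (d + d) * layers      # A is d x r and B is r x d per matrix
print(full_finetune // lora)     # 256, i.e. ~256x fewer trainable parameters
```

The ratio simplifies to d / (2r), so a rank-8 adapter on a 4096-wide model trains roughly 0.4% of the parameters a full fine-tune would.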
12:53 GPT-4's knowledge cutoff is now in 2023. I tested GPT-3.5: it is still January 2022, which is 2 years ago, so GPT-3.5 is useless because it doesn't know the last 2 years, but GPT-4 is reasonably good.
13:14 Okay, this is probably worth following: you remember there was a famous article from Google where they invented the Transformer; they wrote the article "Attention Is All You Need". These are the eight people who wrote that article, and they are all co-founders of companies now, and the CEO of Nvidia will be interviewing them. There will be a session in March, on March 20th, so it will probably be a very interesting session. GTC is the GPU Technology Conference, which will run March 18th to 21st.
14:00 Okay, Sierra, a startup. A very interesting thing, because this is business use. First of all, the founder, Bret Taylor: he became OpenAI's chairman recently, and before that he was co-CEO of Salesforce, so apparently he knows what he's doing. He made some strategic partnerships and secured more than $100 million in funding, and they already have around 30 employees. The idea is to create chatbots for business, for customer service, and they have competitors who have been in this space for a long time, for example Haptik AI from India, you see, 2019, and they had a lot of enterprise customers and so on. Well, but these people are brave, so they entered this area too, and they already have WeightWatchers, Sonos and SiriusXM. So we'll see; you can make a chatbot for yourself, or you can create chatbots for big business, and this is what they are doing.
15:14 Okay, this is again from Sander, thank you; this is a great thing. This open-source project on GitHub is a set of more than 100 coding tests plus a Python framework to apply them. He tested it with different models: GPT-4, GPT-3.5, the Claude models, Mistral Medium, Mistral Small, Gemini Pro. You see that all the models pass the simple tests, but as the tests become more and more difficult (the first column is GPT-4), GPT-4 still holds, but eventually almost none of them can solve the problems. So it's very graphically pleasing to see the performance of the different models. I looked at the first test on this line which nobody could do, and it was actually this problem: here is a base64 string; as you know, language models can read different languages, including base64 encoding, so read this base64 string, think about the answer, and type just the answer in base64; your entire answer must be in base64. And you see, none of them answered this question.
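For reference, this is all the mechanics the task requires; the hard part is that the model must do it "in its head", with no tool. A round-trip in Python (the question string is my own example, not the one from the benchmark):

```python
import base64

question = "What is 2+2? Reply with just the number."
encoded = base64.b64encode(question.encode()).decode()  # what the model is shown
print(encoded)

decoded = base64.b64decode(encoded).decode()            # what it must recover mentally
print(decoded == question)                              # True

answer_in_base64 = base64.b64encode(b"4").decode()      # the required reply format
print(answer_in_base64)                                 # NA==
```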
16:40 Okay, UpTrain: open-source LLM evaluations. It has multiple metrics, does A/B testing, conversation evaluations and so on; people are getting into operations now, so it's useful. Again, it's open source, you see the GitHub, and you install it via pip into your Python program.
17:00 Okay, notebooklm.google.com: this is how you can keep your notes, and some people really, really like it. So you know, you have Google Drive, you have email, and now you have NotebookLM as well.
17:23 Okay, this is a scandal, and people have had a lot of fun with it. Gemini apparently had a diversity problem in its settings when you ask it to generate an image. For example, this is Gemini's picture of Elon Musk: the request was "generate a picture of Elon Musk", and you see the face is recognizable, it is Elon Musk, but somehow he's black. And look at this picture: the request was "generate an image of the founders of Google", so you know, Larry and Sergey, and why do they look Chinese? And here, for example, the request was "paint me a historically accurate depiction of a medieval British king", and well, it doesn't look like a British king. And so on: a portrait of a famous physicist in the 17th century; well, this one maybe, but the others don't look like it. And this is a portrait of the founding fathers of America; well, maybe this one, but not the others. And this is very interesting: "generate an image of a 1943 German soldier"; why is the German soldier an Asian woman, or black? So they stopped it: if you go right now and try to generate an image (for example, I tried "generate an image of Elon Musk" to reproduce this), you get the answer that they are "working to improve" it, whatever; so they closed it until they fix it. Yeah, but it was really funny.
19:25 "How to Pilot Generative AI", by Gartner. There's an article you can read; they describe the process of how to successfully build generative AI pilot applications, with analysis and recommendations from people who have done it. Again, interesting reading.
19:50 The Arena leaderboard hasn't changed yet; the last time it was regenerated was February 15th, and it has not been updated since. The HELM leaderboard is an interesting project: this is at Stanford, where they have the Center for Research on Foundation Models, CRFM, so it is crfm.stanford.edu. You know, when people publish models they evaluate them on several benchmarks, maybe five, maybe seven; but here they evaluated them on probably about a hundred different metrics. You can select different scenarios, and for different scenarios they have different metrics, so it is kind of a holistic, from-all-angles evaluation of the models. And this is their own leaderboard, which you can dissect with drop-down menus. What's interesting: at the top is GPT-4, then PaLM, then PaLM 2, which is the older model, then Yi, which is Chinese, and here is Mixtral; you see, Mixtral is open source. Then Anthropic's Claude, again PaLM, Anthropic, Llama 2, and of course they are much lower than the top, but I'm very happy for Mixtral. And yeah, you see GPT-3.5 all the way down here.
21:31 Next is "LLM as a zero-cost commodity"; this is just interesting thinking. LLMs are becoming better and cheaper; eventually they will become a commodity, and the value will be not in the LLM itself but in the systems built around them and in the data used to build them or used by those systems. So this is an interesting discussion: first customer service, then replacing all call centers; LLMs will then be incorporated into video games; and there will be education-tuned LLMs substituting for the bottom 40% of grade school teachers. All software becomes a commodity over time; LLMs will become cheap, they will become free. It is an interesting publication from that perspective: the most valuable LLM companies will be the anti-LLM companies, which don't use LLMs. Facebook, Tinder, Twitter, Instagram all become considerably less valuable once the majority of the user base is replaced with extremely high-quality bots; the real consumers may gradually sign off and take their money elsewhere. In that world a naive person would try to build a bot detector, but a good bot detector can simply be used to build a better bot instead. To win at this game, the most important way to win with LLMs is to devise better forms of authentication.
23:07 Okay, so again, I just recommend you follow this link; there is a lengthy discussion with many people contributing. Now, this is interesting: there is a website called Upwork where you can hire somebody to do some simple work for you, for example write something, translate something, and so on. What you see here is the change in the number of Upwork jobs since ChatGPT was released, and the red ones mean a decrease: it's writing, it's translation, it's customer service. And this is another one, the change of hourly rates specified in Upwork job postings per category, and you see decreases again: production, market research, backend development. So you see, because of AI there is definitely a decrease in the number of jobs and in the pay. Now, this is another interesting thing, from Bloomberry: the number of new Upwork jobs per day mentioning each AI skill, and the top one here, which is growing, is chatbots. So if you want to get a job, put "chatbot" in your resume.
24:34 Anyway, these are the layoffs. You see 2023, then January 2024 is much smaller, about three times smaller; next is February, and you see February is much smaller as well. So this year layoffs are running about three times lower than last year. Okay, that's it, thank you.