# Stanford EE364A Convex Optimization I Stephen Boyd I 2023 I Lecture 12

Stanford Online | 2024-03-22



💫 Short Summary

The video covers analytic centers, linear discrimination, LP feasibility problems, robust linear discrimination, support vector machines, convex surrogates for value at risk, basis functions in machine learning, polynomial discriminators, circuit placement problems, how CVXPY solves problems, efficient matrix multiplication, GPUs for matrix operations, solving linear and triangular systems, matrix factorizations, and the factor-solve method for solving equations efficiently. These topics are tied to optimization strategy, practical applications, and computational efficiency across various fields.

✨ Highlights

📊 Transcript

✦

The concept of an analytic center of a set of inequalities.

02:08 The analytic center is found by minimizing the negative log of the margins, equivalently maximizing the product of the margins.

The same log-barrier objective appears in log utilities and is used widely in fields such as networking.

The Hessian of the log barrier at the analytic center defines an inner ellipsoid; scaled up by a factor of m (the number of inequalities), it covers the set.

Computing analytic centers is straightforward and will come up in the coding assignments.
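The barrier minimization described above can be sketched in a few lines of NumPy. This is a hypothetical damped-Newton implementation, not code from the course; the function name, the unit-box data, and the stopping rule are illustrative.

```python
import numpy as np

def analytic_center(A, b, x0, tol=1e-10, max_iter=50):
    """Damped Newton method minimizing the log barrier -sum(log(b - A x))
    of the inequalities A x <= b; x0 must be strictly feasible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = b - A @ x                          # margins, all positive
        grad = A.T @ (1.0 / s)                 # gradient of the barrier
        H = A.T @ ((1.0 / s**2)[:, None] * A)  # Hessian of the barrier
        dx = np.linalg.solve(H, -grad)         # Newton step
        if -grad @ dx / 2 < tol:               # Newton-decrement stopping rule
            break
        t = 1.0                                # backtrack to stay strictly feasible
        while np.any(b - A @ (x + t * dx) <= 0):
            t *= 0.5
        x = x + t * dx
    return x

# Unit box 0 <= x <= 1: by symmetry the analytic center is (0.5, 0.5).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
x_ac = analytic_center(A, b, np.array([0.2, 0.7]))
```

The maximize-product-of-margins interpretation follows because minimizing -sum(log(margins)) is the same as maximizing the product of the margins.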

✦

Overview of linear discrimination and the perceptron algorithm.

05:03 Linear discrimination seeks a hyperplane separating feature vectors by outcome; the perceptron algorithm of the early 1950s predates the realization that the problem is a linear program.

Watch the notation: here a and b are the variables and the x_i and y_i are data, reversing the usual convention.

The expression a^T x_i + b is linear in the pair (a, b), and the separation conditions are strict inequalities.

The lecture also touches on the slow historical diffusion of these ideas.

✦

Importance of strict versus non-strict inequalities in systems of inequalities.

08:34 Because the constraints are homogeneous in (a, b), the strict inequalities can be replaced without loss of generality by non-strict ones with a 1 on the right-hand side.

Naively relaxing the strict inequalities to non-strict ones admits the useless trivial solution a = 0, b = 0.

Feasibility is preserved under positive scaling, so the original system and all its scaled variants are feasible or infeasible together.

The distinction between strict and non-strict inequalities matters in both research and practice.
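The scaled, non-strict system a^T x_i + b >= 1, a^T y_j + b <= -1 is an LP feasibility problem. Here is a minimal sketch with `scipy.optimize.linprog`; the two point clouds are made-up illustrative data.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative separable point clouds.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0]])   # want a^T x_i + b >= 1
Y = np.array([[-2.0, -2.0], [-3.0, -1.0]])           # want a^T y_j + b <= -1

# Variables z = (a_1, a_2, b); a zero objective makes this pure feasibility.
A_ub = np.vstack([
    np.hstack([-X, -np.ones((len(X), 1))]),   # -(a^T x_i + b) <= -1
    np.hstack([ Y,  np.ones((len(Y), 1))]),   #   a^T y_j + b  <= -1
])
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=-np.ones(len(X) + len(Y)),
              bounds=[(None, None)] * 3)
a, b = res.x[:2], res.x[2]
```

Any feasible (a, b) returned by the LP is a separating hyperplane; replacing the 1 with any positive constant gives an equivalent problem by rescaling.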

✦

LP feasibility and the thickest separating slab.

11:57 Separability of two point sets by a hyperplane is an LP feasibility problem; picking the best hyperplane leads to the slab -1 <= a^T x + b <= 1, which has width 2 / ||a||_2.

The width 2 / ||a||_2 is neither convex nor concave in a, but maximizing it is equivalent to minimizing ||a||_2.

Solving a QP (minimize ||a||_2^2 subject to the separation constraints) yields the thickest separating slab.
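A minimal sketch of the thickest-slab QP, here solved with SciPy's SLSQP on toy symmetric data (the data and starting point are illustrative; the course itself would express this in CVXPY):

```python
import numpy as np
from scipy.optimize import minimize

# Toy symmetric data: class X near (2, 2), class Y mirrored through the origin.
X = np.array([[2.0, 2.0], [3.0, 1.5], [2.5, 3.0]])
Y = -X

# Variables z = (a_1, a_2, b); minimizing ||a||^2 maximizes the slab width 2/||a||.
cons = ([{"type": "ineq", "fun": lambda z, x=x: z[:2] @ x + z[2] - 1.0} for x in X]
        + [{"type": "ineq", "fun": lambda z, y=y: -(z[:2] @ y + z[2]) - 1.0} for y in Y])
res = minimize(lambda z: z[:2] @ z[:2], x0=np.array([1.0, 1.0, 0.0]),
               constraints=cons, method="SLSQP")
a, b = res.x[:2], res.x[2]
width = 2.0 / np.linalg.norm(a)   # thickness of the separating slab
```

For this symmetric data the binding points are (2, 2) and (-2, -2), so the optimal slab has width 4*sqrt(2) and b = 0.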

✦

The dual of robust linear discrimination: maximize 1^T λ + 1^T μ.

15:50 Normalizing λ and μ turns them into mixture coefficients, i.e., nonnegative vectors summing to one.

The dual then reads: minimize the distance between a point in the convex hull of the x_i and a point in the convex hull of the y_i.

The geometry of the dual is completely clear.

Duals of geometric problems like this are usually highly interpretable, even when they don't make the problem easier to solve.

✦

Approximate linear separation and support vector machines.

21:55 Slack variables u_i, v_j relax the separation constraints, and minimizing the total slack is a heuristic for minimizing the number of misclassified points.

The exact misclassification count is a sum of step functions and is not convex; the total slack is a convex surrogate for it.

Convexity is what makes the relaxed problem tractable.

This turns a combinatorial objective into a practical machine learning formulation.
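The minimize-total-slack heuristic is itself an LP. A sketch with `linprog` on made-up data containing one point that cannot be separated (the point sets and dimensions are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative non-separable data: the last x point is the midpoint of the
# two y points, so perfect separation is impossible.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.5, -1.5]])
Y = np.array([[-2.0, -2.0], [-3.0, -1.0]])
m, n = len(X), len(Y)

# Variables z = (a_1, a_2, b, u_1..u_m, v_1..v_n); objective is total slack.
c = np.concatenate([np.zeros(3), np.ones(m + n)])
A_ub = np.zeros((m + n, 3 + m + n))
A_ub[:m, :2], A_ub[:m, 2] = -X, -1.0          # -(a^T x_i + b) - u_i <= -1
A_ub[:m, 3:3 + m] = -np.eye(m)
A_ub[m:, :2], A_ub[m:, 2] = Y, 1.0            #   a^T y_j + b - v_j  <= -1
A_ub[m:, 3 + m:] = -np.eye(n)
res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m + n),
              bounds=[(None, None)] * 3 + [(0, None)] * (m + n))
total_slack = res.fun   # here 2.0: only the bad point needs slack
```

When the total slack is small, few inequalities needed relaxing, which is the heuristic sense in which this minimizes misclassifications.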

✦

Convex surrogates in portfolio construction with constraints.

25:23 Convex surrogates for value at risk are a practical alternative in finance.

Solving a convex relaxation is generally preferred over attacking the non-convex problem directly.

Convex surrogates of this kind appear in many fields beyond finance.

✦

Issues with value at risk as a risk measure; homogeneity and scaling.

26:02 Value at risk is widely used by banks and hedge funds but is a flawed risk measure.

The particular positive number chosen as the canonical right-hand side (1, a million, etc.) only rescales the variables and objective.

Homogeneity means that scaling the variables scales the expression by the same factor, as with norms and absolute values.

✦

Basis functions: writing the discriminator as a linear combination θ of fixed basis functions.

29:50 Seemingly complex discriminators remain tractable because they are linear in the coefficients θ.

Feature engineering transforms the x and y data before applying a linear separation method such as a support vector machine.

Quadratic discrimination sets up linear inequalities on the coefficients of a quadratic form, separating the data with an ellipsoid.

Ellipsoidal separation is useful, for example, for classifying outcomes in industrial processes.
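A sketch of quadratic discrimination by feature lifting: points on a small circle surrounded by points on a larger circle are not linearly separable in R^2, but become linearly separable after a quadratic feature map (the data and function names are illustrative, not from the lecture):

```python
import numpy as np
from scipy.optimize import linprog

def lift(P):
    """Quadratic feature map: x -> (x1^2, x2^2, x1*x2, x1, x2)."""
    x1, x2 = P[:, 0], P[:, 1]
    return np.column_stack([x1**2, x2**2, x1 * x2, x1, x2])

# Inner circle (class X) surrounded by an outer circle (class Y): an ellipsoid,
# which is linear in the lifted features, separates them.
t = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])                    # radius 1
Y = 3.0 * np.column_stack([np.cos(t + 0.3), np.sin(t + 0.3)])  # radius 3

FX, FY = lift(X), lift(Y)
d = FX.shape[1]
A_ub = np.vstack([np.hstack([-FX, -np.ones((len(FX), 1))]),   # w^T f(x_i) + c >= 1
                  np.hstack([ FY,  np.ones((len(FY), 1))])])  # w^T f(y_j) + c <= -1
res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(len(FX) + len(FY)),
              bounds=[(None, None)] * (d + 1))
w, c0 = res.x[:d], res.x[d]
```

The separating surface w^T f(x) + c0 = 0 is a quadratic in x, i.e., a conic such as an ellipsoid, even though the optimization over (w, c0) is linear.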

✦

Minimizing the degree of a polynomial discriminator.

34:26 The minimum degree needed (linear, quadratic, cubic, ...) can be found by quasi-convex optimization, since for fixed degree separability is a feasibility problem.

The lecture then turns to placement and facility location problems: minimize transportation costs with some positions fixed and others variable.

Placement is used universally in circuit design, where millions of interconnected gates are placed at various levels.

✦

Designing penalty functions for distances between connected points in a circuit.

38:37 Different cost functions, such as the l1 norm and the sum of squares, penalize total edge length differently.

The penalties differ in how tolerant they are of a few long edges.

The choice of penalty is a central design decision in circuit placement.
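With the sum-of-squares penalty, placement with fixed and free positions reduces to a small linear system (a graph Laplacian). A hypothetical four-node example (node numbering, edge list, and coordinates are all illustrative):

```python
import numpy as np

# Hypothetical tiny placement: nodes 0 and 1 are fixed pads, nodes 2 and 3 are
# free cells; edges connect (0,2), (1,3), (2,3).  Minimizing the sum of squared
# edge lengths is a least-squares problem: setting the gradient to zero gives
# one small linear (Laplacian) system per coordinate.
fixed = {0: np.array([0.0, 0.0]), 1: np.array([4.0, 0.0])}
edges = [(0, 2), (1, 3), (2, 3)]
free_nodes = [2, 3]
idx = {node: i for i, node in enumerate(free_nodes)}

L = np.zeros((len(free_nodes), len(free_nodes)))
rhs = np.zeros((len(free_nodes), 2))
for u, v in edges:
    for p, q in ((u, v), (v, u)):
        if p in idx:                       # equation for free node p
            L[idx[p], idx[p]] += 1.0
            if q in idx:
                L[idx[p], idx[q]] -= 1.0   # coupling to another free node
            else:
                rhs[idx[p]] += fixed[q]    # fixed neighbor goes to the RHS
pos = np.linalg.solve(L, rhs)              # rows: optimal positions of nodes 2, 3
```

An l1 penalty would instead spread positions differently and requires an LP; the quadratic case is special in collapsing to linear equations.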

✦

Placement problems in non-convex scenarios.

42:58 A penalty that stops decreasing once two objects are within a distance D of each other (there is no benefit to pushing them closer) makes the problem non-convex.

Constraints such as non-overlapping placements increase the complexity further.

These ideas show up in practical applications and in the homework.

✦

How CVXPY solves problems.

44:54 Problems are transformed into linear programs, quadratic programs, or semidefinite programs, which are then solved.

The course next covers how LP, SOCP, and SDP solvers work, starting with numerical linear algebra.

Understanding numerical linear algebra is essential for both the math and the computation.

The book's appendix is recommended for further reading.

✦

Practicality in solving linear equations.

47:56 Some methods have low theoretical complexity but are impractical.

Algorithms for solving linear equations must be usable on real-world problems.

Structured matrices can be solved faster: a diagonal system needs only the diagonal entries stored and n divisions.

Practical performance can differ from theoretical flop counts, so approaches must adapt.

✦

History of floating-point operation counts and misconceptions about efficiency.

50:40 Flop counts serve as a rough estimate of complexity and a historical reference.

Basic operations are organized into BLAS levels one (vector-vector), two (matrix-vector), and three (matrix-matrix).

GPUs are optimized for these operations and deliver enormous throughput.

More specialized processors such as TPUs matter for efficient computation and system design.

✦

Matrix multiplication and BLAS levels.

54:15 Sparsity in the vectors reduces the cost of inner products and hence of matrix multiplication.

Structure such as a low-rank factorization can also be exploited.

Matrix-vector products should be evaluated in an order that avoids forming large intermediate matrices.

The theoretical complexity of matrix multiplication is its own topic in computer science.
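The point about ordering matrix-vector products is easiest to see with a low-rank matrix A = U V^T: associating the product the right way avoids ever forming the n x n matrix (sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 5
U = rng.normal(size=(n, p))
V = rng.normal(size=(n, p))
x = rng.normal(size=n)

# A = U V^T has rank at most p.  Forming A explicitly costs O(n^2 p) flops and
# O(n^2) memory; associating the other way needs only two thin products, O(n p).
y_slow = (U @ V.T) @ x       # forms the n x n matrix first: wasteful
y_fast = U @ (V.T @ x)       # never forms A
```

Both give the same vector, but for large n only the second ordering is practical.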

✦

Efficient matrix multiplication using block partitioning and recursion.

58:24 Strassen-type recursive methods multiply matrices with fewer block multiplications, reducing the complexity from n^3 to about n^2.8.

Breaking matrices into blocks sized to the cache improves performance in practice.

Understanding the computer architecture matters for performance, particularly in a multi-core environment.
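One level of Strassen's recursion, sketched in NumPy: 7 block multiplies replace the usual 8, which applied recursively gives the roughly n^2.8 exponent (this is the textbook construction, not code from the lecture):

```python
import numpy as np

def strassen_once(A, B):
    """One level of Strassen's recursion for (2m x 2m) matrices:
    7 block multiplications instead of 8."""
    m = A.shape[0] // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
C = strassen_once(A, B)      # agrees with A @ B up to rounding
```

In practice libraries use ordinary blocked multiplication tuned to the cache; the recursive exponent matters mainly asymptotically.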

✦

GPU speed and capabilities in matrix multiplication.

01:01:42 GPUs sustain an enormous number of floating-point operations per second.

Low-level device optimizations are what make tools like NumPy and Torch fast in practice.

Modern GPUs handle these operations far faster than laptop CPUs.

A small number of experts optimizing these kernels benefits everyone who uses them.

✦

From fast matrix multiplication to solving triangular systems.

01:05:58 High-end GPUs multiply large matrices at very high speed.

Solving a lower triangular system by forward substitution costs about n^2 flops.

Forward substitution is the standard efficient method for lower triangular systems.

The factor-of-n saving over a general n^3 solve is what makes triangular systems valuable.
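A minimal forward-substitution sketch: row i touches i previously computed entries, so the total work is about n^2 flops (the example system is illustrative):

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L; about n^2 flops in total."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        # subtract the contribution of the already-computed entries,
        # then divide by the diagonal element
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
b = np.array([2.0, 5.0, 32.0])
x = forward_substitution(L, b)
```

The same loop run from the bottom row upward is backward substitution for upper triangular systems.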

✦

Backward substitution and matrix factorizations.

01:08:46 Upper triangular systems are solved analogously by backward substitution.

Other easily solved structures include orthogonal matrices and permutation matrices.

General systems are solved by factoring the matrix into a product of easy factors, as in QR, the eigendecomposition, or the singular value decomposition.

For most viewers the high-level ideas matter more than the low-level details.
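The factor-solve idea in miniature: factor A = QR once, then solve by one orthogonal "solve" (just a transpose multiply) and one backward substitution (the data are illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
b = rng.normal(size=5)

# Factor A into easy pieces: Q orthogonal, R upper triangular.
Q, R = np.linalg.qr(A)
# Solve A x = b as Q (R x) = b: the orthogonal factor is "solved" by a
# transpose multiply, the triangular factor by backward substitution.
x = solve_triangular(R, Q.T @ b, lower=False)
```

Each factor is cheap to solve on its own, which is the whole point of factoring.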

✦

Solving equations versus inverting matrices.

01:12:55 Computing an inverse is itself a sequence of solves, one per column.

Explicitly inverting easy factors such as diagonal or permutation matrices is never done; they are solved directly.

Once a matrix is factored, solving with it is quick even for large matrices.

Multiple systems with the same coefficient matrix can then be solved at little extra cost.

✦

Efficiency of the factor-solve approach to linear equations.

01:16:25 Solving many sets of linear equations with the same matrix costs about the same as solving one, because the factorization dominates the cost.

Factorization caching and LU factorization are the standard tools; LU factorization is what Gaussian elimination computes.

A few specialists implementing these methods well benefits everyone who builds on them.
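Factorization caching sketched with SciPy: the O(n^3) LU factorization is done once, and each additional right-hand side costs only O(n^2) (sizes and data are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n))

lu, piv = lu_factor(A)            # O(n^3): done once and cached
cols = []
for k in range(10):               # each extra solve reuses the factors, O(n^2)
    e_k = np.zeros(n)
    e_k[k] = 1.0
    cols.append(lu_solve((lu, piv), e_k))   # k-th column of A^{-1}
```

This is exactly how computing an inverse reduces to n solves with one cached factorization; solvers exploit the same trick when the coefficient matrix repeats.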

✦

The same factor-solve idea runs through both approaches.

01:18:43 The same idea applies in both cases.

Discussion continues next lecture.

00:05today we'll we'll uh continue these

00:08geometric type uh describing these

00:11geometric type problems um one thing

00:13that is going to come up actually uh not

00:16yeah maybe in a couple of weeks in the

00:17class is the concept of an analytic

00:19Center um it's actually not an analytic

00:22center of a set although people

00:24informally say the analytic center of a

00:27set they don't mean that what they

00:28really mean is the analytic center of a

00:31set of inqu of of a of a description of

00:33a set which is a set of inequalities

00:35right so and the analytic Center goes

00:37like this um so uh let's say we have I

00:42mean the equality constraints we just

00:43keep right here uh but suppose we have a

00:46bunch of inequalities here right and

00:48what this me we look at uh so obviously

you know the f i of x should be less

than zero minus f i of x is what people

would call the margin typically when that's

positive so minus f i of x is the margin and

01:02it's it's roughly you know how much you

01:04have left before you violate the

01:05inequalities that's the margin and the

01:08so-called analytic Center uh of a set of

01:11inequalities is given by minimizing the

01:16negative uh log of the um this is the

margin minus f i of x right another way to

01:23say it is you maximize the product of

01:26the margins okay so that's that's this

01:29comes up in a lot of other things in log

01:31utilities it's used universally

01:33networking and things like that so it

01:35this is not at all uncommon um okay so

01:38that is the so-called analytic Center oh

01:40of course that's a convex problem why

01:43because minus log minus argument is

01:46convex and increasing

okay um and then the f i's of x are are are

01:52convex um okay and this is we'll we'll

01:56actually soon very we'll see very soon

01:58that this is actually quite

01:59straightforward to compute matter of

02:01fact you will actually be doing you'll

02:03be writing code that will comp that will

02:06basically compute analytic centers and

02:08things like that um okay so that's the

02:10analytic inequal that is the analytic

02:12center of a set of inequalities um and

02:15here's just a picture of what it looks

like for a polyhedron right so the the

02:20this shows you you know what what's

02:21feasible the each of these corresponds

02:23to one of the rows here um each of those

02:27lines um the dash curve give you the

02:31level curves of this log barrier so

02:34called This is called the log barrier

02:35it's called a log barrier I think that's

maybe borrowed from physics or something

02:38like that because it's like a potential

02:40function that goes up to Infinity as you

02:43approach the boundary right so that's

02:46like a barrier for a particle or

02:47something like that right if if that

02:49were if that were an energy um okay and

02:53a couple things you can see um It's

02:56actually kind of cool uh so we've marked

02:58there the analytic Center that's that's

03:00the point where that barrier is

03:01minimized so whatever that point is it

03:04maximizes the product actually of the

03:07margins for the inequalities okay so

03:10that's that that's what it does um and

03:13you can see actually that the level sets

03:16you know uh close down to the analytic

03:19Center are actually of course going to

03:21be very close to ellipsoidal right

03:23that's of course true for any any but

03:25any kind of smooth um any smooth

03:28function right so this is the dark

03:32ellipsoid drawn there is is this uh it

03:36turns out that for the analytic Center

if you calculate the Hessian of it then the

03:41ellipsoid defined by the center but with

03:43a one on the right hand side that

03:45ellipsoid is guaranteed to be inside the

03:48set okay so this is we will we'll talk

03:52this will be important later um if you

03:55puff that ellipsoid up by a factor

03:58around M actually M would be enough

04:00because that would be square root of of

M squared there um if you puff it up by a

04:05factor of M then you cover the ellipsoid

04:08so basically it says that the the

04:10ellipsoids defined by analytic centers

04:13also have this approximation property

04:15right that you know it gives you an

04:17inside ellipsoid and then when you puff

04:19it up by a factor it covers the set

04:22right so it gives you this approximation

04:24um here the

04:26approximation depends now remember that

for things like the Löwner-John ellipsoid

04:31or the minimum volume you know covering

04:33ellipsoid or the maximum volume

04:34inscribed ellipsoid there it depended

04:37only on the dimension here it depends on

04:39M which is the number of inequalities

04:41here right so if m is much larger than n

04:44this is a worse bound worse

04:47approximation bound of a convex set so

04:50okay but anyway this you you'll be

04:52seeing much more about this you know in

04:55a couple of

04:57weeks okay what we're going to look at

05:00now is uh linear discrimination uh

05:03probably most of you have seen this

05:04maybe in some you know basic uh machine

05:06learning class um but here's here's the

05:09problem and you know what there's many

05:11many variations on this but it looks

05:13like this I'm going to give you a set of

05:14points you know X X1 through xn these

05:17are vectors and a bunch of y's and if

05:19you like these are simply feature

05:21vectors associated with a a positive

05:23outcome and a negative outcome in some

05:25kind of Boolean uh problem right so I

05:28have some data uh and what we what we

05:31seek is a hyperplane uh so that's

05:35defined by a vector a and a and a

05:37constant B and we want the we want it we

05:40want all the XIs to be on one side of

05:42that hyperplane and we want all the Y's

05:44to be on the other side so that that's

05:46the that's what we want separating

05:48hyperplane um actually shockingly this

05:52was studied in the early

05:531950s um people came up with something

05:56called the perceptron algorithm which

05:57you might have seen somewhere it's like

05:59like a very early thing and it weirdly

06:03many years went by before someone looked

06:05up and said dude that's an LP uh in

06:09which case people literally down the

06:11hall know how to solve these right so

06:14it's just one of those cases where the

06:15diffusion of knowledge had you know took

06:18some time right so um there is a

06:21subtlety here that we're going to look

06:22at oh uh one thing we have to be careful

06:25of is the notation right so here the

06:29variables are basically A and B uh and

06:32the data are the X eyes and the Y eyes

06:35right which violates this usual

06:36convention right that things at the end

06:38of the alphabet are typical your

06:40typically your variables and things at

06:41the beginning are typically typically

06:42your parameters or data okay so so just

06:45so you have to be on your toes here um

06:47so what sort of inequality On A and B is

06:51is

06:56this what kind of what kind of function

06:58is is is a transpose XI plus b what kind

07:03of function of the pair a is

07:06that it's linear yeah okay that's linear

07:09okay um now there is one thing here we

07:11have these these strict inequalities

07:14okay so let me just say a little bit

07:16about that um there are a lot of cases

07:19where strict inequalities are are silly

07:21and mean and just mean nothing and you

07:23just quietly replace them with

07:25inequalities okay so here'd be an

07:27example you're working with someone and

07:29they're just designing a a power

07:30amplifier and they go your budget is 50

milliwatts in your design and you go cool okay

07:37and then they come back and they go it's

strict you have to be less than 50 milliwatts

07:41not less than or equal to and it's like

07:44okay this is just completely idiotic and

07:45makes no sense everybody see what I'm

07:47saying so you just you basically you

07:49just quietly you try to explain to the

07:51person that that doesn't make any sense

07:53and you quietly replace less than less

07:55strictly less than with less than or

07:57equal to and no one gets hurt everyone

07:59got that this okay so that's fine um now

08:03if we try that here let's see what

08:06happens right so you'd say well AI

08:09transpose X plus b is bigger than equal

08:11to zero I also want a transpose Yi plus

08:15b less than or equal to zero anybody got

08:17any suggestions about how to solve that

08:19system of

inequalities a and b zero a and b zero I like

it so a = 0 and b = 0 works okay so this is

08:29one of those cases where the distinction

08:31between strict and non-strict actually

08:34matters right it it actually matters

08:37because when you make it non-strict when

08:39you make it non-strict it's got a silly

08:41solution which is uninteresting right

08:44okay so in in a lot of these cases you

08:47actually have to hand argue each one I'm

08:49not aware of a general principle okay

08:52you hand argue each one and I'll do that

08:54argument right now um these the the main

08:58arguments go something like this this

09:00thing is that inequality is

09:03homogeneous right in other words if I

09:05double A and B then whatever this is

09:08which is supposed to be positive goes up

09:10by Factor two right everybody agree same

09:13over here right for this same for the

09:15other one right if this is supposed to

09:17be less than if I if I double A and B

09:21these things are negative whatever they

09:22they just got twice as negative

09:23everybody following this so what I could

09:26do is my claim is that I can I can

09:29without loss of generality replace these

09:31strict inequalities with these

09:34non-strict inequalities with a one the

09:36way I do that is up here if you give me

09:38an A and A B that satisfies this I

09:40simply scale them by whatever factor is

09:43needed to make you know however NE

09:45however positive that was how negative

09:46that was I multiply by a factor which

09:49makes that bigger than or equal to one

09:51everybody got it so that says if there

09:53exists A and B that satisfy that then by

09:56multiplying them by a positive number

09:58they're going to to satisfy this the

10:00converse goes like this it's Converse is

10:03relatively straightforward because if I

10:05tell you a number is bigger bigger than

10:06equal to one is definitely

10:08positive right so everyone got this it's

actually a subtlety it's actually quite

10:13important and I don't know you you could

10:16EAS you could easily encounter

10:18this that was a I mean actually both in

10:21your in in research or other things or

10:24other places so this is something we

10:25kind of expect you to know about

10:28everybody yeah

10:29is there anything stopping us from

10:31saying a transpose X+ B is greater than

10:34a million nope it's fine that's totally

10:38cool yeah yeah so if you want to put 10

10:40the six here and minus 10 the 6 that's

10:43perfectly good right that um and then to

10:46be quite precise what I'm saying then is

10:48the first set of

10:50inequalities are feasible if an only if

10:53these are feasible if and only if your

10:56your uh your your variation were one is

10:59replaced with 10 the 6 is feasible and

11:02they're all if if one is feasible all of

11:05them are feasible period right and if

11:08one is infeasible all are infeasible

11:10right and how do we get how do we

11:12translate among them we just scale them

11:13by various things right so like if I

11:16have an A and B that satisfies these and

11:18I want to satisfy Zer I just multiply a

11:19and by a million and then they'll

11:21satisfy it make make sense so okay all

11:27right so that's the so we did did it

11:29very quietly but um you know or I did it

11:32with just half a line saying they're

11:33homogeneous in AB but that was the I

11:35just gave you the full argument for why

11:37that's the case Okay um all right so

11:41actually what does this say oh sorry we

11:42can go back this this says that if you

11:45gave me a pile of vectors you know one

11:48set of vectors and another set and you

11:49wanted to know do they can you separate

11:52them with a hyperplane and the answer is

11:54just an it's an LP feasibility problem

11:57right that that's what it says so it

11:58means that that's immediately like uh

12:00tractable right we we could just solve

12:02this okay okay well we could do more uh

12:08you could do things like this um you can

12:12actually say well I just now that I know

12:14that there is one I might want to pick

12:16one of these separating hyperplanes

12:17right and you know one way you might do

12:20it is something like this um if I have

12:23two hyperplanes like this uh like that

12:26and this they're normalized with a one

12:29on on on the right hand side or minus

12:31one um the distance between them is two

12:34divided by the the two Norm of a right

12:37you can just calculate that right in

12:39fact this is a slab right so this middle

12:42thing this is a transpose X plus b

12:45equals z this is I don't doesn't really

12:47matter but this is either let's say the

12:49minus one and this is plus one let's say

12:51right so that's what that is and the

12:53width of that slab is going to be two

over the square uh two over the two-norm of

13:00a right um by the way what kind of

13:02function is this convex

13:09concave it's a trick question but a

13:11perfectly good one I think like what

13:14what uh what what kind of function is

13:15that of of

13:22a the answer is it's neither convex nor

13:25concave I was just messing with you I

13:27just wanted to see if you'd anyway it's

13:28fine it's not right so okay it's nothing

13:33right so so if you say if you say

13:35maximize this it's not it's not concave

13:37that's for sure okay um yeah you can

13:41kind of you can picture it in your mind

13:43if you want right in R2 uh the norm you

13:47know there's it's got level curves like

13:49this and that thing is a function it's a

13:52crazy thing with a weird Pole right at

13:54zero it goes up to Infinity so it's

13:56definitely not it's not convex and it's

13:58not concave actually in some directions

13:59it's convex and in others it's concave

14:01so okay all right doesn't mean we can't

14:05maximize it right because maximizing

14:08this is the same as minimizing one over

14:10it because it's non it's positive right

14:13so um that says this problem is convex

14:16that's a quadratic program um and that

14:20is this will this this if you solve this

14:23problem this QP you'll actually get the

14:25the thickest uh you'll get the

14:29thickest separating

14:31slab make sense right and in fact we can

14:35even say a little bit about what how how

14:36you there's an interesting way to to get

14:38that slab so for this for this problem

14:40over here I think this shows you the

14:42separating slab for these points and

14:44these points right this all all make

14:48sense okay um and let's see that's a

14:52that's a QP again you have to be very

14:54careful because you have to see what the

14:55variables are and all that sort of stuff

14:58uh here the variables are A and B XI and

15:00Yi are are data okay so that's that's uh

15:05that's that's like robust linear

15:06discrimination is the way you might say

15:08it that's dialect that's like machine

15:10learning old school machine learning

15:12dialect okay

15:14um now it's actually kind of interesting

15:17to take for this problem uh to to work

15:20out what is the what is the Dual uh of

15:23this QP it's not a QP sorry I I misspoke

15:27it's not a QP unless you square it

15:29that's an equivalent problem that's a QP

15:32this would be I guess ancp right but I I

15:35think that none of this would matter

okay so if you work out uh what the Lagrange

dual is um you get something that looks

15:45like this right you had these two sets

15:47of inequalities and you end up with

15:49something that looks like this you

15:50should maximize one transpose Lambda

plus one transpose mu subject to

15:57and you get something that looks like

15:58this it's it's actually kind of

15:59interesting right and it says that the

16:00sum of the lambdas and the sum of the

16:02M's have to be uh equal right and you

16:05get something like that okay and the

16:08optimal value of that top problem is

16:10actually the maximum margin of

16:13Separation right it it it's the it's the

16:16it's the minim it's the the norm or

16:18whatever it is right

16:20okay so if you stare at this long enough

16:23you realize that we can actually uh what

16:25we can do is change variables to uh

16:28normaliz the lambdas and the M's they're

16:31both non- negative vectors and we simply

16:33we replace them with vectors that sum up

16:36to one and are not negative otherwise

16:38known as mixtures mixture coefficients

16:40so we do that um and you you you plug

16:43that in up here and various things

16:45happen and you end up with this problem

16:47here and now this is very cool because

16:49now we can we can look at it right this

16:52says that Theta is is a set of mixture

16:54coefficients and so this is a general

16:57point in the convex Hull of the original

17:00X's right because it's a mixture of them

17:04this thing here again these are

17:06coefficients that are non- negative add

17:08up to one this thing here is actually a

17:11general point in the convex Hull of the

17:13Y eyes and what this says is you should

17:17minimize T subject to this Norm less

17:19than T obviously it's the same as just

17:21minimizing this thing this is the

17:23epigraph formulation right and then this

17:26so what it really saying is very very

17:29geometrically clear it says minimize the

17:32distance between two

points uh one is in the convex hull of

the x's and one in the convex hull of

17:39the Y's does that make sense so it's

17:42it's actually very cool so if we go back

17:44here it says the Primal says basically

17:48says expand find the slab the thickest

slab expand a slab until it just until it

stops fitting between the two sets the

17:55other one says form the convex Hull of

17:57the dark points in your head just do

18:00that right visually form the convex Hull

18:02of the of of the the the white circles

18:05here and then find a point in the two

18:08points in those two sets which are

18:10closest to each other and sure enough

18:12you're going to get something like I

18:14don't know probably that that dark point

18:16and then a point over there just all

18:19make

18:20sense right so it's kind of it it just

18:22means that a lot in in this in these a

18:25lot of these geometric problems duels

18:27are interpretable are usually highly

18:29interpretable and kind of interesting I

mean it doesn't help you do anything it

18:32doesn't solve it better blah blah blah

18:34but it's just sort of interesting just

18:35to see that it all makes

18:36sense right so

18:40okay um I

18:42think we looked at that and now we get

18:45to something that's actually well useful

18:48uh so here it is

18:51um we now what it is is you have um you

have points that are not linearly

18:58separable right and again if you've had

19:01a like a first course maybe in the first

19:03two weeks of a machine learning class

19:04you would have seen this this is like

19:06the so-called um it's getting close to

19:08the the idea of the support Vector

19:09machine which we'll get to in a minute

19:11um but what you do is you add and this

19:13is in the style of How It's described in

19:16a machine learning class which I think

19:17is poorly but that's doesn't matter

19:20that's fine um and so what you do is you

19:22say well look I can't satisfy these

19:24inequalities and you go cool I'm going

19:25to give you some

19:26slack that that that by the way that

19:29parsed both mathematically and

19:32colloquially right so what it means is

19:35instead of requiring these things to be

19:36bigger than one I'm going to say they're

19:37bigger than one minus UI UI is non-

19:41negative it's your slack it's how much

19:43you're going to we're going to shrink

19:44we're going to shrink your the

19:45requirement on the constraint so you can

19:47be so you can satisfy it right and then

19:49we're going to say something like let's

19:52minimize the sum of of this slack like

19:55how much do you act how many

19:57inequalities do you actually need do I

19:58need to give you slack on so you can

20:00satisfy them and you'd get a problem

20:02that looks like this right um and it's

20:06actually quite interesting uh here we

20:08would do it we would write this a very

20:10in a very simple way but that's okay uh

20:13but okay and it turns out this is kind

20:15this is actually kind of useful this is

20:17now it's a heuristic for minimizing roughly

20:19the number of misclassified points um

20:23which I can I can show you what that

20:25would be actually why don't I show you

20:27that so so here's what it would look

20:29like

20:31um uh let's see I I will write down let

20:34me see what it is it's let let's take uh

20:36AI so this is AI

20:39transpose um a transpose XI plus b right

20:44we want that uh yeah here here's zero

20:49right and if I ask you whether you have

20:51misclassified a point XI or not the

20:54question is whether this is negative

20:57right so the Mis if if you're going to

20:59misclassify that's a that's a problem

21:01that that's something that looks like

21:03this right that's a a cost right so if I

21:06took this function of these I would

21:09actually what I would end up getting is

21:11precisely the number of data points I

21:14misclassified everybody agree with that

21:16because if this is negative it means I

21:18wanted this thing to be positive or okay

21:20here I can here if you if you if you

21:22insist that's positive I guess I have to

21:24make it like that okay right so that's

21:27what it so so some now this function of

21:29course is not convex so you can't

21:31minimize that right um then what this

21:35says is you should actually do something

21:38like this you should say I want that to

21:41be at least

21:42one and then your cost function the cost

21:45function we have looks like

21:48this because it's the it's the amount

21:52and actually it's super interesting it

21:53makes perfect sense that's equivalent to

21:55this problem here that's these numbers

21:58up here the U's and V's um so that's

22:01convex and it's actually super

22:03interesting to think about what it means

22:04here's what it means it says that if you

22:07have a data point um and we think of one

22:10as the margin right so it says you know

22:14what if you classify correctly with at

22:16least this me over here means you

22:19classify correctly bigger than zero if

22:21you classify correctly with a margin of

22:23one more than one or more then that's

22:26cool there's no cost like you you did

22:28fine right you start charging over here

22:32we're actually charging for classifying

22:35correctly but getting but being less

22:38than the margin we desire everybody

22:40following this okay and then it just

22:43grows linearly right so and actually you

22:46can see something immediately which is

22:47this this function is bigger than this

22:49other function and so it turns out that

22:51the optimal value of this is actually an

22:54upper bound on the number of

22:56misclassifications so I again if you're

22:59following this that's great and if you

23:01aren't that's probably okay too right so

23:04okay
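The upper-bound claim here is easy to check numerically. This is a minimal sketch (not from the slides), using hypothetical margin values score_i = y_i * (a^T x_i + b): the convex hinge surrogate max(0, 1 - score) dominates the 0-1 misclassification indicator pointwise, so its sum bounds the misclassification count.

```python
def misclassification_count(scores):
    # Non-convex 0-1 loss: counts points classified incorrectly (score <= 0).
    return sum(1 for s in scores if s <= 0)

def hinge_total(scores):
    # Convex surrogate: charges max(0, 1 - score), i.e. zero only when a
    # point is correctly classified with margin at least 1.
    return sum(max(0.0, 1.0 - s) for s in scores)

scores = [2.5, 0.3, -1.2, 1.0, -0.1]   # hypothetical margins
# The surrogate dominates the count pointwise, hence in total.
assert hinge_total(scores) >= misclassification_count(scores)
```

Minimizing the hinge total is exactly the slack LP above, which is why its optimal value upper-bounds the number of misclassifications.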

23:05everybody all make

23:07sense I'm how many people actually have

23:09seen things like this in a machine

23:12learning class I hope a fair number good

23:14okay

23:15good okay and then all you do now is you

23:19add uh I guess they would actually

23:21Square this probably up here they they

23:23would Square this thing um and then this

23:26trades off to things it trades off this

23:30will trade off the two desirable things

23:32um number one having small slacks small

23:35slack corresponds to this surrogate

23:38function in fact there's a whole way to

23:40view this which makes perfect sense

23:42which is someone says says find me a

23:45hyperplane that misclassifies as few

23:47points as possible and you go cool that

23:49the function looks like this and you go

23:50I can't do that because it's not

23:52convex then you'd say well I'll I'll use

23:55this convex function here uh make you

23:59makes perfect sense right and you say

24:00I'll minimize that everybody following

24:03this um Anyway by the way that story uh

24:07often in many many fields that story

24:10usually has a very good outcome right

24:12that someone says I want to do X and

24:15you'd say can't can't do that not convex

24:18then someone says but here's like a

24:20surrogate which is convex and then

24:23people observe it to work really well in

24:24practice usually about 5 years later

24:27somebody writes a a theory oriented

24:28paper that basically says actually you

24:32never even wanted to solve the first

24:34thing which you can't solve anyway

24:35because it's a bad idea you really

24:37wanted to what you really wanted to

24:39solve was the convex relaxation so I

24:42won't I yeah I won't go into details on

24:44that but that comes up many many times

24:46like I mean here's one in finance people

24:49would say you know what you know at at

24:52you know with 99% you know

24:54what is my you know

24:56whatever my 0.99 value at risk that's how

24:59much money could you lose you know in in

25:02a 1% event right that that's value at

25:05risk by the way it's used universally

25:06throughout Finance banking it's written

25:08into regulations

25:10everything um so some people figured out

25:13a very cool thing called conditional

25:15value at risk CVaR if you don't know it

25:18you don't have to know what these are

25:19that's actually kind of that for value

25:21at risk it's exactly the same thing

25:23right it's a convex surrogate for value

25:26at risk everybody following this

25:28okay it's convex and that means we can

25:30do all sorts of stuff we can analyze

25:31portfolios we can construct portfolios that

25:34that that you know have constraints on

25:35CVaR and all sorts of everybody got it

25:37okay 10 years later people analyze it

25:40and find out that the value at risk is

25:44what's called an incoherent risk measure

25:47and CVaR is not so they write down some

25:50axioms I mean you know again this is

25:52just the story right you you write down

25:53some axioms that a that a risk measure

25:56should have um and what it means is that

25:59value at risk uh is actually it it's

26:02actually just a bad measure of risk

26:05right and too bad that it's what all

26:08banks use many hedge funds it's written

26:10into regulations right so it tells it

26:13tells Banks how much money they have to

26:14have on on hand to avoid a run on the

26:17bank and all this kind of crazy stuff

26:19but anyway so so that's a perfect

26:21example where the convex thing turned

26:23out to be actually what you really

26:25wanted to do not the one that sounded

26:27the most natural in the first place
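The VaR/CVaR pair mentioned here can be illustrated on a toy sample. A hedged sketch with hypothetical loss data: empirical value-at-risk is a quantile of the losses, and CVaR averages the tail at or beyond that quantile, so it always dominates VaR at the same level.

```python
def var_empirical(losses, alpha):
    # alpha-quantile of the losses (e.g. alpha = 0.99): the value-at-risk.
    s = sorted(losses)
    k = int(alpha * len(s))
    return s[min(k, len(s) - 1)]

def cvar_empirical(losses, alpha):
    # Conditional value-at-risk: average of the worst (1 - alpha) tail.
    s = sorted(losses)
    k = min(int(alpha * len(s)), len(s) - 1)
    tail = s[k:]
    return sum(tail) / len(tail)

losses = [1, 2, 3, 4, 100]   # hypothetical loss samples
# CVaR always dominates VaR at the same level.
assert cvar_empirical(losses, 0.6) >= var_empirical(losses, 0.6)
```

CVaR of a portfolio is convex in the portfolio weights, which is what makes it usable as a constraint or objective; the plain sample version above is just for intuition.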

26:29right this is another one it turns out

26:31here's an incredibly bad idea is to say

26:34I have a bunch of data please find me a

26:36separating hyperplane that separates

26:39that that misclassifies as few as

26:41possible but that sounds completely

26:43reasonable anybody would say that right

26:45it sounds reasonable but it's not

26:47something like this is actually much

26:48better yeah in this problem it matter if

26:51we choose uh like plus or minus one as

26:55the like

26:58yeah exactly no no no no it doesn't it

27:00is not going to make any difference at

27:01all right um you could make them 10 6 if

27:04you want right and and and actually what

27:07would happen is just everything would

27:08just scale right uh because actually if

27:10you look at it everything's homogeneous

27:11including the objective this way right

27:14so everything would just scale wouldn't

27:15make any if you don't like if you don't

27:16like one as your canonical positive

27:19number choose another one you can make

27:22it E just to mess with people or Pi you

27:24can make it square root Pi just just

27:28that's not a bad idea just to do that

27:30just you know so that somebody poking

27:32through your code later goes like whoa

27:34what is

27:35that yes you clarify what you mean by

27:38homogene oh homogeneous means that if

27:41you multiply uh if you multiply the

27:44arguments or the variables by an in this

27:47case by a positive number um then that

27:50expression value goes up by the same

27:52number right so a norm is a great

27:55example an absolute value a norm right

27:58um and you can see here uh that you know

28:00the variables are like a b u and v and

28:03If I multiply them all by two you know

28:05you tell me what happens to the

28:06objective goes up by 2x right so it's

28:10homogeneous right so that's that's the

28:12uh yeah okay

28:15everybody got this idea
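The homogeneity answer can be checked with a tiny numeric example; the objective form below (a Euclidean norm plus a sum of slacks) is illustrative, not taken from the slides.

```python
import math

def objective(a, u):
    # Euclidean norm of a plus total slack: each term is homogeneous of
    # degree 1, so the whole objective is too.
    return math.sqrt(sum(ai * ai for ai in a)) + sum(u)

a, u, t = [3.0, 4.0], [1.0, 2.0], 2.0
# Scaling all variables by t > 0 scales the objective by the same t.
assert abs(objective([t * ai for ai in a], [t * ui for ui in u])
           - t * objective(a, u)) < 1e-9
```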

28:18okay well you can actually do um you

28:22know I mean this is people in I guess in

28:24machine learning you describe this as

28:25kernel methods but uh we could just do

28:28this right um you know suppose you want

28:31to separate two points by some nonlinear

28:34function right um now of course it's a

28:36gazillion functions that separate two

28:37sets of points because I could I could

28:39actually Define a a function that's

28:41positive right around each of our X's

28:44negative around each of our y's right

28:46and then I don't even care what it is

28:47anywhere else that couldn't it doesn't

28:49matter it's a half I don't care or zero

28:51that doesn't make any difference right

28:53that's silly uh you want so I mean you

28:56want some family of functions or

28:58whatever to regularize your choices um

29:00and so you know you could do things like

29:02this uh we can make uh a b so F of z

29:06contains these basis functions and then

29:09Theta is now a linear combination of

29:11basis functions right one of them could

29:12by the way could be one uh that would be

29:14a very common basis functions uh and

29:18then you what you want to do is you want

29:20to find these inequalities now this

29:21looks complicated right because maybe

29:23somebody in in these things there's like

29:25sines cosines hyperbolic cosines I mean who

29:28knows what's in those there's you know

29:30some weird things I have no idea maybe

29:31those are like weird radial basis

29:33functions it doesn't matter right um so

29:36but it's looking it make you nervous but

29:38the point is that's data and so these

29:40are actually these are just vectors

29:42they're just constant vectors and it's

29:44exactly the same it's exactly the same

29:46problem right so okay that make

29:50sense maybe an even simpler

29:53interpretation of this is even even

29:55better and really dumb uh you think of

29:59this as I would think of this as feature

30:01engineering which is like thanks for

30:03your X's thanks for your y's I'm going

30:05to put them through some functions maybe

30:07I get a maybe I get a big a bigger

30:09vector and you say now just do support

30:12Vector machine on that or separate them

30:14that way that's another way to to say it
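The feature-engineering view sketched here can be made concrete. A minimal example with a hypothetical quadratic basis: lift each 2-D point through fixed basis functions, then run any linear separator on the lifted vectors. Note the discriminator is linear in the coefficient vector theta even though it is quadratic in the data point.

```python
def lift(x):
    # Hypothetical basis choice: constant, linear, and quadratic monomials,
    # F(z) = (1, z1, z2, z1^2, z1*z2, z2^2).
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def f(theta, x):
    # Discriminator f(z) = theta^T F(z): linear in theta, since F(z) is data.
    return sum(t * phi for t, phi in zip(theta, lift(x)))

theta = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]   # f(z) = z1^2 + z2^2 - 1: unit circle
# Points inside the circle get a negative score, points outside a positive one.
assert f(theta, (0.0, 0.0)) < 0 < f(theta, (2.0, 0.0))
```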

30:16so okay um so you can do crazy stuff

30:21like you could say I want to do

30:22quadratic a I do want to do quadratic

30:24discrimination so to do that you set up

30:27inequalities that look like this and

30:29this and the variables here are PQ and R

30:32what kind of inequality is this and this

30:35in PQ and

30:38R yeah they're they're linear right

30:41they're because the left hand side is

30:43actually linear in the triple pqr right

30:46I mean and you have to get used to this

30:47because you see X transpose PX and you

30:51immediately want to ask you know it it

30:54sounds it looks quad it is quadratic in

30:56XI but XI XI is data so it's it's not a

30:58it's not a variable your variable's P

31:01right so uh that's what it and you could

31:03do crazy stuff like you could say you

31:05know what uh I want to see if I can

31:09separate some data by an

31:10ellipsoid right and that would just be

31:13uh that would end up being an sdp you'd

31:16have an inequality constraint uh uh

31:19you'd have a a matrix inequality

31:20constraint on P something like that

31:23right um and this would be like a

31:26separation by an ellipsoid um somebody

31:29want to name a a useful a use for

31:33that to please to I give you a bunch of

31:36data some is let's say maybe these are

31:39like good outcomes or something maybe

31:41these are bad ones so anybody want

31:47to location what's that location yeah I

31:51mean it could be yeah this could it

31:53could be that you have some incredibly

31:55complex industrial process you set your

31:57knobs various ways um and you run the

32:00process and you get a bunch of data that

32:02looks like this these are the ones that

32:04came in these are the product that came

32:05out up to spec this is the one that

32:07failed right and so uh what this will do

32:10is it'll give you an ellipsoid in the

32:12original space that tells you if you set

32:14your this is this is where you want to

32:16set your uh this is how you want to set

32:19those those values right and you get you

32:21know ellipsoidal safe set or something

32:23like that that make make sense okay and

32:27then you could do very weird stuff like

32:30uh here this is we've separated these

32:32points by a a quartic right I mean but

32:35anyway you you get the once you get the

32:37idea there's you can do anything you

32:39like with it um although you have to

32:42admit it looks kind well okay I'll I'll

32:44try one for you I give you a set of I

32:46give you a set of points some x's and

32:48some y's and I say I'd like you to

32:50separate them by a minimum degree

32:54polynomial I tell you what the degree of a

32:56polynomial of x and y is but let's say

32:58it's the maximum over the terms a

33:01polynomial you know is a sum a linear

33:03combination of terms x to the p i

33:07y to the q i and for each term x to the

33:10p i y to the q i I'm going to take p

33:13plus q as the degree okay so someone

33:16tell me can you solve that problem and

33:19what kind of a problem is

33:22it how do you how do you minimize the

33:25degree of a polynomial discriminator for

33:29given

33:33data first of all could you do it

33:36yeah yeah by iterate you mean try

33:40degree one otherwise known as a

33:42hyperplane fail it's a fail then you say

33:44I'm going to try quadratic fail cubic

33:48succeeds then you're done is that that

33:50was that was your algorithm I like it

33:51perfectly good um does it is it anything

33:54else it's some it's a we have a name for

33:56it

33:58that's just quasiconvex optimization

34:01right because the degree of a polynomial

34:04is actually uh I mean it's a silly one

34:06but it's a quasi convex function right

34:08because I list out all the coefficients

34:11let's just do it for one variable I list

34:12the coefficients of a polynomial and the

34:16degree is I look along the vector for

34:18the largest coefficient that's not

34:22zero that's actually quasi convex Okay
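The try-degrees-in-order scheme the student proposed can be written as a small skeleton. The feasibility oracle `separable` is a placeholder here (in practice it would be an LP feasibility problem over the polynomial's coefficients); the stub below is purely for illustration.

```python
def min_separating_degree(separable, max_degree=20):
    # Sweep d = 1, 2, ... and stop at the first degree whose feasibility
    # problem succeeds -- this works because degree is quasiconvex.
    for d in range(1, max_degree + 1):
        if separable(d):
            return d
    return None  # not separable up to max_degree

# Stub oracle for illustration: pretend degree >= 3 is the first that works.
assert min_separating_degree(lambda d: d >= 3) == 3
```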

34:26so just to you just go on and on with

34:29these things you can probably write a

34:30paper about some of these things I mean

34:32hilariously but actually almost

34:34certainly it'd actually probably be

34:36better than a lot anyway okay I'll stop

34:38right there um okay um so our last topic

34:42is is called a placement and facility

34:45location and this is just there's a

34:47whole bunch of problems that look like

34:48this right so it's typically goes down

34:52in like two or three dimensions right

34:53but there's cases where it goes down in

34:55in higher Dimensions right and the way

34:58it works typically is some of the

34:59positions are given and the others um

35:02are variables and then for each pair of

35:05points I have a a function that's a

35:07function of the two the two points the

35:10two locations right I mean actually in

35:12many cases they're zero meaning it

35:14doesn't really matter um and then you

35:16just want to minimize the sum of these

35:20functions that that that are functions

35:22of the two locations right and you know

35:26there's a lot of

35:27interpretations um so I mean the famous

35:31one I think it's called like warehouse

35:33location problem or something like that

35:35and it basically says what what you do

35:37is you have a bunch of facilities i and j

35:39fij is the transportation cost between

35:43the two facilities and that might grow I

35:45don't know it might be linear it might

35:46go quadratically or something it doesn't

35:48matter it's

35:49something um and then uh some of the

35:52locations are fixed and then you want to

35:55choose the locations the other so that

35:58this total cost is minimized right your

36:00total transportation cost um this is

36:03kind of universal in circuit design

36:05right so um what what happens there is I

36:10don't know you have any of the many

36:12chips in my phone and your phone all your

36:14electronics would have little cells I

36:16mean it this goes down at many levels

36:18but let's do it at the gate level so you

36:20might have I don't know 10 million Gates

36:23Right each gate is connected to you know

36:27order magnitude three or four others

36:29right roughly right I mean some some

36:31have you know big fan in some have big

36:33fan out but mostly they're connected to

36:35like between one and three others right

36:37or actually I guess two and whatever

36:39four others something like that right um

36:41then what you want to do is you want to

36:43choose where these where these where

36:45these gates are going to go and you want

36:46the ones that are connected to each

36:48other to be nearby and there you would

36:51actually take the uh th this thing the

36:55cost it this is this is zero if Cell I

36:59and cell J are are not connected to each

37:01other and this would simply be something

37:03like the L1 Norm uh if they are right

37:07the L1 Norm because when you run wires

37:09on a circuit you you don't run like this

37:11you you go this way and then you go

37:13through a via to another layer and then

37:15you go that way or something like that

37:17everybody got that so that's this is

37:20kind of like universally used right so

37:24um okay so that's an example um we'll

37:27we'll look at some examples here I mean

37:30a b just a baby example just so you can

37:32visualize it um and here by the way the

37:34idea of Designing penalties and

37:37functions is just comes up again uh so

37:40here it is I guess these these these

37:42ones on the boundary we imagine are

37:44fixed points right in a circuit those

37:45would be like something that you've

37:47already settled on or it could be

37:50something like a pin you'd say that's

37:52where the external pin comes in it's

37:54locked down that's where it is that's

37:56where that comes from um and then you're

37:58placing uh these things right so let's

38:02see what they are let's see if this

38:03makes sense in this first case so what

38:07we're going to do is we're going to

38:07minimize so and and there the lines here

38:10show you the graph and it tells you which

38:13things are connected to which others

38:15okay and what we do is on every one of

38:17those we take the distance uh in this

38:19case the Euclidean distance between the two

38:22and we take a function of it so if we

38:24literally just minimize the Euclidean

38:25distance we get this thing here like

38:27this okay so whatever this is this is

38:30the arrangement of those seven or

38:31whatever uh black dots in there and such

38:34a way that the total length of all of

38:37those edges is minimized everybody got

38:41that um now we go to now we minimize the

38:46sum of the squares by the way this is

38:48silly that's a uh I mean this is in a

38:50linear algebra class it's because it's

38:51just a least squares problem with

38:52equality constraints and that's got an

38:55analytical solution it doesn't matter right but

38:57but that's what that and uh what

39:00happened they got more spread out and

39:01I'd like someone to

39:06explain using kind of I you know just

39:10anthropomorphize it so why would that

39:13happen right uh by the way this keeps going

39:16this is the fourth power here so uh and

39:19they get spread out even

39:21more somebody yeah longer the first one

39:27Square you mean in this one yeah

39:30compared to the first sorry yeah in the

39:32first one uh basically you would if if

39:35you're going to explain this cost

39:36function you'd say it's pretty chill

39:38about compared to squaring the distance

39:42just taking the distance pretty chill

39:43about long lengths right

39:47and so sure enough you get a bunch of

39:49long lengths right in the second one

39:52it's way less chill about that right um

39:56that's that's what you're going to say

39:57yeah a smaller L exactly so yeah it's

40:00chill about the yeah so it's it's it's

40:03and and then this one is Chill about big

40:05big long lengths but it still cares

40:08about small so it actually has an

40:10incentive to push things together even

40:12when they're pretty close this one here

40:14this is square and so what that means is

40:16doesn't like long lengths at all

40:18because you square the length

40:21short distances are actually okay

40:23because short squared is like real small

40:25everybody right
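The "chill about long lengths" comparison can be made numeric. A small illustration (the edge lengths are hypothetical): one edge of length 4 versus two edges of length 2 under the linear, squared, and fourth-power penalties. The linear penalty is indifferent, the squared penalty strongly prefers splitting the long edge, and the fourth power even more so, which is why higher powers spread points out more evenly.

```python
def total_penalty(lengths, p):
    # Sum of per-edge penalties length**p over all edges of the placement.
    return sum(ell ** p for ell in lengths)

long_edge, split = [4.0], [2.0, 2.0]
assert total_penalty(long_edge, 1) == total_penalty(split, 1)   # 4 == 4
assert total_penalty(long_edge, 2) > total_penalty(split, 2)    # 16 > 8
assert total_penalty(long_edge, 4) > total_penalty(split, 4)    # 256 > 32
```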

40:27everybody got this okay and so once you

40:31see that you understand kind of all of

40:32this and then I have a question for you

40:36um I'd

40:38like what do you what would you imagine

40:40if I I used the following uh function uh

40:45did I write it as h of uh I did okay so

40:49h of U is going to be uh I'll just draw

40:54it if you don't mind zero uh some

40:57distance D and then that so suppose it's

41:01dead zone linear like

41:05that what does it

41:09mean if I did that if I did placement

41:11with that function what it would say is

41:13you know what if you're with a distance

41:15D like I'm totally cool and actually I

41:18don't get happier if you put things even

41:20closer than D everybody got

41:23it okay and I want to know I would like

41:26to know what do you think would

41:28happen

41:31if a lot of values get pushed to d uh if

41:34they're above D exactly so so the a lot

41:38of distance values would be kind of at

41:41d right just tically right because it

41:44basically it says look the incentive to

41:46push things closer once you get within D

41:49of of someone you know it's a neighbor

41:51if you have an edge once you get within

41:53D this thing doesn't care anymore and

41:55you might as well just go go work on

41:56other stuff go find things who whose

41:58distance is bigger than D and work on

42:00those right I mean I'm completely

42:01anthropomorphizing it but that's what

42:03makes sense so actually what's cool

42:05about that is the placement problem gets

42:10hard non-convex when you add one more

42:13extremely realistic constraint um and

42:16it's this I'll just mention this because

42:17if there are people here who who

42:19actually do circuits or stuff like that

42:22so the a more realistic placement

42:24problem is something like this each of

42:26these things actually is like a a cell

42:30right it might look like this and this

42:31one looks like this right and maybe the

42:34the XIs are the are the center and an

42:37extremely common constraint is that they

42:39don't overlap I mean kind of duh right

42:42so um that constraint is non-convex

42:45everybody got that here's what's cool if

42:49you do this that's actually a pretty

42:52good convex euristic for getting a

42:54layout where things are at least D

42:58apart everybody got this and I could I

43:01could actually tune this if these were

43:04squares I could make for each Edge I

43:06could make D equal to uh the

43:10radius of this

43:13one plus the radius of that one and in

43:15that case it's a heuristic but it will tend

43:18to get things that don't

43:20overlap so okay but we're get yeah make

43:23make sense yeah okay
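The dead-zone linear penalty drawn on the board, the one driving this non-overlap heuristic, can be written as a plain function of one edge length u and threshold d:

```python
def deadzone(u, d):
    # Zero charge for any distance up to d, then linear growth beyond it:
    # no incentive to pull connected cells closer than d apart.
    return max(0.0, u - d)

d = 1.0
assert deadzone(0.5, d) == 0.0               # within d: free, nothing pushes closer
assert abs(deadzone(2.5, d) - 1.5) < 1e-12   # beyond d: linear cost
```

With d set per edge (e.g. the sum of the two cells' radii, as suggested above), many edge lengths in the solution tend to sit right at their d.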

43:27all right so that's that's placement

43:29okay so this actually finishes uh a a

43:32whole section of the course that was on

43:34just sort of

43:35applications um I mean it was just kind

43:39of a whirlwind tour um you're you're

43:41encountering a bunch of other

43:42applications in your homework so and I

43:44think the idea is just to see a whole

43:45lot of these in different fields

43:47different areas how it works um and then

43:50also know about I mean of course in the

43:52homework and watching these things I'm

43:55telling you what the problem is but one

43:58of the things you want to be thinking

43:59about is like if you had if you

44:02encountered a problem you know how would

44:04you set it up so that it would do

44:05something that you wanted to do I mean

44:07which is kind of the idea behind

44:08designing uh fun you know uh penalty

44:11functions and things like that right so

44:14okay um now the next part of the course

44:19is actually going to be on how do you

44:22how do you actually solve problems so

44:24that that's that's going to be the next

44:27course sorry the next section of the

44:29course it's actually and we may we'll

44:31probably have a a small bit at the end

44:33where we'll actually do some other

44:34interesting uh applications but this is

44:37important so we'll do this first um so

44:39we're going to talk about how uh like

44:42what happens when let's say you call the

44:45solve method in CVX PI right um I think

44:48actually we had discussions about what

44:50what happens at the highest level at at

44:52the highest level what happens is it

44:54takes your problem and it transforms it

44:57just you know it's you get an equivalent

44:58problem and then you get another

44:59equivalent problem and another one and

45:02you you reduce it through a chain of

45:04equivalences until you get to some

45:07standard form we can solve like a linear

45:09program quadratic program or second

45:11order cone programming or semi-definite

45:13program okay so we that's first we're

45:14actually not going to talk about that

45:16although it is super interesting but it

45:17just uses the ideas from the beginning

45:19of the class of a reduction of

45:21equivalent problems so we're not going

45:23to talk about that that's what CVX Pi

45:25does now we're going to the next section

45:27of the course is like how do you solve

45:28an LP or a socp or a semi-definite

45:32program that's what we're going to look

45:33at and there it's also we're going to

45:35work our way up and the way that's going

45:39to work is we're going to start with at

45:42the very Basics with just linear algebra

45:44so you know how do you solve linear

45:45algebra things and we're going to then

45:47then you'll see that we'll build we'll

45:49we'll start with that on top of that

45:51we'll we I'll show you how to solve uh

45:54well let's see we'll we'll see show you

45:56how to solve quadratic problems oh by

45:58the way solving qu quadratic problems

46:00that's linear algebra then the next step

46:03up this is just to give you the big

46:04picture of what we're going to do um by

46:06next week or two weeks then we'll say

46:09something like okay uh we will solve

46:11your non-quadratic convex Problem by

46:15solving a sequence of quadratic problems

46:17and but quadratic problem solving a

46:19quadratic problems linear algebra and

46:21then the stuff we're going to talk about

46:22right now comes into play and then we'll

46:25go above that and eventually we'll get

46:27to Interior Point methods which will

46:28handle constraints and everything else

46:30all this way so that's the the big

46:32picture Okay so okay so what we're going

46:35to do is going to go all the way down to

46:37the bottom well we're not going to go to

46:38the gate level or anything like that

46:40we're going to go to the uh you know the

46:42the the floating Point level right so at

46:44the floating Point level so we're going

46:45to talk about numerical linear algebra

46:47background actually honestly I think

46:48everyone needs to know this it's it's I

46:51mean I'm just going to give you a little

46:53tiny bit there's an appendix on this in

46:55the book I please read it um actually

46:59just because this is useful

47:01for this is this is useful anyone here

47:04if you if you do anything has anything

47:06to do with math and computation like

47:08anything which is I mean pretty much if

47:10you're in this class you're in that

47:12category right you need to know this and

47:15it's not it's kind of to me it's

47:16completely unacceptable to not know at

47:19least a bit about how linear equations

47:21are solved and I'll tell you why because

47:23for example you're going to learn crazy

47:24things like we'll see cases where on my

47:27phone I can solve a set of linear

47:29equations it's like a million by a

47:31million right so you need to know what's

47:34easy what's hard and all that kind of

47:36stuff and it is not intuitive until you

47:39have seen this material some of it is

47:42not obvious at all okay so okay so

47:47that's the that that's the big

47:49picture um and so we'll start just by

47:53talking about you know you know you know

47:56Matrix structure and algorithm

47:57complexity

47:59um so you know suppose we want to solve

48:02a set of linear equations right like ax

48:04equals B here right so we want to solve

48:06that you know if a is like for General

48:08methods it it grows like n cubed right

48:12uh then oh and I I should mention

48:14something about this in theoretical

48:16computer science actually the number is

48:18not n cubed does anyone here know what

48:21the current number

48:23is somebody here's got to know

48:26go

48:27ahead by the way I wouldn't know so if

48:29you say the wrong number it's F totally

48:31fine I mean I know the range it is what

48:33is it I know it's between two and three

48:35I think it's like

48:372.4 yeah okay it's it's it's around 2.4

48:40maybe 2.3 something I I have no idea

48:43just got broken the record just got broken by

48:44Google um because they used reinforcement

48:47learning to come up with a faster algorithm

48:49awesome so it's what 2.3 now I don't

48:51know somewhere around okay yeah okay

48:53it's 2.3 right so n to the 2.3 um now

48:56you know these are I mean that is

48:58fascinating right because it can't be

49:01less than N squared because you got to

49:02look at you got to look at all entries

49:04of the Matrix right so you might imagine

49:07you know how could you do this without

49:08you know sort of n cubed anyway you can

49:10it turns out those algorithms are

49:11utterly and completely useless in

49:13practice uh just utter completely right

49:16I mean they look maybe sometime in the

49:18future they won't be but right now

49:20they're they're just useless right super

49:22interesting

49:24um and weirdly the methods people use to

49:27show that low complexity in theory are

49:30actually extremely close to the methods

49:32people use to make solving these things

49:34fast in practice so I will say what that

49:36is short uh what what that is shortly

49:39and that part is not obvious at all so

49:41the idea is super useful but the actual

49:43results are utterly useless right those

49:46 Okay, back to this. What's cool is that if A has some structure to it, you can actually solve the system a lot faster. Here's a no-brainer: what if I told you A is diagonal? Then you say, I know how to solve Ax = b when A is diagonal. And if I said A is a million by a million, you'd say, sure, I can do that in about 300 microseconds on my phone. But how would you store a million-by-million matrix? That's the point: you only store the diagonal entries, not the entries that are zero. That's an extreme case, and I think most people would get that profoundly simple case, but there are going to be others where it is not obvious at all, and we'll get to those.
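The diagonal case is a one-liner; here is a minimal sketch (mine, not from the lecture) in which only the diagonal is stored as a length-n vector:

```python
def solve_diagonal(d, b):
    """Solve A x = b where A = diag(d).

    Storage is O(n) (just the diagonal d) and the solve is O(n) flops:
    one division per entry.
    """
    return [bi / di for di, bi in zip(d, b)]

x = solve_diagonal([2.0, 4.0, 0.5], [6.0, 8.0, 1.0])
```

Storing d as a plain array is the whole trick: nothing about the million-by-million matrix is ever formed.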

50:40 Here's how this works, and the history is very cool; it goes back to the 1960s, maybe even earlier. A floating-point operation is an addition, subtraction, multiplication, or division of two floating-point numbers. In truth that's a big old lie, because I think a division takes something like eight times as long as an add, but this is just for order of magnitude; I'll also mention other reasons why this whole model is wrong, but it's good enough for our purposes. And some people got weird in the 60s and decided a floating-point operation was actually a multiply plus an add plus some indexing work; okay, whatever.

51:24 So what you do is look at an algorithm and count the number of flops, typically as a polynomial in the problem dimensions. There might be a log in there, which people just change to a constant, because that's what it really is in practice. You express the count as a polynomial function of the problem dimension and simplify, keeping only the leading terms, because this is very crude. It is not remotely a good predictor of computation time on modern computers, and that's getting ever more true; that should be kind of obvious. But it's useful as a rough estimate of complexity, and a lot of people divide these operations in their head into various groups like that.

52:19 The grouping is also historical. Vector-vector operations take two vectors; an inner product, for example. What does that take? You have to multiply x_i by y_i for each i, so that's n multiplies, then n minus 1 adds, because you add x_1 y_1 plus x_2 y_2 and so on. So an inner product is basically 2n flops, and since the factor of two is silly, people just say n. A sum is the same story: n flops.

52:54 By the way, these levels have very cool retro names, so if you know them you can impress people. Vector-vector operations are called BLAS level 1, matrix-vector is BLAS level 2, and matrix-matrix is BLAS level 3. BLAS is Basic Linear Algebra Subroutines; that's from the 1960s. It's actually a very good distinction to be thinking about.

53:26 Some of this persists but gets really crazy when you start looking at things like GPUs, and God help you if you're looking at TPUs. GPUs are crazy optimized for these operations, so you can get just insane speed; I'll make some mention of that later. And these are things you should know: if you're putting something together and you think you can't multiply two matrices of that size, you should step back and say, that's BLAS level 3, and our friends at Nvidia have a really good handle on it; I think we can do just fine, thank you.

54:19 You can also look at the flop counts here. BLAS level 1 operations are roughly linear in the single dimension n. For BLAS level 2, if you're going to form Ax, a matrix-vector multiply with an m-by-n matrix, that's going to be basically mn, the product of the two dimensions.

54:40 Now, if things are sparse, it gets very interesting; that's a structure everyone should know about. How do you calculate the inner product of two sparse vectors, and roughly speaking, how much does it cost? I give you two vectors, each of dimension 100 million; they're sparse, each one has 100,000 nonzero entries, and I want the inner product of the two. How do you do it, and what is the flop count?

55:26 What do you do? Go ahead: you look at the nonzero elements. Exactly. The only time you ever have to do anything is when there's an entry that's nonzero in both vectors. So an upper bound on the number of flops it could take is the minimum of the two numbers of nonzeros. Everybody got that? Matrix-vector is the same story: if A is sparse and I multiply it by a vector, the cost is going to depend on capital N, the number of nonzero elements.
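The sparse inner product just described can be sketched with plain dictionaries; this is an illustrative storage choice of mine (real sparse libraries use index/value arrays), but the flop-count argument is the same:

```python
def sparse_dot(x, y):
    """Inner product of two sparse vectors stored as {index: value} dicts.

    Work is bounded by min(nnz(x), nnz(y)): iterate over the shorter
    vector and touch only the indices present in both.
    """
    if len(x) > len(y):
        x, y = y, x  # iterate over the vector with fewer nonzeros
    return sum(v * y[i] for i, v in x.items() if i in y)

# The ambient dimension could be 100 million; only nonzeros are stored.
x = {3: 2.0, 1_000_000: 1.0, 42: -1.0}
y = {42: 4.0, 7: 5.0, 1_000_000: 3.0}
```

Only the two shared indices (42 and 1,000,000) cost any flops at all.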

56:10 And there are other structures. What if I gave you A as a low-rank factorization? Suppose A is a million-by-million matrix, which you can't store, but I give it to you as the product of a million-by-20 matrix and a 20-by-million matrix, both of which I can easily store on my phone. Then how do you actually form Ax? Here's the wrong way: first multiply the two factors together, then apply the result to x. That would be a very poor idea, because the first thing it would do is attempt to allocate the huge matrix, and then it's game over. That's just silly; no one would do that. The right way is to apply the 20-by-million factor to x first, which gives a 20-vector, and then apply the million-by-20 factor to that. So low rank is another very common structure.
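The right-way/wrong-way distinction can be sketched at toy scale (3-by-2 times 2-by-3 instead of a million by 20); the point is purely the parenthesization, U(Vx), never (UV)x:

```python
def matvec(M, v):
    """Dense matrix-vector product; M is a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def lowrank_matvec(U, V, x):
    """Compute (U @ V) x as U (V x): the big product is never formed.

    With U m-by-k and V k-by-n, k small, the cost is O(k(m + n)) flops
    instead of the O(kmn) needed to form U @ V plus O(mn) to apply it.
    """
    return matvec(U, matvec(V, x))

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3-by-2
V = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]     # 2-by-3
x = [1.0, 1.0, 1.0]
y = lowrank_matvec(U, V, x)
```

At the scale in the lecture (m = n = 10^6, k = 20), the wrong parenthesization allocates 10^12 entries; the right one never stores more than a 20-vector of intermediates.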

57:25 Okay, so those are BLAS levels 1, 2, and 3. By the way, there are some very interesting things about the complexity of matrix-matrix multiplication of n-by-n matrices. Again, our friends in theoretical computer science have it at something like n to the 2.35, and if Google figured out a better exponent, that's awesome; good for them. Actually, I can tell you how that works, because the exact same method is used in practice, so I'm going to show you, just for fun; not all the details, but it doesn't matter.

58:12 If I'm going to multiply two matrices, what you do is partition each one into blocks, say [[A, B], [C, D]] times [[E, F], [G, H]]. You can work out what the product is blockwise; the top-left block is AE + BG, and so on. Then you have a recursive method: you just recurse down, with each of those blocks given by sub-blocks as well.

58:43 And now comes the smart part. First, some theoretical computer science: somebody figured out how to work out that 2-by-2 block product with seven matrix-matrix multiplies, not the eight your naive method would use. If you recurse down with seven multiplies, you get a complexity that's not n cubed; it's n to the 2.8 or so.
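The seven-multiply trick is Strassen's algorithm; here is one level of it with scalar entries, just to check the seven-product formulas (in the real recursive algorithm a through h are matrix blocks, giving the O(n^log2 7) ≈ O(n^2.81) count mentioned above):

```python
def strassen_2x2(A, B):
    """One level of Strassen's method: a 2x2 product with 7 multiplies.

    The naive 2x2 product takes 8 multiplies; these seven products m1..m7
    recombine (with extra adds, which are cheap) into the same result.
    """
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

C = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Recursing on blocks instead of scalars turns the 8-multiply recurrence T(n) = 8T(n/2) + O(n^2), which solves to n^3, into T(n) = 7T(n/2) + O(n^2), which solves to n^log2(7).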

59:12see this is actually how it works

59:13entirely in practice and this is your

59:16friend by the way normal people do not

59:18need to know about this right but it's

59:20good for you to know about it so what

59:23the way this the way you multiply two

59:25matrices uh so it depends right let's

59:28just let's just actually we can even do

59:29this just first let's do single thread

59:32right so single thread is going to work

59:34like this um yeah by the way if you

59:37multiply these out you don't use the

59:39silly seven multiplies versus eight it's

59:41just eight multiplies so if you if I say

59:44take your Matrix chop it up into

59:47subblocks chop those into subblocks and

59:49then just write a recursive thing and go

59:52then someone says how many flops do you

59:53end up doing

59:55and the answer is n cubed so it's the

59:57same as if you just wrote a oneline you

59:59know C program that said you know you

01:00:01know walk across the first one down the

01:00:03first row and fill them that kind of

01:00:05thing right everybody following this so

01:00:07you'd say you haven't changed the number

01:00:09of flops like how on Earth could this be

01:00:12faster um again anyone want to suggest

01:00:16something you'd have to know a little

01:00:17bit about like computer architecture and

01:00:19how computers work but I feel like all

01:00:21of us should know a little bit about

01:00:22that

01:00:23 Yes, that's exactly right: it's cache-optimized. The lowest level of the hierarchy is, say, 16-by-16 matrices, because you can multiply 16-by-16 matrices directly in the lowest level of cache. The next group up is, say, 256-by-256 matrices, which you can multiply using your level-2 cache. This, by the way, is from the 1960s or 70s; let's go with the 70s. What happens when you do this is actually completely stunning.

01:01:07 And the same ideas carry over to a world where you have a lot of cores. This could be a GPU with 10,000 cores, and it is very good at multiplying: it can multiply 5,000 pairs of 16-by-16 matrices all together in a couple of microseconds. The people who work on these things know exactly what the hardware can do, so it's the same idea: they chop things up into those units. All of us are utterly unaware of this, which is kind of cool, but we benefit like crazy.

01:01:47 So if you ever meet someone who works on this, and there are people who do, like in my department, CS: remember I told you that if you meet someone who works on devices and things like that, you should thank them? If you meet someone who works on the lowest-level stuff like this, thank them too, because they're the ones who allow the rest of us to use NumPy and SciPy, and Torch for that matter. What they've done is amazing, and we're the beneficiaries. If you didn't follow what I was saying, that's okay, but you should kind of know about this.

01:02:30 Let's talk about the speed of current computers. You could do some of these flop-count calculations yourself and make guesses about how long something is going to take, or you could measure that a calculation ran at some number of flops per second. So somebody tell me: what will, say, a four- or five-year-old laptop do for matrix-matrix multiplies and things like that? Here's a hint: it's measured in gigaflops, which is 10 to the 9 flops per second; that is a lot of floating-point operations, just for the record. How many billions of flops per second on an old laptop, and how many on a newer computer? The range: it's between one and ten gigaflops for the old one, and it could easily be 30 for the new one.

01:03:55 Why? Because people who know what they're doing have optimized matrix-matrix multiply, because people do it a lot. What would it be on a simple GPU? I have a GPU in here for my graphics; what is it there? Now we're talking 100 gigaflops, easily, and that's a baby GPU. On a real GPU you start getting toward teraflops, and it goes up from there. By the way, these are just amazing numbers, and we are the beneficiaries.

01:04:27 Just as a quick thing: how long does it take to multiply two 2,000-by-2,000 matrices? It's order n cubed flops, so that's on the order of 10 to the 10 flops. I could just run it right now, but I'm not going to, because I know what would happen; somebody tell me the number. Does it take a minute? I'll make it sound crazy: you have two arrays, each of them with four million numbers in them, they all interact with all of the others, and you're going to do billions of multiplies. That's crazy. So how long does it take? A hundredth of a second? That's about right: it's measured in milliseconds. A hundredth of a second on a new computer; call it 100 milliseconds on the older one. It's probably worth sitting down and seriously pondering how amazing that is, and there are a lot of people to thank for it. So if your roommates or neighbors do this kind of stuff, you should thank them.
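As a back-of-envelope check of those numbers, here is a tiny estimator using the crude 2n^3 flop model from the lecture; the machine rates plugged in are the rough figures quoted above, not measured specs:

```python
def estimate_seconds(n, gflops):
    """Rough time for an n x n dense matrix-matrix multiply.

    Uses the crude flop-count model: about 2 * n**3 flops (n**3
    multiplies plus about as many adds), divided by an assumed
    sustained rate in Gflop/s. Order of magnitude only.
    """
    return 2 * n**3 / (gflops * 1e9)

t_laptop = estimate_seconds(2000, 30)    # ~0.5 s at 30 Gflop/s
t_gpu = estimate_seconds(2000, 100)      # ~0.16 s at 100 Gflop/s
```

The model ignores caches, memory bandwidth, and parallelism entirely, which is exactly why it is only an order-of-magnitude guide.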

01:05:53 Okay, does that sound good? Oh, and what about a super high-end GPU multiplying 20,000-by-20,000 dense matrices? The answer is that you'd be terrified: it's unbelievably fast. But actually that's not the main point. We're going to be talking about other stuff, where you don't need a supercomputer to achieve things; you just need to be smart, and that's going to be the theme of the material we're going to look at.

01:06:27 We're going to build up real easy and ask: when is it easy to solve linear equations? One case was diagonal; that's trivial, order n. What if the matrix is lower triangular, so the system looks like

a11 x1 = b1
a21 x1 + a22 x2 = b2
a31 x1 + a32 x2 + a33 x3 = b3

(I'm not drawing the zeros above the diagonal)? You probably saw this in high school. The first equation is a11 x1 = b1, so x1 = b1 / a11. Now you know x1, and the second equation involves a21 x1, which you've already computed, so it becomes a22 x2 = b2 - a21 x1. That method is called forward substitution, because you solve for x1, then x2, then x3, then x4, and whenever you have a value you substitute it into the later equations. When you work out the complexity, it's n squared flops.

01:07:55 That's cool; notice it's different from n cubed, there's a factor of n in there. So you already know something: if you have to solve triangular equations, you can do it really fast compared to solving general equations. If the matrix is upper triangular, you do what's called backward substitution; it's the same thing, except you get x_n first, then x_{n-1}, and so on. By the way, this is not how it actually works inside libraries: it's blocked. There are little blocks again; we're blissfully unaware of this, but when you solve a triangular system they don't do it line by line like this, it's blocked. We can pretend it looks like this, though, and the conclusion is the same.
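Forward substitution as just described can be sketched directly; this is the unblocked, line-by-line version for illustration (as noted, real libraries block it):

```python
def forward_substitution(L, b):
    """Solve L x = b for lower triangular L in about n**2 flops.

    Row i reads: L[i][i] * x[i] = b[i] - sum of L[i][j] * x[j] for j < i,
    and every x[j] with j < i has already been computed.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

L = [[2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]]
x = forward_substitution(L, [2.0, 2.0, 3.0])
```

Backward substitution is the mirror image: loop i from n-1 down to 0 and sum over j > i.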

01:08:47 What if a matrix is orthogonal? Then it takes 2n squared flops to compute A inverse times b, because A inverse is A transposed, so it's just a matrix-vector multiply.

01:09:12 There are lots of other cases. Here's a very famous structure for a matrix A: identity minus rank one. I think that's called a Householder reflection; somebody in ICME could tell me if that's right. Those, again, you can solve in order n flops, because you're just smart about how you do it, and the storage is also clear: you store the vector, not the matrix.

01:09:42 A permutation matrix is a good example too. If you have a permutation matrix P and want to solve P x = b, you just un-permute, which is applying the transpose, and you actually don't do any flops, because all you're doing is moving numbers around. Permutations are kind of hilarious: under a flop-count analysis from the 1960s or 70s, it's zero flops, because it's really just memory copies. Of course, that can in fact take a lot of time if the vector is 100 million entries high. So those are examples.
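Reading "identity minus rank one" as the standard Householder reflection H = I - 2 v v^T with v a unit vector (an assumption on my part; the lecture leaves the exact form open), H is orthogonal and symmetric, so H inverse equals H, and a solve is just an O(n) application of the reflection:

```python
def householder_solve(v, b):
    """Solve H x = b with H = I - 2 v v^T, where v is a unit vector.

    Since H^{-1} = H, the solve is x = H b = b - 2 (v . b) v,
    which is O(n) flops, and only the vector v is ever stored.
    """
    coef = 2.0 * sum(vi * bi for vi, bi in zip(v, b))
    return [bi - coef * vi for vi, bi in zip(v, b)]

v = [1.0, 0.0, 0.0]   # unit vector: H flips the sign of the first coordinate
x = householder_solve(v, [3.0, 4.0, 5.0])
```

This is the same "be smart about how you apply it" point as the low-rank example: never materialize the n-by-n matrix.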

01:10:27 Now we get to what the method is, and it's actually much more important to understand the high-level idea than the low-level details, unless you're one of the few people who work on these methods so that the rest of us benefit. For the rest of us, here's the method. It's very simple, and there are already immediately surprising things in it; some of you will probably leave today knowing something you didn't know before, which is weirder than shocking.

01:10:55 The general method works like this. You want to solve Ax = b. You take the matrix A, and the first thing you do is factor it, which means you write it as a product of matrices. You probably know a bunch of factorizations already: the QR factorization, the eigendecomposition, the singular value decomposition. So you factor A into a product of matrices, each of which is easy to invert. By the way, "easy to invert" is slang: it doesn't mean anyone is going to invert that matrix. It means it's easy to solve a system with that factor as the coefficient matrix; nobody actually inverts any matrices.

01:11:43 Then it's really quite simple: if A is this product, the inverse of A is the product of the inverses in reverse order. So you just put the parentheses the right way. First you solve with A1 and right-hand side b to get x1; then you multiply by A2 inverse, which you do by simply calling the solve method on A2 with right-hand side x1; and so on. So in fact you solve the whole system by a sequence of solves. You can even think of this in a beautiful object-oriented way, where each factor has a solve method: solve(A, b) returns A inverse times b. Some of these solve methods are trivial, like diagonal or permutation, but it doesn't matter; that's the idea. You solve the system by literally just a sequence of solves.

01:13:00 Now, usually the way this works, and there are exceptions which we're going to talk about, is kind of cool: in the simplest, most generic cases the factorization costs order n cubed, and then we'll see that the solve takes order n squared.

01:13:20 Now things get weird. Suppose I ask how long it takes to solve one system with 5,000 variables: a 5,000-by-5,000 matrix A and a 5,000-vector b, where I actually factor A. Order of magnitude, it's under a second; it depends on what you run it on, but it's not minutes and it's not milliseconds, so let's say it's a second. Now suppose I say, I want to solve 10 sets of equations with the same matrix but different right-hand sides. The naive method would take 10 seconds, because you'd just have a little loop, unless you had 10 cores around and parallelized it. How long does it in fact take if you know what you're doing?

01:14:32 Let's review: solving 5,000 equations in 5,000 variables takes a second. Now I'd like to do it for 10 right-hand sides with the same coefficient matrix A. A normal person would say 10 seconds. The actual number is about 1 second. Why? Because if the 5,000-by-5,000 factorization took a second, each solve is a factor of n smaller, since it's n squared rather than n cubed. That works out to microseconds per extra solve, so the answer is about 1 second plus maybe 150 microseconds total.

01:15:24 Until you know this, it's not obvious: you can solve 10 sets of linear equations at essentially the exact same cost as solving one. How many of you knew that? I'm not detecting that you're that impressed, but it's kind of cool; put it this way, I'm impressed, and honestly I think you should be too. By the way, this requires that you actually know what you're doing, because if you just write a loop over the right-hand sides calling solve(A, b) each time, you pay the full factorization every time. Another name for this is factorization caching: you factor once and then do multiple solves. I think this makes sense.
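Factorization caching can be sketched with a toy, pivot-free LU; this is illustrative only (real codes compute A = PLU with pivoting, and block everything): pay the order n cubed factorization once, then each extra right-hand side costs only order n squared:

```python
def lu_factor(A):
    """Doolittle LU without pivoting (fine for this toy example only).

    Returns (L, U) with L unit lower triangular and U upper triangular
    such that A = L U. Cost is about (2/3) n**3 flops.
    """
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def lu_solve(L, U, b):
    """Two triangular solves, about 2 n**2 flops: L y = b, then U x = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution (L has unit diagonal)
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
L, U = lu_factor(A)                  # pay the n**3 cost once...
x1 = lu_solve(L, U, [8.0, 7.0])      # ...then each solve is only n**2
x2 = lu_solve(L, U, [4.0, 3.0])
```

In SciPy the same pattern is `lu_factor` once followed by repeated `lu_solve` calls, which is exactly the 1-second-plus-microseconds arithmetic above.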

01:16:29 The most common factorization is the LU factorization: L is lower triangular, U is upper triangular. It basically says you can take a nonsingular matrix A and write it as a permutation times a lower triangular times an upper triangular matrix. The cost of the factorization is order n cubed flops, and it dominates. Then the solve method walks through the three factors; that's why there are three steps: a permutation, a forward substitution, and a backward substitution, each one working with the inverse of one of the factors. The whole solve is order n squared, versus order n cubed for the factorization.

01:17:23 This comes up in a whole lot of areas. And this is essentially the Gaussian elimination that you were almost certainly tortured with in high school or your undergraduate days. By the way, when you were taught it, did anyone say that it has nothing whatsoever to do with how people actually solve equations? Probably not. Making people do Gaussian elimination by hand is like writing an assembly-language program: a person should do that either zero times or once, and absolutely no more than that, and most normal people should do it exactly zero times. But they should appreciate that we need a very small number of people who love that kind of thing to do it all day, so that the rest of us can benefit; that's kind of the way this all works out.

01:18:25 So this is basically the same as that, except of course it's not the way it really works: the way it really works is all of this with everything blocked, and if you have multiple cores, like on a GPU or just a multicore CPU, things are done at the same time. But other than that, the idea is identical. Okay, I think we'll quit here, and we'll continue next time.

ðŸŽ¥ Related Videos

ðŸ”¥ Recently Summarized Examples