00:00 hi yuan thank you so much for joining me
00:02 today again to share your knowledge
00:04 in causal inference so last time you
00:07 talked about two causal
00:09 inference methods regression and matching
00:11 and you also talked about how we can use
00:14 them to control for the confounding
00:17 variables which may lead to false
00:18 conclusions when making causal
00:21 inference right so then i was thinking
00:23 about the example you gave us uh the
00:27 impact of harmful content on users right
00:31 and in that example we have many treated
00:35 units because many users
00:37 could have been exposed to harmful
00:40 content right so then i was thinking
00:43 what if we want to measure something
00:44 like covid what if we want to measure
00:47 the impact of covid on the economy you
00:50 know covid is a rare event right it only
00:52 happens once i mean hopefully it only
00:55 happens once um so how do we
00:58 use causal inference
01:00 to draw the conclusion that
01:02 covid has a causal impact
01:05 on the whole economy
01:06 covid is such an interesting case
01:08 because as you mentioned in order to use
01:10 methods like regression or matching we
01:13 need data that looks like this
01:15 with a lot of treated units and
01:17 untreated units so like we can compare
01:22 but um as you pointed out
01:24 things not just like covid but
01:27 elections policy changes or natural
01:30 disasters they happen very infrequently
01:33 and when they do happen
01:35 they usually impact large units like an
01:38 entire city country or in covid's case
01:47 the whole world if you pass a state law
01:50 then you cannot just assign the law to some selected
01:52 individuals in a state it impacts others too
02:01 so because of the rarity of events and the
02:04 size of the units we end up with very
02:07 few treated units and sometimes just one
02:10 so it is impractical to use regression or matching so what do we do in this case
02:15 we need to take a step back and think
02:17 about the philosophy behind causation so
02:21 what do we mean when we say um x causes y
02:27 probably at the back of our mind we're thinking
02:30 if x did not happen then y would not
02:34 have occurred so that's what we mean by causation
02:38 so to answer the covid question we can
02:41 um create a parallel universe in which
02:44 covid never happened
02:47 and then we can compare the economy in that
02:49 world with what's actually going on in
02:52 this world so the difference between the
02:54 two worlds is the causal impact of
02:57 covid on our economy
02:59 so is this abstract idea clear to you
03:04 yeah yeah it's very clear but i wonder
03:06 how do we create a counterfactual world
03:09 that you know the event did not happen
03:12 at all any method we could use to do that
03:15 yeah i can think of a few things
03:21 for those of us who are not dr
03:23 strange there are two popular
03:25 methods that can help us create counterfactuals
03:30 difference in differences and synthetic control
03:33 i will dive into the details but now i
03:36 just want to give you the gist behind them
03:39 which is that we find uh units like a
03:43 city or a country that is similar to the
03:49 treated unit but where the event did not happen
03:52 and then we can pretend
03:54 those untreated units are the parallel
03:57 universe without this treatment then
04:00 what's going on in that parallel
04:02 universe may shed light on how things
04:04 could have been in this world um
04:08 if the treatment had no impact at all
04:11 but if the treatment is impactful then
04:14 we should see those two worlds diverge
04:17 and the difference between
04:20 the real world and the counterfactual
04:22 world is the treatment effect we're looking for
04:27 that is the general idea um behind
04:30 both of those methods
04:32 so could you give us some applied
04:34 examples like how do we use them with real data
04:40 i was thinking um back in 2018 before i
04:44 got my driver's license i used to uh
04:46 uber a lot and one day i noticed that
04:49 if i requested the uber pool ride and
04:51 just walked a little bit to the pickup
04:54 location instead of waiting for the
04:56 drivers to come to pick me up um then
05:00 i only needed to pay as little as two
05:02 bucks for what could have been a 10 dollar trip
05:05 later i learned that feature is called express pool
05:10 and it's very much in line with uber's
05:13 business strategy to put more
05:16 people in fewer cars without incurring
05:18 much more inconvenience
05:22 later when i was um
05:24 doing all those product interviews it
05:26 got me curious um so how can we measure
05:29 uh the impact of this new feature
05:32 um i think as with all the business
05:35 problems we should always start with
05:38 what metrics we should use to define the
05:41 success of express pool so
05:44 emma what do you think what metrics
05:46 would you use to measure
05:48 the success of express pool
05:51 i think to look at the impact of this
05:53 feature we want to look at who will be
05:55 impacted by this new feature right
05:58 so yeah so from the driver's side i
06:01 think because riders are willing to walk
06:06 so basically it would be easier for the
06:08 drivers to pick them up right so that
06:12 means the trip duration will be shorter
06:16 because of a decrease in pickup time
06:19 so that also means one driver
06:22 could pick up more riders within the
06:24 same amount of time so that we can put
06:27 more people in fewer cars
06:30 right so i think one metric i could use
06:33 is average trip duration from the driver's side
06:38 and from the rider's side because you
06:41 mentioned that the uh
06:43 the cost will be less right so i think
06:45 if we use revenue there might not be a
06:48 good metric to measure success because the
06:50 revenue may decrease even if this
06:53 feature is successful right
06:55 so given that i will be using average
06:58 trip duration as the success metric okay
07:02 uh thank you for your analysis on both
07:04 sides um of the uber pool marketplace
07:08 okay and then let's say we go with
07:10 the average trip duration of uberpool
07:13 rides as the success metric of express
07:16 pool and launch this feature
07:21 then one idea is that
07:23 maybe we can just compare this metric before and
07:27 after the launch right and call the change our treatment effect
07:31 but here is the problem though
07:33 um as a former airbnb data
07:37 scientist famously said the outside
07:40 world often has a much larger effect on
07:43 metrics than product changes do
07:47 a billie eilish concert or extreme
07:49 weather may impact trip durations in san
07:53 francisco much more dramatically than
07:56 express pool does so we would be
07:58 flattering ourselves by attributing
08:01 whatever changes after the launch to
08:04 the treatment effect of express pool
08:08 a more rigorous method we can use is
08:11 difference in differences
08:14 say we launched this feature
08:17 in san francisco but not in new york
08:20 um so both cities are really busy uh
08:23 with high demand for uber pool so
08:27 theoretically this new feature
08:30 could be successful in both cities
08:34 um before the launch we measured the
08:36 average trip duration in both cities
08:40 and here's the trend that you see before the launch
08:44 and then we do that again afterwards
08:47 so based on this graph
08:50 how do we find the treatment effect of
08:52 express pool so we need to make
09:00 an assumption that without express pool
09:07 the outcome metric which is the average trip
09:10 duration would trend the same way in
09:12 both of those two cities
09:15 so if you buy that common trend assumption
09:21 then the average trip duration um after the launch
09:25 in san francisco would be parallel to
09:28 the trend in new york city um
09:32 that is not what we see actually so
09:35 this dashed blue line shows the
09:38 counterfactual trend parallel to the
09:40 untreated city but the solid blue line
09:43 shows us what actually happened after
09:46 the launch in san francisco and the
09:48 difference between the counterfactual
09:51 and the actual trends is the treatment effect
09:57 here the launch is probably a success because
10:01 the average duration in the treated city decreased
10:04 so that's one way we can find
10:09 the treatment effect but there's another way
10:14 to analyze difference in differences let's
10:19 generate a data table and break down
10:22 this common trend assumption a little bit
10:27 one interpretation of
10:31 the common trend is that um the time
10:34 effects should be the same
10:36 across different units
10:38 so um as you can see uh here um
10:42 in new york city um
10:46 average duration increased by two
10:48 minutes after the launch
10:52 and if um the time effect is the same in
10:55 san francisco we should also see a
10:57 two-minute increase after the launch
11:03 instead what we see is a
11:06 one-minute decrease in average durations
11:09 in san francisco so the difference
11:12 between what we expect
11:15 and what we actually saw
11:17 is the treatment effect of express pool
11:22 here it decreased the average duration by three minutes
11:29 another way to digest the common trend assumption
11:32 is that um it means
11:34 the difference between the two cities should be the
11:38 same at different points in time
11:41 before the launch the average trip duration in new york
11:46 was one minute longer than that in san francisco
11:50 so after the launch if nothing happened
11:53 then we should also see the same difference so
11:58 that the trip duration in new york
12:00 city would also be one minute longer
12:02 than that in sf but instead it's now
12:05 four minutes longer
12:08 the difference in those two differences
12:11 again is three minutes which is um how
12:14 much express pool reduced um the average
12:17 trip duration in san francisco
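The two differences described above can be written out as a short Python sketch. The numbers below are the illustrative ones from this example (a one-minute head start for new york, a two-minute time effect, a one-minute drop in san francisco), not real Uber data:

```python
# Difference-in-differences with the illustrative numbers from this example
# (made-up values, not real Uber data): average trip duration in minutes.
pre = {"nyc": 16.0, "sf": 15.0}    # before the express pool launch
post = {"nyc": 18.0, "sf": 14.0}   # after the launch (sf is the treated city)

# first difference: the change over time within each city
time_effect_nyc = post["nyc"] - pre["nyc"]   # +2 minutes, the pure time effect
change_sf = post["sf"] - pre["sf"]           # -1 minute, time effect plus treatment

# second difference: subtract the untreated city's time effect
treatment_effect = change_sf - time_effect_nyc
print(treatment_effect)  # -3.0, express pool cut average duration by 3 minutes

# equivalently: the gap between the cities grew from 1 to 4 minutes
gap_change = (post["nyc"] - post["sf"]) - (pre["nyc"] - pre["sf"])  # 3.0
```

either way of taking the two differences gives the same three-minute effect, which is the point of the table interpretation above.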
12:25 yeah the idea sounds intriguing uh but i
12:28 guess the assumption you made which
12:30 is the common trend assumption
12:32 seems a bit strong to me what if the
12:35 green line and the blue line do not
12:38 share a common trend what if they diverge
12:40 in different directions
12:43 that's um such a valid worry and that's
12:48 what makes this method fall
12:50 under criticism by many
12:55 there are several things we can do to
12:56 mitigate this worry so the first is that
13:00 as we talked about in our first video we
13:03 can use methods like matching to find
13:06 untreated cities that are similar to the treated city
13:10 in terms of factors that might impact the outcome
13:14 so in uber pool's case it could be the
13:16 weather the city population and so on
13:25 similar cities are more likely to share a common trend
13:30 but even with matching we still want to check
13:34 whether the trends before the treatment
13:37 are actually the same
13:39 in different cities so to do that um
13:42 we can build a model that takes the
13:44 factors that we mentioned
13:46 along with um the city and the time to
13:50 predict um the outcome metric so um
13:57 if the common trend assumption is true then we
13:59 should see the same effect of time in
14:02 those different cities
14:04 does that make sense
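As a rough sketch of that pre-trend check, here is a minimal numpy example. All numbers are invented for illustration, and a real analysis would use a proper regression package; the idea is to fit duration against time, city, and a city-by-time interaction on pre-treatment data, where an interaction coefficient near zero is consistent with the common trend assumption:

```python
import numpy as np

# Hypothetical pre-treatment data (all numbers invented for illustration):
# weekly average trip duration in minutes for the treated city (sf) and an
# untreated comparison city (nyc), before the feature launch.
weeks = np.arange(8)
sf = 15.0 + 0.20 * weeks + np.array([0.1, -0.1, 0.0, 0.05, -0.05, 0.1, -0.1, 0.0])
nyc = 16.0 + 0.21 * weeks + np.array([-0.05, 0.1, 0.0, -0.1, 0.05, 0.0, 0.1, -0.1])

# Regression: duration ~ intercept + time + city + city*time.
# The city*time coefficient is the difference in pre-treatment slopes;
# a value near zero supports the common trend assumption.
t = np.concatenate([weeks, weeks])
city = np.concatenate([np.zeros(8), np.ones(8)])  # 0 = sf, 1 = nyc
y = np.concatenate([sf, nyc])
X = np.column_stack([np.ones(16), t, city, city * t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

interaction = beta[3]  # differential pre-treatment slope between the cities
print(f"city x time coefficient: {interaction:.3f}")
```

with these made-up series the two cities trend almost identically before launch, so the interaction term comes out close to zero; clearly diverging pre-trends would show up as a large coefficient and argue against difference in differences.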
14:05 yeah yeah yeah this definitely makes
14:07 sense to me but what if um
14:09 none of these assumptions hold do we
14:11 have any alternative
14:16 if you don't want to commit to this
14:18 assumption or you checked this
14:20 assumption and it breaks then we can um
14:24 use more sophisticated methods like
14:28 synthetic control to explicitly model how the trend uh
14:32 changes over time um
14:35 okay uh so we'll talk about this in just
14:37 a minute so in the uber express pool
14:39 example how could we use synthetic
14:41 control to um you know to measure the treatment effect
14:47 first of all um synthetic control is
14:49 such a complex method that all i can
14:52 give you today is some quick intuitions
14:55 for those of you who are interested in
14:56 learning the full details i highly
14:58 recommend this recent review paper by
15:01 alberto abadie who is the inventor
15:04 of the synthetic control method
15:08 here is the general idea of how to apply it
15:14 rather than using one untreated city
15:17 to simulate the outcome in a treated city
15:21 we find several untreated cities
15:24 that are similar to the treated city and
15:26 then combine the outcomes
15:29 in those cities um to give us a more
15:32 robust counterfactual prediction
15:35 so together these untreated cities are
15:38 called the donor pool in this case
15:41 we may have london nyc and la in our donor pool
15:49 another thing to note is that
15:52 not all of those cities are equally
15:54 important when it comes to creating the
15:56 counterfactual um so some cities are
15:59 more important so they have higher weights
16:03 compared to cities that are less important
16:07 so then um after having this donor
16:09 pool we want to find the best weight for
16:12 each city in the pool
16:13 um so there are some constraints when it
16:16 comes to the weights so uh all of the
16:19 weights should not be negative and
16:21 together they should add up to one so
16:24 here's the punchline um
16:27 after having the donor pool and finding
16:31 the weights for the untreated cities the weighted
16:34 average of all the untreated units in a
16:36 donor pool is called the synthetic control
16:40 and at any point in time the synthetic control
16:44 is our best guess for
16:48 how things could have been in a treated city
16:52 and the difference between the synthetic control
16:56 and the actual outcome um in a treated
17:00 city at the time is our treatment effect
17:06 this is the gist behind the synthetic control method
17:12 there are three things that i think are
17:14 clear in this graph one is that i can
17:17 never be a designer and the second is that
17:21 before the launch the synthetic
17:26 sf created by london nyc and la
17:30 is pretty similar to the actual san
17:33 francisco when it comes to average trip duration
17:37 but then the third is that after the feature launch um
17:42 the average trip duration is lower in
17:44 the actual san francisco compared to the
17:47 synthetic control created by our donor pool
17:50 so this again shows us this feature is a
17:54 success because it leads to
17:57 a decrease in average duration
18:01 that is the gist behind this
18:04 method um do you have any questions emma
18:08 yeah i i do i think this is a very
18:10 intriguing idea um my question my first
18:13 question is about um the weight so how
18:15 do we find the weight to you know to
18:18 represent the uh counterfactual outcome
18:21 how do we know you know for nyc it might
18:23 be 0.5 la might be 0.6 to represent san francisco
18:30 yeah so that is the hard question when
18:32 it comes to actually applying synthetic control
18:40 i want to begin with the general idea
18:43 so uh the weights do not come from thin
18:46 air they are actually found from the
18:48 pre-treatment period data so the
18:51 idea is that we want to tweak the
18:54 weights of the untreated cities
18:59 so that their weighted average outcome is as close to the
19:02 outcome in a treated city as possible
19:05 before the treatment happened
19:10 here is how we actually do that um so um
19:15 so first of all we begin by uh building
19:17 models um to predict um
19:21 the outcome that we care about trip
19:23 durations using features such as rainfall
19:27 and the city population
19:33 not all features are equally important
19:36 so um generally speaking features that
19:39 are more predictive of the outcome have
19:45 higher importances as for how we can find feature
19:47 importances it's an important topic in machine learning
19:51 but i will not go into the detail today
19:54 so say we have the model um with
19:57 all the features that we can use to
19:59 predict outcome along with their
20:02 then the next step is that um we want to
20:08 find the weights so that the weighted average of the predicted outcomes in the
20:16 donor pool is as close as possible to the predicted outcome in the treated city in the
20:18 pre-treatment period
20:22 once we have the weights then we can
20:24 create a synthetic control uh based on
20:26 which we can find a treatment effect
20:28 like i showed you on a previous slide
20:33 does that make sense
20:34 we're trying to find the weight um
20:37 that could minimize the difference between
20:41 san francisco and the weighted average
20:43 of the donor pool before the launch right
20:47 yeah that's the idea
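Here is a minimal sketch of that idea in Python with invented numbers. Real synthetic control software solves a proper constrained optimization (and weights features by importance, as described above); a coarse grid search over the weight simplex is used here purely to show the mechanics:

```python
import numpy as np

# Hypothetical pre-treatment series (made-up numbers, not real Uber data):
# weekly average trip duration in minutes for sf (treated) and the donor pool.
sf     = np.array([15.0, 15.2, 15.5, 15.9, 16.0, 16.3])
donors = np.array([
    [14.0, 14.1, 14.5, 14.8, 15.0, 15.2],   # london
    [16.0, 16.3, 16.5, 17.0, 17.0, 17.4],   # nyc
    [13.0, 13.1, 13.3, 13.6, 13.7, 13.9],   # la
])

# Find non-negative weights summing to one that make the weighted donor
# average track the treated city before the launch.
best_w, best_err = None, np.inf
grid = np.linspace(0.0, 1.0, 101)
for w1 in grid:
    for w2 in grid:
        if w1 + w2 > 1.0:
            continue
        w = np.array([w1, w2, 1.0 - w1 - w2])   # weights sum to one
        err = np.sum((sf - w @ donors) ** 2)    # pre-treatment fit error
        if err < best_err:
            best_w, best_err = w, err

synthetic_sf = best_w @ donors  # the counterfactual "synthetic san francisco"
print("weights:", best_w.round(2), "pre-treatment error:", round(best_err, 3))
```

in this toy example an equal mix of london and nyc happens to reproduce sf almost exactly, so those two cities get all the weight; after the launch, the same weighted average of the donors would serve as the counterfactual to compare against the actual sf outcome.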
20:50 so it sounds like many things could go
20:52 wrong with synthetic control um so what
20:55 are the things we need to pay attention
20:56 to in order to do it the right way
21:01 as you uh mentioned uh
21:04 like this method um has many moving
21:06 parts that could go wrong um so here
21:11 are some pitfalls first of all we need to be very
21:13 careful when it comes to selecting a
21:15 donor pool
21:18 the untreated units in a donor pool should
21:22 actually be similar to the treated city otherwise
21:26 we cannot create a good counterfactual
21:30 for how things could have been without the treatment
21:33 and another thing i haven't mentioned is overfitting
21:37 if the pre-treatment period is short
21:42 but we have a thousand cities in a donor pool then we may
21:46 suffer from overfitting which is a
21:48 common problem in machine learning so
21:52 um the synthetic control may mimic the
21:55 trend in a treated city very closely uh
21:58 before the treatment but
22:01 its prediction of how things would be
22:03 after the treatment um may be way off
22:08 that's like one thing we should be aware of
22:11 another thing is that as mentioned all
22:14 optimizations and adjustments are done
22:17 on the pre-treatment data that includes
22:21 how we use matching to find cities in a
22:23 donor pool and how we find
22:25 um the weights of each city but um
22:31 you may have um like an urge to peek at
22:34 the post-treatment data when finding the
22:37 matches and when adjusting the weights
22:44 but that amounts to tweaking the model in a way that um
22:48 leads you to find
22:49 treatment effects that do not exist
22:53 this data peeking problem
22:55 is common elsewhere in stats and
22:58 machine learning and it could also be a
23:00 problem when we use synthetic control
23:06 synthetic control is not a silver bullet
23:09 for all use cases um
23:11 so the express pool feature and the
23:17 covid pandemic are pretty dramatic events
23:19 but sometimes you make a small ui change
23:26 and because the effect may be pretty weak and
23:28 synthetic control operates on
23:31 city level or regional aggregate data
23:34 you may not be able to detect
23:37 such small changes using synthetic control
23:41 and um on that note also because
23:44 synthetic control is based on aggregate
23:47 level data so you cannot use it to
23:50 answer user level questions so like in
23:55 covid's case it is pretty hard to use synthetic
23:56 control to see how it impacts individual
24:00 people's productivity
24:02 these are the things that we
24:04 need to be aware of when we use this method
24:07 thank you for uh explaining the pitfalls
24:10 and limitations of the synthetic control
24:12 method um so do you mind summarizing
24:15 what we have learned today
24:17 yeah so let me give you a quick summary
24:19 so sometimes we don't have many treated
24:22 units that we can compare with um
24:26 because the event of interest
24:29 happens rarely and impacts large units
24:33 but we still want to make causal
24:35 inference about the impact of those events
24:40 to achieve a seemingly impossible goal we create a
24:46 counterfactual world without the
24:48 treatment and compare the outcome there
24:51 with uh the actual outcome in the real world
24:54 um so if you want some quick results um
24:58 then you can apply the difference in
25:01 differences method which is an old and
25:05 reliable method that people have been using for decades
25:07 uh if not centuries um
25:12 but if you want more precise um
25:14 like predictions and if you don't want
25:17 to commit to strong assumptions um
25:21 that difference in differences has to
25:23 make like all units sharing a common
25:26 trend then you can use methods like
25:30 synthetic control to capture how the outcome changes over
25:33 time in the real world and in the counterfactual world
25:37 awesome thank you so much