00:00hi everyone welcome to the a16z podcast
00:02I am Sonal and today we're doing a
00:04podcast on data network effects and we
00:07have two general partners here to have
00:09that conversation with us we have Vijay
00:10Pande who covers all things bio and
00:13Alex Rampell who covers all things
00:15FinTech as well as other areas welcome
00:19okay so first let's just kick off by
00:20talking about what a data network effect
00:22is in its simplest form it's a
00:24network effect that results from data
00:26and if a network effect is defined as
00:28something where the value to
00:30users and all the participants increases
00:33as more users use a particular platform
00:35or marketplace how does this play out
00:38with data so if you think about eBay
00:40more buyers go to
00:42eBay because more sellers go to eBay
00:44more sellers go to eBay because more
00:45buyers go to eBay that is the canonical
00:46network effect and there commerce is
00:49happening that's the transaction
00:51for a data network effect typically
00:54there's no commerce per se there's an
00:57extraction you're either reading or
00:58writing in most cases you're reading and
01:00by reading or writing you mean in the
01:01database that's like reading from or writing to a
01:03database and as more people write the
01:06value of each read goes up that's the
01:08way of thinking about it so an example
01:10would be the credit score I could figure
01:12out what your credit is by just looking
01:14at you and profiling you in legal ways
01:16not illegal ways and saying here's how
01:18here's what I think your proclivity to
01:20repay is but if every bank on earth is
01:23using one central repository then they
01:26will pay more money to actually extract
01:28to read the reads become far more
01:30valuable and if a new company started
01:32tomorrow and said hey we're going to do
01:34credit scores we're going to charge a
01:35dollar per extraction per read and not
01:38$10 per extraction well there's nothing
01:40to extract like they can't actually
01:41provide any value if they end up having
01:44more data than the current number one
01:46person then they could charge a lot more
01:48than ten dollars they could charge
01:50a hundred dollars and in fact the value
01:51of the number two person goes to zero
01:52because they actually have a
01:54demonstrably poor product which is why
01:56there aren't really any competitors to
01:58eBay right it's also a way people often
02:00talk about companies that have
02:02network effects as winner-take-all
02:04markets yeah which is generally the case
02:06or winner take most or winner take
02:07like the vast majority but I think
02:10if you think about it in the database
02:11sense of reads and writes
02:13the reads just become disproportionately
02:15more valuable as more people are using a
02:17central repository of data you know on
02:20the medical side there's interesting
02:21aspects that combine in with machine learning
02:23because the database model is I think a
02:25very natural one but then if you put
02:27data science and machine learning on top
02:29the reads can become much higher
02:31value because of the insights
02:33that you can gain from the data as well
02:34especially these new modern machine
02:36learning methods like deep learning just
02:38crave data and so often you have to
02:41reach a critical mass before they can
02:42even be used I'll give you another
02:43example like Google right now and
02:45Facebook when you think about
02:47translation services which has nothing
02:48to do with FinTech but I want to go
02:50translate text or I want to go look at
02:52images and figure out what they are
02:54Google Baidu Facebook people that have
02:57large corpuses they just have
02:59such a huge advantage because if I want
03:02to figure out translation at scale and I
03:05have no data on which to draw this isn't
03:08a read/write problem because you're not
03:10making a central repository like eBay is
03:12a central repository it's a marketplace
03:13where people trade or the credit bureaus
03:15are marketplaces where people trade or
03:17anti-fraud companies that work with
03:19lots of e-commerce companies those are
03:21central marketplaces where people trade
03:22here it's just like Google can have not
03:26the best computer science they probably
03:27do have the best computer science but
03:28imagine that they didn't but they had
03:30the biggest corpus of data they can just go
03:32acquire the best computer science and
03:34the unfair advantage that they have
03:36their data network effect is effectively
03:38as they get better translation they can
03:40actually use that to make their
03:42translation software even better and
03:44they also have users to autocorrect that
03:45as well so that's another example of a
03:47data network effect where like the
03:48corpus is the demonstrable advantage so
03:51one thing that confuses me here and I
03:52feel like we overuse this as a result is
03:54that sometimes people conflate having a
03:56lot of data to your point in this case
03:58the large corpus is required in order to
04:00create better results which is a feature
04:02of machine and deep learning but
04:04sometimes people conflate having a lot
04:06of data and say we have a data network
04:08effect and that's actually not true so
04:09how do we sort of travel from having
04:11data to actually having a data network
04:13effect that results from that data yeah
04:15you have to have a plan to actually do
04:16something with the data right and
04:18usually this is something where you guys
04:19are providing higher quality let's
04:22say in diagnostics because you know
04:24so many other results that you can
04:25actually do a better job at predicting and
04:27diagnosing or do something cheaper and
04:29obviously the combination of the two higher
04:31quality at lower cost is really a game
04:34changer well the other thing is that
04:35having a lot of data is not a network
04:37effect if having a lot of data doesn't
04:39come with a plan to make your data better go
04:40back to the credit bureau I have a lot of
04:42data on Experian therefore people write
04:45and read from me and therefore I get
04:47more data and my data gets better as
04:49opposed to like look at Visa you know
04:52where these about my company and they
04:54have a tremendous amount of data they
04:56could predict the US economy down to
04:57like probably the ninth percentage point
04:59or a decimal point but having
05:01all of that data doesn't make their data
05:02better people don't want to go transact
05:05it's an output so it's like exhaust so a
05:08lot of data actually takes that form of
05:10an exhaust and it makes it very very
05:11valuable but there's no network effect
05:14typically to exhaust type data as
05:16opposed to when the data is actually
05:18a key component of the business
05:20model and there is this concept of more
05:22people want to write because more people
05:24want to read more people want to read
05:26because more people want to write and
05:27replace that with the commerce aspect of
05:30like buyers and sellers right so if you
05:33were to operationalize that and make
05:34it even more concrete than that
05:36one thing I've heard is that you have to
05:38have an algorithm to actually take the
05:40data out and then to your point add
05:43value back to it I mean how would you sort
05:45of operationalize this more concretely
05:47for people who are building products if
05:48they want to build data network effects
05:50what should they do yeah I mean that
05:52really varies obviously in terms of the
05:54domain and the company but you know some
05:56sort of data science machine learning is
05:58very natural to be able to apply to this
05:59but I think you know sometimes this
06:01doesn't have to be fancy machine
06:03learning or anything like that just the
06:04ability to monetize something from that
06:06and really something where your
06:08company gets better yeah I think part of
06:11the problem as well is that algorithms
06:13like if you look at compression
06:15algorithms over time there's one called
06:17LZW which has been around for a very
06:18very long time it's pretty good and then
06:21the next one that was better was maybe
06:221% better the next one that was better
06:24was 2% better and if you are an
06:26algorithm company it's very very hard to
06:28build any kind of value because somebody
06:30else comes up with a marginally better
06:31algorithm so you need to pair the
06:33algorithm with the data and there's
06:36actually a shift going on right now from
06:38outputting just the data to more
06:41of a learned aspect on top of that that's
06:44the algorithm part I mentioned fraud and
06:46anti-fraud companies so there are a bunch
06:48of companies there's one in our
06:49portfolio called Signifyd there's
06:51another one that I invested in as an
06:52angel a long time ago called Sift
06:54Science and for a long time many of
06:56these companies will tell you ok do we
06:57think this is risky they'll tell you all
06:59of the answers that got pulled from the
07:01data so go back to credit reports well
07:03credit reports are a combination of like
07:05what you did in your past you got this
07:07thing when you got out of college you
07:08didn't pay that loan on time you
07:10know you were a deadbeat for this doctor
07:12or whatever and then there's a
07:13credit score which is a heuristic that's
07:15built on top of that so it's actually
07:17interesting if you look at credit
07:18reporting right now the goal of applying
07:22machine learning is to actually come up
07:24with a better heuristic so this is the
07:26thing where you need the data repository
07:28and ideally it's proprietary to you
07:30because then you can extract more
07:32economic rent if you're building a
07:33company here and then you want to have a
07:35better set of heuristics on top of that
07:37that's the algorithm and neither one
07:39alone is really sufficient I mean it is
07:42sufficient I guess you could say if you
07:43have the data the data network effect
07:45tends to be more valuable than the
07:46algorithm but you can extract more value
07:49if you're not saying here are 50 things
07:50for you to go to analyze on your own and
07:52we're the only one that have access to
07:53that and then you have to hire a team of
07:5550 people to go analyze it but now
07:58you actually have an algorithm that
08:00outputs you a decision and you can use
08:02that decision and that's an even bigger
08:04advantage for a tech company that has a
08:06data network effect and while they're
08:08not formally related usually one follows
08:10the other if you're the one with
08:11the big giant corpus you'll attract the
08:14very best data scientists because they
08:15want to dive into that they'll come up
08:17with the right features and the right
08:18ideas and that will be another sort of
08:20effect on top so how do you solve the chicken
08:23and egg problem in this scenario and by
08:25the chicken and egg problem we're talking about
08:26the conundrum of where you start
08:28like an example you just shared Vijay is it
08:30the corpus that comes first and then the
08:31data scientists or do you get the data
08:33scientist first to create that corpus
08:35like how does it sort of come together
08:36you know there's a couple different
08:36strategies one common strategy is to
08:38sell something at cost or not
08:41necessarily with huge margins in order
08:43to be able to gather data you know in
08:45principle 23andMe was doing something
08:47where they're getting these kits out and
08:49gathering huge data sets and then
08:50downstream making big research deals
08:53that's a canonical example but that's not easy
08:55to do to build up that size so quickly
08:58yeah another example is I mean Google
08:59didn't set out in whatever it was 1998
09:02or whenever they were incorporated a long
09:03time ago almost 20 years ago to become a
09:06deep learning company this was almost
09:08like wow we've been scanning the web
09:10forever we have hundreds of thousands of
09:12servers or however many they have around
09:14the planet we have all these images that
09:16we've stored now we have a corpus and we
09:18also have a very profitable business
09:20let's go get a bunch of data scientists
09:22and machine learning people and figure
09:23out what we can do so call that the
09:25accident that's the atypical one
09:27but that is actually it's atypical but
09:30at the same time it is quite typical
09:30because some of the best people out
09:32there today are working at companies
09:34like Google or like Facebook and
09:36Facebook didn't want to be an image
09:37recognition company back in the day it
09:39fell into it because they have such an
09:40enormous corpus the other
09:42example is you kind of move up the value
09:46chain over time so I'll talk about the
09:48fraud example here where a lot of the
09:50anti-fraud companies like Twitter has a
09:52fraud problem but for Twitter what is the
09:54economic impact of fraud on Twitter it
09:56means that somebody opened an account
09:58and they've been spamming somebody or
10:01there's trust and abuse or things that
10:03don't have massive economic impact
10:05they're annoyances but they're not
10:06really really problematic Blue Nile has
10:09a much much bigger problem Blue Nile
10:11sells diamonds online so as you know
10:13diamonds are very very expensive they're
10:15very very small and you know one pound
10:17of diamonds is worth millions of dollars
10:19so if you lose the equivalent of one
10:21pound in diamonds to fraud like that's
10:23not good right that has economic
10:25impact so you know you can
10:26imagine on the fraud scale and yet
10:29actually there's overlap because bad
10:31people tend to do lots of bad things so
10:33somebody who's truly a bad person might
10:35open up a bad Twitter account and then
10:37actually steal a credit card number as
10:39well and then use that stolen credit
10:41card number to go steal a diamond and
10:43then they might do all sorts of other
10:44unsavory things as well and the nice
10:45thing is that bad people because they
10:47don't exist in pockets there is
10:48horizontal overlap here across all these
10:50different verticals if you go and it's
10:53almost like what Vijay was saying where
10:54it's not even giving it away for free
10:56because that's hard to sustain for too
10:58long but you can go to people that have
11:00vast vast numbers of writes going back
11:02to the read/write analogy so Twitter
11:04would be able to say ok we will give you
11:07everybody who's potentially a bad
11:09account or a good account will just let
11:10you watch these people not watch their
11:12data but just profile them like you know
11:14here's their browser type here's their
11:16IP address here's a cookie that was on
11:18their machine things like that and now
11:20you build up 50 million bad people and
11:22Twitter will pay a little bit of money
11:23for this not that much because there
11:24isn't a data network effect then you
11:26merge that with Tumblr then you merge
11:28that with somebody else and none of
11:29these people will pay that much but now
11:31the value of a read is getting to be of
11:34substantial size to Blue Nile the
11:36diamond company or to any other
11:38e-commerce company whereas if you went to
11:40Blue Nile from scratch and you said hey
11:43you should use our anti-fraud technology
11:45and not these guys' anti-fraud technology
11:47you don't have a data network
11:49effect at all so it's very hard to say
11:51like you might have a better algorithm
11:52but again it's hard to extract that much
11:54economic rent from a marginally better
11:56algorithm because it's only marginally
11:57better today and not tomorrow
11:58potentially and you don't have enough
12:01data as well so you might bootstrap
12:03yourself via a different vertical so
12:06part of what you also touched on is this
12:08notion of pooling data among different
12:10sources how does this play out in both
12:13FinTech and bio because I would think if
12:15data is your advantage and yet you need
12:17more data especially in science we have
12:19open science and sharing how do you then
12:21sort of overcome that sort of silo
12:23effect and create that shared central
12:25repository when everyone wants to
12:27protect their data yeah it's a huge
12:28challenge on the health side because of
12:30things like HIPAA which require
12:32anonymity and become natural
12:34barriers and so but that's also
12:37therefore an opportunity for the company
12:38that can put everything together but
12:40also you know what's interesting is that
12:41there just is so much data there I mean
12:44whether we're talking about data from
12:45clinical trials or from patients or from
12:47Pharma and so the opportunity is huge if
12:49a company can work out those logistical
12:51issues yeah and likewise I mean it's
12:53very hard to get competitors to work
12:54together so as an example if you carry
12:57credit card debt imagine that you have
12:59five credit cards every credit card
13:01company should want to know how much
13:02you're spending on the other credit
13:03cards because if you go imagine that you
13:06decide I'm gonna flee the country and
13:07renounce my US citizenship and never pay
13:09any of my debts back and you have five
13:11credit cards that each have a twenty
13:12thousand dollar limit well you could
13:13just go steal a hundred thousand dollars
13:14with impunity and that would be very bad
13:17Chase should want to know how much
13:19you're charging on your Amex card at any
13:20point in time Amex doesn't want to tell
13:23Chase and in many cases this actually
13:25creates the opportunity for a separate
13:27company and you anonymize everything you
13:30wash it you make sure that nothing is
13:32actually of discernible value because if
13:34Amex is turning over their complete
13:35customer list to Chase every night that
13:38would not be like I can't imagine that
13:40agreement ever ever happening so part of
13:42what the data company does is they
13:45figure out how to sanitize it they deal
13:47with the political issues and then
13:49everybody benefits from being part of
13:51this cooperative and it's very hard to
13:53get these things off the ground but the
13:55nice thing is that the companies
13:56themselves left to their own
13:57devices will never do it and yet at the
13:59same time it's a very very big problem
14:01for them so is the ideal opportunity
14:02then for a startup to be sort of at that
14:04center of all these different players
14:06like play a broker-like role or to try
14:08to create something in its own vertical
14:10I mean like where do the opportunities
14:11lie here for startups in both of your
14:13spaces and beyond I think I mean I hate
14:15to say it depends but it really depends
14:16because I mean in some cases you're
14:18creating something new and you're not
14:20really I mean like in the fraud case
14:22it's not like you're extracting like
14:24very very confidential
14:25information and sanitizing it or there's
14:28a company called Yodlee which is very
14:30very interesting they are like every
14:32FinTech company pretty much on earth
14:34right now is in some way shape or form
14:36using Yodlee to aggregate information
14:38across all of these different financial
14:41services companies like so you have an
14:42E*TRADE account you have your IRA with
14:45Fidelity and you've got your bank
14:47account with Bank of America and you
14:49want to put them in a Mint-like
14:50interface whether on mobile or on the
14:52desktop Yodlee is typically the
14:54player behind the scenes that's
14:55aggregating all of that but then Yodlee
14:57actually retains all that information as
14:59well and they can use it for on an
15:01anonymized basis their own purposes that
15:03didn't exist before so people are doing
15:06all sorts of cool things on that data as
15:08well to figure out you know what's
15:10happening in the world so it really
15:11depends on whether or not you're like
15:13I have to build a
15:14cooperative and there are only ten
15:16companies that have this data and I'm
15:18going to be the UN between them sure
15:20that's very very valuable but it's very
15:22very hard to be the UN because these are
15:24very very large monolithic companies
15:26that can't agree on anything and getting
15:28them to agree to work with you for
15:30that matter that's an uphill battle
15:32if you can get it there's a lot of value
15:34there I tend to like the companies that
15:36are not reliant on playing
15:37peacemaker with ten but where there are
15:39thousands and then eventually you can
15:41build up with thousands and then sure
15:43those ten have no choice but to use your
15:45information because acting in a
15:47centralized manner it's so important and
15:48there's nothing else quite like it going
15:50back to the network effect piece
15:51I think there's a lot that the
15:53healthcare side can learn from the
15:55FinTech side my assessment of things is
15:57that it's maybe a little bit further
15:58behind and there's a lot of different
16:00reasons for this one reason is even just
16:02the use of electronic medical records or
16:04EMRs is only relatively recent and
16:06that's really changing but
16:08that's much more recent and to speak to
16:10Alex's point there are generally just a
16:13few big players there's not like a
16:15thousand health insurance companies or
16:17something like that so there are these
16:18new challenges but I think I'm always
16:21curious when Alex and I chat to see what
16:23what tricks can be borrowed from the
16:24FinTech space into the healthcare space
16:25so one question and you may not have the
16:28answers for it but I think it's worth us
16:30discussing is sort of the ethical
16:31implications of users in a system where
16:34the biggest value or the network
16:36effect now accrues from data and as it
16:38is users are always you know there's a
16:39lot of advocacy groups who say like
16:41users should have the right to extract
16:43their data and do whatever they want
16:44with their own data which is a separate
16:46point but related in the sense that it
16:48touches on how much agency who has that
16:51agency and what are the ethics
16:53associated with all of this Vijay we should
16:55probably start off with you because I
16:56think with HIPAA it's automatically a
17:00yeah there's HIPAA which you know
17:01requires the anonymization and sometimes
17:03that is not as obvious as you might
17:05think it's not just removing someone's
17:07name if someone has a scan of your brain
17:09like an MRI of your brain is that
17:12anonymized is that because maybe that
17:14could go back to you is your genome
17:15sequence anonymized just having that
17:17sequence alone might be enough to be
17:19able to connect it to you with the blood
17:20test probably it is and so it's actually
17:22much more of a profound sort of
17:24philosophical issue to think about but
17:26on the flip side the upside could be
17:29really quite huge it could be the
17:30difference between pooling everyone's
17:32information to be able to predict
17:33whether you're going to get cancer or
17:34not and I would like to have my
17:37information in there and I'd like to
17:38know those things and so there's going
17:40to be something that we're going to have
17:41to sort of figure out on the
17:42and policy side to figure out what's the
17:44best thing to balance these two forces
17:45the other interesting point there is
17:47that I mean in economics there
17:49is this concept of the public good or
17:51free rider problem and you often have
17:53that so going back to reads and writes
17:54everybody wants to read but nobody wants
17:56to write and in many cases like I mean
17:58if writing means giving your blood and
18:00actually going to a phlebotomist and
18:01getting blood drawn from you like
18:03reading is very easy reading's a lot of
18:04fun writing actually requires a lot of
18:06work and so there are two ways that I
18:09think about that that's that that's
18:10obviously a health-related analogy but I
18:12think of that about this in terms of on
18:14the one hand you've got kind of
18:16regulatory issues and there's also just
18:19like a lack of consumer understanding so
18:22I remember a good friend of mine who's
18:23not very literate computationally or
18:25technologically was saying
18:27oh my god Alex I have all these cookies
18:29on my computer like I'm being tracked
18:31this is terrible how do I like cookies
18:32are dangerous and I think
18:35some tech columns have contributed to all
18:37of this confusion over cookies I was
18:38trying to explain to this friend that if
18:40you go to the New York Times do you want
18:42to have to log in to the New York Times
18:43every time you go to New York Times dot com he's
18:45like no I'd hate to log in every time
18:46that's annoying it's like well that's
18:47what a cookie's doing it's remembering on
18:49your own browser some information so the
18:51New York Times can reference you and
18:52actually de-anonymize you and then when
18:55he understood it that way he was like oh I
18:56like cookies it was just this kind
18:58of fundamental misunderstanding the
19:00benefit to the users sort of greater
19:02than right so part of it is the
19:04free-rider thing of like sometimes
19:06providing data actually makes you better off
19:09like if I'm willing to give up more
19:10information for insurance purposes like
19:13okay will I let my car insurance company
19:16see how fast I'm driving and on the one
19:18hand that sounds like really really
19:20spooky like oh my god they're watching
19:21what I'm doing and it's Big Brother this
19:23is 1984 that sounds terrible on the
19:26other hand if I'm willing to give that
19:27up and I show that I never drive past
19:30the speed limit I never veer out of my
19:31lane I get a better insurance rate you get
19:33a better insurance rate right so part
19:35of it is like it's not caveat emptor
19:36it's like whatever the latin phrase
19:38would be like choose your own destiny
19:39kind of thing some people will value
19:42time more than money some people value
19:43money more than time the same thing goes
19:45with privacy some people value privacy
19:47more than money some people will value
19:48money more than privacy and I think part
19:49of it is just making it transparent so
19:51that's one side the other side is
19:54how you educate people
19:56right I think how you talk about it to
19:58your point transparency how you talk
19:59about it and sometimes giving users a
20:01choice to opt in or out of a system it
20:03also doesn't have to be black or white I
20:04think with especially with the machine
20:06learning you could learn features from
20:08data without having to share the data
20:09itself and that's useful for IP or for
20:12HIPAA and so on so I think there's a
20:13lot of ways that one can contribute to
20:15network effects without making your data
20:17even publicly known or even exchanging
20:20data necessarily right and I think
20:22actually in most cases the benefit of
20:23the doubt I mean right now it's like the
20:26company that's using data is the evil
20:28company and they're up to some
20:29pernicious whatever and that's almost
20:32never the case I just think that that's
20:33a lot of like Congress goes and
20:35investigates company XYZ because they're
20:37using data or what are they doing with
20:38consumer data and part of it is like the
20:40default assumption is that these guys
20:41are out to get you in most cases that's
20:44not true and there are a lot of good
20:46things that do come from being part of
20:48this cooperative and I think as people
20:50do like I would love it if you don't get
20:52charged more because that's where like
20:55how would people react poorly or
20:56negative or positively poorly your
20:59insurance company says hey we saw that
21:00you were speeding you're getting charged
21:02a lot more can you imagine how terribly
21:04people would react to that it's like up
21:05in arms congressional inquiries
21:07blah blah blah on the other hand if you got
21:09a giant rebate check from your insurance
21:11company saying hey you've been driving
21:13very safely or you haven't gone to see
21:16the doctor in a long long time and the
21:18last time you went to go see the doctor
21:20all your vitals were better here's a
21:21rebate check people would love that and
21:23that's coming from data as well so I
21:25think part of it is just the psychology
21:27of how you how you reward people for
21:30sharing their data when in many respects
21:32it's already being shared anyway you're
21:33right okay so this has been helpful so
21:35far so let's talk about the fact that we
21:37think data network effects are really
21:39important for software based companies
21:40especially in this age as you mentioned
21:42in machine learning deep learning AI all
21:44these kinds of trends coming what
21:46concretely can entrepreneurs do to first
21:48build data network effects or think more
21:51strategically about it early on versus
21:53by accident and secondly what do you
21:55want to see in pitches from
21:57entrepreneurs when they talk about data
21:58network effects yes so you know in terms
22:00of a startup usually startups do well
22:03when they focus on one area and so the
22:05challenge here is that how can the data
22:10accelerate what they're doing I think
22:11too often what happens is the data
22:13network effect almost suggests a side
22:15business or something like that and
22:16so one challenge is how to think about
22:18what is the real go-to-market is the
22:20data network effect really germane and
22:22central and key to the focus and then
22:25how can you monetize it how can you take
22:27advantage of it what often happens is I
22:29think there's the aspiration for taking
22:32advantage of the data network effect
22:33or the assumption that it will just come
22:34but often we see situations where maybe
22:38that plan hasn't been well thought out
22:39yet you know I would say that in many
22:41cases it's about going up the value
22:43chain so starting at the bottom where
22:45your data doesn't like you're
22:47accumulating writes with the purpose of
22:49hopefully charging for reads down the
22:52road and/or hopping across different
22:54verticals so you start off in vertical X
22:56where again you have your right heavy
22:58which is great because every write that
23:01you're getting is more data that you can
23:03eventually learn from and even if you're
23:04not learning from it as we talked about
23:05there's a network effect that might play
23:07out there and then eventually you go
23:08into an area where it has high monetary
23:10value and you're charging for reads but
23:12you're still continuing to get writes
23:14along the way and I think economics is
23:16really the best way of looking at how
23:18effective this really is because there
23:20are a lot of people that claim I have a
23:21data network effect I have a data
23:22network effect or they'll say I will
23:24have one eventually and it's like sure
23:26like Google had one eventually or
23:28Facebook had one eventually I mean
23:29everybody has one eventually
23:31but it's very hard to prognosticate that
23:33eventuality when that happens and how
23:35do you make it more deterministic that's
23:37where economics comes in so like
23:39imagine that you're at the stage where
23:40you actually are charging for your
23:41product a good sign is that assuming
23:44that you kind of started off in the low
23:45monetary value area and now you're
23:47charging for reads in the high monetary
23:48value area if you are charging more than
23:51the incumbents I mean normally you say
23:53oh if I can like charge one-tenth as
23:54much then it's going to be very
23:55disruptive and I'm shrinking the market
23:57but you actually have the opportunity to
23:59charge a lot more value-based pricing so
24:02if you can really show that you're
24:04charging twenty thirty forty percent
24:05more than the competition that's a and
24:08they're actually willing to pay for it
24:09and they're switching from a lower
24:10priced product either they're totally
24:12irrational they say hey I want to lose
24:14more money this year and increase my
24:16cost which by the way almost never
24:17happens or you've actually demonstrated
24:20in the eyes of many many customers that
24:21they are willing to pay
24:22more because your data is better and
24:24they're contributing back to this
24:26collective as well which almost de-facto
24:28means that you do have a data network
24:29effect and it's not about
24:31prognosticating it's like it's actually
24:33real that's a great example I will say
24:35that the exception that seems to me is
24:37in the very early days of a company
24:40where people are actually not quite you
24:42can't really use the proxy of people
24:44paying quite yet so then how do you sort
24:46of figure it out there can be other
24:48network effects I mean Google and
24:49Facebook had clear other network effects
24:51and that sort of helped create the data
24:52network effects so I think that's
24:54one mechanism by which this can be
24:56bootstrapped or likewise if as long as
24:59the entrepreneur has a pretty clear plan
25:00and they have access to a lot of writes
25:02and they don't necessarily have to be
25:04charging for those but it just seems
25:06pretty evident like the hard thing is in
25:08my example on e-commerce fraud how do
25:10you get twitter to sign up to supply you
25:13with the writes or how do you get some
25:14other like massive publisher that
25:16doesn't really have that much economic
25:17downside from fraud but has tons and
25:20tons of data if it was actually assigned
25:22to a company like this if you sign up
25:24ten of those it's almost very easy to
25:26see the blueprint of wow you've solved
25:29the biggest problem which is you now
25:30have the writes you have a different
25:32more execution oriented problem which is
25:34how do you go charge for the reads and
25:35how do you show that you have enough
25:37value but at least you've solved the
25:38write side of the database so what I'm
25:40really hearing is a theme for you guys
25:41is you know it can happen by accident
25:44but if you have a plan if you're even
25:46aware and intentional about some of the
25:49decisions you make those are all
25:51contributing factors to actually just
25:52create and be better at building data
25:55and the other thing going back to this
25:56winner-take-all thing like you're never
25:58going to get to a network effect if it's
25:59a I mean if there are 25 companies doing
26:02exactly what you do and they're all
26:04about the same size and nobody gets that
26:06big and nobody has like a just
26:08demonstrably better system then the data
26:11is actually it looks more like the
26:12algorithm remember I we talked about how
26:14the algorithm gets like one percent
26:16better every year and this company can
26:18out algorithm that company until they
26:20can't the same thing goes for data if
26:22nobody really gets to that critical that
26:24critical point then it's never going to
26:27be that much better and you can't charge
26:28excess rent either great so any parting
26:31advice for entrepreneurs before we wrap
26:32it up you know in the data science side
26:36some of the best data scientists are ones
26:37where people can go deep within the
26:39domain and it's something where it's not
26:41just taking off-the-shelf algorithms
26:43and so on and this is especially
26:44important in this case with a data
26:45network effect because this is where
26:47speaking to Alex's discussion of data
26:49and algorithms often the two are really
26:51tightly connected and having a deep
26:53experience in the domain and on the
26:55algorithm side can really bring that
26:56we call it founder market but it's
26:58almost like data algorithm founder fit
27:00and there's a profound HR implication of
27:02this as well because the algorithm set
27:05is related to the data side because if
27:07you have the best data then guess what
27:09you're gonna be able to hire the best
27:10people because if you're an amazing
27:12statistician you don't want to work on a
27:14database that has five rows on it you
27:15want to work on a database that has five
27:16trillion rows in it and if you have that
27:18then you come up with a better algorithm
27:20and therefore more people want to
27:22contribute data you're getting more
27:24writes therefore you're getting more
27:25reads and that kind of continues on and
27:27on and the the HR component is very
27:29important because the best people again
27:31you can attract them to a start-up I
27:33often advise companies where they say oh
27:35you know we're gonna hire these five
27:36data scientists but they don't have any
27:38data yet and what they don't really
27:39realize is that if these are if they're
27:41data scientists who are happy to take
27:43out the trash and you know clean the
27:45toilets and do all the other things that
27:47are fun about running a startup then
27:48that's fantastic but if they really
27:49really are very laser focused on this
27:51one task they're gonna burn out or not
27:53even burn out they're just gonna leave
27:54there's there's nothing to do
27:56chronologically I mean it's great if
27:58that's in the founding team DNA but you
28:01also just have to be careful about not
28:03over building until you actually know
28:05that you have enough there when you say
28:07not over building you mean well I mean I
28:09just look at a lot of companies where
28:10they say wow you know we're gonna hide
28:12we have tended like here's a new ad
28:13network or it's new this and we have 20
28:16data scientists and they're amazing I
28:18see these companies all the time they
28:20have like just they're overweighted on the
28:22data science side and they have no data
28:25yeah and in like what they don't realize
28:27is that like they're increasing their
28:29burn like just yeah if they can't find
28:31these people and it's a 20-month hiring
28:33process then then okay but they're gonna
28:35lose these people unless they've managed
28:37the other side of their network unless
28:39they manage the supply side of the data
28:41and actually figure out how they get
28:42those people how they get the writes
28:44coming in the door you're gonna lose
28:46your team that's the vicious cycle the
28:48virtuous cycle is obviously the opposite
28:49if you have data you can hire the best
28:51people if you have if you hire the best
28:53people you get the best algorithms you
28:54get the best clients you get the best
28:56data I'm glad you brought that up
28:57because you're actually focusing on the
28:58flywheel effect and for data network
29:00effects as including talent as a
29:02component well there's a flipside of
29:03this which is that I think if the data
29:05science is tacked on too late it also I
29:07think hasn't strongly connect them so
29:10what Alex spoke to I think he mentioned
29:12it being in the founder DNA I think
29:14that's what I loved were in terms of the
29:16vision of the company from the beginning
29:17it's there but not over built you know
29:21before you're ready to go to war in that
29:23area but for that to be in the founders
29:24DNA I think it's perfect yeah and and I
29:27think part of that is just what is the
29:29architecture of I mean like you know I
29:32was mentioning the rows of the database
29:33if if it's a non-technical team and
29:36they've got two columns to their
29:38database they're not really collecting
29:39that much stuff that makes it harder to
29:42actually append the data scientists to
29:44do all of these great things especially
29:46if you end up becoming a data company by
29:48accident so imagine that you are a
29:50background check company what is an
29:52exhaust of a background check company
29:54well it's how many people are applying
29:56for jobs I mean if there's all sorts of
29:57interesting things on the data side but
30:00hopefully you are collecting things not
30:01in freeform text entry but you're doing
30:03it in a much more itemized way or in
30:07a much more like defined and controlled
30:09way or enumerated way or a pre
30:10enumerated way where you can do things
30:12that are much more relevant so this is a
30:14blanket generalization but is it fair then
30:16to say that almost by definition as
30:18every company becomes a software company
30:20that every software company is by
30:22definition a data company well maybe I
30:24think it's just part of it depends on
30:26how lucrative your primary business
30:27model is because every company of scale
30:30has amazing exhaust and the question is
30:33whether or not you want like you know
30:35I'll go back to Visa Visa's exhaust is
30:37very very valuable but if they shared
30:39that then their clients wouldn't like
30:41them very much and then they lose their
30:42clients so even though their exhaust
30:44is worth it's probably worth billions of
30:46dollars a year you can predict the
30:47economy down to the earth would pay for
30:50that they just can't do it and they're not
30:53making a mistake by not doing it so yes
30:55they are a data company but they also
30:57are a network and being a network is
31:00probably more important than being a
31:02data company but it really just depends on
31:03the particular use case I mean yeah I
31:05think you will have different products I
31:07mean every company that gets to scale
31:09that's touching enough consumers or
31:10businesses will have the opportunity to
31:13have a very very valuable data suite
31:15beyond just whatever they use for their
31:17own purposes but the question is whether
31:19or not they want to and part of that is
31:21just how lucrative like Apple could be
31:22the biggest you know insert the X there
31:24a lot of that pertains to data but Apple
31:27makes too much money selling an iPhone
31:28so I don't think they're gonna do that
31:30you know towards that end the accident
31:32that Alex referred to is often really an
31:34inevitability that you know this will
31:36happen the question is what do you do
31:38with it and is it part of your core
31:40business or is it something that you
31:41have to leave on the side well thank you
31:42guys and we'll talk more about this a