00:00hi everyone welcome to the a16z
00:02podcast today's episode continuing our
00:05series on translating AI into practice
00:07is one of our shorter bites based on a
00:09panel discussion that took place at a
00:11recent annual a16z summit event a16z
00:14operating partner Frank Chen who put out
00:17a microsite on getting started with AI
00:18earlier this year talks with Ion Stoica
00:21co-founder of Databricks and Scott
00:23Clark co-founder of SigOpt and both
00:26have been on this podcast if you wanna
00:27hear more from them in other episodes
00:28about the cold start problem for
00:31companies getting started with AI
00:32especially focusing on the role of data
00:35scientists and domain experts in this
00:37context you guys now between the two of
00:41you have now sort of been with the
00:43customer on their journeys from sort of
00:45day one until they have models in
00:47production and so what advice do you
00:49have for people who aren't Google Amazon
00:51Facebook Apple to realize machine
00:53learning what do they need to do on day
00:54one we have many enterprise companies
00:56and out of them over 70% actually they
00:59have AI projects and what we see
01:02actually if you take the step back there
01:04are three stages the first stage is to
01:07make sure that you have the data many
01:10times this takes more than actually
01:12building the machine learning or AI
01:15model the second thing is about once you
01:17have the data to become so to speak to
01:20operationalize to become a data-driven
01:22company to figure out what are the KPIs
01:26key performance indicators which are
01:28going to be driving your business you
01:31need to take these KPIs based on the
01:33data and operationalize meaning to have
01:35reports dashboards and so forth and now
01:38once you have this then you are going to
01:40start and use machine learning and AI to
01:43improve these KPIs so that's kind of the
01:46journey so that sounds great you have
01:49this sort of very methodical process
01:51oriented roadmap to get from here to
01:54there so tell me where can I go wrong
01:56where are the pitfalls where have you
01:58seen people get stuck on this journey
02:00yeah at every single one of those stages
02:04there are pitfalls that you're going to
02:05need to try to avoid
02:07from just making sure that you have the
02:09right data that it represents what's
02:11actually happening in the real world to
02:13defining those KPIs and metrics there
02:16needs to be this huge contextual
02:18component and I think that's where data
02:20science is moving towards as more and
02:22more of these more arduous tasks are
02:24automated that you need to be able to
02:26say what's actually right for my
02:27business and what am I actually aiming
02:29for and then of course it's how do I get
02:31there as efficiently as possible again I
02:34cannot emphasize enough how important is
02:36the data and this is a continuous
02:39process you need to devote resources on
02:41a continuous basis to make sure the data
02:43is correct because you know you are
02:44going to get data from your sources you
02:47are going to change the software which
02:49logs some of the data everywhere
02:52mistakes can happen and
02:55like they say you know garbage in
02:58garbage out no matter how smart
02:59is what you have in between so
03:01that's number one so you really need to
03:03be paranoid about your data collection
03:05the accuracy of your data I think the
03:09other thing is when I said about the
03:11second stage typically it's about
03:13figuring out what are the KPIs
03:16that's why you know actually when you
03:17hire data scientists actually having
03:20data scientists which have a good
03:21understanding about your business or can
03:24work with people the business people
03:27it's extremely important
03:29fundamentally data science is about you
03:32know you need not only to know statistics
03:33and you know to know math and of course
03:36machine learning but you need to either
03:39be a domain expert in what you are doing
03:41or work well with domain experts so I
03:45had asked what are the pitfalls where
03:46can it go wrong I'm gonna ask the
03:48inverse of that question so number one
03:50is productivity of people it's hard as
03:54you know hiring the best data
03:56scientists and retaining them so the next
03:58best thing you can do is to make them
04:00more productive even more to make your
04:02organization more productive by allowing
04:03them to share the artifacts they build
04:06in terms of models with everyone in the
04:08organization sometimes it will be as
04:09simple as using a model as writing a
04:12SQL query so I think that's a very
04:14important aspect the other one which is
04:17related with that time-to-market right
04:21it's basically you know there are many
04:23companies where we can cut the time to market
04:25from idea to product by an order of magnitude
04:28I want to go back to this sort of
04:30getting started the cold start problem
04:32right in AI cuz I've met with hundreds
04:35of companies now who are beginning their
04:36AI journeys and if I were to summarize
04:38their frustration it would be this it's
04:40like you Silicon Valley guys drive me
04:42crazy you told me I couldn't run on bare
04:45metal I had to run on hypervisors and
04:47then you said I can't run in my own data
04:49center I have to run in the cloud and
04:50then you have to build an iPhone app
04:52that's native you can't just do mobile
04:54web and you have to do big data analysis
04:57and get really good at analysis and now
04:59like you're coming and telling me I have
05:00to do AI and machine learning like I
05:02can't keep up there's too much stuff so
05:04as you think about the companies that
05:06have been successful with their projects
05:08how do they get over the cold start
05:10problem do they hire consultants do they
05:13repurpose internal engineers do they
05:14send them to training classes do they hire
05:16people from all of these data science
05:18boot camps yes so I think it's it's a
05:20very hard problem so as with any hard
05:22problem there is no single silver
05:24bullet so we try to solve this problem
05:26by emphasizing on different aspects
05:29everything from education deployment and
05:32so forth the one thing I want to also
05:34mention again from our observation the
05:36small companies actually they start with
05:38the AI mindset they're building the AI
05:41platform to solve a specific problem as
05:43opposed to being an incumbent that's
05:45then trying to apply AI to what they
05:47already have but let me talk a little
05:49bit about the enterprise you know there
05:50are 50 year or even in some cases over 100
05:53year old companies so they want to use
05:55AI again to improve their business
05:57competitiveness so what we see is that the
06:00enterprises which are the most successful
06:01they go all-in what I mean is because
06:04they have multiple projects it's not only
06:07one project and yes you can try with one
06:09project and so forth to kind of test it
06:12but at the end of the day it's hard when
06:15you start a data science AI project to
06:18know if it will be successful in many cases it
06:19comes down to the fact that even after
06:22you have the data it may not be enough
06:24to get the kind of improvement you
06:25expect so think about it like hedging
06:28these are the companies who have multiple
06:30projects they are doing you know some of
06:33these projects are going to be
06:35successful but not all of them can be
06:37successful we know companies which are
06:39actually very technical and some of the
06:41projects fail because there is not
06:43enough data so you believe that there is
06:44enough data but it's not enough in that
06:46there's not enough signal at least that's what
06:48we've seen and one of the things we see
06:50is different than kind of these
06:51traditional approaches is that it
06:53used to take maybe a decade to kind of
06:56move from your own bare-metal data
06:58centers to the cloud and things like
07:00that but now like all the pieces are
07:02kind of coming together for AI like a
07:04lot of these traditional bottlenecks
07:06that would have traditionally taken the
07:07enterprise oh we need to do this over
07:09five ten years now you can kind of get
07:12up and running very quickly like the
07:14pieces are there to move very quickly so
07:16I think that cold start problem where it
07:18used to be this huge threshold where you
07:20had to get over is now becoming easier
07:21and easier and there's less of an excuse
07:24why you're not actually doing it to be
07:25honest that's a perfect springboard to
07:27my last question which is we're in this
07:30cycle right now where the tools are
07:31improving rapidly right and so what used
07:34to be a black art can now be an API call
07:36million-dollar data science engineers now
07:38it's an API call away so if I'm an
07:41organization shouldn't I just wait for
07:43the tools to get better like why do I
07:45need data science or maybe another way
07:47to ask the question is how does the data
07:48science job change over the next two
07:51years as the tools get much better I
07:53think it's all about that context so
07:55once again TensorFlow is an incredible
07:56tool it's a way to kind of get up and
07:58running very quickly with deep learning
08:00but it's only as good as what you
08:02point it at and this happens all the time
08:04we can tune any underlying system we can
08:06only tune it towards the metrics you
08:08point us at we'll hit any target in the
08:10world but if you point us at the wrong
08:12target we'll hit that wrong target
08:13better than anything else in the world
08:15and so the idea is you still need the
08:17data scientists to really understand
08:19what it is that you're trying to achieve
08:21as a business and how does that relate
08:23to your customers relate to your unique
08:25data sets and how do you actually
08:27differentiate yourselves from your
08:28competitors and I think there's going to
08:30be a lot of tools that make it easier to
08:32do that but at the end of the day you
08:34need to know where you want to go with
08:35the business yeah so I cannot agree more
08:37so fundamentally like we discussed many
08:40times the most important thing is to
08:42figure out what are your business
08:44objectives and whatever you improve is
08:46related with these business
08:47objectives so that's why the data scientists
08:50they have to be acutely aware about
08:53the context and all these tools it just
08:55allows them to get there faster to build
08:59to process more data to hit this target
09:03faster like you said but if the targets
09:05are wrong they are not going to move
09:07the needle there is not much you can do
09:09everybody wants to do AI but it
09:12doesn't really help to do it for the
09:13sake of just doing it just checking a
09:16box and saying okay now we're doing AI
09:17isn't enough you need to know what it is
09:19you're shooting for and sometimes in
09:21like financial services that might be
09:23relatively easy I just want to make as
09:25much money as possible but in other
09:27industries it might be more difficult
09:28and setting up that success criteria
09:30early will be helpful to make sure that
09:32you build towards the right goal and
09:34then eventually optimize towards it well
09:37Scott Ion thank you for joining us thank