00:00hi everyone welcome to the a16z
00:02podcast I am sonal today's episode is
00:04all about storage with the cost of
00:06system memory decreasing memory for both
00:09storage and compute will be the exact
00:10same thing so as we enter a new era of
00:13distributed computing and what Peter has
00:15also argued in a popular deck is the end
00:17of cloud computing how does storage evolve how is this
00:19affected by trends in computing such as
00:21machine and deep learning joining us to
00:23have this conversation today are HY CEO
00:25and co-founder of Alluxio formerly
00:27Tachyon which came out of the UC Berkeley
00:29AMPLab the birthplace of other industry
00:31defining technologies such as Spark and
00:33Mesos general partner Peter Levine who
00:35has funded memory centric infrastructure
00:36companies at every level of the Berkeley
00:38Data Analytics Stack the BDAS stack and
00:40Mike Matchett senior analyst at Taneja
00:42Group which covers everything related to
00:44Big Data compute and storage okay so
00:47that's the intros to kick things off I
00:49just have to ask why should we care
00:51about storage I feel like it's the dark
00:52underbelly of computing that no one
00:53really cares about look I mean while
00:55storage may be the underbelly without
00:57storage computers wouldn't work and so
01:01it's one of the most important you know
01:04compute networking and storage are the
01:06three fundamental elements of what makes
01:09the entire Internet work it makes cloud
01:11computing work and without storage you
01:13wouldn't have databases and without
01:14databases you wouldn't have big data
01:16you wouldn't have analytics you wouldn't
01:18have anything because information needs
01:21to be stored and it needs to be
01:22retrieved so storage is hugely hugely
01:24important and you know the
01:27interesting thing is I think we're in a
01:29very transformative period of time here
01:31where storage is undergoing a bit of a
01:35renaissance and I think it's going to
01:37transform how computing and applications
01:40work in the not-too-distant future when
01:44I got started in storage I thought hey this
01:46is really the staid the
01:49tried-and-true stuff compute was where
01:51it was at you know there's all these
01:52advances happening in
01:55client server and then cloud and new
01:57chips coming along every year but the
01:59more I got into storage the more I
02:01figured out that storage is really the
02:03most complex part of that equation it
02:05takes a lot of effort to protect data to
02:08manage data data has gravity it has momentum
02:12it has weight and history so storage is
02:14really the critical piece to get right
02:16wait what do you mean when you say that
02:18data has gravity and momentum data has
02:20to live somewhere you know compute can
02:23be spun up in a cloud it's a little
02:26ephemeral you can repeat it you can spin
02:28up and down virtual machines but data
02:31actually has to have a footprint
02:32somewhere and that footprint has to be
02:34persisted and protected and secured and
02:37of course in this case made accessible
02:39or the data has no value right but how
02:41is that different than what we have
02:42right now that it requires a new form of
02:44storage I'm gonna use a phrase I say a
02:47lot in the podcast I think it's actually
02:49really true of our times which is when
02:51we say there's a lot more data that's a
02:53difference of degree not just kind why
02:55do we need a different type of solution
02:56why can't we just keep doing the same
02:58things that we were doing before but
02:59just do it bigger and better so I think
03:02this is like many other things in the
03:04world like the cell phone at
03:06the beginning a cell phone could just make a
03:08phone call and now a cell phone is a different
03:10type of cell phone similar thing is also
03:12happening in the storage industry as
03:13well at the very beginning in storage a
03:16block device is the base and it's
03:18just bits raw data beyond the block
03:20devices we had file systems and different
03:23types of file systems now we have blob
03:25storage object storage and in the
03:28meantime we have so much innovation in
03:30open source area as well and you have
03:33public cloud storage from several huge
03:35vendors in the world like Amazon Google
03:38Microsoft Alibaba etc you have different
03:42types of storage solutions provided by
03:44traditional vendors like EMC like HPE
03:47IBM they're providing these new innovations I
03:50think that will pale in comparison to
03:54what's going to happen over the next
03:55let's say decade here when we think
03:58about information and data there's an
04:02entirely new phenomenon that's really
04:04just kicked in relative to what is data
04:07what is data that's a very existential
04:09question well up until right now compute
04:13data has largely been input by some
04:15human being typing on a keyboard or a
04:18database recovering a record that's the
04:21input of a human asking the computer
04:24data largely has been put in there
04:26through human fingers and through some
04:28human interaction fast-forward to right
04:31now let's just talk about a self-driving
04:33car that has sensors those sensors are
04:36now inputting data that's the world
04:40around us and so there's completely new
04:42types of data so what is data now far
04:46exceeds the human input data we are now
04:49collecting the world's information via
04:52sensors and all of that needs to be
04:54processed and stored it will be
04:56literally orders of magnitude and in the
04:59exact mathematical sense orders of
05:01magnitude more data that needs to be
05:03stored and processed so
05:04that's sort of point one on what's
05:08happening secondly the mobile supply chain
05:11is influencing the data center and
05:14influencing the cost curves in the data
05:16center for storage you take a mobile
05:18phone and you take the components of
05:20that mobile phone and put it in the data
05:21center you have a very inexpensive
05:25storage substrate that is far less
05:28expensive than the enterprise systems
05:29that we saw in the past
05:31and so the cost curves come way down we
05:35will have much more in memory data
05:38systems that literally live in real
05:41memory and the notion of disk drives and
05:44tape drives and even SSDs will all go
05:48away I believe that there's a future
05:50here where memory architecture is
05:53completely flattened computing has been
05:55built on slow cheap and fast and
05:59expensive and I believe that we're gonna
06:02be fast and cheap I mean it sounds like
06:04it'd be obvious but what does fast and
06:05cheap really do for us when it comes to
06:07the so called storage Renaissance fast
06:09and cheap means that we can collect
06:11massive amounts of information put it in
06:14memory not have to put it out to disk
06:17drive and do all these you know
06:18backflips to get data to work correctly
06:20it's going to all be in memory and it
06:23will be very inexpensive and that's the
06:25Renaissance in whether you call it
06:27storage but more importantly data and
06:30the importance of data and the
06:31correlation between the volumes of data
06:35the price curves in the
06:38not-too-distant future for what I'll
06:40call storage even though it's memory and
06:42those pieces coming together that to me
06:45is the Renaissance that's happening in
06:47computing I totally agree and actually
06:49just add one more point to that is that
06:51we actually should view memory as a
06:54frontier of storage exactly it's a tier
06:57of storage exactly I would argue that it
07:00is the tier of storage that over time
07:02there is no other storage
07:04it's just memory certainly we've been
07:07seeing the rise of memory class storage
07:09already being talked about by vendors
07:11and bringing persistence to memory will
07:15completely overhaul how compute and
07:18storage are envisioned today because
07:20tomorrow data is going to live in the
07:24compute devices those are going to be
07:25more Internet of Things devices and be
07:27far more distributed as well
07:29but I also want to temper that with the
07:31thought that we've also talked to a lot
07:33of these big storage vendors and they're
07:36forecasting that there just simply isn't
07:38going to be enough storage for all the
07:40data we're collecting in a midterm
07:42horizon like three to five years that
07:44we're creating so much data there won't
07:47be enough chips there won't be enough
07:48hard drives there won't be enough tape
07:50there simply isn't going to be enough
07:51storage out in the world which I do want
07:54to point out means that there's still
07:55some opportunities for things in storage
07:57management for people to consider how many
07:59copies of that data am I making do I
08:01have to take the compute the processes
08:03out to where the data lives do I have to
08:06bring the data centralized and make
08:08copies of it or can I do something more
08:10optimized with how I organize my
08:13architecture and only store the data
08:15once only compute data once in one place
08:17to draw a fine point here we are
08:20entering a new world of distributed
08:22computing and if you think about the new
08:25world of distributed computing the data
08:27that gets collected in a
08:29self-driving car or some endpoint is
08:31going to be processed there the information will be
08:34processed at that endpoint it won't be
08:36transmitted back to a central storage
08:38pool the information will be curated and
08:41then transmitted back because we'll be
08:43collecting massive amounts of data a
08:46self-driving car collects 10 gigabytes
08:48of data a mile right like some
08:51ridiculous amount of data you know
08:53there's not enough storage on the planet
08:55to ever hold all that information so the
08:57curation is going to occur at the edge
09:00close to the compute and the quote
09:03unquote storage will be processed at the
09:06edge and then important information will
09:09come back to some centralized data store
09:12but all that computation at the edge in
09:13the storage of it and kind of the the
09:16permutations of it is exactly the
09:18Renaissance that I believe needs to
09:21happen in storage even to process this
09:23stuff so another huge issue here is that
09:25it makes the ecosystem much more complex
09:29than before all the big enterprise
09:31companies they will try different
09:33innovations they will have their
09:35existing storage and past storage and new
09:38storage forming very complex systems and
09:41that makes this hard to manage hard to consume
09:44and in many cases it's not cost-effective
09:47as well and this is one thing we're
09:51seeing requested by the customers many
09:54big enterprises in the world is how to
09:56connect and consume these data from
09:59different storage systems easily and
10:01manage them efficiently so is this just
10:04because they have a hodgepodge of like
10:05all these different storage systems or
10:08is it that it's just buried in the same
10:09place but under a bunch of different
10:10interfaces and tools or like what's the
10:12problem really the data is stored in
10:14different storage systems just give you
10:16a very concrete example if you talk to
10:18this department their data is stored in
10:20you know public cloud storage maybe in
10:22probably Amazon S3 or Google Cloud
10:25storage and another department they have
10:27some data stored in the EMC storage HPE
10:30storage you have another department say
10:33they have their own private cloud storage
10:35another group they want to analyze the data
10:37inside the enterprise at the end of the
10:40day why do people want data people want to
10:42use data to generate value which means
10:45use data to make decisions or facilitate
10:48making decisions the more data you have
10:49and analyze the better result normally
10:52you get so this existing environment is
10:54very hard we're trying to tackle it by
10:58taking memory as a first-class citizen
11:00in a storage system to
11:02have a memory-centric architecture and
11:04beyond that how to manage the data or how to
11:06access the data from different storage
11:09systems in the most effective manner
11:11wait pausing on that for a quick moment
11:12why does the in-memory aspect matter so
11:14one side is of course about performance
11:17memory is much faster
11:19than SSD or HDD and from the other
11:22perspective the cost the cost is
11:25decreasing very fast it's about every 18
11:28months the cost is decreased by 50% so
11:32those two points performance plus the cost
11:35which relates to capacity those two points
11:39together mean now is the right
11:42time to build memory as a tier of
11:46storage I think that machine learning is
11:49the application that unlocks much of the
11:52new in memory systems it's not so I mean
11:54machine learning is the next generation
11:57of big data and what is machine learning
11:58it's iterations over large large large
12:01data sets to come up with better ways to
12:05forecast and better ways to utilize
12:07information the only way so in order to
12:11unlock the power of machine learning in a
12:13time-sensitive fashion is to actually do
12:16these computations in memory because if
12:18you have to what's called go out to
12:22the disk drive to get information or go
12:24out to an SSD the time to seek for that
12:28information and look for it is a huge
12:30penalty when you're dealing with massive
12:32amounts of information to the extent
12:34it's all in memory I can operate on it
12:37very quickly and do many more iterative
12:39sets in a shorter time frame giving the
12:43results that might be needed in a
12:45machine learning or AI environment
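The point about iterating over a data set that already sits in memory can be illustrated with a small sketch. This is only a toy, not anything described by the speakers: the function names `load_into_memory` and `fit_slope`, the synthetic data set, and the one-parameter linear model are all assumptions made for the example. The data is materialized once, and every training pass then reuses the same in-memory list instead of going back out to storage.

```python
from typing import List, Tuple


def load_into_memory(n: int) -> List[Tuple[float, float]]:
    """Hypothetical stand-in for a single read from storage.

    Returns n points on the line y = 3x, held in an ordinary Python list.
    """
    return [(float(x), 3.0 * x) for x in range(1, n + 1)]


def fit_slope(data: List[Tuple[float, float]], passes: int, lr: float) -> float:
    """Fit y = w * x by gradient descent, making `passes` full sweeps over `data`.

    Each sweep touches every record again, which is why it matters that
    `data` is already in memory rather than re-fetched on every pass.
    """
    w = 0.0
    for _ in range(passes):
        # Mean gradient of the squared error (w*x - y)^2 with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


if __name__ == "__main__":
    points = load_into_memory(10)          # one load
    slope = fit_slope(points, 50, 0.01)    # fifty passes over the same list
    print(f"estimated slope: {slope:.6f}")
```

With a real workload the list would be far larger and the model more complex, but the shape is the same: one load, many passes.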
12:47unless we get to in-memory processing
12:49and in-memory data structures machine
12:52learning doesn't really work yeah and so
12:54we have to come up with these ways of
12:57having much more in process high
12:59fidelity storage that is this new tier
13:02and you know this is
13:04exactly what's causing what I would
13:06argue this renaissance to occur
13:09over the next several years here I'll
13:11just add that supercomputing a couple
13:14years ago was really inaccessible to
13:16most people and in a supercomputing
13:18environment every node is highly
13:20networked to every other node because
13:23they needed to communicate state and
13:26information between them when you have
13:27Hadoop and MapReduce they could
13:30partition certain categories of problems
13:33and run them in parallel but not really
13:36a lot of machine learning algorithms
13:38they just don't work that way
13:40they require more of that
13:42interconnectedness and that
13:44communication between nodes and between
13:46memory and data sets and lots of
13:48iterations on the same data so Spark and
13:52in-memory approaches to machine learning
13:54really accelerate the opportunity to
13:58create and apply machine learning
14:01algorithms to just about every facet of
14:03human existence not to overstate the
14:06case but there really is a huge
14:08Renaissance just from that alone coming
14:10so I know this is fascinating in terms
14:13of the evolution of computing but the
14:16question I have is how does this
14:17actually affect people like how does it
14:19change for better or worse their
14:21work and their work practice today if
14:24people want to really see the global
14:26data they move the data manually from
14:29one storage to another storage to put
14:32them together to analyze it even though
14:35the data could be in memory in a final
14:37storage but this manual
14:40process or this process of moving data
14:43around is first of all very hard to
14:45manage secondarily the whole process is
14:48very time consuming it could be easily
14:51like weeks or even longer so that makes
14:54things much harder and data has
14:57less value so that's another huge issue
15:00we're seeing yeah I mean you need a new
15:02abstraction layer
15:04when there's an old world and you have a
15:06new world and you don't need to be stuck
15:07into this old model of how you in this
15:09case store and write data but what
15:11does that mean on the design side and
15:12the interface side for people exactly so
15:14the issue today is that data is really
15:17stored in different data silos there's
15:19so many different types of storage there
15:21are different types of interfaces
15:22making it at the application level very hard to
15:25consume easily that's a big issue think
15:27of virtualization on the compute side we
15:29have a virtualization technology to
15:32really virtualize the compute resource
15:33and to be able to leverage resource more
15:36efficiently also think of Internet
15:38Protocol stack in the middle you have
15:40the IP layer which is really the narrow
15:42waist when you make the innovation in
15:44the upper layer you don't need to worry
15:46about a lower layer so similarly from
15:49the storage ecosystem perspective we
15:52should build a layer to abstract
15:54different storage systems and then
15:57present a unified or standard API to the
16:01upper layer with a global namespace from
16:04the user perspective they will be able
16:07to access the data from different
16:10storage systems very easily
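The layer described here, one standard API and one global namespace over several storage systems with memory as the front tier, can be sketched roughly as follows. This is a minimal illustration and not how any particular product implements it: the class names `UnderStore` and `UnifiedNamespace`, the mount-prefix scheme, and the dictionaries standing in for real backends are all assumptions made for the example.

```python
from typing import Dict, Tuple


class UnderStore:
    """Hypothetical stand-in for one backing system (a cloud bucket, an array, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}
        self.reads = 0  # how many times the slower backend was actually hit

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._objects[key]


class UnifiedNamespace:
    """One read/write API and one path space over several mounted stores."""

    def __init__(self) -> None:
        self._mounts: Dict[str, UnderStore] = {}
        self._memory_tier: Dict[str, bytes] = {}  # in-memory copies of hot data

    def mount(self, prefix: str, store: UnderStore) -> None:
        self._mounts[prefix.rstrip("/")] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        # The longest matching mount prefix decides which backend owns the path.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix], path[len(prefix):].lstrip("/")
        raise FileNotFoundError(path)

    def write(self, path: str, data: bytes) -> None:
        store, key = self._resolve(path)
        store.put(key, data)            # persist in the owning backend
        self._memory_tier[path] = data  # keep a copy in the memory tier

    def read(self, path: str) -> bytes:
        if path in self._memory_tier:   # served from memory, backend untouched
            return self._memory_tier[path]
        store, key = self._resolve(path)
        data = store.get(key)
        self._memory_tier[path] = data
        return data


if __name__ == "__main__":
    cloud, onprem = UnderStore("cloud"), UnderStore("onprem")
    cloud.put("sales/q1.csv", b"region,total")
    ns = UnifiedNamespace()
    ns.mount("/cloud", cloud)
    ns.mount("/onprem", onprem)
    ns.read("/cloud/sales/q1.csv")
    ns.read("/cloud/sales/q1.csv")
    print("backend reads:", cloud.reads)
```

The application only ever sees paths such as `/cloud/...` or `/onprem/...`; which system holds the bytes, and whether a read was served from memory, stays behind the layer.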
16:13and this coupled with the in-memory
16:16technology as well as the smart algorithm
16:19to intelligently move the data will
16:22solve a lot of issues for the
16:25users performance issue cost issue
16:27and also the
16:30unification or the silo issues so that's
16:33also one direction we're trying is
16:35this something we're seeing beyond one
16:36company like what's sort of the broad
16:38industry level view of all this there's
16:40certainly a big need for unifying
16:43storage today most organizations have a
16:46hodgepodge an organically grown set of
16:51storage systems and
16:53applications stretching all the way back
16:54to their mainframes there's an awful lot
16:56of mainframes still out there believe it
16:57or not and when we look at what a
17:00company's real assets are today
17:03oftentimes as I think Peter is pointing
17:05out it's in the data that they have but
17:08the data collectively and in aggregate
17:11in order for you to be able to analyze
17:13it and make predictions out of it you
17:14want a cohesive whole
17:16so solutions that can bring all that
17:18data together in an integrated analysis
17:21and support actually pipelines of
17:25prediction and feedback loops of
17:28analytics and visualization and really
17:30tighten the knot if you will are gonna
17:32be extremely valuable
17:34those that can leverage the existing
17:36infrastructure in the existing data
17:38stores without upsetting the applecart
17:39are the ones that are going to be
17:40adopted first how does it affect the IT
17:42department or the CIO like how should
17:44they think about this shift IT
17:45departments in general are facing a lot
17:48of changes it started with this idea of
17:51cloud and treating their internal
17:53customers as clients
17:56and seeing themselves as service
17:58providers that's a big shift in culture
18:01and approach and temperament with
18:04infrastructure and the idea of being
18:07able to bring machine learning and
18:09really leverage a company's data sets in
18:12a myriad of ways they have to become
18:15experts in that data in those
18:17applications because at the department
18:20level or division level in the business
18:21analyst side you know you're going to find
18:23lots of people who know how to use Microsoft
18:24Excel so IT departments are really gonna
18:27be looked at as the point of the sword
18:30in bringing these advances to the
18:33company and not just being seen as
18:36reactive operators of infrastructure on
18:39the back end and that's a big shock I
18:41mean data is the lifeblood of any
18:43organization and so it doesn't matter
18:46whether you're a web developer or a
18:48consumer products or healthcare
18:49organization fundamentally we are all
18:54becoming data-driven organizations big
18:57data up until right now has really been
19:00a reactive process even querying a
19:02database like I go look at what's
19:04happened in the past and the holy grail
19:07of all computing and the holy grail of
19:09what a CIO ultimately cares about and
19:13what a business cares about is that I
19:15can predict the future
19:16if I know what's gonna happen tomorrow
19:18whether it's inventory healthcare finance
19:22and I know what's gonna happen tomorrow
19:24or next week or next month and I can
19:26accurately predict that that is the holy
19:29grail of computing and I'll even
19:31accelerate that it's not about
19:33predicting tomorrow for a lot of people
19:35it's about predicting what's gonna
19:37happen next in terms of what the user's
19:40gonna click on where does the
19:43car turn you know how does that rocket
19:47sort of thing it's becoming much more of
19:49a real-time proposition we are at the
19:50cusp now of moving from historical to
19:54future prediction and it rides on the back
19:57of recreated storage for in
20:01memory architectures and machine
20:03learning coming together to provide this
20:05very unique and very interesting
20:08capability that quite frankly has not
20:11happened yet in the history of computing
20:14I can't believe this you've just made
20:15storage sexy again so one last note to
20:18wrap up what comes next well I think we
20:20mentioned the storage class memory
20:23that's coming out Intel 3D XPoint
20:25and some other folks making a very fast
20:30close to the chip memory that's actually
20:34storage so it'll be faster again than
20:36flash maybe a little bit slower than
20:38today's DRAM but it's gonna fill in that
20:41gap and that's gonna be another
20:43interesting tier of storage it's gonna
20:44change a lot of the way computing works
20:46and that's coming out already we're
20:49gonna see that in the next couple
20:50quarters I also would introduce in the
20:52longer term something to think about and
20:55that is there are companies today that
20:57should be from a contractual perspective
21:01perhaps locking in their rights to data
21:04up and down their supply chain
21:06especially with the Internet of Things I
21:07think there's gonna be some companies
21:09kind of surprised in a year to find
21:11that they're not gonna have access to
21:13the data that they're gonna really want
21:14to have that's relevant to what they
21:16need to do to make the predictions to
21:18optimize their business and I'm gonna
21:20call it a kind of data poverty or
21:22paucity and we're gonna find companies
21:24that are forward-looking and data rich
21:26and some companies that have suddenly
21:28discovered they're sitting on the
21:30outside a little bit and data poor yeah
21:33that's especially fascinating because we
21:34talked a lot on this podcast about the
21:36role of data in building businesses
21:37like whether it's data network effects
21:39or data and machine learning startups we
21:41just talked a lot about how data is
21:42increasingly an advantage in a lot of
21:44businesses if we can have faster access
21:47to data easier management of data have a
21:49complete view of the data a smarter way
21:52of analyzing the data I think it's a
21:55very exciting moment and many more
21:58innovations will come out along the way
22:00well thank you guys for joining the a