a16z Podcast | The Storage Renaissance

a16z, 2019-01-02

💫 Short Summary

The video discusses the evolving significance of storage in computing, including its complexity, role in data protection, and future innovations. It highlights the shift towards memory-class storage, in-memory processing, and machine learning for faster and more efficient data processing. The importance of unified storage systems, standard APIs, and predictive analytics is emphasized. IT departments are adapting to become service providers focusing on real-time data analysis and future predictions. The future of computing involves new tier storage technologies and securing data rights in the supply chain to drive innovation and business success.

✨ Highlights

📊 Transcript

✦

Storage is crucial to computing: databases, analytics, and information retrieval all depend on it.

01:22

Storage is complex and plays a critical role in data protection and management.

Data has gravity and momentum, requiring a footprint for persistence and security.

The transformative nature of storage is highlighted, suggesting it will revolutionize computing and applications in the near future.

The discussion focuses on the evolving significance of storage in the technological landscape.

✦

Evolution of storage industry with focus on open source and public cloud storage solutions.

03:54

Traditional vendors like EMC and IBM are introducing new storage options to keep up with the changing market.

Future data processing involves handling massive amounts of sensor data, surpassing human-generated data.

Mobile supply chain components are impacting data center storage, leading to more cost-effective solutions and a move towards in-memory data systems.

✦

The future of computing involves memory architecture flattening for fast and cheap data storage.

05:59

This storage renaissance allows data to be stored in memory without the need for disk drives, enabling easier access and processing.

Memory is considered a frontier of storage, with a shift towards memory-class storage and persistent memory.

This shift will revolutionize how computing and storage are envisioned, with data residing in compute devices like IoT devices.

Concerns exist about potential lack of storage capacity in the future, necessitating optimized data management strategies to handle growing data volume.
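
As a rough illustration of the "fast and cheap" trend: the transcript cites a rule of thumb that memory cost falls by about 50% every 18 months. The sketch below only compounds that assumption. The function name `projected_cost` and the starting price of 100 are hypothetical, and real prices will not follow such a smooth curve.

```python
def projected_cost(initial_cost: float, months: float,
                   halving_period_months: float = 18.0) -> float:
    """Cost after `months`, assuming it halves every `halving_period_months`.

    The 18-month default is the speaker's rule of thumb, not a measured value.
    """
    return initial_cost * 0.5 ** (months / halving_period_months)


if __name__ == "__main__":
    # Hypothetical starting price of 100 (arbitrary units).
    for months in (0, 18, 36, 72):
        print(months, projected_cost(100.0, months))
```

Under this assumption the cost drops to a quarter after three years and to one sixteenth after six, which is the kind of curve the speakers have in mind when they talk about memory becoming a practical storage tier.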

✦

The importance of processing data at the edge and the need for data curation close to the compute.

08:57

Managing data complexity from various storage systems in enterprises and the efficient connection and consumption of data.

The focus on using data for value generation and decision-making, emphasizing the challenge of leveraging memory in storage systems.

The significance of in-memory processing for enhancing performance and cost-effectiveness.
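
The idea of curating data at the edge can be sketched as follows: the device reduces a batch of raw sensor readings to a small summary and keeps only the readings that look unusual, and only that curated result would be sent to a central store. This is a minimal sketch under stated assumptions. `CuratedBatch`, `curate_at_edge`, and the simple threshold rule are hypothetical and are not taken from any system named in the podcast.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class CuratedBatch:
    """Small summary that would be transmitted instead of the raw readings."""
    count: int
    mean_value: float
    max_value: float
    anomalies: List[float]


def curate_at_edge(readings: List[float], anomaly_threshold: float) -> CuratedBatch:
    """Summarize raw readings locally; keep only values above the threshold."""
    if not readings:
        raise ValueError("readings must not be empty")
    return CuratedBatch(
        count=len(readings),
        mean_value=mean(readings),
        max_value=max(readings),
        anomalies=[r for r in readings if r > anomaly_threshold],
    )


if __name__ == "__main__":
    batch = curate_at_edge([1.0, 2.0, 3.0, 10.0], anomaly_threshold=5.0)
    print(batch)
```

The raw readings never leave the device in this sketch. Only the count, two aggregates, and the outliers do, which is the trade the speakers describe when they say there is not enough storage to keep everything.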

✦

Importance of machine learning in unlocking new in-memory systems for faster data processing.

12:05

In-memory processing reduces access time to stored information on disks or SSDs, essential for effective machine learning.

Shift towards in-memory data structures is driving a computing renaissance, making supercomputing more accessible.

Spark and in-memory approaches are accelerating development and implementation of machine learning algorithms.

Revolutionizing human existence through technology advancements in various industries.
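
To make the point about iterative workloads concrete, the sketch below runs the same toy gradient descent two ways: re-reading the data set from a slow store on every iteration, and reading it once and iterating in memory. `CountingSource` is a hypothetical stand-in for a disk or SSD that only counts reads. It does not measure real latency, and this is not how Spark is implemented.

```python
from typing import List


class CountingSource:
    """Stand-in for a slow store (disk or SSD); counts how often it is read."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self.reads = 0

    def read_all(self) -> List[float]:
        self.reads += 1
        return list(self._values)


def gradient_step(estimate: float, data: List[float], lr: float) -> float:
    # Gradient of the mean squared error between `estimate` and the data.
    grad = sum(2.0 * (estimate - x) for x in data) / len(data)
    return estimate - lr * grad


def fit_reloading(source: CountingSource, iterations: int, lr: float = 0.1) -> float:
    """Fetch the data from the store again on every iteration."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = gradient_step(estimate, source.read_all(), lr)
    return estimate


def fit_in_memory(source: CountingSource, iterations: int, lr: float = 0.1) -> float:
    """Fetch the data once, then iterate over the in-memory copy."""
    data = source.read_all()
    estimate = 0.0
    for _ in range(iterations):
        estimate = gradient_step(estimate, data, lr)
    return estimate
```

Both functions converge to the same estimate (the mean of the data), but the first touches the store once per iteration and the second only once. With large data sets and many iterations, that difference in store accesses is the penalty the speakers are describing.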

✦

The importance of creating a new abstraction layer for simplifying data storage and retrieval.

15:52

Manual data transfer between storage systems is challenging and time-consuming, reducing data value.

There is a need to unify storage systems and establish a standard API for easier data access.

In-memory technology and smart algorithms can improve performance, cut costs, and tackle silo problems.

Organizations face difficulties with different storage systems, emphasizing the significance of cohesive data analysis and predictive abilities.
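
One way to picture the proposed abstraction layer is a single namespace that routes each path to whichever storage system is mounted under its prefix, so applications see one API regardless of where the data lives. The sketch below is an assumption-laden toy. `StorageBackend`, `InMemoryBackend`, and `UnifiedNamespace` are hypothetical names, the backends are plain dictionaries rather than real cloud or on-premises stores, and this is not the API of Alluxio or any other product.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class StorageBackend(ABC):
    """Minimal interface every underlying storage system must provide."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryBackend(StorageBackend):
    """Dictionary-backed stand-in for a real store (cloud, on-premises, etc.)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data


class UnifiedNamespace:
    """Routes global paths to mounted backends by longest matching prefix."""

    def __init__(self) -> None:
        self._mounts: Dict[str, StorageBackend] = {}

    def mount(self, prefix: str, backend: StorageBackend) -> None:
        self._mounts[prefix.rstrip("/") + "/"] = backend

    def _resolve(self, path: str) -> Tuple[StorageBackend, str]:
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self._mounts[prefix], path[len(prefix):]
        raise KeyError(f"no backend mounted for {path!r}")

    def read(self, path: str) -> bytes:
        backend, relative = self._resolve(path)
        return backend.read(relative)

    def write(self, path: str, data: bytes) -> None:
        backend, relative = self._resolve(path)
        backend.write(relative, data)
```

A caller would mount, say, one backend at `/cloud` and another at `/onprem`, then read `/cloud/sales.csv` and `/onprem/hr.csv` through the same `read` call. The design choice mirrors the "narrow waist" analogy in the transcript: code above the layer does not change when a backend below it does.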

✦

The transformation of IT departments into service providers and experts in data and machine learning.

19:13

Data is becoming increasingly important for organizations, leading to a focus on predictive analytics for future success.

Emphasis is shifting towards real-time data analysis and moving from historical to future predictions.

The combination of storage class memory, in-memory architectures, and machine learning is revolutionizing computing.

Technologies like Intel 3D XPoint are driving rapid advancements in the field.

✦

The future of computing involves a new tier of storage technologies.

21:55

These technologies are expected to be faster than flash, though somewhat slower than today's DRAM, filling the gap between the two.

Companies should prioritize securing data rights throughout their supply chain, particularly with the rise of IoT, to prevent 'data poverty.'

Data is becoming increasingly essential for business success, driving innovation through faster access, easier management, and smarter analysis.

The speaker anticipates significant advancements in data-related technologies and stresses the importance of being forward-looking and data-rich in the evolving business landscape.

00:00hi everyone welcome to the a16z

00:02podcast I am Sonal today's episode is

00:04all about storage with the cost of

00:06system memory decreasing memory for both

00:09storage and compute will be the exact

00:10same thing so as we enter a new era of

00:13distributed computing and what Peter has

00:15also argued in a popular deck is the end

00:16of the cloud

00:17how does storage evolve how is this

00:19affected by trends in computing such as

00:21machine and deep learning joining us to

00:23have this conversation today are H.Y. CEO

00:25and co-founder of Alluxio formerly

00:27Tachyon which came out of the UC Berkeley

00:29AMPLab the birthplace of other industry

00:31defining technologies such as Spark and

00:33Mesos general partner Peter Levine who

00:35has funded memory-centric infrastructure

00:36companies at every level of the Berkeley

00:38Data Analytics Stack the BDAS stack and

00:40Mike Matchett senior analyst at Taneja

00:42Group which covers everything related to

00:44Big Data compute and storage okay so

00:47that's the intros to kick things off I

00:49just have to ask why should we care

00:51about storage I feel like it's the dark

00:52underbelly of computing that no one

00:53really cares about look I mean while

00:55storage may be the underbelly without

00:57storage computers wouldn't work and so

01:01it's one of the most important you know

01:04compute networking and storage are the

01:06three fundamental elements of what makes

01:09the entire Internet work it makes cloud

01:11computing work and without storage you

01:13wouldn't have databases and without

01:14databases you wouldn't have big data

01:16you wouldn't have analytics you wouldn't

01:18have anything because information needs

01:21to be stored and it needs to be

01:22retrieved so storage is hugely hugely

01:24important and you know it the

01:27interesting thing is I think we're in a

01:29very transformative period of time here

01:31where storage is undergoing a bit of a

01:35renaissance and I think it's going to

01:37transform how computing and applications

01:40work in the not-too-distant future when

01:44I got started in storage I thought hey this

01:46is really the staid the the

01:49tried-and-true stuff compute was where

01:51it was at you know there's all these

01:52advantages and advances happening in

01:55client server and then cloud and new

01:57chips coming along every year but the

01:59more I got into storage the more I

02:01figured out that storage is really the

02:03most complex part of that equation it

02:05takes a lot of effort to protect data to

02:08manage data data has gravity it has

02:10momentum

02:12it has weight and history so storage is

02:14really the critical piece to get right

02:16wait what do you mean when you say that

02:18data has gravity and momentum data has

02:20to live somewhere you know compute can

02:23be spun up in a cloud it's a little

02:26ephemeral you can repeat it you can spin

02:28up and down virtual machines but data

02:31actually has to have a footprint

02:32somewhere and that footprint has to be

02:34persisted and protected and secured and

02:37of course in this case made accessible

02:39or the data has no value right but how

02:41is that different than what we have

02:42right now that it requires a new form of

02:44storage I'm gonna use a phrase I say a

02:47lot in the podcast I think it's actually

02:49really true of our times which is when

02:51we say there's a lot more data that's a

02:53difference of degree not just kind why

02:55do we need a different type of solution

02:56why can't we just keep doing the same

02:58things that we were doing before but

02:59just do it bigger and better so I think

03:02this is like many other things in the

03:04world like when you make a cell phone at

03:06the beginning a cell phone just makes a

03:08phone call and now a cell phone is a different

03:10type of cell phone similar thing is also

03:12happening in the storage industry as

03:13well at the very beginning in storage a

03:16block device is the basis nothing but

03:18just bits raw data beyond the block

03:20devices we had file system a different

03:23type of file system now we have blob

03:25storage object storage and in the

03:28meantime we have so much innovation in

03:30open source area as well and you have

03:33public cloud storage from several huge

03:35vendors in the world like Amazon Google

03:38Microsoft Alibaba etc you have different

03:42type of storage solution provided by

03:44traditional vendors like EMC like HPE

03:47IBM they're providing these new innovations I

03:50think that will pale in comparison to

03:54what's going to happen over the next

03:55let's say decade here when we think

03:58about information and data there's a an

04:02entirely new phenomena that's really

04:04just kicked in relative to what is data

04:07what is data that's a very existential

04:09question well up until right now compute

04:13data has largely been input by some

04:15human being typing on a keyboard or a

04:18database recovering a record that's the

04:21input of a human asking the computer

04:23something

04:24data largely has been put in there

04:26through human fingers and through some

04:28human interaction fast-forward to right

04:31now let's just talk about a self-driving

04:33car that has sensors those sensors are

04:36now inputting data that's the world

04:40around us and so there's completely new

04:42types of data so what is data now far

04:46exceeds the human input data we are now

04:49collecting the world's information via

04:52sensors and all of that needs to be

04:54processed and stored it will be

04:56literally orders of magnitude and in the

04:59exact mathematical sense orders of

05:01magnitude more data that needs to be

05:03stored and processed so that's sort of

05:04that's sort of point one on what's

05:08happening secondly a mobile supply chain

05:11is influencing the data center and

05:14influencing the cost curves in the data

05:16center for storage you take a mobile

05:18phone and you take the components of

05:20that mobile phone and put it in the data

05:21center you have a a very inexpensive

05:25storage substrate that is far less

05:28expensive than the enterprise systems

05:29that we saw in the past

05:31and so the cost curves come way down we

05:35will have much more in memory data

05:38systems that literally live in real

05:41memory and the notion of disk drives and

05:44tape drives and even SSDs will all go

05:48away I believe that there's a future

05:50here where memory architecture is

05:53completely flattened computing has been

05:55built on slow cheap and fast and

05:59expensive and I believe that we're gonna

06:02be fast and cheap I mean it sounds like

06:04it'd be obvious but what does fast and

06:05cheap really do for us when it comes to

06:07the so called storage Renaissance fast

06:09and cheap means that we can collect

06:11massive amounts of information put it in

06:14memory not have to put it out to disk

06:17drive and do all these you know

06:18backflips to get data to work correctly

06:20it's going to all be in memory and it

06:23will be very inexpensive and that's the

06:25Renaissance in whether you call it

06:27storage but more importantly data and

06:30the importance of data and the

06:31correlation between the volumes of data

06:35the price curves in in in the

06:38not-too-distant future for what I'll

06:40call storage even though its memory and

06:42those pieces coming together that to me

06:45is the Renaissance that's happening in

06:47computing I totally agree and actually

06:49just add one more point to that is that

06:51we actually should view memory as a

06:54frontier of storage exactly it's a tier

06:57of storage exactly I would argue that it

07:00is the tier of storage that over time

07:02there is no other storage

07:04it's just memory certainly we've been

07:07seeing the rise of memory class storage

07:09already being talked about by vendors

07:11and bringing persistence to memory will

07:15completely overhaul how compute and

07:18storage is envisioned today because

07:20tomorrow data is going to live in the

07:24compute devices those are going to be

07:25more Internet of Things devices and be

07:27far more distributed as well

07:29but I also want to temper that with the

07:31thought that we've also talked to a lot

07:33of these big storage vendors and they're

07:36forecasting that there just simply isn't

07:38going to be enough storage for all the

07:40data we're collecting in a midterm

07:42horizon like three to five years that

07:44we're creating so much data there won't

07:47be enough chips there won't be enough

07:48hard drives there won't be enough tape

07:50there simply isn't going to be enough

07:51storage out in the world which I do want

07:54to point out means that there's still

07:55some opportunities for things in storage

07:57management people to consider how many

07:59copies of that data am I making do I

08:01have to take the compute the processes

08:03out to where the data lives do I have to

08:06bring the data centralized and make

08:08copies of it or can I do something more

08:10optimized with how I organize my

08:13architecture and only store the data

08:15once only compute data once in one place

08:17to draw a fine point here we are we are

08:20entering a new world of distributed

08:22computing and if you think about the new

08:25world of distributed computing the data

08:27that gets collected in a in a

08:29self-driving car or some endpoint is

08:31going to be prot the information will be

08:34processed at that endpoint it won't be

08:36transmitted back to a central storage

08:38pool the information will be curated and

08:41then transmitted back or we'll be

08:43collecting massive amounts of

08:45information

08:46self-driving car collects 10 gigabytes

08:48of data a mile right like some

08:51ridiculous amount of data you know

08:53there's not enough storage on the planet

08:55to ever hold all that information so the

08:57curation is going to occur at the edge

09:00close to the compute and the quote

09:03unquote storage will be processed at the

09:06edge and then important information will

09:09come back to some centralized data store

09:12but all that computation at the edge in

09:13the storage of it and kind of the the

09:16permutations of it is exactly the

09:18Renaissance that I believe needs to

09:21happen in storage even to process this

09:23stuff so another huge issue here is that

09:25it makes the ecosystem much more complex

09:29than before all the big enterprise

09:31companies it will try different

09:33innovations they will have their

09:35existing storage and past storage new

09:38storage formed very complex systems and

09:41makes this hard to manage how to consume

09:44and in many cases is not cost-effective

09:47as well and this is one thing we're

09:51seeing requested by the customers many

09:54big enterprise in the world is that how

09:56to connect and consume these data from

09:59different storage systems easily and

10:01manage them efficiently so this just

10:04because they have a hodgepodge of like

10:05all these different storage systems or

10:08is it that it's just buried in the same

10:09place but under a bunch of different

10:10interfaces and tools or like what's the

10:12problem really the data is stored in

10:14different storage systems just give you

10:16a very concrete example if you talk to

10:18this department their data is stored in

10:20you know public cloud storage maybe in

10:22probably Amazon s3 or Google Cloud

10:25storage and another department they have

10:27some data stored in the EMC storage HPE

10:30storage you have another department say

10:33they have my own private cloud storage

10:35another group they want analyze the data

10:37inside the enterprise in the end of the

10:40day why people want data people want to

10:42use data to generate values which means

10:45use data to make decision or facilitate

10:48making decisions the more data you have

10:49you've analyzed the best result normally

10:52you get so this existing environment is

10:54very hard we're trying to tackle or

10:57leverage

10:58taking memory as a first-class citizen

11:00in a storage in a storage system to

11:02have a memory-centric architecture and

11:04viana how to manage the data or how to

11:06access the data from different storage

11:09systems in the most effective manner

11:11wait pausing on that for a quick moment

11:12why does the in-memory aspect matter so

11:14one side is of course about performance

11:17in terms of performance memory is much faster

11:19than SSD or HDD and from the other

11:22perspective it's about cost the cost is

11:25decreasing very fast it's about every 18

11:28months the cost is decreased by 50% so

11:32those 2 points performance plus the cost

11:35which relates to capacity those 2 points

11:39together mean that now is the right

11:42time to build memory as a tier of

11:46storage I think that machine learning is

11:49the application that unlocks much of the

11:52new in memory systems it's not so I mean

11:54machine learning is the next generation

11:57of big data and what is machine learning

11:58it's iterations over large large large

12:01data sets to come up with better ways to

12:05forecast and better ways to utilize

12:07information the only way so in order to

12:11unlock the power of machine learning in a

12:13time-sensitive fashion is to actually do

12:16these computations in memory because if

12:18you have to what's called go go out to

12:22the disk drive to get information or go

12:24out to an SSD the time to seek for that

12:28information and look for it is a huge

12:30penalty when you're dealing with massive

12:32amounts of information to the extent

12:34it's all in memory I can operate on it

12:37very quickly and do many more iterative

12:39sets in a shorter time frame giving the

12:43results that might be needed in a

12:45machine learning or AI environment

12:47unless we get to in-memory processing

12:49and in-memory data structures machine

12:52learning doesn't really work yeah and so

12:54we have to come up with these ways of

12:57having much more in process high

12:59fidelity storage that is this new tier

13:02and this exact is you know this is

13:04exactly what's causing what I would

13:06argue this renaissance to occur

13:09over the next several years here I'll

13:11just add that supercomputing a couple

13:14years ago was really inaccessible to

13:16most people and in a supercomputing

13:18environment every node is highly

13:20networked to every other node because

13:23they needed to communicate State and

13:26information between them when you have

13:27Hadoop and MapReduce they could

13:30partition certain categories of problems

13:33and run them in parallel but not really

13:36a lot of machine learning algorithms

13:38they just didn't don't work that way

13:40they require more of that

13:42interconnectedness and that

13:44communication between nodes and between

13:46memory and data sets are lots of

13:48iterations on the same data so Spark and

13:52in-memory approaches to machine learning

13:54really accelerate the opportunity to

13:58create and apply machine learning

14:01algorithms to just about every facet of

14:03human existence not to overstate the

14:06case but there really is a huge

14:08Renaissance just from that alone coming

14:10so I find this fascinating in terms

14:13of the evolution of computing but the

14:16question I have is how does this

14:17actually affect people like how does it

14:19change for better or worse their their

14:21work or their work practice today if

14:24people want to really see the global

14:26data they move the data manually from

14:29one storage to another storage to put

14:32them together to analyze it even though

14:35the data could be in memory in a final

14:37storage but because of this manual

14:40process or this process of moving data

14:43around is first of all it's very hard to

14:45manage secondarily the whole process is

14:48very time consuming it could be easily

14:51like weeks or even longer so that makes

14:54things much harder and data has

14:57less value so that's another huge issue

15:00we're seeing yeah I mean you need a new

15:02abstraction layer when there's a whole

15:04when there's an old world and you have a

15:06new world and you don't need to be stuck

15:07into this old model of how you in this

15:09case store and read and write data but what

15:11does that mean on the design side and

15:12the interface side for people exactly so

15:14the issue today is that data is really

15:17stored in different data silos there's

15:19so many different types of storage there

15:21are different types of interfaces

15:22which makes it at the application level very hard to

15:25consume easily that's a big issue think

15:27of virtualization on the compute side we

15:29have a virtualization technology to

15:32really virtualize the compute resource

15:33and to be able to leverage resource more

15:36efficiently also think of Internet

15:38Protocol stack in the middle you have

15:40the IP layer which is really the narrow

15:42waist when you make the innovation in

15:44the upper layer you don't need to worry

15:46about a lower layer so similarly from

15:49the storage ecosystem perspective we

15:52should build a layer to abstract

15:54different storage systems and then

15:57present a unified or standard API to the

16:01upper layer with a global namespace from

16:04the user perspective they will be able

16:07to access the data from different

16:10storage systems very easily very easily

16:13and this coupled with the in-memory

16:16technology as well as the smart algorithm

16:19to intelligently move the data will

16:22solve a lot of issues for the for the

16:25users performance issue cost issue

16:27performance cost issue and also the

16:30unification or the silo issues so that's

16:33also one direction we're trying is

16:35this something we're seeing beyond one

16:36company like what's sort of the broad

16:38industry level view of all this there's

16:40certainly a big need for unifying

16:43storage today most organizations have a

16:46hodgepodge a a organically-grown set of

16:51state of storage systems and

16:53applications stretching all the way back

16:54to their mainframes there's an awful lot

16:56of mainframes still out there believe it

16:57or not and when we look at what a

17:00company's real assets are today

17:03oftentimes as I think Peter is pointing

17:05out it's in the data that they have but

17:08the data collectively and in aggregate

17:11in order for you to be able to analyze

17:13it and make predictions out of it you

17:14want a cohesive whole

17:16so solutions that can bring all that

17:18data together in an integrated analysis

17:21and and support actually uh pipelines of

17:25prediction and feedback loops of

17:28analytics and visualization and really

17:30tighten the knot if you will are gonna

17:32be extremely valuable

17:34those that can leverage the existing

17:36infrastructure in the existing data

17:38stores without upsetting the applecart

17:39are the ones that are going to be

17:40adopted for how does it affect the IT

17:42department or the CIO like how should

17:44they think about this shift IT

17:45departments in general are facing a lot

17:48of changes it started with this idea of

17:51cloud and treating their internal

17:53customers as service provider as clients

17:56and seeing themselves as service

17:58providers that's a big shift in culture

18:01and approach and temperament with

18:04infrastructure and the idea of being

18:07able to bring machine learning and

18:09really leverage a company's data sets in

18:12a myriad of ways they have to become

18:15experts in that data in those

18:17applications because at the department

18:20level or division level in the business

18:21analyst side you know you're going to find

18:23lots of people who know how to use Microsoft

18:24Excel so IT departments are really gonna

18:27be looked at as the point of the sword

18:30in bringing these advances to the

18:33company and not just being seen as

18:36reactive operators of infrastructure on

18:39the back end and that's a big shock I

18:41mean data is the lifeblood of any

18:43organization and so it doesn't matter

18:46whether you're web developer or

18:48consumer products or healthcare

18:49organization fundamentally we are all

18:54becoming data-driven organizations big

18:57data up until right now has really been

19:00a reactive process even querying a

19:02database like I go look at what's

19:04happened in the past and the holy grail

19:07of all computing and the holy grail of

19:09what a CIO ultimately cares about and

19:13what a business cares about is that I

19:15can predict the future

19:16if I know what's gonna happen tomorrow

19:18whether its inventory healthcare finance

19:22and I know what's gonna happen tomorrow

19:24or next week or next month and I can

19:26accurately predict that that is the holy

19:29grail of computing and I'll even

19:31accelerate that it's it's not about

19:33predicting tomorrow for a lot of people

19:35it's about predicting what's gonna

19:37happen next in terms of what the user's

19:40gonna click on where do I where's the

19:43car turn you know how does that rocket

19:46land on its legs

19:47sort of thing it's becoming much more of

19:49a real-time proposition we are at the

19:50cusp now of moving from historical to

19:54future prediction and rides on the back

19:57of oaring recreated storage for in

20:01memory architectures and machine

20:03learning coming together to provide this

20:05very unique and very interesting

20:08capability that quite frankly has not

20:11happened yet in the history of computing

20:14I can't believe this you've just made

20:15storage sexy again so one last note to

20:18wrap up what comes next well I think we

20:20mentioned the storage class memory

20:23that's coming out Intel 3D XPoint

20:25and some other folks making a very fast

20:30close to the chip memory that's actually

20:34storage so it'll be faster again than

20:36flash maybe a little bit slower than

20:38today's DRAM but it's gonna fill in that

20:41gap and that's gonna be another

20:43interesting tier of storage it's gonna

20:44change a lot of the way computing works

20:46and that's coming out already we're

20:49gonna see that in the next couple

20:50quarters I also would introduce in the

20:52longer term something to think about and

20:55that is there are companies today that

20:57should be from a contractual perspective

21:01perhaps locking in their rights to data

21:04up and down their supply chain

21:06especially with the Internet of Things I

21:07think there's gonna be some companies

21:09kind of surprised in a year to find

21:11that they're not gonna have access to

21:13the data that they're gonna really want

21:14to have that's relevant to what they

21:16need to do to make the predictions to

21:18optimize their business and I'm gonna

21:20call it a kind of data poverty or

21:22paucity and we're gonna find companies

21:24that are forward-looking and data rich

21:26and some companies that have suddenly

21:28discovered they're sitting on the

21:30outside a little bit and data poor yeah

21:33that's especially fascinating because we

21:34talked a lot on this podcast about the

21:36role of data and building businesses

21:37like whether it's data network effects

21:39or data and machine learning startups we

21:41just talked a lot about how data is

21:42increasingly an advantage in a lot of

21:44businesses if we can have faster access

21:47to data easier management of data have a

21:49complete view of the data a smarter way

21:52of analyzing the data I think it's a

21:55very exciting moment and many more

21:58innovation will come out along the way

22:00okay

22:00well thank you guys for joining the

22:02a16z podcast
