00:00good afternoon everybody this is Steven
00:02Sinofsky here with the a16z podcast
00:05very excited today to have Benjamin
00:08Hindman of Mesosphere here today and
00:10we're gonna talk about a new concept
00:13that the company's coming out with
00:14called the data center operating system
00:18you know today you know you know apps
00:20they span servers there are things like
00:23Kafka and Spark and MapReduce and Cassandra
00:26and it's super super complex to roll out
00:29these these huge systems in fact the
00:31real challenge of just allocating
00:33resources and figuring things out
00:34reminds me personally of the very early
00:37days of computing when when programmers
00:40were responsible for allocating the
00:42resources of a machine you know if you
00:44wanted a file you sort of wrote your
00:46own file system if you wanted a process
00:48you had to figure out which part of the
00:50CPU to save and store and load and you
00:53know great programmers back in those
00:55days which really weren't as long ago as
00:56people seem to think knew how to squeeze
00:59the most out of a computer by being able
01:02to manually allocate resources you know
01:04my old boss Bill Gates was famous for
01:07how many things he could squeeze into an
01:088-bit byte of BASIC you know over the
01:11weekend and it was very very important
01:14back then to do that and the problem
01:16was if you were really good at it your
01:18code became completely unmanageable and
01:20hard to deal with and that turns out to
01:23be a little bit of what's going on today
01:25in the data center except I think
01:27it's a little bit of the opposite today
01:30you know an enterprise with a big data
01:32center is taking the opposite approach
01:33which is let's just keep buying more and
01:35more resources and use them for special
01:38purposes so I don't have to think hard
01:39about packing more bits into a byte so
01:42to speak and so there's more servers and
01:44more complexity and more VMs you know
01:46you're in this world of like it's
01:47basically one app per server one app per
01:50VM and you know what the problem is
01:52that's simpler but not simple to manage
01:54but it leads to this unbelievable waste
01:57and waste in a data center is is a big
02:00mess eighty-five percent of the
02:02resources go unused and and I think to
02:05me that's where where the data center
02:07operating system really comes in and so
02:12what I want to do is just sort of talk about this well
02:15we call it the DCOS which is weird cuz
02:17data center is one word so it really
02:19should be DOS but that would take this
02:21podcast to a whole different level and I
02:23don't think you know if we think about
02:26traditional operating system as
02:27allocating the CPU and the memory and
02:30the disk in the network all for a single
02:32computer what is the Ben what's
02:36the DCOS yeah yeah great
02:38so so the data center operating system
02:41consists of a bunch of components and
02:44when you really think about an operating
02:45system it itself consists of a bunch of
02:47components in fact operating systems
02:49have evolved over the years we've had
02:51you know monolithic operating systems
02:53microservice-based
02:56operating systems and what we've really
03:00done with Mesosphere's data center
03:03operating system is we've created
03:04something that's more like a microkernel
03:07like operating system where at the core
03:10of it is this open source project that
03:11we have called Apache Mesos and it's
03:14what's being used at a bunch of
03:15companies like Twitter and Airbnb
03:17and other companies to actually run
03:19their infrastructure and then a lot
03:21of the other what we're calling data
03:22center services which are these
03:24software frameworks which run on
03:26top of Mesos and can take
03:28advantage of Mesos to actually
03:29execute the computations that you want
03:31to do things like Kafka and HDFS and
03:34Hadoop and Cassandra and those
03:38components really make up the core parts
03:40of what makes up the data center operating
03:43system so you can really think about the
03:44base level is the kernel which
03:46is Mesos it's just like a kernel in an
03:48operating system and then things like
03:50storage something like HDFS which
03:53leverages the kernel Mesos to
03:55actually provide its storage and then
03:58these these data center services that I
04:00mentioned and and then a really really
04:03key one for us is what we call our
04:05distributed init.d and that's a
04:09Linux and not Windows that's true yeah
04:16and for our distributed init.d
04:20what we ship is something called
04:21Marathon but there's alternatives to
04:24that just like in fact in the Linux
04:25world today there's alternatives to init.d you
04:27have systemd you've got a bunch of
04:29different init.d alternatives kind of
04:31like on your operating system
04:34today you have many alternative browsers
04:36Internet Explorer Chrome Firefox but we
04:41use Marathon and that's kind
04:45of the core of how
04:46you end up running a lot of your tasks
04:47because that's your init
04:48system where you describe all your tasks
04:50and then of course to interact with your
04:54operating system you need some kind of
04:55interface what about going back just
04:57before we jump into the interface tell
04:58me like you know when I think about what
05:00an operating system needs to do one of
05:02the things that it needs to do is it needs
05:03to like schedule things so is there a scheduler
05:05yeah yeah so the kernel Mesos the
05:10core primitives that it really provides
05:11are task management or process management
05:14but task management resource allocation
05:16resource isolation the things the things
05:18you'd expect to get from something that
05:21needs to run multi-tenant lots of
05:23applications at the same time
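A minimal sketch of the primitives just listed (task management and resource allocation) may help make them concrete. This is a toy model, not the Mesos API: `Agent`, `ToyMaster`, `launch`, and `task_failed` are hypothetical names chosen for illustration, resources are reduced to CPUs and memory, and resource isolation is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One machine's free resources (hypothetical model, not the Mesos API)."""
    name: str
    cpus: float
    mem_mb: int
    tasks: dict = field(default_factory=dict)  # task name -> (cpus, mem_mb)


class ToyMaster:
    """Tracks agents and places each task on the first agent with room."""

    def __init__(self, agents):
        self.agents = list(agents)

    def launch(self, task, cpus, mem_mb):
        # Resource allocation: find an agent whose free resources cover the request.
        for agent in self.agents:
            if agent.cpus >= cpus and agent.mem_mb >= mem_mb:
                agent.cpus -= cpus
                agent.mem_mb -= mem_mb
                agent.tasks[task] = (cpus, mem_mb)
                return agent.name
        return None  # nothing fits right now

    def task_failed(self, task):
        # Task management: release the failed task's resources, then relaunch it.
        for agent in self.agents:
            if task in agent.tasks:
                cpus, mem_mb = agent.tasks.pop(task)
                agent.cpus += cpus
                agent.mem_mb += mem_mb
                return self.launch(task, cpus, mem_mb)
        return None  # unknown task


master = ToyMaster([Agent("host-a", cpus=4, mem_mb=8192),
                    Agent("host-b", cpus=8, mem_mb=16384)])
placed_web = master.launch("web", cpus=2, mem_mb=2048)  # first agent has room
placed_db = master.launch("db", cpus=4, mem_mb=4096)    # host-a has only 2 CPUs left
print(placed_web, placed_db)
```

The caller never names a machine; it only states what the task needs, which is the point of treating the data center as one pool of resources.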
05:24what's the difference between a task then and a process
05:27so we chose tasks because we didn't want
05:30to overload the the process nomenclature
05:34and a task is just it's the entity that
05:36we use to describe something that we
05:38have launched on some host in the data
05:41center and so it could be a process it
05:43could be a collection of processes but
05:45it's the thing it's the unit that we use
05:47to actually schedule it's the thing that
05:48consumes resources really at the end of
05:50the day so so I have a conceptual
05:53understanding of like a level of
05:55services but how does it actually work
05:57like how do I get all of this
05:59onto a machine onto a data center
06:01what's the mechanism by which
06:03everybody's connected up yeah yeah so
06:05what's the bus yeah yeah yeah so the
06:09way it works is that using one of these
06:11data center services we talked about
06:12that make up the entire operating
06:14system something like Marathon
06:16for running your tasks you would
06:19interface through Marathon you would
06:20ask Marathon you'd say hey Marathon
06:21launch this task just like you would
06:23tell init.d on Linux hey run this task
06:25when you boot up and then what
06:28it does is it uses what we really think
06:31of as kind of a system call interface
06:33into Mesos to get resources allocated to
06:35it and then launch a task so it says
06:38to Mesos hey I'd like to run this I'd
06:40like to run a task I need these
06:41resources it gets the resources allocated
06:43to it and then it launches the task and
06:45then Mesos at that point takes
06:46care of making sure that it gets
06:48the task to the right machine the right
06:50host launches the task monitors it
06:53isolates it when it fails it tells the
06:55system that it's failed so it can either
06:56be relaunched or whatever needs to
06:58happen and so that the communication
07:00really is between one of these data
07:02center services like Marathon that's
07:04running on top of Mesos and
07:06Mesos which is really providing kind of
07:07this system call API and
07:10when you think about it this is one of
07:13the interesting things about Mesos
07:14itself it really is much more like a
07:17kernel anyway you know if you download
07:19Mesos by itself today there's not really
07:22much you can do with it just like if
07:23you downloaded the Linux kernel by
07:24itself today right great now I got the
07:26kernel what do I do I'm not gonna program
07:27code which is gonna do interrupt 80
07:29to you know do a system call you're
07:32going to use something at a higher level
07:33you're gonna use say bash at a higher
07:35level to launch tasks you're gonna use
07:38some kind of window manager at a higher
07:39level and that's exactly what something
07:41like Marathon is providing on top of
07:43Mesos today so first how do all of
07:46the you know I think of data center I
07:48think of a rack and I think of all these
07:50boxes how does Mesos know that
07:54the boxes are part of its resource pool
07:56yeah what connects them all yeah so on
07:59each individual machine we run an agent
08:01process and and so that process could
08:04could be launched either via a system
08:06image that you would use one of our
08:07system images or if you are using some
08:10more traditional configuration
08:11management software you could use that
08:13to set up all your
08:15individual machines physical or virtual
08:17and then they all communicate back
08:20through the Mesos master as we call
08:22it sort of the brain of Mesos which
08:26is responsible for managing all these
08:28machines that have connected through
08:29their agents and then the bus is
08:32basically between those machines and the
08:34Masters themselves cool so then so now
08:37I'm sitting in front of the machine yeah
08:41or of the cluster or whatever and
08:44how do I know I'm running it like
08:46you mentioned command line so like
08:48I'm sort of in my head I have this
08:50now the data center is now like one
08:52big computer yep and so when I want to
08:54tell it to do something yep what do I do
08:57yeah so so the interface really the
09:00first interface that we've provided is a
09:01command-line interface and so we did
09:04this for a bunch of reasons
09:05so not a card reader okay oh yeah we
09:10made it pluggable we can make that
09:13interface as well but no it made a
09:15lot of sense for us to actually make
09:16this be really the first interface
09:18to the DCOS
09:21and so what you can do is you can
09:23actually type from a terminal
09:25you can type dcos space and then one of
09:28these data center services that I
09:29was mentioning something like Marathon
09:30you can say marathon and then you can
09:31give it some information to run a task
09:33you can say like dcos marathon run and
09:36then the command you want to run and
09:37maybe some extra flag information to
09:39describe how it gets its artifacts
09:40you know its resources to run
09:42and then you do that and it starts
09:44running and so of course what does that
09:46mean it starts running well it could
09:48mean that you could go to some web
09:49browser if the task that you launched
09:50happened to be a web server but of course
09:54you can also do something with the CLI like
09:56ps so you can actually see all the
09:58processes that are running all the tasks
10:00you have running so all the processes
10:02or all the tasks all the tasks yes
10:03right tasks you've immediately like I'm
10:05kind of done with processes yeah so now
10:07I'm looking at a task that might be spanning
10:10resources yes yeah so in the CLI
10:13today what we have is just all the
10:15tasks but as we evolve the CLI we'll be
10:18able to drill down so you can see for
10:20these tasks what processes represent
10:22those tasks for those processes what
10:24threads are in those processes so you'll get
10:27to see all the resources that are actually
10:28being consumed that's fine because after all
10:30even even in the best cases of single
10:33machine computing at some point for
10:35diagnostics or performance or something
10:37you're gonna actually have to know how
10:40things are done my staff so the fact
10:41that you're using these abstractions
10:43doesn't prohibit a DevOps person from
10:46really knowing what's going on that's
10:48yep and that's that's just the same
10:50today where you know if you just type ps
10:52on say a Linux box you just see
10:54the processes but if you want you can
10:55really dive in and you can say show me
10:57all the threads for those
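The drill-down described here (tasks, then the processes behind a task, then the threads in each process) can be pictured as a nested data model. The sketch below is only an illustration under assumed names: `Task`, `Process`, `describe`, and the `depth` parameter are hypothetical and do not reflect the actual DCOS CLI or its output format.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    rss_mb: int                                    # memory in use, in MB
    thread_ids: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    host: str
    processes: list = field(default_factory=list)

    def memory_mb(self):
        # A task's footprint is the sum over the processes that back it.
        return sum(p.rss_mb for p in self.processes)


def describe(task, depth=0):
    """List a task; depth 1 adds its processes, depth 2 adds their threads."""
    lines = [f"{task.name} on {task.host} mem={task.memory_mb()}MB"]
    if depth >= 1:
        for p in task.processes:
            lines.append(f"  pid {p.pid} rss={p.rss_mb}MB")
            if depth >= 2:
                lines.extend(f"    thread {t}" for t in p.thread_ids)
    return lines


# Made-up example data: one task backed by two processes.
web = Task("web", "host-a", [Process(101, 300, [101, 102]),
                             Process(202, 200, [202])])
print("\n".join(describe(web, depth=2)))
```

The default view stays at the task level, and the extra detail is there only when someone asks for it, which mirrors the ps-versus-threads comparison made in the conversation.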
10:58so okay so you sort of described how I
11:02get something going like do I
11:05install software on it what do I
11:08think of like where do
11:09the tasks come from yeah yeah so once
11:12the Mesosphere DCOS
11:15software's really installed everywhere
11:16and you want to run other tasks we have
11:20built a repository a registry like
11:23system that allows you to to describe a
11:26task and just kind of like Homebrew or
11:28like the the package managers out there
11:31you can say hey I want to install one of
11:33these one of these frameworks one of
11:34these services you can do that it'll
11:36pull down from a repository the
11:38necessary bits of information you can
11:40have it either get installed on the
11:41distributed file system you might have
11:43running something like HDFS or Ceph
11:45which again is something that's
11:47running on top of the DCOS and so
11:52you know you can point to where it is
11:53and then you can say hey my init.d you
11:55know my service scheduler go ahead and
11:57now run this service pull it from this
11:58location so you have the bits and go
12:00from there just so folks can have a
12:01clearer view like give me what are some
12:03specific examples of services that
12:05you're that come to mind or tasks that
12:07you would yeah I would really think yeah
12:08so making it really concrete yeah yeah
12:11yeah so at a company like Twitter which
12:14is a big user of Mesos they've
12:17basically decomposed their architecture
12:19from this monolithic architecture into a
12:21bunch of small services and each of
12:23those individual apps each of those
12:24individual services which is say when a
12:27tweet comes in it's sending out a you
12:30know post to an SMS or it's you know
12:36hydrating the tweet for other people's
12:38timelines so that other people
12:39can see that this tweet has come in
12:40because it should show up each of these
12:42individual services would be the kind of
12:44task and app that you might want to run
12:45and so you could just say to the DCOS
12:47hey I want to run this
12:49application I don't care you know where
12:51I want to run it just here's the
12:52information here's the binary it needs to
12:53run go you're a big computer run this some
12:56really big computer so so one of the
12:59things that jumps to mind is is that you
13:01know when I think of an OS I think not
13:03just of like the resource management but
13:06conceptual models for really important
13:08things like one that jumps to mind is
13:11any time you start telling me like oh
13:13by the way code's running anywhere I
13:15start to worry like well if code is
13:17anywhere and I don't know where it is
13:19doesn't that make me vulnerable in
13:20places that I'm not predicting great so
13:22tell me a little bit about how like
13:24something like isolation or how I think
13:25of security in a DCOS model yeah so you
13:28know I think this is a really
13:30interesting topic because what tends to
13:32happen in a lot of these organizations when
13:33there isn't some centralized way and
13:35people are thinking about how they want
13:36to do resource management and run their
13:38applications is you get a bunch of
13:42disaggregated you know everyone's doing
13:44it slightly differently yeah so often
13:46times you have worse security because
13:48you know rather than a security team
13:49being able to audit just the one way in
13:51which everything gets to run they have
13:52to audit a whole bunch of different
13:54processes and some people do it a little
13:55bit differently and then the worst part
13:57about that is they can't compose right
13:59and this to me is is one of the the
14:01fundamental issues I have with a lot of
14:03distributed systems is because people
14:06are building distributed systems in such
14:07a personalized way that are
14:10personalized for their organization or
14:11their company you can't you can't easily
14:13build a distributed system in one
14:14organization and move it to another
14:16organization and right and security is a
14:18perfect example that you know one
14:19organization uses LDAP so the first way
14:21that they build it in is it hooks into
14:23LDAP and it's so ingrained that they're
14:24gonna do LDAP and another organization
14:26doesn't use LDAP they use some other
14:28mechanisms of authentication or identity
14:32yes like you always see this
14:34like when you have a
14:36big giant web presence you have the
14:38company that operates the web server
14:39part yeah and then they went and did
14:41analytics and a completely different
14:43sort of stack yeah and they're figuring
14:45out how to get the access to the logs to
14:47do the analysis yeah
14:48and then no one can
14:50audit both exactly exactly so I mean
14:53this is one of the biggest drivers of
14:55why we're trying to build why
14:57we're building a data center operating
14:59system is because I think at the end of the day
15:00somebody should be able to build an
15:01application against the primitives like
15:03security primitives that could be
15:05provided by a data center
15:06operating system and go and run it in
15:08another organization because it's just
15:11an app that you built and you know it
15:13was very interesting at the beginning of
15:14the podcast when you were talking about
15:15the people that wrote the you know the
15:17hardcore applications that's the
15:20case with distributed systems today so
15:21you know we used to have to have a PhD to
15:23write a distributed system
15:25many PhDs came about showing you how to
15:27write distributed systems and they went and built
15:29them yeah that's right but
15:31we're at the point now where everyone is
15:33basically building a distributed system
15:35they don't all have PhDs and we want to
15:37be able to build those distribute
15:38systems in one organization run them in
15:40another organization I'm going to do
15:42that in a really really efficient manner
15:44and and and and that like security is a
15:45perfect example of something that if we
15:47can provide the interfaces for doing
15:50security in our distributed systems and
15:52people can build against those
15:53interfaces then we can easily move our
15:55applications across organizations
15:58so building on the applications
16:00part like one of the the things that
16:02obviously has a huge amount of attention
16:04and excitement right now whether it's
16:06from Docker or CoreOS is just the
16:08notion of containers yeah so in
16:09listening to you I'm sort of trying to
16:11parse in my head like do I no longer
16:14need containers are you gonna provide a
16:17container that I have to use am I gonna
16:19be able to use containers that I've
16:21where do containers fit in on your
16:23stack yeah that's a great question so um
16:27so Mesos has used containerization
16:30technologies what we've used to underpin
16:34has used containerization technologies
16:36for a long time since 2009 in fact in
16:392009 we even had Solaris Zones support
16:41so we had containerization technologies
16:43even outside
16:45of Linux and we've provided that
16:47containerization technology and will
16:49continue to do so so when people
16:51have used the existing
16:53containerization technology
16:55to build new things like Docker on top
16:57that's been something that we've been
16:58able to integrate with very very easily
17:00so if you're creating Docker images this
17:02is a fantastic thing you can give us you
17:04can give it directly to us we can launch
17:06those
17:07Docker images directly using our
17:08containerization technology and as this
17:10stuff evolves as other companies
17:12introduce new image like formats to
17:15describe the bits you need to run your
17:18containers again this is just going to
17:19be something that we can plug in to our
17:23data center operating system you just
17:24give us bits and we'll run those bits
17:26and if those bits happen to be a
17:28Docker image or a Rocket
17:33App Container specification we'll take
17:35those things and we can actually run
17:36them and so but the benefit is of course
17:38so first you can go create your
17:39container however you want to go create
17:41it and then the neat thing is you're
17:43able to deploy it in a
17:45distributed way like
17:46where you're scaling in a highly
17:48efficient way without really realizing
17:49it yeah and when there are failures they
17:51get rescheduled and when
17:53we want to do even smarter things like
17:55oversubscription because we
17:57want to move that 85 percent unused
17:59resources to say 10% unused resources we
18:03can start to do all that just like an
18:05operating system does for you under the
18:06covers today on say your laptop and you
18:09just gave us you know the binary that we
18:10need to run whether it's a container
18:12image or whether it's a
18:14you know some real binary well so
18:16I want to take a step back because to
18:19me this is what's so fascinating is that
18:21that what you're really doing is just
18:22changing what I view as the abstractions
18:24of an operating system and you're
18:26basically directly or by
18:28implication saying wow you know the
18:30abstractions that people deal with like
18:32the notion of having a virtual machine
18:33is just completely wrong and that we
18:35really need a new set of abstractions
18:38and to me what this feels like is is
18:40when virtual memory came out the
18:42abstraction just blew your mind
18:44because you went from like I literally
18:46personally went from like figuring out
18:48where to put stuff in 640 K of memory to
18:51having having two gigabytes of memory
18:53yeah and and not only that but the
18:55address space was linear so I actually
18:56got to just you know just not worry
18:58about where it went whereas I I spent
19:00the first two years of my career like
19:02swap tuning code so I knew exactly where
19:04in memory it was gonna be and so it
19:06seems crazy to think like that like cuz
19:10aren't a bunch of hardcore people just
19:11gonna say no the problem is if I have a
19:13whole datacenter I'm gonna be better at
19:15organizing what goes where than some
19:17piece of software that doesn't know the
19:18loads the resource needs and
19:21why would the DCOS know better than me
19:24I'm a smart PhD yeah no no I I think I
19:29think that's exactly right I think that
19:31that what we're doing is is we're doing
19:34exactly what virtual memory did for for
19:37existing operating systems which is
19:38providing the abstractions so that we
19:41can really really effectively do the
19:42resource management the scheduling dealing
19:44with the failures and I think just like
19:47what you saw in virtual memory there
19:48will probably be a lot of people who
19:50believe that they can do it better but
19:51time's going to show that actually we can
19:53start to do far more sophisticated
19:54things and we will be able to do far
19:57better scheduling for utilization for
19:59meeting SLAs for serving the customers
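The utilization argument can be made concrete with a small calculation. The sketch below compares the one-app-per-server approach described earlier in the conversation with a simple first-fit-decreasing packing. The packing heuristic is a swapped-in stand-in for illustration, not how Mesos actually schedules, and the demand numbers, the 8-core capacity, and the function names are all made-up assumptions. It also assumes no single app needs more than one server's capacity.

```python
def servers_first_fit(demands, capacity):
    """Count servers used when apps share machines (first-fit decreasing)."""
    free = []  # remaining capacity on each server opened so far
    for demand in sorted(demands, reverse=True):
        for i, room in enumerate(free):
            if demand <= room:
                free[i] = room - demand
                break
        else:
            # No open server has room, so open a new one.
            free.append(capacity - demand)
    return len(free)


def utilization(demands, servers, capacity):
    """Fraction of the total purchased capacity that is actually used."""
    return sum(demands) / (servers * capacity)


demands = [2, 1, 1, 3, 1, 2, 1, 1]   # cores needed per app (made-up numbers)
capacity = 8                         # cores per server (assumed)

dedicated = len(demands)             # one app per server
packed = servers_first_fit(demands, capacity)

print(dedicated, utilization(demands, dedicated, capacity))
print(packed, utilization(demands, packed, capacity))
```

With these illustrative numbers the dedicated layout leaves most of each machine idle, while sharing machines uses a quarter of the servers for the same work.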
20:03yeah and I think to me that that's just
20:06a super important point for folks to
20:07understand because in these kind of
20:09transitions when you're changing
20:10abstraction layers like you tend to
20:12there's this sort of management
20:14retrenching of like wow security is
20:16really important we know how to secure
20:18this so we're gonna stick with it even
20:20though you know and a few percentage of
20:22utilization won't change it even though
20:23the system isn't secure it's just
20:26comfortably insecure that's right yeah
20:28and and like it was great to be
20:31comfortable even though you were failing
20:33and so I I think that like for me that's
20:35the big transition that people are gonna
20:37have to just sort of get over their own
20:38perceived expertise yeah and let
20:41computers do stuff that they're good at
20:42that's right yeah and then that's why I
20:44think pulling in analogies of the
20:46past is so valuable yes it helps
20:49people start to realize you know what
20:50maybe yeah this is a good idea so so
20:54who's using it today yeah so the
20:58open-source components that make up a
21:00large part of the DCOS are used by a
21:02large number of companies today some of
21:04the biggest users out there are
21:05companies like Twitter Airbnb HubSpot
21:09eBay and PayPal are using it for
21:11running things Netflix is using it for
21:13running things some of the smaller
21:15companies without a lot of machines
21:16that's right ya know I mean one of the
21:19great things about the way that that the
21:20software has evolved over the years is
21:23we've made it so that it works
21:24well at small scale but it also scales
21:28and it works very very well for the
21:29large-scale guys and as hardware itself
21:32is starting to evolve in our data
21:33centers and maybe the rack is gonna
21:35start looking less like the rack or a
21:37machine is gonna start looking less like
21:38a machine you really need these levels
21:40of abstraction for both the small guys
21:42and for the big guys yeah it certainly
21:44seems to me that that one of the things
21:46that an operating system brings is it
21:47allows hardware to proceed at a
21:49different pace of innovation and so
21:52when I look at DCOS I think wow this is
21:55really gonna free a set of people to go
21:57well let's just go replace our servers
21:59with ARM servers let's go replace our
22:01networking infrastructure in a certain
22:02way because they'll be able to map those
22:05abstractions up yeah rather than today I
22:07mean you can't once you say it's a VM
22:09running this instruction set that
22:11assumes this level of you're stuck
22:13that's right yeah so a lot of people are
22:15looking at the stack of cloud today you
22:18know a word we haven't even used a lot here
22:19because we're really focused on
22:20distributed operating systems but you
22:22know they think of platform as a
22:24service or infrastructure as a service
22:26and so to me like let's assume that this
22:28isn't platform as a service let's take
22:31that let's assume we understand
22:34PaaS to be but IaaS you know is
22:36definitely at this VM server level so
22:39why is this not an IaaS yeah I think yeah
22:42yeah yeah so one of the biggest
22:44differentiators between what we've done
22:46versus what they've done in the
22:48infrastructure as a
22:49service space is really try to provide
22:53these abstractions and these primitives
22:55that enable you building new distributed
22:58systems on top and again that's really
22:59what an operating system should be
23:00providing what infrastructure as a
23:02service provides to you is another
23:04machine you know it turns a physical
23:07machine into a virtual machine or maybe
23:09a virtual machine into a virtual
23:10machine when you're running say
23:12OpenStack on EC2 and that does not help
23:15the developer build another system it's
23:18the same primitive it's just kind of
23:19wrapped up and so really what you get
23:22from something like the data center
23:23operating system are the abstractions
23:25and primitives that make it easier to
23:26build new distributed systems and that's
23:28what makes it easier to then move those
23:30distributed systems from one organization
23:32to another organization because that's
23:33the abstraction that everybody has and
23:35they can use those yeah you know I think
23:37that this is super interesting because I
23:38think from a IT leadership and the
23:40enterprise perspective you know right
23:42now we're on the verge where everybody
23:44wants to move to cloud they don't know
23:45what that means and so they're very
23:46quickly virtualizing the servers that
23:50they have laying around and I'm a big
23:51believer that that's just not a
23:53good use of time yeah I think it might
23:55be cost effective in some marginal way
23:56but there's a cost of moving and the bugs you
23:58introduce and stuff and so I think what
24:00would you say to sort of your typical
24:02enterprise CIO there's not really a
24:04typical but an enterprise CIO is
24:05overseeing a move like what is
24:09it that they
24:10should understand about moving to a
24:13Mesos kind of environment rather than
24:15take this intermediate step of doing a
24:17bunch more VM stuff or better managing
24:19their VMs right right right
24:20yeah I mean I think one thing that's
24:22really really clear is that one of the
24:24nice things about a data center
24:25operating system is that it doesn't
24:27really compete with infrastructure as a
24:29service at the end of the day because
24:30it's still about just taking all your
24:32resources whether those resources come
24:34from virtual or physical machines and
24:35using those resources effectively so for
24:38folks that do already have
24:39infrastructure as a service like
24:40deployments there's still a ton of value
24:42in using Mesos and the data center
24:45operating system because you still want
24:46to best take advantage of the resources
24:48that you already have again if they're
24:49just a bunch of virtual machines and the
24:51same thing applies why something like
24:53Mesos and the DCOS is still so
24:55valuable in EC2-like environments on AWS
24:57is because again still you want to best
24:59take advantage of all the resources that
25:01you have but for people that are
25:02starting from scratch I think you can
25:04really now start to take a very close
25:06look on whether or not you need to go
25:08through that first level of
25:09virtualization or not and we've had a
25:12lot of reports of people that can go
25:14directly to using something like Mesos
25:16and the data center operating system and
25:18then you don't have to start paying that
25:2030% virtualization overhead for running
25:22your applications which can start to
25:24save a lot of money well because that's
25:26how I sort of think of it as you know
25:29both are cost savings and then like if
25:31you're gonna go greenfield like if
25:33you're gonna build a new expense app
25:34rather than just virtualize the old
25:36expense app you probably want to build
25:38it because you know it's never gonna use
25:39a whole rack yeah like so why would but
25:42you're gonna probably if you were to go
25:43build it you would dedicate the rack
25:44yeah and then you get all the overhead
25:46of a bunch of VMs and so it seems like
25:49you should just go straight to building
25:51it as a distributed app and then you'll
25:53have your thousand apps over the next
25:54ten years that get rewritten are all just
25:56gonna squeeze in and use the right
25:57amount of resource yes that's exactly
25:59right so but I want to go back one
26:01quick sec to the platform as a service
26:02because to me like platform as a
26:05service and infrastructure as a service are
26:06sort of almost inherently connected in
26:09an inefficient way yeah like so what
26:11would you say that well oh no we're okay
26:13because we're just going to use you know
26:15a cloud vendors platform right but that
26:17doesn't solve the distributed yeah
26:19no I mean I mean what ends up happening
26:21at the end of the day with platform
26:23as a service is again it's a
26:25higher level abstraction on top of
26:27infrastructure as a service what platform as a
26:28service really solves is the fact that
26:29oh great from infrastructure as a service I
26:31got a bunch of machines now what do I do
26:32and platform as a service said okay well
26:34we'll abstract away the machines and
26:36we'll let you just run your tasks your
26:38your processes your apps whatever it is
26:40but but then you just run the processes
26:42and what you really want is you want to
26:43be able to launch those processes those
26:45applications and then you want those
26:46applications to be able to continue to
26:49execute by using the underlying
26:51infrastructure by calling back into
26:53something like the data center operating
26:55system and say hey now I need more
26:56resources or for us to be able to call
26:58into the apps the data center operating
27:00system can call into the apps and say hey
27:01this machine is going down for reboot
27:03because it's doing maintenance you
27:05should know about this just like in a
27:06normal operating system where
27:07you know you do memory
27:10paging and that's the big distinguisher
27:12again between something like
27:13platform-as-a-service
27:14and the data center operating system is
27:15platform-as-a-service is about okay here's
27:17an app I run it I go and the data center
27:19operating system is about okay here's an app
27:21I run it and then while that app is
27:23running it uses the data center
27:24operating system to continue to run it
27:26calls back in it uses the system call
27:28API and as that API gets bigger and
27:31bigger and bigger it makes a really
27:33really rich environment for programmers
27:34to be able to build really sophisticated
27:35distributed applications one last
27:37question is I mean you just said a lot
27:39of stuff so I'll make it two parts a
27:42where can I get the stuff today yep and
27:44what can I do with it and then b
27:46like what comes next
27:48yeah and go to mesos.apache.org and
27:52that's where you can learn
27:54a lot about the kernel itself the Mesos
27:56kernel and the new stuff that was the
28:02second part yeah yeah the second part was like
28:04well tell everybody now that they've
28:05absorbed all this what's coming next
28:06yeah yeah so the new stuff's the most fun
28:10stuff to me it's really where we start
28:13to take the beginning steps of what it
28:15means to be you know a data center
28:17operating system and take it to the next
28:18level and it means we start to take the
28:20things that historically have been
28:22really really tough to run regardless of
28:25whether or not you've used higher levels
28:26of abstraction like things like PaaSes
28:28or infrastructure as a service like
28:32and we get to start running those things
28:33in a really really really effective way
28:35in the data center that historically
28:38have required a lot of humans to
28:40actually deal with that kind of stuff so
28:41there are two examples I want to give
28:43here two primitives that are being built
28:44that I think are really really cool one
28:47primitive we're building is this
28:48notion of maintenance so because we have
28:51this software layer the kernel
28:53actually running in our data center
28:55operating system when the applications
28:56are running on top we can have it start
28:59to actually deal with maintenance of
29:03things that are happening in your data
29:05center so for example when a machine or
29:06a rack needs to go offline we can have the
29:09software talk to the other software and
29:11say hey you know what this machine is
29:12going down for repair you should you
29:15know we need to reschedule you or you
29:16should get rescheduled you need to move
29:18data let's treat it like it was a
29:19failure but a planned failure that's
29:20right that's right it's a failure but a
29:21planned failure that's exactly right and
29:23this is huge because usually the
29:25way this works in most
29:26organizations is a human walks up to
29:28another human and says hey I'm going to
29:29be taking this rack down what can we
29:31actually do about this we can turn this
29:32into software right and the analogy that
29:35I like to give from just
29:37traditional operating systems is the
29:40operating systems today would do things
29:41like page out memory but what they do is
29:44they just they just say hey you know
29:45we're gonna use the LRU algorithm we're
29:47gonna page out the least recently used
29:49and that doesn't always work great and
29:50wouldn't it be better if actually the
29:52operating system could work with the
29:54applications right on top to do smarter
29:55things when it comes to failures or
29:58needing more resources whatever it is
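
The planned-failure idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cluster` and `App` classes, the `on_maintenance` callback, and the machine names are all hypothetical and are not the Mesos or DCOS API. The point is just that the layer under the applications announces maintenance and each application decides how to move its own tasks.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical application that reacts to maintenance notices."""
    name: str
    tasks: dict = field(default_factory=dict)  # task id -> machine name

    def on_maintenance(self, machine, healthy_machines):
        """Move this app's tasks off `machine` before it goes offline."""
        moved = []
        for task, host in self.tasks.items():
            if host == machine:
                # Spread displaced tasks round-robin over the healthy machines.
                target = healthy_machines[len(moved) % len(healthy_machines)]
                self.tasks[task] = target
                moved.append(task)
        return moved


class Cluster:
    """Hypothetical stand-in for the kernel layer that sits under the apps."""

    def __init__(self, machines):
        self.machines = set(machines)
        self.apps = []

    def register(self, app):
        self.apps.append(app)

    def schedule_maintenance(self, machine):
        """Announce a planned failure to every app, then drop the machine."""
        healthy = sorted(self.machines - {machine})
        if not healthy:
            raise RuntimeError("no machine left to reschedule onto")
        moved = {app.name: app.on_maintenance(machine, healthy)
                 for app in self.apps}
        self.machines.discard(machine)
        return moved


cluster = Cluster(["m1", "m2", "m3"])
web = App("web", {"t1": "m1", "t2": "m2"})
cluster.register(web)
report = cluster.schedule_maintenance("m1")
print(report)     # tasks each app chose to move
print(web.tasks)  # task placement after the planned failure
```

In this sketch the software-to-software notice replaces the human walking up to another human: the application learns about the outage before it happens and relocates its work, instead of discovering a dead machine afterwards.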
29:59and that I think is like that realm of
30:01things is is to me one of the most
30:03exciting things that we're going to be
30:04working on because we get to reimagine a
30:06lot of the basic primitives that existed
30:08for single machines and rebuild them in
30:11a way that makes sense in a distributed
30:12environment and makes sense for people
30:15that want to do things in a smarter way
30:17sort of what we've been working with for
30:19a lot at a scale that people can only
30:21imagine yeah at a scale that it's
30:23already hard enough to do it manually
30:25and so we have to do it in software
30:28based ways and so we can do that
30:29awesome well thanks so much this has
30:32been Benjamin Hindman
30:34from Mesosphere and I'm Steven
30:36Sinofsky signing off this episode of the
30:38a16z podcast thanks everybody great