00:00welcome to the a16z podcast I'm Michael
00:02Copeland and we are here today with
00:04board partner Steven Sinofsky Steven
00:06welcome hi there and the CTO and
00:10co-founder of our latest investment
00:12Tanium Orion Hindawi welcome thank you
00:15all the way from Berkeley California
00:17that's right yeah we like to represent
00:20the East Bay every now and again so I
00:22wanna really dig in you guys have built
00:24this incredible enterprise technology
00:27and so I want you to describe it for us
00:29but let's maybe start by describing the
00:33big problem that you and your
00:34co-founder is your father but that you
00:36and your team at Tanium had been
00:38addressing for years really yeah so I
00:41mean we essentially founded Tanium
00:44with the co-founders that we originally
00:46founded BigFix with and so BigFix was
00:48founded in 1997 a technology that
00:51essentially allowed people that had
00:52really large-scale environments to
00:54assess the state of those environments
00:55so look at the endpoint see what was
00:58running on them you know which users
01:00were logged in and where they were and a
01:02variety of other characteristics and
01:03what we kept on seeing at BigFix was
01:05that you know around 2004-2005 customers
01:08were really struggling to get those
01:10systems to work fast enough so if you
01:13look at the problems that they were
01:14trying to address it was kind of a
01:15nascent era of APT and you know much
01:18faster attacks and they were seeing a
01:21lot of outages that were happening they
01:23were starting and ending in the span of
01:25minutes instead of the span of hours or
01:26days so these attacks were coming
01:29from Internet connected endpoints by
01:32malicious people yeah I mean essentially
01:35people were realizing that this wasn't
01:36just you know the script kiddies anymore it
01:38really was actually that nation-state
01:41attackers and professional attacking
01:43organizations were realizing that there
01:44was a lot of value behind those
01:45firewalls and so they were coming in
01:47exfiltrating data that was extremely
01:49valuable and the customers were just
01:51unable to respond quickly enough and so
01:53they were coming to us and essentially
01:55asking us figure out a way where you can
01:58make your systems much much faster and
01:59we essentially realized that we couldn't
02:01do that using the topology that we'd
02:03created we had to throw everything away
02:05and start from scratch and so we in 2007
02:09founded Tanium around the principle
02:11that we needed to make things not
02:12you know ten times faster but 10,000
02:15times faster instead of gathering data
02:16in hours or days from really large
02:18networks we need to be able to do that
02:20in seconds and so to do that we
02:23essentially took a team of engineers put
02:25them in a room and asked them to start
02:27from first principles really not assume
02:30anything about how we were gonna
02:31approach this and what we did was
02:33actually completely refactor the entire
02:35topology of how people were collecting
02:37data the real kind of structure about
02:39how it was done and as a result of that
02:41even at five hundred thousand seat
02:42networks now for the first time we can
02:45get 15-second old data so Steven I want
02:47you to tell us how you saw the problem
02:50both you know as a person who looked out
02:52across large enterprises but what he was
02:55describing how did you view it from your
02:57end well sure I like it it's it's really
02:59I mean folks out there who manage our
03:01enterprise networks of the world they
03:02know this problem you know super well
03:05that that like you know you run these
03:07logon scripts that do inventories of PCs
03:09you run these very heavy client-server
03:12systems where you've got to you know
03:14inventory the network you've
03:16got a bunch of SQL servers running
03:18and management tools and you know as an
03:20end user you know that like they're
03:22doing an inventory because that day you
03:24log on to your machine and all of a
03:25sudden it's like ten minutes before you
03:27can actually do any work and it usually
03:29happens at the worst possible time and
03:31then when all of that's said and done like
03:33both the network isn't particularly
03:35secure and the information isn't really
03:37that accurate I actually remember I was
03:39I once was in a big briefing with a
03:41really really giant government customer
03:43in the United States and the guy was
03:45giving me a hard time about how
03:46difficult it was to monitor the network
03:48he's like he said to me look I have
03:50between 150 and 300 thousand PCs in my
03:52network and I looked at him and I was
03:55like well do you have 150,000
03:57or 300,000 I mean if your job is
04:00to count them right like that's that's a
04:03really big difference and then I
04:05actually learned a lot and this was a
04:06long time ago and actually the state of
04:08the art hasn't really changed like BigFix
04:09is the state of the art and for a
04:11network that size you're looking at
04:13three four five day turnaround and by
04:15the end of the five days you know think
04:17of it you know like how many employees
04:18quit how many machines got thrown away
04:20how many machines got bought at Best Buy
04:22that week and the number is just
04:24completely out of whack
and so a system like BigFix which was
04:27later bought by IBM correct was built in
04:31a world that didn't look anything like
04:33the world we operate within today that's
04:37yes I mean essentially if you look at
04:39those tools they were built 20 years ago
04:40to solve a completely different problem
04:42the problem was that there were these
04:44untargeted attacks like Slammer and
04:46Blaster that were affecting
04:48vulnerabilities that were patched and you
04:50just needed to get the patches out
04:52because no one was being individually
04:53targeted no organization was getting
04:56special attacks that were tailored to
04:57them or very few today what our
05:00customers are worried about are
05:02professional organizations or
05:03nation-states that are not only
05:05attacking just them specifically they're
05:08often QA-ing their attacks that they're
05:10gonna do against the solutions that
05:12those organizations have deployed so
05:14they're as sophisticated as you could
05:16possibly get and what we're seeing is
05:17you're trying to essentially screw in
05:19screws with a hammer right right
05:21that was a tool developed for a
05:22completely different problem and now
05:23you're trying to apply it to this and
05:25antivirus falls in that category
05:26firewalls fall in that category you know
05:29they just are not effective against the
05:31insider threat and advanced persistent
05:33threat that our customers are facing you
05:35know today what we what we what
05:37customers are seeing is you know they've
05:39got a bunch of tools that the systems
05:40management people use to to inventory
05:42and maintain and deploy patches and
05:44updates and software to machines and
05:46monitor performance and then a bunch of
05:48tools the security people do and both of
05:50those have state-of-the-art the security
05:52people are building taller and taller
05:53walls thicker and thicker walls and
05:56trying to close off the doors and things
05:57like that and the management people are
06:00just trying to keep track of what's
06:02going on but what's happening is now
06:04when you're hit with an attack
06:06it's usually through a sequence of very
06:08benign things that look ok to the
06:11network like somebody logging in
06:13somebody reading a file
06:14somebody you know installing a piece of
06:16software it looks pretty benign until
06:19something bad happens in which case then
06:20it just looks like a flaky PC mm-hm and
06:23so what's happening is the systems
06:24management people now are sort of the
06:26frontline of when vulnerabilities happen
06:28but they use a different set of tools
06:30and their tools like it takes like 2
06:32weeks to figure something out and then
06:33the security people it takes like 3
06:35weeks before that particular one becomes
a pattern that they can go find out and
06:40so where Tanium really comes in is
06:42now you're just talking about looking at
06:44a network of hundreds of thousands of
06:45nodes and being able to ask anything you
06:48want of that network and get an answer
06:49back instantly so I want to I want to
06:53describe Steven if you can when Tanium
06:56when Orion and his team came in to demo
06:59this you obviously have vast experience
07:02in the advanced on the table but what
07:07happened what did you see and then what
07:09well it's fast you know a lot of folks
07:11here at Andreessen Horowitz are coming
07:13from a very deep operational background
07:15many of them were members of the Opsware
07:16team and of course Marc and Ben
07:17created the company and so you're
07:20basically sitting around the table and
07:21we think that there's like 200 odd years
07:24of large-scale enterprise software
07:25management experience and so Orion comes
07:28in pops open his laptop opens up a
07:30browser hits a bookmark and starts
07:33typing in questions like show me all the
07:34machine names on the network show me how
07:37many machines are leaking network
07:38leaking a network packet data show me
07:41the MD5 hash of all the processes of all
07:43the machines running on all of this
07:45network and we kind of were all thinking
07:47independently we only realized this
07:48after that oh that's a pretty neat
07:50little mock-up of the product that if we
07:52fund them they hope to go build because
07:55there was no conceivable way that he
07:57could be doing this to a network and
07:59then he I think he kind of looked
08:01puzzled and he says so this is a live
08:03network of of several thousand nodes
08:06running at a HIPAA compliant hospital
08:09like this is this is a real system
08:12running and they're in production this
08:14is not a test it's not a mock-up it's
08:16not a simulation and we just were all
08:18scratching our heads and we literally
08:19couldn't believe that what he was asking
08:21was really possible it was fast actually
08:24like and then we started talking more
08:26and so then I started trying to play
08:27like let's stump Orion with questions
08:29about about networks and so my favorite
08:31vulnerability from the Windows side was
08:33always the plug in a USB memory stick
08:36and have a virus you know scamper across
08:38the network so I I crossed my arms and I
08:40leaned back and I said I want to know on
08:43this network how many of the PCs have a
08:46USB memory stick in them and are
08:48currently writing to it and I thought
08:51like a joke and literally in 15 seconds
08:54was the list of the machines that were
08:55currently writing to USB memory on the
08:57network and and Orion again this
09:0015-second response time I mean what's at
09:03stake in those 15 seconds and how under
09:06the hood to the extent you can explain to
09:09us what's happening sure so you know
09:12really the fundamental problem with
09:14existing solutions that we see is that
09:16when you're querying them you're just
09:17querying a database and the database is
09:20being filled on the back end by clients
09:22that are polling once every few hours or
09:23a few days what we're actually doing is
09:26reaching out and touching every endpoint
09:27so synchronously because you're asking a
09:29question we're actually asking them a
09:31question they evaluate a piece of data
09:33and then they come back to you with it
09:35and we've created this topology the ring
09:37architecture that we have which
09:38essentially allows clients endpoints to
09:41aggregate data on the LAN before they
09:44send one answer back from the whole
09:46LAN that represents all of the machines
09:49that are out there so if you think about
09:50a branch environment you think about the
09:52largest scale retail environments or
09:54banking environments they may have
09:55thousands of different LANs in each
09:58branch and then they may have thousands
10:00of machines that are in these you know
10:03core environments and what we're
10:04essentially doing is automatically
10:06constructing these groups based on
10:09proximity of the machines so that they
10:11know they can talk to each other because
10:13they're close to each other and
10:14essentially we've created a mechanism
10:16where they can automatically aggregate
10:18data when a question is asked and send
10:20back one message across the LAN that
10:23contains all the data that you were
10:24asking for so really efficient on the
10:26way in and you don't need to have a ton
10:28of infrastructure to back it up so to
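The aggregation described here can be sketched in a few lines. This is a toy model under stated assumptions, not Tanium's implementation: "proximity" is approximated by a shared /24 subnet, and the names `group_by_lan`, `answer_for_lan` and `ask` are hypothetical helpers invented for illustration.

```python
from collections import Counter, defaultdict
from ipaddress import ip_network


def group_by_lan(endpoints, prefix=24):
    """Group endpoints by subnet (an assumed stand-in for 'proximity')."""
    lans = defaultdict(list)
    for ep in endpoints:
        subnet = ip_network(f"{ep['ip']}/{prefix}", strict=False)
        lans[subnet].append(ep)
    return lans


def answer_for_lan(members, question):
    """Pass a running tally from peer to peer inside one LAN."""
    tally = Counter()
    for ep in members:              # each hop stays on the local network
        tally[question(ep)] += 1    # every endpoint evaluates the question itself
    return tally                    # only this aggregated answer leaves the LAN


def ask(endpoints, question):
    """Return the combined answer and how many messages left the LANs."""
    total = Counter()
    messages_from_lans = 0
    for members in group_by_lan(endpoints).values():
        total += answer_for_lan(members, question)
        messages_from_lans += 1     # one message per LAN, not one per machine
    return total, messages_from_lans
```

Asking `ask(endpoints, lambda ep: ep["os"])` over three machines spread across two subnets would return the operating-system counts plus a message count of 2, one per LAN rather than one per machine.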
10:29Steven's point where you know oftentimes
10:32you guys are asking queries and in
10:34current systems and and things slow way
10:37down I can't get anything done what's
10:39the sort of the load on the system when
10:41when queries are coming through Tanium
10:43yeah 0.1% CPU so 0.1% CPU on a tiny
10:47little runtime that's what like a mega
10:49mega installer seven Meg of RAM 10 Meg
10:52on disk so the beauty of this whole
10:54thing is it's so lightweight you can
10:55actually put it in VMs we have people
10:58who are putting it on process
10:59controllers and ATMs and point of sale
11:01devices and the heart rate monitors I
11:03mean devices that a lot of people wish
11:05were not computers so you look at the
11:07Target attack you know a lot of people
11:09don't think of a point-of-sale device as
11:10a computer or didn't until they realize
11:12it's an existential threat to their
11:14organization not to right and so in
11:17these massive systems and Steven oh I
11:19want you to answer this like what
11:21happens if I don't know what questions
11:22to ask I mean like when things look
11:24benign they look benign how do I know
11:27how do I sort of uncover well yeah this
11:30this is what's really going on in the in
11:32the world of systems management and
11:34security right now your your what's
11:36happening is you've got all of these
11:37tools sort of firing off like this might
11:39be a problem this might be a problem we
11:41call these IOCs these indicators of
11:42compromise right and and if you look at
11:45one of them and you know if you're an IT
11:47pro you've seen these if you look at them it's a
11:49series of you're looking for this file
11:51for this process this Windows registry
11:53key change you know this this little
11:55piece of software ended up on a Mac and
11:57you actually have all of these and the
12:00problem is that today and that's also
12:02the collective knowledge of the Internet
12:03like like everybody is contributing to
12:06these there's feeds of IOCs you could
12:07subscribe to them but you can't do
12:10anything about it because all you could
12:12do is query the databases as Orion
12:13mentioned and find out if maybe three
12:15weeks ago I might have had a machine
12:16that met that pattern and and of course
12:19none of these attacks last even that
12:20long and so what what you can do now is
12:23you can actually use these as inputs
12:25into Tanium and actually answer
12:28them and that's sort of the the
12:29fundamental thing I mean you really have
12:31to get your head and one of the reasons
12:32that we get so excited about Tanium as
12:34an investment for Andreessen
12:36Horowitz was that these guys invented an
12:38incredibly cool technology and and the
12:41way that they deployed it it you know we
12:43mesh networking has been around for a
12:45long time peer-to-peer networking around
12:47for a long time and most of those hadn't
12:49really reached any any critical uses in
12:52the enterprise and they have this
12:53amazing insight that if you could walk
12:55up to any computer knowing the answer to
12:58one of these questions is sort of like a
12:59constant time operation is the registry
13:01key there or not that takes no time the
13:04problem was if you had two hundred
13:05thousand nodes and you had to
13:07hub-and-spoke ask them all of that
13:09question you'll never finish whereas if
13:11you can just get a message broadcast to
13:13all of them answer this question on your own
13:15and then just share the answer with the
13:17computer next to you it turns out you
13:19can actually do that instantly and and
13:22so it's sort of to me one of the very
13:23first commercial and commercially viable
13:26uses of mesh and peer-to-peer networking
13:29and that and I don't want to use those
13:30terms because they get a little bit
13:31loaded but they actually invented a
13:33specific implementation they called
13:35linear peer-to-peer networking and you
13:36could read more about it on the blog
13:37post that talks about you know the cool
13:39way that they brought these technologies
13:41together so give us an example um
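The contrast between hub-and-spoke and the linear peer-to-peer approach comes down to simple arithmetic on how many messages the central server has to handle. The sketch below is a back-of-the-envelope model only: the chain length of 100 is an assumed figure, not a number from the conversation, and both function names are hypothetical.

```python
import math


def hub_and_spoke_server_messages(n_endpoints):
    """Server sends one question to, and reads one answer from, every endpoint."""
    return 2 * n_endpoints


def linear_chain_server_messages(n_endpoints, chain_length):
    """Server talks only to the head and tail of each chain of neighbours."""
    chains = math.ceil(n_endpoints / chain_length)
    return 2 * chains


if __name__ == "__main__":
    nodes = 200_000          # network size mentioned in the conversation
    assumed_chain = 100      # hypothetical number of neighbours per chain
    print(hub_and_spoke_server_messages(nodes))
    print(linear_chain_server_messages(nodes, assumed_chain))
```

Under these assumptions the server's message count falls by a factor equal to the chain length. The trade-off is that an answer has to travel down each chain, so this model says nothing about latency inside the chain.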
13:43Heartbleed was something that caused
13:45many hearts to bleed how did something
13:48like that in your environment you know
13:51crop up and then get handled yeah so
13:53it's actually a really interesting
13:54question our customers literally could
13:56ask an english-language question tell me
13:58all the machines and versions of OpenSSL
14:00for every machine that's got OpenSSL
14:02across the environment and get an answer
14:04back in 15 seconds they didn't have to
14:06create a script they didn't have to wait
14:08days and we still are getting notices
14:10from companies that they're realizing
14:12that they're affected by Heartbleed
14:13right it's been a long time since that
14:17first announcement came out and yet
14:18people have tools that are so broken
14:20they can't tell that for months
14:22potentially and what Tanium customers
14:25were able to do is literally ask a
14:26question in English and be able to see
14:28exactly where they were affected and
14:30then you know another thing that Tanium
14:31can do is allow you to actually fix
14:32things so quarantine machines turn
14:35firewalls on stop services that were
14:37affected I mean our customers were in
14:39full triage mode and they weren't
14:41looking at cycles of triage that took
14:43weeks or months they could do it in
14:45seconds and confirm that they were
14:46actually doing what they intended
14:47and see where the gaps were yeah that
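Once every machine has reported its OpenSSL version, the Heartbleed triage step amounts to a version filter. A minimal sketch, assuming the answers arrive as a mapping of machine name to version string (the `inventory` shape and both function names are hypothetical). It covers only the 1.0.1 release line, where 1.0.1 through 1.0.1f are affected and 1.0.1g carries the fix.

```python
import re

# OpenSSL 1.0.1 through 1.0.1f are affected by Heartbleed (CVE-2014-0160);
# 1.0.1g is the fixed release. Other release lines are not handled here.
_AFFECTED = re.compile(r"1\.0\.1[a-f]?")


def is_heartbleed_affected(version):
    """True when the version string is exactly 1.0.1 or 1.0.1a through 1.0.1f."""
    return _AFFECTED.fullmatch(version.strip()) is not None


def affected_machines(inventory):
    """inventory maps machine name -> reported OpenSSL version string."""
    return sorted(
        name for name, version in inventory.items()
        if is_heartbleed_affected(version)
    )
```

The resulting list is what a follow-up action such as quarantining machines or stopping services would be pointed at.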
14:49that was sort of one of the most
14:51fascinating things that we learned and
14:52looking at Tanium was that when
14:55Heartbleed happens and you know the most
14:57obvious first question that every every
14:59CEO all of a sudden needed to do was
15:01please you know IT folks tell me how
15:03many machines we have affected and if
15:05you remember what was happening at that
15:06moment companies were issuing press
15:07releases about Heartbleed and they
15:09were saying we are looking into it and
15:12you're like well that's not a really
15:13good answer for like and it turns out
15:15for days most companies had no idea
15:18whether they were affected now Orion
15:20told us well all their
15:21customers realized they were all
15:22affected like everybody and you know you
15:24think of a large enterprise you've got
15:26code written by vendors you're running
15:28offsite you have branch offices that
15:30might be using a different product that
15:31you don't know about in the central
15:33office and so really those press
15:35releases were accurate they didn't know
15:37and so you put what but it's an
15:40existential thing you know every CEO of
15:44every major corporation is now
15:45effectively an IT person because every
15:48company in software eats the world is
15:49really a software company so I want to
15:51get that like how you know this problem
15:54that you've been working on for years
15:55now you know with BigFix and now
15:57Tanium how does that describe the
15:59architecture that you've you going
16:00forward what does the future look like
16:01and I want both of you to answer this
16:03and and and how do then we tackle these
16:06things as scale gets bigger and bigger
16:07and more and more complex so that's the
16:10scary thing right people think hundreds
16:11of thousands of machines is a lot it's
16:14not gonna be a lot coming soon right IoT
16:17and people going and embedding chips in
16:20lightbulbs means we need to be scaling
16:22to potentially billions of devices and
16:24being able to assess them for telemetry
16:26and state and you know the hub-and-spoke
16:28model we already know is broken at
16:30hundreds of thousands we don't even want
16:32to talk about millions or billions right
16:34we need a fundamentally new architecture
16:36and so what we see is the possibility to
16:39embed this you know ring and linear
16:41peer-to-peer communications model into a
16:44myriad set of devices that some of which
16:46are going to be very lightweight right
16:47we're looking at watches and light bulbs
16:49some of which are going to be very heavy
16:51like servers and have a you know a
16:53language that all of these devices that
16:55have computability and that also have
16:57telemetry data on them should be able to
16:59speak with each other so that they can
17:01gather data about you know heat and
17:03power and location and you know more you
17:08know complicated things like which
17:09applications are running and what the
17:12workloads are and be able to aggregate
17:13those in real time and we believe that
17:15essentially fundamentally if you don't
17:17have real time data you're basically
17:18always playing whack-a-mole right if
17:21it's really really old data if it's days
17:23or weeks old data it's probably
17:25completely useless and if it's even
17:28you're subtly wrong and what we believe
17:31is that all data is gonna have to move
17:32toward real time and we believe that
17:33that's possible with us yeah I mean
17:36that's fundamentally what's so exciting
17:38about the future of Tanium is
17:40that they've developed an innovative
17:41architecture and and a really creative
17:45and inventive approach to how you can
17:47really scale in a unique way and it's
17:49super clear that down the road that when
17:52you have a billion a billion devices or
17:54endpoints that they're all still gonna
17:55that they're gonna be near each other in
17:57these clusters and so that communication
17:59technique and that you know that's so
18:00different than hub-and-spoke is a huge
18:02asset going forward the last thing I
18:04just wanted to mention that's super cool
18:06about the product is it's effectively
18:07one giant API and so although you can
18:10use a browser and go and access it
18:13through this natural language interface
18:14you can also just use the API build your
18:17own model for how you want to ask
18:18questions of the network and model them
18:20and deploy tools and and charts and
18:22graphs and dashboards that are
18:24constantly and in real-time monitoring
18:26your network well Orion thanks for coming
18:29by Steven thanks as always awesome thank