a16z Podcast | When Large Scale Gets Really Massive -- Managing Today’s Enterprise Networks

a16z | 2019-01-02


💫 Short Summary

Tanium was founded in 2007 by the team behind BigFix, after customers found that existing systems could not respond quickly enough to faster, targeted attacks. The founders discarded the BigFix topology and rebuilt data collection from scratch, aiming for a 10,000-fold speedup: seconds instead of hours or days, even on networks of 500,000 endpoints. Traditional tools such as BigFix, antivirus, and firewalls were built for untargeted attacks and are inadequate against targeted attacks from professional organizations and nation-states. Tanium's platform lets an operator ask a question of the whole network and get an answer in about 15 seconds, which impressed the team at Andreessen Horowitz during a live demo. In that demo, the system listed every machine on the network that was currently writing to a USB memory stick. The architecture, which the company calls linear peer-to-peer networking, let customers find machines affected by Heartbleed without writing scripts. CEOs increasingly need to understand IT issues, and real-time data will become crucial for handling security threats. Tanium scales by having nearby machines communicate in clusters rather than through hub-and-spoke communication.

✨ Highlights

📊 Transcript

✦

Tanium's founding in 2007 was driven by the need for faster systems in response to cyber attacks.

02:05

The company's focus shifted to collecting data from large networks in seconds instead of hours or days.

Tanium completely refactored its data collection approach because traditional methods were ineffective against modern threats.

The goal was to make systems 10,000 times faster so that customers could respond before data was exfiltrated.

✦

Outdated network management tools are not equipped to handle modern cybersecurity threats.

04:25

These tools were designed to address untargeted attacks like slammer and blaster.

Current threats include targeted attacks from professional organizations and nation-states.

BigFix, later acquired by IBM, was developed in a different era and is insufficient for evolving cybersecurity challenges.

✦

The limitations of traditional security tools against insider and advanced persistent threats.

06:49

Security and systems management teams struggle with identifying and responding to vulnerabilities using separate tools.

Tanium's platform offers instant network-wide information retrieval and analysis, improving network security.

The demo of Tanium's platform impressed experienced individuals at Andreessen Horowitz.

The platform allows for quick and comprehensive network monitoring, advancing cybersecurity practices.

✦

The system can quickly identify all machines on a network writing to USB memory sticks.

08:40

The fast response time provides real-time data on network activity, amazing the team.

Existing solutions' limitations are highlighted, with the new system reaching out and interacting with every endpoint synchronously.

The system's ring architecture aggregates data before sending a comprehensive response.

Automation in grouping machines based on proximity enables efficient communication among thousands of machines in various environments.
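The ring aggregation described above can be sketched roughly as follows. This is a minimal illustration, not Tanium's implementation: the `Endpoint` record, the `usb_write` field, and the functions `ask_ring` and `ask_network` are all hypothetical names introduced here. The sketch assumes each peer evaluates the question locally, adds its answer to a running result, and hands that result to its neighbor, so only one message leaves each LAN.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical endpoint record: a name plus whatever local state a question inspects.
Endpoint = Dict[str, object]


def ask_ring(ring: List[Endpoint], question: Callable[[Endpoint], object]) -> Counter:
    """Pass one accumulating answer along the ring; only the last peer reports back."""
    partial: Counter = Counter()
    for endpoint in ring:                 # each peer receives the partial result from its neighbor
        partial[question(endpoint)] += 1  # the peer evaluates the question locally and adds its answer
    return partial                        # a single aggregated message leaves the LAN


def ask_network(lans: Dict[str, List[Endpoint]], question: Callable[[Endpoint], object]) -> Counter:
    """Combine the one-per-LAN answers at the server."""
    total: Counter = Counter()
    for ring in lans.values():
        total += ask_ring(ring, question)
    return total


# Made-up inventory for illustration only.
lans = {
    "branch-a": [{"name": "pc1", "usb_write": True}, {"name": "pc2", "usb_write": False}],
    "branch-b": [{"name": "atm1", "usb_write": False}],
}
answer = ask_network(lans, lambda e: e["usb_write"])
print(answer[True], "machine(s) writing to USB;", len(lans), "aggregated messages sent back")
```

The server only ever combines one result per LAN, which is why the approach does not need a large back-end infrastructure in this simplified model.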

✦

The system allows for automatic aggregation of data when a question is asked, providing all requested data efficiently.

10:34

The system is lightweight, can be used in various devices like VMs, process controllers, and ATMs, and doesn't require extensive infrastructure.

Multiple tools are utilized in systems management and security to identify indicators of compromise, such as file changes and software presence.

Collective knowledge from the Internet contributes to these indicators, which can now be used as inputs for further analysis and response.
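As a rough sketch of how an indicator of compromise could be used as an input, the example below treats an indicator as a set of artifacts (file hashes, process names, registry keys) that an endpoint checks against its own current state. The `EndpointState` and `Indicator` classes, the `matches` function, and all sample values are hypothetical and are not taken from any real IOC feed or product.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EndpointState:
    """Hypothetical snapshot of what an endpoint can inspect locally."""
    file_hashes: Set[str] = field(default_factory=set)
    processes: Set[str] = field(default_factory=set)
    registry_keys: Set[str] = field(default_factory=set)


@dataclass
class Indicator:
    """Hypothetical IOC: every listed artifact must be present for a match."""
    file_hashes: Set[str] = field(default_factory=set)
    processes: Set[str] = field(default_factory=set)
    registry_keys: Set[str] = field(default_factory=set)


def matches(ioc: Indicator, state: EndpointState) -> bool:
    """Return True when every artifact in the indicator is present on the endpoint."""
    return (ioc.file_hashes <= state.file_hashes
            and ioc.processes <= state.processes
            and ioc.registry_keys <= state.registry_keys)


# Made-up values for illustration only.
ioc = Indicator(processes={"dropper.exe"}, registry_keys={r"HKLM\Software\Example\Run"})
clean = EndpointState(processes={"explorer.exe"})
infected = EndpointState(
    processes={"explorer.exe", "dropper.exe"},
    registry_keys={r"HKLM\Software\Example\Run"},
)
print(matches(ioc, clean), matches(ioc, infected))  # False True
```

Because each check is a local set lookup, every endpoint can answer it against live state instead of a database that may be weeks old.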

✦

Innovative use of mesh and peer-to-peer networking in addressing security vulnerabilities like Heartbleed.

13:26

Linear peer-to-peer networking implementation enabled quick identification and resolution of affected systems without manual scripts.

Immediate triage and resolution of security issues provided a more efficient response to threats.

Practical application of mesh networking showcased its potential for enhancing cybersecurity measures in real-world scenarios.
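Once versions have been collected, the Heartbleed triage question reduces to a version check. The sketch below assumes an inventory mapping machine names to reported OpenSSL version strings; the inventory, the machine names, and the function names are hypothetical. It flags release versions 1.0.1 through 1.0.1f, which were affected by Heartbleed (fixed in 1.0.1g). It ignores the 1.0.2 betas and vendor-backported patches, so a real check would need more than the version string.

```python
import re
from typing import Dict, List

# OpenSSL releases 1.0.1 through 1.0.1f were affected by Heartbleed; 1.0.1g carried the fix.
_VULNERABLE = re.compile(r"^1\.0\.1[a-f]?$")


def is_heartbleed_version(version: str) -> bool:
    """Return True for release version strings in the affected 1.0.1 to 1.0.1f range."""
    return bool(_VULNERABLE.match(version.strip()))


def affected_machines(inventory: Dict[str, str]) -> List[str]:
    """List machine names whose reported OpenSSL version is in the affected range."""
    return sorted(name for name, version in inventory.items() if is_heartbleed_version(version))


# Made-up inventory for illustration only.
inventory = {"web01": "1.0.1e", "web02": "1.0.1g", "db01": "0.9.8y", "pos17": "1.0.1"}
print(affected_machines(inventory))  # ['pos17', 'web01']
```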

✦

The need for a new architecture for handling IoT devices is highlighted by the Heartbleed incident.

15:37

CEOs need to have a better understanding of IT issues to prevent similar incidents in the future.

The current hub-and-spoke model is inadequate for handling the increasing number of connected devices.

A new architecture enabling peer-to-peer communication among devices is necessary for real-time data collection and analysis.

Real-time data is crucial for organizations to stay ahead of security threats.
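A back-of-the-envelope comparison shows why hub-and-spoke collection struggles at scale. The sketch below counts only the answers that reach the central server and assumes, as a simplification, that each local cluster sends back exactly one aggregated answer. The function names and the network sizes are illustrative assumptions, not figures from the podcast beyond the 200,000-node example.

```python
from typing import List


def hub_and_spoke_messages(endpoints_per_lan: List[int]) -> int:
    """Every endpoint answers the central server directly."""
    return sum(endpoints_per_lan)


def clustered_messages(endpoints_per_lan: List[int]) -> int:
    """Each non-empty local cluster sends one aggregated answer."""
    return sum(1 for n in endpoints_per_lan if n > 0)


# Illustrative numbers only: 2,000 branch LANs of 100 machines each (200,000 nodes).
network = [100] * 2000
print(hub_and_spoke_messages(network))  # 200000
print(clustered_messages(network))      # 2000
```

In this simplified model the server-side load grows with the number of clusters rather than the number of devices, which is the property that matters as device counts move toward millions or billions.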

✦

The future of data is moving towards real-time processing.

17:33

Tanium's innovative architecture allows for unique scaling capabilities.

The communication technique focuses on clusters rather than hub-and-spoke.

Tanium offers a single API that can be accessed through a natural language interface or used to build custom models for network monitoring in real-time.

00:00welcome to the a16z podcast I'm Michael

00:02Copeland and we are here today with

00:04board partner Steven Sinofsky Steven

00:06welcome hi there and the CTO and

00:10co-founder of our latest investment

00:12Tanium Orion Hindawi welcome thank you

00:15all the way from Berkeley California

00:17that's right yeah we like to represent

00:20the East Bay every now and again so I

00:22wanna really dig in you guys have built

00:24this incredible enterprise technology

00:27and so I want you to describe it for us

00:29but let's maybe start by describing the

00:33big problem that you and and your

00:34co-founder is your father but that you

00:36and your team Atheneum had been

00:38addressing for years really yeah so I

00:41mean we essentially founded Tanium

00:44with the co-founders that we originally

00:46founded bigfix with and so bigfix was

00:48founded in 1997 a technology that

00:51essentially allowed people that had

00:52really large-scale environments to

00:54assess the state of those environments

00:55so look at the endpoint see what was

00:58running on them you know which users

01:00were logged in and where they were and a

01:02variety of other characteristics and

01:03what we kept on seeing at BigFix was

01:05that you know around 2004-2005 customers

01:08were really struggling to get those

01:10systems to work fast enough so if you

01:13look at the problems that they were

01:14trying to address it was kind of a

01:15nascent era of APT and you know much

01:18faster attacks and they were seeing a

01:21lot of outages that were happening they

01:23were starting and ending in the span of

01:25minutes instead of the span of hours or

01:26days Sam so these attacks were coming

01:29from Internet connected endpoints by

01:32malicious people yeah I mean essentially

01:35people were realizing that this wasn't

01:36just you know the script kiddies anymore it

01:38really was actually that nation-state

01:41attackers and professional attacking

01:43organizations were realizing that there

01:44was a lot of value behind those

01:45firewalls and so they were coming in

01:47exfiltrating data that was extremely

01:49valuable and the customers were just

01:51unable to respond quickly enough and so

01:53they were coming to us and essentially

01:55asking us figure out a way where you can

01:58make your systems much much faster and

01:59we essentially realized that we couldn't

02:01do that using the topology that we'd

02:03created we had to throw everything away

02:05and start from scratch and so we in 2007

02:09founded Tanium around the principle

02:11that we needed to make things not

02:12you know ten times faster but 10,000

02:15times faster instead of gathering data

02:16in hours or days from really large

02:18networks we need to be able to do that

02:20in seconds and so to do that we

02:23essentially took a team of engineers put

02:25them in a room and asked them to start

02:27from fresh principles really not assume

02:30anything about how we were gonna

02:31approach this and what we did was

02:33actually completely refactor the entire

02:35topology of how people were collecting

02:37data the real kind of structure about

02:39how it was done and as a result of that

02:41even at five hundred thousand seat

02:42networks now for the first time we can

02:45get 15-second old data so Steven I want

02:47you to tell us how you saw the problem

02:50both you know as a person who looked out

02:52across large enterprises but what he was

02:55describing how did you view it from your

02:57end well sure I like it it's it's really

02:59I mean folks out there who manage our

03:01enterprise networks of the world they

03:02know this problem you know super well

03:05that that like you know you run these

03:07logon scripts that do inventories of pcs

03:09you run these very heavy client-server

03:12systems where you've got is you know -

03:14you know inventory the network you've

03:16got a bunch of SQL servers running

03:18and management tools and you know as an

03:20end user you know that like they're

03:22doing an inventory because that day you

03:24log on to your machine and all of a

03:25sudden it's like ten minutes before you

03:27can actually do any work and it usually

03:29happens at the worst possible time and

03:31then when all of that's said and done like

03:33both the network isn't particularly

03:35secure and the information isn't really

03:37that accurate I actually remember I was

03:39I once was in a big briefing with a

03:41really really giant government customer

03:43in the United States and the guy was

03:45giving me a hard time about how

03:46difficult it was to monitor the network

03:48he's like he said to me look I have

03:50between 150 and 300 thousand pcs in my

03:52network and I looked at them and I was

03:55like well do you have a hundred and

03:57fifty or 300,000 I mean if your job is

04:00to count them right like that's that's a

04:03really big difference and then I

04:05actually learned a lot and this was a

04:06long time ago and actually the state of

04:08the art hasn't really changed like BigFix

04:09is the state of the art and for a

04:11network that size you're looking at

04:13three four five day turnaround and by

04:15the end of the five days you know think

04:17of it you know like how many employees

04:18quit how many machines got thrown away

04:20how many machines got bought at Best Buy

04:22that week and the number is just

04:24completely out of whack

04:25and so a system like BigFix which was

04:27later bought by IBM correct was built in

04:31a world that didn't look anything like

04:33the world we operate within today that's

04:36exactly right

04:37yes I mean essentially if you look at

04:39those tools they were built 20 years ago

04:40to solve a completely different problem

04:42the problem was that there were these

04:44untargeted attacks like slammer and

04:46blaster that were affecting

04:48vulnerabilities that were patched and you

04:50just needed to get the patches out

04:52because no one was being individually

04:53targeted no organization was getting

04:56special attacks that were tailored to

04:57them or very few today what our

05:00customers are worried about are

05:02professional organizations or

05:03nation-states that are not only

05:05attacking just them specifically they're

05:08often QAing their attacks that they're

05:10gonna do against the solutions that

05:12those organizations have deployed so

05:14they're as sophisticated as you could

05:16possibly get and what we're seeing is

05:17you're trying to essentially screw in

05:19screws with a hammer right right

05:21that was a tool developed for a

05:22completely different problem and now

05:23you're trying to apply it to this and

05:25antivirus falls in that category

05:26firewalls fall in that category you know

05:29they just are not effective against the

05:31insider threat and advanced persistent

05:33threat that our customers are facing you

05:35know today what we what we what

05:37customers are seeing is you know they've

05:39got a bunch of tools that the systems

05:40management people use to to inventory

05:42and maintain and deploy patches and

05:44updates and software to machines and

05:46monitor performance and then a bunch of

05:48tools the security people do and both of

05:50those have state-of-the-art the security

05:52people are building taller and taller

05:53walls thicker and thicker walls and

05:56trying to close off the doors and things

05:57like that and the management people are

06:00just trying to keep track of what's

06:02going on but what's happening is now

06:04what when you you're hit with an attack

06:06it's usually through a sequence of very

06:08benign things that look ok to the

06:11network like somebody logging in

06:13somebody reading a file

06:14somebody you know installing a piece of

06:16software it looks pretty benign until

06:19something bad happens in which case then

06:20it just looks like a flaky PC mm-hm and

06:23so what's happening is the systems

06:24management people now are sort of the

06:26frontline of when vulnerabilities happen

06:28but they use a different set of tools

06:30and their tools like it takes like 2

06:32weeks to figure something out and then

06:33the security people it takes like 3

06:35weeks before that particular one

06:38a pattern that they can go find out and

06:40so where Tanium really comes in is

06:42now you're just talking about looking at

06:44a network of hundreds of thousands of

06:45nodes and being able to ask anything you

06:48want of that network and get an answer

06:49back instantly so I want to I want to

06:53describe Steven if you can when Tanium

06:56when Orion and his team came in to demo

06:59this you obviously have vast experience

07:02in the advanced on the table but what

07:07happened what did you see and then what

07:08was the reaction

07:09well it's fast you know a lot of folks

07:11here at Andreessen Horowitz are coming

07:13from a very deep operational background

07:15many of them were members of the Opsware

07:16team and of course Marc and Ben

07:17created the company and so you're

07:20basically sitting around the table and

07:21we think that there's like 200 odd years

07:24of large-scale enterprise software

07:25management experience and so Orion comes

07:28in pops open his laptop go opens up a

07:30browser hits a bookmark and starts

07:33typing in questions like show me all the

07:34machine names on the network show me how

07:37many machines are leaking network

07:38leaking a network packet data show me

07:41the md5 hash of all the processes of all

07:43the machines running on all of this

07:45network and we kind of were all thinking

07:47independently we only realized this

07:48after that oh that's a pretty neat

07:50little mock-up of the product that if we

07:52fund them they hope to go build because

07:55there was no conceivable way that he

07:57could be doing this to a network and

07:59then he I think he kind of looked

08:01puzzled you look cases so this is a live

08:03network of of several thousand nodes

08:06running at a HIPAA compliant hospital

08:09like this is this is a real system

08:12running and they're in production this

08:14is not a test it's not a mock-up it's

08:16not a simulation and we just were all

08:18scratching our heads and we literally

08:19couldn't believe that what he was asking

08:21was really possible it was fast actually

08:24like and then we started talking more

08:26and so then I started trying to play

08:27like let's stump Orion with questions

08:29about about networks and so my favorite

08:31vulnerability from the Windows side was

08:33always the plug in a USB memory stick

08:36and have a virus you know scamper across

08:38the network so I I crossed my arms and I

08:40leaned back and I said I want to know on

08:43this network how many of the pcs have a

08:46USB memory stick in them and are

08:48currently writing to it and I thought

08:50that's

08:51like a joke and literally in 15 seconds

08:54was the list of the machines that were

08:55currently writing to USB memory on the

08:57network and and Orion again this

09:0015-second response time I mean what's at

09:03stake in those 15 seconds and how under

09:06the hood to the excel you can explain to

09:09us what's happening sure so you know

09:12really the fundamental problem with

09:14existing solutions that we see is that

09:16when you're querying them you're just

09:17querying a database and the database is

09:20being filled on the back end by clients

09:22that are polling once every few hours or

09:23a few days what we're actually doing is

09:26reaching out and touching every endpoint

09:27so synchronously because you're asking a

09:29question we're actually asking them a

09:31question they evaluate a piece of data

09:33and then they come back to you with it

09:35and we've created this topology the ring

09:37architecture that we have which

09:38essentially allows clients endpoints to

09:41aggregate data on the LAN before they

09:44send back one answer back from the whole

09:46LAN that represents all of the machines

09:49that are out there so if you think about

09:50a branch environment you think about the

09:52largest scale retail environments or

09:54banking environments they may have

09:55thousands of different LANs in each

09:58branch and then they may have thousands

10:00of machines that are in these you know

10:03core environments and what we're

10:04essentially doing is automatically

10:06constructing these groups based on

10:09proximity of the machines so that they

10:11know they can talk to each other because

10:13they're close to each other and

10:14essentially we've created a mechanism

10:16where they can automatically aggregate

10:18data when a question is asked and send

10:20back one message across the LAN that

10:23contains all the data that you were

10:24asking for so really efficient on the

10:26way in and you don't need to have a ton

10:28of infrastructure to back it up so to

10:29Stephens point where you know oftentimes

10:32you guys are asking queries and in

10:34current systems and and things slow way

10:37down I can't get anything done what's

10:39the sort of the load on the system when

10:41when queries are coming through Tanium

10:43yeah 0.1% CPU it's 0.1% CPU on a tiny

10:47little runtime that's what like a mega

10:49mega installer seven Meg of RAM 10 Meg

10:52on disk so the beauty of this whole

10:54thing is it's so lightweight you can

10:55actually put it in VMs we have people

10:58who are putting it on process

10:59controllers and ATMs and point of sale

11:01devices and the heart rate monitors I

11:03mean devices that a lot of people wish

11:05were not computers so you look at the

11:07Target attack you know a lot of people

11:09don't think of a point-of-sale device as

11:10a computer or didn't until they realize

11:12it's an existential threat to their

11:14organization not to right and so in

11:17these massive systems and Steven oh I

11:19want you to answer this like what

11:21happens if I don't know what questions

11:22to ask I mean like when things look

11:24benign they look benign how do I know

11:27how do I sort of uncover well yeah this

11:30this is what's really going on in the in

11:32the world of systems management and

11:34security right now your your what's

11:36happening is you've got all of these

11:37tools sort of firing off like this might

11:39be a problem this might be a problem we

11:41call these IOCs these indicators of

11:42compromise right and and if you look at

11:45one of them and you know if you're an IT

11:47pros of these if you look at them it's a

11:49series of you're looking for this file

11:51for this process this Windows registry

11:53key change you know this this little

11:55piece of software ended up on a Mac and

11:57you actually have all of these and the

12:00problem is that today and that's also

12:02the collective knowledge of the Internet

12:03like like everybody is contributing to

12:06these there's feeds of IOCs you could

12:07subscribe to them but you can't do

12:10anything about it because all you could

12:12do is query the databases Orion

12:13mentioned and find out if maybe three

12:15weeks ago I might have had a machine

12:16that met that pattern and and of course

12:19none of these attacks last even that

12:20long and so what what you can do now is

12:23you can actually use these as inputs in

12:25to Tanium and actually answer

12:28them and that's sort of the the

12:29fundamental thing I mean you really have

12:31to get your head and one of the reasons

12:32that we get so excited about Tanium as

12:34an investment for for andreessen

12:36horowitz was that these guys invented an

12:38incredibly cool technology and and the

12:41way that they deployed it it you know we

12:43mesh networking has been around for a

12:45long time peer-to-peer networking around

12:47for a long time and most of those hadn't

12:49really reached any any critical uses in

12:52the enterprise and they have this

12:53amazing insight that if you could walk

12:55up to any computer knowing the answer to

12:58one of these questions is sort of like a

12:59constant time operation is the registry

13:01key there or not that takes no time the

13:04problem was if you had two hundred

13:05thousand nodes and you had to

13:07hub-and-spoke ask them all of that

13:09question you'll never finish whereas if

13:11you can just get a message broadcast to

13:13all of them answer this question on your

13:15own

13:15and then just share the answer with the

13:17computer next to you it turns out you

13:19can actually do that instantly and and

13:22so it's sort of to me one of the very

13:23first commercial and commercially viable

13:26uses of mesh and peer-to-peer networking

13:29and that and I don't want to use those

13:30terms because they get a little bit

13:31loaded but they actually invented a

13:33specific implementation they called

13:35linear peer-to-peer networking and you

13:36could read more about it on the blog

13:37post that talks about you know the cool

13:39way that they brought these technologies

13:41together so give us an example um

13:43heartbleed was something that caused

13:45many hearts to bleed how did something

13:48like that in your environment you know

13:51crop up and then get handled yeah so

13:53it's actually a really interesting

13:54question our customers literally could

13:56ask an english-language question tell me

13:58all the machines and versions of OpenSSL

14:00for every machine that's got OpenSSL

14:02across the environment and get an answer

14:04back in 15 seconds they didn't have to

14:06create a script they didn't have to wait

14:08days and we still are getting notices

14:10from companies that they're realizing

14:12that they're affected by heartbleed

14:13right it's been a long time since that

14:17first announcement came out and yet

14:18people have tools that are so broken

14:20they can't tell that for months

14:22potentially and what Tanium customers

14:25were able to do is literally ask you a

14:26question in English and be able to see

14:28exactly where they were affected and

14:30then you know another thing that Tanium

14:31can do is allow you to actually fix

14:32things so quarantine machines turn

14:35firewalls on stop services that were

14:37affected I mean our customers were in

14:39full triage mode and they weren't

14:41looking at cycles of triage that took

14:43weeks or months they could do it in

14:45seconds and confirm that they were

14:46actually doing what they did intended

14:47and see where the gaps were yeah that

14:49that was sort of one of the most

14:51fascinating things that we learned and

14:52looking at Tanium was that when

14:55heartbleed happens and you know the most

14:57obvious first question that every every

14:59CEO all of a sudden needed to do was

15:01please you know IT folks tell me how

15:03many machines we have affected and if

15:05you remember what was happening at that

15:06moment companies were issuing press

15:07releases about about heartbleed and they

15:09were saying we are looking into it and

15:12you're like well that's not a really

15:13good answer for like and it turns out

15:15for days most companies had no idea

15:18whether they were affected now now Orion

15:20told us well there are all their

15:21customers realized they were all

15:22affected like everybody and you know you

15:24think of a large enterprise you've got

15:26code written by vendors you're running

15:27things

15:28offsite you have branch offices that

15:30might be using a different product that

15:31you don't know about in the central

15:33office and so really those press

15:35releases were accurate they didn't know

15:37and so you put what but it's an

15:40existential thing you know every CEO of

15:44every major corporation is now

15:45effectively an IT person because every

15:48company in software eats the world is

15:49really a software company so I want to

15:51get that like how you know this problem

15:54that you've been working on for years

15:55now you know with BigFix and now at

15:57Tanium how does that describe the

15:59architecture that you've you going

16:00forward what does the future look like

16:01and I want both of you to answer this

16:03and and and how do then we tackle these

16:06things as scale gets bigger and bigger

16:07and more and more complex so that's the

16:10scary thing right people think hundreds

16:11of thousands of machines is a lot it's

16:14not gonna be a lot coming soon right IoT

16:17and people going in embedding chips and

16:20lightbulbs means we need to be scaling

16:22to potentially billions of devices and

16:24being able to assess them for telemetry

16:26and state and you know the hub-and-spoke

16:28model we already know is broken at

16:30hundreds of thousands we don't even want

16:32to talk about millions or billions right

16:34we need a fundamentally new architecture

16:36and so what we see is the possibility to

16:39embed this you know ring and linear

16:41peer-to-peer communications model into a

16:44myriad set of devices that some of which

16:46are going to be very lightweight right

16:47we're looking at watches and light bulbs

16:49some of which are going to be very heavy

16:51like servers and have a you know a

16:53language that all of these devices that

16:55have computability and that also have

16:57telemetry data on them should be able to

16:59speak with each other so that they can

17:01gather data about you know heat and

17:03power and location and you know more you

17:08know complicated things like which

17:09applications are running and what the

17:12workloads are and be able to aggregate

17:13those in real time and we believe that

17:15essentially fundamentally if you don't

17:17have real time data you're basically

17:18always playing whack-a-mole right if

17:21it's really really old data if it's days

17:23or weeks old data it's probably

17:25completely useless and if it's even

17:27minutes old data

17:28you're subtly wrong and what we believe

17:31is that all data is gonna have to move

17:32toward real time and we believe that

17:33that's possible with us yeah I mean

17:36that's fundamentally what's so exciting

17:38about the future of Tanium is

17:40that they've developed an innovative

17:41architecture and and a really creative

17:45and inventive approach to how you can

17:47really scale in a unique way and it's

17:49super clear that down the road that when

17:52you have a billion a billion devices or

17:54endpoints that they're all still gonna

17:55that they're gonna be near each other in

17:57these clusters and so that communication

17:59technique and that you know that's so

18:00different than hub-and-spoke is a huge

18:02asset going forward the last thing I

18:04just wanted to mention that's super cool

18:06about the product is it's effectively

18:07one giant API and so although you can go

18:10as a browser and go and and access it

18:13through this natural language interface

18:14you can also just use the API build your

18:17own model for how you want to ask

18:18questions of the network and model them

18:20and deploy tools and and charts and

18:22graphs and dashboards that are

18:24constantly and in real-time monitoring

18:26your network well Orion thanks for coming

18:29by Steven thanks as always awesome thank

18:31you thank you
