How to Become an SRE in 2024 | The 3rd Highest Paying Tech Role in the US 👩🏾‍💻

Adama Talks Tech | 2024-01-07



💫 Short Summary

Site Reliability Engineering (SRE) was one of the highest paying tech roles in 2023, and SREs are responsible for ensuring the reliability of platforms and applications in production. The role is built around six principles: reliability first, automation, monitoring and alerting, embracing risk, the service level model, and collaboration. SREs can come from various backgrounds such as software engineering, DevOps, cloud architecture, and network and security engineering. Transitioning into an SRE role involves acquiring knowledge and experience in SRE principles and seeking out job opportunities aligned with one's skills and background. The AI revolution also creates opportunities for SREs in ensuring the reliability of AI platforms.

✨ Highlights

📊 Transcript

✦

Site Reliability Engineers (SREs) are responsible for ensuring that platforms, websites, and applications remain reliable once they are in production.

00:00

SREs play a crucial role in ensuring that end users have the experience they expect when using a digital platform.

Reliability is important in various sectors, such as medical applications where system failure could have catastrophic consequences.

The skills of an SRE revolve around prioritizing reliability, automation, monitoring, alerting, embracing risk, managing the service level model, and collaboration within teams.
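To make "embracing risk" and the service level model concrete, here is a minimal Python sketch of an error budget. It assumes a simple availability SLI (successful requests divided by total requests) measured over one window; the function name, parameters, and numbers are hypothetical illustrations and do not come from the video.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compare an availability SLI with an SLO target for one window.

    slo_target is a fraction such as 0.99 (99% of requests should succeed).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.99")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: the measured fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1000 requests, 5 failures, 99% target.
    print(error_budget(slo_target=0.99, total_requests=1000, failed_requests=5))
```

Because the target is below 100%, some failures are explicitly allowed. A positive remaining budget leaves room for riskier changes, while a negative one is a signal to slow down and focus on reliability.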

✦

The key skills for an SRE include expertise in SLOs and SLIs, monitoring and alerting, data-driven decisions, cloud architecture, reliable systems, and automation.

02:02

SREs need to be experts in areas like SLOs and SLIs, monitoring and alerting, data-driven decisions, cloud architecture, reliable systems, and automation.

They also require peripheral knowledge and experience in areas such as networking and application testing, without needing to be experts in these fields.
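The transcript adds that SREs who write automation scripts in Python should be able to test their own code. As a rough sketch of that idea, the hypothetical script below picks log files that are old enough to delete (a typical piece of toil) and includes a small test. The file names, dates, and 7-day retention period are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def files_to_delete(files: dict[str, datetime], now: datetime, max_age_days: int) -> list[str]:
    """Return the names of files last modified more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, modified in files.items() if modified < cutoff)


def test_files_to_delete() -> None:
    """A file older than the cutoff is selected; a recent one is kept."""
    now = datetime(2024, 1, 8)
    files = {
        "old.log": datetime(2023, 12, 1),
        "new.log": datetime(2024, 1, 7),
    }
    assert files_to_delete(files, now, max_age_days=7) == ["old.log"]


if __name__ == "__main__":
    test_files_to_delete()
    print("tests passed")
```

Keeping the selection logic in a pure function (no file system access) is a design choice that makes the script easy to test before it ever touches production.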

✦

Acquiring knowledge in key SRE topics can be done through resources like the Google SRE workbook, Linux Foundation training, and the 'Becoming an SRE' course.

05:24

There are few comprehensive SRE courses due to the diverse nature of the role.

The 'Becoming an SRE' course covers fundamental topics, offers different levels of intensity, includes projects to build a portfolio, and provides a career development pack for job preparation.

Other resources like the Google SRE workbook and Linux Foundation training can also help in learning SRE principles and skills.

✦

SREs come from a range of backgrounds, and the transition into an SRE role can be smooth for those in jobs like DevOps engineers, software engineers, cloud architects, second-line support, and network/security engineers.

07:28

For software engineers transitioning into SRE, they can leverage their programming skills, understanding of application design, and logging to support their move into the SRE role.

To progress further, they need to acquire knowledge of SRE fundamentals, cloud architecture, automation, and infrastructure as code.

It's important to identify SRE jobs that align with your skills and background by looking for specific skills listed in the job descriptions.
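The logging and error-handling habits that software engineers bring with them can be illustrated with a short Python sketch using the standard `logging` module. The logger name, the `fetch_record` function, and the in-memory `store` are hypothetical, loosely modeled on the video's medical application example.

```python
import logging

# Hypothetical logger name for an imaginary patient-records service.
logger = logging.getLogger("patient_records")


def fetch_record(store: dict, patient_id: str):
    """Look up a patient record, logging the outcome for whoever supports the system."""
    try:
        record = store[patient_id]
    except KeyError:
        # Log enough context to diagnose the failure without guessing.
        logger.error("record lookup failed patient_id=%s", patient_id)
        return None
    logger.info("record lookup ok patient_id=%s", patient_id)
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    demo_store = {"p-001": {"name": "Example Patient"}}
    fetch_record(demo_store, "p-001")
    fetch_record(demo_store, "p-404")
```

Log lines like these are what monitoring and alerting are later built on, which is one reason this background transfers well to SRE work.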

✦

AI and machine learning have not replaced the need for SREs; in fact, AI platforms also require reliability engineers to ensure their performance.

11:04

Companies like OpenAI and Anthropic, which are heavily involved in AI, still require SREs to maintain the reliability of their platforms.

There is no need to panic about AI taking over SRE roles, as there is still a demand for SREs in the tech industry.

00:00 SRE was one of the highest paying Tech

00:01 roles in 2023 and actually ranked third

00:04 on Forbes' list but it's also one of the

00:06 most dynamic interesting and

00:07 transferable jobs in the tech space it

00:10 also happens to be one of the most

00:11 challenging but if that all sounds good

00:12 to you and you're interested in

00:13 transitioning into SRE you want to know

00:16 what it takes to become an SRE then stay

00:18 tuned hey what's up I'm Adama if you're

00:20 new here and if you're returning then

00:21 you know what's good I'm an SRE based in

00:24 London and in this video I want to get

00:26 into a bit of the nitty-gritty about how

00:28 to transition from a role in tech or

00:30 from a different type of space into the

00:32 SRE role I feel like the SRE role is a

00:34 little bit mystical like what do SREs

00:36 actually do what kind of skill should I

00:38 be checking off if I want to be an SRE

00:40 and we're going to get into that today

00:41 but I'm also going to get into what kind

00:43 of Pathways you can follow if say you're

00:45 a software engineer looking to be an SRE

00:47 or DevOps engineer and actually how you

00:49 can search for jobs that you stand a

00:51 good chance of getting like SRE roles

00:53 that lean into your skills and then

00:55 finally why the AI Revolution actually

00:57 opens up a big opportunity for SREs

01:00 let's just get into it so what is site

01:01 reliability engineering and why does it

01:03 matter well site reliability engineers

01:05 or SREs are responsible for making sure

01:07 that platforms websites applications

01:10 once they're in production that they

01:11 remain reliable that the end user gets

01:14 the experience they expect why does that

01:16 matter well consider this a medical

01:17 application that doctors or medical

01:20 staff use to access patient data maybe

01:22 even their images from scans and things

01:24 like that now imagine a doctor goes to

01:26 use the application to treat a patient

01:28 perhaps one that's in dire need and they

01:29 can't access the site they can't access

01:31 the application it's not responding as

01:32 expected or even worse maybe it's

01:34 serving up the wrong data Maybe it's

01:36 delivering data about a different

01:37 patient you can see how the consequences

01:39 of that unreliable system could actually

01:41 be catastrophic and that's just one

01:43 example the impact of an unreliable

01:44 system can be far spread from customer

01:47 dissatisfaction and loss of Revenue to

01:49 actually compliance issues and your own

01:51 employees basically getting sick and

01:52 tired like If all we're doing is putting

01:54 out fires because our system is always

01:56 failing right it's unreliable then

01:58 where's the time for innovation and

02:00 design and moving forward so now we see

02:02 why reliability is important and why

02:03 having an engineer dedicated to it makes

02:06 sense but then what kind of skills does

02:07 the SRE have like how does that

02:09 translate into preparing for a job well

02:11 the key skills of an SRE actually center

02:13 around the principles of site

02:14 reliability engineering and they're as

02:16 follows reliability first this is the

02:18 idea that reliability is the most

02:20 important feature of your platform like

02:21 it doesn't matter how many nice things

02:23 that you can layer on top of it how many

02:25 updates if the system is unreliable

02:27 you're building on really weak

02:28 foundations the second is automation the

02:30 idea that the SRE should be automating

02:32 away toil that is manual tasks that take

02:34 away from precious engineering time

02:36 right so putting things in place so that

02:38 we can spend more time on the things

02:39 that are going to be Innovative and move

02:40 us forward next is monitoring and

02:42 alerting this is all about seeing into

02:44 your systems right if you want to ensure

02:46 that the system is reliable and that you

02:47 don't have failures and outages we need

02:49 to be able to see what's going on and be

02:51 able to alert effectively ideally

02:53 automated alerting so when something's

02:55 going wrong we get an alert sent to us

02:56 as SREs and we can check it out and

02:58 potentially before the end user even

03:00 knows anything has happened so the next

03:01 one may be a bit surprising but it's

03:02 actually about embracing risk you're

03:04 like hm but I thought we was trying to

03:05 be reliable the aim is not 100%

03:07 reliability we're not trying to foster a

03:09 culture where everyone's scared to touch

03:10 everything right and we never move

03:12 forward because if we press this one

03:13 thing we might break everything or we

03:15 might break one thing and nobody wants

03:17 to do that we want to give enough room

03:19 so that people devs all types of

03:20 engineers want to push boundaries and

03:22 move the application or platform forward

03:24 but so we can do it in a way that is

03:26 safe if something goes wrong we know how

03:27 to do roll backs and things like that to

03:29 bring the system back and ultimately

03:31 it's more in a controlled system there's

03:33 also the principle of the service level

03:34 model so this is the way that we manage

03:36 and monitor our systems in the service

03:38 level so SLOs and SLIs and finally it's

03:40 about collaboration SREs don't work

03:42 independently we work as part of teams

03:44 and that's an important thing a key

03:46 principle in SRE okay so how does that

03:49 translate into the skills that are

03:50 needed to be an SRE cuz I've said a lot

03:52 of things and you're thinking that's a

03:53 lot of things to check off the thing is

03:55 you don't have to be an expert in

03:56 everything there are some areas that

03:58 SREs are expected to be the subject

04:00 matter expert and there are other things

04:02 that we need kind of peripheral

04:04 knowledge of experience of we need to

04:05 know how to recognize them but I don't

04:07 need to know how to design something

04:08 from scratch and here's what I mean if

04:10 we take a look at this table which

04:11 splits things into subject matter expert

04:13 and things that you need experience and

04:15 knowledge of we can see that you're

04:16 expected to really know the

04:17 nitty-gritties of things like SLOs and

04:19 SLIs of monitoring and alerting of data

04:22 driven decisions things like

04:23 architecting in the cloud reliable

04:25 systems and things like automation you

04:27 should have a firm grip on these sorts

04:29 of things but there's other things that

04:31 you need knowledge of and experience of

04:33 but you might not need to be an expert

04:34 like I don't need to be a networking

04:36 engineer to be an SRE right so it's in

04:37 the experience side but I do need to

04:39 have knowledge of networking because if

04:42 there's a networking issue if that's why

04:43 my system is unreliable I need to know

04:45 how to at least identify that maybe do

04:47 some sort of work but then obviously

04:49 seek help with where necessary from the

04:51 appropriate Engineers who are also in

04:52 the team also in the wider company to

04:54 support a similar thing is like

04:55 application testing I don't need to be

04:57 able to write the most elaborate test

04:58 for Java applications Python

05:00 applications all of these things but I

05:01 need to understand the job of tests and

05:04 how they fit into the pipeline and

05:05 how they fit into the software

05:06 development life cycle right and when

05:08 necessary I need to be able to test my

05:10 own code so if I'm writing automation

05:12 scripts in Python I do need to know how

05:14 to write tests at least to the standard

05:16 that we expect to check that that works

05:18 otherwise we're just Flying Blind

05:19 basically and hoping that it works all

05:21 the time but how do you acquire then

05:22 some of this knowledge especially those

05:24 key subject matter expert topics well

05:27 it's kind of up to you there is not many

05:29 all encompassing SRE courses out there

05:31 just because of how diverse and Broad

05:33 the role is right but you can take a

05:36 list like this and go through and find

05:38 resources that are aligned with it for

05:39 example the Google SRE workbook is a

05:42 great resource for learning SRE

05:43 principles or you may look at something

05:45 like the Linux Foundation for some of

05:46 their training and certifications but in

05:48 case you do want a one-stop shop for

05:50 everything I have my Becoming an SRE

05:51 course that is out on the 8th of January

05:53 2024 so here we go through all the

05:55 fundamental topics there's varying

05:57 levels of intensity for the things that

05:59 are subject matter expert you dive very deep

06:01 into things that are more peripheral you

06:03 know we get to grips with understanding

06:05 but we don't want to waste time we want

06:07 to get to the really core of the SRE and

06:09 what it takes for us to get a job and

06:11 execute that job within the first few

06:12 weeks months and years of our career so

06:14 the course has things from theory to demos

06:16 to quizzes to make sure that you

06:17 understand what you've been learning

06:18 right in all of these modules but it

06:20 also has projects projects to support

06:22 the learning and actually build out a

06:24 portfolio of these skills right so that

06:25 you can store them in places like GitHub

06:27 and link to them when you're applying

06:28 for jobs and after all of that there's

06:30 the career development pack where we

06:32 dive deep into how do you actually get

06:33 the SRE job right there's things like

06:35 the skills tracker and the application

06:37 tracker but also how do you construct a

06:39 CV for an SRE role right if you're coming

06:41 from a completely different space or how

06:44 do you actually find jobs that align

06:45 with what you do and what your

06:47 skills are and your background how do

06:48 you then interview what kind of

06:50 interview Styles and questions can you

06:51 expect all of that is in the course and

06:53 if you want more information on that

06:54 check the description of this video but

06:56 anyway back to it so how do you

06:57 transition into an SRE role depending on

06:59 where you are now because we know that

07:01 SREs come from a range of backgrounds and

07:03 bring their skills from their previous

07:04 jobs with them I just want to make it

07:06 clear that some jobs lend themselves

07:07 very well to SRE that the transition can

07:10 be very smooth things like DevOps

07:11 Engineers software Engineers Cloud

07:14 Architects second line support even

07:16 Network and security Engineers because

07:18 of the broad aspect of SRE you can bring

07:21 your specialties you can bring your area

07:22 of expertise into the SRE role so you

07:25 have your foundations and you start to

07:27 layer on top so let's quickly go over

07:28 the software engineer path just to

07:30 illustrate how this would work as a

07:31 software engineer you are going to be

07:33 well versed in things like programming

07:35 languages whatever you have been

07:36 programming in whatever you've been

07:37 building applications in and supporting

07:39 you're going to have a really good grasp

07:41 on that which means things like the

07:42 automation element in terms of writing

07:44 scripts to get things done you should be

07:46 able to do that sort of thing a lot

07:47 easier than somebody else because you

07:49 understand how to turn problems into

07:51 code already codification of problems

07:53 you're also going to have an

07:54 understanding of application design and

07:56 to an extent logging right like error

07:57 handling and how to log effectively

07:59 and appropriately for those who are

08:01 going to be supporting it these are all

08:02 very strong things to bring in when you

08:04 are applying and thinking about the SRE

08:06 position but where do you go next like

08:07 what is the next layer to add well now

08:08 you want to start thinking about some of

08:09 these SRE fundamentals and principles

08:12 right you want to start understanding

08:13 SLOs SLIs and that model you want to

08:15 start understanding observability and

08:17 alerting right monitoring and alerting

08:19 how can we build out systems like this

08:21 that are functional and useful after

08:23 that you may want to add the next layer

08:24 which will be things around cloud and

08:26 automation right do you understand how

08:28 to build a reliable system in the cloud or at

08:30 least support one right so if you're

08:32 going to be working in an AWS

08:33 environment you need to be in touch with

08:34 these AWS Services understand how they

08:37 work and understand how to build

08:38 reliability into the way that we use

08:40 them and the way that we execute with

08:41 them if you're not familiar with things

08:43 like infrastructure as code then you're also

08:45 going to start to layer these things on

08:46 in CI/CD so that's how the transition may

08:48 work and finally kind of throughout this

08:51 you want to start thinking about the

08:53 principles the ideas and the attitudes

08:55 towards SRE like the end-user focus of

08:57 everything that we do right these data

08:59 driven decisions and things like that

09:01 but you can see how you're not starting

09:02 from scratch as a software engineer or

09:04 software developer or a DevOps engineer

09:06 in another example or second line

09:07 support like you're bringing those

09:09 skills with you so I did promise that I

09:11 would touch on how you identify these

09:12 jobs and the jobs that are aligned with

09:14 you well you want to be looking out for

09:16 SRE jobs where the job description lists

09:19 things that you know that you are

09:20 skilled in let's go back to that

09:21 software engineering example if you are

09:24 looking at a job description for an SRE

09:26 role and there's an emphasis on

09:27 programming right and

09:29 understanding um code and understanding

09:32 applications in terms of the larger

09:33 scale and their design then you might

09:35 start to think I probably have a

09:37 competitive advantage over somebody who

09:39 may be from a DevOps background and you

09:40 know may not have spent that much time in

09:42 application code right so you will want

09:44 to put yourself forward for things like

09:45 that whereas if you are from the DevOps

09:47 background and you start seeing job

09:48 descriptions and there's a heavy

09:49 emphasis on CI/CD or there's a heavy

09:52 emphasis on infrastructure as code and

09:54 Terraform or even Linux then you're

09:56 going to start to think that is where I

09:58 am well placed right that is where my

10:00 odds are higher well the odds are in my

10:02 favor in an SRE role like that and

10:04 because the role can change so much from

10:06 place to place and companies will Define

10:08 what they mean by SRE this is why it's

10:10 so important to look at the job

10:11 description when you're applying instead

10:12 of just blindly applying for SRE roles

10:14 one caveat that I did want to include here

10:16 before you start thinking about how you

10:17 get your first SRE job is if you are in

10:20 a role right if you're in a company and

10:22 you're aligned with a tech department or

10:23 you even have access to it so maybe

10:25 you're not an engineer but you work

10:26 within this large organization or in a

10:28 tech company

10:29 start making connections with the SRE

10:31 and platform teams and even the DevOps

10:33 team if there isn't an SRE team from

10:35 early on right start thinking about the

10:37 ways that you can bring SRE principles

10:38 into the work that you do and how you

10:40 can maybe take on tasks and work from the

10:43 SRE or the platform team right could you

10:45 ask to be involved in some of the

10:46 tickets could you bring some of the

10:47 knowledge that you have from your

10:48 current role into what they're doing

10:50 right offering support that way you

10:52 start to build up experience that you

10:54 can put on things like your CVs for when

10:56 you're applying for full on SRE roles or

10:58 you you can even make the case to your

11:00 company that you would like to

11:01 transition into the SRE position so

11:03 finally let's wrap this up by talking

11:04 about SRE and AI because I know there's

11:06 a lot of fear in like the tech market

11:08 like are we all going to be replaced we

11:09 don't know what's going to happen

11:10 realistically in 10 years right but what

11:12 we do know is that the SRE role is still

11:14 in demand and actually there is synergy

11:17 happening between AI and the development

11:19 and the increase in adoption and the SRE

11:21 role AI platforms also need to be

11:23 reliable which means they also need SREs

11:26 take a look at this SRE role here you

11:27 know where that's from OpenAI the

11:29 creators of ChatGPT one of the most

11:31 popular chatbots that are in existence

11:34 they need SREs they need reliability

11:36 engineers or this one here that is from

11:38 Anthropic another AI first company who

11:41 also need SREs so you can actually meld

11:44 or like start to join your interest in

11:45 AI and machine learning if that's

11:47 something that is of interest to you to

11:49 your career like it doesn't have to be

11:50 an either or you don't have to panic

11:52 well not yet anyway about AI coming to

11:54 take your job despite the turbulence of

11:56 2023 it's an exciting time in Tech and

11:58 it's still an exciting time to be an SRE

12:00 and transitioning into the role I will

12:02 go more in depth about the application

12:04 process even things like interviews and

12:05 how to prep in another video but for now

12:08 thank you for watching and I will see

12:09 you in the next one

💫 FAQs about This YouTube Video

1. What is the role of a Site Reliability Engineer (SRE) and why is it important?

A Site Reliability Engineer (SRE) is responsible for ensuring that platforms, websites, and applications remain reliable once they are in production, and the end user gets the experience they expect. This role is important because the impact of an unreliable system can be catastrophic, leading to customer dissatisfaction, loss of revenue, and even compliance issues.

2. What are the key skills of a Site Reliability Engineer (SRE)?

The key skills of a Site Reliability Engineer (SRE) revolve around the principles of site reliability engineering, including a focus on reliability, automation, monitoring and alerting, embracing risk, service level objectives (SLOs) and service level indicators (SLIs), and collaboration.
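As one illustration of the monitoring and alerting skill, the Python sketch below decides whether to raise an alert from per-minute request and error counts. The function name, the window of samples, and the 2% threshold are assumptions for the example; real alerting is normally configured in a monitoring tool rather than hand-written like this.

```python
def should_alert(error_counts: list[int], request_counts: list[int], threshold: float) -> bool:
    """Return True when the error rate across the window exceeds the threshold.

    error_counts and request_counts hold one sample per interval (e.g. per minute).
    """
    if len(error_counts) != len(request_counts):
        raise ValueError("error_counts and request_counts must have the same length")

    total_requests = sum(request_counts)
    if total_requests == 0:
        # No traffic in the window, so there is no error rate to alert on.
        return False

    error_rate = sum(error_counts) / total_requests
    return error_rate > threshold


if __name__ == "__main__":
    # Hypothetical three-minute window with a 2% error-rate threshold.
    print(should_alert([0, 5, 10], [100, 100, 100], threshold=0.02))
```

Evaluating the rate over a window rather than a single sample is a deliberate choice here: it reduces noisy alerts from one-off failures while still catching a sustained problem, ideally before end users notice.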

3. How can someone transition into a Site Reliability Engineer (SRE) role?

Transitioning into a Site Reliability Engineer (SRE) role can be done from various backgrounds such as software engineering, DevOps engineering, cloud architecture, and network and security engineering. Building skills in areas such as automation, monitoring, and cloud reliability can help in transitioning to an SRE role.

4. What is the relationship between AI and the role of a Site Reliability Engineer (SRE)?

The increasing adoption of AI technology has created a demand for Site Reliability Engineers (SREs) to ensure the reliability of AI platforms. SREs play a crucial role in supporting the reliability of AI systems, showcasing the continued importance and relevance of the SRE role in the context of AI.

5. Where can one acquire the necessary skills to become a Site Reliability Engineer (SRE)?

Skills to become a Site Reliability Engineer (SRE) can be acquired through resources like the Google SRE workbook, training and certifications from organizations like the Linux Foundation, and specialized courses focusing on SRE principles and practices. Additionally, gaining hands-on experience in areas such as automation, monitoring, and cloud architecture is valuable for skill development.
