Reliability Enablers
Ash Patel & Sebastian Vietz
read.srepath.com
Recent Episodes
You (and AI) can't automate reliability away
What if the hardest part of reliability has nothing to do with tooling or automation? Jennifer Petoff explains why real reliability comes from the human workflows wrapped around the engineering…
#67 Why the SRE Book Fails Most Orgs — Lessons from a Google Veteran
A new or growing SRE team. A copy of the book. A company that says it cares about reliability. What happens next? Usually… not much. In this episode, I sit down with Dave O’Connor, a 16-year Google…
#66 - Unpacking 2025 SRE Report’s Damning Findings
I know it’s already six months into 2025, but we recorded this almost three months ago. I’ve been busy with my foray into the world of tech consulting and training, and, well, editing these podcast…
#65 - In Critical Systems, 99.9% Isn’t Reliable — It’s a Liability
Most teams talk about reliability with a margin for error. “What’s our SLO? What’s our budget for failure?” But in the energy sector? There is no acceptable downtime. Not even a little. In this…
#64 - Using AI to Reduce Observability Costs
Exploring how to manage observability tool sprawl, reduce costs, and leverage AI to make smarter, data-driven decisions. It's been a hot minute since the last episode of the Reliability Enablers…
#63 - Does "Big Observability" Neglect Mobile?
Andrew Tunall is a product engineering leader focused on pushing the boundaries of reliability with a current focus on mobile observability. Using his experience from AWS and New Relic, he’s vocal…
#62 - Early YouTube SRE shares Modern Reliability Strategy
Andrew Fong’s take on engineering cuts through the usual role labels, urging teams to start with the problem they’re solving instead of locking into rigid job titles. He sees reliability,…
#61 Scott Moore on SRE, Performance Engineering, and More
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com
#60 How to NOT fail in Platform Engineering
Here’s what we covered:
Defining Platform Engineering
* Platform engineering: Building compelling internal products to help teams reuse capabilities with less coordination.
* Cloud computing connection:…
#59 Who handles monitoring in your team and how?
Why many copy Google’s monitoring team setup:
* Google’s Influence. Google played a key role in defining the concept of software reliability.
* Success in Reliability. Few can dispute Google’s ability…
#58 Fixing Monitoring's Bad Signal-to-Noise Ratio
Monitoring in the software engineering world continues to grapple with poor signal-to-noise ratios. It’s a challenge that’s been around since the beginning of software development and will persist…
#57 How Technical Leads Support Software Reliability
The question then condenses down to: Can technical leads support reliability work? Yes, they can! Anemari has been a technical lead for years — even spending a few years doing that at the coveted…
#56 Resolving DORA Metrics Mistakes
We're already well into 2024 and it’s sad that people still have enough fuel to complain about various aspects of their engineering life. DORA seems to be turning into one of those problem areas. Not…
#55 3 Uses for Monitoring Data Other Than Alerts and Dashboards
We’ll explore 3 use cases for monitoring data. They are:
* Analyzing long-term trends
* Comparing over time or experiment groups
* Conducting ad hoc retrospective analysis
Analyzing long-term trends …
#54 Becoming a Valuable Engineer Without Sacrificing Your Sanity
Shlomo Bielak is the Head of Engineering (Operational Excellence and Cloud) at Penn Interactive, an interactive gaming company. He’s dedicated much of his talk time at DevOps events to talking about a…
#53 What's Missing in Incident Response Processes?
Incident response is an increasingly difficult area for organizations. Many teams end up paying a lot of money for incident management solutions. However, issues remain because processes supporting…
Can ITIL Benefit from Site Reliability Engineering?
According to Vlad Ukis, many enterprises organize their IT functions around ITIL. What you use SRE for is something completely different. SRE is not for setting up the IT…
#52 Navigating Complexity within Incidents
Sonja Blignaut is a complexity expert. That might not sound relevant to incident response in reliability engineering. But it is! Our systems are becoming more complex and so are the resulting…
#51 Whitebox vs Blackbox Monitoring
Have you got complete monitoring of your software in effect? Are you sure? Google's SREs break monitoring down to white box versus black box monitoring. It's not the same as internal versus external…
#50 Making Better Sense of Observability Data
Jack Neely is a DevOps observability architect at Palo Alto Networks and has a few interesting ways of extracting value from o11y data.We crammed into just under 25 minutes ideas like these 7…
Frequently Asked Questions
Reliability Enablers has published 70 episodes since April 2023, covering topics in Business and Technology.
Reliability Enablers is currently dormant; it previously released new episodes weekly. Average episode length is 28m.