About This Podcast
Behind every reliable software system, there are people working hard to keep it online.
Humans of Reliability is a series that spotlights the engineers, leaders, and innovators at the heart of incident management and system reliability. Through candid conversations, we explore the challenges, lessons, and personal journeys of those navigating complex technical landscapes to ensure the systems we rely on run smoothly.
From unforgettable incident stories to favorite tools, workflows, and hobbies, Humans of Reliability uncovers the human side of technology—offering insights and inspiration for anyone passionate about building and maintaining resilient systems.
https://rootly.com/humans-of-reliability
Recent Episodes
Burnout Doesn't Ask Permission: Recognizing, Recovering, and Rebuilding w/ Stephen Townsend
Burnout doesn't announce itself. For Stephen Townsend, SRE team lead and host of the Slight Reliability podcast, it crept in over months of mounting pressure on a massive transformation program, and…
S2026E2 Code Is Cheap, Reliability Isn’t: Owning Production in the AI era w/ Swizec Teller
Code has never been easier to write. With AI copilots and agentic coding tools, spinning up features feels almost effortless. But production systems don’t run on vibes, they run on reliability. In…
S2026E1 Democratizing Reliability: Empowering Non-Devs with Dileshni Jayasinghe (commonsku)
Many companies don’t invest in incident management until something goes wrong. commonsku took a different path. In this episode of Humans of Reliability, Sylvain sits down with Dileshni Jayasinghe, VP…
S1E23 99%+ Accuracy on a Moving Target: Model Deprecation and Reliability with Tomás Hernando Koffman (Not Diamond)
Shipping systems powered by LLMs would be hard enough if the models stayed the same. But in reality, they don’t. Models get updated and deprecated at a pace traditional software wouldn’t. All while…
S1E22 The Reality of GenAI in Production with Eduardo Ordax (AWS)
GenAI demos are easy. Production is where everything breaks. In this episode, Eduardo Ordax, Principal GTM GenAI at AWS, breaks down what actually stops companies from shipping reliable AI systems,…
S1E21 It’s Never Different This Time: LLM Reliability Without the Hype with Julien Simon
In this episode, Julien Simon, longtime voice in the open-source ML world, reminds us that even in the era of GenAI, reliability fundamentals haven’t changed. Julien breaks down why calling “the same…
S1E20 You Can’t Fix What You Don’t Measure: Observability in the Age of AI with Conor Bronsdon
Only 50% of companies monitor their ML systems. Building observability for AI is not simple: it goes beyond 200 OK pings. In this episode, Sylvain Kalache sits down with Conor Bronsdon (Galileo) to…
S1E19 The End of “Good Code”? AI, Throughput, and Reliability with CircleCI CTO Rob Zuber
Is “good code” still the right measure of engineering success in an AI-driven world? In this episode of Humans of Reliability, Rob Zuber, CircleCI CTO, joins Sylvain to explore how coding assistants…
S1E18 Frontline Reliability: Protecting User Journeys with SLOs with Shery Brauner (Razor, ex-Zalando)
What does it really take to move from firefighting incidents to building reliability at scale? In this episode of Humans of Reliability, Shery Brauner (Razor, ex-Zalando) shares her unique journey…
S1E17 Balancing Reliability at the Crypto-Finance Frontier with Brian Shaw (Uphold)
Sylvain Kalache sits down with Brian Shaw, Senior Engineering Leader at Uphold, to explore the reliability challenges that arise when operating at the intersection of traditional finance and crypto…
S1E16 Command Under Pressure: David Owczarek on Incident Leadership and Human-Centered Reliability
Incident response is as much about people as it is about systems. In this episode, David Owczarek, a veteran engineering leader and seasoned incident commander, joins Sylvain Kalache to unpack the human…
S1E15 AI at the Frontlines of Healthcare Reliability with Ryan Lockard (CVS Health)
AI is transforming reliability work—from reactive firefighting to proactive engineering. In this episode, Ryan Lockard, VP of Platform Engineering and AI Enablement at CVS Health, joins Sylvain…
S1E14 Trust Is the Product: Building Reliable Billing in the AI Era with Cosmo Wolfe (Metronome)
In this episode, we sit down with Cosmo Wolfe, Head of Technology at Metronome, to unpack how reliability, trust, and architecture intersect in one of the most critical and overlooked parts of the AI…
S1E13 The Golden Path to Nowhere: When Platforms Undermine Reliability with Chase Roberts (Northflank)
Internal platforms promise speed, consistency, and scale — but what happens when they become a distraction? In this episode, Chase Roberts, COO at Northflank, joins Sylvain Kalache to examine the…
S1E12 AI can boost developer productivity, if used right, with Justin Reock, Deputy CTO at DX
In this episode of Humans of Reliability, we sit down with Justin Reock, Deputy CTO at DX, to unpack the real impact of generative AI on developer productivity. Drawing from early data in DX’s GenAI…
S1E11 Why Reliability in the AI Era Starts with the Network with Marino Wijay
In this episode, we explore how networking has shaped reliability as we know it. Marino Wijay, cloud networking expert and Staff Solutions Architect at Kong, shares how his journey began not as an SRE,…
S1E10 Metrics That Matter: Measuring Developer Productivity in the AI Era
In this episode of Humans of Reliability, Ryan McDonald is joined by Mark Quigley, Head of Platform Engineering at 90, for a conversation that cuts through the noise around developer productivity…
Are AI and Platforms Making SRE Obsolete? With Kaspar von Grünberg, Humanitec’s CEO
Last year, over 89% of companies claimed to have adopted platform engineering. And, in the past month, LLMs have been disrupting how we think about software development. In this context, Kaspar asks…
S1E7 Scientific Incident Management with Dan Slimmon
Dan Slimmon is an incident management veteran who's worked at Etsy, HashiCorp, and now leads consulting and training on pragmatic, non-bureaucratic incident response. In this episode, Dan shares his…
How AI broke serverless and what to do about it with Vercel’s Mariano Fernández Cocirio
Mariano, Staff Product Manager at Vercel, explains why serverless architectures are hitting unexpected limits—they’re too fast. The industry has spent millions optimizing serverless for speed, but AI…
Frequently Asked Questions
Humans of Reliability has published 26 episodes since January 2025, covering topics in Technology.
Humans of Reliability is currently highly active with new episodes monthly. Average episode length is 26m.