From Metrics to Root Cause
A practical guide to debugging production systems using observability data.
Observability is not a dashboard. It's a diagnostic process.
This talk explores how to move from "something is wrong" to "here's the fix" using a systematic approach.
Abstract
Every engineer has been there: alerts fire, dashboards show anomalies, but finding the actual root cause is like searching for a needle in a haystack. We collect terabytes of metrics, logs, and traces, yet debugging still feels like guesswork.
This talk presents a structured approach to production debugging that turns observability data into specific, testable explanations of what went wrong. We'll explore:
- Why dashboards alone aren't enough
- The three questions every debugging session should answer
- How to correlate signals across metrics, logs, and traces
- Real-world examples of debugging complex distributed systems
What You'll Learn
By the end of this talk, you'll have a mental framework for approaching any production incident, along with practical techniques for using your observability stack more effectively.
Outline
- The Problem with Dashboards - Why visualization isn't investigation
- The Diagnostic Mindset - Thinking like a detective
- Signal Correlation - Connecting metrics, logs, and traces
- Case Studies - Real debugging sessions from production systems
- Building Better Alerts - From symptoms to causes
Target Audience
This talk is for anyone who has ever stared at a dashboard wondering "why is this happening?" Whether you're debugging your first production incident or your hundredth, you'll find practical techniques to add to your toolkit.