Observability is not a dashboard. It's a diagnostic process.

This talk explores how to move from "something is wrong" to "here's the fix" using a systematic approach to debugging production systems.

Abstract

Every engineer has been there: alerts fire and dashboards show anomalies, but finding the root cause is like searching for a needle in a haystack. We collect terabytes of metrics, logs, and traces, yet debugging still feels like guesswork.

This talk presents a structured approach to production debugging that turns observability data into answers you can act on. We'll explore:

Why dashboards alone aren't enough
The three questions every debugging session should answer
How to correlate signals across metrics, logs, and traces
Real-world examples of debugging complex distributed systems

What You'll Learn

By the end of this talk, you'll have a mental framework for approaching any production incident, along with practical techniques for using your observability stack more effectively.

Outline

The Problem with Dashboards - Why visualization isn't investigation
The Diagnostic Mindset - Thinking like a detective
Signal Correlation - Connecting metrics, logs, and traces
Case Studies - Real debugging sessions from production systems
Building Better Alerts - From symptoms to causes

Target Audience

This talk is for anyone who has ever stared at a dashboard wondering "why is this happening?" Whether you're debugging your first production incident or your hundredth, you'll find practical techniques to add to your toolkit.