From Metrics to Root Cause

A practical guide to debugging production systems using observability data.

KubeCon EU 2025 | Poland | 45 minutes
Audience: Backend engineers, SREs, Platform engineers, DevOps engineers
Tags: observability, debugging, production, metrics

Observability is not a dashboard. It's a diagnostic process.

This talk explores how to move from "something is wrong" to "here's the fix" using a systematic approach to debugging production systems.

Abstract

Every engineer has been there: alerts fire and dashboards show anomalies, but finding the actual root cause is like searching for a needle in a haystack. We collect terabytes of metrics, logs, and traces, yet debugging still feels like guesswork.

This talk presents a structured approach to production debugging that turns observability data into actionable insights. We'll explore:

  • Why dashboards alone aren't enough
  • The three questions every debugging session should answer
  • How to correlate signals across metrics, logs, and traces
  • Real-world examples of debugging complex distributed systems

What You'll Learn

By the end of this talk, you'll have a mental framework for approaching any production incident, along with practical techniques for using your observability stack more effectively.

Outline

  1. The Problem with Dashboards - Why visualization isn't investigation
  2. The Diagnostic Mindset - Thinking like a detective
  3. Signal Correlation - Connecting metrics, logs, and traces
  4. Case Studies - Real debugging sessions from production systems
  5. Building Better Alerts - From symptoms to causes

Target Audience

This talk is for anyone who has ever stared at a dashboard wondering "why is this happening?" Whether you're debugging your first production incident or your hundredth, you'll find practical techniques to add to your toolkit.