Observability in Agile Teams

From Signals to Decisions

Workshop · 21.05.2026

Mateusz Grużewski

Assistant Professor, West Pomeranian University of Technology in Szczecin

Principal Software Architect, Asseco Data Systems

Three Weeks Ago

We talked about why observability matters.

Today we go deeper into what it actually is — and how it changes the way Agile teams work.

What You'll Leave With Today

By 14:45 you will:

• understand what metrics, logs, and traces really are
• know which signal answers which question
• have diagnosed real problems in a real system, yourself
• see how this connects to the Agile work you've studied this week

A Quick Reminder

You've all seen this picture before.

✅ CPU: normal

✅ Memory: normal

✅ Error rate: 0%

✅ Alerts: silent

"Something is wrong with the app. The price shows zero. Customers are confused."

— your support team, 9:47 AM

Dashboards confirm symptoms. They rarely reveal causes.

Today We Look Inside Each Signal

Three pillars. We'll examine each one separately.

What it is. What it can do. What it cannot.

By the end you'll know not just the names — but when to reach for which.

Pillar 1: Metrics

What a Metric Actually Is

A number that changes over time.

That's it.

http_request_duration_seconds = 0.234

active_users = 1847

error_rate = 0.02

Aggregated across many events. Cheap to store. Cheap to query.
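A minimal sketch of what "aggregated across many events" means in practice. The request data and the `summarize` helper are invented for illustration; a real system would use a metrics library rather than a hand-written function.

```python
from statistics import mean

# Hypothetical raw events for one time window: (duration in seconds, HTTP status).
requests = [
    (0.120, 200),
    (0.310, 200),
    (0.095, 200),
    (0.411, 500),
]

def summarize(events):
    """Collapse many request events into a few numbers for one time window."""
    durations = [d for d, _ in events]
    errors = sum(1 for _, status in events if status >= 500)
    return {
        "request_count": len(events),
        "avg_duration_seconds": round(mean(durations), 3),
        "error_rate": errors / len(events),
    }

window = summarize(requests)
print(window)
```

Four events go in, three numbers come out. That is why metrics are cheap to store and query: the individual requests are gone.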

What Metrics Are Good At

Trends — "error rate has been rising for 30 minutes"

Comparisons — "p95 latency this week vs last week"

Alerts — "page someone if errors exceed 5%"

Capacity planning — "we're at 70% of disk usage, growing 5% per week"

Metrics are the early warning system.
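The alert and capacity examples above can be sketched as plain arithmetic. The function names are hypothetical, and "growing 5% per week" can be read two ways, so both readings are shown.

```python
import math

def should_page(error_rate, threshold=0.05):
    """Alert rule from the slide: page someone if errors exceed 5%."""
    return error_rate > threshold

def weeks_until_full_linear(used_pct, growth_points_per_week):
    """Reading 1: usage grows by a fixed number of percentage points each week."""
    return (100 - used_pct) / growth_points_per_week

def weeks_until_full_compound(used_pct, growth_rate_per_week):
    """Reading 2: usage grows by a fixed fraction of its current value each week."""
    return math.log(100 / used_pct) / math.log(1 + growth_rate_per_week)

print(should_page(0.02))                               # below the 5% threshold
print(weeks_until_full_linear(70, 5))                  # 5 points per week
print(round(weeks_until_full_compound(70, 0.05), 1))   # 5% compounding per week
```

Under the first reading the disk fills in 6 weeks, under the second in a little over 7. Either way, the metric gives the team time to act before anything breaks.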

What Metrics Cannot Do

They tell you something is wrong.

They cannot tell you:

• which user was affected
• why a specific request failed
• what the exact error message was

Metrics aggregate. Aggregation hides individuals.
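A small sketch of how aggregation hides an individual. The data is invented: 99 fast requests and one user who waited 30 seconds.

```python
import math
from statistics import mean

# Hypothetical per-request data: (user id, duration in seconds).
requests = [(f"user-{i}", 0.2) for i in range(99)] + [("user-99", 30.0)]

durations = sorted(d for _, d in requests)
avg = mean(durations)

# Nearest-rank 95th percentile: the value at rank ceil(0.95 * n).
rank = math.ceil(95 * len(durations) / 100)
p95 = durations[rank - 1]

# Only the raw events still know who had the bad experience.
slowest_user = max(requests, key=lambda r: r[1])[0]

print(f"avg={avg:.3f}s p95={p95}s slowest={slowest_user}")
```

The p95 looks healthy and the average is only mildly elevated. Neither number contains a user id, so the question "who waited 30 seconds?" has to be answered from logs or traces.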

Metrics in an Agile Team

Every iteration produces metrics that inform the next one.

• Did latency improve after our last sprint's optimization?
• Is the new feature actually used? How often?
• Are we delivering faster, or just shipping more bugs?
• DORA metrics: deployment frequency, lead time for changes, change failure rate, MTTR

Without metrics, retrospectives are opinions. With metrics, retrospectives are evidence.
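As a rough illustration, three of the four DORA metrics can be computed from a simple deployment record. The records, field names, and sprint length below are assumptions made up for this sketch. Lead time for changes is left out because it also needs commit timestamps.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records for one 14-day sprint.
deployments = [
    {"at": datetime(2026, 5, 4, 10), "failed": False, "restored_after": None},
    {"at": datetime(2026, 5, 6, 15), "failed": True, "restored_after": timedelta(minutes=40)},
    {"at": datetime(2026, 5, 11, 9), "failed": False, "restored_after": None},
    {"at": datetime(2026, 5, 14, 16), "failed": True, "restored_after": timedelta(minutes=20)},
]
SPRINT_DAYS = 14

failures = [d for d in deployments if d["failed"]]

deploys_per_week = len(deployments) / (SPRINT_DAYS / 7)
change_failure_rate = len(failures) / len(deployments)
mean_time_to_restore = sum(
    (d["restored_after"] for d in failures), timedelta()
) / len(failures)

print(deploys_per_week, change_failure_rate, mean_time_to_restore)
```

Numbers like these give a retrospective something concrete to compare against the previous sprint.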

Pillar 2: Logs

What a Log Actually Is

An event that happened, with a timestamp and a description.

2026-04-25 14:32:15 [ERROR] payment failed for user=42 reason=timeout

2026-04-25 14:32:16 [INFO] retry scheduled in 5s

One event = one log entry. Specific, not aggregated.
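A minimal sketch of reading one such entry as structured data. It assumes the layout seen in the example lines (date, time, level in brackets, then free text with `key=value` pairs). Real log formats vary, and the `parse` helper is hypothetical.

```python
import re
from datetime import datetime

LINE = "2026-04-25 14:32:15 [ERROR] payment failed for user=42 reason=timeout"

# Assumed layout: "<date> <time> [<LEVEL>] <free text with key=value pairs>".
PATTERN = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")

def parse(line):
    """Split one log line into timestamp, level, message, and key=value fields."""
    timestamp, level, message = PATTERN.match(line).groups()
    return {
        "timestamp": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
        "level": level,
        "message": message,
        "fields": dict(re.findall(r"(\w+)=(\S+)", message)),
    }

entry = parse(LINE)
print(entry["level"], entry["fields"])
```

Unlike a metric, the entry still names the user and the reason, so it can be searched for directly.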

What Logs Are Good At

Specific incidents — "what happened to user 12345 at 10:23?"

Context — full error messages, stack traces, request bodies

Audit trail — who did what, when

Debugging — when you know roughly what to search for

Logs are the eyewitness account.

What Logs Cannot Do

Show you trends without aggregation.

Tell you about errors that weren't logged.

Easily connect across services — unless they share a traceId.

Logs only show what the developer chose to log. Silence in logs does not mean absence of problems.
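A sketch of why a shared traceId matters. The log entries, service names, and `trace_id` field name are invented for illustration. The idea is only that entries from different services can be stitched together when they carry the same id.

```python
from collections import defaultdict

# Hypothetical log entries from three services, in arrival order.
logs = [
    {"service": "frontend", "trace_id": "a1", "msg": "GET /checkout"},
    {"service": "payment", "trace_id": "b2", "msg": "charge ok"},
    {"service": "checkout", "trace_id": "a1", "msg": "calling payment"},
    {"service": "payment", "trace_id": "a1", "msg": "payment failed reason=timeout"},
]

# Group entries by trace id so one request's story reads top to bottom.
by_trace = defaultdict(list)
for entry in logs:
    by_trace[entry["trace_id"]].append(entry)

story = [f'{e["service"]}: {e["msg"]}' for e in by_trace["a1"]]
print("\n".join(story))
```

Without the shared id, the three entries for request `a1` would be unrelated lines in three separate log streams.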

Logs in an Agile Team

After every release, production becomes input for the next sprint.

• Bug reports become reproducible: "here's the log line where it broke"
• Postmortems become factual: "here's the exact sequence of events"
• User-reported issues stop being mysteries

Without logs, the team builds blind. With logs, every incident becomes a learning opportunity.

Pillar 3: Traces

What a Trace Actually Is

The complete journey of a single request through your system.

Made up of spans, each one saying: "this service did this operation, and it took this long."

[INSERT VISUAL: simple waterfall diagram showing 4-5 spans, e.g.:

HTTP request              [████████████████] 200ms
└── auth check            [██] 15ms
└── database query        [████████] 80ms
└── external API call     [██████] 60ms
]

A trace is a causal graph. It shows what called what, in what order, and how long each part took.
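A minimal sketch of a trace as data, using the span names and durations from the waterfall placeholder above. The `Span` class, span ids, and start offsets are assumptions for illustration. This is not the OpenTelemetry data model, only the shape of the idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    duration_ms: int

# Durations follow the waterfall example; ids and start offsets are invented.
spans = [
    Span("s1", None, "HTTP request", 0, 200),
    Span("s2", "s1", "auth check", 5, 15),
    Span("s3", "s1", "database query", 25, 80),
    Span("s4", "s1", "external API call", 110, 60),
]

def children_of(span_id):
    return [s for s in spans if s.parent_id == span_id]

def self_time_ms(span):
    """Time spent in the span itself, assuming its children do not overlap."""
    return span.duration_ms - sum(c.duration_ms for c in children_of(span.span_id))

def render(span, depth=0):
    """Return the span tree as indented text lines, children ordered by start."""
    lines = [f'{"  " * depth}{span.name} {span.duration_ms}ms']
    for child in sorted(children_of(span.span_id), key=lambda s: s.start_ms):
        lines.extend(render(child, depth + 1))
    return lines

root = next(s for s in spans if s.parent_id is None)
print("\n".join(render(root)))
print("root self time:", self_time_ms(root), "ms")
```

The parent links are what make the trace causal: they record what called what. Subtracting the children's time from the parent shows how much of the 200 ms the request handler spent on its own work.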

What Traces Are Good At

Cross-service debugging — "which of our 12 microservices is slow?"

Bottleneck analysis — "this request took 2 seconds: where did the time go?"

System topology — "oh, I didn't realize that service even called this one"

Span events — structured details about what happened, attached to a specific operation

Traces show the system as it actually behaves, not as the diagram on the wall.

What Traces Cannot Do

Show you aggregate behavior — use metrics for that

Capture every request, since production traces are usually sampled (for example at 1% or 10%)

Help you if your services aren't instrumented — no spans, no traces

Traces are deep, not wide. One trace = one request, deeply.
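A sketch of one common way to sample: decide from the trace id alone, so every service keeps or drops the same requests. The `keep_trace` function and the id format are hypothetical. OpenTelemetry SDKs ship their own samplers, and this is not their implementation.

```python
import hashlib

def keep_trace(trace_id: str, ratio: float) -> bool:
    """Decide from the trace id alone, so every service makes the same choice."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < ratio

# Hypothetical trace ids; real ones are random hex strings.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.10)]

print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

At a 10% ratio roughly one request in ten leaves a trace. A rare failure may therefore have no trace at all, which is one reason metrics and logs are still needed.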

Traces in an Agile Team

Microservices are easier to understand when you can read traces.

• New team members onboard faster: "watch the trace, learn the system"
• Architecture diagrams stop lying: traces show real dependencies
• Performance issues become visible, not theoretical

Traces turn distributed systems from abstract to observable.

Three Signals, Three Strengths

Metrics: what and how much. Wide. Cheap. Fast.

Logs: who and exactly what. Specific. Detailed. Sometimes silent.

Traces: where and how. Deep. Causal. Sampled.

No single signal is enough. The skill is knowing which to use when.

One Investigation, Three Signals

A single bug. Three different views.

• Metrics: "GetProduct error rate jumped from 0% to 100%"
• Logs: "only a few unrelated errors logged"
• Traces: "every failed request involves product ID OLJCESPC7Z"

Each signal showed a different part of the truth. Combined, they told the whole story.
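A sketch of the trace step in this investigation: look at the failed requests and count what they have in common. The trace summaries and the other product ids are invented. Only `OLJCESPC7Z` comes from the example above.

```python
from collections import Counter

# Hypothetical trace summaries, one per request.
traces = [
    {"operation": "GetProduct", "product_id": "OLJCESPC7Z", "error": True},
    {"operation": "GetProduct", "product_id": "PRODUCT-A", "error": False},
    {"operation": "GetProduct", "product_id": "OLJCESPC7Z", "error": True},
    {"operation": "GetProduct", "product_id": "PRODUCT-B", "error": False},
    {"operation": "GetProduct", "product_id": "OLJCESPC7Z", "error": True},
]

# Metric-style view: how many requests failed. It cannot say which ones.
failed = [t for t in traces if t["error"]]

# Trace-style view: which attribute value do the failures share?
suspects = Counter(t["product_id"] for t in failed)
top_product, hits = suspects.most_common(1)[0]

print(f"{hits} of {len(failed)} failed requests involve {top_product}")
```

The count alone only says that something is failing. Grouping the failed traces by attribute is what points at a single product.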

How This Connects to Agile

Observability is not a DevOps tool.

It is the empirical layer that Agile assumes you have.

Every Agile principle — inspect and adapt, fast feedback, learning over guessing — needs real data from production.

Without observability, you're inspecting nothing. With it, every release becomes a hypothesis you can test.

The System You'll Work With

OpenTelemetry Demo — a microservices application built to teach observability.

[INSERT VISUAL: architecture diagram of OpenTelemetry Demo showing main services]

You can find a good architecture diagram here: https://opentelemetry.io/docs/demo/architecture/

I'd recommend showing a simplified version with:

• Frontend (web shop)
• Cart, Checkout, Payment
• Product Catalog, Recommendation
• And the observability stack: Grafana, Jaeger, OpenSearch

Your Tools Today

[INSERT SCREENSHOT: Grafana APM dashboard with normal traffic]

Grafana — dashboards for metrics, search interface for logs

[INSERT SCREENSHOT: Jaeger trace waterfall view]

Jaeger — distributed traces and span events

[INSERT SCREENSHOT: Feature flag UI]

Feature Flag UI — to inject failures into the system

All accessible at: workshop.gruzewski.dev

What Happens Next

Right after this lecture (with a short break):

I'll show you one scenario, end-to-end. You watch.

Then:

You split into groups. Each group gets a scenario card.

You diagnose. You report what you found.

By 14:45:

You'll have done this 2–3 times. You'll know how it feels.

A Final Thought Before We Begin

You don't need to be an expert in software engineering to do this.

You need to ask good questions and read evidence.

That's what we'll practice.

Observability is a way of thinking, more than a set of tools.

Questions?

Anything before we move into the demo?