Observability in Agile Teams
From Signals to Decisions
Workshop · 21.05.2026
Mateusz Grużewski
Assistant Professor, West Pomeranian University of Technology in Szczecin
Principal Software Architect, Asseco Data Systems
Three Weeks Ago
We talked about why observability matters.
Today we go deeper into what it actually is — and how it changes the way Agile teams work.
What You'll Leave With Today
By 14:45 you will:
- understand what metrics, logs, and traces really are
- know which signal answers which question
- have diagnosed real problems in a real system, yourself
- see how this connects to the Agile work you've studied this week
A Quick Reminder
You've all seen this picture before.
✅ CPU: normal
✅ Memory: normal
✅ Error rate: 0%
✅ Alerts: silent
"Something is wrong with the app. The price shows zero. Customers are confused."
— your support team, 9:47 AM
Dashboards confirm symptoms. They rarely reveal causes.
Today We Look Inside Each Signal
Three pillars. We'll examine each one separately.
What it is. What it can do. What it cannot.
By the end you'll know not just the names — but when to reach for which.
Pillar 1: Metrics
What a Metric Actually Is
A number that changes over time.
That's it.
http_request_duration_seconds = 0.234
active_users = 1847
error_rate = 0.02
Aggregated across many events. Cheap to store. Cheap to query.
What Metrics Are Good At
Trends — "error rate has been rising for 30 minutes"
Comparisons — "p95 latency this week vs last week"
Alerts — "page someone if errors exceed 5%"
Capacity planning — "we're at 70% disk usage, growing 5% per week"
Metrics are the early warning system.
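The capacity-planning line is simple arithmetic. Reading the growth as 5 percentage points per week (an assumption; it could also mean 5% relative growth):

```python
# Back-of-envelope capacity planning from the metric above.
# Assumes "growing 5% per week" means 5 percentage points per week.
usage = 0.70            # current disk usage
growth_per_week = 0.05  # absolute growth per week

weeks_until_full = (1.0 - usage) / growth_per_week  # ~6 weeks of runway
```

Six weeks of warning: exactly what an early warning system is for.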
What Metrics Cannot Do
They tell you something is wrong.
They cannot tell you:
- which user was affected
- why a specific request failed
- what the exact error message was
Metrics aggregate. Aggregation hides individuals.
Metrics in an Agile Team
Every iteration produces metrics that inform the next one.
- Did latency improve after our last sprint's optimization?
- Is the new feature actually used? How often?
- Are we delivering faster, or just shipping more bugs?
- DORA metrics: deployment frequency, lead time, change failure rate, MTTR
Without metrics, retrospectives are opinions. With metrics, retrospectives are evidence.
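Two of the DORA metrics need nothing more than a list of deployments. A sketch with a hypothetical record shape (real teams pull this from CI/CD data):

```python
# Hedged sketch: deployment frequency and change failure rate
# computed from deployment records. Record shape is illustrative.
from datetime import date

deployments = [
    {"day": date(2026, 4, 1), "caused_incident": False},
    {"day": date(2026, 4, 3), "caused_incident": True},
    {"day": date(2026, 4, 5), "caused_incident": False},
    {"day": date(2026, 4, 8), "caused_incident": False},
]

weeks = (deployments[-1]["day"] - deployments[0]["day"]).days / 7
deployment_frequency = len(deployments) / weeks  # deploys per week
change_failure_rate = sum(
    d["caused_incident"] for d in deployments
) / len(deployments)  # fraction of releases that caused an incident
```

Numbers like these turn "I feel like we ship slower" into something a retrospective can actually discuss.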
Pillar 2: Logs
What a Log Actually Is
An event that happened, with a timestamp and a description.
2026-04-25 14:32:15 [ERROR] payment failed for user=42 reason=timeout
2026-04-25 14:32:16 [INFO] retry scheduled in 5s
One event = one log entry. Specific, not aggregated.
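"One event = one entry" can be shown with Python's stdlib logging. The logger name and fields are illustrative; the log is written into a buffer here so it is inspectable, where production code would write to stdout or a log shipper:

```python
# One event = one log entry: each call below produces exactly one
# timestamped line, specific to that event, not aggregated.
import io
import logging

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))

log = logging.getLogger("payments")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.error("payment failed for user=%s reason=%s", 42, "timeout")
log.info("retry scheduled in %ss", 5)

# buf now holds two timestamped entries, one per event
```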
What Logs Are Good At
Specific incidents — "what happened to user 12345 at 10:23?"
Context — full error messages, stack traces, request bodies
Audit trail — who did what, when
Debugging — when you know roughly what to search for
Logs are the eyewitness account.
What Logs Cannot Do
Show you trends without aggregation.
Tell you about errors that weren't logged.
Easily connect across services — unless they share a traceId.
Logs only show what the developer chose to log. Silence in logs does not mean absence of problems.
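The traceId caveat above is worth making concrete: when every service stamps its entries with the same id, one request's story can be stitched back together. The ids, services, and messages below are made up:

```python
# Correlating log entries across services via a shared traceId.
entries = [
    {"service": "checkout", "traceId": "abc123", "msg": "payment failed"},
    {"service": "payment", "traceId": "abc123", "msg": "upstream timeout"},
    {"service": "cart", "traceId": "def456", "msg": "item added"},
]

def logs_for_trace(trace_id, entries):
    """Every entry, from any service, that belongs to one request."""
    return [e for e in entries if e["traceId"] == trace_id]

related = logs_for_trace("abc123", entries)  # checkout + payment, one story
```

Without the shared id, those two entries are just noise in two separate log streams.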
Logs in an Agile Team
After every release, production becomes input for the next sprint.
- Bug reports become reproducible — "here's the log line where it broke"
- Postmortems become factual — "here's the exact sequence of events"
- User-reported issues stop being mysteries
Without logs, the team builds blind. With logs, every incident becomes a learning opportunity.
Pillar 3: Traces
What a Trace Actually Is
The complete journey of a single request through your system.
Made up of spans — each one is "this service did this operation, this took that long."
[INSERT VISUAL: simple waterfall diagram showing 4–5 spans, e.g.:
HTTP request           [████████████████] 200 ms
 └── auth check        [██] 15 ms
 └── database query    [████████] 80 ms
 └── external API call [██████] 60 ms]
A trace is a causal graph. It shows what called what, in what order, and how long each part took.
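The causal-graph idea fits in a few lines of code. A minimal sketch, with span names and timings mirroring the waterfall above (all values illustrative, not a real tracing API):

```python
# Minimal sketch of a trace: spans with durations, nested by caller.
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    duration_ms: int
    children: list = field(default_factory=list)

trace = Span("HTTP request", 200, [
    Span("auth check", 15),
    Span("database query", 80),
    Span("external API call", 60),
])

def bottleneck(span: Span) -> Span:
    """Where did the time go? The slowest direct child."""
    return max(span.children, key=lambda s: s.duration_ms)
```

Asking `bottleneck(trace)` here points at the database query — the question a trace waterfall answers at a glance.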
What Traces Are Good At
Cross-service debugging — "which of our 12 microservices is slow?"
Bottleneck analysis — "this request took 2 seconds: where did the time go?"
System topology — "oh, I didn't realize that service even called this one"
Span events — structured details about what happened, attached to a specific operation
Traces show the system as it actually behaves, not as the diagram on the wall.
What Traces Cannot Do
Show you aggregate behavior — use metrics for that
Capture every request — usually sampled at 1% or 10%
Help you if your services aren't instrumented — no spans, no traces
Traces are deep, not wide. One trace = one request, deeply.
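The sampling point deserves one detail: in head-based sampling, the keep-or-drop decision is derived from the trace id itself, so every service agrees on it. A sketch (the 10% rate and hashing scheme are illustrative, not any particular tracer's implementation):

```python
# Deterministic head-based sampling sketch: keep ~10% of traces.
# Hashing the trace id means all spans of one trace make the same decision.
import hashlib

def sampled(trace_id: str, rate: float = 0.10) -> bool:
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return (h % 10_000) < rate * 10_000
```

The practical consequence: if a trace is missing, it may simply not have been sampled — absence of a trace is not evidence of absence of the request.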
Traces in an Agile Team
Microservices are easier to understand when you can read traces.
- New team members onboard faster — "watch the trace, learn the system"
- Architecture diagrams stop lying — traces show real dependencies
- Performance issues become visible — not theoretical
Traces turn distributed systems from abstract to observable.
Three Signals, Three Strengths
Metrics · what and how much · Wide. Cheap. Fast.
Logs · who and exactly what · Specific. Detailed. Sometimes silent.
Traces · where and how · Deep. Causal. Sampled.
No single signal is enough. The skill is knowing which to use when.
One Investigation, Three Signals
A single bug. Three different views.
- Metrics: "GetProduct error rate jumped from 0% to 100%"
- Logs: "only a few unrelated errors logged"
- Traces: "every failed request involves product ID OLJCESPC7Z"
Each signal showed a different part of the truth. Combined, they told the whole story.
How This Connects to Agile
Observability is not a DevOps tool.
It is the empirical layer that Agile assumes you have.
Every Agile principle — inspect and adapt, fast feedback, learning over guessing — needs real data from production.
Without observability, you're inspecting nothing. With it, every release becomes a hypothesis you can test.
The System You'll Work With
OpenTelemetry Demo — a microservices application built to teach observability.
[INSERT VISUAL: architecture diagram of OpenTelemetry Demo showing main services]
You can find a good architecture diagram here: https://opentelemetry.io/docs/demo/architecture/
I'd recommend showing a simplified version with:
- Frontend (web shop)
- Cart, Checkout, Payment
- Product Catalog, Recommendation
- and the observability stack: Grafana, Jaeger, OpenSearch
Your Tools Today
[INSERT SCREENSHOT: Grafana APM dashboard with normal traffic]
Grafana — dashboards for metrics, search interface for logs
[INSERT SCREENSHOT: Jaeger trace waterfall view]
Jaeger — distributed traces and span events
[INSERT SCREENSHOT: Feature flag UI]
Feature Flag UI — to inject failures into the system
All accessible at: workshop.gruzewski.dev
What Happens Next
Right after this lecture (with a short break):
I'll show you one scenario, end-to-end. You watch.
Then:
You split into groups. Each group gets a scenario card.
You diagnose. You report what you found.
By 14:45:
You'll have done this 2–3 times. You'll know how it feels.
A Final Thought Before We Begin
You don't need to be an expert in software engineering to do this.
You need to ask good questions and read evidence.
That's what we'll practice.
Observability is a way of thinking, more than a set of tools.
Questions?
Anything before we move into the demo?