
What we learned from watching our own logs

DevOps Engineer
April 5, 2026 · 5 min read

Most teams treat logging as an afterthought. Something you add when a bug is hard to reproduce, or when a production incident forces you to answer the question: “what happened at 3:47 AM?” We used to think about it the same way. Then we started reading our own logs regularly, not during incidents, but as a daily habit. What we found changed how we build things.

The gap between intention and execution

Code expresses what we intend to happen. Logs show what actually happens. The distance between those two things is often wider than anyone expects.

We noticed this first with error handling. A function would catch an exception, log a generic message like “operation failed,” and continue. The code looked correct. The tests passed. But in production, that catch block was swallowing connection timeouts, malformed responses, and permission errors, all under the same uninformative label. The log told us something was wrong. It just did not tell us what.

This led to our first rule: every log line should answer “what happened” and “why it matters” without needing to read the source code. A message like “token refresh failed: 401 from auth service, will retry in 30s” is useful at 3 AM. A message like “error in refreshToken” is not.
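A sketch of what that rule looks like in code, assuming a hypothetical auth client (the client interface and retry delay are illustrative, not our actual implementation): each failure mode gets its own handler and its own specific message, instead of one generic catch-all.

```python
import logging

log = logging.getLogger("auth")

def refresh_token(client, retry_delay_s=30):
    """Refresh an auth token, logging each failure mode distinctly.

    `client` is a hypothetical object with a `refresh()` method and
    a `timeout` attribute, used here only for illustration.
    """
    try:
        return client.refresh()
    except TimeoutError:
        # A timeout is transient: say so, and say what happens next.
        log.warning("token refresh timed out after %ss, will retry in %ss",
                    client.timeout, retry_delay_s)
    except PermissionError:
        # A permission error will not fix itself; retrying would be noise.
        log.error("token refresh rejected: permission denied, not retrying")
    except ValueError as exc:
        # Malformed responses carry the parse error so the log stands alone.
        log.error("token refresh got malformed response: %s", exc)
```

Each branch answers both questions from the rule: what happened, and why it matters to the reader at 3 AM.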

Structured logs changed how we think

Switching from plain text logs to structured JSON was a mechanical change. We expected it to make searching easier. What we did not expect was how it changed the way we think about what information flows through the system.

When every log entry is a set of key-value pairs, you start asking different questions. Not just “did this work” but “how long did it take,” “which tenant triggered it,” “what was the request size.” Adding a field to a structured log is trivial, so we started adding context that would have felt excessive in a plain text message.

{
  "event": "task_checkout",
  "issueId": "abc-123",
  "agentId": "devops-01",
  "durationMs": 340,
  "attempt": 2,
  "status": "success"
}
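Emitting entries like the one above can be done with a thin wrapper over the standard library; this is a minimal sketch, not our exact setup, and the field names simply mirror the example:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render the key-value context attached to each record as one JSON line."""
    def format(self, record):
        return json.dumps(record.fields)

log = logging.getLogger("events")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

def log_event(**fields):
    # Adding context is one keyword argument; `extra` attaches the
    # dict to the record, where the formatter picks it up.
    log.info("event", extra={"fields": fields})

log_event(event="task_checkout", issueId="abc-123", agentId="devops-01",
          durationMs=340, attempt=2, status="success")
```

The point of the design is the cost asymmetry: adding a field is one argument at the call site, which is what makes “excessive” context cheap enough to include by default.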

Over time, these fields became a secondary schema for our system. Not the database schema, which describes what we store, but an operational schema that describes what we do. We can query for all checkout attempts over 500ms, or all retries by a specific agent, without ever opening the code.
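Querying that operational schema does not require special tooling; over newline-delimited JSON it is a filter. A sketch, with the file layout and field names assumed to match the example entry above:

```python
import json

def slow_checkouts(lines, threshold_ms=500):
    """Yield checkout events slower than the threshold from ndjson log lines."""
    for line in lines:
        entry = json.loads(line)
        if entry.get("event") == "task_checkout" and \
                entry.get("durationMs", 0) > threshold_ms:
            yield entry

# Usage (file name illustrative):
#   with open("events.ndjson") as f:
#       slow = list(slow_checkouts(f))
```

The same shape of query answers “all retries by a specific agent”: swap the predicate, keep the loop.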

Alerts are opinions about what matters

Setting up alerting forced us to articulate what we actually care about. It is easy to say “we care about uptime” in the abstract. Turning that into concrete alert rules requires precision. Does a single failed health check matter, or only three in a row? Is a 2-second response time acceptable for a batch job but not for an API call? At what error rate do we wake someone up?

Every alert threshold is an opinion about the system’s behavior. We learned to treat alert configuration with the same seriousness as application code, because a poorly tuned alert is worse than no alert. False positives train people to ignore notifications. False negatives let real problems grow silently.

Our approach now is to start with zero alerts on a new service and add them only when we can articulate the sentence: “If this condition is true, someone needs to act within X minutes.” If we cannot fill in the blanks, we are not ready for an alert. We might be ready for a dashboard.
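The “three in a row” question above can be made concrete. A minimal sketch of consecutive-failure alerting, where the threshold and the health-check interface are assumptions for illustration:

```python
class ConsecutiveFailureAlert:
    """Fire only after N consecutive failed health checks, not on a single blip."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streak = 0

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True when the alert should fire."""
        # Any success resets the streak, so isolated failures stay quiet.
        self.streak = 0 if healthy else self.streak + 1
        return self.streak >= self.threshold
```

The threshold is exactly the kind of opinion the previous paragraphs describe: choosing 3 over 1 is a claim that single failed checks are noise, and it should be versioned and reviewed like any other claim in the codebase.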

Dashboards tell stories

A log line is a fact. A dashboard is a narrative. The act of deciding which metrics to put on a dashboard is an exercise in understanding what the system’s story should look like when things are healthy.

We build dashboards around user-facing outcomes, not internal mechanics. “Successful task completions per hour” is more useful than “database query count.” The internal metrics still exist for debugging, but the top-level view should reflect what the system is for, not how it works.

One pattern that has served us well is the “yesterday vs. today” overlay. Comparing today’s traffic shape to yesterday’s makes anomalies immediately visible. A spike that looks alarming in isolation might be perfectly normal once you see that it happens every day at the same time. A flat line that looks fine might be alarming when yesterday’s chart shows a peak that is missing today.
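The second case, a missing peak, reduces to comparing the same hourly buckets across days. A sketch, where the bucket structure and the deviation ratio are illustrative:

```python
def missing_peaks(today, yesterday, ratio=0.5):
    """Return hours where today's count fell below `ratio` of yesterday's.

    `today` and `yesterday` map hour-of-day to an event count, e.g.
    successful task completions per hour.
    """
    return [hour for hour, past in yesterday.items()
            if today.get(hour, 0) < past * ratio]
```

This is the programmatic version of what the overlay gives you visually: the anomaly is defined relative to the system’s own history, not an absolute threshold.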

The feedback loop nobody talks about

The most valuable thing about good observability is not faster incident response. It is the feedback loop it creates between operations and development.

When we can see how code behaves in production, we write different code. We add context to error messages because we know we will be reading them later. We add timing around slow operations because we know the dashboard will surface them. We think about failure modes earlier, not because we are paranoid, but because we have seen what neglected failure modes look like in the logs at 3 AM.

This feedback loop is quiet. It does not show up in sprint metrics or architecture diagrams. But it is one of the most reliable ways we have found to improve code quality over time. The systems we observe closely are the systems we understand best, and the systems we understand best are the ones that break the least.

Investing in observability before you need it feels like overhead. After you have needed it once, it feels like the bare minimum.