Skip to content

Telemetry signals

  • 3 pillars of observability
  • Metrics
  • Logs
  • Traces

Metrics

  • Metrics are numeric representation of data measured over intervals of time
  • A threshold is an objetive that should not be trespassed
  • Ideally the current state must not reach the threshold

  • 4 golden signals

  • latency: time to serve a request
  • traffic: requests/s
  • error: error rate of requests
  • saturation: fullness of a service

  • Common metrics

  • CPU, Memory, Disk Usage
# Is my service up and/or scrapeable?
absent(up{kubernetes_name+"doccserver"}) or
sum(up{kubernetes_name="doccserver"})
== 0

# Do I have the number of LB I expect?
sum(up{kubernetes_name="loadbalancer"}) < 3

# Is out LB at 50% capacity in terms of sessions?
max(haproxy_frontend_current_sessions / haproxy_frontend_limit_sessions)
BY (kubernetes_node_name, frontend) * 100
> 50

# Are 50% of tests taking longer than 10min?
max(test_duration_seconds{quantile="0.5", result="pass"})
BY (test_name)
> 600

Logging

  • Log is a immutable, timestamped record of discrete events that happened over time

Tracing

  • Trace is a representation of a series of causally related distributed events
  • Shows the end-to-end request flow through a distributed system