Closed
Description
Linkerd has metrics at the service level. These are valuable for understanding the general health of a service. Unfortunately, it is usually a specific route that is having problems. While it is possible to run top and get some metrics today, these are gathered in real time and not stored in Prometheus.
It should be possible to see per-route metrics alongside the existing per-service metrics (success rate, latency, throughput). Per-route metrics would shorten the time it takes to fix issues and provide visibility into what is really happening.
User Stories
- As a service owner, I would like to see per-route metrics in my dashboards so that I can quickly see any endpoints that are operating outside my SLO.
- As a service owner, I would like to see a list of all the routes in my service and sort that list by success rate, so that I can quickly see what is currently failing.
- As a service owner, I would like to have per-route metrics aggregated by URL parameters such as user ID, so that I can quickly see what code path is being taken.
- As a service owner, I would like to persist per-route metrics so that I can use them to debug historical issues.
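
To make the stories above concrete, here is a minimal sketch of what per-route aggregation could look like. It is not Linkerd's implementation. The route templates, the `[unmatched]` bucket, and the rule that any non-5xx response counts as a success are all assumptions made for illustration, and it takes one reading of the URL-parameter story: paths that differ only in a parameter such as a user ID are folded into one route.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical route templates; a real service would define its own.
ROUTE_PATTERNS = [
    ("GET /users/{id}", re.compile(r"^GET /users/[^/]+$")),
    ("GET /users/{id}/orders", re.compile(r"^GET /users/[^/]+/orders$")),
]


@dataclass
class RouteStats:
    total: int = 0
    success: int = 0

    @property
    def success_rate(self) -> float:
        # A route with no traffic reports 0.0 rather than dividing by zero.
        return self.success / self.total if self.total else 0.0


def route_for(method: str, path: str) -> str:
    """Map a concrete request to its route template."""
    key = f"{method} {path}"
    for name, pattern in ROUTE_PATTERNS:
        if pattern.match(key):
            return name
    return "[unmatched]"  # assumed catch-all bucket


def aggregate(requests):
    """Count requests and successes per route from (method, path, status) tuples."""
    stats = defaultdict(RouteStats)
    for method, path, status in requests:
        entry = stats[route_for(method, path)]
        entry.total += 1
        if status < 500:  # assumption: only 5xx responses count as failures
            entry.success += 1
    return dict(stats)


def worst_first(stats):
    """Return (route, stats) pairs sorted by ascending success rate."""
    return sorted(stats.items(), key=lambda item: item[1].success_rate)
```

Feeding a batch of observed requests to `aggregate` and passing the result to `worst_first` yields the "sort by success rate" view from the second story. Persisting these counters over time, as the last story asks, is out of scope for the sketch.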