Description
libp2p currently collects a few metrics (for example, for TCP and QUIC). Metrics for the resource manager are collected as well, but setting that up is currently left to the application. This is how we ended up with slightly different implementations (and metric names) in go-ipfs and lotus.
In the (very near!) future, we want to expose more metrics from the swarm regarding transports, security protocols and muxers. This would have allowed us to detect the muxer prioritisation bug (ipfs/kubo#8750) a lot earlier.
We therefore should have a coherent libp2p metrics story.
Open questions:
- In the past, we debated whether the metrics code should live within the respective repository, or live separately and hook in via tracers. Note that this issue mostly goes away once we make progress with our repo consolidation (and that's what we should base our designs on).
- Some people have expressed concerns that introducing an OpenCensus / Prometheus dependency into the default build adds too much bloat. Is that a valid concern? Does it really add too much overhead? If so, would it make sense to hide all metrics collection across our code base behind a build flag?
- OpenCensus vs. Prometheus: In the past, we've used both tools, very inconsistently. If I understand correctly, OpenCensus is essentially another abstraction layer between the tracer and Prometheus, and would in theory allow the use of tools other than Prometheus, although it's debatable how much we care about that.
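To make the "hook in via tracers" option from the questions above more concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical and for illustration only: `MetricsTracer`, `noopTracer`, `countingTracer` and `fakeSwarm` are not existing go-libp2p types. The idea is that the swarm only depends on a small interface, so a Prometheus or OpenCensus backed implementation could live elsewhere (or behind a build flag), while the default is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricsTracer is a hypothetical hook interface, NOT the go-libp2p API.
// The swarm reports connection events through it without knowing which
// metrics backend (Prometheus, OpenCensus, or none) sits behind it.
type MetricsTracer interface {
	ConnOpened(transport, security, muxer string)
	ConnClosed(transport string)
}

// noopTracer discards all events. A default build could use it so that
// no metrics dependency is pulled in.
type noopTracer struct{}

func (noopTracer) ConnOpened(transport, security, muxer string) {}
func (noopTracer) ConnClosed(transport string)                  {}

// countingTracer is an in-memory stand-in for a real backend.
type countingTracer struct {
	mu     sync.Mutex
	opened map[string]int // total opened, keyed by "transport/security/muxer"
	open   map[string]int // currently open connections per transport
}

func newCountingTracer() *countingTracer {
	return &countingTracer{opened: map[string]int{}, open: map[string]int{}}
}

func (c *countingTracer) ConnOpened(transport, security, muxer string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.opened[transport+"/"+security+"/"+muxer]++
	c.open[transport]++
}

func (c *countingTracer) ConnClosed(transport string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.open[transport]--
}

// Opened returns how many connections were opened with the given stack.
func (c *countingTracer) Opened(transport, security, muxer string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.opened[transport+"/"+security+"/"+muxer]
}

// Open returns how many connections are currently open on a transport.
func (c *countingTracer) Open(transport string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.open[transport]
}

// fakeSwarm stands in for the real swarm (assumption for this sketch).
type fakeSwarm struct{ tracer MetricsTracer }

// newFakeSwarm falls back to the no-op tracer when none is supplied.
func newFakeSwarm(t MetricsTracer) *fakeSwarm {
	if t == nil {
		t = noopTracer{}
	}
	return &fakeSwarm{tracer: t}
}

func (s *fakeSwarm) dial(transport, security, muxer string) {
	s.tracer.ConnOpened(transport, security, muxer)
}

func (s *fakeSwarm) closeConn(transport string) {
	s.tracer.ConnClosed(transport)
}

func main() {
	tr := newCountingTracer()
	s := newFakeSwarm(tr)
	s.dial("tcp", "noise", "yamux")
	s.dial("tcp", "noise", "mplex")
	s.closeConn("tcp")
	fmt.Println("tcp/noise/mplex opened:", tr.Opened("tcp", "noise", "mplex"))
	fmt.Println("tcp currently open:", tr.Open("tcp"))
}
```

In this shape the per-muxer breakdown is available to whichever backend implements the interface, and the choice between OpenCensus and Prometheus is confined to that implementation rather than spread across the code base.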
Thoughts, @vyzo @aschmahmann @mxinden @Stebalien @lanzafame?
Tracking the various components we want to instrument:
- Resource Manager
- Swarm: swarm: minimal set of metrics #1910
- AutoNAT: autonat: expose metrics #2017
- Identify: identify: expose metrics #2019
- Relay service: relay service: expose metrics #2018
- Eventbus: eventbus: expose metrics #2020
- Autorelay: autorelay: expose metrics #2181
- Hole puncher: holepunching: expose metrics #2103
Supporting work (needed to close this issue):
- circuitv2: relaysvc metrics track client disconnects #2180
- Add example for metrics and dashboard #2232
- metrics in go-libp2p blog#69
- Showcase metrics dashboard on website website#175
defer: