Add Prometheus /metrics support #420
Comments
Hey, I think Prometheus support is definitely something that should be on my roadmap. Looking into it a bit, the major metric types are listed at https://prometheus.io/docs/concepts/metric_types/#metric-types
I think it'd make sense for me to export counters for each repo with names e.g.
And similarly for each plan, i.e.
etc.
Started work on Prometheus metrics in https://github.com/garethgeorge/backrest/pull/459/files
Added metrics:
These have a number of dimensions, typically "repo_id" and "plan_id" at least, but also "task_type" and "status" for the task-level metrics. I haven't actually set up Prometheus on any of my machines before, so I'm interested to hear from anyone with a background in building dashboards: how will this be to work with, and is this a good setup?
In my work, we usually just use a library called micrometer (in Java), so unfortunately I can't say much about it 😅 But if you build and push a container with a test version to the registry, I can try to build a dashboard in Grafana and share my experience :)
Btw, I don't see a metric for the compression ratio, or is it calculated from "bytes_added" and "bytes_processed"? Also, judging by the metric names, it looks like all metrics are at the level of an individual backup (snapshot?). Or am I misinterpreting, and they are all at the repository level?
Hey, nothing added yet for repo-level stats -- that's something that can definitely be expanded on, but because of how restic works, stats are only computed infrequently at the moment (each time a prune runs, stats are computed if it's been 30 days since the last stats check). At the moment all of the metrics are exported at the plan level. I'll probably need to spend some time setting up Prometheus and actually prototyping some dashboards to get a sense of what this will look like.
The CI system provides preview builds, e.g. with Prometheus support (https://github.com/garethgeorge/backrest/actions/runs/10754697192), but they aren't dockerized, so local testing means either a direct install or swapping the binary in the image with a short Dockerfile.
Initial Prometheus metrics support went out in 1.5.0. Docs aren't written up yet; this is largely a preview, as I may rename or redefine a few of these. Definitions can be found in PR #459 and may change in the next release. Note: normal authentication applies to the /metrics endpoint, so you'll want to disable auth to use this feature.
I don't fully understand which values these metrics are supposed to show. The first panel doesn't seem to show the accurate repo size, and for the second one I don't understand what I'm supposed to get from it 😵‍💫
I shared more details about my setup in another issue: #457 The only difference since then is the version, which is
Thanks for experimenting with this -- appreciate it and love seeing the graphs. Looks like I made a few mistakes when defining metrics: a lot of the metrics I exported are sums, but based on how they appear in the charts, a gauge seems more appropriate for bytes added, bytes processed, and task duration. I also agree that some more metrics need exporting. It would make sense for Backrest to export the stats that it collects whenever it executes a stats operation. The caveat I'd add is that I'm not sure how reliably they'll get scraped, since stats is a very infrequently run operation.
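A toy illustration of why the two kinds of metric chart so differently, assuming three backup runs with made-up byte counts. `RunningSum` and `Gauge` are simplified stand-ins written for this sketch and are not the types Backrest uses.

```python
class RunningSum:
    """Accumulates every observation, so the exported value only ever grows."""

    def __init__(self):
        self.value = 0.0

    def add(self, amount):
        self.value += amount


class Gauge:
    """Holds only the most recent observation."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


def simulate(bytes_added_per_run):
    """Return what each metric would export after every run."""
    total, latest = RunningSum(), Gauge()
    sum_series, gauge_series = [], []
    for added in bytes_added_per_run:
        total.add(added)
        latest.set(added)
        sum_series.append(total.value)
        gauge_series.append(latest.value)
    return sum_series, gauge_series


# Made-up numbers: a large first backup followed by two small incremental ones.
sum_series, gauge_series = simulate([500, 20, 35])
print(sum_series)    # [500.0, 520.0, 555.0]
print(gauge_series)  # [500.0, 20.0, 35.0]
```

Plotted directly, the running sum climbs forever, while the gauge shows the per-run value a "bytes added by the last backup" panel would want. A gauge also keeps exporting its last value between runs, which may help with infrequent operations.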
Found this project that also relies on the https://github.com/netinvent/npbackup?tab=readme-ov-file#monitoring
Hi @MrModest, I'm struggling to set up an alert with Prometheus based on the exposed metrics. I thought I can use the
Could you help me out here?
P.S.: Would you be able to share your Grafana dashboards?
Hi @nlsrchtr, I don't have any alerts so far, just dashboards: https://gist.github.com/MrModest/3dd90ed388456886e09e6c18fb6a358f
But, TBH, I haven't checked them for a long time, so I wouldn't say that they make any sense :D For example, the 1st one is definitely lying (compared to the screenshot from Backrest itself):
I hope that @garethgeorge will add gauge metrics, so it will be easier to monitor. I don't see much value in counters in this scenario :D
Hey, Prometheus metrics are definitely a feature that still needs some love. The main blocker at the moment is that I just haven't had time to set up a Prometheus install on my system to create my own configuration and iterate on making the exported data more useful. Perhaps I can find some time soon to borrow @MrModest's configuration and mess with this. I'm fairly heads down on #562 when I have time for Backrest work, so I'd also be very happy to take PRs on the Prometheus front if there's something you can pinpoint that needs changing about how Backrest exports its metrics.
Metrics are defined in backrest/internal/metric/metric.go Lines 1 to 84 in a1e3a70
This approach is pretty good for exporting info about task runs, e.g. backups, forgets, etc. But it's harder for infrequent operations, i.e. prune or stats commands.
Hello! I'm also trying to set up some alerts and build some Grafana graphs with the metrics. (That's why I made PR #625.) If you're open to this change, I can make the PR.
Thank you for the app! The WebUI looks nice and straightforward! I like it!
Is your feature request related to a problem? Please describe.
Even though the app provides some stats in the WebUI itself, it would be nice to be able to fetch metrics and configure custom dashboards in Grafana.
For example, I'd love to create a dashboard (based on these metrics) that shows a list of snapshots/repositories with info like last timestamp, original backup size, size after deduplication and compression, and saved space ratio.
Or a time series that shows the latency of executed backups or the growing backup size.
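As a worked example of the saved space ratio mentioned above, here is a small sketch that assumes the ratio is defined as the fraction of the original size that no longer needs to be stored. The function name and the sample sizes are hypothetical, and other definitions of the ratio are possible.

```python
def saved_space_ratio(original_bytes, stored_bytes):
    """Fraction of the original size saved by deduplication and compression.

    Assumed definition: 1 - stored / original.
    """
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    if stored_bytes < 0:
        raise ValueError("stored_bytes must not be negative")
    return 1.0 - stored_bytes / original_bytes


# Made-up sizes: 1000 bytes of source data stored as 250 bytes.
print(saved_space_ratio(1000, 250))  # 0.75
```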