Add Prometheus `/metrics` support #420

MrModest · 2024-08-18T16:56:11Z

Thank you for the app! The WebUI looks nice and straightforward! I like it!

Is your feature request related to a problem? Please describe.
Even though the app provides some stats in the WebUI itself, it would be nice to be able to fetch metrics and configure custom dashboards in Grafana.

For example, I'd love to create a dashboard (based on these metrics) that shows a list of snapshots/repositories with info like last timestamp, original backup size, size after deduplication and compression, saved space ratio.

Or a time series that shows the duration of executed backups or the growth in backup size.

garethgeorge · 2024-08-18T19:44:20Z

Hey, I think Prometheus support is definitely something that should be on my roadmap.

Looking into it a bit it looks like the major metric types are: https://prometheus.io/docs/concepts/metric_types/#metric-types

I think that it'd make sense for me to export counters for each repo with names e.g.

repo_<repo ID>_snapshot_count
repo_<repo ID>_size
etc

And similar for each plan i.e.

plan_<plan ID>_snapshot_count
plan_<plan ID>_error_count

etc.

garethgeorge · 2024-09-07T21:53:38Z

Started work on Prometheus metrics in

https://github.com/garethgeorge/backrest/pull/459/files

Added metrics:

	commonDims := []string{"repo_id", "plan_id"}

	registry := &Registry{
		reg: prometheus.NewRegistry(),
		backupBytesProcessed: prometheus.NewSummaryVec(prometheus.SummaryOpts{
			Name: "backrest_backup_bytes_processed",
			Help: "The total number of bytes processed during a backup",
		}, commonDims),
		backupBytesAdded: prometheus.NewSummaryVec(prometheus.SummaryOpts{
			Name: "backrest_backup_bytes_added",
			Help: "The total number of bytes added during a backup",
		}, commonDims),
		backupFileWarnings: prometheus.NewSummaryVec(prometheus.SummaryOpts{
			Name: "backrest_backup_file_warnings",
			Help: "The total number of file warnings during a backup",
		}, commonDims),
		tasksDuration: prometheus.NewSummaryVec(prometheus.SummaryOpts{
			Name: "backrest_tasks_duration_secs",
			Help: "The duration of a task in seconds",
		}, append(slices.Clone(commonDims), "task_type")),
		tasksRun: prometheus.NewCounterVec(prometheus.CounterOpts{
			Name: "backrest_tasks_run_total",
			Help: "The total number of tasks run",
		}, append(slices.Clone(commonDims), "task_type", "status")),
		tasksErrors: prometheus.NewCounterVec(prometheus.CounterOpts{
			Name: "backrest_tasks_errors_total",
			Help: "The total number of tasks that errored",
		}, append(slices.Clone(commonDims), "task_type")),
	}

These have a number of dimensions, typically "repo_id" and "plan_id" at least, but also "task_type" and "status" for the task level metrics.
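As a rough illustration of what these dimensions mean in practice, the sketch below models a labelled counter with the standard library only: every distinct combination of label values becomes its own series in the text exposition format. The `counterVec` type is a simplified stand-in written for this example, not the `client_golang` type, and the label values (`backup`, `success`, `error`, and the repo and plan IDs) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a simplified stand-in for a labelled counter: it keeps one
// running total per unique combination of label values.
type counterVec struct {
	name   string
	labels []string
	values map[string]float64 // keyed by the rendered label set
}

func newCounterVec(name string, labels ...string) *counterVec {
	return &counterVec{name: name, labels: labels, values: map[string]float64{}}
}

// inc adds one to the series identified by the given label values.
// It panics if the number of values does not match the number of labels.
func (c *counterVec) inc(labelValues ...string) {
	if len(labelValues) != len(c.labels) {
		panic("label value count does not match label count")
	}
	pairs := make([]string, len(c.labels))
	for i, l := range c.labels {
		pairs[i] = fmt.Sprintf("%s=%q", l, labelValues[i])
	}
	c.values[strings.Join(pairs, ",")]++
}

// render returns one line per series, sorted, in the text exposition format.
func (c *counterVec) render() []string {
	lines := make([]string, 0, len(c.values))
	for k, v := range c.values {
		lines = append(lines, fmt.Sprintf("%s{%s} %g", c.name, k, v))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	runs := newCounterVec("backrest_tasks_run_total", "repo_id", "plan_id", "task_type", "status")
	// Hypothetical task runs: two successful backups and one failed one.
	runs.inc("main__local", "main__local__daily", "backup", "success")
	runs.inc("main__local", "main__local__daily", "backup", "success")
	runs.inc("main__local", "main__local__daily", "backup", "error")
	for _, line := range runs.render() {
		fmt.Println(line)
	}
}
```

Because `status` is a label, successes and failures end up as separate series that a dashboard can filter or aggregate by label.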

I've not actually set up Prometheus on any of my machines before, so I'm interested to hear from anyone with experience building dashboards: how will this be to work with, and is this a good setup?

MrModest · 2024-09-08T08:47:21Z

In my work, we usually just use a library called micrometer (in Java), so can't tell much about it, unfortunately 😅

But if you build and push a container with a test version to the registry, I can try to build a dashboard in grafana and share my experience :)

Btw, I don't see a metric for the compression ratio or is it calculated via "bytes_added" and "bytes_processed"?
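If the ratio were derived from those two metrics, the arithmetic might look like the sketch below. This is an assumption, not something the thread confirms: `addedRatio` is a hypothetical helper, the input values are made up, and bytes added reflects deduplication as well as compression, so the result is a "space saved" figure rather than a pure compression ratio.

```go
package main

import "fmt"

// addedRatio returns bytes added as a fraction of bytes processed.
// It returns 0 when nothing was processed, to avoid dividing by zero.
func addedRatio(bytesAdded, bytesProcessed float64) float64 {
	if bytesProcessed <= 0 {
		return 0
	}
	return bytesAdded / bytesProcessed
}

func main() {
	// Hypothetical values, not taken from a real scrape.
	added, processed := 2.0e9, 10.0e9
	r := addedRatio(added, processed)
	fmt.Printf("added/processed = %.2f, space saved = %.0f%%\n", r, (1-r)*100)
}
```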

Also, judging by the metric names, it looks like all of them are at the level of an individual backup (snapshot?). Or is that my misinterpretation, and they are all at the repository level?

garethgeorge · 2024-09-09T07:43:49Z

Hey, nothing added yet for repo level stats -- that's something that can definitely be expanded on, but because of how restic works, stats are only computed infrequently at the moment (each time a prune runs, stats are computed if it's been 30 days since the last stats check).

At the moment all of the metrics are exported at the plan level. I'll probably need to spend some time setting up prometheus and actually prototyping some dashboards to get a sense of what this will look like.

garethgeorge · 2024-09-09T07:45:30Z

The CI system provides preview builds, e.g. one with Prometheus support (https://github.com/garethgeorge/backrest/actions/runs/10754697192), but they aren't dockerized, so local testing means either a direct install or swapping the binary in the image with a short Dockerfile!

garethgeorge · 2024-09-13T08:49:52Z

Initial Prometheus metrics support went out in 1.5.0. Docs aren't written up yet, and this is largely a preview as I may rename or redefine a few of these. Definitions can be found in PR #459 and may change in the next release.

Note: normal authentication applies to the /metrics endpoint so you'll want to disable auth to use this feature.

MrModest · 2024-09-21T14:46:42Z

Sorry for the late reply. Sometimes it's very hard to find free time :D
And thank you for the test version on Docker Hub.

I tried out the test version and here are my findings.

The sizes of the folders being backed up:
- /home - 605.4 MiB
- /mnt/pools/fast/apps-data - 6.1 GiB
- /mnt/pools/slow/backups/db_dumps - 3.1 GiB
- So, the total is 6.1 + 3.1 + (605.4/1024) ≈ 9.79 GiB

The size of my local repo is 9.2 GiB.

(All size measurements made with ncdu v1.15.1)

Some info from the restic CLI from inside the container:

backrest:/# restic-0.17.0 snapshots -r /repos/main
repository e4fc16be opened (version 2, compression level auto)
ID        Time                 Host        Tags                                           Paths                                    Size
--------------------------------------------------------------------------------------------------------------------------------------------
7764c799  2024-08-18 19:25:38  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             2.113 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

0bc578c9  2024-08-31 09:32:18  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             1.460 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

03837cf1  2024-09-01 11:58:19  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             1.748 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

bccbf5ea  2024-09-08 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             3.943 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

f5a95a68  2024-09-12 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             5.509 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

59a55178  2024-09-13 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             5.907 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

cbd8d84c  2024-09-14 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             6.316 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

8ddf324e  2024-09-15 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             6.707 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

8e115dcb  2024-09-16 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             7.110 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

ad4fbbc9  2024-09-17 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             7.508 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

158fa862  2024-09-18 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             7.903 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

f57dac83  2024-09-19 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             8.299 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

4b990bfe  2024-09-20 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             8.696 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps

e60c8e56  2024-09-21 02:00:01  backrest    plan:main__local__daily,created-by:HomeServer  /hostfs/home                             8.799 GiB
                                                                                          /hostfs/mnt/pools/fast/apps-data
                                                                                          /hostfs/mnt/pools/slow/backups/db_dumps
--------------------------------------------------------------------------------------------------------------------------------------------
14 snapshots

backrest:/# restic-0.17.0 stats -r /repos/main
repository e4fc16be opened (version 2, compression level auto)
[0:00] 100.00%  1 / 1 index files loaded
scanning...
Stats in restore-size mode:
     Snapshots processed:  14
        Total File Count:  78315
              Total Size:  82.017 GiB

Metric browser in Grafana shows these metrics:

I tried to create several panels:

For backrest_backup_bytes_added_sum

For backrest_backup_bytes_processed_sum

For backrest_tasks_duration_secs_sum

MrModest · 2024-09-21T14:48:28Z

I don't fully understand which values these metrics are supposed to show. The first panel doesn't seem to show the accurate repo size, and I don't understand what I'm supposed to get from the second one 😵‍💫

MrModest · 2024-09-21T14:51:12Z

Almost forgot, the stats from Backrest's UI:

main__local repo

main__mailru-webdav repo

MrModest · 2024-09-21T14:52:18Z

I shared more details about my setup in another issue: #457

The only difference since then is that the version is now v1.5.0.

garethgeorge · 2024-09-26T03:02:51Z

Thanks for experimenting with this -- appreciate it and love seeing the graphs.

Looks like I made a few mistakes when defining metrics -- it seems like a lot of the metrics I exported are sums, but a gauge seems like it would be more appropriate for bytes added, bytes processed, and task duration based on how it's appearing in charts.
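The difference can be sketched with two simplified stand-ins (these are not the `client_golang` types, and the byte values are hypothetical). A summary exposes `_sum` and `_count` series that keep accumulating across observations, so a panel plotting `_sum` directly shows a running total, whereas a gauge holds only the most recent value.

```go
package main

import "fmt"

// summary is a simplified stand-in for the _sum and _count series of a
// Prometheus summary: both only ever grow as observations arrive.
type summary struct {
	sum   float64
	count uint64
}

// observe records one observation.
func (s *summary) observe(v float64) {
	s.sum += v
	s.count++
}

// mean returns the average observation, or 0 if there are none.
func (s *summary) mean() float64 {
	if s.count == 0 {
		return 0
	}
	return s.sum / float64(s.count)
}

// gauge is a simplified stand-in for a Prometheus gauge: it keeps only the
// last value that was set.
type gauge struct{ value float64 }

func (g *gauge) set(v float64) { g.value = v }

func main() {
	// Hypothetical bytes-added values for three consecutive backups.
	backups := []float64{500, 100, 300}
	var s summary
	var g gauge
	for i, b := range backups {
		s.observe(b)
		g.set(b)
		fmt.Printf("backup %d: summary _sum=%g _count=%d, gauge=%g\n", i+1, s.sum, s.count, g.value)
	}
	fmt.Printf("mean per backup from the summary: %g\n", s.mean())
}
```

With a summary, a per-backup figure has to be recovered indirectly (for example as `_sum` divided by `_count`, or as the change in `_sum` between scrapes), while a gauge gives the latest backup's value directly.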

I also agree that some more metrics need exporting. It would make sense for Backrest to export the stats that it collects whenever executing a stats operation (the caveat I'd add here is that I'm not sure how likely it is that they'll get scraped reliably? stats is a very infrequently run operation).

MrModest · 2024-10-25T10:33:32Z

I found this project, which also relies on restic and implements Prometheus metrics. Maybe you can learn something from their source code or get some inspiration for metrics?

https://github.com/netinvent/npbackup?tab=readme-ov-file#monitoring

nlsrchtr · 2024-12-11T16:31:34Z

Hi @MrModest,

I'm struggling to set up an alert with Prometheus based on the exposed metrics. I thought I could use the backrest_backup_file_warnings metric, but I don't have a good idea of how. Since backups can shrink over time as well, the size of the backup doesn't seem to be a good indicator for me.

Could you help me out here?

P.S.: Would you be able to share your Grafana dashboards?

MrModest · 2024-12-11T16:40:29Z

Hi @nlsrchtr I don't have any alerts so far. Just dashboards: https://gist.github.com/MrModest/3dd90ed388456886e09e6c18fb6a358f

But, TBH, I haven't checked them for a long time, so I wouldn't say that they make any sense :D

For example, the 1st one is definitely lying (if you compare it to the screenshot from Backrest itself):

I hope that @garethgeorge will add gauge metrics, so it will be easier to monitor. I don't see much value in counters in this scenario :D

garethgeorge · 2024-12-11T23:16:16Z

Hey, Prometheus metrics are definitely a feature that still needs some love. The main blocker at the moment is that I just haven't had time to set up a Prometheus install on my system to create my own configuration and iterate on making the exported data more useful.

Perhaps I can find some time soon to borrow @MrModest 's configuration and mess with this. I'm fairly heads down on #562 when I have time for backrest work, so I'd also be very happy to take PRs on the prometheus front if there's something you can pinpoint that needs changing about how backrest exports its metrics.

Metrics are defined in

backrest/internal/metric/metric.go

Lines 1 to 84 in a1e3a70

    
    package metric

    import (
    	"net/http"
    	"slices"

    	"github.com/prometheus/client_golang/prometheus"
    	"github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var (
    	globalRegistry = initRegistry()
    )

    func initRegistry() *Registry {
    	commonDims := []string{"repo_id", "plan_id"}

    	registry := &Registry{
    		reg: prometheus.NewRegistry(),
    		backupBytesProcessed: prometheus.NewSummaryVec(prometheus.SummaryOpts{
    			Name: "backrest_backup_bytes_processed",
    			Help: "The total number of bytes processed during a backup",
    		}, commonDims),
    		backupBytesAdded: prometheus.NewSummaryVec(prometheus.SummaryOpts{
    			Name: "backrest_backup_bytes_added",
    			Help: "The total number of bytes added during a backup",
    		}, commonDims),
    		backupFileWarnings: prometheus.NewSummaryVec(prometheus.SummaryOpts{
    			Name: "backrest_backup_file_warnings",
    			Help: "The total number of file warnings during a backup",
    		}, commonDims),
    		tasksDuration: prometheus.NewSummaryVec(prometheus.SummaryOpts{
    			Name: "backrest_tasks_duration_secs",
    			Help: "The duration of a task in seconds",
    		}, append(slices.Clone(commonDims), "task_type")),
    		tasksRun: prometheus.NewCounterVec(prometheus.CounterOpts{
    			Name: "backrest_tasks_run_total",
    			Help: "The total number of tasks run",
    		}, append(slices.Clone(commonDims), "task_type", "status")),
    	}

    	registry.reg.MustRegister(registry.backupBytesProcessed)
    	registry.reg.MustRegister(registry.backupBytesAdded)
    	registry.reg.MustRegister(registry.backupFileWarnings)
    	registry.reg.MustRegister(registry.tasksDuration)
    	registry.reg.MustRegister(registry.tasksRun)

    	return registry
    }

    func GetRegistry() *Registry {
    	return globalRegistry
    }

    type Registry struct {
    	reg                  *prometheus.Registry
    	backupBytesProcessed *prometheus.SummaryVec
    	backupBytesAdded     *prometheus.SummaryVec
    	backupFileWarnings   *prometheus.SummaryVec
    	tasksDuration        *prometheus.SummaryVec
    	tasksRun             *prometheus.CounterVec
    }

    func (r *Registry) Handler() http.Handler {
    	return promhttp.HandlerFor(r.reg, promhttp.HandlerOpts{})
    }

    func (r *Registry) RecordTaskRun(repoID, planID, taskType string, duration_secs float64, status string) {
    	if repoID == "" {
    		repoID = "_unassociated_"
    	}
    	if planID == "" {
    		planID = "_unassociated_"
    	}
    	r.tasksRun.WithLabelValues(repoID, planID, taskType, status).Inc()
    	r.tasksDuration.WithLabelValues(repoID, planID, taskType).Observe(duration_secs)
    }

    func (r *Registry) RecordBackupSummary(repoID, planID string, bytesProcessed, bytesAdded int64, fileWarnings int64) {
    	r.backupBytesProcessed.WithLabelValues(repoID, planID).Observe(float64(bytesProcessed))
    	r.backupBytesAdded.WithLabelValues(repoID, planID).Observe(float64(bytesAdded))
    	r.backupFileWarnings.WithLabelValues(repoID, planID).Observe(float64(fileWarnings))
    }

and types can be tweaked easily -- I wouldn't consider the prometheus metrics to be stable yet so breaking changes here are fine.

This approach is pretty good for exporting info about task runs e.g. backups, forgets, etc. But it's harder for infrequent operations i.e. prune or stats commands.

titilambert · 2025-01-13T08:34:53Z

Hello! I'm also trying to set up some alerts and build some Grafana graphs with the metrics (that's why I made PR #625).
But I have found another issue. I went through metric.go and was wondering why you chose to use SummaryVec instead of GaugeVec. Reading this doc, https://prometheus.io/docs/tutorials/understanding_metric_types/, I would use a Gauge and update the value for each task.
Then you would have the exact state of your backup.

If you're open to this change I can make the PR.
Thanks for your response @garethgeorge

MrModest added the enhancement label Aug 18, 2024

garethgeorge added the p2 label Aug 18, 2024

garethgeorge added the help wanted label Aug 21, 2024
