
Feature: centralized backup management #68

Open
edersong opened this issue Feb 6, 2024 · 23 comments
@edersong
Contributor

edersong commented Feb 6, 2024

For me, the big problem with Restic is management: each server it's installed on has to be managed in a different place.
It's difficult to manage a big set of servers, so I would like to know whether there is a plan for Backrest to manage all the Restic backups from a single WebUI.
It could be like what Cockpit does with Linux servers, but it would be even better if there were a dashboard where we can follow the latest backup results from all servers.

@garethgeorge garethgeorge added the enhancement New feature or request label Feb 6, 2024
@garethgeorge
Owner

Really interested to see this come in as a feature request; this is something that I'm thinking about in the background (and something that backrest intends to be able to support architecturally).

Can you elaborate a bit on your use case? Do you care primarily about being able to view backup status and results in a centralized place? Or do you want to be able to manage backup configurations across a fleet of machines / perform bulk operations?

@Vatson112

Hi @garethgeorge!

I am also interested in this feature.

I propose the same design as Bareos.

We will have:

  1. Central Server = backrest
  2. Agents on the hosts we want to back up.

Our server sends a request to the agent (authenticated by mTLS) with a restic config and some pre/post-scripts. The client then backs up directly to the repository backend, or we can use rest-server and back up to the central server over the REST protocol.

We could also support an agentless setup: send restic commands over an SSH connection. But there are caveats, e.g. long-lived SSH sessions may be interrupted by the SSHD configuration.
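
As a rough sketch of the agent half of this proposal (assuming an invented JSON payload and endpoint, not anything Backrest or Bareos actually implements), the agent could accept only mTLS-authenticated requests from the central server and then shell out to restic:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"os/exec"
)

// BackupRequest is a hypothetical payload the central server might send.
type BackupRequest struct {
	Repo      string   `json:"repo"`      // e.g. "rest:https://backup.example.com/myhost"
	Paths     []string `json:"paths"`     // directories to back up
	PreScript string   `json:"preScript"` // optional command to run first
}

func handleBackup(w http.ResponseWriter, r *http.Request) {
	var req BackupRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if req.PreScript != "" {
		if err := exec.Command("/bin/sh", "-c", req.PreScript).Run(); err != nil {
			http.Error(w, "pre-script failed: "+err.Error(), http.StatusInternalServerError)
			return
		}
	}
	// Back up directly to the repository backend; RESTIC_PASSWORD etc. are
	// assumed to be present in the agent's environment.
	cmd := exec.Command("restic", append([]string{"-r", req.Repo, "backup"}, req.Paths...)...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		http.Error(w, "backup failed: "+err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Only clients presenting a certificate signed by this CA may connect (mTLS).
	caPEM, err := os.ReadFile("server-ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	srv := &http.Server{
		Addr:    ":8443",
		Handler: http.HandlerFunc(handleBackup),
		TLSConfig: &tls.Config{
			ClientAuth: tls.RequireAndVerifyClientCert,
			ClientCAs:  pool,
		},
	}
	log.Fatal(srv.ListenAndServeTLS("agent-cert.pem", "agent-key.pem"))
}
```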

@garethgeorge
Owner

garethgeorge commented Feb 14, 2024

Interesting. When I'd considered this feature in the past, I'd imagined something along the lines of:

  • A central server that manages configuration and replicates logs from a collection of workers.
  • Each worker receives config updates from the central server when an admin user is changing settings.
  • Workers accept configuration updates and schedule operations locally.
  • Each worker pushes copies of each operation log update to the central server (in addition to maintaining a local operation log, so that backup history for that worker can still be viewed locally on it).

I'll read through the Bareos docs. Do you see strong advantages one way or the other w.r.t. the central server being responsible for pushing commands to each of the workers? I have some concern that the central server becomes very highly privileged if it's SSHing in and running backup operations. From an implementation perspective, though, it might be very simple to just open up running restic commands over SSH (as you mention) and to support scheduling operations in parallel, such that backups can run on multiple machines at the same time (the constraint would likely be one backup per repository at a time).

@edersong
Contributor Author

A central server that manages configuration and replicates logs from a collection of workers.
Each worker receives config updates from the central server when an admin user is changing settings.
Workers accept configuration updates and schedule operations locally
Each worker pushes copies of each operation log update to the central server (in addition to maintaining a local operation log such that backup history can be viewed locally on the worker only for that worker).

That's what I desire.
Currently, I'm using UrBackup, which has centralized administration. I think Restic is more modern in terms of backup technology, but it doesn't have centralized management yet, which makes managing backups across a farm of servers difficult.

@Nebulosa-Cat
Contributor

For example, I have
1 Raspberry Pi and 4 VPS (Debian and Ubuntu, x86 and ARMv8) that need restic backups.
I use rclone to create a Dropbox remote, plus an encrypted remote on top of the Dropbox one,
and each of my machines has its own restic repo,
so the restic repo is rclone:the-encrypted-remote-name:hostname,
for example:
rclone:abc-encrypt:raspberry-pi-backup
rclone:abc-encrypt:vps-1
rclone:abc-encrypt:vps-2
...

For my usage scenario, I hope that the Backrest I run on the Raspberry Pi is the main control device, the ones running on the other devices/VPS are clients, and all control is done through Backrest on the Pi.

And in terms of UI, my backup mode is to back up once a day (retain up to 30 items), and back up once a week (never delete), so the structure I expect is like this:

Plan:
raspberry-pi
- Daily-Plan
- Week-Plan
VPS-1
- XXX
- YYY
VPS-2
- XXX
- YYY

For this kind of multi-machine management, I think it will need some custom folder tree so users can organize their plans.

@oliverjhyde

oliverjhyde commented Apr 5, 2024

This would be a killer feature, at the moment I'm using Synology Active Backup for Business to backup and deduplicate across 12 Windows machines but there are a couple of issues:

  • Laptops with low disk space frequently error out
  • Remote machines (connection via Tailscale) sometimes drop out and the resume isn't handled particularly well resulting in a paused backup state for a couple of days before it sorts itself out
  • I have a couple of openSUSE machines that ABB doesn't support (I'm currently backing these up manually with Restic, and now Backrest)

Being able to use Backrest to centrally manage the backup configuration (scheduling, directories), with a status page to know if something is failing or has missed its schedule for x interval, would be fantastic.

Even better if the local web interface could be used to restore an older version of a file (either to a new location, as is currently possible, or over the original, as #118 suggests making possible). Ideally, if centrally managed, the configuration couldn't be changed here, though perhaps this could be locked down by user account?

@garethgeorge garethgeorge changed the title Is there a plan to use this tool for centralized Restic backup management? Feature: centralized backup management Apr 11, 2024
@garethgeorge
Owner

garethgeorge commented Apr 11, 2024

Hey all, updating this thread as it's a frequently requested feature and it's also a capability I want for my own systems -- likely looking at prototyping this in the near term.

I'm investigating a few avenues for implementation:

  1. As a cloud service that's user-deployable (e.g. think provided Terraform configs). There's some value here in that a savvy user can deploy this in a serverless model and only pay for what they use (it doesn't need to be running all the time).
  2. As an always-running service, e.g. with a monitor and daemon model. In this model my main concern is making it easy for daemons (possibly behind complicated firewalls) to establish connections to, and receive commands from, the monitor. I'm vaguely interested in investigating whether something like http://libp2p.io is a good fit to solve some of the networking problems here re: firewall hole punching.

To provide some design details -- this will likely look something like:

  • backrest binary (now referred to as the daemon) continues to ship as it does today and will always host a local copy of its UI.
  • backrest binary will add a new --monitor-uri flag where a connection string to a monitor process can be provided. Think of this like the [docker swarm join](https://docs.docker.com/reference/cli/docker/swarm/join/) command, which enrolls a node with a monitor; the token will also contain credentials, e.g. a shared secret for the monitor process.
  • monitor process will receive operation log updates from the various backrest daemons and will expose a new UI (shared code base) with the operation tree grouped by host name. Under the hood, hosts will be identified by cryptographically unique identifiers.

I'm slightly leaning towards the daemon / monitor process model because it's more in line with the self-hosted ethos. There are also some interesting possibilities to examine in the future here, e.g. centralizing some operations: run backups on daemon processes (with read-only credentials to repos) but run prune operations only on the trusted monitor process. I'm still thinking through what this might look like / how it'd be configured. Perhaps a concept of a meta-plan is needed to logically group plans across multiple nodes.
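
A minimal sketch of how the proposed enrollment and log push could look from the daemon side; the URI format, token query parameter, /api/v1/oplog endpoint, and localOplogUpdates helper are all invented for illustration, not Backrest's real interfaces:

```go
package main

import (
	"bytes"
	"flag"
	"log"
	"net/http"
	"net/url"
	"time"
)

func main() {
	// --monitor-uri is the flag proposed above; the connection-string shape and
	// token parameter used here are placeholders, not a finalized format.
	monitorURI := flag.String("monitor-uri", "", "e.g. https://monitor.example.com:9898?token=SHARED_SECRET")
	flag.Parse()
	if *monitorURI == "" {
		log.Fatal("no monitor configured; running standalone")
	}

	u, err := url.Parse(*monitorURI)
	if err != nil {
		log.Fatal(err)
	}
	token := u.Query().Get("token")                         // shared secret baked into the join string
	endpoint := u.Scheme + "://" + u.Host + "/api/v1/oplog" // hypothetical monitor endpoint

	// Push a copy of every local operation log update to the monitor.
	for update := range localOplogUpdates() {
		req, _ := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(update))
		req.Header.Set("Authorization", "Bearer "+token)
		req.Header.Set("Content-Type", "application/json")
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("monitor push failed, dropping update in this sketch: %v", err)
			time.Sleep(30 * time.Second)
			continue
		}
		resp.Body.Close()
	}
}

// localOplogUpdates stands in for the daemon's real operation log stream.
func localOplogUpdates() <-chan []byte {
	ch := make(chan []byte)
	go func() {
		defer close(ch)
		ch <- []byte(`{"op":"backup","status":"success"}`)
	}()
	return ch
}
```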

@edersong
Contributor Author

Hello, @garethgeorge
Thank you for the feedback!
Of the options you provided, I think option 2 will be better because, in my case at least, I run all my services locally and wouldn't want to use (and pay for) a cloud service just for monitoring.
Count me in as a beta tester and to give feedback. ;-)

@brandonkal

P2P is not necessary for this use-case. I suggest:

  1. Users deploy the backrest binary to all nodes that should be backed up.
  2. The only requirement is that one backrest deployment is accessible by all nodes. Therefore, the main backrest should be accessible via the internet, or the user can choose their own network infrastructure (Tailscale, WireGuard, Netbird, etc.).
  3. Each node is configured to register itself with the main backrest node. It then polls the main node's API, e.g. GET /api/backrest-config?node=node-uuid, at a set interval. 10s or even longer is fine as this is just config.
  4. In the main backrest web UI, you can assign plans to nodes. When a node requests its configuration, it gets only the plans it is assigned and the repositories those plans depend on.
  5. Nodes push progress updates to the central server: POST /api/backrest-logs?node=node-uuid (a rough sketch of this polling flow follows below).
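
A rough Go sketch of the polling agent described above, reusing the GET /api/backrest-config and POST /api/backrest-logs endpoints and the node-uuid parameter from the list; the base URL and payload shapes are made up for illustration:

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"time"
)

// The endpoints and node query parameter mirror the suggestion above; the
// base URL and payloads are placeholders.
const (
	mainNode = "https://backrest.example.com"
	nodeID   = "node-uuid"
)

func pollConfig() ([]byte, error) {
	resp, err := http.Get(mainNode + "/api/backrest-config?node=" + nodeID)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func pushLogs(logs []byte) error {
	resp, err := http.Post(mainNode+"/api/backrest-logs?node="+nodeID, "application/json", bytes.NewReader(logs))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	for range time.Tick(10 * time.Second) { // "10s or even longer is fine"
		cfg, err := pollConfig()
		if err != nil {
			log.Printf("config poll failed: %v", err)
			continue
		}
		log.Printf("got %d bytes of config (only the plans assigned to this node)", len(cfg))
		// After running any due operations, report progress back to the main node.
		if err := pushLogs([]byte(`{"status":"idle"}`)); err != nil {
			log.Printf("log push failed: %v", err)
		}
	}
}
```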

@asitemade4u

+1

@Emiliaaah

I definitely like all the ideas presented so far, but I thought I'd think aloud about a few things.

For syncing a node's config, polling an API endpoint at a set interval is probably just fine, but I'd personally also like to be able to manually run actions on those agents. Using the same polling model for that, with some sort of action queue, would probably work fine too. It got me thinking, however: wouldn't something like WebSockets be more ideal for such a use case?

Using WebSockets would have the benefit of not having to constantly poll one or more endpoints every x seconds, especially if we'd need separate endpoints for the config, manual actions, etc. It would also avoid a delay between triggering an action and it actually being performed on the node (assuming the node can perform that action at that time).

@garethgeorge
Owner

garethgeorge commented May 19, 2024

Hey all, thanks for all the interest in this issue -- just updating to say that steady progress is being made toward supporting centralized backup management. Much of the refactoring (and the migrations) in the 1.0.0 release is focused on readying the Backrest data model to support operations (possibly created by other installations) in repos and on correctly tracking those operations.

On the networking front: I'm still investigating here. Backrest uses gRPC under the hood, which is natively HTTP/2. Because connectivity / syncing operations will happen on the backend, we're not restricted to web technologies like WebSockets. I agree that polling is not the way we want to go; TCP keep-alive is much cheaper than repeatedly re-establishing connections (especially if they are HTTPS -- and they should be!).

I'm hoping to find a good OSS option that I can shim gRPC requests onto (such that they can be initiated by the hub and sent to the clients -- which really looks like some sort of inversion layer where clients will actually be establishing and "keeping alive" TCP channels to the backrest hub). I think https://libp2p.io/ may have some capabilities here (though I do not want to pull in any of the mesh networking / connectivity to the ipfs swarm from that project) but I'm wanting to find simpler alternatives -- which could ultimately look like building it myself!
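
To make the "inversion layer" idea concrete, here is a stdlib-only sketch (ignoring the gRPC shim entirely, and not how Backrest actually does it): the client dials out to the hub over TLS and keeps the connection alive, and whatever the hub writes down that client-initiated channel is treated as a command. The hub address and the newline-delimited command format are invented:

```go
package main

import (
	"bufio"
	"crypto/tls"
	"log"
	"time"
)

// hubAddr is a placeholder; in practice this would be the backrest hub.
const hubAddr = "hub.example.com:9443"

func main() {
	for {
		if err := serveOneConnection(); err != nil {
			log.Printf("connection to hub lost: %v", err)
		}
		time.Sleep(10 * time.Second) // back off, then re-dial
	}
}

func serveOneConnection() error {
	// The client initiates the TCP/TLS channel, so no inbound firewall holes
	// are needed on the client side; TCP keep-alives keep it cheap to hold open.
	conn, err := tls.Dial("tcp", hubAddr, &tls.Config{})
	if err != nil {
		return err
	}
	defer conn.Close()

	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		cmd := scanner.Text() // the hub pushes commands whenever it likes
		log.Printf("running hub command: %q", cmd)
		// ... dispatch to the local scheduler here ...
	}
	return scanner.Err()
}
```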

Another problem space I'm still giving thought to is the relationship between the hub and clients. In particular, the model I'm imagining will be common is many client devices backing up to a single repo. In this case, I feel that the hub should be able to centrally coordinate maintenance operations e.g. "forget" and "prune" execution. I'm considering here whether:

  • The hub can act as a lock / scheduling coordinator, allowing clients to take out an advisory lock on a repo that will block other clients from attempting their own forgets / prunes.
  • The hub can act as a centralized place to run maintenance e.g. the hub runs forgets / prunes on repos locally.

The latter approach has the disadvantage that the hub will need access to a repo config for each repo used by a client, BUT it also has the significant advantage that clients can be read-only (e.g. you can centralize trust in the hub with many low-trust clients; this protects against ransomware. In my case I'd likely run a low-cost IPv6 VPS dedicated to this purpose). Not yet sure what will be best here, but I am leaning towards the latter option.
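
For the first option, a toy sketch of what the hub acting as an advisory-lock coordinator could look like; the HTTP endpoints and query parameters are invented for illustration:

```go
package main

import (
	"log"
	"net/http"
	"sync"
)

// A toy hub that hands out advisory locks per repo so only one client at a
// time attempts forget/prune. The endpoints are invented for this sketch.
type lockCoordinator struct {
	mu     sync.Mutex
	locked map[string]string // repo ID -> client currently holding the lock
}

func (c *lockCoordinator) tryLock(w http.ResponseWriter, r *http.Request) {
	repo, client := r.URL.Query().Get("repo"), r.URL.Query().Get("client")
	c.mu.Lock()
	defer c.mu.Unlock()
	if holder, held := c.locked[repo]; held && holder != client {
		http.Error(w, "locked by "+holder, http.StatusConflict) // caller retries later
		return
	}
	c.locked[repo] = client
	w.WriteHeader(http.StatusOK)
}

func (c *lockCoordinator) unlock(w http.ResponseWriter, r *http.Request) {
	repo := r.URL.Query().Get("repo")
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.locked, repo)
	w.WriteHeader(http.StatusOK)
}

func main() {
	c := &lockCoordinator{locked: map[string]string{}}
	http.HandleFunc("/api/lock", c.tryLock)
	http.HandleFunc("/api/unlock", c.unlock)
	log.Fatal(http.ListenAndServe(":9898", nil))
}
```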

@jcunix

jcunix commented May 26, 2024

Very happy you are going down this path!

@brandonkal

brandonkal commented May 26, 2024

I don't want to have to depend on persistent TCP connections or WebSockets for this functionality. While it is less work than polling frequently, it limits the usefulness of backrest for bandwidth-limited clients, IoT, edge devices, etc. A lot of the things that would be useful to centrally manage with this project really only need to check their config before they run a backup task, once a day at most. For SIM connections that bill by connection time and bandwidth, the minimal benefit (instant config updates) would not outweigh the high cost incurred.

@swartjie

@brandonkal I disagree. If data usage is an issue, you're not going to be running backups over that connection anyway.
Polling is fine, I think, since it does update; the issue with polling, though, can be the delay in actions. The TCP & WebSockets route is nice since actions are not as delayed, which improves the UI experience.

@jcunix

jcunix commented Jul 6, 2024

Hi, just thought I'd check in. Any updates on progress? Very interested in this functionality.

@garethgeorge
Owner

garethgeorge commented Jul 10, 2024

edited with update: starting work on the multi-host model now; I've decided to skip the oplog decoupling from bbolt for now (will work on that in a future revision).

The in-progress PR is #385. No promises on an ETA, but I expect to have some prototypes in the next few weeks. It's likely a month or more out until it's tested / ready for release and documentation.

@garethgeorge garethgeorge self-assigned this Jul 14, 2024
@mattdale77

+1 for this feature. It looks like you're already well into development, but the way I would do it is as has already been suggested: the backrest binary is installed on all the servers, accepts API calls from the central one, and returns status updates to it, so all backups are executed locally on each server.

Related to the design here, I would like to be able to define a plan once and then execute it on multiple servers.

@tigattack

I agree with @mattdale77, I'd also prefer this method.

That said, the way it's currently being implemented will still be useful. Looking forward to trying it out!

@electrofloat

The in-progress PR is #385.

This was closed last week. Is #562 the continuation of it?

@garethgeorge
Owner

garethgeorge commented Nov 22, 2024

Yes -- continuing work in #562. I found the hub-based model I was thinking about in #385 awkward to implement. I'm paring back the scope a bit in #562 to be more incremental and to allow synchronization to be organized around repos.

@garethgeorge
Owner

garethgeorge commented Dec 17, 2024

Posting a progress update now that multihost management is making good progress and largely passing tests in #562. It's been a long road to get here; it has taken some significant rethinking of the feature as well as significant refactoring of backrest's operation storage model (and pushing half a year of preparatory refactoring and stability improvements).

What's done so far

  • Data model for multihost management: repos going forward will track a GUID that uniquely identifies the repo. For existing repos this is set randomly on migration, and is pulled from the restic repo's configuration on the next backup operation or configuration change.
  • Repo configuration sync: optionally, repos can be configured to sync their configuration with selected peers (e.g. password, env vars, etc.). These repos are provided to clients as "remote repo" targets and may be referred to with a special backrest:instanceID/repoID URI (a small parsing sketch follows this list). Repo configurations will be kept up to date whenever clients connect to a host.
  • Repo operation sync: a client will synchronize local operations with the remote host providing a repo when using a "remote repo" as a backup target.
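
For illustration only, a small helper that splits the backrest:instanceID/repoID form mentioned above into its parts; the exact parsing rules are an assumption, not taken from Backrest's code:

```go
package main

import (
	"fmt"
	"strings"
)

// splitRemoteRepoURI splits a backrest:instanceID/repoID reference into its
// parts. The split rules here are assumed for the sake of the example.
func splitRemoteRepoURI(uri string) (instanceID, repoID string, err error) {
	rest, ok := strings.CutPrefix(uri, "backrest:")
	if !ok {
		return "", "", fmt.Errorf("not a remote repo URI: %q", uri)
	}
	instanceID, repoID, ok = strings.Cut(rest, "/")
	if !ok {
		return "", "", fmt.Errorf("missing repo ID in %q", uri)
	}
	return instanceID, repoID, nil
}

func main() {
	instance, repo, err := splitRemoteRepoURI("backrest:nas-host/media-backups")
	if err != nil {
		panic(err)
	}
	fmt.Println(instance, repo) // nas-host media-backups
}
```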

Todo in the near term

  • Full operation sync: allow a client to opt to send its full operation history (for all repos) to a remote host that may be used as a status dashboard. This will not require the remote host to have knowledge of the repos configured on the client / their configurations. Helpful for cases where the host is trusted to view status but possibly not to access the backup repositories.
  • Cryptography - channel security: sync should be used with HTTPS wherever possible, but to reduce the risk of leaking configurations, backrest will use https://pkg.go.dev/crypto/ecdh to establish encryption keys for sync sessions and to verify that sessions are tamper-proof / MITM-resistant (a minimal key-agreement sketch follows this list). This is largely intended to provide a balance of good security and deployment convenience; HTTPS is strongly recommended for any backrest deployment providing remote repos to peers.
    • note: alternatives considered here included requiring that sync run over HTTP/2 with signed HTTPS certificates, but this likely significantly complicates key generation and authorizing peers, as these certs would be self-signed. Additionally, it would require a second HTTP port if a user is accessing the dashboard over HTTP (e.g. on localhost with a self-signed cert).
  • Cryptography - identity management: sync should verify peer identity. Peers will be identified by their ECDSA public keys (using https://pkg.go.dev/crypto/ecdsa). To connect two peers, both must configure either a "knownHost" or "authorizedClient" peer entry for the other peer's identity to establish trust. Milestones for release 1.9.0 will likely include thinking through some way to simplify peer key exchange, as keys are somewhat large and must be entered into the config manually.
  • Documentation -- would like to document the sync protocol, and especially the cryptographic components that ensure security, such that they are understandable and auditable.
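
As a minimal illustration of the crypto/ecdh building block referenced above (only the raw key agreement is shown; Backrest's actual handshake, key-derivation step, and binding to peer identities are not):

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

func main() {
	curve := ecdh.X25519()

	// Each peer generates an ephemeral key pair for the sync session.
	alicePriv, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	bobPriv, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	// Peers exchange public keys (over HTTPS where possible) and each derives
	// the same shared secret locally.
	aliceShared, err := alicePriv.ECDH(bobPriv.PublicKey())
	if err != nil {
		panic(err)
	}
	bobShared, err := bobPriv.ECDH(alicePriv.PublicKey())
	if err != nil {
		panic(err)
	}

	// Hash the raw shared secret into a symmetric session key. A real protocol
	// would use a proper KDF (e.g. HKDF) and authenticate the transcript.
	aliceKey := sha256.Sum256(aliceShared)
	bobKey := sha256.Sum256(bobShared)
	fmt.Println("keys match:", aliceKey == bobKey)
}
```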

Rollout Plan

1.7.0

My expectation is to release the data model changes that support multihost management in 1.7.0, and, under the hood, the configuration and sync API for multihost management will be available in an alpha state. There will not be any UI support in this revision, but it will be possible to see operations synced from other hosts in the repo view if a peer is added correctly. I'll post some instructions for doing this here with the release. Test coverage is good, but I'll primarily be using this stage to prove out the migration logic and to start getting users' configs updated to include repo GUIDs. Within my own setup I'll be building experience running sync with a stable release version, putting out patches, improving error messages, and thinking through what status info needs to be available as I start UI work.

1.8.0

Aiming to include initial UI support for settings related to multihost management in this revision. This is likely to include

  • A new "instances" side column which will provide a new operation-tree view that shows all operations from any instance syncing with the local peer.
  • A peer connectivity status dashboard on the summary view (backrest's home page) making it easy to tell what peers are connected / or diagnose errors as they happen.
  • Peer configuration will be exposed only in config.json; no UI support is planned for adding peers, to keep the feature limited-access / in alpha.

1.9.0 or beyond

Will aim to provide support for some sort of easy peering flow to connect instances together. This is also a possible target for a stable revision of multihost management, but it is very possible that this milestone will slip to later versions.

@jcunix

jcunix commented Dec 17, 2024

Great work!
