Database branching
OrioleDB implements copy-on-write checkpoints. The idea of branching is that we can keep checkpoints for a longer period than needed for recovery. When one creates a branch, it works like pg_basebackup, but skips OrioleDB data files. Instead of copying OrioleDB data files, it starts "holding" the current checkpoint for a branch. The new database cluster copy will have a reference to the parent branch (or multiple branches for nested branching).
We should reserve a few bits for branch numbers in tree downlinks. This way we can distinguish which branch the referenced block should be read from. Updated downlinks will reference the current branch.
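As a rough illustration of reserving bits in a downlink, the sketch below packs a branch number into the top bits of a 64-bit value. The widths (4 bits for the branch, 60 for the block offset) and the function names are assumptions made for this example only; they are not OrioleDB's actual downlink format.

```python
from typing import Tuple

# Hypothetical layout: the top BRANCH_BITS bits of a 64-bit downlink hold the
# branch number, the remaining bits hold the block offset.
DOWNLINK_BITS = 64
BRANCH_BITS = 4
OFFSET_BITS = DOWNLINK_BITS - BRANCH_BITS
OFFSET_MASK = (1 << OFFSET_BITS) - 1
MAX_BRANCH = (1 << BRANCH_BITS) - 1


def pack_downlink(branch: int, offset: int) -> int:
    """Combine a branch number and a block offset into one downlink value."""
    if not 0 <= branch <= MAX_BRANCH:
        raise ValueError("branch number does not fit in the reserved bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in the remaining bits")
    return (branch << OFFSET_BITS) | offset


def unpack_downlink(downlink: int) -> Tuple[int, int]:
    """Split a downlink value back into (branch number, block offset)."""
    return downlink >> OFFSET_BITS, downlink & OFFSET_MASK
```

With this layout, a downlink rewritten on a branch would simply be packed again with the current branch number, while untouched downlinks keep the number of the branch that wrote them.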
"Holding" a checkpoint means the following:
- Prevent file extents belonging to the given checkpoint from being reused.
- Prevent deletion of relfilenodes that existed in the given checkpoint.
Once the checkpoint is "released" we should both start reusing free extents from the "released" checkpoint and delete unreferenced relfilenodes.
The new directory orioledb_branches, under the PostgreSQL data directory, will contain the information about branches of this database instance. Each branch is described by a separate file containing two lines: the branch path and the checkpoint number.
The contents of the orioledb_branches directory are read on startup and the corresponding checkpoints are held.
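The startup scan could look roughly like the sketch below. It assumes only the two-line layout stated above (branch path, then checkpoint number); the function names are hypothetical, and how the branch files are named is left open.

```python
import os
from typing import Dict, Set, Tuple


def read_branch_files(data_dir: str) -> Dict[str, Tuple[str, int]]:
    """Map each file name in orioledb_branches to (branch path, checkpoint)."""
    branches_dir = os.path.join(data_dir, "orioledb_branches")
    branches: Dict[str, Tuple[str, int]] = {}
    if not os.path.isdir(branches_dir):
        return branches  # no branches were ever created
    for name in sorted(os.listdir(branches_dir)):
        with open(os.path.join(branches_dir, name), encoding="utf-8") as f:
            lines = f.read().splitlines()
        if len(lines) != 2:
            raise ValueError(f"malformed branch file: {name}")
        branches[name] = (lines[0], int(lines[1]))
    return branches


def checkpoints_to_hold(branches: Dict[str, Tuple[str, int]]) -> Set[int]:
    """Checkpoint numbers that must stay held while these branches exist."""
    return {checkpoint for _, checkpoint in branches.values()}
```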
The orioledb_sources file should be located in the PostgreSQL data directory. It should contain n lines for an n-th level branch. Each line contains the path of the corresponding parent branch data directory.
The new function pg_backup_start_branch(path text) will start the backup as usual, but with the following changes:
- The checkpoint is not released after pg_backup_stop(). The checkpoint is held until the branch is released by pg_branch_release(path text).
- It creates and fsyncs the new file in the orioledb_branches directory.
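The "create and fsync" step can be sketched as follows for a POSIX system. The file name is passed in by the caller because the naming scheme is not specified here; the function name is hypothetical. Syncing the directory as well as the file is the usual way to make sure the new directory entry itself survives a crash.

```python
import os


def write_branch_file(data_dir: str, file_name: str,
                      branch_path: str, checkpoint_number: int) -> str:
    """Durably write a two-line branch file and return its path (POSIX only)."""
    branches_dir = os.path.join(data_dir, "orioledb_branches")
    os.makedirs(branches_dir, exist_ok=True)
    target = os.path.join(branches_dir, file_name)
    with open(target, "w", encoding="utf-8") as f:
        f.write(f"{branch_path}\n{checkpoint_number}\n")
        f.flush()
        os.fsync(f.fileno())  # flush the file contents to stable storage
    # fsync the directory so that the new entry is durable too
    dir_fd = os.open(branches_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return target
```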
The function pg_branch_release(path text) removes the corresponding branch file in the orioledb_branches directory. It also releases the corresponding checkpoint if it becomes unreferenced.
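Since a checkpoint is released only when it becomes unreferenced, several branches may hold the same checkpoint. A minimal in-memory model of that bookkeeping, with a hypothetical class name and no claim about how OrioleDB tracks it internally, might be:

```python
from collections import Counter
from typing import Dict, Set


class BranchRegistry:
    """Tracks which checkpoint each branch holds, with reference counts."""

    def __init__(self) -> None:
        self._branches: Dict[str, int] = {}  # branch path -> checkpoint number
        self._refs: Counter = Counter()      # checkpoint number -> holders

    def hold(self, branch_path: str, checkpoint: int) -> None:
        if branch_path in self._branches:
            raise ValueError(f"branch already registered: {branch_path}")
        self._branches[branch_path] = checkpoint
        self._refs[checkpoint] += 1

    def release(self, branch_path: str) -> bool:
        """Forget the branch; return True if its checkpoint is now unreferenced."""
        checkpoint = self._branches.pop(branch_path)
        self._refs[checkpoint] -= 1
        if self._refs[checkpoint] == 0:
            del self._refs[checkpoint]
            return True
        return False

    def held_checkpoints(self) -> Set[int]:
        return set(self._refs)
```

Only when `release` returns True would extents and relfilenodes of that checkpoint become eligible for reuse and deletion.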
A separate function lists all the branches of the current database instance.
When a block is read using a downlink, it is read from the directory specified in the orioledb_sources line corresponding to the branch number specified in the downlink.
A new block should be written to the current data directory. The corresponding downlinks should contain the current branch number, indicated by the number of lines in orioledb_sources.
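The lookup can be sketched as below. The sketch assumes zero-based numbering: the root database has no orioledb_sources lines and is branch 0, line i (counting from zero) names the directory of branch i, and the current branch number equals the line count. That numbering and the function names are assumptions of this example, not something the text above fixes.

```python
import os
from typing import List


def read_sources(data_dir: str) -> List[str]:
    """Return the parent data directories listed in orioledb_sources."""
    path = os.path.join(data_dir, "orioledb_sources")
    if not os.path.exists(path):
        return []  # root database: no parents
    with open(path, encoding="utf-8") as f:
        return [line for line in f.read().splitlines() if line]


def current_branch_number(sources: List[str]) -> int:
    """New blocks and updated downlinks are tagged with this number."""
    return len(sources)


def directory_for_branch(data_dir: str, sources: List[str],
                         branch_number: int) -> str:
    """Pick the data directory a downlink's branch number refers to."""
    if branch_number == len(sources):
        return data_dir  # block written by the current branch
    if 0 <= branch_number < len(sources):
        return sources[branch_number]  # block inherited from an ancestor
    raise ValueError(f"unknown branch number: {branch_number}")
```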
The algorithm for creating a branch is as follows.
- SELECT pg_backup_start_branch(:branch_dir);
- Copy the contents of the data directory, except the orioledb_data folder
- SELECT pg_backup_stop();
- Append to the orioledb_sources file a line containing the path to the original data directory
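The steps above can be sketched as a small helper. Everything except the SQL statements is an assumption of this example: `run_sql` is a hypothetical callable that executes a statement on the parent instance (both backup calls should go through the same session), the orioledb_sources file is assumed to be the copy inside the new branch directory, and error handling and the usual base-backup exclusions are omitted.

```python
import os
import shutil
from typing import Callable, Tuple


def create_branch(parent_dir: str, branch_dir: str,
                  run_sql: Callable[[str, Tuple], None]) -> None:
    """Create a branch of parent_dir in branch_dir (which must not exist yet)."""
    parent_abs = os.path.abspath(parent_dir)

    def skip_orioledb_data(src: str, names):
        # Skip only the top-level orioledb_data folder of the parent.
        if os.path.abspath(src) == parent_abs:
            return {"orioledb_data"} & set(names)
        return set()

    run_sql("SELECT pg_backup_start_branch(%s)", (branch_dir,))
    shutil.copytree(parent_dir, branch_dir, ignore=skip_orioledb_data)
    run_sql("SELECT pg_backup_stop()", ())

    # The branch inherits the parent's orioledb_sources (if any) through the
    # copy; appending the parent path makes the branch one level deeper.
    with open(os.path.join(branch_dir, "orioledb_sources"), "a",
              encoding="utf-8") as f:
        f.write(parent_dir + "\n")
```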
Probably it's worth creating a command-line utility to automate the above steps.
To delete a branch:
- Stop the branch PostgreSQL instance (if it's running)
- Delete the branch data directory
- Do SELECT pg_branch_release(:branch_dir); on the parent
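The same three steps could be scripted along these lines. Both callables are hypothetical stand-ins: `stop_instance` would wrap whatever stops the branch server (for example pg_ctl stop) and `run_sql_on_parent` would execute a statement on the parent instance.

```python
import shutil
from typing import Callable, Tuple


def delete_branch(branch_dir: str,
                  stop_instance: Callable[[str], None],
                  run_sql_on_parent: Callable[[str, Tuple], None]) -> None:
    """Remove a branch and let the parent release its held checkpoint."""
    stop_instance(branch_dir)   # no-op if the branch is not running
    shutil.rmtree(branch_dir)   # delete the branch data directory
    run_sql_on_parent("SELECT pg_branch_release(%s)", (branch_dir,))
```

Releasing on the parent comes last, so the held checkpoint stays protected for as long as any branch files that depend on it still exist.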