feat: implement restore points #4169

@i-am-not-giving-my-name-to-a-machine commented Dec 2, 2024

This feature request is about implementing restore points in H2. The idea is that one can mark a database state and return to it easily without resorting to backups. The primary motivation for this is integration testing.

The current implementation status is:

Restore points can be used with SQL, e.g.

  • create restore point rp1
  • restore to point rp1
  • drop restore point rp1

and they can be selected via select restore_point_name, created_at, database_version from information_schema.restore_points. Using one of the first three commands requires admin privileges.
It is possible to create or drop a restore point while the database is in use concurrently. When restoring to a point, all concurrent sessions are dropped, any ongoing transactions are reaped, the database is brought back to its previous state, and any data created since is completely lost.

Restore points are saved as a separate MVMap and inventoried by MVStore.meta. This way they can be accessed without accessing the public.sys system table which in turn means that a database need not be started.

Creating one or more restore points makes a FileStore effectively append only. If no restore points exist, e.g. because none have been created or all have been dropped, this restriction does not apply and the FileStore is free to do maintenance work.

The DEFRAG_ALWAYS=true flag works even with restore points. The compaction routine, that is called on a database shutdown, will copy all store versions, i.e. chunks, referenced by restore points and the latest one.

When used with in-memory databases, the behavior is the same as with persistent databases, but because of how the MVStore is versioning data, the entire history up to the oldest restore point is kept. This also means that a JVM can run out of memory although just a fraction of data or none at all is referenced by the latest version of the MVStore. Restoring an in-memory database to a point releases used memory of course.

Restore points are marked as an experimental feature and should not be used in a production environment.

Notes:

My understanding of chunks and pages of a FileStore is as follows: Chunks reference pages, which reference pages, possibly across chunks. There can be only one (hehe) chunk for one version. This is what, to my understanding, makes it safe to compact databases across different versions. If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore.

Link to the previous discussion(s) in the mailing list: https://groups.google.com/g/h2-database/c/iJLjES5j2TU

Contributor

@katzyn left a comment

Thank you for your contribution!

I am not commenting on the changes in MVStore, except in a few places; these changes need to be reviewed by @andreitokar. And this is just a first quick look.

h2/src/main/org/h2/api/ErrorCode.java (outdated, resolved)
h2/src/main/org/h2/command/dml/RestorePointCommand.java (outdated, resolved)
h2/src/main/org/h2/table/InformationSchemaTable.java (outdated, resolved)
h2/src/main/org/h2/table/InformationSchemaTable.java (outdated, resolved)
h2/src/main/org/h2/table/InformationSchemaTable.java (outdated, resolved)
@@ -1567,14 +1567,14 @@ private void testMeta() {
         try (MVStore s = openStore(fileName)) {
             s.setRetentionTime(Integer.MAX_VALUE);
             Map<String, String> m = s.getMetaMap();
-            assertEquals("[]", s.getMapNames().toString());
+            assertEquals("[restorePoints]", s.getMapNames().toString());
Contributor

MVStore is also used as a standalone multi-map key-value storage, so such a change in it is unacceptable. It will break some applications. MVStore shouldn't expose any internal maps.

Makes sense. What do you think of adding the restore point map "next to" meta instead of inventorying it inside of meta?

.stream()
.mapToLong(it -> it.getOldestDatabaseVersionToKeep().getLong())
.findFirst()
.orElse(Long.MAX_VALUE);
Contributor

It should be restorePoints.firstKey(), shouldn't it? Why is a separate property with the oldest version needed?

Author

If multiple restore points exist, younger restore points "cover" older ones. E.g. if A, B, and C exist and A is dropped, restoring to C or B brings A back to life.
Now, if you say that this is overkill and dropping old restore points should actually remove them and eventually also the data they reference, I agree. However, that would require an area in the database storage (no matter if in-memory or persistent) that is not versioned at all and, apart from the header of persistent databases, I didn't find such an area.
I'm open to suggestions. Anything that makes it easier to reason about what data can be restored at what point, I'd gladly take it.
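
The "covering" behaviour described above can be illustrated with a small toy model. This is only a sketch under the assumption that the restore-point registry is itself versioned data, so that restoring to a point also restores the registry as it looked at that version. The class `RestorePointRegistrySketch` and all of its members are hypothetical and are not taken from the PR or from H2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy model (hypothetical names): a restore-point registry that is itself versioned. */
final class RestorePointRegistrySketch {

    /** Registry entry: the marked version and the oldest version it still depends on. */
    static final class Entry {
        final long version;
        final long oldestVersionToKeep;

        Entry(long version, long oldestVersionToKeep) {
            this.version = version;
            this.oldestVersionToKeep = oldestVersionToKeep;
        }
    }

    // snapshots.get(v) is the registry as it looked at store version v.
    private final List<TreeMap<String, Entry>> snapshots = new ArrayList<>();

    RestorePointRegistrySketch() {
        snapshots.add(new TreeMap<>()); // version 0: no restore points yet
    }

    private TreeMap<String, Entry> current() {
        return snapshots.get(snapshots.size() - 1);
    }

    void create(String name) {
        long version = snapshots.size(); // the version this change produces
        long oldest = Math.min(version, oldestVersionToKeep());
        TreeMap<String, Entry> next = new TreeMap<>(current());
        next.put(name, new Entry(version, oldest));
        snapshots.add(next);
    }

    void drop(String name) {
        TreeMap<String, Entry> next = new TreeMap<>(current());
        next.remove(name);
        snapshots.add(next);
    }

    void restoreTo(String name) {
        Entry entry = current().get(name);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown restore point: " + name);
        }
        // Discard every registry snapshot newer than the marked version.
        while (snapshots.size() - 1 > entry.version) {
            snapshots.remove(snapshots.size() - 1);
        }
    }

    Set<String> names() {
        return current().keySet();
    }

    long oldestVersionToKeep() {
        return current().values().stream()
                .mapToLong(e -> e.oldestVersionToKeep)
                .min()
                .orElse(Long.MAX_VALUE);
    }
}
```

In this model, creating A, B, and C and then dropping A leaves only B and C visible, yet `oldestVersionToKeep()` still reports A's version because B and C carry it along. Calling `restoreTo("B")` afterwards makes A visible again, which is the reason given above for tracking the oldest version separately from the first key.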

return restorePoints.values();
}

public boolean isEffectivelyAppendOnly() {
Contributor

Such logic isn't suitable for any non-experimental usage. You already have a map with versions, so you should prevent garbage collection of these versions only; other old versions should remain collectable.

I'm afraid that's not sufficient. If a restore point references a database version in a persistent database, it references a chunk. But because of the CoW approach, chunks only contain changed pages, so older chunks must be retained in order to be able to materialize a version completely. At least that is my current understanding of chunks and pages.
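
A toy model may help to make this argument concrete. The sketch below assumes a strongly simplified layout in which every version writes exactly one chunk that holds only the pages changed in that version; the class `ChunkReachabilitySketch` and its members are hypothetical and do not mirror the real MVStore or FileStore classes. It computes which chunks hold pages reachable from a given version's root.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (hypothetical names): which chunks does a version need? */
final class ChunkReachabilitySketch {

    /** An immutable page that remembers the chunk (= version) it was written in. */
    static final class Page {
        final int chunk;
        final List<Page> children;

        Page(int chunk, Page... children) {
            this.chunk = chunk;
            this.children = Arrays.asList(children);
        }
    }

    /** Collects the chunks holding at least one page reachable from the given root. */
    static Set<Integer> chunksNeeded(Page root) {
        Set<Integer> chunks = new TreeSet<>();
        Deque<Page> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Page page = pending.pop();
            chunks.add(page.chunk);
            for (Page child : page.children) {
                pending.push(child);
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Version 1 writes a root and two leaves into chunk 1.
        Page a1 = new Page(1);
        Page b1 = new Page(1);
        Page root1 = new Page(1, a1, b1);
        // Version 2 rewrites leaf a and the root; leaf b is shared with version 1.
        Page a2 = new Page(2);
        Page root2 = new Page(2, a2, b1);
        // Version 3 rewrites leaf b and the root; leaf a is shared with version 2.
        Page b3 = new Page(3);
        Page root3 = new Page(3, a2, b3);

        System.out.println(chunksNeeded(root1)); // [1]
        System.out.println(chunksNeeded(root2)); // [1, 2]
        System.out.println(chunksNeeded(root3)); // [2, 3]
    }
}
```

Under these assumptions, a restore point at version 2 still needs pages stored in chunk 1, even though version 1 itself is not marked. At the same time, only the reachable pages of chunk 1 are needed, not the old root or the replaced leaf, so the model is compatible with collecting unreferenced old versions at page granularity.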

h2/src/main/org/h2/mvstore/MVStore.java (outdated, resolved)
@andreitokar
Contributor

@i-am-not-giving-my-name-to-a-machine,
Thank you for your contribution!
While I certainly realize how much time and effort it takes to prepare such a PR, I would like to question (again) the necessity of this feature.
What testers want is a quick way to restore a database to a known state after a test has made some changes, so that it is ready for the next test. This can be achieved relatively easily by closing the database, copying the db file, and re-opening a copy, and you even did some PoC work along those lines. Such an approach does not require any changes to H2 itself and is therefore risk-free. This PR makes some serious changes across all H2 layers of abstraction (SQL, transaction engine, persistent store), and IMHO there is a good chance something might go wrong. For instance, the basic assumption no longer holds that, once an MVStore is closed, the only valid / preserved version is the current one.

I've heard the argument that the presented approach will alleviate "the need for clearing a connection pool of connections to a database" between tests, but in reality we are talking about a pool of size one here, aren't we? It is impossible to run multiple tests concurrently, all based on the same known starting state. It is entirely possible, on the other hand, if you make enough copies of the database file and hand one to each test.
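
The file-copy alternative described above could look roughly like the following sketch. It uses only `java.nio.file` and assumes that the database has been closed before any copy is made; the class `FileCopyRestoreSketch`, its method names, and the file names chosen by the caller are hypothetical. The `.mv.db` suffix is the one H2 uses for MVStore database files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch (hypothetical names): restore a known state by copying a closed database file. */
final class FileCopyRestoreSketch {

    /** Saves the closed database file as a baseline snapshot. */
    static Path snapshot(Path dbFile, Path snapshotFile) throws IOException {
        return Files.copy(dbFile, snapshotFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Overwrites the closed database file with the baseline snapshot. */
    static Path restore(Path snapshotFile, Path dbFile) throws IOException {
        return Files.copy(snapshotFile, dbFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Hands a private copy of the baseline to one test, so tests can run concurrently. */
    static Path copyForTest(Path snapshotFile, Path workDir, String testName) throws IOException {
        Path copy = workDir.resolve(testName + ".mv.db");
        return Files.copy(snapshotFile, copy, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A sequential suite would call `snapshot` once after populating the database and `restore` between tests, reopening its connection each time. A concurrent suite would instead call `copyForTest` per test and point each test's JDBC URL at its own copy.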

After a restore point is made, all subsequent changes will bloat the database file (or RAM in the case of an in-memory db), possibly to an unusable file size or an OOM, since any unused-space management effectively stops at that point. That would definitely limit what a test can do after a restore point is created. BTW, what is the driving force behind the introduction of multiple restore points? And even if there is a need for them, it's easier to achieve with multiple db file copies.
As for the "SHUTDOWN COMPACT" changes, it seems to me that you will end up with separate, independent, complete copies of the data for each restore point and the current state. Even if you only modify a tiny portion of the data after each restore point, such "compaction" may double or triple the file size instead.

Your statement "If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore" is not true on multiple accounts. First of all, it is not chunks that are copied there but individual pages, and only pages reachable from the root of the copied version of the MVMap. Different versions of the same map do share pages from sub-trees that were not modified between those versions. But since you copy multiple versions separately, such sub-trees will be duplicated.
Also, not a single chunk but many chunks will be written to the destination store for a file of any significant size.
Keep in mind that chunk size is limited by memory, amongst other things.
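
The duplication effect can be shown with a small, self-contained sketch. It models pages as immutable tree nodes and compares the number of pages held when versions share unmodified sub-trees with the number written when each version is copied on its own. The class `SharedPagesSketch` and its members are hypothetical; this is not how MVStore's compaction is implemented, only an illustration of the counting argument.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical names): page sharing between versions vs. separate copies. */
final class SharedPagesSketch {

    /** An immutable page; unmodified sub-trees are shared between versions by reference. */
    static final class Page {
        final String name;
        final List<Page> children;

        Page(String name, Page... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Pages held when all versions live in one store: shared pages count once. */
    static int pagesInSource(Page... roots) {
        Set<Page> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Page root : roots) {
            collect(root, seen);
        }
        return seen.size();
    }

    /** Pages written when every version is copied independently of the others. */
    static int pagesWhenCopiedSeparately(Page... roots) {
        int total = 0;
        for (Page root : roots) {
            Set<Page> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            collect(root, seen);
            total += seen.size();
        }
        return total;
    }

    private static void collect(Page page, Set<Page> seen) {
        if (seen.add(page)) {
            for (Page child : page.children) {
                collect(child, seen);
            }
        }
    }
}
```

For a first version with a root and three leaves, and a second version that rewrites only one leaf and the root, `pagesInSource` yields 6 while `pagesWhenCopiedSeparately` yields 8. The gap grows with the size of the unmodified sub-trees and with the number of versions copied.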

With regard to concurrent connections, the only possible behaviour (rollback of all transactions in progress and closure of all sessions) is not much different from a complete shutdown, but requires quite elaborate manipulations. And that is where I have my major concerns. MVStore's method rollbackTo(long version), while somewhat tested, is not used by H2 at all, and I am not that confident in its proper functioning compared to the other MVStore APIs. If it were up to me, I would rather drop that feature (with a good chunk of code, btw) than start using it.

Unfortunately, given all this, I would vote against implementation of a "restore point" feature at a database level.

@i-am-not-giving-my-name-to-a-machine

Hello @andreitokar,

> [...] I would like to question (again) necessity of this feature. What testers want is a quick way to restore database to a known state after some amount of changes was made by the test, so it will be ready to a next test.

Guilty as charged. I think the general integration test approach that almost everyone is taking, simply rolling back a transaction, is just wrong. This is where my motivation comes from. And because H2 is so widely used in integration tests, being a complete RDBMS that can run in-memory, I figured restore points would be a nice addition to it.

> This can be relatively easy achieved by closing database, copying db file and re-opening a copy, and you even did some PoC work along those lines. Such approach does not require any changes to H2 itself, and therefore is risk-free.

Yes, it's an alternative approach. And it works nicely - no doubt about it.

> This PR makes some serious changes across all H2 layers of abstraction - SQL, transaction engine, persistent store, and IMHO there is a good chance something might go wrong.

Absolutely right. However, the serious changes are in the MVStore. The other changes aren't serious imo and some are even very simple additions to what is designed to be extensible. And yes, things can go wrong, which is why I say, if nothing else, the feature needs more testing.

> For instance, basic assumption no longer holds, that if mvstore is closed, the only valid / preserved version is the current one.

Yes. Tbh I thought that's ok, because of the versionsToKeep number, that already exists.

> I've heard the argument that presented approach will alleviate "the need for clearing a connection pool of connections to a database" between tests, but in reality we are talking about pool of size one here, don't we? It is impossible to run multiple test concurrently, all based on same known starting state. It is totally possible, on the other hand, if you make enough copies of a database file and hand them to each test separately.

That completely depends on the tests imo. The simplest tests are run sequentially and that's where this feature would be most beneficial.

> After restore point is made, all subsequent changes will bloat database file (or RAM in case of in-memory db) [...].

Yes, that is a limitation. But I question tests that write hundreds of megabytes of data into an RDBMS or perform 1000s of transactions. I think that's load test territory and one wouldn't use restore points there.

> BTW, what is the driving force behind introduction of multiple restore points?

Again testing. With it one can create test suites that first write data to an already populated database and subsequent tests could restore the database to a state at which some data exists and they could test from there.

> As far as "SHUTDOWN COMPACT" changes, it seems to me, that you will end up with separate independent complete copies of the data for each restore point and the current state. Even if you only modify a tiny portion of data after each restore point, such "compaction" may double or triple file size instead.

That's not a desired outcome of course. I thought that only changed pages are written, because of the CoW approach. However, I didn't verify it and relied on the implementation of MVStore and MVMap. It could simply be that I've misused it.

> Your statement "If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore" is not true on multiple accounts. [...]

Well, that's just bad for my approach and I would need to come up with something different.

> MVStore's method rollbackTo(long version) while somewhat tested, is not used by H2 at all, and I am not that confident in its proper functioning, compared to other MVStore API. If it would be up to me, I would rather drop that feature (with a good chunk of code, btw), instead of start using it.

Ouch, that's a big one, because my whole implementation relies on the ability to roll back an MVStore. Too bad. But I agree: If it's not used at all (outside of my PR), isn't well tested, and you as an author and expert on this want to drop it, it should probably be dropped. And imo along with it the versionsToKeep number, which is currently always set to 0 anyway.

I personally would go one step further, but this one is highly opinionated: There should be no distinction between MVStore and FileStore. An in-memory database should simply write to an in-memory file. That would reduce the code even further.

> Unfortunately, given all this, I would vote against implementation of a "restore point" feature at a database level.

That's ok. If none of the H2 maintainers wants this feature to be finished and added, there's no point in continuing. At least I tried and that is what counts from my point of view.

Feel free to close this PR. I'd drop all future efforts then.

Regards,
Enno

@andreitokar
Contributor

Hello Enno,

> Yes. Tbh I thought that's ok, because of the versionsToKeep number, that already exists.

It is reset to currentVersion when the store is closed, because all transactions are closed (meaning there are no outstanding references to older versions) and the extra retention time (a historical precaution) is set to zero.
store() / storeNow() / storeIt() / serializeAndStore() / serializeToBuffer() / onVersionChange() / dropUnusedVersions() / setOldestVersionToKeep()

> versionsToKeep number, which is currently always set to 0 anyway

Not really; when the MVStore is opened, it is reset to currentVersion again:
MVStore() / onVersionChange() / dropUnusedVersions() / setOldestVersionToKeep()

> There should be no distinction between MVStore and FileStore

That would simplify the code base a bit, but you would pay a price by constantly serializing / de-serializing data. On the other hand, it should save you some memory. The choice is yours, and it's good to have one.

> I thought that only changed pages are written, because of the CoW approach.

That is true, of course, but think of the analogy with a VCS: if you check out two branches (into different locations), you will get two full copies, regardless of the fact that the VCS itself saves only deltas from the base for each branch.

> At least I tried and that is what counts from my point of view.

Absolutely. You've gained some knowledge in the process, and hopefully we'll see some of your fresh ideas / contributions to H2 soon. ( in-memory H2 files with the ability of CoW forking 😺 )

Best wishes,
Andrei

@andreitokar closed this Dec 7, 2024