feat: implement restore points #4169

i-am-not-giving-my-name-to-a-machine · 2024-12-02T14:33:26Z

This feature request is about implementing restore points in H2. The idea is that one can mark a database state and return to it easily without resorting to backups. The primary motivation for this is integration testing.

The current implementation status is:

Restore points can be used with SQL, e.g.

create restore point rp1
restore to point rp1
drop restore point rp1

and they can be selected via select restore_point_name, created_at, database_version from information_schema.restore_points. Using one of the first three commands requires admin privileges.
It is possible to create or drop a restore point while the database is in use concurrently. When restoring to a point, all concurrent sessions are dropped, any ongoing transactions a reaped, the database is brought back to a previous state and any data created since is completely lost.

Restore points are saved as a separate MVMap and inventoried by MVStore.meta. This way they can be accessed without accessing the public.sys system table which in turn means that a database need not be started.

Creating one or more restore points makes a FileStore effectively append only. If no restore points exist, e.g. because none have been created or all have been dropped, this restriction does not apply and the FileStore is free to do maintenance work.

The DEFRAG_ALWAYS=true flag works even with restore points. The compaction routine, that is called on a database shutdown, will copy all store versions, i.e. chunks, referenced by restore points and the latest one.

When used with in-memory databases, the behavior is the same as with persistent databases, but because of how the MVStore is versioning data, the entire history up to the oldest restore point is kept. This also means that a JVM can run out of memory although just a fraction of data or none at all is referenced by the latest version of the MVStore. Restoring an in-memory database to a point releases used memory of course.

Restore points are marked as an experimental feature and that they should not be used in a production environment.

Notes:

My understanding of chunks and pages of a FileStore is as follows: Chunks reference pages, which reference pages, possibly across chunks. There can be only one (hehe) chunk for one version. This is what, to my understanding, makes it safe to compact databases across different versions. If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore.

Link to the previous discussion(s) in the mailing list: https://groups.google.com/g/h2-database/c/iJLjES5j2TU

katzyn

Thank you for your contribution!

I don't comment on changes in MVStore excluding few places, these changes need to be reviewed by @andreitokar. And this is just a first quick look.

h2/src/main/org/h2/api/ErrorCode.java

h2/src/main/org/h2/command/dml/RestorePointCommand.java

h2/src/main/org/h2/table/InformationSchemaTable.java

katzyn · 2024-12-03T00:29:36Z

h2/src/test/org/h2/test/store/TestMVStore.java

@@ -1567,14 +1567,14 @@ private void testMeta() {
        try (MVStore s = openStore(fileName)) {
            s.setRetentionTime(Integer.MAX_VALUE);
            Map<String, String> m = s.getMetaMap();
-            assertEquals("[]", s.getMapNames().toString());
+            assertEquals("[restorePoints]", s.getMapNames().toString());


MVStore is also used as a standalone multi-map key-value storage, such change in it is unacceptable. It will break some applications. MVStore shouldn't expose any internal maps.

Makes sense. What do think of adding the restore point map "next to" meta instead of inventorying it inside of meta?

katzyn · 2024-12-03T00:35:38Z

h2/src/main/org/h2/mvstore/MVStore.java

+                .stream()
+                .mapToLong(it -> it.getOldestDatabaseVersionToKeep().getLong())
+                .findFirst()
+                .orElse(Long.MAX_VALUE);


It should be restorePoints.firstKey(), isn't it? Why a separate property with oldest version is needed?

If multiple restore points exist, younger restore points "cover" older ones. E.g. if A, B, and C exist and A is dropped, restoring to C or B brings A back to life.
Now, if you say that this is overkill and dropping old restore points should actually remove them and eventually also the data they reference, I agree. However, that would require an area in the database storage (no matter if in-memory or persistent) that is not versioned at all and, apart from the header of persistent databases, I didn't find such an area.
I'm open to suggestions. Anything that makes it easier to reason about what data can be restored at what point, I'd gladly take it.

katzyn · 2024-12-03T00:38:43Z

h2/src/main/org/h2/mvstore/MVStore.java

+        return restorePoints.values();
+    }
+
+    public boolean isEffectivelyAppendOnly() {


Such logic isn't suitable for any non-experimental usage. You already have a map with versions, so you should prevent garbage collection of these versions only, other old versions should be collectable.

I'm afraid that's not sufficient. If a restore point references a database version in a persistent database, it references a chunk. But because chunks only contain changed pages, because of the CoW approach, older versions must be retained in order to be able to materialize a version completely. At least that's is my current understanding of chunks and pages.

h2/src/main/org/h2/mvstore/MVStore.java

…admins

andreitokar · 2024-12-05T03:29:39Z

@i-am-not-giving-my-name-to-a-machine,
Thank you for your contribution!
While I certainty realize how much time and effort it takes to prepare such PR, I would like to question (again) necessity of this feature.
What testers want is a quick way to restore database to a known state after some amount of changes was made by the test, so it will be ready to a next test. This can be relatively easy achieved by closing database, copying db file and re-opening a copy, and you even did some PoC work along those lines. Such approach does not require any changes to H2 itself, and therefore is risk-free. This PR makes some serious changes across all H2 layers of abstraction - SQL, transaction engine, persistent store, and IMHO there is a good chance something might go wrong. For Instance, basic assumption is no loner holds, that if mvstore is closed, the only valid / preserved version is the current one.

I've heard the argument that presented approach will alleviate "the need for clearing a connection pool of connections to a database" between tests, but in reality we are talking about pool of size one here, don't we? It is impossible to run multiple test concurrently, all based on same known starting state. It is totally possible, on the other hand, if you make enough copies of a database file and hand them to each test separately.

After restore point is made, all subsequent changes will bloat database file (or RAM in case of in-memory db) to possibly unusable file size or OOM, since any unused space management will effectively stop at that point. That would definitely limit what test can do after restore point is created. BTW, what is the driving force beyond introduction of multiple restore points? And even if there is a need for them, it's easier to achieve with multiple db file copies.
As far as "SHUTDOWN COMPACT" changes, it seems to me, that you will end up with separate independent complete copies of the data for each restore point and the current state. Even if you only modify a tiny portion of data after each restore point, such "compaction" may double or triple file size instead.

You statement "If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore" is not true on multiple accounts. First of all, chunks are not copied there, but separate pages, and only pages reachable from the root of the version of MVMap copied. Different versions of same map do share pages from sub-trees, that were not modified between those versions. But since you copy multiple versions separately such sub-trees will be duplicated.
Also not a single, but many chunks will be written to destination store for a file of any significant size.
Keep in mind that chunk size is limited by memory, amongst other things.

With regard to concurrent connections, the only possible behaviour (roll back of all transactions in progress and closure of all sessions) is not much different from a complete shutdown, but requires quite elaborate manipulations. And that is were I have my major concerns. MVStore's method rollbackTo(long version) while somewhat tested, is not used by H2 at all, and I am not that confident in it's proper functioning, compare to other MVStore API. If it would be up to me, I would rather drop that feature (with a good chunk of code, btw), instead of start using it.

Unfortunately, given all this, I would vote against implementation of a "restore point" feature at a database level.

i-am-not-giving-my-name-to-a-machine · 2024-12-05T14:03:17Z

Hello @andreitokar,

[...] I would like to question (again) necessity of this feature. What testers want is a quick way to restore database to a known state after some amount of changes was made by the test, so it will be ready to a next test.

Guilty as charged. I think the general integration test approach that almost everyone is taking, to simply rollback a transaction, is just wrong. This is where my motivation comes from. And because H2 is so widely used in integration tests, because it is a complete RDBMS which can run in-memory, I figured restore points would be a nice addition to it.

This can be relatively easy achieved by closing database, copying db file and re-opening a copy, and you even did some PoC work along those lines. Such approach does not require any changes to H2 itself, and therefore is risk-free.

Yes, it's an alternative approach. And it works nicely - no doubt about it.

This PR makes some serious changes across all H2 layers of abstraction - SQL, transaction engine, persistent store, and IMHO there is a good chance something might go wrong.

Absolutely right. However, the serious changes are in the MVStore. The other changes aren't serious imo and some are even very simple additions to what is designed to be extensible. And yes, things can go wrong, which is why I say, if nothing else, the feature needs more testing.

For Instance, basic assumption is no loner holds, that if mvstore is closed, the only valid / preserved version is the current one.

Yes. Tbh I thought that's ok, because of the versionsToKeep number, that already exists.

I've heard the argument that presented approach will alleviate "the need for clearing a connection pool of connections to a database" between tests, but in reality we are talking about pool of size one here, don't we? It is impossible to run multiple test concurrently, all based on same known starting state. It is totally possible, on the other hand, if you make enough copies of a database file and hand them to each test separately.

That completely depends on the tests imo. The simplest tests are run sequentially and that's where this feature would be most beneficial.

After restore point is made, all subsequent changes will bloat database file (or RAM in case of in-memory db) [...].

Yes, that is a limitation. But I question tests that write hundreds of megabytes of data into an RDBMS or perform 1000s of transactions. I think that's load test territory and one wouldn't use restore points there.

BTW, what is the driving force beyond introduction of multiple restore points?

Again testing. With it one can create test suites that first write data to an already populated database and subsequent tests could restore the database to a state at which some data exists and they could test from there.

As far as "SHUTDOWN COMPACT" changes, it seems to me, that you will end up with separate independent complete copies of the data for each restore point and the current state. Even if you only modify a tiny portion of data after each restore point, such "compaction" may double or triple file size instead.

That's not a desired outcome of course. I thought that only changed pages are written, because of the CoW approach. However, I didn't verify it and relied on the implementation of MVStore and MVMap. It could simply be that I've misused it.

You statement "If a chunk is copied from one FileStore to another, all pages across all relevant chunks are copied from a source FileStore into a single new chunk to a destination FileStore" is not true on multiple accounts. [...]

Well, that's just bad for my approach and I would need to come up with something different.

MVStore's method rollbackTo(long version) while somewhat tested, is not used by H2 at all, and I am not that confident in it's proper functioning, compare to other MVStore API. If it would be up to me, I would rather drop that feature (with a good chunk of code, btw), instead of start using it.

Ouch, that's a big one, because all my implementation relies on the ability to rollback an MVStore. Too bad. But I agree: If it's not used at all (outside of my PR), isn't well tested, and you as an author and expert on this want to drop it, it should probably be dropped. And imo along with it the versionsToKeep number, which is currently always set to 0 anyway.

I personally would go one step further, but this one is highly opinionated: There should be no distinction between MVStore and FileStore. An in-memory database should simply write to an in-memory file. That would reduce the code even further.

Unfortunately, given all this, I would vote against implementation of a "restore point" feature at a database level.

That's ok. If there's nobody from the H2 maintainers, who want this feature to be finished and added, there's no point in continuing. At least I tried and that is what counts from my point of view.

Feel free to close this PR. I'd drop all future efforts then.

Regards,
Enno

andreitokar · 2024-12-07T16:24:20Z

Hello Enno,

Yes. Tbh I thought that's ok, because of the versionsToKeep number, that already exists.

It is reset to currentVersion when store is closed, because all transactions are closed (means no outstanding references to older versions), and extra retention time (historical precaution) is set to zero.
store() / storeNow() / storeIt() / serializeAndStore() / serializeToBuffer() / onVersionChange() / dropUnusedVersions() / setOldestVersionToKeep()

versionsToKeep number, which is currently always set to 0 anyway

Not really, on MVstore opening it is reset to currentVersion again:
MVStore() / onVersionChange() / dropUnusedVersions() / setOldestVersionToKeep()

There should be no distinction between MVStore and FileStore

That would simplify code base a bit, but you will pay a price, by constantly serializing / de-serializing data. On the other hand, it should save you some memory. The choice is yours, and it's good to have one.

I thought that only changed pages are written, because of the CoW approach.

That is true, of course, but think of analogy with VCS - if you check out two branches (into different locations) you will get two full copies, regardless of the fact, that VCS itself saves only deltas from the base for each branch.

At least I tried and that is what counts from my point of view.

Absolutely, You've gain some knowledge in the process, and hopefully we'll see some of yours fresh ideas / contributions to H2 soon. ( in-memory H2 files with the ability of CoW forking 😺 )

Best wishes,
Andrei

feat: implement restore points

d97a094

katzyn reviewed Dec 3, 2024

View reviewed changes

i-am-not-giving-my-name-to-a-machine added 2 commits December 3, 2024 14:18

feat: implement restore points; adjust code according to feedback

e1dc5b5

feat: implement restore points; restrict restore points selection to …

931043a

…admins

andreitokar closed this Dec 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement restore points #4169

feat: implement restore points #4169

i-am-not-giving-my-name-to-a-machine commented Dec 2, 2024 •

edited

Loading

katzyn left a comment

katzyn Dec 3, 2024

i-am-not-giving-my-name-to-a-machine Dec 3, 2024

katzyn Dec 3, 2024

i-am-not-giving-my-name-to-a-machine Dec 3, 2024 •

edited

Loading

katzyn Dec 3, 2024

i-am-not-giving-my-name-to-a-machine Dec 3, 2024

andreitokar commented Dec 5, 2024

i-am-not-giving-my-name-to-a-machine commented Dec 5, 2024

andreitokar commented Dec 7, 2024

feat: implement restore points #4169

feat: implement restore points #4169

Conversation

i-am-not-giving-my-name-to-a-machine commented Dec 2, 2024 • edited Loading

The current implementation status is:

Notes:

katzyn left a comment

Choose a reason for hiding this comment

katzyn Dec 3, 2024

Choose a reason for hiding this comment

i-am-not-giving-my-name-to-a-machine Dec 3, 2024

Choose a reason for hiding this comment

katzyn Dec 3, 2024

Choose a reason for hiding this comment

i-am-not-giving-my-name-to-a-machine Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

katzyn Dec 3, 2024

Choose a reason for hiding this comment

i-am-not-giving-my-name-to-a-machine Dec 3, 2024

Choose a reason for hiding this comment

andreitokar commented Dec 5, 2024

i-am-not-giving-my-name-to-a-machine commented Dec 5, 2024

andreitokar commented Dec 7, 2024

i-am-not-giving-my-name-to-a-machine commented Dec 2, 2024 •

edited

Loading

i-am-not-giving-my-name-to-a-machine Dec 3, 2024 •

edited

Loading