Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement move support for xml_document #170

Merged
merged 15 commits into from
Nov 13, 2017
Merged

Implement move support for xml_document #170

merged 15 commits into from
Nov 13, 2017

Conversation

zeux
Copy link
Owner

@zeux zeux commented Nov 13, 2017

This change implements move ctor and assign support for xml_document.

All node handles remain valid after the move and point to the new document; the only exception is the document node itself (that remains unmoved).

Move is O(document size) in theory because it needs to relocate immediate document children (there is just one in conformant documents) and all memory pages; in practice the memory pages only need the header adjusted, which is ~0.1% of the actual data size.

Move requires no allocations in general, except when using compact mode where some moves need to grow the hash table which can fail (throw).

Fixes #104

zeux added 15 commits September 25, 2017 19:31
This change implements the initial version of move construction and
assignment support for documents.

When moving a document to another document, we always make sure move
target is in "clean" state (empty document), and proceed by relocating
all structures in the most efficient way possible.

Complications arise from the fact that the root (document) node is
embedded into xml_document object, so all pointers to it have to change;
this includes parent pointers of all first-level children as well as
allocator pointers in all memory pages and previous pointer in the first
on-heap memory page.

Additionally, compact mode makes everything even more complicated
because some of the pointers we need to update are stored in the hash
table (in fact, document first_child pointer is very likely to be there;
some parent pointers in first-level children will be using
compact_shared_parent but some won't be) which requires allocating a new
hash table which can fail.

Some details of this process are not fully fleshed out, especially for
compact mode; and this definitely requires many tests.
These just verify that move ctor/assignment operator work as expected in
simple cases - there are a number of ways in which the internal
structure can be incorrect...
Verify that move doesn't allocate and that it preserves structures
required for tree memory management and append_buffer in tact.
Make sure we have coverage for empty documents and for large documents
that trigger compact_shared_parent != root for some pages.
Add a test that checks that static buffer pointer was moved correctly
by checking if offset_debug still works.
Large test wasn't testing shared parent condition properly - add one
more level of hierarchy so that it works as expected.
We now check that appending a child to a moved document performs no
allocations - this is already the case, but if we neglected to copy the
allocator state this test would fail.
After move some nodes in the hash table can have keys that point to
other; this makes the table somewhat larger but this does not impact
correctness.

The reason is that for us to access a key in the hash table, there
should be a compact_pointer/string object with the state indicating that
it is stored in a hash table, and with the address matching the key. For
this to happen, we had to have put this object into this state which
would mean that we'd overwrite the hash entry with the new, correct
value.

When nodes/pages are being removed, we do not clean up keys from the
hash table - it's safe for the same reason, and thus move doesn't
introduce additional contracts here.
These tests currently fail for compact mode because of ->reserve()
failing.
These days OSX clang supports UB sanitizer so we can just use the same
settings for all systems.
This allows us to do a single reserve for a known amount of assignments
that is larger than the default minimum per reserve (16).
In compact mode, we currently can not support zero-allocation moves
since some pointer assignments required during the move need to allocate
hash table slots.

This is mostly applicable to xml_document_struct::first_child, since the
pointer to this element is used as a hash table key, but there are some
contrived cases where parents of root's children need a hash slot and
didn't have it before.

These cases can be fixed by changing the compact encoding to be a bit
more move friendly, but for now it's easier to handle the error and
throw/return during move.

When this happens, the source document doesn't change.
This helps make sure our error handling logic works and is exercised.
@zeux zeux merged commit 7c6d001 into master Nov 13, 2017
@zeux zeux deleted the move branch November 13, 2017 21:24
@zeux zeux mentioned this pull request Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant