Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address performance of existing unique-name generation (Part 2) #18676

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

kpemartin
Copy link
Contributor

As described in Issue #16849, the existing Tools::getUniqueName method requires calling code to form a vector of existing names to be avoided.

This leads to poor performance both in the O(n) cost of building such a vector and also getUniqueName's O(n) algorithm for actually generating the unique name (where 'n' is the number of pre-existing names).

This has a particularly noticeable cost in documents with large numbers of DocumentObjects because generating both Names and Labels for each new object incurs this cost. During an operation such as importing this results in an O(n^2) time spent generating names.

PR #18589 introduces a new UniqueNameManager class, which is a pruned-down multiset of std::string names tailored for, and providing the ability to create, names made unique by inserting digit strings.

This PR uses this new class to replace all uses of Tools::getUniqueName and removes the latter.

It also moves the special handling for DocumentObject.Label (which must satisfy optional uniqueness requirements and update references) from PropertyString to DocumentObject where it more rightly belongs.

It also adds a general method to traverse all the Properties of a PropertyContainer as these are not stored in a general std container class that allows normal iteration.

Resolves #16849
This PR requires and includes PR #18589 in its commits.

As described in Issue 16849, the existing Tools::getUniqueName method
requires calling code to form a vector of existing names to be avoided.

This leads to poor performance both in the O(n) cost of building such a
vector and also getUniqueName's O(n) algorithm for actually generating
the unique name (where 'n' is the number of pre-existing names).

This has  particularly noticeable cost in documents with large numbers
of DocumentObjects because generating both Names and Labels for each new
object incurs this cost. During an operation such as importing this
results in an O(n^2) time spent generating names.

The other major cost is in the saving of the temporary backup file,
which uses name generation for the "files" embedded in the Zip file.
Documents can easily need several such "files" for each object in the
document.

This is the first part of the correction, adding an efficient class for
managing sets of unique names.

New class UniqueNameManager keeps a list of existing names organized in
a manner that eases unique-name generation. This class essentially acts
as a set of names, with the ability to add and remove names and check if
a name is already there, with the added ability to take a prototype name
and generate a unique form for it which is not already in the set.

a new unit test for UniqueNameManager has been added as well.

There is a small regression, compared to the existing unique-name code,
insofar as passing a prototype name like "xyz1234" to the old code would
yield "xyz1235" whether or not "xyz1234" already existed, while the new
code will return the next name above the currently-highest name on the
"xyz" model, which could be "xyz" or "xyz1" or "xyz0042"..
@github-actions github-actions bot added Mod: Core Issue or PR touches core sections (App, Gui, Base) of FreeCAD Mod: Spreadsheet Related to the Spreadsheet Workbench labels Dec 22, 2024
@kpemartin kpemartin marked this pull request as ready for review December 22, 2024 16:58
@maxwxyz maxwxyz requested a review from wwmayer December 22, 2024 18:23
kpemartin and others added 3 commits December 23, 2024 09:21
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
@kpemartin
Copy link
Contributor Author

There are more changes coming on #18589 so this pr has to wait til that's done

@kpemartin kpemartin marked this pull request as draft December 23, 2024 14:26
As described in Issue 16849, the existing Tools::getUniqueName method
requires calling code to form a vector of existing names to be avoided.

This leads to poor performance both in the O(n) cost of building such a
vector and also getUniqueName's O(n) algorithm for actually generating
the unique name (where 'n' is the number of pre-existing names).

This has  particularly noticeable cost in documents with large numbers
of DocumentObjects because generating both Names and Labels for each new
object incurs this cost. During an operation such as importing this
results in an O(n^2) time spent generating names.

The other major cost is in the saving of the temporary backup file,
which uses name generation for the "files" embedded in the Zip file.
Documents can easily need several such "files" for each object in the
document.

This update includes the following changes to use the newly-added
UniqueNameManager as a replacement for the old Tools::getUniqueName
method and deletes the latter to remove any temptation to use it as
its usage model breeds inefficiency:

Eliminate Tools::getUniqueName, its local functions, and its unit tests.

Make DocumentObject naming use the new UniqueNameManager class.

Make DocumentObject Label naming use the new UniqueNameManager class.
This needs to monitor DocumentObject Labels for changes since this
property is not read-only. The special handling for the Label
property, which includes optionally forcing uniqueness and updating
links in referencing objects, has been mostly moved from
PropertyString to DocumentObject.

Add Document::containsObject(DocumentObject*) for a definitive
test of an object being in a Document. This is needed because
DocumentObjects can be in a sort of limbo (e.g. when they are in the
Undo/Redo lists) where they have a parent linkage to the Document but
should not participate in Label collision checks.

Rename Document.getStandardObjectName to getStandardObjectLabel
to better represent what it does.

Use new UniqueNameManager for Writer internal filenames within the zip
file.

Eliminate unneeded Reader::FileNames collection. The file names
already exist in the FileList collection elements. The only existing
use for the FileNames collection was to determine if there were any
files at all, and with FileList and FileNames being parallel
vectors, they both had the same length so FileList could be used
for this test..

Use UniqueNameManager for document names and labels. This uses ad hoc
UniqueNameManager objects created on the spot on the assumption that
document creation is relatively rare and there are few documents, so
although the cost is O(n), n itself is small.

Use an ad hoc UniqueNameManager to name new DymanicProperty entries.
This is only done if a property of the proposed name already exists,
since such a check is more-or-less O(log(n)), almost never finds a
collision, and avoids the O(n) building of the UniqueNameManager.
If there is a collision an ad-hoc UniqueNameManager is built
and discarded after use.
The property management classes have a bit of a mess of methods
including several to populate various collection types with all
existing properties. Rather than introducing yet another such
collection-specific method to fill a UniqueNameManager, a
visitProperties method was added which calls a passed function for
each property. The existing code (e.g. getPropertyMap) would be
simpler if they all used this but the cost of calling a lambda
for each property must be considered. It would clarify the semantics
of these methods, which have a bit of variance in which properties
populate the passed collection, e.g. when there are duplicate names..
Ideally the PropertyContainer class would keep a central directory of
all properties ("static", Dynamic, and exposed by ExtensionContainer and
other derivations) and a permanent UniqueNameManager. However the
Property management is a bit of a mess making such a change a project
unto itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Mod: Core Issue or PR touches core sections (App, Gui, Base) of FreeCAD Mod: Spreadsheet Related to the Spreadsheet Workbench
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generation of unique names is a performance sinkhole due to poor algorithms
1 participant