-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address performance of existing unique-name generation (Part 2) #18676
Draft
kpemartin
wants to merge
7
commits into
FreeCAD:main
Choose a base branch
from
kpemartin:Issue16849B
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+905
−479
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
As described in Issue 16849, the existing Tools::getUniqueName method requires calling code to form a vector of existing names to be avoided. This leads to poor performance both in the O(n) cost of building such a vector and also getUniqueName's O(n) algorithm for actually generating the unique name (where 'n' is the number of pre-existing names). This has particularly noticeable cost in documents with large numbers of DocumentObjects because generating both Names and Labels for each new object incurs this cost. During an operation such as importing this results in an O(n^2) time spent generating names. The other major cost is in the saving of the temporary backup file, which uses name generation for the "files" embedded in the Zip file. Documents can easily need several such "files" for each object in the document. This is the first part of the correction, adding an efficient class for managing sets of unique names. New class UniqueNameManager keeps a list of existing names organized in a manner that eases unique-name generation. This class essentially acts as a set of names, with the ability to add and remove names and check if a name is already there, with the added ability to take a prototype name and generate a unique form for it which is not already in the set. a new unit test for UniqueNameManager has been added as well. There is a small regression, compared to the existing unique-name code, insofar as passing a prototype name like "xyz1234" to the old code would yield "xyz1235" whether or not "xyz1234" already existed, while the new code will return the next name above the currently-highest name on the "xyz" model, which could be "xyz" or "xyz1" or "xyz0042"..
github-actions
bot
added
Mod: Core
Issue or PR touches core sections (App, Gui, Base) of FreeCAD
Mod: Spreadsheet
Related to the Spreadsheet Workbench
labels
Dec 22, 2024
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
Co-authored-by: Chris Hennes <chennes@pioneerlibrarysystem.org>
There are more changes coming on #18589 so this pr has to wait til that's done |
As described in Issue 16849, the existing Tools::getUniqueName method requires calling code to form a vector of existing names to be avoided. This leads to poor performance both in the O(n) cost of building such a vector and also getUniqueName's O(n) algorithm for actually generating the unique name (where 'n' is the number of pre-existing names). This has particularly noticeable cost in documents with large numbers of DocumentObjects because generating both Names and Labels for each new object incurs this cost. During an operation such as importing this results in an O(n^2) time spent generating names. The other major cost is in the saving of the temporary backup file, which uses name generation for the "files" embedded in the Zip file. Documents can easily need several such "files" for each object in the document. This update includes the following changes to use the newly-added UniqueNameManager as a replacement for the old Tools::getUniqueName method and deletes the latter to remove any temptation to use it as its usage model breeds inefficiency: Eliminate Tools::getUniqueName, its local functions, and its unit tests. Make DocumentObject naming use the new UniqueNameManager class. Make DocumentObject Label naming use the new UniqueNameManager class. This needs to monitor DocumentObject Labels for changes since this property is not read-only. The special handling for the Label property, which includes optionally forcing uniqueness and updating links in referencing objects, has been mostly moved from PropertyString to DocumentObject. Add Document::containsObject(DocumentObject*) for a definitive test of an object being in a Document. This is needed because DocumentObjects can be in a sort of limbo (e.g. when they are in the Undo/Redo lists) where they have a parent linkage to the Document but should not participate in Label collision checks. Rename Document.getStandardObjectName to getStandardObjectLabel to better represent what it does. Use new UniqueNameManager for Writer internal filenames within the zip file. Eliminate unneeded Reader::FileNames collection. The file names already exist in the FileList collection elements. The only existing use for the FileNames collection was to determine if there were any files at all, and with FileList and FileNames being parallel vectors, they both had the same length so FileList could be used for this test.. Use UniqueNameManager for document names and labels. This uses ad hoc UniqueNameManager objects created on the spot on the assumption that document creation is relatively rare and there are few documents, so although the cost is O(n), n itself is small. Use an ad hoc UniqueNameManager to name new DymanicProperty entries. This is only done if a property of the proposed name already exists, since such a check is more-or-less O(log(n)), almost never finds a collision, and avoids the O(n) building of the UniqueNameManager. If there is a collision an ad-hoc UniqueNameManager is built and discarded after use. The property management classes have a bit of a mess of methods including several to populate various collection types with all existing properties. Rather than introducing yet another such collection-specific method to fill a UniqueNameManager, a visitProperties method was added which calls a passed function for each property. The existing code (e.g. getPropertyMap) would be simpler if they all used this but the cost of calling a lambda for each property must be considered. It would clarify the semantics of these methods, which have a bit of variance in which properties populate the passed collection, e.g. when there are duplicate names.. Ideally the PropertyContainer class would keep a central directory of all properties ("static", Dynamic, and exposed by ExtensionContainer and other derivations) and a permanent UniqueNameManager. However the Property management is a bit of a mess making such a change a project unto itself.
kpemartin
force-pushed
the
Issue16849B
branch
from
December 26, 2024 04:46
b20509b
to
edbfde7
Compare
for more information, see https://pre-commit.ci
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Mod: Core
Issue or PR touches core sections (App, Gui, Base) of FreeCAD
Mod: Spreadsheet
Related to the Spreadsheet Workbench
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As described in Issue #16849, the existing
Tools::getUniqueName
method requires calling code to form a vector of existing names to be avoided.This leads to poor performance both in the O(n) cost of building such a vector and also
getUniqueName
's O(n) algorithm for actually generating the unique name (where 'n' is the number of pre-existing names).This has a particularly noticeable cost in documents with large numbers of
DocumentObject
s because generating both Names and Labels for each new object incurs this cost. During an operation such as importing this results in an O(n^2) time spent generating names.PR #18589 introduces a new
UniqueNameManager
class, which is a pruned-down multiset ofstd::string
names tailored for, and providing the ability to create, names made unique by inserting digit strings.This PR uses this new class to replace all uses of
Tools::getUniqueName
and removes the latter.It also moves the special handling for
DocumentObject.Label
(which must satisfy optional uniqueness requirements and update references) fromPropertyString
toDocumentObject
where it more rightly belongs.It also adds a general method to traverse all the Properties of a
PropertyContainer
as these are not stored in a general std container class that allows normal iteration.Resolves #16849
This PR requires and includes PR #18589 in its commits.