Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BigString] Fix accidentally quadratic BigString.init #405

Merged
merged 1 commit into from
Jul 15, 2024

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Jul 15, 2024

When ingesting a String instance, BigString assumes that the input string has a reasonably efficient UTF-8 view.

Unfortunately, that is very much not the case when the input happens to be backed by a bridged NSString object — it appears that in this case, the ingester loop invokes some operation(s) with linear complexity in the size of the entire input, rendering the ingester’s overall complexity quadratic.

The BigString ingester is only expected to operate within a single chunk at the time. It’s unclear precisely which operation triggers the quadratic behavior; ideally we should figure it out and resolve it with a more targeted fix.

In the meantime, a blunt stopgap fix is to force-transcode the input string to UTF-8 at the time the ingester is initialized. This unnecessarily wastes some (temporary) memory on holding the transcoded string, but it avoids the quadratic cliff.

Before:
results-before

After:
results-after

rdar://131176282

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

@lorentey lorentey added the RopeModule Positional B-trees label Jul 15, 2024
@lorentey lorentey added this to the 1.1.3 milestone Jul 15, 2024
When ingesting a `String` instance, `BigString` assumes that the input string has a reasonably efficient UTF-8 view.

Unfortunately, that is very much not the case when the input happens to be backed by a bridged NSString object — it appears that in this case, the ingester loop invokes some operation(s) with linear complexity in the size of the entire input, rendering the ingester’s overall complexity quadratic.

The BigString ingester is only expected to operate within a single chunk at the time. It’s unclear precisely which operation triggers the quadratic behavior; ideally we should figure it out and resolve it with a more targeted fix.

In the meantime, a blunt stopgap fix is to force-transcode the input string to UTF-8 at the time the ingester is initialized. This unnecessarily wastes some (temporary) memory on holding the transcoded string, but it avoids the quadratic cliff.
@lorentey lorentey force-pushed the accidentally-quadratic-BigString-init branch from f3e6ec1 to 5e1fe6e Compare July 15, 2024 20:13
@lorentey
Copy link
Member Author

@swift-ci test

@lorentey lorentey merged commit 2db08b3 into release/1.1 Jul 15, 2024
1 of 2 checks passed
@lorentey lorentey deleted the accidentally-quadratic-BigString-init branch July 15, 2024 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RopeModule Positional B-trees
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants