LZ4_memcpy_using_offset small optimization #1347
base: dev
The impact of this patch likely depends on the data being decompressed.
So I ran some tests across a variety of compilers and compressed files.
Summary: Ultimately, the wild performance differences measured probably have nothing to do with the patch itself, at least not directly. This means I don't think this way of measuring performance is good enough to analyze the benefits of the patch. A few potential ideas, for discussion:
Sending this PR as a subset of #1222, trying to isolate an optimization.
Tested on a 12th gen Intel CPU.
It was faster with gcc-9, gcc-10, gcc-11 and gcc-12.
It tied with clang-12 and clang-13, with a tiny favorable trend on clang-12.
It seems to incur a small performance regression with clang-11 and clang-14.
Overall it seems to be faster, and in theory it should be faster as well.
It might be worth giving it a try.
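
For readers unfamiliar with the kind of copy discussed here, below is a minimal sketch of an overlapping match copy with a small offset. It is not the code from this patch nor from lz4.c: the function names `copy_match_bytewise` and `copy_match_pattern` are hypothetical, and the choice of fast-path offsets (1, 2 and 4) is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference version (not LZ4's code): copy len bytes from
 * (dst - offset) to dst. When offset < len the ranges overlap, so a single
 * memcpy over the whole range is not allowed; copy byte by byte instead. */
static void copy_match_bytewise(unsigned char *dst, size_t offset, size_t len)
{
    const unsigned char *src = dst - offset;
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical fast path: for offsets that divide 8, the output is a
 * repetition of the last `offset` bytes, so build one 8-byte pattern and
 * store it repeatedly. Other offsets fall back to the bytewise copy. */
static void copy_match_pattern(unsigned char *dst, size_t offset, size_t len)
{
    if (offset == 1 || offset == 2 || offset == 4) {
        const unsigned char *src = dst - offset;
        unsigned char pattern[8];
        size_t done = 0;
        for (size_t i = 0; i < 8; i++) {
            pattern[i] = src[i % offset];
        }
        while (done + 8 <= len) {
            memcpy(dst + done, pattern, 8);  /* pattern is a separate buffer */
            done += 8;
        }
        for (; done < len; done++) {         /* tail shorter than 8 bytes */
            dst[done] = pattern[done % 8];
        }
        return;
    }
    copy_match_bytewise(dst, offset, len);
}
```

Both functions produce the same bytes. How often the fast path is taken depends on the offsets present in the compressed data, which is one reason measurements can vary from file to file.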