Compare256 function pointer #273

Draft
wants to merge 2 commits into
base: main

folkertdev
Collaborator

When a target feature such as avx2 is not enabled at compile time but is available at runtime, dispatching through a function pointer reduces branching. If the target feature is already enabled at compile time, we still dispatch statically.
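To illustrate the idea, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `Matcher` struct that resolves the implementation once at construction and stores it as a function pointer. The names `Compare256Fn`, `select_compare256`, `Matcher` and `match_len` are made up for the example, and `compare256_avx2` is a placeholder that simply forwards to the scalar version where a real implementation would use `core::arch` intrinsics.

```rust
/// Hypothetical signature: length of the common prefix of two 256-byte blocks.
type Compare256Fn = fn(&[u8; 256], &[u8; 256]) -> usize;

/// Portable fallback implementation.
fn compare256_scalar(a: &[u8; 256], b: &[u8; 256]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Placeholder for a SIMD implementation (assumption: a real version would
/// use AVX2 intrinsics; here it forwards to the scalar code so the sketch
/// stays safe and portable).
#[cfg(target_arch = "x86_64")]
fn compare256_avx2(a: &[u8; 256], b: &[u8; 256]) -> usize {
    compare256_scalar(a, b)
}

/// Runs the runtime feature check once and returns the chosen implementation.
fn select_compare256() -> Compare256Fn {
    #[cfg(target_arch = "x86_64")]
    {
        // `cfg!` is a compile-time constant, so the runtime probe is only
        // evaluated when avx2 was not enabled at build time.
        if cfg!(target_feature = "avx2") || std::arch::is_x86_feature_detected!("avx2") {
            return compare256_avx2;
        }
    }
    compare256_scalar
}

pub struct Matcher {
    // Unused when avx2 is enabled at compile time (static dispatch below).
    #[allow(dead_code)]
    compare256: Compare256Fn,
}

impl Matcher {
    pub fn new() -> Self {
        Matcher { compare256: select_compare256() }
    }

    pub fn match_len(&self, a: &[u8; 256], b: &[u8; 256]) -> usize {
        // Feature known at compile time: call the implementation directly.
        #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
        return compare256_avx2(a, b);

        // Otherwise: one indirect call, no feature check in the hot path.
        #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
        return (self.compare256)(a, b);
    }
}
```

In this sketch the feature probe happens only in `Matcher::new()`. Each call to `match_len` is then either a direct call (feature enabled at compile time) or a single indirect call through the stored pointer.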
@folkertdev folkertdev marked this pull request as draft December 24, 2024 17:52
@folkertdev
Collaborator Author

Results for compression level 1:

Benchmark 1 (62 runs): target/release/examples/compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.7ms ±  991us    80.3ms … 87.8ms          3 ( 5%)        0%
  peak_rss           26.7MB ± 53.8KB    26.6MB … 26.7MB         13 (21%)        0%
  cpu_cycles          305M  ± 3.65M      298M  …  328M           5 ( 8%)        0%
  instructions        661M  ±  278       661M  …  661M           0 ( 0%)        0%
  cache_references   19.8M  ±  185K     19.5M  … 20.8M           2 ( 3%)        0%
  cache_misses        432K  ± 86.9K      336K  …  783K           2 ( 3%)        0%
  branch_misses      2.94M  ± 8.26K     2.88M  … 2.96M           2 ( 3%)        0%
Benchmark 2 (63 runs): target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          80.3ms ±  584us    78.7ms … 83.1ms          2 ( 3%)        ⚡-  1.6% ±  0.3%
  peak_rss           26.7MB ± 63.2KB    26.5MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          299M  ± 2.38M      293M  …  312M           3 ( 5%)        ⚡-  2.0% ±  0.4%
  instructions        620M  ±  255       620M  …  620M           1 ( 2%)        ⚡-  6.2% ±  0.0%
  cache_references   19.7M  ±  146K     19.5M  … 20.2M           3 ( 5%)          -  0.8% ±  0.3%
  cache_misses        409K  ± 74.2K      311K  …  664K           1 ( 2%)          -  5.3% ±  6.6%
  branch_misses      2.95M  ± 15.7K     2.87M  … 2.96M           2 ( 3%)          +  0.6% ±  0.2%

@brian-pane

brian-pane commented Dec 28, 2024

This shows an improvement on x86_64 Intel as well (compiled without RUSTFLAGS="-Ctarget-cpu=native", in order to generate the new function-pointer code path).

Benchmark 1 (64 runs): ./blogpost-compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          78.9ms ±  852us    77.6ms … 83.1ms          1 ( 2%)        0%
  peak_rss           26.6MB ± 60.6KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          307M  ±  536K      305M  …  308M           0 ( 0%)        0%
  instructions        661M  ±  290       661M  …  661M           0 ( 0%)        0%
  cache_references    265K  ± 3.68K      262K  …  285K           4 ( 6%)        0%
  cache_misses        231K  ± 6.74K      205K  …  237K           8 (13%)        0%
  branch_misses      2.93M  ± 3.84K     2.92M  … 2.94M           1 ( 2%)        0%
Benchmark 2 (67 runs): ./target/release/examples/blogpost-compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          75.4ms ±  526us    74.6ms … 76.7ms          0 ( 0%)        ⚡-  4.3% ±  0.3%
  peak_rss           26.6MB ± 60.4KB    26.6MB … 26.7MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          292M  ±  545K      291M  …  294M           0 ( 0%)        ⚡-  4.6% ±  0.1%
  instructions        620M  ±  261       620M  …  620M           0 ( 0%)        ⚡-  6.2% ±  0.0%
  cache_references    266K  ± 5.65K      262K  …  298K           4 ( 6%)          +  0.2% ±  0.6%
  cache_misses        231K  ± 7.01K      203K  …  237K           5 ( 7%)          -  0.2% ±  1.0%
  branch_misses      3.03M  ± 4.91K     3.02M  … 3.04M           0 ( 0%)        💩+  3.6% ±  0.1%
