Improve async runtime scaling #946

Merged
merged 18 commits into from
Aug 23, 2023

jaspervdj
Owner

@jaspervdj jaspervdj commented Aug 8, 2022

This noticeably improves performance on most sites I tried, though I usually need to copy
one of the blog posts about 1000 times for the effect to become noticeable and
measurable.

Still in draft because the error you get on cyclic dependencies got much worse; that's
fixable, though.

@jaspervdj jaspervdj requested a review from Minoru August 8, 2022 07:42
@jaspervdj jaspervdj marked this pull request as ready for review August 12, 2022 14:19
Collaborator

@Minoru Minoru left a comment

Hey, sorry it took me so long to take a look at this! The code looks good.

I tested it on my site (https://github.com/Minoru/blog.debiania.in.ua), which only has 187 Markdown files to compile, but also embeds 32 thousand MathJax files (processed with copyFileCompiler). The performance is indeed better, but the scaling is still bad. Furthermore, the sweet spot is now at 2 cores rather than 3 cores (for a 4-core CPU with hyperthreading), which is odd.

[Image: scaling]

$ hyperfine --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build'
Benchmark 1: ./debiania-old-1645d9c build
  Time (mean ± σ):     15.999 s ±  0.324 s    [User: 52.089 s, System: 14.393 s]
  Range (min … max):   15.472 s … 16.603 s    10 runs


$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build +RTS -N{threads}'
Benchmark 1: ./debiania-old-1645d9c build +RTS -N1
  Time (mean ± σ):     13.335 s ±  0.150 s    [User: 13.083 s, System: 0.768 s]
  Range (min … max):   13.101 s … 13.601 s    10 runs
 
Benchmark 2: ./debiania-old-1645d9c build +RTS -N2
  Time (mean ± σ):     11.098 s ±  0.344 s    [User: 15.222 s, System: 2.124 s]
  Range (min … max):   10.690 s … 11.691 s    10 runs
 
Benchmark 3: ./debiania-old-1645d9c build +RTS -N3
  Time (mean ± σ):     10.812 s ±  0.601 s    [User: 18.158 s, System: 3.344 s]
  Range (min … max):    9.991 s … 11.861 s    10 runs
 
Benchmark 4: ./debiania-old-1645d9c build +RTS -N4
  Time (mean ± σ):     11.622 s ±  0.496 s    [User: 22.643 s, System: 5.129 s]
  Range (min … max):   11.041 s … 12.421 s    10 runs
 
Benchmark 5: ./debiania-old-1645d9c build +RTS -N5
  Time (mean ± σ):     12.484 s ±  0.622 s    [User: 28.233 s, System: 7.239 s]
  Range (min … max):   11.621 s … 13.492 s    10 runs
 
Benchmark 6: ./debiania-old-1645d9c build +RTS -N6
  Time (mean ± σ):     13.577 s ±  0.473 s    [User: 33.883 s, System: 9.680 s]
  Range (min … max):   12.752 s … 14.310 s    10 runs
 
Benchmark 7: ./debiania-old-1645d9c build +RTS -N7
  Time (mean ± σ):     14.613 s ±  0.323 s    [User: 40.769 s, System: 12.303 s]
  Range (min … max):   13.957 s … 14.992 s    10 runs
 
Benchmark 8: ./debiania-old-1645d9c build +RTS -N8
  Time (mean ± σ):     15.961 s ±  0.250 s    [User: 51.374 s, System: 14.443 s]
  Range (min … max):   15.504 s … 16.296 s    10 runs
 
Benchmark 9: ./debiania-old-1645d9c build +RTS -N9
  Time (mean ± σ):     17.072 s ±  0.263 s    [User: 57.513 s, System: 16.108 s]
  Range (min … max):   16.535 s … 17.337 s    10 runs
 
Summary
  './debiania-old-1645d9c build +RTS -N3' ran
    1.03 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N2'
    1.07 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N4'
    1.15 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N5'
    1.23 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N1'
    1.26 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N6'
    1.35 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N7'
    1.48 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N8'
    1.58 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N9'


$ hyperfine --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build'
Benchmark 1: ./debiania-new-3a54b21 build
  Time (mean ± σ):     14.586 s ±  0.188 s    [User: 46.536 s, System: 11.439 s]
  Range (min … max):   14.244 s … 14.874 s    10 runs


$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build +RTS -N{threads}'
Benchmark 1: ./debiania-new-3a54b21 build +RTS -N1
  Time (mean ± σ):      9.744 s ±  0.020 s    [User: 9.390 s, System: 0.611 s]
  Range (min … max):    9.703 s …  9.764 s    10 runs
 
Benchmark 2: ./debiania-new-3a54b21 build +RTS -N2
  Time (mean ± σ):      8.385 s ±  0.061 s    [User: 11.360 s, System: 1.233 s]
  Range (min … max):    8.304 s …  8.494 s    10 runs
 
Benchmark 3: ./debiania-new-3a54b21 build +RTS -N3
  Time (mean ± σ):      8.974 s ±  0.420 s    [User: 14.677 s, System: 2.336 s]
  Range (min … max):    8.464 s …  9.780 s    10 runs
 
Benchmark 4: ./debiania-new-3a54b21 build +RTS -N4
  Time (mean ± σ):      9.638 s ±  0.633 s    [User: 18.245 s, System: 3.308 s]
  Range (min … max):    8.705 s … 10.665 s    10 runs
 
Benchmark 5: ./debiania-new-3a54b21 build +RTS -N5
  Time (mean ± σ):     10.708 s ±  0.595 s    [User: 23.503 s, System: 5.118 s]
  Range (min … max):    9.555 s … 11.704 s    10 runs
 
Benchmark 6: ./debiania-new-3a54b21 build +RTS -N6
  Time (mean ± σ):     11.364 s ±  0.385 s    [User: 28.192 s, System: 6.938 s]
  Range (min … max):   10.809 s … 11.875 s    10 runs
 
Benchmark 7: ./debiania-new-3a54b21 build +RTS -N7
  Time (mean ± σ):     13.280 s ±  0.382 s    [User: 35.940 s, System: 9.331 s]
  Range (min … max):   12.556 s … 13.666 s    10 runs
 
Benchmark 8: ./debiania-new-3a54b21 build +RTS -N8
  Time (mean ± σ):     14.559 s ±  0.205 s    [User: 46.252 s, System: 11.489 s]
  Range (min … max):   14.222 s … 14.912 s    10 runs
 
Benchmark 9: ./debiania-new-3a54b21 build +RTS -N9
  Time (mean ± σ):     15.182 s ±  0.164 s    [User: 49.571 s, System: 12.024 s]
  Range (min … max):   14.943 s … 15.395 s    10 runs
 
Summary
  './debiania-new-3a54b21 build +RTS -N2' ran
    1.07 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N3'
    1.15 ± 0.08 times faster than './debiania-new-3a54b21 build +RTS -N4'
    1.16 ± 0.01 times faster than './debiania-new-3a54b21 build +RTS -N1'
    1.28 ± 0.07 times faster than './debiania-new-3a54b21 build +RTS -N5'
    1.36 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N6'
    1.58 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N7'
    1.74 ± 0.03 times faster than './debiania-new-3a54b21 build +RTS -N8'
    1.81 ± 0.02 times faster than './debiania-new-3a54b21 build +RTS -N9'
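
To make the "scaling is still bad" observation concrete, here is a small sketch that derives speedup and parallel efficiency from the mean times reported above. The function names (`speedup`, `efficiency`, `sweetSpot`) are made up for illustration; only the numbers come from the hyperfine output.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Mean wall-clock times in seconds, copied from the hyperfine output
-- above and indexed by the -N value.
oldMeans, newMeans :: [(Int, Double)]
oldMeans =
  [ (1, 13.335), (2, 11.098), (3, 10.812), (4, 11.622), (5, 12.484)
  , (6, 13.577), (7, 14.613), (8, 15.961), (9, 17.072) ]
newMeans =
  [ (1, 9.744), (2, 8.385), (3, 8.974), (4, 9.638), (5, 10.708)
  , (6, 11.364), (7, 13.280), (8, 14.559), (9, 15.182) ]

-- Speedup relative to the single-threaded run: T(1) / T(n).
speedup :: [(Int, Double)] -> Int -> Maybe Double
speedup means n = (/) <$> lookup 1 means <*> lookup n means

-- Parallel efficiency: speedup divided by the number of capabilities.
-- A value of 1.0 would mean perfect linear scaling.
efficiency :: [(Int, Double)] -> Int -> Maybe Double
efficiency means n = (/ fromIntegral n) <$> speedup means n

-- The -N value with the lowest mean time.
sweetSpot :: [(Int, Double)] -> Int
sweetSpot = fst . minimumBy (comparing snd)
```

For the new binary, `speedup newMeans 2` comes out at roughly 1.16, matching the hyperfine summary, and the speedup drops below 1 from -N7 upwards, that is, those runs are slower than the single-threaded one.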

Could it be that with copyFileCompiler the individual compiler invocations are so short that the contention on IORef is killing the gains? I guess it could be ameliorated if threads took multiple jobs from the queue at once, but that complicates the design. Perhaps I should write a recursiveCopyFilesCompiler that will copy the entirety of MathJax in one go :)
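
A minimal sketch of the "take multiple jobs at once" idea, assuming a plain list of independent jobs behind a single IORef. This is not Hakyll's actual scheduler (which also has to track dependencies between items); `takeBatch`, `worker`, and `runPool` are hypothetical names used only for illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Control.Monad (replicateM, unless)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Atomically remove up to n jobs from the shared queue in one IORef
-- operation, so a worker touches the IORef once per batch instead of
-- once per job.
takeBatch :: Int -> IORef [a] -> IO [a]
takeBatch n queue = atomicModifyIORef' queue $ \jobs ->
  let (batch, rest) = splitAt n jobs in (rest, batch)

-- A worker keeps taking batches until the queue is empty.
worker :: Int -> IORef [a] -> (a -> IO ()) -> IO ()
worker batchSize queue run = loop
  where
    loop = do
      batch <- takeBatch batchSize queue
      unless (null batch) $ mapM_ run batch >> loop

-- Run all jobs on a fixed number of worker threads and wait for them.
-- Assumption: jobs are independent of each other. If a job throws, its
-- worker stops early but still signals completion via `finally`.
runPool :: Int -> Int -> [a] -> (a -> IO ()) -> IO ()
runPool workers batchSize jobs run = do
  queue <- newIORef jobs
  dones <- replicateM workers $ do
    done <- newEmptyMVar
    _ <- forkIO (worker batchSize queue run `finally` putMVar done ())
    pure done
  mapM_ takeMVar dones
```

With a batch size of 1 this degenerates to one IORef operation per job; larger batches reduce contention at the cost of coarser load balancing near the end of the queue. Whether this helps here would need measuring.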

@jaspervdj
Owner Author

Even though the performance gains aren't as great as I hoped, it drastically improves resource usage (open file handles), as discussed in haskellfoundation/error-message-index#444, so I will merge this in and push a release.
