Improve async runtime scaling #946

jaspervdj · 2022-08-08T07:42:32Z

This noticeably improves performance on most sites I tried, though I usually need to copy
one of the blog posts around 1000 times for the effect to become noticeable and
measurable.

Still in draft because the error you get on cyclic dependencies got much worse. That's
fixable, though.

Minoru

Hey, sorry it took me so long to take a look at this! The code looks good.

I tested it on my site (https://github.com/Minoru/blog.debiania.in.ua), which has only 187 Markdown files to compile, but also embeds 32 thousand MathJax files (processed with copyFileCompiler). The performance is indeed better, but the scaling is still bad. Furthermore, the sweet spot is now at 2 cores rather than 4 cores (for a 4-core CPU with hyperthreading), which is odd.

$ hyperfine --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build'
Benchmark 1: ./debiania-old-1645d9c build
  Time (mean ± σ):     15.999 s ±  0.324 s    [User: 52.089 s, System: 14.393 s]
  Range (min … max):   15.472 s … 16.603 s    10 runs


$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build +RTS -N{threads}'
Benchmark 1: ./debiania-old-1645d9c build +RTS -N1
  Time (mean ± σ):     13.335 s ±  0.150 s    [User: 13.083 s, System: 0.768 s]
  Range (min … max):   13.101 s … 13.601 s    10 runs
 
Benchmark 2: ./debiania-old-1645d9c build +RTS -N2
  Time (mean ± σ):     11.098 s ±  0.344 s    [User: 15.222 s, System: 2.124 s]
  Range (min … max):   10.690 s … 11.691 s    10 runs
 
Benchmark 3: ./debiania-old-1645d9c build +RTS -N3
  Time (mean ± σ):     10.812 s ±  0.601 s    [User: 18.158 s, System: 3.344 s]
  Range (min … max):    9.991 s … 11.861 s    10 runs
 
Benchmark 4: ./debiania-old-1645d9c build +RTS -N4
  Time (mean ± σ):     11.622 s ±  0.496 s    [User: 22.643 s, System: 5.129 s]
  Range (min … max):   11.041 s … 12.421 s    10 runs
 
Benchmark 5: ./debiania-old-1645d9c build +RTS -N5
  Time (mean ± σ):     12.484 s ±  0.622 s    [User: 28.233 s, System: 7.239 s]
  Range (min … max):   11.621 s … 13.492 s    10 runs
 
Benchmark 6: ./debiania-old-1645d9c build +RTS -N6
  Time (mean ± σ):     13.577 s ±  0.473 s    [User: 33.883 s, System: 9.680 s]
  Range (min … max):   12.752 s … 14.310 s    10 runs
 
Benchmark 7: ./debiania-old-1645d9c build +RTS -N7
  Time (mean ± σ):     14.613 s ±  0.323 s    [User: 40.769 s, System: 12.303 s]
  Range (min … max):   13.957 s … 14.992 s    10 runs
 
Benchmark 8: ./debiania-old-1645d9c build +RTS -N8
  Time (mean ± σ):     15.961 s ±  0.250 s    [User: 51.374 s, System: 14.443 s]
  Range (min … max):   15.504 s … 16.296 s    10 runs
 
Benchmark 9: ./debiania-old-1645d9c build +RTS -N9
  Time (mean ± σ):     17.072 s ±  0.263 s    [User: 57.513 s, System: 16.108 s]
  Range (min … max):   16.535 s … 17.337 s    10 runs
 
Summary
  './debiania-old-1645d9c build +RTS -N3' ran
    1.03 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N2'
    1.07 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N4'
    1.15 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N5'
    1.23 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N1'
    1.26 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N6'
    1.35 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N7'
    1.48 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N8'
    1.58 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N9'


$ hyperfine --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build'
Benchmark 1: ./debiania-new-3a54b21 build
  Time (mean ± σ):     14.586 s ±  0.188 s    [User: 46.536 s, System: 11.439 s]
  Range (min … max):   14.244 s … 14.874 s    10 runs


$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build +RTS -N{threads}'
Benchmark 1: ./debiania-new-3a54b21 build +RTS -N1
  Time (mean ± σ):      9.744 s ±  0.020 s    [User: 9.390 s, System: 0.611 s]
  Range (min … max):    9.703 s …  9.764 s    10 runs
 
Benchmark 2: ./debiania-new-3a54b21 build +RTS -N2
  Time (mean ± σ):      8.385 s ±  0.061 s    [User: 11.360 s, System: 1.233 s]
  Range (min … max):    8.304 s …  8.494 s    10 runs
 
Benchmark 3: ./debiania-new-3a54b21 build +RTS -N3
  Time (mean ± σ):      8.974 s ±  0.420 s    [User: 14.677 s, System: 2.336 s]
  Range (min … max):    8.464 s …  9.780 s    10 runs
 
Benchmark 4: ./debiania-new-3a54b21 build +RTS -N4
  Time (mean ± σ):      9.638 s ±  0.633 s    [User: 18.245 s, System: 3.308 s]
  Range (min … max):    8.705 s … 10.665 s    10 runs
 
Benchmark 5: ./debiania-new-3a54b21 build +RTS -N5
  Time (mean ± σ):     10.708 s ±  0.595 s    [User: 23.503 s, System: 5.118 s]
  Range (min … max):    9.555 s … 11.704 s    10 runs
 
Benchmark 6: ./debiania-new-3a54b21 build +RTS -N6
  Time (mean ± σ):     11.364 s ±  0.385 s    [User: 28.192 s, System: 6.938 s]
  Range (min … max):   10.809 s … 11.875 s    10 runs
 
Benchmark 7: ./debiania-new-3a54b21 build +RTS -N7
  Time (mean ± σ):     13.280 s ±  0.382 s    [User: 35.940 s, System: 9.331 s]
  Range (min … max):   12.556 s … 13.666 s    10 runs
 
Benchmark 8: ./debiania-new-3a54b21 build +RTS -N8
  Time (mean ± σ):     14.559 s ±  0.205 s    [User: 46.252 s, System: 11.489 s]
  Range (min … max):   14.222 s … 14.912 s    10 runs
 
Benchmark 9: ./debiania-new-3a54b21 build +RTS -N9
  Time (mean ± σ):     15.182 s ±  0.164 s    [User: 49.571 s, System: 12.024 s]
  Range (min … max):   14.943 s … 15.395 s    10 runs
 
Summary
  './debiania-new-3a54b21 build +RTS -N2' ran
    1.07 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N3'
    1.15 ± 0.08 times faster than './debiania-new-3a54b21 build +RTS -N4'
    1.16 ± 0.01 times faster than './debiania-new-3a54b21 build +RTS -N1'
    1.28 ± 0.07 times faster than './debiania-new-3a54b21 build +RTS -N5'
    1.36 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N6'
    1.58 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N7'
    1.74 ± 0.03 times faster than './debiania-new-3a54b21 build +RTS -N8'
    1.81 ± 0.02 times faster than './debiania-new-3a54b21 build +RTS -N9'

Could it be that with copyFileCompiler the individual compiler invocations are so short that the contention on the IORef is killing the gains? I guess it could be ameliorated if threads took multiple jobs from the queue at once, but that complicates the design. Perhaps I should write a recursiveCopyFilesCompiler that will copy the entirety of MathJax in one go :)
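
To make the "multiple jobs from the queue at once" idea concrete, here is a minimal sketch. It is not Hakyll's scheduler: it assumes the pending jobs live in a plain list inside an IORef, and `takeBatch` and `countQueueAccesses` are hypothetical names used only for illustration. It shows how one atomic update can hand a worker several jobs, so the shared IORef is touched less often.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- | Hypothetical helper (not part of Hakyll): atomically remove up to @n@
-- jobs from the shared queue with a single IORef update.
takeBatch :: Int -> IORef [job] -> IO [job]
takeBatch n queue = atomicModifyIORef' queue $ \jobs ->
  let (batch, rest) = splitAt n jobs
  in  (rest, batch)  -- (new queue contents, value returned to the caller)

-- | Drain a queue of jobs in batches of @n@ and count how many atomic
-- updates hit the IORef, including the final one that finds it empty.
countQueueAccesses :: Int -> [job] -> IO Int
countQueueAccesses n jobs = do
  queue <- newIORef jobs
  let loop accesses = do
        batch <- takeBatch n queue
        if null batch
          then pure (accesses + 1)
          else loop $! accesses + 1
  loop 0
```

For 32,000 queued jobs, `countQueueAccesses 1` performs 32001 updates while `countQueueAccesses 64` performs 501. The trade-off is coarser load balancing: a worker may be holding jobs in its batch that an idle worker could have run.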

jaspervdj · 2023-08-23T15:08:34Z

Even though the performance gains aren't as great as I hoped, it drastically improves resource usage (open file handles), as discussed in haskellfoundation/error-message-index#444, so I will merge this in and push a release.

jaspervdj added 15 commits August 3, 2022 12:17

Stub: async scheduler

d5f4aa7

Stub: async scheduler

912e36b

Stub: async scheduler

4c86f66

Stub: async scheduler

ecafce9

Stub: async scheduler

ef9fcc8

Stub: async scheduler

7cc5e9d

Stub: async scheduler

d022b27

Stub: async scheduler

7f411cb

Stub: async scheduler

1ea25de

Stub: async scheduler

d3274cc

Stub: async scheduler

1f9ba8b

Stub: async scheduler

866202c

Stub: async scheduler

04e9f87

Stub: async scheduler

8266a73

Stub: async scheduler

0e1dcf2

jaspervdj requested a review from Minoru August 8, 2022 07:42

jaspervdj added 2 commits August 12, 2022 15:47

Make logger more testable

5c70f28

Add better error for cycles and test for this

3a54b21

jaspervdj marked this pull request as ready for review August 12, 2022 14:19

Minoru approved these changes Aug 18, 2022

Merge branch 'master' into async-scheduler

8f8f578

jaspervdj mentioned this pull request Aug 23, 2023

Build problems on Mac due to max number of file handles haskellfoundation/error-message-index#444

Closed

jaspervdj merged commit 9696a85 into master Aug 23, 2023

vaibhavsagar mentioned this pull request Aug 26, 2023

Make async runtime scale better on SMT machines #850

Open

Minoru deleted the async-scheduler branch August 26, 2023 12:45

BinderDavid mentioned this pull request Sep 1, 2023

Observable change of execution order was introduced in #946 if two matches write same file #1000

Open

Minoru mentioned this pull request Oct 13, 2024

Call for more co-maintainers (as I step down) #1048

Open
