Improve async runtime scaling #946
Hey, sorry it took me so long to take a look at this! The code looks good.
I tested it on my site (https://github.com/Minoru/blog.debiania.in.ua), which has only 187 Markdown files to compile but also embeds 32 thousand MathJax files (processed with copyFileCompiler). The performance is indeed better, but the scaling is still bad. Furthermore, the sweet spot is now at 2 cores rather than 4 cores (for a 4-core CPU with hyperthreading), which is odd.
$ hyperfine --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build'
Benchmark 1: ./debiania-old-1645d9c build
Time (mean ± σ): 15.999 s ± 0.324 s [User: 52.089 s, System: 14.393 s]
Range (min … max): 15.472 s … 16.603 s 10 runs
$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-old-1645d9c clean' './debiania-old-1645d9c build +RTS -N{threads}'
Benchmark 1: ./debiania-old-1645d9c build +RTS -N1
Time (mean ± σ): 13.335 s ± 0.150 s [User: 13.083 s, System: 0.768 s]
Range (min … max): 13.101 s … 13.601 s 10 runs
Benchmark 2: ./debiania-old-1645d9c build +RTS -N2
Time (mean ± σ): 11.098 s ± 0.344 s [User: 15.222 s, System: 2.124 s]
Range (min … max): 10.690 s … 11.691 s 10 runs
Benchmark 3: ./debiania-old-1645d9c build +RTS -N3
Time (mean ± σ): 10.812 s ± 0.601 s [User: 18.158 s, System: 3.344 s]
Range (min … max): 9.991 s … 11.861 s 10 runs
Benchmark 4: ./debiania-old-1645d9c build +RTS -N4
Time (mean ± σ): 11.622 s ± 0.496 s [User: 22.643 s, System: 5.129 s]
Range (min … max): 11.041 s … 12.421 s 10 runs
Benchmark 5: ./debiania-old-1645d9c build +RTS -N5
Time (mean ± σ): 12.484 s ± 0.622 s [User: 28.233 s, System: 7.239 s]
Range (min … max): 11.621 s … 13.492 s 10 runs
Benchmark 6: ./debiania-old-1645d9c build +RTS -N6
Time (mean ± σ): 13.577 s ± 0.473 s [User: 33.883 s, System: 9.680 s]
Range (min … max): 12.752 s … 14.310 s 10 runs
Benchmark 7: ./debiania-old-1645d9c build +RTS -N7
Time (mean ± σ): 14.613 s ± 0.323 s [User: 40.769 s, System: 12.303 s]
Range (min … max): 13.957 s … 14.992 s 10 runs
Benchmark 8: ./debiania-old-1645d9c build +RTS -N8
Time (mean ± σ): 15.961 s ± 0.250 s [User: 51.374 s, System: 14.443 s]
Range (min … max): 15.504 s … 16.296 s 10 runs
Benchmark 9: ./debiania-old-1645d9c build +RTS -N9
Time (mean ± σ): 17.072 s ± 0.263 s [User: 57.513 s, System: 16.108 s]
Range (min … max): 16.535 s … 17.337 s 10 runs
Summary
'./debiania-old-1645d9c build +RTS -N3' ran
1.03 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N2'
1.07 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N4'
1.15 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N5'
1.23 ± 0.07 times faster than './debiania-old-1645d9c build +RTS -N1'
1.26 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N6'
1.35 ± 0.08 times faster than './debiania-old-1645d9c build +RTS -N7'
1.48 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N8'
1.58 ± 0.09 times faster than './debiania-old-1645d9c build +RTS -N9'
$ hyperfine --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build'
Benchmark 1: ./debiania-new-3a54b21 build
Time (mean ± σ): 14.586 s ± 0.188 s [User: 46.536 s, System: 11.439 s]
Range (min … max): 14.244 s … 14.874 s 10 runs
$ hyperfine --parameter-scan threads 1 9 --prepare './debiania-new-3a54b21 clean' './debiania-new-3a54b21 build +RTS -N{threads}'
Benchmark 1: ./debiania-new-3a54b21 build +RTS -N1
Time (mean ± σ): 9.744 s ± 0.020 s [User: 9.390 s, System: 0.611 s]
Range (min … max): 9.703 s … 9.764 s 10 runs
Benchmark 2: ./debiania-new-3a54b21 build +RTS -N2
Time (mean ± σ): 8.385 s ± 0.061 s [User: 11.360 s, System: 1.233 s]
Range (min … max): 8.304 s … 8.494 s 10 runs
Benchmark 3: ./debiania-new-3a54b21 build +RTS -N3
Time (mean ± σ): 8.974 s ± 0.420 s [User: 14.677 s, System: 2.336 s]
Range (min … max): 8.464 s … 9.780 s 10 runs
Benchmark 4: ./debiania-new-3a54b21 build +RTS -N4
Time (mean ± σ): 9.638 s ± 0.633 s [User: 18.245 s, System: 3.308 s]
Range (min … max): 8.705 s … 10.665 s 10 runs
Benchmark 5: ./debiania-new-3a54b21 build +RTS -N5
Time (mean ± σ): 10.708 s ± 0.595 s [User: 23.503 s, System: 5.118 s]
Range (min … max): 9.555 s … 11.704 s 10 runs
Benchmark 6: ./debiania-new-3a54b21 build +RTS -N6
Time (mean ± σ): 11.364 s ± 0.385 s [User: 28.192 s, System: 6.938 s]
Range (min … max): 10.809 s … 11.875 s 10 runs
Benchmark 7: ./debiania-new-3a54b21 build +RTS -N7
Time (mean ± σ): 13.280 s ± 0.382 s [User: 35.940 s, System: 9.331 s]
Range (min … max): 12.556 s … 13.666 s 10 runs
Benchmark 8: ./debiania-new-3a54b21 build +RTS -N8
Time (mean ± σ): 14.559 s ± 0.205 s [User: 46.252 s, System: 11.489 s]
Range (min … max): 14.222 s … 14.912 s 10 runs
Benchmark 9: ./debiania-new-3a54b21 build +RTS -N9
Time (mean ± σ): 15.182 s ± 0.164 s [User: 49.571 s, System: 12.024 s]
Range (min … max): 14.943 s … 15.395 s 10 runs
Summary
'./debiania-new-3a54b21 build +RTS -N2' ran
1.07 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N3'
1.15 ± 0.08 times faster than './debiania-new-3a54b21 build +RTS -N4'
1.16 ± 0.01 times faster than './debiania-new-3a54b21 build +RTS -N1'
1.28 ± 0.07 times faster than './debiania-new-3a54b21 build +RTS -N5'
1.36 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N6'
1.58 ± 0.05 times faster than './debiania-new-3a54b21 build +RTS -N7'
1.74 ± 0.03 times faster than './debiania-new-3a54b21 build +RTS -N8'
1.81 ± 0.02 times faster than './debiania-new-3a54b21 build +RTS -N9'
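To put a number on the scaling, the mean wall-clock times above can be turned into speedup and parallel efficiency relative to -N1. This is only a small sketch: the means are copied by hand from the hyperfine output for the new binary, and the names `newMeans`, `speedup` and `efficiency` are made up for illustration.

```haskell
import Control.Monad (forM_)
import Text.Printf (printf)

-- Mean wall-clock time in seconds per -N value, copied from the
-- hyperfine output for debiania-new-3a54b21 above.
newMeans :: [(Int, Double)]
newMeans =
  [ (1, 9.744), (2, 8.385), (3, 8.974), (4, 9.638), (5, 10.708)
  , (6, 11.364), (7, 13.280), (8, 14.559), (9, 15.182) ]

-- Speedup of an N-thread run relative to the single-threaded run.
speedup :: Double -> Double -> Double
speedup t1 tn = t1 / tn

-- Parallel efficiency: speedup divided by the number of threads.
-- 1.0 would be perfect scaling.
efficiency :: Double -> Int -> Double -> Double
efficiency t1 n tn = speedup t1 tn / fromIntegral n

main :: IO ()
main = case newMeans of
  [] -> pure ()
  ((_, t1) : _) ->
    forM_ newMeans $ \(n, tn) ->
      printf "-N%d: speedup %.2f, efficiency %.2f\n"
        n (speedup t1 tn) (efficiency t1 n tn)
```

A speedup below 1.0 means the run is slower than -N1, which is what the means above give from about -N7 upwards.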
Could it be that with copyFileCompiler the individual compiler invocations are so short that contention on the IORef is killing the gains? I guess it could be ameliorated if threads took multiple jobs from the queue at once, but that complicates the design. Perhaps I should write a recursiveCopyFilesCompiler that will copy the entirety of MathJax in one go :)
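For illustration, the "take multiple jobs at once" idea could look roughly like the sketch below. It is not Hakyll's actual scheduler: it assumes the queue is a plain list held in an `IORef`, and `takeBatch`, `worker` and `runBatched` are hypothetical names. The point is only that each worker touches the shared `IORef` once per batch instead of once per job.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_, unless)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Atomically remove up to n jobs from the front of the shared queue.
-- (Hypothetical helper; the queue representation is an assumption.)
takeBatch :: Int -> IORef [job] -> IO [job]
takeBatch n queue = atomicModifyIORef' queue $ \jobs ->
  let (batch, rest) = splitAt n jobs
  in (rest, batch)

-- Worker loop: grab a batch, run every job in it, and repeat until
-- the queue is empty.
worker :: Int -> IORef [job] -> (job -> IO ()) -> IO ()
worker batchSize queue run = loop
  where
    loop = do
      batch <- takeBatch batchSize queue
      unless (null batch) $ mapM_ run batch >> loop

-- Run all jobs on nWorkers threads, each taking batchSize jobs at a time.
runBatched :: Int -> Int -> (job -> IO ()) -> [job] -> IO ()
runBatched nWorkers batchSize run jobs = do
  queue <- newIORef jobs
  done <- newEmptyMVar
  -- forkFinally signals completion even if a job throws.
  replicateM_ nWorkers $
    forkFinally (worker batchSize queue run) (\_ -> putMVar done ())
  replicateM_ nWorkers (takeMVar done)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let countJob _ = atomicModifyIORef' counter (\c -> (c + 1, ()))
  runBatched 4 16 countJob [1 .. 1000 :: Int]
  readIORef counter >>= print -- prints 1000
```

The trade-off is load balancing: with large batches, one worker can end up holding a batch of slow jobs while the others sit idle, which is presumably part of what "complicates the design".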
Even though the performance gains aren't as great as I hoped, it drastically improves resource usage (open file handles), as discussed in haskellfoundation/error-message-index#444, so I will merge this in and push a release.
This noticeably improves performance on most sites I tried, though I usually need to copy
one of the blog posts about 1000 times for the effect to become noticeable and
measurable.
Still in draft because the error you get on cyclic dependencies got much worse; that's fixable, though.