-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async runtime with graph-based dependency cycle checks #844
Conversation
Here is the benchmark on a site with quite a few JPGs to compress with hakyll-images:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know next to nothing about async, so this PR will have to wait until @jaspervdj has time to review it. I left a few stylistic comments though :)
This shows noticeable improvements on my own site:
There is some scalability weirdness though. I have Intel Core i7 7700HQ, which is a mobile quad-core with HyperThreading. Theoretically, 8-thread workload should slightly outperform 4-thread, but it's not the case:
It's still faster than a single-thread, so not a blocker for this PR; just curious if anyone has any insight into what might be causing this. (The numbers in two benchmarks match exactly because it's actually a single benchmark whose report I split in two.) |
I had this problem with pandoc-plot where if the number of async jobs was unlimited, resource contention would slow down the overall running time. Let me try to limit the number of concurrent jobs to the number of cores and see if scaling makes more sense. |
@LaurentRDC Thanks for looking into that! I thought it might be related to my |
Indeed, limiting the number of concurrent tasks helps with scaling. After the most recent patch, building my site with 1 - 4 cores yields the following results:
|
I think you're looking at a different problem. Your results look like you have four physical cores or more, and you didn't go past their count. The problem I describe happens when you have SMT (e.g. HyperThreading) and you go past the number of physical cores (e.g. use 8 threads on a quad-core processor). The latest patch still exhibits this problem. Furthermore, it runs slower than the previous commit.
Note how runtime starts to climb past 4 threads (i.e. once we got into HyperThreading territory), and the overall times are higher than they were on the previous commit. I included 9- and 10-thread cases just to see the penalty for overloading the machine. It might be something specific to my setup though. I'm running Linux. What do you run, @LaurentRDC? As a workaround, I'd suggest to use something like cpuinfo to figure out the number of physical cores, and cap the number of threads to that. I didn't check what it does on non-Linux platforms though. |
I run Windows, and I don't have access to a Hyperthreading machine. Given that commit 38984f6 is not unequivocally better, I'll revert it. It appears Maybe we can leave further optimizations until more people have had their hands on it? |
This reverts commit 38984f6.
Out of my personal curiosity, what processor do you have? Four cores, runs Windows, but no SMT — that's either ARM, or an SMT-capable chip with SMT disabled. I'm really curious! (Um, just realized I was calling SMT "SMP" throughout the thread =\ )
You could just force-push the previous commit, keeping the history tidy :P
Well, we could also fall back to
Yeah, it's still better than what we have at the moment, so I'm in favour of merging even if it doesn't scale as well as I wish it did. But the scalability issue should probably be mentioned in the changelog, so people can experiment with |
I'm using an Intel Core i5-6500. 4 cores, no hyperthreading. As for the the revert commit, we can squash everything at merge time. |
@jaspervdj Would it be possible for you to review this? Thanks! |
@Minoru Given that this is a clear performance win, can you merge this? I've been testing this branch daily in CI for nearly a month without hiccups. Thanks! |
Okay. I still don't know async Haskell and can't properly review this, but it doesn't look actively malicious either :) And it'll get more testing in Note that I don't have credentials to release to Hackage. An actual release will have to wait for Jasper. |
Scalability problem is now tracked in #850. |
Thank you very much for your help! |
Thank you for doing the actual work :) |
Hey @LaurentRDC, silly question: why did you end up using |
@vaibhavsagar hmmm good question. I can't recall really. Maybe it's an artifact of using more STM in an earlier prototype? One advantage is that a Tvar is never empty, but that might not be important. |
Thanks for clarifying! If you have time I would definitely appreciate your review on my PR. I don't know how to test performance but I would expect it to be on par or better than the |
I'm on mobile for a few days unfortunately. I just tried two different builds of Hakyll on my personal website and timed it, nothing very scientific. @Minoru may be able to help more with this |
Hello,
This is a PR to make hakyll's runtime concurrent.
The general idea is based on #725. That particular implementation incorrectly identified a dependency cycle in my personal website that did not exist. Hence, this PR includes more robust dependency cycle checking based on
Data.Graph
. A test case has been added to ensure this works.The concurrency is implemented as follows.
The advantage of this implementation is that it is easy to swap between concurrent/serial by simply swapping
forConcurrently_
forforM_
. Maybe we can offer this option as a compile flag?I have tested this version of hakyll on my personal page and I'm very satisfied with the results.
closes #714