-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add support for async worker processes #2547
Conversation
No users yet, but next up is the process pool.
This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order.
@mikekap Please fix this test error: Check license of: src/com/facebook/buck/worker/WorkerProcessPool.java. |
@v-jizhang fixed the missing license issue. |
@mikekap I discussed this PR with the Buck team and really for this huge change, it's hard for me to get it reviewed. Thanks |
Let me try to answer these. In terms of reviewing, it looks more daunting than it is because github doesn't have rename detection -
As built, Beyond that - no code is shared between sync/async worker processes.
For us, it improves performance significantly. We compile Clojure via a JVM-based process. Clojure is Java based, but uses a lot of static initializers even after compilation, which makes dependency loading during compilation a bottleneck. This PR lets us use a single process to cache dependencies that we have already loaded and parallelize via regular futures in the worker process.
Sorry, my intent wasn't to burden. Apologies. We are running this in our CI - so it helps in our use case. Unfortunately my previous work around adding persistent processes didn't yield as much gains for our use case. I'm happy to restructure this as needed - not tied to the current implementation at all. I'm hoping to get this upstreamed so various |
Don't crash all of buck when a worker process fails.
No apologies, my previous comment is just some thoughts from buck team. |
private final CountDownLatch readerShutdownSignal = new CountDownLatch(1); | ||
private final Thread watchdogThread; | ||
private final CountDownLatch watchdogShutdownSignal = new CountDownLatch(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variables readerShutdownSignal and watchdogShutdownSignal are used as flags, should we make them volatile boolean? That way we will have 2 less objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite - the latches are .await
-able. Realistically these objects probably aren't memory-intensive for buck - there's a WorkerProcess
object per long-lived process. You would rarely have more than ncpus of these processes running -- though I guess it depends how many worker tools you use!
Update: I got rid of these by killing the watchdog and went with your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you
} | ||
|
||
@Override | ||
public synchronized void close() { | ||
LOG.debug("Closing process %d", this.hashCode()); | ||
try { | ||
readerShutdownSignal.countDown(); | ||
synchronized (commandExitCodes) { | ||
commandExitCodes.notifyAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why notifyAll? we have only one thread in waiting.
|
||
WorkerProcessProtocol.CommandResponse commandResponse = | ||
this.protocol.receiveNextCommandResponse(); | ||
SettableFuture<Integer> result = commandExitCodes.remove(commandResponse.getCommandId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called in thread readerLoop, a different thread from submitJob, and both threads access commandExitCodes. Should we protect it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commandExitCodes
is a ConcurrentHashMap
so there's no need. Though honestly the threading here is a bit complex because of the watchdog. I updated the implementation to get rid of the watchdog (it was mostly for paranoia) and add a volatile boolean for the shutdown signal as you suggested. Also added a test for worker death.
The locking is still a bit more complex than I'd prefer, so I'm open to suggestions here - there's a weird internal vs external locking of data - externally this
is used to lock submitJob
& close
, but the reader thread uses readerThread
to be woken up when there is something to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's Ok.
if (!readerAlive) { | ||
failAllFutures(); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once readerAlive becomes false, it stays false forever. Should it be set to true here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused with the readerAlive logic. Could you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's the intent. readerAlive
is set to false
only if the worker returns a malformed response. In this context, malformed responses are signs something terrible happened - even build step failures are well formed responses. This could only happen on e.g. malformed json or if the process just died. In that case, we stop attempting to communicate with the process at all and just fail all incoming requests until the build eventually close()
s the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I c, thank you.
"Tried to submit a job to the worker process before the handshake was performed."); | ||
|
||
Preconditions.checkState( | ||
!Thread.holdsLock(this), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be false, why need we check it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as written, this is definitely true. This is mostly guarding to future changes in the code - if e.g. the caller is synchronized
by accident.
lgtm. Thanks |
* Respect user input if abi_generation_mode is explicitly provided in java_library (facebook#2467) * Automate publishing docs on CirclrCI (facebook#2486) * Automate publishing docs on CirclrCI * Automate publishing docs on CircleCI * Automate publishing docs on CircleCI * Automate publishing docs using linux VM * Fixed Python3 compatible issues * Added code to install android_sdk * Automate docs, installed JDK for java docs * Automate publishing docs on CircleCI, added CNAME * Automate publishing docs deleted an extra line * Automate publishing docs, refactored code * Automate publishing docs, changed a required test name Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Fixed git push access denied issue (facebook#2487) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * handle bundling resources of static frameworks (Originally by rmaz) (facebook#2490) * handle bundling resources of static framworks add a test for bundling resources of static deps make formatter happy * Indentation fix * Reverted some filtering changes Co-authored-by: Richard Howell <richardh@uber.com> Co-authored-by: Gabe <gabriel.beltrame@luizalabs.com> * Add support for java 11 builds in jitpack (facebook#2499) * Fixed Unable to resolve dependency jdk8 (= 8.0.172) (facebook#2492) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * CircleCI fix (facebook#2495) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Added python3 to interpreter names (facebook#2497) * Added python3 to interpreter names * Fixed an infinite recursive call * Test fixes * test fixes * Pull out unrelated changes Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Fixes for rust tests (facebook#2503) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Add support for ANDROID_SDK_ROOT environment variable (facebook#2507) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fix for facebook#2498 (facebook#2504) * Fix for facebook#2498 Buck fails to build non predex apps which include jars containing module-info.class files` facebook#2498 * Update DalvikStatsCache.java Typo fix * Update DalvikStatsCache.java Added comment for special case * Update DalvikStatsCache.java Added new New line and space before comments. v-jizhang 12 hours ago Contributor Could you move your comment 1 line down and add 1 space after "//"? Thanks @surapuramakhil * extracted filename from entry relative path, updated check with equals intead of ends with * updated comment * Set max store size bytes correctly so it is used by AbstractAsynchronousCache to decide what to upload (facebook#2510) * CI rust test fixes (facebook#2511) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a phthon installation bug on CircleCI (facebook#2509) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * add Easynvest to buck showcases (facebook#2514) Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Make Python3 the first choice of python interpreter (facebook#2512) * Make Python3 the first choice of python interpreter * Fixed tests * Fixed tests * Fixed tests * Fixed testgs * Fixed tests * Fixed tests Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a python3 compatible issue (facebook#2517) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added application_region and application_language properties to xcode_workspace_config (facebook#2519) * Add region and language option to launch and test actions in xcode_workspace_config * Add application region and language arguments to projectV2 * Create unit test Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Added Android NDK tests (facebook#2515) * Added Android NDK 19 tests * Added ndk 20 tests * Added NDK 21 tests * Added timout for long running tests Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Saifulriza (facebook#2523) * Add logo * Update README.md * Update README.md * Add full kotlin configurations to kotlin rule (v2) (facebook#2526) * Add full kotlin configurations to kotlin rule * Adding test for KotlinConfiguredComilerFactory * Fixing pmd issue on setup method * Renaming remaining references to extra_kotlinc_arguments to free_compiler_args * Adding license info to new test file Co-authored-by: Zac Sweers <pandanomic@gmail.com> * Fixing paths that fail during project generation (facebook#2527) * (Rebase) Add snapshot_images_diff_path to apple_test (facebook#2528) * Add snapshot_images_diff_path to apple_test * Fix some issues found at code review Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Fixes for Windows tests (facebook#2529) * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Added Flavors to docs (facebook#2522) * Added Flavors to docs * Flavor documentation, moddified per code review. Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added a kotlin tutorial for android (facebook#2525) * Added a kotlin tutorial for android * Corrected errors in kotlin tutorial Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Prepare release v2020.09.01 (facebook#2532) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added dependency to pulbishing docs (facebook#2534) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed CircleCI tests (facebook#2531) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Upgraded ruby syntax in replease process (facebook#2533) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Skip flaky tests on Windows platform (facebook#2536) * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip a flaky test on Windows * Increased a timeout in Winddows test. (facebook#2544) * Fixed dlang installation error (facebook#2543) * Fixed dlang installation error * Fixed dlang installation error Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Install python2 from archive. (facebook#2542) * Install python2 from archive. * Fixed an python3 installation error. Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a link (facebook#2539) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added a rust tutorial (facebook#2540) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Add support for async worker processes (facebook#2547) * Allow WorkerProcess to be used for multiple jobs at the same time. No users yet, but next up is the process pool. * Introduce a new `async` option to worker tools. This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order. * Write some tests, fix some bugs * Add back license * Update WorkerProcessPool.java * Fix windows tests Don't crash all of buck when a worker process fails. * Code review feedback and clarifications, I hope Co-authored-by: David Z Hao <davidhao3300@yahoo.com> Co-authored-by: Jinlin Zhang <jinlinzhang@fb.com> Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> Co-authored-by: Gabe <gabriel.unknoob@gmail.com> Co-authored-by: Richard Howell <richardh@uber.com> Co-authored-by: Gabe <gabriel.beltrame@luizalabs.com> Co-authored-by: Gautam Korlam <kageiit@users.noreply.github.com> Co-authored-by: Akhil <surapuramakhil@gmail.com> Co-authored-by: Ravi Agarwal <raviagarwal7@users.noreply.github.com> Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> Co-authored-by: Saiful Riza <45083824+Saifulriza123@users.noreply.github.com> Co-authored-by: Ty Smith <tys@uber.com> Co-authored-by: Zac Sweers <pandanomic@gmail.com> Co-authored-by: Mike Kaplinskiy <mike.kaplinskiy@gmail.com> Co-authored-by: Lucas Marçal <lucas.marcal@ifood.com.br>
* Respect user input if abi_generation_mode is explicitly provided in java_library (facebook#2467) * Automate publishing docs on CirclrCI (facebook#2486) * Automate publishing docs on CirclrCI * Automate publishing docs on CircleCI * Automate publishing docs on CircleCI * Automate publishing docs using linux VM * Fixed Python3 compatible issues * Added code to install android_sdk * Automate docs, installed JDK for java docs * Automate publishing docs on CircleCI, added CNAME * Automate publishing docs deleted an extra line * Automate publishing docs, refactored code * Automate publishing docs, changed a required test name Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Fixed git push access denied issue (facebook#2487) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * handle bundling resources of static frameworks (Originally by rmaz) (facebook#2490) * handle bundling resources of static framworks add a test for bundling resources of static deps make formatter happy * Indentation fix * Reverted some filtering changes Co-authored-by: Richard Howell <richardh@uber.com> Co-authored-by: Gabe <gabriel.beltrame@luizalabs.com> * Add support for java 11 builds in jitpack (facebook#2499) * Fixed Unable to resolve dependency jdk8 (= 8.0.172) (facebook#2492) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * CircleCI fix (facebook#2495) Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Added python3 to interpreter names (facebook#2497) * Added python3 to interpreter names * Fixed an infinite recursive call * Test fixes * test fixes * Pull out unrelated changes Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> * Fixes for rust tests (facebook#2503) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Add support for ANDROID_SDK_ROOT environment variable (facebook#2507) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fix for facebook#2498 (facebook#2504) * Fix for facebook#2498 Buck fails to build non predex apps which include jars containing module-info.class files` facebook#2498 * Update DalvikStatsCache.java Typo fix * Update DalvikStatsCache.java Added comment for special case * Update DalvikStatsCache.java Added new New line and space before comments. v-jizhang 12 hours ago Contributor Could you move your comment 1 line down and add 1 space after "//"? Thanks @surapuramakhil * extracted filename from entry relative path, updated check with equals intead of ends with * updated comment * Set max store size bytes correctly so it is used by AbstractAsynchronousCache to decide what to upload (facebook#2510) * CI rust test fixes (facebook#2511) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a phthon installation bug on CircleCI (facebook#2509) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * add Easynvest to buck showcases (facebook#2514) Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Make Python3 the first choice of python interpreter (facebook#2512) * Make Python3 the first choice of python interpreter * Fixed tests * Fixed tests * Fixed tests * Fixed testgs * Fixed tests * Fixed tests Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a python3 compatible issue (facebook#2517) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added application_region and application_language properties to xcode_workspace_config (facebook#2519) * Add region and language option to launch and test actions in xcode_workspace_config * Add application region and language arguments to projectV2 * Create unit test Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Added Android NDK tests (facebook#2515) * Added Android NDK 19 tests * Added ndk 20 tests * Added NDK 21 tests * Added timout for long running tests Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Saifulriza (facebook#2523) * Add logo * Update README.md * Update README.md * Add full kotlin configurations to kotlin rule (v2) (facebook#2526) * Add full kotlin configurations to kotlin rule * Adding test for KotlinConfiguredComilerFactory * Fixing pmd issue on setup method * Renaming remaining references to extra_kotlinc_arguments to free_compiler_args * Adding license info to new test file Co-authored-by: Zac Sweers <pandanomic@gmail.com> * Fixing paths that fail during project generation (facebook#2527) * (Rebase) Add snapshot_images_diff_path to apple_test (facebook#2528) * Add snapshot_images_diff_path to apple_test * Fix some issues found at code review Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> * Fixes for Windows tests (facebook#2529) * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Fixes for Windows tests * Added Flavors to docs (facebook#2522) * Added Flavors to docs * Flavor documentation, moddified per code review. Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added a kotlin tutorial for android (facebook#2525) * Added a kotlin tutorial for android * Corrected errors in kotlin tutorial Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Prepare release v2020.09.01 (facebook#2532) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added dependency to pulbishing docs (facebook#2534) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed CircleCI tests (facebook#2531) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Upgraded ruby syntax in replease process (facebook#2533) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Skip flaky tests on Windows platform (facebook#2536) * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip flaky tests on Windows platform * Skip a flaky test on Windows * Increased a timeout in Winddows test. (facebook#2544) * Fixed dlang installation error (facebook#2543) * Fixed dlang installation error * Fixed dlang installation error Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Install python2 from archive. (facebook#2542) * Install python2 from archive. * Fixed an python3 installation error. Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Fixed a link (facebook#2539) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Added a rust tutorial (facebook#2540) Co-authored-by: v-jizhang <66389669+buck-bot@users.noreply.github.com> * Add app clip support * Add integration test * fix target name at Integration test * fix typo * Fix bundle ProductType * Add support for async worker processes (facebook#2547) * Allow WorkerProcess to be used for multiple jobs at the same time. No users yet, but next up is the process pool. * Introduce a new `async` option to worker tools. This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order. * Write some tests, fix some bugs * Add back license * Update WorkerProcessPool.java * Fix windows tests Don't crash all of buck when a worker process fails. * Code review feedback and clarifications, I hope * remove duplicated change Co-authored-by: David Z Hao <davidhao3300@yahoo.com> Co-authored-by: Jinlin Zhang <jinlinzhang@fb.com> Co-authored-by: buck-bot <66389669+buck-bot@users.noreply.github.com> Co-authored-by: Gabe <gabriel.unknoob@gmail.com> Co-authored-by: Richard Howell <richardh@uber.com> Co-authored-by: Gabe <gabriel.beltrame@luizalabs.com> Co-authored-by: Gautam Korlam <kageiit@users.noreply.github.com> Co-authored-by: Akhil <surapuramakhil@gmail.com> Co-authored-by: Ravi Agarwal <raviagarwal7@users.noreply.github.com> Co-authored-by: Lucas Marçal <lucas.fernandes@easynvest.com.br> Co-authored-by: Saiful Riza <45083824+Saifulriza123@users.noreply.github.com> Co-authored-by: Ty Smith <tys@uber.com> Co-authored-by: Zac Sweers <pandanomic@gmail.com> Co-authored-by: Lucas Marçal <lucas.marcal@ifood.com.br> Co-authored-by: Mike Kaplinskiy <mike.kaplinskiy@gmail.com>
* Allow WorkerProcess to be used for multiple jobs at the same time. No users yet, but next up is the process pool. * Introduce a new `async` option to worker tools. This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order. * Write some tests, fix some bugs * Add back license * Update WorkerProcessPool.java * Fix windows tests Don't crash all of buck when a worker process fails. * Code review feedback and clarifications, I hope
* Allow WorkerProcess to be used for multiple jobs at the same time. No users yet, but next up is the process pool. * Introduce a new `async` option to worker tools. This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order. * Write some tests, fix some bugs * Add back license * Update WorkerProcessPool.java * Fix windows tests Don't crash all of buck when a worker process fails. * Code review feedback and clarifications, I hope
* Allow WorkerProcess to be used for multiple jobs at the same time. No users yet, but next up is the process pool. * Introduce a new `async` option to worker tools. This makes buck call the tool concurrently, without waiting for responses. It's up to the tool to provide responses with the right ids, in any order. * Write some tests, fix some bugs * Add back license * Update WorkerProcessPool.java * Fix windows tests Don't crash all of buck when a worker process fails. * Code review feedback and clarifications, I hope
The idea is that you might be able to parallelize internally in the worker process, with the benefit of shared memory to cache details about dependencies. Async worker processes can receive & run multiple commands concurrently. They respond with results asynchronously in any order.
PR Notes:
solo_async=True
becauseasync
is a python keyword.ListenableFuture
s - open to suggestions. The only real benefit is that it makes testing concurrent execution/cancellation easier.Sync
worker pool can be cleaned up with the new interface (less of a need for aBurrowedWorkerProcess
& lifecycle). I did not do this to keep this PR manageable.