Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid synchronized section in CoroutineScheduler during internal pool growth #3652

Closed
qwwdfsad opened this issue Mar 2, 2023 · 2 comments
Closed
Assignees

Comments

@qwwdfsad
Copy link
Contributor

qwwdfsad commented Mar 2, 2023

We had multiple freeze reports from IDEA, all with the following stacktrace:

	at kotlinx.coroutines.scheduling.CoroutineScheduler.createNewWorker(CoroutineScheduler.kt:1007)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.tryCreateWorker(CoroutineScheduler.kt:439)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.tryCreateWorker$default(CoroutineScheduler.kt:430)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.signalCpuWork(CoroutineScheduler.kt:426)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.dispatch(CoroutineScheduler.kt:398)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.dispatch$default(CoroutineScheduler.kt:382)
		at kotlinx.coroutines.scheduling.SchedulerCoroutineDispatcher.dispatch(Dispatcher.kt:97)

During the startup phase, IDEA aggressively launches hundreds of coroutines, and some of them are run from the main thread;
Due to the fact that the growing mechanism uses coarse-grained lock as the easiest way to avoid thread overprovision, that during contention might cause sub-ms pauses

@gregsh
Copy link

gregsh commented Mar 2, 2023

I see freezes of several hundreds of milliseconds in length in UI thread.

@qwwdfsad qwwdfsad self-assigned this Mar 3, 2023
@qwwdfsad
Copy link
Contributor Author

qwwdfsad commented Mar 21, 2023

I was trying to figure out why on earth this synchronized section was contended up to hundreds of milliseconds.

Here is the more or less self-explanatory benchmarking result:

Benchmark                                         Mode  Cnt   Score   Error  Units
ChannelSinkBenchmark.allocateAndStartEmptyThread  avgt    5  48.987 ± 0.850  us/op
ChannelSinkBenchmark.allocateEmptyArray64         avgt    5   0.019 ± 0.001  us/op
ChannelSinkBenchmark.allocateEmptyThread          avgt    5   1.295 ± 0.056  us/op

Allocating a new thread [under a lock] is a non-trivial operation that incurs a non-trivial overhead of JVM upcalls, that may consume unpredictable (when compared to "increment, TLAB-allocate and CAS" action) amount of time that may lead to an additional contention

Flame graph image

qwwdfsad added a commit that referenced this issue Mar 21, 2023
The previous implementation was prone to a non-trivial contention that caused EDT freezes and, potentially, Android's ANRs.

Two root causes were identified:

1) Thread() constructor that has a non-trivial complexity along with JVM upcalls and is significantly slower than any other regular allocation
2) Thread.start() is on itself a JVM upcall that ends up on a global JVM lock[s]

Thread.start() is now invoked when the lock is released to reduce contention.
The first root cause is not addressed as optimistic thread allocation may lead to a potential CPU waste due to how optimistically lock-less detection is and because Thread.start() is an order of magnitude slower anyway

Fixes #3652
woainikk pushed a commit that referenced this issue Apr 4, 2023
The previous implementation was prone to a non-trivial contention that caused EDT freezes and, potentially, Android's ANRs.

Two root causes were identified:

1) Thread() constructor that has a non-trivial complexity along with JVM upcalls and is significantly slower than any other regular allocation
2) Thread.start() is on itself a JVM upcall that ends up on a global JVM lock[s]

Thread.start() is now invoked when the lock is released to reduce contention.
The first root cause is not addressed as optimistic thread allocation may lead to a potential CPU waste due to how optimistically lock-less detection is and because Thread.start() is an order of magnitude slower anyway

Fixes #3652
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants