Set scheduler log sizes automatically based on available memory #5570

gjoseph92 · 2021-12-07T18:41:40Z

There are frequent reports of scheduler memory growing over time:

Scheduler memory just keep increasing in idle mode #5509
Are reference cycles a performance problem? #4987 (comment)
Scheduler memory leak / large worker footprint on simple workload #3898 (there is a different problem here; memory is leaking even with logs turned off, but turning off logs was necessary to debug)
What's the best way of diagnosing scheduler memory issues? #4998

They often include memory graphs in which scheduler memory climbs steadily over time.

It's very likely that there is a real bug in the scheduler causing memory to accumulate (#3898 (comment)), but often the steep slope on these graphs is caused by various logs on the scheduler accumulating, such as:

transition_log - distributed.scheduler.transition-log-length
log - distributed.scheduler.transition-log-length (should maybe be distributed.admin.log-length?)
events - distributed.scheduler.events-log-length
computations - distributed.diagnostics.computations.max-history
Node._deque_handler - distributed.admin.log-length
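A minimal sketch of why a length cap does not bound memory. It assumes each of these logs behaves like a `collections.deque` whose `maxlen` comes from the config key listed above; that is an assumption about the scheduler internals, not something stated in this issue. The helper `approx_log_bytes` is hypothetical and gives only a shallow, rough estimate via `sys.getsizeof`.

```python
import sys
from collections import deque


def approx_log_bytes(log):
    """Hypothetical helper: rough, shallow estimate of a log's footprint.

    Counts the deque itself, each entry, and the top-level items of tuple
    entries. Nested objects are not followed, so this undercounts.
    """
    total = sys.getsizeof(log)
    for entry in log:
        total += sys.getsizeof(entry)
        if isinstance(entry, tuple):
            total += sum(sys.getsizeof(item) for item in entry)
    return total


# Hypothetical stand-in for a log capped by a length setting such as
# distributed.scheduler.transition-log-length.
MAX_LEN = 1_000

small = deque(maxlen=MAX_LEN)
large = deque(maxlen=MAX_LEN)
for i in range(5_000):
    # Same number of entries appended to both logs, but the entries in
    # `large` carry a 10,000-character payload each.
    small.append(("task", i))
    large.append(("task", i, "x" * 10_000))

print(len(small), len(large))  # prints 1000 1000
print(approx_log_bytes(small) < approx_log_bytes(large))  # prints True
```

Both logs stop growing at the same entry count, yet the second holds far more memory. A user tuning a length setting has to guess the entry size to predict the memory cost, which is the motivation for a memory-based setting.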

I propose two things:

Log lengths should be set as a percentage of available memory rather than as a number of entries, since that is much easier for users to configure. For some (or most) of these logs that may be hard to do accurately, because the size of each entry is unknown, but a rough estimate is probably okay.
A memory-cleanup callback that runs, say, once a second and clears out excess logs if the scheduler is under memory pressure.
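The two proposals could be sketched roughly as follows. Everything here is hypothetical: `maxlen_for_budget` turns a memory fraction into an entry count using one sample entry as the "rough estimate" of entry size, and `trim_logs` is a cleanup pass that drops the oldest entries from each log when memory use crosses a threshold. Memory figures are passed in as plain numbers so the sketch stays self-contained; in a real scheduler they would presumably come from a library such as psutil.

```python
import sys
from collections import deque


def maxlen_for_budget(total_memory, fraction, sample_entry):
    """Hypothetical helper: convert `fraction` of `total_memory` (bytes)
    into a maximum entry count, estimating per-entry size from one sample."""
    per_entry = sys.getsizeof(sample_entry)
    if isinstance(sample_entry, tuple):
        per_entry += sum(sys.getsizeof(item) for item in sample_entry)
    return max(1, int(total_memory * fraction) // per_entry)


def trim_logs(logs, memory_used, memory_limit, target=0.8, keep_fraction=0.5):
    """Hypothetical cleanup callback.

    If `memory_used` exceeds `target * memory_limit`, drop the oldest
    entries of every log so that only `keep_fraction` of each remains.
    Returns the total number of entries removed.
    """
    if memory_used <= target * memory_limit:
        return 0  # no memory pressure: leave the logs alone
    removed = 0
    for log in logs:
        n_drop = len(log) - int(len(log) * keep_fraction)
        for _ in range(n_drop):
            log.popleft()  # oldest entries sit at the left end
        removed += n_drop
    return removed


# Usage with made-up numbers: 1% of 8 GiB for a transition-style log.
GiB = 2**30
budget_len = maxlen_for_budget(8 * GiB, 0.01, ("task", 0, "released", "waiting"))
events = deque(range(100), maxlen=budget_len)

# 7 GiB used out of 8 GiB is above the 80% threshold, so half is dropped.
print(trim_logs([events], memory_used=7 * GiB, memory_limit=8 * GiB))  # prints 50
```

In the scheduler, a function like `trim_logs` would presumably be registered as a periodic callback (for example Tornado's `PeriodicCallback` with a 1000 ms interval) rather than called by hand. Halving each log is only one possible policy; trimming the largest logs first, or trimming just enough to get back under the threshold, would also fit the proposal.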


fjetter · 2021-12-14T14:02:02Z

xref #4762 for the various pieces of logging mentioned here

fjetter added the diagnostics, documentation, and stability labels on Dec 8, 2021

gjoseph92 mentioned this issue Mar 18, 2022

Possible memory leak when using LocalCluster #5960 (open)

fjetter mentioned this issue Sep 6, 2023

Mild memory leak in dask workers #8164 (open)
