Services running out of disk space when pulling images on autoscaled machines #6990

Open
@GitHK

Description

Services running on autoscaled machines run out of disk space while they are being opened, and then fail to start.

The logs show an error similar to the following, indicating that the disk ran out of space:

"2024-12-23T08:15:55.593Z","ip-10-0-3-149","dy-sidecar_a2a82999-1c8d-4cfe-b081-41c6a587366b.1.vtsax3o9cnggg3fv9ql1d37av","log_level=ERROR | log_timestamp=2024-12-23 08:15:55,593 | log_source=servicelib.docker_utils:pull_image(253) | log_uid=None | log_msg=Unexpected error while validating 'pull_progress={'errorDetail': {'message': 'failed to register layer: write /usr/sbin/wipefs: no space left on device'}, 'error': 'failed to register layer: write /usr/sbin/wipefs: no space left on device'}'. TIP: This is probably an unforeseen pull status text that shall be added to the code. The pulling process will still continue.
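The log above shows the pull progress entry carrying an `errorDetail`/`error` payload instead of a regular status text, and the pull continuing regardless. A minimal sketch of how such an entry could be detected and turned into an explicit failure is shown below. The names `OutOfDiskSpaceError` and `raise_on_pull_error` are hypothetical and are not taken from `servicelib`; the only thing assumed from the log is the shape of the dictionary.

```python
from typing import Any

# Substring seen in the error message of the log entry above
NO_SPACE_MARKER = "no space left on device"


class OutOfDiskSpaceError(RuntimeError):
    """Hypothetical error: the pull failed because the disk is full."""


def raise_on_pull_error(pull_progress: dict[str, Any]) -> None:
    """Raise if a pull progress entry carries an error instead of a status.

    Assumes the entry has the shape seen in the log:
    {"errorDetail": {"message": "..."}, "error": "..."}
    """
    error_detail = pull_progress.get("errorDetail") or {}
    message = error_detail.get("message") or pull_progress.get("error")
    if message is None:
        return  # regular status entry, nothing to report
    if NO_SPACE_MARKER in message:
        raise OutOfDiskSpaceError(message)
    raise RuntimeError(f"image pull failed: {message}")
```

Raising a dedicated error would let the caller stop the pull early and report the real cause to the user, rather than logging it as an unforeseen status text.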

NOTE

The same error can also occur when pulling the inputs, the state, or the outputs!

After the user was asked to start their service again, it started successfully. On that attempt it was running on a machine with 873.3 GB of free space.


What I think happened

I can think of the following possible situations:

  1. a machine with lower disk space is used and the service does not fit
  2. the disk space on a previously used machine ran out
  3. a mix of 1 and 2
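To tell these situations apart, it could help to record the free disk space before pulling. Below is a minimal sketch using only the Python standard library. The function name `has_enough_disk_space`, the `required_bytes` estimate, and the safety margin are assumptions for illustration; note that the unpacked size of an image is usually larger than its compressed download size, so any estimate should be generous.

```python
import shutil


def has_enough_disk_space(
    path: str, required_bytes: int, safety_margin_bytes: int = 0
) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    `required_bytes` is a caller-provided estimate (hypothetical here) of the
    space the image, inputs, state and outputs will need.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes + safety_margin_bytes
```

For example, `has_enough_disk_space("/var/lib/docker", 50 * 1024**3)` would check for 50 GiB on the filesystem that holds Docker's default data directory, assuming that path exists on the machine.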

Metadata

Labels

High Priority (a totally crucial bug/feature to be fixed asap), bug (buggy, it does not work as expected)
