
Failure to create container can leave behind state requiring manual intervention #37

Open
@abhishekrai

Description

Consider the following sequence of events:


$ lmctfy create /test/abhishek 'cpu { limit: 1000 } memory { limit: -1 }'
Command exited with error message: 14: Failed to write "-1" to file "/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes" for hierarchy 7

The "/test" container being created above spans CPU and memory resource controllers. Here, lmctfy was successfuly in creating the state under freezer and cpu controllers, but had an error in creating memory controller state. Ignore the cause for the failure for the purpose of this issue.

Here's what the cgroup file system looks like at this point:


$ find /sys/fs/cgroup/ -name abhishek
/sys/fs/cgroup/memory/test/abhishek

So there is a leftover directory for test/abhishek under the memory controller, but not under the others. This confuses subsequent commands. For example, destroy fails with the following error:


$ lmctfy destroy -f /test/abhishek
Command exited with error message: 5: Can't get non-existent container "/test/abhishek"

Re-running the create command produces a different error this time:


$ lmctfy create /test/abhishek ' cpu { limit: 1000 } memory { limit: 1000 }'
Command exited with error message: 6: Expected cgroup "/sys/fs/cgroup/memory/test/abhishek" to not exist.

To summarize, manual intervention is necessary to recover from this state, which is unfortunate. I realize manual intervention may be necessary in some other cases as well, for example if the application unexpectedly crashes in an intermediate state. We should either fix this, or provide a "cleanup" operation which, given a container, cleans up all of its state, including any such leftover state. The "destroy" operation is unable to do this, as it only acts if the container's node under "freezer" exists.

All of this weirdness is due to the leftover memory controller state from the failed first create operation. Deleting that directory brings us back to the original create error:


$ rmdir /sys/fs/cgroup/memory/test/abhishek
$ lmctfy create /test/abhishek ' cpu { limit: 1000 } memory { limit: 1000 }'
Command exited with error message: 14: Failed to write "-1" to file "/sys/fs/cgroup/memory/test/abhishek/memory.memsw.limit_in_bytes" for hierarchy 7

From the code for the "create" operation, it looks like lmctfy is smart about cleaning up some state. In particular, ContainerApiImpl::Create() uses UniqueDestroyPtr<> for resource handlers, so that if some resource handlers were created successfully before one failed, the state of the previously initialized handlers gets cleaned up.
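
For reference, that pattern presumably looks something like the following. This is only a sketch of the idea, not the actual UniqueDestroyPtr<> definition in the codebase, and it assumes that Destroy() both removes the handler's cgroup state and frees the object:

#include <memory>

// A unique_ptr whose deleter calls Destroy() instead of delete, so that a
// handler which is never release()d to the caller has its cgroup state torn
// down when the pointer goes out of scope (e.g. on an early error return).
template <typename T>
struct DestroyDeleter {
  void operator()(T *ptr) const {
    if (ptr != nullptr) {
      ptr->Destroy();  // Assumption: Destroy() also frees the object.
    }
  }
};

template <typename T>
using UniqueDestroyPtr = ::std::unique_ptr<T, DestroyDeleter<T>>;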

But within the same resource handler this is not true. In the following code, perhaps we could inject a Destroy() call if the Update() call fails?


StatusOr<ResourceHandler *> CgroupResourceHandlerFactory::Create(
    const string &container_name,
    const ContainerSpec &spec) {
  // Create the ResourceHandler for the container.
  StatusOr<ResourceHandler *> statusor_handler =
      CreateResourceHandler(container_name, spec);
  if (!statusor_handler.ok()) {
    return statusor_handler.status();
  }
  unique_ptr<ResourceHandler> handler(statusor_handler.ValueOrDie());

  // Run the create action before applying the update.
  RETURN_IF_ERROR(handler->CreateResource(spec));

  // Prepare the container by doing a replace update.
  Status status = handler->Update(spec, Container::UPDATE_REPLACE);
  if (!status.ok()) {
    return status;
  }

  return handler.release();
}
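
A minimal sketch of that change, showing only the tail of the function above (not a tested patch; it assumes Destroy() tears down the cgroup created by CreateResource() and, on success, deletes the handler, which is why the unique_ptr releases ownership first):

  // Prepare the container by doing a replace update.
  Status status = handler->Update(spec, Container::UPDATE_REPLACE);
  if (!status.ok()) {
    // Roll back the cgroup state created by CreateResource() above, so that a
    // failed create does not strand a directory under a single hierarchy.
    // If this cleanup itself fails, manual intervention is still required,
    // but we are no worse off than today. Return value intentionally ignored.
    handler.release()->Destroy();
    return status;
  }

  return handler.release();
}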
