
"name is reserved" error caused by failed mount point removal #8596

Open
xuegege5290 opened this issue Sep 12, 2024 · 7 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@xuegege5290

xuegege5290 commented Sep 12, 2024

What happened?

A container in one of the pods fails to be created, and the error log reports "name is reserved."
The following log entries point to the likely cause:

Sep 10 15:35:31 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:31.752491901+08:00" level=info msg="createCtr: deleting container ID 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from idIndex" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:31 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:31.752790887+08:00" level=info msg="createCtr: removing container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:31 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:31.753041212+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:32 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:32.254439999+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:35:33 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:33.005206802+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:35:34 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:34.131856612+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:35 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:35.820226753+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:38 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:38.353905768+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:35:42 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:42.163061797+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:35:47 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:47.859579179+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:35:56 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:56.404759693+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:36:09 ldy5-test-2 crio[84998]: time="2024-09-10 15:36:09.228844836+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:36:28 ldy5-test-2 crio[84998]: time="2024-09-10 15:36:28.451864749+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:36:57 ldy5-test-2 crio[84998]: time="2024-09-10 15:36:57.286396766+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:37:40 ldy5-test-2 crio[84998]: time="2024-09-10 15:37:40.547148861+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:38:45 ldy5-test-2 crio[84998]: time="2024-09-10 15:38:45.421142075+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:40:42 ldy5-test-2 crio[84998]: time="2024-09-10 15:40:42.715716683+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:43:24 ldy5-test-2 crio[84998]: time="2024-09-10 15:43:24.728747439+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:47:28 ldy5-test-2 crio[84998]: time="2024-09-10 15:47:28.170874844+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 15:52:55 ldy5-test-2 crio[84998]: time="2024-09-10 15:52:55.917450711+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 16:01:08 ldy5-test-2 crio[84998]: time="2024-09-10 16:01:08.864124435+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 16:13:28 ldy5-test-2 crio[84998]: time="2024-09-10 16:13:28.763862972+08:00" level=info msg="createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer 
Sep 10 16:13:28 ldy5-test-2 crio[84998]: time="2024-09-10 16:13:28.764809011+08:00" level=error msg="Retried cleanup function \"createCtr: deleting container 98f9ae7b5d00c0166cc2c7501f4eefdf2604754f90d761c4c99510b82965e0a2 from storage\" too often, giving up" id=cd37373c-2182-4d2f-9f6a-e54d563a7cc9 name=/runtime.v1.RuntimeService/CreateContainer

These are the storage-cleanup errors for the problem pod's container:

Sep 10 15:35:31 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:31.021027125+08:00" level=error msg="Failed to cleanup (probably retrying): failed to cleanup container storage: removing mount point \"/var/lib/containers/storage/overlay/66e42556f8165a9a15dbf3d0d65066891cfd91bf2eece43bd6712f2b57b54c95/merged\": directory not empty" id=5c1afc50-5aab-4189-8b4e-705e8ce14ac2 name=/runtime.v1.RuntimeService/CreateContainer
Sep 10 15:35:31 ldy5-test-2 crio[84998]: time="2024-09-10 15:35:31.113909391+08:00" level=error msg="Failed to cleanup (probably retrying): failed to cleanup container storage: removing mount point \"/var/lib/containers/storage/overlay/85f3ed9cb6867897c5eaa31536da50af11ea3541cb8ba39314d49a2d4f6250db/merged\": directory not empty" id=d5f8f6e7-d4d9-444f-8d20-ab6b51624607 name=/runtime.v1.RuntimeService/CreateContainer

What did you expect to happen?

The container is created and runs normally.

How can we reproduce it (as minimally and precisely as possible)?

While a container is being deleted, why can't its mount point be unmounted and removed? The error reports that files and folders still exist under it.

Anything else we need to know?

// This appears to be the Put method from the containers/storage overlay
// driver (drivers/overlay/overlay.go), which CRI-O uses for layer mounts.
// It relies on these imports:
//
//	"errors"
//	"fmt"
//	"os"
//	"os/exec"
//	"path"
//	"path/filepath"
//
//	"github.com/sirupsen/logrus"
//	"golang.org/x/sys/unix"

// Put unmounts the mount path created for the given id.
func (d *Driver) Put(id string) error {
	dir := d.dir(id)
	if _, err := os.Stat(dir); err != nil {
		return err
	}
	mountpoint := path.Join(dir, "merged")
	if count := d.ctr.Decrement(mountpoint); count > 0 {
		return nil
	}
	if _, err := os.ReadFile(path.Join(dir, lowerFile)); err != nil && !os.IsNotExist(err) {
		return err
	}

	unmounted := false

	mappedRoot := filepath.Join(d.home, id, "mapped")
	// It should not happen, but cleanup any mapped mount if it was leaked.
	if _, err := os.Stat(mappedRoot); err == nil {
		mounts, err := os.ReadDir(mappedRoot)
		if err == nil {
			// Go through all of the mapped mounts.
			for _, m := range mounts {
				_ = unix.Unmount(filepath.Join(mappedRoot, m.Name()), unix.MNT_DETACH)
			}
		}
	}

	if d.options.mountProgram != "" {
		// Attempt to unmount the FUSE mount using either fusermount or fusermount3.
		// If they fail, fallback to unix.Unmount
		for _, v := range []string{"fusermount3", "fusermount"} {
			err := exec.Command(v, "-u", mountpoint).Run()
			if err != nil && !errors.Is(err, exec.ErrNotFound) {
				logrus.Debugf("Error unmounting %s with %s - %v", mountpoint, v, err)
			}
			if err == nil {
				unmounted = true
				break
			}
		}
		// If fusermount|fusermount3 failed to unmount the FUSE file system, make sure all
		// pending changes are propagated to the file system
		if !unmounted {
			fd, err := unix.Open(mountpoint, unix.O_DIRECTORY, 0)
			if err == nil {
				if err := unix.Syncfs(fd); err != nil {
					logrus.Debugf("Error Syncfs(%s) - %v", mountpoint, err)
				}
				unix.Close(fd)
			}
		}
	}

	if !unmounted {
		if err := unix.Unmount(mountpoint, unix.MNT_DETACH); err != nil && !os.IsNotExist(err) {
			logrus.Debugf("Failed to unmount %s overlay: %s - %v", id, mountpoint, err)
			if !errors.Is(err, unix.EINVAL) {
				return fmt.Errorf("unmounting %q: %w", mountpoint, err)
			}
		}
	}

	if err := unix.Rmdir(mountpoint); err != nil && !os.IsNotExist(err) {
		logrus.Debugf("Failed to remove mountpoint %s overlay: %s - %v", id, mountpoint, err)
		return fmt.Errorf("removing mount point %q: %w", mountpoint, err)
	}

	return nil
}
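Reading Put together with the logs: unix.Unmount with MNT_DETACH is a lazy unmount, so Put can get past the unmount step while the underlying "merged" directory still holds files; unix.Rmdir then fails with ENOTEMPTY ("directory not empty"), Put returns the "removing mount point" error, and the caller retries until it gives up. A minimal sketch (not CRI-O code) that reproduces just the Rmdir behavior:

package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

func main() {
	// Stand-in for an overlay "merged" directory that still has content
	// after the lazy unmount.
	dir, err := os.MkdirTemp("", "merged-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "leftover"), []byte("x"), 0o600); err != nil {
		panic(err)
	}

	// Rmdir only removes empty directories, so this fails the same way the
	// driver's Put does: "directory not empty".
	err = unix.Rmdir(dir)
	fmt.Println(err, "| ENOTEMPTY:", errors.Is(err, unix.ENOTEMPTY))
}

If anything still writes into the underlying directory after (or despite) the detach, for example a process that has not fully exited in that mount namespace, this loop would keep failing exactly as logged.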

CRI-O and Kubernetes version

$ crio --version
crio version 1.28.4
Version:        1.28.4
GitCommit:      c5fc2a463053cf988db2aebe9b762700484922e5
GitCommitDate:  2024-02-22T19:17:55Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.20.10
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:      
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version --output=json

{
  "clientVersion": {
    "major": "1",
    "minor": "25",
    "gitVersion": "v1.25.8-1+2b6d997c6-dirty",
    "gitCommit": "2b6d997c6",
    "gitTreeState": "dirty",
    "buildDate": "2024-05-14T16:08:55Z",
    "goVersion": "go1.19.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "kustomizeVersion": "v4.5.7",
  "serverVersion": {
    "major": "1",
    "minor": "25+",
    "gitVersion": "v1.25.8-87+25a9d193afa588-dirty",
    "gitCommit": "25a9d193afa58836db7a70cbfafe75002a434c4e",
    "gitTreeState": "dirty",
    "buildDate": "2024-08-08T16:17:11Z",
    "goVersion": "go1.19.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

OS version

# On Linux:
$ cat /etc/os-release
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

$ uname -a
Linux localhost.localdomain 5.14.0-171.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Oct 1 12:56:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

xuegege5290 added the kind/bug label on Sep 12, 2024
@haircommander
Member

Is this reproducible?

@kwilczynski
Member

@xuegege5290, are you using any custom storage drivers, perhaps?

What storage is /var/lib/containers deployed on?

Also, does it always happen, or only after an upgrade?

What runtime is it? crun or runc?

@kwilczynski
Member

@xuegege5290, also... any chance you could try a slightly newer version of Kubernetes?

Maybe even match it with the CRI-O version, so both run 1.28? If possible, of course.

@xuegege5290
Author

> Is this reproducible?

The root cause of the issue has not been identified, and I cannot reproduce it on demand, although it occurs quite frequently. I am doing my best to find the reason and make it reproducible.

@xuegege5290
Author

xuegege5290 commented Sep 13, 2024

> @xuegege5290, are you using any custom storage drivers, perhaps?
>
> What storage is /var/lib/containers deployed on?
>
> Also, does it always happen, or only after an upgrade?
>
> What runtime is it? crun or runc?

> @xuegege5290, also... any chance you could try a slightly newer version of Kubernetes?
>
> Maybe even match it with the CRI-O version, so both run 1.28? If possible, of course.

I'm sorry for not providing detailed information. I am using runc. I have two clusters; the first is the one described above. In the other cluster, the kubelet and CRI-O versions match each other. The issue has occurred in both, showing up intermittently after some time.

[root@node crio]# crio --version
crio version 1.25.4
Version:        1.25.4
GitCommit:      unknown
GitCommitDate:  unknown
GitTreeState:   clean
GoVersion:      go1.18.4
Compiler:       gc
Platform:       linux/amd64
Linkmode:       dynamic
BuildTags:      
  rpm_crashtraceback
  exclude_graphdriver_btrfs
  btrfs_noversion
  exclude_graphdriver_devicemapper
  libdm_no_deferred_remove
  seccomp
  selinux
  libtrust_openssl
LDFlags:          -linkmode=external -compressdwarf=false -X github.com/cri-o/cri-o/version=1.25.4 -B 0xc0ee73ac46a4fa7414849c210d767beff31d5c89 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  '
SeccompEnabled:   true
AppArmorEnabled:  false
Dependencies:     
  
[root@node12  crio]# kubelet --version
Kubernetes v1.25.8-65+e623590b219e8f-dirty

[root@node12 crio]# runc --version
runc version 1.1.12
spec: 1.0.2-dev
go: go1.18.4
libseccomp: 2.5.2

@xuegege5290
Author

xuegege5290 commented Sep 13, 2024

> @xuegege5290, also... any chance you could try a slightly newer version of Kubernetes?
>
> Maybe even match it with the CRI-O version, so both run 1.28? If possible, of course.

The runc version on the first cluster is:

[root@ldy5-ceasphere-2 ~]# runc --version
runc version 1.0.2
spec: 1.0.2-dev
go: go1.16.12
libseccomp: 2.5.1

The storage driver is overlay, and the default storage configuration is:

# This file is generated by the Machine Config Operator's containerruntimeconfig controller.
#
# storage.conf is the configuration file for all tools
# that share the containers/storage libraries
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver
driver = "overlay"

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Size is used to set a maximum size of the container image.  Only supported by
# certain container storage drivers.
size = ""

@github-actions

A friendly reminder that this issue had no activity for 30 days.

github-actions bot added the lifecycle/stale label on Oct 14, 2024