ERROR: btrfs inspect logical-resolve and inode-resolve cannot find some corrupted files #724
Comments
It's not that uncommon for corrupted sectors to no longer be referenced by any file, but to fall in a part of a data extent that nothing refers to any more (a btrfs extent bookend). Or the inode is already under deletion (unlinked but not yet fully removed).
The inode in […]. Unreachable sectors can be found with the […]. Note that in the case where the corrupt block is unreachable, there may be no corrupted files, in the sense that no block of any existing file that is still reachable in the filesystem is corrupted. However, if any file still references any part of the extent, the extent containing the bad block will remain in the filesystem, where it will interfere with balance, resize, device remove, and scrub. If the goal is to remove the corrupted blocks, all files that reference the extent containing the corrupted block must be removed or replaced, whether those files are themselves corrupted or not.
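For the case above, one way to enumerate every file that still references any part of the extent containing the bad block is the -o option of logical-resolve, which matches references against the whole extent rather than a single offset (it needs kernel support for the LOGICAL_INO_V2 ioctl, present in reasonably recent kernels). A minimal sketch, reusing the logical address reported later in this issue:

$ sudo btrfs inspect-internal logical-resolve -o 180469760 /
$ sudo btrfs inspect-internal logical-resolve -P -o 180469760 /   # -P prints inode numbers instead of paths

Removing or replacing every file listed by these commands releases the extent, after which balance, resize, device remove, and scrub no longer run into the corrupted sectors.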
Some more context from the upstream link:
That's a single-bit error, which frequently means bad or misconfigured host RAM. The device looks encrypted, which means a single-bit error introduced by the NVMe is astronomically unlikely (any such error would be as wide as the encryption block and affect the entire CRC, not merely one bit). The specific error message can only appear after several levels of metadata validation in btrfs, which means that even a single-bit error on an unencrypted drive could not trigger this message. The csum failure could be the tip of the iceberg: there could be a lot of filesystem metadata corruption that we're not seeing, and some of it may be what is interfering with the failing resolve lookups here.

The other clue from the upstream link is that the NVMe device is new and recently added to the system. I've had systems with long and unblemished service histories start throwing orders-of-magnitude higher ECC error rates in the host RAM after adding a new NVMe device. Usually the problem is resolved by upgrading the power supply or using an adapter to power the NVMe directly. NVMe uses power rails closer to the CPU and RAM than SATA devices do, so NVMe power usage is much more likely to interfere with the host than SATA devices (at least those using traditional SATA power connectors; M.2 SATA might have the same problem, but I don't have any of those to test with).

There's also:
This is not the conclusion I'd draw from the information presented, but an NVMe failure could also point to a power issue (low voltage is as disruptive for NVMe devices as it is for host RAM), and that might break the NVMe device. On the systems where I observed the elevated ECC error rates, the NVMe devices frequently disconnected from the bus, most likely because the embedded controller CPU had crashed. I didn't look at the SMART data for my devices because upgrading the power supply resolved all of the problems. In the absence of any other information, I'd remove the 'bug' tag here.
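If host RAM or NVMe power delivery is suspected, a couple of read-only checks can help confirm or rule it out. This is a sketch only; rasdaemon's ras-mc-ctl, smartmontools' smartctl, and the device name /dev/nvme0 are assumptions about what is available on the affected machine:

$ sudo ras-mc-ctl --error-count   # corrected/uncorrected ECC error counters per memory controller
$ sudo smartctl -a /dev/nvme0     # NVMe SMART data: media errors, unsafe shutdowns, error log entries

A steadily rising corrected-error count, or a growing number of unsafe shutdowns and controller resets, would point at the power or RAM problems described above rather than at btrfs itself.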
Someone has corrupted files that were detected by the Btrfs filesystem, which reported the following:
Btrfs warning message from dmesg
We tried to determine which file is damaged:
$ sudo btrfs inspect-internal logical-resolve 180469760 /
The output:
$ sudo btrfs inspect inode-resolve 3037134 /
The output:
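Since inode-resolve does not return a path here, a quick cross-check, consistent with the unlinked-inode explanation in the comments above, is to ask whether any path on the filesystem still carries that inode number. This is only a sketch; btrfs inode numbers are unique per subvolume, so the search should start at the subvolume the logical address belongs to, and -xdev keeps find from descending into other subvolumes or filesystems:

$ sudo find / -xdev -inum 3037134

No output would suggest the inode has already been unlinked (or is still being deleted), so there is no path for inode-resolve to report.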
Their system:
Kernel: 6.6.8
btrfs-progs v6.6.3
Source: https://forum.manjaro.org/t/update-ended-in-readonly-filesystem/154442/19