Invalid db page layout #15498
Comments
Please attach the db file if possible.
No db file, only the mentioned dump with the data redacted.
This issue seems similar to etcd-io/bbolt#402.
It doesn't help to provide such vague info; please provide at least all related page IDs next time. The info is also most likely incorrect: I do not see any dedicated alarm pages at all, since the pageID is 0, which means either there is no alarm or it is an inline page.
@ahrtr Just wondering, how did you draw that diagram? Is there a bbolt-specific tool for that, or did you use a general tool?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
A no-op write transaction has two consequences: 1. The txid increases by 1; 2. Two meta pages point to the same root page. Please also read etcd-io/etcd#15498 (comment). Signed-off-by: Benjamin Wang <wachao@vmware.com>
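To make the two consequences in the commit message above concrete, here is a toy model in Go. It is only a sketch, not bbolt's code: the `toyDB` and `meta` types are made up for illustration, and it assumes (from my reading of bbolt) that each commit writes its meta to slot `txid % 2` and that the meta with the higher txid is treated as current.

```go
package main

import "fmt"

// meta is a simplified, hypothetical stand-in for a bbolt meta page.
type meta struct {
	txid uint64 // transaction id that wrote this meta
	root uint64 // page id of the root bucket
}

// toyDB keeps only the two alternating meta slots.
type toyDB struct {
	metas [2]meta
}

// current returns the meta with the higher txid (validation is ignored here).
func (d *toyDB) current() meta {
	if d.metas[1].txid > d.metas[0].txid {
		return d.metas[1]
	}
	return d.metas[0]
}

// commit models a write transaction: it bumps the txid and writes the
// new meta into slot txid % 2 (assumed behaviour, see the lead-in).
func (d *toyDB) commit(root uint64) {
	m := d.current()
	m.txid++
	m.root = root
	d.metas[m.txid%2] = m
}

func main() {
	// Assumed starting state: two metas with txid 0 and 1.
	d := &toyDB{metas: [2]meta{{txid: 0, root: 3}, {txid: 1, root: 3}}}

	// A real write moves the root, so the two metas now differ.
	d.commit(5)
	fmt.Printf("after real write: %+v\n", d.metas)

	// A no-op write keeps the root but still bumps the txid and
	// overwrites the older meta slot.
	d.commit(d.current().root)
	fmt.Printf("after no-op write: %+v\n", d.metas)
}
```

After the no-op commit both slots carry the same root page id, and the current txid is one higher than before.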
What happened?
Etcd started crashlooping with
When analysing the db file, I found an invalid etcd db file layout.
Under the bucket branch page, in the `keys` bucket, there was a branch page linking to another bucket branch page. This resulted in bbolt returning the key `alarm` when reading the whole `keys` bucket. This is a correct layout for bbolt, but not for etcd.
From etcd's point of view this is invalid, as it assumes that all keys in the `keys` bucket are revision numbers.
The panic above comes from the `bytesToRev` function, which parses a revision. It failed because the main rev has 8 bytes, while the key "alarm" has only 5 bytes.
This means that at some point bbolt either:
We can't exclude a hardware issue that resulted in memory corruption.
Providing the dump.txt for further investigation.
What did you expect to happen?
I want to report the issue to start a discussion about how etcd handles potential memory corruption.
Assuming that this was indeed a memory corruption, I expect that etcd should avoid writing a corrupted page to disk.
Running with mmapped memory comes with the risk of memory stomping; etcd should have a mechanism that prevents corruption from being persisted.
I was discussing with @ptabor the idea of a protected mode for bbolt, in which it would verify every write to ensure that corruption is not persisted.
How can we reproduce it (as minimally and precisely as possible)?
Don't think so.
Anything else we need to know?
No response
Etcd version (please run commands below)
v3.4.21
Etcd configuration (command line flags or environment variables)
Nothing unusual
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
N/A
Relevant log output
No response