SQLite3::CorruptException / Zlib::DataError #108
Comments
Good morning and thanks for reporting this issue! I would normally blame deleted notes for something like this, but the SQLite errors seem to point more towards something happening to your database files, since all 4 are throwing errors and they're at the level of the SQLite file being corrupt.
I appreciate the patch for adding in the rescue statements; I certainly don't have enough of those. The section you're looking at had been an issue in the past, but I haven't come across any data that both checked out as gzipped data and was truncated after adding
The first thing I'd recommend is making sure that the files look correct. Do they open in a tool like sqlite3 or sqlitebrowser? Do you see data that is the order of magnitude you're expecting (e.g. if you have 800 notes and the file shows only a handful of entries in each table, that check would fail)?
As evidenced by the first error message you listed, I've seen issues where the database wasn't in a great state when the OS updated. So I would also recommend opening Notes, then closing it explicitly, and maybe waiting a breath or two before doing your backups again. However, because all four are failing, I suspect there's something wrong inside of them. Are all four of these notes from the same account? There might be something that got corrupted that is being shared across them which you need to delete.
The debug log should give you some relevant information. The best place to start for a failing pass is simply looking at the very last lines. That should help you bound which note is failing (or would have, prior to your patch above; I've added an error statement to the log in ea799e0, so now you can look at those lines rather than the last line). You can then look for that specific note in the ZICNOTEDATA table to make sure it looks decent.
I hope this helps for initial troubleshooting. Thank you again for the rescue code, I'm sure it will help others as well!
One other thought, since you're getting these errors part of the way through a run: I would highly recommend making sure you're not running it while something else has the database open. I copy my backups elsewhere before running this script to avoid the file being locked. Make sure Notes is closed, copy the files to another location, and try running it there.
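For anyone wanting to script the copy step described above, here is a minimal Ruby sketch. The helper name `copy_database` and the destination directory are hypothetical and not part of the parser. It also copies the `-wal` and `-shm` sidecar files that SQLite keeps next to a database in WAL mode, on the assumption that the backup may contain them.

```ruby
require "fileutils"

# Hypothetical helper (not part of notes_cloud_parser.rb): copy a SQLite
# database, plus any -wal / -shm sidecar files sitting next to it, into a
# scratch directory so the parser never touches the original backup.
def copy_database(source, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  ["", "-wal", "-shm"].each do |suffix|
    path = source + suffix
    FileUtils.cp(path, dest_dir) if File.exist?(path)
  end
  # Return the path of the copied database file.
  File.join(dest_dir, File.basename(source))
end

# Example usage (paths are placeholders):
#   working_copy = copy_database("NoteStore.sqlite", "/tmp/notes-work")
#   system("ruby", "notes_cloud_parser.rb", "-f", working_copy)
```

Make sure Notes is closed before copying, as suggested above, so the copy is not taken mid-write.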
Thanks for your friendly feedback.
The following moronic workaround salvages as much binary data as possible. There is probably a way to make Ruby's gzip library similarly decompress a stream and/or return stubs of incomplete data. I assume this is as good as it gets.
The types of integrity checks reported were:
To save yourself unnecessary exception handling and bug reports, pre-pass with SQLite's built-ins before parsing. Is there any more information that would be useful to you?
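A rough sketch of what such a pre-pass could look like in Ruby, assuming the built-in meant here is `PRAGMA integrity_check` (which only reads the database). The helper name `integrity_problems` is hypothetical. It accepts an open `SQLite3::Database` from the sqlite3 gem, or any object whose `execute(sql)` returns rows.

```ruby
# Hypothetical helper: run SQLite's built-in integrity check and return the
# reported problems. An empty array means SQLite answered with a single "ok".
# `db` is assumed to be an open SQLite3::Database (sqlite3 gem) or any object
# whose #execute(sql) returns an array of rows.
def integrity_problems(db)
  rows = db.execute("PRAGMA integrity_check;")
  # Rows may be arrays or hashes depending on how the handle is configured.
  messages = rows.map { |row| row.is_a?(Hash) ? row.values.first : Array(row).first }
  messages.reject { |message| message.to_s.strip.downcase == "ok" }
rescue StandardError => e
  # A badly damaged file can make the check itself raise.
  ["integrity check could not run: #{e.class}: #{e.message}"]
end

# Example usage (requires the sqlite3 gem):
#   require "sqlite3"
#   db = SQLite3::Database.new("NoteStore.sqlite", readonly: true)
#   problems = integrity_problems(db)
#   warn problems.join("\n") unless problems.empty?
```

Returning the messages rather than raising leaves it to the caller to decide whether to skip the database, log the problems, or carry on.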
Thanks for the detailed feedback, I'm going to look this over and ponder the best way to move forward. May I ask if you were expecting these to be corrupted databases? Having 50 databases, with 14 corrupted, makes me think this is more of a data recovery situation and not actual backups of a healthy system. I would be somewhat interested in knowing how they got corrupted to try to recreate that for test data.
Well, the "transaction" things could be because those tables are trimmed before backing up, as they were taking up an enormous amount of space (much more than the content itself), as revealed by
delete from atransaction where unixepoch() - 978307200 - ztimestamp > 86400*7;
delete from achange where ztransactionid not in (select rowid from atransaction);
pragma wal_checkpoint(full);
vacuum;
Regardless, if the database is well defined, deleting items should trigger the necessary cleanup. Furthermore, running this manually on the resident database, What are they used for anyway? Full undo history? Excessive safety measure for rollbacks to prevent content loss?
Assume the other errors are unrelated to the transaction things, and general corruption of backup data can also be excluded, as this has not happened a single time for any other files within the same backup system.
Happy to provide more tests or information if useful.
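For readers puzzling over the constant in the first delete statement: 978307200 is the Unix time of 2001-01-01 00:00:00 UTC. The subtraction implies that `ztimestamp` counts seconds from that reference date. The Ruby sketch below restates the same arithmetic under that assumption; the helper names are hypothetical.

```ruby
# Unix time of 2001-01-01 00:00:00 UTC, the reference date implied by the
# subtraction in the delete statement.
CORE_DATA_EPOCH_OFFSET = 978_307_200

# Convert a ztimestamp value (assumed: seconds since 2001-01-01 UTC) to a Time.
def core_data_to_time(ztimestamp)
  Time.at(ztimestamp + CORE_DATA_EPOCH_OFFSET).utc
end

# Same condition as the WHERE clause: true when the row is more than 7 days old.
def older_than_a_week?(ztimestamp, now: Time.now)
  now.to_i - CORE_DATA_EPOCH_OFFSET - ztimestamp > 86_400 * 7
end
```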
That's fascinating, thanks. So you have 50 NoteStore files, and you ran something akin to the above SQLite command to better manage the size. After doing that, 14 of the 50 threw SQLite errors. This is certainly a unique use case.
I think I lean towards running those integrity checks after such errors are caught (maybe in a rescue, or maybe as an explicit switch someone can run intentionally) rather than on every database. This code started as a forensic tool and I currently try to keep everything read only, other than adding the plaintext note contents back into the NoteStore for readability. I wouldn't want to truncate or alter it other than that if the user is not expecting it.
I'm still interested in the corrupted gzipped blobs; I don't know how removing the transactions and changes would affect that. I am not very keen on trying to parse as much as possible from potentially bad objects, for a similar reason as above. Because the protobuf order is not strictly specified (a lot of fields are optional, as is the order for repeated fields), I'd be concerned about displaying something that isn't quite right with no obvious way to know it was only partially done. I like the idea of writing out the bad blobs, but would probably again gate that behind a switch to make sure the user wanted them dumped to disk.
Taking this back to the thread genesis, do you feel the addition of these rescue statements (thank you for them, again!) satisfies the immediate need of this issue? It should allow the program to keep running whether a blob is corrupted or not.
The files come from a temporal backup routine. If you know gzip on a detailed binary level, you could try to look at the data to see if the corruption is noticeable. Post here if there are tools that already do this. Initial guess(!) is that a single changed/added/removed bit can make
If you can find out how many bytes or what percentage of the compressed data is read before failing, you can add an output notification saying, "Only 42% of the compressed data of the note "Dinner" in the folder "Food" could be decompressed. Some of your content may be missing. Open the note in macOS Notes and see if more content is available there."
The rescue statements were minute quick fixes, not intended for production, but use and modify them as you please. As mentioned earlier, try the library route first, but if that doesn't work and a dirty system call does, why not.
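One possible way to get both the salvaged bytes and a rough "how far did it get" figure with Ruby's standard `zlib` library is sketched below. It is an illustration under stated assumptions, not the parser's code: the helper name `salvage_gzip`, the chunk size, and the result keys are all hypothetical. It feeds the blob to a streaming `Zlib::Inflate` in small chunks and keeps whatever was produced before zlib raised or the input ran out.

```ruby
require "zlib"

# Hypothetical helper: inflate a gzip blob chunk by chunk and keep the output
# produced so far if the stream turns out to be corrupt or truncated.
def salvage_gzip(blob, chunk_size: 64)
  inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 16) # +16 selects the gzip wrapper
  output = String.new(encoding: Encoding::BINARY)
  consumed = 0 # bytes of compressed input fed in without an error
  error = nil

  begin
    while consumed < blob.bytesize && !inflater.finished?
      chunk = blob.byteslice(consumed, chunk_size)
      output << inflater.inflate(chunk)
      consumed += chunk.bytesize
    end
  rescue Zlib::Error => e
    error = e # output from the failing chunk is not kept
  end

  {
    data: output,
    complete: error.nil? && inflater.finished?, # false for truncated streams too
    error: error,
    percent_read: blob.empty? ? 0.0 : (100.0 * consumed / blob.bytesize).round(1)
  }
ensure
  inflater&.close
end

# Example usage (note_blob is a placeholder for a note's gzipped data):
#   result = salvage_gzip(note_blob)
#   warn "Only #{result[:percent_read]}% of the compressed data was read" unless result[:complete]
```

Two caveats. `percent_read` is measured in whole chunks, so it is a lower bound. A truncated blob is consumed to 100% without raising, so `complete` is the flag to check there. Whatever is salvaged is still only a partial stream, so the concern raised earlier in the thread about parsing incomplete protobuf data applies to it as well.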
After sitting on this a while, I've opted to add a note to the README addressing what folks can do to attempt to fix similar circumstances. I suspect it is fairly niche and do not want this program to automatically change databases for people (even if only on a copy). Thank you for the detailed note and fix!
Describe the bug
What more information (short of the files) can be provided to resolve the issues?
Expected behavior
No crashes, obviously.
Desktop (please complete the following information):
Command used
ruby notes_cloud_parser.rb -f NoteStore.sqlite
Please confirm the following
bundle install: Yes
Additional context
Workaround: