Dropbox synced library often loses bibtex/metadata #415
OK, this is a "known issue" that has been bothering me for quite a while already. It's one of the reasons why I decided to do a rewrite of the software, though that is mostly due to blocking problems in the first ~100 issues listed here on GitHub.

Anyway, back to your observed problem. What I would like you to try is this work-around (the technical cause-and-effect explanation follows later on in this thread):

Preface

The Action

The reverse process for the other machine(s):

Technical background info

Writing that in the next response message.
Technical background info

Why does Dropbox "nuke" our Qiqqa databases? Well, that's rather vague and a very tough nut to crack, regrettably.

First there's this: Qiqqa uses the well-known SQLite database library (a .NET port/derivative in fact, but the argument remains the same), and apparently Dropbox, Google Drive and a few other cloud storage providers have a hard time dealing with SQLite databases when these are read/written by SQLite, i.e. when these database files (

Why does this go wrong? Frankly, I'm stumped. Theoretically this should not happen, as SQLite uses regular random access APIs, so there's nothing fancy happening, or so you might think. Actual reality contradicts this theory. There are two things:

Tests thus far have led to these findings (which are rather fuzzy, because these tests stubbornly refuse to produce consistent results: one day you fare better or worse than the next):
Meanwhile, Qiqqa isn't helping either, as it has had its own little nastiness built in since day one (see also https://github.com/GerHobbelt/qiqqa-revengin#database-file-format-as-discovered-via-db-inspection): for some very odd reason (apparently they didn't trust their own software at some point, I guess), Qiqqa adds an MD5 hash checksum to every BibTeX metadata record stored in the database. Fine, that check should always succeed, am I right? Yes, usually. Unless... you start hacking the database from outside Qiqqa (which I was doing back in the day when Qiqqa was still commercial) or someone does the hacking for you: enter the at-times-odd behaviour of Dropbox and friends. To Qiqqa that looks like "record corruption", as SQLite (temporarily or permanently) produces a b0rked record for a given query. The measure taken in the software is to DELETE the metadata record. Ouch.

Okay, so what if we keep those records intact, i.e. take out the MD5 checksum check? Well, that has crossed my mind as well, and I tried it for a while too. But the corruption that Dropbox manages to induce in SQLite databases is such that I then get spurious crashes and odd(er) behaviour elsewhere in Qiqqa, so that bright idea turns out to be... not so bright.

Is Dropbox evil, then? Well, insert a bit of handwaving here, but at least it works for what it's designed for, and that's file copying. Which is a sequential access pattern, rather than random access.

And that translates to...? That means we MUST NOT use SQLite to execute the SYNC action onto Dropbox et al.; instead, that part of the "sync" action should be a regular "file copy" action, to make sure you're not toasted by Dropbox et al. Which is what the work-around described in the previous message aims for:
For advanced folks, there's

Anyway, what we should aim for, technology-wise:
Alas, the current state of affairs is that we haven't yet produced such a sync system, hence the manual work-around attempt in the previous message. 😢 My apologies....
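
To make the "file copy instead of SQLite on the cloud folder" idea concrete, here is a minimal Python sketch. It is not Qiqqa's code and not the sync system discussed above; the function name `push_snapshot`, the `.part` suffix and the default file name are hypothetical. The assumption is that SQLite only ever touches files on local disk, and the cloud-synced folder only ever sees one sequential file copy followed by a rename.

```python
import os
import shutil
import sqlite3
import tempfile


def push_snapshot(local_db: str, cloud_dir: str, name: str = "library.snapshot") -> str:
    """Copy a consistent snapshot of local_db into cloud_dir as a plain file.

    Hypothetical helper: names and layout are illustrative only.
    """
    os.makedirs(cloud_dir, exist_ok=True)

    # Step 1: let SQLite produce a consistent snapshot, on LOCAL disk only.
    fd, tmp_path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        src = sqlite3.connect(local_db)
        dst = sqlite3.connect(tmp_path)
        try:
            src.backup(dst)  # sqlite3.Connection.backup, Python 3.7+
        finally:
            dst.close()
            src.close()

        # Step 2: plain sequential copy into the cloud folder under a
        # temporary name, then rename, so a finished file appears in one go.
        part = os.path.join(cloud_dir, name + ".part")
        final = os.path.join(cloud_dir, name)
        shutil.copyfile(tmp_path, part)
        os.replace(part, final)
        return final
    finally:
        os.remove(tmp_path)
```

Whether the sync client uploads the `.part` file before the rename is outside the script's control, so this narrows the window for trouble rather than closing it.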
Well, I appear to have really stumbled into it, haven't I! Sorry to poke the wound, as it seems that what seemed a minor inconvenience to me has been a true thorn in your side! However, at the risk of opening up another can of worms (and with none of the hard-won expertise in the relevant programming issues at play), I will also put out an idea I just had, based on what I believe I understand to be the issue (random access by SQLite to a "Dropbox" folder proper). It would seem that Dropbox also has the ability not just to automatically sync its own folders but also to automatically "back up" a local folder like Documents or Desktop. I'm going to wildly assume that this "back up" operation is treated differently from whatever black magic they do in their own folders, and hypothesize that if I tell Dropbox to "back up" a local sync folder that I will place in "Documents" or something benign like that, then I will automatically be able to back up from said folder on my other machine to a similarly benign place that can also be watched. It may involve having two different online "backups" hosted by Dropbox, but if it's any easier and doesn't lead to corruption, maybe worth it?

Though I suppose this may easily end up just as manual as copying and pasting if it requires manual downloads...
I believe that within the SQLite database there is just one table. If this is truly the case, is SQLite truly a necessity? Why use a relational database if you have no relations? Is it just being used as a clever storage manager? There might be a better tool and/or data structure for handling the data that is more suited to cloud/network storage. I am happy to help with this issue, but I have to point out that I am mostly baffled by the logic of what Qiqqa is doing!
Certainly worth a try! Keep in mind that what I explained is not proven; given my trouble (and apparent failure) getting these problems to show up in a testable/analyzable manner, we're reasoning based on conjecture, so be ready for "surprises". Side note: it might be handy to copy the
Thanks. :-) (We should set up a voice call soon, faster discussion that way.) re SQLite: yup, 1 table. Jimmy apparently used SQLite as a glorified key-value store. Why, I don't know, haven't talked to him, so I can only guess. (I have some assumptions, but actual history may be quite different from those guesses of mine.) Just a few comments for now as I'm way overdue for bed (3:49 AM here; gotta work tomorrow, so ho-hummm):
Oh, and always keep in mind: all databases (including the key/value NoSQL ones) are engineered for local disk storage only. None of them have been engineered to sit on top of cloud storage. The way the "big boys" do it is to run multiple instances (look for terms such as "sharding", or "partitioning" when you're more into Oracle et al.) and demand a solid network setup, so nothing flaky like consumer-level cloud storage access through arbitrary ISP hardware (NATs, routers, etc.) and applications you have no real control over.

Meanwhile, Qiqqa wishes to work in just such an unkempt environment. The short end of this stick: whatever database you pick, SQL or NoSQL, you can bet your bottom dollar you'll be facing similar or just other crap happening out there. Remember, Qiqqa is used by several folks who use it as a tool and are not "into computers for the sake of IT" 😉, which is one of the selling points of "running in the cloud", I guess. Thing is, I don't wanna go there.
A temporary/intermediate solution might be writing a new file (with a new name) every time the SQLite file updates. I got the impression that the SQLite-Dropbox pair creates these headaches because they both tend to work on tiny fractions within a file. If you force them to drop their cleverness, they may actually work. This defeats most of the point of using a DBMS in the first place, but it's like 3 lines of code and "editing metadata is human effort that comes at great cost" ¯\_(ツ)_/¯
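
A rough Python sketch of that "new file, new name on every update" idea, with some bookkeeping added on top of the "3 lines". Everything here is an assumption for illustration: the `library-<nanosecond timestamp>.s3db` naming scheme, the `keep` parameter and both function names are hypothetical, and the copy is assumed to happen while Qiqqa is closed, so the source file is not being written mid-copy.

```python
import glob
import os
import shutil
import time

PATTERN = "library-*.s3db"  # hypothetical naming scheme


def publish_new_version(db_path: str, cloud_dir: str, keep: int = 5) -> str:
    """Copy db_path into cloud_dir under a fresh name; prune old versions.

    keep must be at least 1.
    """
    os.makedirs(cloud_dir, exist_ok=True)
    # Zero-padded timestamp so lexicographic order equals chronological order.
    target = os.path.join(cloud_dir, f"library-{time.time_ns():020d}.s3db")
    shutil.copyfile(db_path, target)  # whole-file, sequential write
    versions = sorted(glob.glob(os.path.join(cloud_dir, PATTERN)))
    for old in versions[:-keep]:
        os.remove(old)
    return target


def latest_version(cloud_dir: str):
    """Return the newest published file, or None if there is none."""
    versions = sorted(glob.glob(os.path.join(cloud_dir, PATTERN)))
    return versions[-1] if versions else None
```

Since no existing file is ever modified in place, the sync client only sees whole-file creations and deletions, which is the access pattern it is claimed above to handle well.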
Hi @GerHobbelt,
Just an update here: I implemented your solution, @GerHobbelt, setting the sync point of each machine to a local folder (in Documents), fixing all of the missing BibTeX and whatnot on the source library, syncing, manually copying over to Dropbox, then updating the local sync point of the second machine with the Dropbox files.
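
The last step of that routine (updating the local sync point on the second machine from the Dropbox copy) could be scripted roughly as follows. This is a hedged sketch: `pull_if_healthy` and the `.incoming` suffix are hypothetical names, and `PRAGMA integrity_check` only verifies SQLite's own file structure, not Qiqqa's per-record MD5 checksums, so passing it does not guarantee the metadata is intact.

```python
import os
import shutil
import sqlite3


def pull_if_healthy(cloud_file: str, local_sync_file: str) -> bool:
    """Copy cloud_file over local_sync_file only if SQLite can read it.

    Returns True when the local file was replaced, False otherwise.
    """
    target_dir = os.path.dirname(os.path.abspath(local_sync_file))
    os.makedirs(target_dir, exist_ok=True)

    # Stage the download next to the target; never open the cloud file
    # itself with SQLite.
    staged = local_sync_file + ".incoming"
    shutil.copyfile(cloud_file, staged)
    try:
        con = sqlite3.connect(staged)
        try:
            result = con.execute("PRAGMA integrity_check").fetchone()[0]
        finally:
            con.close()
    except sqlite3.DatabaseError:
        result = "unreadable"

    if result != "ok":
        os.remove(staged)  # keep the existing local copy untouched
        return False
    os.replace(staged, local_sync_file)
    return True
```

Keeping the old local file whenever the check fails means a bad upload costs one sync round rather than the metadata already on that machine.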
Hello,
I've been a big fan of Qiqqa since the commercial days and have really been enjoying all the new features, especially the return of the sync capability to keep my libraries harmonized between my different computers. However, I've been running into a worsening issue where I'll finally have waded through all of the BibTeX sniffing (and recurrent lockouts from Google) and gotten my library up to date, and then on my next sign-in (usually when I sign into the other device, and especially after I have just synced), I will lose dozens or more hard-won BibTeX records, for papers that clearly already had them (and have even been part of an Expedition already, see below).
This has been occurring at some low level for a couple of years, with several versions of v83. I have tried reverting to v82 and simply maintaining separate libraries on my desktop and laptop that watch the same Dropbox folder (I have to sniff each paper twice to keep them up to date, but at least it is stable), but it would be very nice to be able to actually take advantage of the new sync capability.
I know there is a lot going on under the hood and not enough of you all to go around, but any help here would be most appreciated!
Happy to drop in my logs (extensive, when last I checked, and with various warnings and errors that may be helpful).
Thanks!