-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed a problem with the import of .csv-files #47
Conversation
Up until now, the DBBrowserDB:decodeCSV used to load the csv-file character by character using the getChar() method of the QFile-class. Unfortunatelly, this approach caused multibyte-chars as used for UTF-8 encoding to be split and displayed incorrectly. The fix uses the QTextStream-class to load the file line by line. Every line is then again iterated character by character, before the old algorithm is applied. Using this approach, the characters are loaded and encoded properly. Splitting multibyte-chars is thus prevented.
This sounds cool. I'm not technical enough to review it (Rene or Martin will), but thanks. 😄 |
I'll leave this one to Martin, as he did more on csv import/export.
should be changed to a do .. while loop, otherwise I think we do nothing if the import file is just one line. |
Thanks for finding and fixing this bug 👍 |
Thank you for the feedback. I am happy that I can contribute something usefull to the project. As my schedule permits I will try to add and test your proposed changes and push them up this afternoon. @rp- I don't think this would be the case. As far as I know the file pointer is not moved line-wise. Even if there is only one line, it should return false as the pointer is at the begining of the line. Do-While seems to be risky, as an empty file might cause the loop-block to execute once before the condition detects it. |
Had a look at the docs, which confused me a bit more :) I didn't try any of the code yet, this is just wild brain-compiler based thinking, so please prove me wrong :) @MKleusberg We should probably setup some unittests for the csv parsing. |
You are right, this might turn into a problem later. I must admit that I haven't worked with QT for a long time, so I might be a bit rusty ;-) |
I might have found a first bug. A colleague of mine build the program on a mac, and there it seemed to have messed up the separators, as columns are not separated properly anymore. As I've only tested on a windows-machine I didn't know before, but I will investigate the problem asap. |
Previously, the read-line was performed using the stream-operator. Unfortunately, this approach limited the possible reading range to a word, causing problems when parsing files with blanks between the quotes and separators. Instead, the readline is performed using the readLine() method of the QTextStream class.
During the import, the parser used to append blanks between quotes and separators as part of the cell-content. Now blanks are detected and ignored, iff they are not used as separator or in between quotes.
Ok, I've pushed a few more changes. EDIT: Made a quick test on my Mac and everything seems to work. Even files with only one line are imported properly. |
Fixed a problem with the import of .csv-files This fixes the import of CSV files with multi-byte UTF-8 characters in them. Also handle CSV files without a trailing line break better.
I've tested this with every configuration I could come up with and it always yielded the expected results. CSV files without trailing line break are now fully imported as well, so another issue we had is fixed as well. |
@rp- Good idea! I'll try to write some unit tests today or tomorrow. |
And the unit tests are written. Here's the commit which adds tests for all the cases I could come up with - they all pass: e7924f3 |
Up until now, the DBBrowserDB:decodeCSV used to load the csv-file character by character using the getChar() method of the QFile-class.
Unfortunatelly, this approach caused multibyte-chars as used for UTF-8 encoding to be split and displayed incorrectly.
The fix uses the QTextStream-class to load the file line by line.
Every line is then again iterated character by character, before the old algorithm is applied. Using this approach, the characters are loaded and encoded properly. Splitting multibyte-chars is thus prevented.