Fixed a problem with the import of .csv-files #47

TheVanDoom · 2014-07-08T14:20:21Z

Up until now, DBBrowserDB::decodeCSV loaded the CSV file character by character using the getChar() method of the QFile class.
Unfortunately, this approach caused multibyte characters, as used in UTF-8 encoding, to be split and displayed incorrectly.

The fix uses the QTextStream class to load the file line by line.
Each line is then iterated character by character, and the old algorithm is applied to it. With this approach, the characters are loaded and decoded properly, so multibyte characters are no longer split.
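To illustrate why byte-wise reading breaks UTF-8 text, here is a minimal sketch in standard C++ (not the project's Qt code). The helper name `splitUtf8CodePoints` is hypothetical, and the sketch assumes valid UTF-8 input. It groups bytes by code point, whereas a plain byte loop would hand out the lead byte and the continuation bytes of a character such as 'ä' as separate items.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 encoded string into one std::string
// per code point. Assumes the input is valid UTF-8.
std::vector<std::string> splitUtf8CodePoints(const std::string& bytes)
{
    std::vector<std::string> result;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length = 1;                       // 0xxxxxxx: ASCII
        if ((lead & 0xE0) == 0xC0) length = 2;        // 110xxxxx: 2-byte sequence
        else if ((lead & 0xF0) == 0xE0) length = 3;   // 1110xxxx: 3-byte sequence
        else if ((lead & 0xF8) == 0xF0) length = 4;   // 11110xxx: 4-byte sequence
        result.push_back(bytes.substr(i, length));    // keep the whole sequence together
        i += length;
    }
    return result;
}
```

For the input `"\xC3\xA4;b"` (an 'ä', a semicolon, and a 'b'), the string holds four bytes but the helper returns three items, which is the grouping a text-aware reader has to preserve.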

justinclift · 2014-07-08T15:02:26Z

This sounds cool. I'm not technical enough to review it (Rene or Martin will), but thanks. 😄

rp- · 2014-07-08T18:34:20Z

I'll leave this one to Martin, as he did more on csv import/export.
Just one thing I noticed while looking at it: the

while (!inStream.atEnd()) {

should be changed to a do .. while loop, otherwise I think we do nothing if the import file is just one line.

MKleusberg · 2014-07-08T19:02:17Z

Thanks for finding and fixing this bug 👍
I'd actually merge this even without applying the change Rene suggested as the old code didn't handle this any better. And it does work as long as the file ends with a line break - which all CSV files should. But of course you're free to change the code to handle those files correctly as well.
The one thing I'd suggest to change before merging is moving the 'current' variable definition inside the body of the outer loop in order to reduce its scope and make the code a bit easier to read. But other than that the change looks pretty good :)

TheVanDoom · 2014-07-09T09:35:49Z

Thank you for the feedback. I am happy that I can contribute something useful to the project. As my schedule permits, I will try to add and test your proposed changes and push them this afternoon.

@rp- I don't think this would be the case. As far as I know, the file pointer is not moved line-wise. Even if there is only one line, atEnd() should return false, as the pointer is at the beginning of the line. A do-while loop seems risky, as an empty file might cause the loop body to execute once before the condition detects it.
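The loop question can be checked with a small stand-in sketch in standard C++. It uses std::getline on a std::istringstream as an analogy for reading lines from a text stream; it is not the Qt API, and `readLines` is a hypothetical helper. Because the condition is evaluated before the body runs, a single line without a trailing line break is still read once, and empty input never enters the loop body.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects all lines of `content` using a loop whose
// condition is checked before each iteration. std::getline stands in for a
// line-wise read from a text stream here.
std::vector<std::string> readLines(const std::string& content)
{
    std::istringstream in(content);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {   // fails immediately on empty input
        lines.push_back(line);
    }
    return lines;
}
```

With this helper, `readLines("a,b,c")` yields one line and `readLines("")` yields none, so the pre-checked form covers both the one-line file and the empty file.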

rp- · 2014-07-09T09:51:28Z

Had a look at the docs, which confused me a bit more :)
http://qt-project.org/doc/qt-4.8/qtextstream.html#operator-gt-gt-4
"inStream >> QString" works on words and not on lines, which makes me suspicious about how this line-based loop will work with that.

I didn't try any of the code yet, this is just wild brain-compiler based thinking, so please prove me wrong :)

@MKleusberg We should probably set up some unit tests for the CSV parsing.

TheVanDoom · 2014-07-09T09:58:50Z

You are right, this might turn into a problem later. I must admit that I haven't worked with Qt for a long time, so I might be a bit rusty ;-)
Exchanging it with inStream.readLine() should do the trick, though.

TheVanDoom · 2014-07-09T14:56:25Z

I might have found a first bug. A colleague of mine built the program on a Mac, and there it seems to have messed up the separators, as columns are not separated properly anymore. As I've only tested on a Windows machine, I didn't notice this before, but I will investigate the problem ASAP.

Previously, each line was read using the stream operator. Unfortunately, this approach limited the reading range to a single word, causing problems when parsing files with blanks between the quotes and separators. The line is now read using the readLine() method of the QTextStream class instead.
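The same word-versus-line difference exists in the standard C++ streams, which allows a quick stand-in sketch (again not the Qt API; `firstWord` and `firstLine` are hypothetical helpers). Extracting into a string with `operator>>` stops at the first whitespace, while std::getline returns the whole line including embedded blanks.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts the first token with operator>>, which
// stops at the first whitespace character.
std::string firstWord(const std::string& content)
{
    std::istringstream in(content);
    std::string word;
    in >> word;
    return word;
}

// Hypothetical helper: extracts the first full line, keeping embedded blanks.
std::string firstLine(const std::string& content)
{
    std::istringstream in(content);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For a row such as `"a b", "c"`, the word-wise read stops after `"a`, which is why a CSV row containing blanks cannot be handed to the parser as one unit that way.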

During the import, the parser used to append blanks between quotes and separators as part of the cell-content. Now blanks are detected and ignored, iff they are not used as separator or in between quotes.
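A minimal sketch of the rule described above, written in standard C++ rather than against the project's code. The function `parseCsvLine` and its parameters are hypothetical, and the sketch takes the description literally: every blank outside quotes is dropped unless the blank is itself the separator. The merged code may be more selective, and doubled quotes inside a quoted cell are not handled here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into cells. Blanks outside quotes
// are ignored unless the blank is the separator; blanks inside quotes are kept.
std::vector<std::string> parseCsvLine(const std::string& line,
                                      char separator = ',',
                                      char quote = '"')
{
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (char c : line) {
        if (c == quote) {
            inQuotes = !inQuotes;        // toggle quote mode, drop the quote itself
        } else if (c == separator && !inQuotes) {
            fields.push_back(current);   // a separator outside quotes ends the cell
            current.clear();
        } else if (c == ' ' && !inQuotes) {
            continue;                    // blank between quote and separator: ignore
        } else {
            current.push_back(c);        // regular content, including quoted blanks
        }
    }
    fields.push_back(current);           // the last cell has no trailing separator
    return fields;
}
```

The separator check comes before the blank check so that a file using a blank as its separator still splits correctly, for example `parseCsvLine("x \"y z\"", ' ')`.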

TheVanDoom · 2014-07-09T18:38:46Z

Ok, I've pushed a few more changes.
The line is now loaded using the readLine() method instead of the stream operator. I've also cleaned up the code a bit by removing unnecessary if-clauses and moved some of the variables into the outer loop (as requested).
Finally, I've introduced an additional condition into the algorithm to detect blanks between separator and quote.

EDIT: Made a quick test on my Mac and everything seems to work. Even files with only one line are imported properly.

Fixed a problem with the import of .csv-files
This fixes the import of CSV files with multi-byte UTF-8 characters in them. Also handle CSV files without a trailing line break better.

MKleusberg · 2014-07-10T15:37:21Z

I've tested this with every configuration I could come up with and it always yielded the expected results. CSV files without a trailing line break are now fully imported too, so another issue we had is fixed as well.
Thanks again for putting your time into this - it was a big help! If you feel like fixing other problems or the like you're always welcome to send in your commits :)

MKleusberg · 2014-07-10T15:38:21Z

@rp- Good idea! I'll try to write some unit tests today or tomorrow.

MKleusberg · 2014-07-11T19:43:11Z

And the unit tests are written. Here's the commit which adds tests for all the cases I could come up with - they all pass: e7924f3

TheVanDoom added 2 commits July 9, 2014 19:56

Fixed a problem with the csv-import.

37e195a


justinclift mentioned this pull request Jul 10, 2014

Multilingual support #50

Closed

MKleusberg merged commit ad392fa into sqlitebrowser:master Jul 10, 2014
