Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a problem with the import of .csv-files #47

Merged
merged 3 commits into from
Jul 10, 2014

Conversation

TheVanDoom
Copy link
Contributor

Up until now, the DBBrowserDB:decodeCSV used to load the csv-file character by character using the getChar() method of the QFile-class.
Unfortunatelly, this approach caused multibyte-chars as used for UTF-8 encoding to be split and displayed incorrectly.

The fix uses the QTextStream-class to load the file line by line.
Every line is then again iterated character by character, before the old algorithm is applied. Using this approach, the characters are loaded and encoded properly. Splitting multibyte-chars is thus prevented.

Up until now, the DBBrowserDB:decodeCSV used to load the csv-file character by character using the getChar() method of the QFile-class.
Unfortunatelly, this approach caused multibyte-chars as used for UTF-8 encoding to be split and displayed incorrectly.

The fix uses the QTextStream-class to load the file line by line.
Every line is then again iterated character by character, before the old algorithm is applied. Using this approach, the characters are loaded and encoded properly. Splitting multibyte-chars is thus prevented.
@justinclift
Copy link
Member

This sounds cool. I'm not technical enough to review it (Rene or Martin will), but thanks. 😄

@rp-
Copy link
Contributor

rp- commented Jul 8, 2014

I'll leave this one to Martin, as he did more on csv import/export.
Just a thing i noted on looking at it, the

while (!inStream.atEnd()) {

should be changed to a do .. while loop, otherwise I think we do nothing if the import file is just one line.

@MKleusberg
Copy link
Member

Thanks for finding and fixing this bug 👍
I'd actually merge this even without applying the change Rene suggested as the old code didn't handle this any better. And it does work as long as the file ends with a line break - which all CSV files should. But of course you're free to change the code to handle those files correctly as well.
The one thing I'd suggest to change before merging is moving the 'current' variable definition inside the body of the outer loop in order to reduce its scope and make the code a bit easier to read. But other than that the change looks pretty good :)

@TheVanDoom
Copy link
Contributor Author

Thank you for the feedback. I am happy that I can contribute something usefull to the project. As my schedule permits I will try to add and test your proposed changes and push them up this afternoon.

@rp- I don't think this would be the case. As far as I know the file pointer is not moved line-wise. Even if there is only one line, it should return false as the pointer is at the begining of the line. Do-While seems to be risky, as an empty file might cause the loop-block to execute once before the condition detects it.

@rp-
Copy link
Contributor

rp- commented Jul 9, 2014

Had a look at the docs, which confused me a bit more :)
http://qt-project.org/doc/qt-4.8/qtextstream.html#operator-gt-gt-4
"inStream >> QString" works on words and not on lines, which makes me suspicions how this line based loop will work with that?

I didn't try any of the code yet, this is just wild brain-compiler based thinking, so please prove me wrong :)

@MKleusberg We should probably setup some unittests for the csv parsing.

@TheVanDoom
Copy link
Contributor Author

You are right, this might turn into a problem later. I must admit that I haven't worked with QT for a long time, so I might be a bit rusty ;-)
Exchanging it with inStream.readLine() should do the trick, thought.

@TheVanDoom
Copy link
Contributor Author

I might have found a first bug. A colleague of mine build the program on a mac, and there it seemed to have messed up the separators, as columns are not separated properly anymore. As I've only tested on a windows-machine I didn't know before, but I will investigate the problem asap.

Previously, the read-line was performed using the stream-operator. Unfortunately, this approach limited the possible reading range to a word, causing problems when parsing files with blanks between the quotes and separators.
Instead, the readline is performed using the readLine() method of the QTextStream class.
During the import, the parser used to append blanks between quotes and separators as part of the cell-content.
Now blanks are detected and ignored, iff they are not used as separator or in between quotes.
@TheVanDoom
Copy link
Contributor Author

Ok, I've pushed a few more changes.
The line is now loaded using the readLine() command instead of the stream-operator. Also I've cleaned the code a bit up by removing unnecessary if-clauses and moved some of the variables into the outer-loop (as per request).
Finally, I've introduced an additional condition into the algorithm to detect blanks between separator and quote.

EDIT: Made a quick test on my Mac and everything seems to work. Even files with only one line are imported properly.

@justinclift justinclift mentioned this pull request Jul 10, 2014
MKleusberg added a commit that referenced this pull request Jul 10, 2014
Fixed a problem with the import of .csv-files

This fixes the import of CSV files with multi-byte UTF-8 characters in them. Also handle CSV files without a trailing line break better.
@MKleusberg MKleusberg merged commit ad392fa into sqlitebrowser:master Jul 10, 2014
@MKleusberg
Copy link
Member

I've tested this with every configuration I could come up with and it always yielded the expected results. CSV files without trailing line break are now fully imported as well, so another issue we had is fixed as well.
Thanks again for putting your time into this - it was a big help! If you feel like fixing other problems or the like you're always welcome to send in your commits :)

@MKleusberg
Copy link
Member

@rp- Good idea! I'll try to write some unit tests today or tomorrow.

@MKleusberg
Copy link
Member

And the unit tests are written. Here's the commit which adds tests for all the cases I could come up with - they all pass: e7924f3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants