Control characters in text field breaking usql output #509
Comments
The formatter can be fixed, fairly easily, to accommodate this for JSON output. It would be helpful if you could share an example of the row(s) with the bad JSON data, and what other tools do. For the most part, it should be simply changing
I've pushed a change to
$ git clone https://github.com/xo/usql.git && cd usql
$ go get github.com/xo/tblfmt@v0.15.0
$ ./build.sh -b && ./usql
Let me know if this fixes all the issues for you. I'll tag a change with this shortly.
@kenshaw I had done much the same as a fix to get working output. I will try your change as soon as I can. However, I am wondering about an approach that does not differentiate between JSON and the other output formats and does the encoding before the output format is chosen. I've also noticed that characters outside the BMP are rendered as-is, and there is no option to force ASCII-only output for JSON.
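To illustrate what an ASCII-only JSON mode would involve, here is a minimal Go sketch. The helper name `asciiJSONString` is hypothetical and is not part of usql or tblfmt; it only shows the escaping rules: control characters and non-ASCII characters in the BMP become a single `\uXXXX` escape, and characters outside the BMP become a UTF-16 surrogate pair.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// asciiJSONString is a hypothetical helper (not a usql or tblfmt API).
// It returns s as a JSON string literal that contains only ASCII bytes.
// Invalid UTF-8 in s is seen by the range loop as U+FFFD and escaped as such.
func asciiJSONString(s string) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for _, r := range s {
		switch {
		case r == '"' || r == '\\':
			// Characters that must always be escaped inside a JSON string.
			sb.WriteByte('\\')
			sb.WriteRune(r)
		case r < 0x20 || (r >= 0x7f && r <= 0xffff):
			// Control characters, DEL, and non-ASCII BMP characters.
			fmt.Fprintf(&sb, `\u%04x`, r)
		case r > 0xffff:
			// Outside the BMP: JSON escapes use a UTF-16 surrogate pair.
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&sb, `\u%04x\u%04x`, hi, lo)
		default:
			// Printable ASCII is written unchanged.
			sb.WriteRune(r)
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	// U+00E9, U+001C, and U+1F600 (outside the BMP).
	fmt.Println(asciiJSONString("\u00e9\x1c\U0001F600"))
}
```

The output is still valid JSON, and a conforming parser decodes it to the same string as the non-escaped form, so such a mode would only change how the bytes look on the wire.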
@kenshaw I've checked out the JSON output and it seems to be correct now.
JSON is a specific encoding standard -- IIRC, it's quite old (predating UTF-8), not friendly to non-ASCII characters, and has some non-intuitive encoding requirements. If you want "raw" character codes from your database, it would probably be better to use the database's actual client, and other tools to encode it to JSON. Alternatively, you might want to try the CSV output.
Hello @kenshaw.
I've been dealing with control characters that are present in a text field of a database I need to access.
As I can see in this code (https://github.com/xo/tblfmt/blame/1af8a162785fd2d26eddb90fbd8ad9d407b3408d/fmt.go#L389), instead of being output literally and then properly encoded by the chosen output format (JSON, for instance), they are rendered as escape sequences, for instance \x1c for the U+001C character. Apart from behaving differently from every other tool I've used on the database in question (they all output the literal character), this completely breaks the JSON output by putting the illegal \x sequence in it. The other options in the switch block of the aforementioned code do not seem much better. Is there a way to just get the raw data in the output? I can't find one in the documentation.
There is also the fact that the various output formats, JSON for instance, encode characters in different ways (e.g. the surrogate pairs JSON uses for characters outside the BMP).
Also, silently modifying the contents of the data in an arbitrary way, without the user being aware of it, seems like an approach prone to nasty surprises for the user.