Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --nul-output|-0 flag to separate element with NUL character #1550

Merged
merged 1 commit into from
Mar 28, 2023

Conversation

vaab
Copy link
Contributor

@vaab vaab commented Feb 8, 2023

This option is provided to ensure solid parsing of complex data (with any binary content except NUL chars) by separating the yq root collection member's output with NUL char. As a safe-guard, an error will be cast if trying to use NUL character with content that contains itself NUL characters inside.

I'm aware of the possible heated debate on the inclusion of this as it happened over a similar request on jqlang/jq#1271 ... As a heavy bash user, IMHO, NUL separated output is the safest way to treat complex data in bash, and allows to pipe results directly with many other tools without mangling in bash. The previous thread however have raised important concerns about NUL chars that could be appear in the valid output of a query and could potentially mess with unprepared code badly.

This is why I added a check when using -0 option to fail with an error when the data being separated with NUL is containing a NUL characters itself. In practice, this is nearly never expected and if it happens, I felt I wanted to be in position to fail and warn the user.

I did write some acceptance tests and am very welcome to any of your suggestions. I'll write unit test if you feel this PR is worthwhile for you.

Please note that to be sure of NOT triggering the error, the flag -r=false with the yaml encoder needs to be added and NUL chars will be encoded in yaml format which doesn't include raw NUL chars. But the better would probably to fail in the calling code (which is made possible with this implementation) when encountering data that is not formatted as you had expected.

As a side-note, I'm the author of shyaml, and am using it extensively in bash projects, and would be happy to archive my project : it is slow. Alas, without the current PR, yq is not able to replace current usages of shyaml in my case.

There are workaround that I'm aware of (mainly from the jq thread posted previously) that are too costly in term of efficiency, code readability or simple data solidity in my own opinion.

I would totally understand if you are not interested in this addition on your side. The magic of open-source allows me to keep using it on my side. I'm curious to receive your feedback, as these are complex topics.

This project is probably my first time fiddling with Go language and project, so I probably did some newcomer's mistake. I'll be happy to correct them if you can stress them out for me.

Many thanks for writing yq, that's a really awesome project.

@vaab vaab force-pushed the nul-output-separator branch from e5785f0 to 35b92b4 Compare February 8, 2023 10:18
@mikefarah
Copy link
Owner

Hi @vaab - thanks for the contribution :) I'll need to find some time to read through this and the relevant links

@mikefarah
Copy link
Owner

@vaab ok finally got around to reading the jq discussion and looking over the code. I think it does make sense to add this to yq 👍🏼 . The go-lang code looks pretty good to me, obviously missing unit tests. The acceptance tests ... it's clear to me that your bash skills are better than mine :D. To that end, I'd appreciate more comments in those tests so that I know what's going on :) I'll put in comments in the PR so you can see where.

Looking forward to getting this in!

}
}

wyq() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the w stand for here?

EOL

res=""
while read-0 a || ! ret="$a"; do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lines are black magic to me :)

@vaab
Copy link
Contributor Author

vaab commented Feb 22, 2023

Many thanks for your interest, and your feedback make all perfect sense to me. I'll correct all this in the upcoming days and add some unit tests.

@mikefarah
Copy link
Owner

ping!

@vaab
Copy link
Contributor Author

vaab commented Mar 27, 2023

So sorry, had several things on my plate to deal with and I was not this good at organizing my time. I hope to finish this this week. I'm working on it right now. Thanks for your kind reminder.

This is to ensure solid parsing of complex data (with any binary
content except NUL chars) by separating the `yq` root collection
member's output with NUL char. As a safe-guard, an error will be cast
if trying to use NUL character with content that contains itself NUL
characters inside.
@vaab vaab force-pushed the nul-output-separator branch from 35b92b4 to 6cc43c8 Compare March 28, 2023 14:57
@vaab
Copy link
Contributor Author

vaab commented Mar 28, 2023

The code was rebased, your comments were taken into account (I have heavily commented and changed for the better some code on the bash part), and added some unit test as promised.

Tell me your feelings. I'm sorry that it might require you to step a little in the murky waters of bash.

Thank you very much for your review and sorry for the delay.
Feel free to point me out further modification if you feel so.

@mikefarah
Copy link
Owner

Thanks @vaab - very well commented bash :)

@mikefarah mikefarah merged commit 5fd2890 into mikefarah:master Mar 28, 2023
@vaab
Copy link
Contributor Author

vaab commented Mar 29, 2023

many thanks for your time and for yq !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants