define handling of plus and space #261
base: master
@matt-phylum this is more than nice research and work you did there! 🙇 Let me review this in detail and come back with feedback!
I found links to three more implementations in the readme file of this repository and updated the description to include them. packageurl-java merged some PRs that fixed most of the problems with spaces and plus signs so it's been updated here.
I've published a slightly cleaned up version of the code used to collect the data for the above table in https://github.com/phylum-dev/purl-survey. Hopefully it's useful for future cross-implementation testing like this.
+1 to this. I just discovered that syft will not encode I think you could probably add this https://github.com/anchore/packageurl-go library to this list since that's what syft is using.
This is a big change (in terms of impact, not lines), and I'm not entirely sure it's the correct change, but something needs to be done in this area.
Problem
The PURL spec describes qualifiers as being an '&'-delimited sequence of '='-delimited key-value pairs where the value is percent encoded, and the section on encoding describes a minimal set of characters that are supposed to be percent encoded in different contexts. This looks a lot like application/x-www-form-urlencoded, but application/x-www-form-urlencoded encodes almost all characters besides the ASCII alphanumeric set, and has a special behavior where ' ' is encoded as '+'.
I am aware of thirteen PURL implementations:
That means if you're working with qualifiers that have spaces or plus signs in their values, it's fairly likely that software using a different implementation of PURL will interpret the PURLs differently. This may also happen if your PURLs have plus signs in their versions (deb) or spaces in their names (swid) and the implementation incorrectly decodes the name and version as if they were qualifier values (see below).
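To make the divergence concrete, here is a minimal Python sketch. It is not taken from any of the surveyed implementations: the PURL string, the `distro` qualifier value, and the `split_qualifiers` helper are all made up for illustration. It decodes the same raw qualifier value once with plain percent-decoding and once with form-style decoding:

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical PURL: a plus sign in the version and in a qualifier value.
purl = "pkg:deb/debian/example@1.0+dfsg?distro=a+b%20c"


def split_qualifiers(purl: str) -> dict:
    """Return the raw (still percent-encoded) qualifier key/value pairs.

    Illustrative helper only; a real parser also validates keys,
    handles subpaths, and so on.
    """
    _, _, query = purl.partition("?")
    query, _, _ = query.partition("#")  # drop any subpath
    pairs = (item.partition("=") for item in query.split("&") if item)
    return {key: value for key, _, value in pairs}


raw = split_qualifiers(purl)["distro"]

percent_only = unquote(raw)     # plain percent-decoding: '+' stays '+'
form_style = unquote_plus(raw)  # form-style decoding: '+' becomes ' '

print(percent_only)  # a+b c
print(form_style)    # a b c
```

Two tools that disagree on this step end up with different qualifier values for the same PURL string, which is the interoperability problem described above.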
6/14 of the implementations are decoding '+' as ' ', so here's why I think it's better for the spec to specify '+' is '+':
Proposal
The spec is updated to be specific, new tests are added to the test suite, and incorrectly escaped examples in the package types spec are updated to be consistent.
Unfortunately, this requires changes to at least 7/14 implementations to get everything aligned, but they should be minor changes.
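Until implementations are aligned, one defensive option for producers is to percent-encode both characters, so that either style of decoder recovers the same value. The sketch below is an assumption about how that could look in Python, not the wording of the spec change; `encode_qualifier_value` is a hypothetical helper, and `quote(..., safe="")` encodes more characters than the minimal set the PURL spec asks for.

```python
from urllib.parse import quote, unquote, unquote_plus


def encode_qualifier_value(value: str) -> str:
    """Percent-encode a qualifier value (hypothetical helper).

    With safe="", quote() leaves only ASCII letters, digits and "_.-~"
    unescaped, so ' ' becomes %20 and '+' becomes %2B.
    """
    return quote(value, safe="")


value = "a+b c"
encoded = encode_qualifier_value(value)
print(encoded)  # a%2Bb%20c

# Both decoding styles now agree on the original value.
assert unquote(encoded) == value
assert unquote_plus(encoded) == value
```

A round-trip check of this shape, run against each implementation, is the kind of test case the proposal suggests adding to the shared test suite.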