Description
- Expanding out from Improve SBOM Product Management #2685
SBOMs can identify a product in a number of different ways. For example, an SBOM can describe a product by name, by CPE or by PURL (and possibly all three!). Whilst the quality of SBOMs is variable (and in some cases inconsistent), extending the lookup beyond the name should make SBOM scanning more reliable and reduce false reports, because CPE and PURL identifiers can also carry vendor information.
- Related to feat: add CPE support to SBOM parsing #2943
Currently our SBOM processing uses the "name" field to try to look up a product. This is good enough in a lot of scenarios, but definitely not all of them, because a single piece of software may have many names (e.g. beautifulsoup could also appear as bs4 or python-beautifulsoup).
NVD identifies products by a "vendor, product" pair, which corresponds to the vendor and product fields of a CPE: https://nvd.nist.gov/products/cpe
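To make the vendor/product pairing concrete, here is a minimal Python sketch that splits a CPE 2.3 formatted string into its named attributes. It is illustrative only: `split_cpe23` and `CPE23_ATTRIBUTES` are hypothetical names (not part of cve-bin-tool), it handles only the CPE 2.3 formatted-string form (not the older `cpe:/` URI form), and it does not validate individual attribute values.

```python
from typing import Dict

# Attribute order of a CPE 2.3 formatted string, after the "cpe:2.3:" prefix.
CPE23_ATTRIBUTES = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def split_cpe23(cpe: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string into its named attributes.

    Colons escaped with a backslash stay inside a value. The escaping
    backslash itself is dropped, so a quoted asterisk becomes "*" and can
    no longer be told apart from the ANY wildcard (fine for this sketch).
    """
    values = []
    current = []
    escaped = False
    for ch in cpe:
        if escaped:
            current.append(ch)  # keep the quoted character, drop the backslash
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))

    # "cpe" + "2.3" + 11 attributes = 13 colon-separated values.
    if len(values) != 13 or values[0] != "cpe" or values[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_ATTRIBUTES, values[2:]))


attrs = split_cpe23("cpe:2.3:a:openssl:openssl:3.0.7:*:*:*:*:*:*:*")
print(attrs["vendor"], attrs["product"], attrs["version"])  # openssl openssl 3.0.7
```

The `vendor` and `product` values are the pair NVD keys on; the remaining attributes are mostly `*` (ANY) in SBOM-supplied CPEs.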
There is also a reasonable amount of support for another "unique name" solution called PURL (package uniform resource locator): https://github.com/package-url/purl-spec
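A PURL has the shape `pkg:type/namespace/name@version?qualifiers#subpath`. The Python sketch below splits one into those parts using only the standard library. `parse_purl` and `PackageRef` are hypothetical names; the sketch skips the spec's per-type normalization rules (for example, the `pypi` name rules) and most validation, so a maintained parser such as the package-url project's `packageurl-python` would be a more complete choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import unquote


@dataclass
class PackageRef:
    """Components of pkg:type/namespace/name@version?qualifiers#subpath."""
    type: str
    name: str
    namespace: Optional[str] = None
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def _split_right(text: str, sep: str) -> Tuple[str, str]:
    """Split once on the rightmost sep; the tail is "" when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, "")


def parse_purl(purl: str) -> PackageRef:
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a purl: {purl!r}")

    # Peel off the optional parts from the right: #subpath, then ?qualifiers.
    rest, subpath = _split_right(rest, "#")
    rest, query = _split_right(rest, "?")

    qualifiers = {}
    for pair in query.split("&"):
        if pair:
            key, _, value = pair.partition("=")
            qualifiers[key.lower()] = unquote(value)

    # What remains is type/namespace.../name[@version].
    segments = [s for s in rest.strip("/").split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"purl needs at least a type and a name: {purl!r}")

    name, version = _split_right(segments[-1], "@")
    namespace = "/".join(unquote(s) for s in segments[1:-1])
    return PackageRef(
        type=segments[0].lower(),
        name=unquote(name),
        namespace=namespace or None,
        version=unquote(version) or None,
        qualifiers=qualifiers,
        subpath=unquote(subpath.strip("/")) or None,
    )


ref = parse_purl("pkg:npm/%40angular/animation@12.3.1")
print(ref.namespace, ref.name, ref.version)  # @angular animation 12.3.1
```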
It's entirely possible that users would want to generate an SBOM using whatever tool they have, then manually annotate PURL information on top of it as a way of de-duplicating or fixing product names that don't line up. As such, we might want to treat PURL as higher priority than name in the parsers.
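One way a parser could express "PURL before name" is a small priority list, sketched here in Python. `pick_identifier` and `DEFAULT_PRIORITY` are hypothetical; the issue only says PURL should outrank the name, so putting CPE between them is an assumption that the parameter makes easy to change.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Assumed order: PURL ahead of name comes from this issue; placing CPE
# between them is an illustrative choice, not a decided design.
DEFAULT_PRIORITY = ("purl", "cpe", "name")


def pick_identifier(
    component: Mapping[str, Optional[str]],
    priority: Sequence[str] = DEFAULT_PRIORITY,
) -> Tuple[str, str]:
    """Return (field, value) for the highest-priority non-empty identifier."""
    for field_name in priority:
        value = (component.get(field_name) or "").strip()
        if value:
            return field_name, value
    raise ValueError("component has no usable identifier")


# A user-annotated PURL wins over the tool-generated name.
print(pick_identifier({"name": "bs4", "purl": "pkg:pypi/beautifulsoup4@4.12.2"}))
# ('purl', 'pkg:pypi/beautifulsoup4@4.12.2')
```

Empty or whitespace-only fields are skipped, so a blank `purl` falls through to the next identifier instead of being used as-is.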
Checklist:
- Add PURL support to our SBOM parsers. The parsers are here: https://github.com/intel/cve-bin-tool/tree/main/cve_bin_tool/sbom_manager and if you search for "name" you'll see where they currently get the name. Have them grab the PURL and parse the string appropriately, in lieu of or in addition to the name. You may need to read the format docs for CycloneDX, SPDX, etc. to figure out what the field is called in each format.
- Change out the current get_vendor logic to use the vendor parsed from PURL (if available). The get_vendor code starts here if you need to change it:
- It's possible that you'll need some sort of lookup table to translate from PURL to CPE. That could be left to future work if it's not easy; we can start with a heuristic lookup similar to what we do with the name right now.
- Add a test showing that it works
- Add a note to the documentation saying that we support PURL and noting any limitations or other things that users might need to know.
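The parser and get_vendor items above could start from something like the following Python sketch. It works on already-loaded JSON dicts and does not reflect how the existing code in `cve_bin_tool/sbom_manager` is structured; `from_cyclonedx`, `from_spdx`, `vendor_hint_from_purl` and `NAMESPACE_NAMES_PUBLISHER` are hypothetical names. Field names follow CycloneDX JSON (`components[].purl`, `components[].cpe`) and SPDX JSON (`packages[].externalRefs[]` with a `referenceType` of `purl`, `cpe23Type` or `cpe22Type`). Nested CycloneDX components and SPDX tag-value files are not handled, and the vendor hint is a heuristic that will often disagree with NVD's vendor names, which is where a PURL-to-CPE lookup table would come in.

```python
from typing import Dict, Iterator, Optional
from urllib.parse import unquote

Identifiers = Dict[str, Optional[str]]

# Hypothetical heuristic: purl types whose namespace usually names the
# publisher. This will often NOT match NVD's CPE vendor strings.
NAMESPACE_NAMES_PUBLISHER = {"github", "gitlab", "bitbucket", "composer", "npm"}


def from_cyclonedx(bom: dict) -> Iterator[Identifiers]:
    """Yield identifiers from top-level CycloneDX JSON components."""
    for component in bom.get("components", []):
        yield {
            "name": component.get("name"),
            "version": component.get("version"),
            "purl": component.get("purl"),
            "cpe": component.get("cpe"),
        }


def from_spdx(doc: dict) -> Iterator[Identifiers]:
    """Yield identifiers from SPDX JSON packages, reading externalRefs."""
    for package in doc.get("packages", []):
        purl = cpe = None
        for ref in package.get("externalRefs", []):
            # referenceCategory is ignored; referenceType alone decides.
            ref_type = ref.get("referenceType")
            if ref_type == "purl" and purl is None:
                purl = ref.get("referenceLocator")
            elif ref_type in ("cpe23Type", "cpe22Type") and cpe is None:
                cpe = ref.get("referenceLocator")
        yield {
            "name": package.get("name"),
            "version": package.get("versionInfo"),
            "purl": purl,
            "cpe": cpe,
        }


def vendor_hint_from_purl(purl: Optional[str]) -> Optional[str]:
    """Guess a vendor from the purl namespace for a few purl types."""
    if not purl or not purl.startswith("pkg:"):
        return None
    path = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0].strip("/")
    segments = path.split("/")
    if len(segments) < 3 or segments[0].lower() not in NAMESPACE_NAMES_PUBLISHER:
        return None  # no namespace, or a type whose namespace is not a publisher
    return unquote(segments[1]).lstrip("@")  # drop the npm scope marker


sbom = {"components": [{"name": "laravel/framework", "version": "10.0.0",
                        "purl": "pkg:composer/laravel/framework@v10.0.0"}]}
for ids in from_cyclonedx(sbom):
    print(ids["name"], vendor_hint_from_purl(ids["purl"]))  # laravel/framework laravel
```

Assertions of this kind on small hand-written CycloneDX and SPDX documents are one possible shape for the requested test.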
Note that in theory the PURL would be a unique identifier, but in practice I don't know whether that will be true now or in the future. You can probably treat it as unique and note that as a potential limitation in the docs.
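Treating the PURL as unique could, for example, mean collapsing components that carry the same PURL under different names. The Python sketch below does that by exact string comparison; `dedupe_by_purl` is a hypothetical helper, and the exact-match rule is deliberately simple, so two spellings of the same PURL (different percent-encoding or qualifier order) are not merged. That gap is the kind of limitation worth noting in the docs.

```python
from typing import Dict, Iterable, List, Optional, Set

Component = Dict[str, Optional[str]]


def dedupe_by_purl(components: Iterable[Component]) -> List[Component]:
    """Keep the first component seen for each exact purl string.

    Components without a purl are always kept, in their original order.
    """
    seen: Set[str] = set()
    unique: List[Component] = []
    for component in components:
        purl = (component.get("purl") or "").strip()
        if purl:
            if purl in seen:
                continue  # same purl already reported under another name
            seen.add(purl)
        unique.append(component)
    return unique


components = [
    {"name": "bs4", "purl": "pkg:pypi/beautifulsoup4@4.12.2"},
    {"name": "python-beautifulsoup", "purl": "pkg:pypi/beautifulsoup4@4.12.2"},
    {"name": "zlib", "purl": None},
]
print([c["name"] for c in dedupe_by_purl(components)])  # ['bs4', 'zlib']
```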
Related reading on the challenges of matching up "names" to packages: https://owasp.org/blog/2022/09/13/sbom-forum-recommends-improvements-to-nvd.html
This issue is reserved for a participant in the Open Source Hackathon 2023. Please leave it for hackathon participants through the end of April. If it hasn't been claimed by May 5, it will be open to any contributor who wants to work on it.