Include hasFiles and documentDescribes information in relationship comparison #153
Description
I had a look at the comparison method in tools-java to find out whether we could also use it in the testbed. As this method originates from this repo, I decided to open the issue here. Apart from the fact that we would need to change one namespace to compare two equal documents, I ran into another issue. This is also an issue within the testbed (#51).
When comparing the document
```
SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z

## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION

## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT

FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION

## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT
```
with
```
SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace2
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z

## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION

## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT

FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION

## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
```
the resulting xlsx file lists a difference in the file relationships. This is understandable, as the relationship `Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT` is missing, although it is a duplicate of `SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB`. But the result also marks "document describes" as a diff, although the values in the rows below are the same. The same holds for the file relationships of `./fileA.c`.
Why is this marked as a diff?
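For reference, here is a minimal Python sketch (not the tools-java implementation) of a purely literal comparison of the `Relationship:` lines of the two documents. It suggests that only the fileB `DESCRIBED_BY` line should differ, so neither "document describes" nor the relationships of `./fileA.c` would be expected to show up as a diff. The helper `literal_relationships` and the constants `DOC_1`/`DOC_2` are hypothetical, and the documents are abbreviated to their relationship lines.

```python
import re

# Matches tag-value lines of the form "Relationship: <id> <TYPE> <id>".
RELATIONSHIP_LINE = re.compile(r"^Relationship:\s+(\S+)\s+(\S+)\s+(\S+)$")


def literal_relationships(tag_value: str) -> set[tuple[str, str, str]]:
    """Collect the Relationship lines exactly as written, without normalization."""
    found = set()
    for line in tag_value.splitlines():
        match = RELATIONSHIP_LINE.match(line.strip())
        if match:
            found.add(match.groups())
    return found


# Relationship lines of the two documents above (all other tags omitted).
DOC_1 = """\
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT
"""
DOC_2 = """\
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
"""

only_in_doc_1 = literal_relationships(DOC_1) - literal_relationships(DOC_2)
only_in_doc_2 = literal_relationships(DOC_2) - literal_relationships(DOC_1)
print(only_in_doc_1)  # {('SPDXRef-fileB', 'DESCRIBED_BY', 'SPDXRef-DOCUMENT')}
print(only_in_doc_2)  # set()
```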
In general, I would expect the comparison to be independent of the direction of a relationship (e.g. `A DESCRIBES B` should match `B DESCRIBED_BY A`), although I see that this is a rather complex topic, and I can understand if the comparison only considers the relationships that are actually present.
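One way a direction-independent comparison could work is to rewrite every inverse relationship type into its forward counterpart before comparing sets. The Python sketch below assumes that the SPDX 2.3 pairs `DESCRIBES`/`DESCRIBED_BY`, `CONTAINS`/`CONTAINED_BY` and `DEPENDS_ON`/`DEPENDENCY_OF` should be treated as equivalent when the direction is swapped. The mapping `INVERSE_TO_FORWARD` and the functions `normalize`/`normalized_set` are hypothetical, and only the pairs relevant to this issue are listed.

```python
# Hypothetical mapping from an inverse SPDX 2.3 relationship type to its
# forward counterpart. Only the pairs relevant to this issue are listed.
INVERSE_TO_FORWARD = {
    "DESCRIBED_BY": "DESCRIBES",
    "CONTAINED_BY": "CONTAINS",
    "DEPENDENCY_OF": "DEPENDS_ON",
}

Relationship = tuple[str, str, str]  # (element id, relationship type, related element id)


def normalize(rel: Relationship) -> Relationship:
    """Rewrite 'B DESCRIBED_BY A' as 'A DESCRIBES B'; leave other types unchanged."""
    source, rel_type, target = rel
    forward = INVERSE_TO_FORWARD.get(rel_type)
    if forward is None:
        return rel
    return (target, forward, source)


def normalized_set(rels: list[Relationship]) -> set[Relationship]:
    # Relationships that differ only in direction collapse into one entry.
    return {normalize(rel) for rel in rels}


DOC_1 = [
    ("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileA"),
    ("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileB"),
    ("SPDXRef-fileA", "DEPENDS_ON", "SPDXRef-fileB"),
    ("SPDXRef-fileA", "DESCRIBED_BY", "SPDXRef-DOCUMENT"),
    ("SPDXRef-fileB", "DEPENDENCY_OF", "SPDXRef-fileA"),
    ("SPDXRef-fileB", "DESCRIBED_BY", "SPDXRef-DOCUMENT"),
]
DOC_2 = DOC_1[:-1]  # the second document lacks "fileB DESCRIBED_BY DOCUMENT"

print(set(DOC_1) == set(DOC_2))                        # False
print(normalized_set(DOC_1) == normalized_set(DOC_2))  # True
```

The trade-off mentioned above is visible here: after normalization, the comparison can no longer tell whether a relationship was stated once or in both directions.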
Another, related problem is the following: for JSON, YAML and XML there are additionally the fields `documentDescribes` (on the document) and `hasFiles` (in packages) to represent `DESCRIBES` and `CONTAINS` relationships, respectively. To avoid duplication, tools-python does not additionally write out, for example, `SPDXRef-DOCUMENT DESCRIBES SPDXRef-File` when serializing, because this information is already captured in `"documentDescribes": ["SPDXRef-File"]`. In the comparison, however, this again leads to two documents not being evaluated as equal.
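A minimal Python sketch of how the shorthand fields could be expanded into explicit relationships before comparing, assuming the SPDX 2.3 JSON property names (`SPDXID`, `documentDescribes`, `packages`, `hasFiles`, and `spdxElementId`/`relationshipType`/`relatedSpdxElement` inside `relationships`). The function `explicit_relationships` is hypothetical, and the example documents are abbreviated (required fields such as `spdxVersion` and `creationInfo` are omitted).

```python
import json

Relationship = tuple[str, str, str]


def explicit_relationships(doc: dict) -> set[Relationship]:
    """Merge the relationships array with the documentDescribes/hasFiles shorthands."""
    rels: set[Relationship] = set()
    for rel in doc.get("relationships", []):
        rels.add((rel["spdxElementId"], rel["relationshipType"], rel["relatedSpdxElement"]))
    # "documentDescribes": [X] stands for "<document> DESCRIBES X".
    for described in doc.get("documentDescribes", []):
        rels.add((doc["SPDXID"], "DESCRIBES", described))
    # "hasFiles": [F] on a package stands for "<package> CONTAINS F".
    for package in doc.get("packages", []):
        for file_id in package.get("hasFiles", []):
            rels.add((package["SPDXID"], "CONTAINS", file_id))
    return rels


SHORTHAND = json.loads("""
{
  "SPDXID": "SPDXRef-DOCUMENT",
  "documentDescribes": ["SPDXRef-Package"],
  "packages": [{"SPDXID": "SPDXRef-Package", "hasFiles": ["SPDXRef-File"]}]
}
""")
EXPLICIT = json.loads("""
{
  "SPDXID": "SPDXRef-DOCUMENT",
  "packages": [{"SPDXID": "SPDXRef-Package"}],
  "relationships": [
    {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
     "relatedSpdxElement": "SPDXRef-Package"},
    {"spdxElementId": "SPDXRef-Package", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-File"}
  ]
}
""")

print(explicit_relationships(SHORTHAND) == explicit_relationships(EXPLICIT))  # True
```

Because the result is a set, a relationship that appears both as shorthand and in the `relationships` array is counted once. Combined with a normalization of inverse relationship types, both sources of information could then be compared as a single set.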
Do you think it would make sense to add some logic to the comparison that checks for this kind of semantic equivalence of relationships (i.e. that also takes the information from `hasFiles` and `documentDescribes` into account) and not only for their literal presence?