Include hasFiles and documentDescribes information in relationship comparison #153

Open
@meretp

Description

I looked into the comparison method in tools-java to find out whether we can also use it in the testbed. As this method originates from this repo, I decided to open the issue here. Apart from the fact that we would need to change one namespace to compare two otherwise equal documents, I ran into another issue, which also affects the testbed (#51).
When comparing the document

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA
Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT

with

SPDXVersion: SPDX-2.3
DataLicense: CC0-1.0
DocumentNamespace: https://some.namespace2
DocumentName: document name
SPDXID: SPDXRef-DOCUMENT

## Creation Information
Creator: Tool: test-tool
Created: 2022-01-01T00:00:00Z
## Relationships
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileA
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB

FileName: ./fileA.c
SPDXID: SPDXRef-fileA
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileA DEPENDS_ON SPDXRef-fileB
Relationship: SPDXRef-fileA DESCRIBED_BY SPDXRef-DOCUMENT
FileName: ./fileB.c
SPDXID: SPDXRef-fileB
FileChecksum: SHA1: d6a770ba38583ed4bb4525bd96e50461655d2758
LicenseConcluded: NOASSERTION
## Relationships
Relationship: SPDXRef-fileB DEPENDENCY_OF SPDXRef-fileA

the resulting xlsx file lists a difference in the file relationships. This is understandable, as the relationship Relationship: SPDXRef-fileB DESCRIBED_BY SPDXRef-DOCUMENT is missing from the second document, although it merely duplicates SPDXRef-DOCUMENT DESCRIBES SPDXRef-fileB. But the result also marks document describes with diff, although the values in the rows below are the same. The same holds for the file relationships of ./fileA.c.
Why is this marked as diff?
In general, I would expect the comparison to be independent of the direction of a relationship, although I see that this is a rather complex topic and I can understand if the comparison only considers the relationships that are actually present.
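To illustrate what a direction-independent comparison could look like, here is a minimal Python sketch. It is not how tools-java compares documents; the names `INVERSE`, `canonical` and `canonical_set` are made up for this example, and the inverse table covers only the relationship types that occur in this issue plus CONTAINS/CONTAINED_BY. Each relationship is rewritten into one canonical direction before the sets are compared.

```python
# Hypothetical sketch: normalise inverse relationship types to one direction.
# Only a small, illustrative subset of SPDX 2.3 relationship types is listed.
INVERSE = {
    "DESCRIBED_BY": "DESCRIBES",
    "CONTAINED_BY": "CONTAINS",
    "DEPENDENCY_OF": "DEPENDS_ON",
}


def canonical(rel):
    """Return the relationship in its canonical direction."""
    source, rel_type, target = rel
    if rel_type in INVERSE:
        # Swap source and target and use the forward relationship type.
        return (target, INVERSE[rel_type], source)
    return rel


def canonical_set(rels):
    """Canonicalise all relationships; duplicates collapse in the set."""
    return {canonical(r) for r in rels}


# Relationships of the first document above.
doc_a = [
    ("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileA"),
    ("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-fileB"),
    ("SPDXRef-fileA", "DEPENDS_ON", "SPDXRef-fileB"),
    ("SPDXRef-fileA", "DESCRIBED_BY", "SPDXRef-DOCUMENT"),
    ("SPDXRef-fileB", "DEPENDENCY_OF", "SPDXRef-fileA"),
    ("SPDXRef-fileB", "DESCRIBED_BY", "SPDXRef-DOCUMENT"),
]
# The second document lacks the last DESCRIBED_BY relationship.
doc_b = doc_a[:-1]

print(canonical_set(doc_a) == canonical_set(doc_b))  # prints True
```

With this normalisation both documents reduce to the same three relationships, so the missing DESCRIBED_BY line would no longer be reported as a difference.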

A related problem is the following: JSON, YAML and XML additionally have the fields "documentDescribes" and, in packages, "hasFiles", which represent DESCRIBES and CONTAINS relationships, respectively. tools-python avoids duplication when serialising and does not, for example, additionally write out SPDXRef-DOCUMENT DESCRIBES SPDXRef-File, because this information is already captured in "documentDescribes: [SPDXRef-File]". In the comparison, however, this again leads to two documents not being evaluated as equal.

Do you think it would make sense to add some logic to the comparison that checks for this kind of semantic equivalence of the relationships (that is, including the information from hasFiles and documentDescribes) and not only for their literal presence?
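As a rough idea of what such logic might do, the following Python sketch folds "documentDescribes" and "hasFiles" into the set of explicit relationships before comparing. It assumes documents already parsed from SPDX 2.x JSON into plain dicts; the function name `effective_relationships` and the sample documents are invented for illustration and are not part of tools-java or tools-python. It does not handle inverse relationship types such as DESCRIBED_BY.

```python
# Hypothetical sketch: treat documentDescribes / hasFiles as relationships.
def effective_relationships(doc):
    """Collect explicit and implied relationships as (source, type, target)."""
    rels = {
        (r["spdxElementId"], r["relationshipType"], r["relatedSpdxElement"])
        for r in doc.get("relationships", [])
    }
    # documentDescribes implies DOCUMENT DESCRIBES <element>.
    for target in doc.get("documentDescribes", []):
        rels.add((doc["SPDXID"], "DESCRIBES", target))
    # hasFiles implies <package> CONTAINS <file>.
    for package in doc.get("packages", []):
        for file_id in package.get("hasFiles", []):
            rels.add((package["SPDXID"], "CONTAINS", file_id))
    return rels


# Made-up example: one document uses the shortcut fields ...
with_shortcuts = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "documentDescribes": ["SPDXRef-Package"],
    "packages": [{"SPDXID": "SPDXRef-Package", "hasFiles": ["SPDXRef-File"]}],
}
# ... the other spells the same information out as relationships.
with_relationships = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [{"SPDXID": "SPDXRef-Package"}],
    "relationships": [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": "SPDXRef-Package",
        },
        {
            "spdxElementId": "SPDXRef-Package",
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": "SPDXRef-File",
        },
    ],
}

same = effective_relationships(with_shortcuts) == effective_relationships(with_relationships)
print(same)  # prints True
```

Because the result is a set, a document that contains both the shortcut field and the explicit relationship would compare equal to one that contains only either form.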

Labels: wontfix