global: clean up duplicate table DOIs in production instance #790

Open
@GraemeWatt

Description

When reindexing the QA instance after deploying PR #766, some of the records raised an exception:

sqlalchemy.exc.MultipleResultsFound: Multiple rows were found when exactly one was required

from the line:

submission = DataSubmission.query.filter_by(doi=doc["doi"]).one()

I just changed this line in commit 319ff15 to make it tolerate multiple results. However, we should investigate in more detail why there are multiple DataSubmission objects with the same doi. I found 6 examples:

These all date from the early days of hepdata.net in 2017/2018, when the submission code was buggy and uploads were not always replaced cleanly. We need to work out how to clean up the database to remove the duplicate DOIs.
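
As a starting point for the investigation, the duplicates could be listed with a GROUP BY query on the doi column. The snippet below is only a sketch: it runs against a toy in-memory SQLite table rather than the real database, the table name `datasubmission` and its columns are assumptions, the helper `find_duplicate_dois` is hypothetical, and the DOIs are placeholders rather than the affected records.

```python
import sqlite3


def find_duplicate_dois(conn):
    """Return (doi, count) pairs for every DOI shared by more than one row.

    Assumes a table called `datasubmission` with a `doi` column; both names
    are illustrative and may not match the production schema.
    """
    query = """
        SELECT doi, COUNT(*) AS n
        FROM datasubmission
        WHERE doi IS NOT NULL
        GROUP BY doi
        HAVING COUNT(*) > 1
        ORDER BY doi
    """
    return conn.execute(query).fetchall()


# Toy data: placeholder DOIs, not real records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasubmission (id INTEGER PRIMARY KEY, doi TEXT)")
conn.executemany(
    "INSERT INTO datasubmission (doi) VALUES (?)",
    [
        ("10.0000/example.1/t1",),  # first copy
        ("10.0000/example.1/t1",),  # duplicate left behind by a replaced upload
        ("10.0000/example.2/t1",),  # unique, should not be reported
        (None,),                    # rows without a DOI are ignored
    ],
)

duplicates = find_duplicate_dois(conn)
print(duplicates)  # [('10.0000/example.1/t1', 2)]
```

Listing the duplicates first, before deleting anything, makes it possible to check by hand which DataSubmission row in each group is the one that should be kept.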
