Background
We now have a better understanding of what data Nextstrain's pipeline publishes and how we can use those data to get the COVID genome sequences and metadata required to generate target data for the variant nowcast hub.
Currently, the target data pipeline gets GenBank sequences via NCBI's API. It then uses an NCBI command line tool to format the accompanying metadata.
However, Nextstrain also sources GenBank sequences from NCBI, and it publishes both the sequences and the sequence metadata as open files: https://docs.nextstrain.org/projects/ncov/en/latest/reference/remote_inputs.html#remote-inputs-open-files
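As a rough illustration of what consuming the Nextstrain metadata might look like, here is a minimal Python sketch that tallies sequences by date, location, and clade from a gzip-compressed TSV. It is not the hub's actual pipeline code. The URL and the column names (`country`, `division`, `date`, `Nextstrain_clade`) are assumptions based on the open-files documentation linked above and should be checked against the published file. The function name `count_clades` is hypothetical.

```python
import csv
import gzip
import io
from collections import Counter

# Assumed location of the open (GenBank-derived) metadata file; verify against
# the Nextstrain remote-inputs documentation before relying on it.
METADATA_URL = "https://data.nextstrain.org/files/ncov/open/metadata.tsv.gz"


def count_clades(tsv_gz_bytes, country="USA", clade_col="Nextstrain_clade"):
    """Tally sequences per (date, division, clade) for a single country.

    `tsv_gz_bytes` is the raw content of a gzip-compressed, tab-separated
    metadata file. Column names are assumptions, not confirmed by the issue.
    """
    counts = Counter()
    with gzip.open(io.BytesIO(tsv_gz_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row.get("country") != country:
                continue
            key = (row.get("date", ""), row.get("division", ""), row.get(clade_col, ""))
            counts[key] += 1
    return counts
```

The bytes could be obtained with something like `urllib.request.urlopen(METADATA_URL).read()`, though the full file is large, so a real pipeline would probably stream it or filter it in chunks rather than load it all into memory.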
Switching to Nextstrain as a source for the target data pipeline information (versus getting it directly from NCBI) would have several advantages: