Add support to specify the start and end of a date range for sequence collection dates #6

elray1 · 2024-05-07T17:10:01Z

We will typically only need clade assignments (summarized to counts per clade) for samples collected within a particular date range. We should be able to specify those dates as part of a call to assign_clades. This is related to the discussion in reichlab/variant-nowcast-hub#3: when we pull the sequence data, we need to be sure we get everything that has a collection date within the specified range.

For more specificity, here's a suggestion that I'm not at all committed to: we could introduce command line arguments seq_start_date and seq_end_date and keep anything with seq_start_date <= collection_date <= seq_end_date.
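To make the suggestion concrete, here is a minimal sketch of the inclusive filter. It is not the actual assign_clades interface: the flag spellings --seq-start-date and --seq-end-date, the helper names, and the collection_date record key are all assumptions made for illustration.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    # Hypothetical flags named after the suggestion above;
    # not the real assign_clades command line.
    parser = argparse.ArgumentParser()
    parser.add_argument("--seq-start-date", type=date.fromisoformat, required=True)
    parser.add_argument("--seq-end-date", type=date.fromisoformat, required=True)
    args = parser.parse_args(argv)
    if args.seq_start_date > args.seq_end_date:
        parser.error("--seq-start-date must not be after --seq-end-date")
    return args


def filter_by_collection_date(records, start, end):
    """Keep records with start <= collection_date <= end (both bounds inclusive)."""
    kept = []
    for rec in records:
        try:
            # Assumes collection dates are stored as YYYY-MM-DD strings.
            collected = date.fromisoformat(rec["collection_date"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with a missing or unparseable date
        if start <= collected <= end:
            kept.append(rec)
    return kept
```

Records whose collection date is missing or cannot be parsed are dropped here; whether that is the right behavior for the nowcast counts is an open choice, not something settled in this issue.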

bsweger · 2024-06-21T15:14:09Z

@elray1 quick clarification: how would these start and end dates interplay with the --released-after parameter we're already using when getting sequence data via the NCBI API?

Is the starting range of the sequence collection date the same date we'd use as --released-after on the API call, or a different parameter altogether?

elray1 · 2024-06-21T16:37:02Z

I believe it should at least be closely related (maybe released-after = seq_start_date - 1??). But I'm not sure. Do you know of a place where these dates are documented?

bsweger · 2024-06-21T19:58:42Z

I couldn't find anything definitive on the relationship between the API's released-after parameter and the collection-date in the metadata.

collection-date definition from the metadata schema:

"The collection date for the sample from which the viral nucleotide sequence was derived"

Reference to the "released after" property of NCBI's virus dataset downloads:

"genomes released after"

Let's chat about how to get a definitive answer. In the meantime, I'll use your released-after = seq_start_date - 1 to get started.
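For reference, the interim rule can be sketched as below. The function name is hypothetical, the YYYY-MM-DD output format is an assumption about what the API accepts, and the one-day offset is only the working guess from this thread, not documented NCBI behavior.

```python
from datetime import date, timedelta


def released_after_for(seq_start_date: date) -> str:
    """Return the interim released-after value: seq_start_date minus one day.

    Working assumption from this thread, pending a definitive answer on how
    release dates relate to collection dates.
    """
    return (seq_start_date - timedelta(days=1)).isoformat()
```

For example, a seq_start_date of 2024-05-01 gives a released-after value of 2024-04-30. If a sequence can never be released before it is collected, this lower bound should not exclude any sample collected in range, but that also remains unconfirmed.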

bsweger transferred this issue from reichlab/variant-nowcast-hub Aug 6, 2024

bsweger added this to the Variant Nowcast Launch milestone Aug 6, 2024

bsweger added the assign clades label Sep 12, 2024

bsweger added needed for eval and removed assign clades labels Sep 25, 2024
