Add support to specify the start and end of a date range for sequence collection dates #6

Open
elray1 opened this issue May 7, 2024 · 3 comments

Comments

@elray1
Collaborator

elray1 commented May 7, 2024

We will typically just need to get clade assignments (and summarize to counts of clade assignments) for samples that were collected within a particular date range. We should be able to specify those dates as part of a call to assign_clades. This is related to discussion in reichlab/variant-nowcast-hub#3 in that we'll need to be sure that when we pull the sequence data, we get everything that has a collection date within the specified range.

For more specificity, here's a suggestion that I'm not at all committed to: we could introduce command line arguments seq_start_date and seq_end_date and keep anything with seq_start_date <= collection_date <= seq_end_date.
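A minimal sketch of that suggestion, not the project's actual implementation. The flag spellings `--seq-start-date` and `--seq-end-date`, the `collection_date` record field, and both function names are assumptions made for illustration. Records whose collection date is missing or does not parse as a full YYYY-MM-DD date are dropped here, which is also an assumption.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Parse the two hypothetical date-range arguments (inclusive on both ends)."""
    parser = argparse.ArgumentParser(description="Filter sequences by collection date.")
    parser.add_argument("--seq-start-date", type=date.fromisoformat, required=True,
                        help="earliest collection date to keep (YYYY-MM-DD)")
    parser.add_argument("--seq-end-date", type=date.fromisoformat, required=True,
                        help="latest collection date to keep (YYYY-MM-DD)")
    args = parser.parse_args(argv)
    if args.seq_start_date > args.seq_end_date:
        parser.error("--seq-start-date must not be later than --seq-end-date")
    return args


def filter_by_collection_date(records, seq_start_date, seq_end_date):
    """Keep records with seq_start_date <= collection_date <= seq_end_date."""
    kept = []
    for record in records:
        try:
            # "collection_date" is an assumed field name for this sketch.
            collected = date.fromisoformat(record["collection_date"])
        except (KeyError, TypeError, ValueError):
            # Missing or incomplete dates cannot be placed in the range.
            continue
        if seq_start_date <= collected <= seq_end_date:
            kept.append(record)
    return kept
```

Both bounds are inclusive, matching the comparison proposed above.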

@bsweger
Collaborator

bsweger commented Jun 21, 2024

@elray1 quick clarification: how would these start and end dates interact with the --released-after parameter we're already using when getting sequence data via the NCBI API?

Is the start of the sequence collection date range the same date we'd pass as --released-after on the API call, or a different parameter altogether?

@elray1
Collaborator Author

elray1 commented Jun 21, 2024

I believe it should at least be closely related (maybe released-after = seq_start_date - 1??). But I'm not sure. Do you know of a place where these dates are documented?

@bsweger
Collaborator

bsweger commented Jun 21, 2024

I couldn't find anything definitive on the relationship between the API's released-after parameter and the collection-date in the metadata.

collection-date definition from the metadata schema:

The collection date for the sample from which the viral nucleotide sequence was derived

reference to "released after" property of NCBI's virus dataset downloads:

genomes released after

Let's chat about how to get a definitive answer. In the meantime, I'll use your released-after = seq_start_date - 1 to get started.
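The interim rule released-after = seq_start_date - 1 can be sketched as below. This only illustrates the date arithmetic: the function name is hypothetical, the ISO YYYY-MM-DD string format for the parameter value is an assumption, and the thread has not confirmed how release dates relate to collection dates, so a collection-date filter would still be applied to whatever is downloaded.

```python
from datetime import date, timedelta


def released_after_for(seq_start_date: date) -> str:
    """Return the day before seq_start_date as an ISO date string.

    Provisional rule from the discussion above, not a documented NCBI guarantee.
    """
    return (seq_start_date - timedelta(days=1)).isoformat()
```

For example, a seq_start_date of 2024-05-01 yields a released-after value of 2024-04-30.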

@bsweger bsweger transferred this issue from reichlab/variant-nowcast-hub Aug 6, 2024
@bsweger bsweger added this to the Variant Nowcast Launch milestone Aug 6, 2024
Projects
Status: Todo

2 participants