
Estimating the abundances of multiple viral genomes #210

Open
asierFernandezP opened this issue Apr 6, 2024 · 5 comments

Comments

@asierFernandezP

asierFernandezP commented Apr 6, 2024

Hi,

I am currently running coverm genome to estimate the abundance of multiple viral genomes in my samples. However, I am not sure what the best way to do this is:

  • Is it correct to pass a single FASTA file containing all the viral genomes to --genome-fasta-files, or should I split it into separate files with one viral genome per file? (Or do these two options make no difference at all?)

  • Should I use the --reference option instead?

Thank you,
Asier

@wwood
Owner

wwood commented Apr 6, 2024

I think it is probably easiest to use contig mode instead of genome mode. The only downside is that you cannot output relative abundance. However, that is readily calculated from the ratio of the means, perhaps taking into account the number of reads that map.
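A minimal Python sketch of one reading of this suggestion: the percentage of reads that mapped is split among contigs in proportion to their mean coverage, and the rest is reported as unmapped. The helper `relative_abundance` and its inputs are hypothetical (the means would come from contig-mode `mean` output, the read counts from your own records). This is an assumed approximation, not CoverM's own implementation.

```python
def relative_abundance(mean_coverage, mapped_reads, total_reads):
    """Hypothetical helper: estimate percentage relative abundance per contig.

    Assumes each contig's share of the mapped reads is proportional to its
    mean coverage, and that unmapped reads make up the rest of the sample.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    total_mean = sum(mean_coverage.values())
    mapped_pct = 100.0 * mapped_reads / total_reads
    abundances = {}
    for contig, mean in mean_coverage.items():
        # Weight the mapped percentage by this contig's share of total mean coverage
        share = mean / total_mean if total_mean > 0 else 0.0
        abundances[contig] = mapped_pct * share
    # Reads that did not map are reported as a separate entry
    abundances["unmapped"] = 100.0 - mapped_pct
    return abundances


if __name__ == "__main__":
    result = relative_abundance({"contig_a": 2.0, "contig_b": 4.0},
                                mapped_reads=10, total_reads=100)
    for name, pct in result.items():
        print(f"{name}\t{pct:.1f}")
```

For instance, means of 2 and 4 with 10 of 100 reads mapped give roughly 3.3% and 6.7%, with 90% unmapped.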

@asierFernandezP
Author

asierFernandezP commented Apr 7, 2024

Thank you for the quick response!

Regarding the output: because I am using both the --coupled (paired FASTQs) and --single (unpaired FASTQ) options, I get two abundance columns, one for the paired files and one for the unpaired file. What would be the best way to combine these into a single column? I am only interested in the total abundance of each contig in my sample, considering both paired and unpaired reads.

@wwood
Owner

wwood commented Apr 7, 2024

If you are just using the mean output, I think the easiest approach is to add the two columns together. It is more complicated for other outputs.
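For the mean method, adding the columns could be scripted roughly as follows. This is a minimal Python sketch: the helper `combine_mean_columns` is hypothetical, and it assumes a tab-separated table with contig names in the first column and per-run coverage columns whose headers end in `Mean`. The header names are an assumption; check them against your own output.

```python
import csv
import io


def combine_mean_columns(tsv_text, suffix="Mean"):
    """Hypothetical helper: sum every column whose header ends with `suffix`.

    Assumes the first column holds contig names and the matching columns
    hold per-run mean coverage values.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    mean_cols = [i for i, name in enumerate(header) if name.endswith(suffix)]
    if not mean_cols:
        raise ValueError(f"no column ending in {suffix!r} found")
    totals = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Coverage depths from the paired and unpaired runs add up per contig
        totals[row[0]] = sum(float(row[i]) for i in mean_cols)
    return totals


if __name__ == "__main__":
    # Hypothetical column names, for illustration only
    example = "Contig\tpaired Mean\tsingle Mean\ncontig_a\t1.5\t0.5\ncontig_b\t3\t1\n"
    print(combine_mean_columns(example))  # {'contig_a': 2.0, 'contig_b': 4.0}
```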

@asierFernandezP
Author

In this case I am using RPKM.
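Unlike mean coverage, RPKM values from two runs cannot simply be added, because each is normalised by its own read count. A hedged sketch, assuming each column was computed as 10^9 × reads mapped / (contig length × N), where N is the read count used for normalisation in that run: the pooled value is then the N-weighted average of the two columns. Which count N is (total or mapped reads, and whether a pair counts as one read or two) is an assumption to check, and the helper `pooled_rpkm` is hypothetical.

```python
def pooled_rpkm(rpkm_paired, n_paired, rpkm_single, n_single):
    """Hypothetical helper: RPKM of the pooled library from two per-run RPKMs.

    Assuming RPKM_i = 1e9 * reads_i / (length * N_i):
      reads_i = RPKM_i * length * N_i / 1e9
      pooled  = 1e9 * (reads_1 + reads_2) / (length * (N_1 + N_2))
              = (N_1 * RPKM_1 + N_2 * RPKM_2) / (N_1 + N_2)
    Contig length cancels out, so it is not needed here.
    """
    total = n_paired + n_single
    if total <= 0:
        raise ValueError("read counts must sum to a positive number")
    return (n_paired * rpkm_paired + n_single * rpkm_single) / total


if __name__ == "__main__":
    # 10 RPKM from 3 million paired reads, 20 RPKM from 1 million unpaired reads
    print(pooled_rpkm(10.0, 3_000_000, 20.0, 1_000_000))  # 12.5
```

In this example the pooled value is 12.5, not the 30 that adding the columns would give.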

@asierFernandezP asierFernandezP changed the title Estimating the abudnances of multiple viral genomes Estimating the abundances of multiple viral genomes Apr 7, 2024
@SebasSaenz

Hi,

Thank you for this amazing tool; it has saved me a lot of time in my analysis!

I have been following this question, and I don't fully understand this part: "However that is readily calculated from the ratio of the means, perhaps taking into account the number of reads that map."

Does this mean the following?

Total mapped reads: 10 out of 100 reads

          mean  reads    %
contig_a     2    3.3  3.3
contig_b     4    6.7  6.7

Sorry if this is nonsense.

Best,

Johan Sebastián
