Question about the genome module of CoverM #215

Open
quliping opened this issue Jun 5, 2024 · 1 comment

Comments

quliping commented Jun 5, 2024

Hello, CoverM is great software and has been very helpful for my work. However, I'm not sure how the 'genome' module of CoverM calculates abundance. The calculation in the contig module, e.g. TPM, is easy to understand: for a single contig, we compute the mapped read count per base (the total number of paired-end mapped reads divided by the contig length, abbreviated as C/B), then divide that value by the sum of C/B over all contigs and multiply by 1e+6.
But how does CoverM calculate the abundance of a genome with multiple contigs? I can think of three possibilities: (1) the total TPM of all contigs in the genome; (2) the average TPM of the genome's contigs (total TPM divided by genome length or by contig number); (3) the total mapped read count of the genome divided by the genome length (C/G, the mapped count per base for the genome), then divided by the sum of C/G over all genomes and multiplied by 1e+6. Which of these does CoverM use, or does it use a different method?
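
To make the difference between the three hypotheses concrete, here is a minimal Python sketch on made-up numbers. The genome names, read counts, and contig lengths are hypothetical, the function names are invented for illustration, and hypothesis (2) is taken in its "divided by contig number" reading. None of this is CoverM's code; it only restates the formulas in the question.

```python
# Hypothetical toy data: per genome, a list of (mapped_reads, contig_length_bp).
contigs = {
    "genomeA": [(300, 1000), (100, 3000)],
    "genomeB": [(200, 2000)],
}

def contig_tpm(contigs):
    """Contig-level TPM as described in the question: (C/B) / sum(C/B) * 1e6."""
    rates = {
        (genome, i): reads / length
        for genome, entries in contigs.items()
        for i, (reads, length) in enumerate(entries)
    }
    total = sum(rates.values())
    return {key: rate / total * 1e6 for key, rate in rates.items()}

def hypothesis_1(contigs):
    """(1) Sum of the contig TPMs belonging to each genome."""
    summed = {genome: 0.0 for genome in contigs}
    for (genome, _), value in contig_tpm(contigs).items():
        summed[genome] += value
    return summed

def hypothesis_2(contigs):
    """(2) Mean contig TPM per genome (total TPM divided by contig number)."""
    summed = hypothesis_1(contigs)
    return {genome: summed[genome] / len(contigs[genome]) for genome in contigs}

def hypothesis_3(contigs):
    """(3) Genome-level C/G, normalised over all genomes, times 1e6."""
    rates = {
        genome: sum(r for r, _ in entries) / sum(l for _, l in entries)
        for genome, entries in contigs.items()
    }
    total = sum(rates.values())
    return {genome: rate / total * 1e6 for genome, rate in rates.items()}

if __name__ == "__main__":
    for fn in (hypothesis_1, hypothesis_2, hypothesis_3):
        print(fn.__name__, fn(contigs))
```

On this toy input both genomes have the same reads per base overall (400/4000 and 200/2000), so hypothesis (3) gives them equal abundance, while hypothesis (1) ranks genomeA higher because its short, deeply covered contig dominates the sum.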

Owner

wwood commented Jun 5, 2024

Hi @quliping,

Thanks for the kind words.

If I understand you correctly, CoverM uses (3).
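
For reference, option (3) as worded in the question can be written as a formula. The symbols are assumptions introduced here, not notation from CoverM: $C_g$ is the total number of reads mapped to genome $g$, $G_g$ is the total length of its contigs, and the sum runs over all genomes $h$ in the analysis.

```latex
\mathrm{TPM}_g \;=\; 10^{6} \cdot \frac{C_g / G_g}{\sum_{h} C_h / G_h}
```

Under this reading, contigs are pooled within a genome before normalisation, so a genome's value depends only on its total reads and total length, not on how it is split into contigs.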
