Hi,
I ran DADA2 on 3 16S datasets, which went well.
Then I imported the output files into phyloseq in R to create some abundance tables, graphs, etc.
My problem is matching the ASV names across the 3 abundance tables. Within each dataset the ASVs are named ASV1, ASV2 ..., but ASV1 in one table is not the same sequence as ASV1 in another table.
For the moment, I have my 3 phyloseq objects, from which I removed any contaminants. Here is one of them:
ps.pool1
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1660 taxa and 80 samples ]
sample_data() Sample Data:       [ 80 samples by 12 sample variables ]
tax_table()   Taxonomy Table:    [ 1660 taxa by 8 taxonomic ranks ]
Then I replace each ASV sequence with a generic name:
dna <- Biostrings::DNAStringSet(taxa_names(ps.pool1))
names(dna) <- taxa_names(ps.pool1)
ps.pool1 <- merge_phyloseq(ps.pool1, dna)
taxa_names(ps.pool1) <- paste0("ASV", seq(ntaxa(ps.pool1)))
taxa_names(ps.pool1)
[1] "ASV1" "ASV2" "ASV3" "ASV4" "ASV5" "ASV6" "ASV7"
[8] "ASV8" "ASV9" "ASV10" "ASV11" "ASV12" "ASV13" "ASV14"
[15] "ASV15" "ASV16" "ASV17" "ASV18" "ASV19" "ASV20" "ASV21"
[22] "ASV22" "ASV23" "ASV24" "ASV25" "ASV26" "ASV27" "ASV28"
What I need is to give the same ASV name to the same ASV sequence across the 3 datasets, so that I can compare the three abundance tables.
Any help? Best
Hi,
I'm not sure I completely understood your problem, but if I did, there are several ways to address this issue:
(1) The easiest is to give the ASV1...ASVn names to the object you obtain from DADA2. Before doing this, you should keep a mapping table that maps each new ASV id to its ASV sequence for future reference. Then all your downstream tables should match and correspond to this one.
(2) Create a table matching each ASV id, i.e., ASV1, ASV2, etc., to the ASV sequence. Then use this table to order or match ASVs across tables.
(3) Use the ASV sequences throughout the analyses. I know this is less convenient because of their length, but it is a possibility.
It is quite difficult for me to exemplify this as I don't have or know your objects.
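That said, here is a minimal sketch of the mapping-table idea with toy data, in base R only. The three short character vectors are made-up stand-ins for the ASV sequences of each dataset, and shared_ids() is a hypothetical helper name, not a phyloseq or DADA2 function.

```r
# Toy stand-ins for the ASV sequences of each dataset (made-up data).
# With real objects these would be the sequences, not the "ASVx" names.
seqs1 <- c("ACGTACGT", "TTGGCCAA", "GGGGCCCC")
seqs2 <- c("TTGGCCAA", "AAAATTTT", "ACGTACGT")
seqs3 <- c("CCCCGGGG", "ACGTACGT")

# One shared mapping table: every distinct sequence gets exactly one ASV id.
all_seqs <- unique(c(seqs1, seqs2, seqs3))
asv_map <- data.frame(
  asv_id   = paste0("ASV", seq_along(all_seqs)),
  sequence = all_seqs,
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate a vector of sequences into the shared ids.
shared_ids <- function(seqs, map) map$asv_id[match(seqs, map$sequence)]

ids1 <- shared_ids(seqs1, asv_map)
ids2 <- shared_ids(seqs2, asv_map)
ids3 <- shared_ids(seqs3, asv_map)

print(asv_map)
print(list(pool1 = ids1, pool2 = ids2, pool3 = ids3))
```

With real phyloseq objects, and assuming the sequences were stored with merge_phyloseq() as in the question, the sequences could presumably be taken from as.character(refseq(ps.pool1)) and the shared names assigned with taxa_names(ps.pool1) <- shared_ids(...). I have not tested this on your objects. It is also worth saving asv_map to a file so the id-to-sequence link is never lost.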
If you want to try option (1), you may want to check this tutorial I made about DADA2 a while ago (check section 7 - link).
I hope this helps.
Best,
António
Thanks for your reply.
I meant that, for example, "ASV1" needs to correspond to the same biological sequence across my 3 datasets. That's why your first option, making a mapping table, could be the solution.
I have this object, where each sequence has an "ASVx" name in each of the 3 datasets. I need the ASV1 sequence in dataset 1 to be the same as the ASV1 sequence in dataset 2, for example.
I'll let you know. Best
Hi again, I am struggling a bit. I am able to export my refseq(ps.pool) objects as FASTA files. I can create a matching "ASV-sequence" FASTA file for each of my 3 datasets. When the same sequence is found in 2 or 3 FASTA files, I check the headers and rename them so that the sequence has only one header. Something like that.
But I don't know how to create a "table" in R from these FASTA files and then match the ASV names across the 3 datasets.
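One possible way to build such a table, sketched in base R with toy FASTA files written to temporary paths. The file contents, the column names and the helper read_fasta_table() are all made up for illustration. With Biostrings installed, readDNAStringSet() could be used to read the files instead of the small parser below.

```r
# Write three tiny FASTA files so the example is self-contained (toy data).
fa <- c(pool1 = tempfile(fileext = ".fasta"),
        pool2 = tempfile(fileext = ".fasta"),
        pool3 = tempfile(fileext = ".fasta"))
writeLines(c(">ASV1", "ACGTACGT", ">ASV2", "TTGGCCAA"), fa[["pool1"]])
writeLines(c(">ASV1", "TTGGCCAA", ">ASV2", "AAAATTTT"), fa[["pool2"]])
writeLines(c(">ASV1", "AAAATTTT", ">ASV2", "ACGTACGT"), fa[["pool3"]])

# Hypothetical helper: read a FASTA file into a data frame with one row per
# record. Sequences wrapped over several lines are pasted back together.
# Assumes every header is followed by at least one sequence line.
read_fasta_table <- function(path, name_col) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)  # record number of every line
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  out <- data.frame(sub("^>", "", lines[is_header]), unname(seqs),
                    stringsAsFactors = FALSE)
  names(out) <- c(name_col, "sequence")
  out
}

tabs <- Map(read_fasta_table, fa, paste0("name_", names(fa)))

# Full outer join on the sequence: one row per distinct sequence, with the
# old per-dataset names side by side (NA = sequence absent from that dataset).
lookup <- Reduce(function(a, b) merge(a, b, by = "sequence", all = TRUE), tabs)

# One shared name per distinct sequence, usable in all three datasets.
lookup$shared_id <- paste0("ASV", seq_len(nrow(lookup)))
print(lookup)
```

Each row of lookup then links one sequence to its old name in every dataset and to a single shared name, so renaming the taxa of a given dataset becomes a match() of its old names against the corresponding name_ column.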