Hello,
This is a theoretical question. Why does scRNA-seq data carry UMIs to later remove PCR bias, while bulk RNA-seq usually has none? I know that PCR amplification bias is bigger in scRNA-seq, and I know that you can use Picard, but deduplicating only by mapping coordinates seems very risky to me, since it may throw away real biological information. What do you think?
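To put a rough number on that risk, the sketch below estimates how many distinct molecules survive coordinate-only deduplication when fragment start sites are drawn uniformly at random from a fixed set of positions (the classic occupancy problem). The uniform-start assumption, the 2,000 start positions and the molecule counts are hypothetical illustration values, not measurements; real start sites are far from uniform, which makes natural collisions even more common.

```python
def expected_distinct(n_molecules: int, n_slots: int) -> float:
    """Expected number of distinct slots occupied when n_molecules are
    assigned uniformly at random to n_slots (occupancy problem)."""
    return n_slots * (1.0 - (1.0 - 1.0 / n_slots) ** n_molecules)


# Hypothetical transcript with 2,000 possible fragment start positions
# (single strand, single-end reads, uniform coverage assumed).
START_POSITIONS = 2_000

for molecules in (100, 1_000, 10_000):
    # Reads sharing a start position look like duplicates to a
    # coordinate-only tool, even if they came from different molecules.
    kept = expected_distinct(molecules, START_POSITIONS)
    lost = 1.0 - kept / molecules
    print(f"{molecules:>6} molecules -> ~{kept:,.0f} kept, {lost:.0%} wrongly collapsed")
```

The loss grows quickly once the number of molecules approaches the number of possible start positions, which is exactly the situation for highly expressed genes in bulk data.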
Thanks
UMIs can indeed be used for any type of sequencing. They add complexity (and cost) and can add a significant amount of informatics overhead as well. IDT offers primers with UMIs that can be used for a wide range of applications.
I always imagined (though I have never thought it through in detail) that the limitation was that you would need to PCR-amplify the DNA to attach UMIs. In single-cell RNA-seq there is sufficient material before amplification, but for single-cell DNA there wouldn't be, so one would not benefit much from using UMIs. Perhaps someone here can settle whether this is indeed a factor.
There are Illumina-compatible adapters with built-in UMIs, so the UMIs are attached directly to single molecules. xGen Prism from IDT is one example I linked above.
They've done that since the beginning (of NGS), they just call them "barcodes".
The Illumina implementation is called unique dual indexes (UDIs).
Barcodes are molecular indexes for samples.
Indexes is likely the more appropriate term for the Illumina ones, since they are never part of the actual reads.
Barcodes is better reserved for an inline implementation, where they are part of the actual read.
UMIs are molecular/oligo indexes for individual molecules of DNA/RNA.
For bulk RNA-seq, an 8-nt random UMI only gives you 4^8 = 65,536 (~65k) unique sequences. So if you are using a high-input sample for RNA-seq, you won't capture the uniqueness of the transcripts, because the number of molecules exceeds the number of unique UMIs. In single cells, however, the input is very low, so it is feasible.
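The arithmetic behind that claim can be checked directly. The sketch below assumes every molecule draws its UMI uniformly at random from all 4^8 sequences and looks at the UMI alone, ignoring mapping position; the molecule counts are hypothetical illustration values.

```python
UMI_LENGTH = 8
UMI_SPACE = 4 ** UMI_LENGTH  # 65,536 possible 8-nt sequences


def expected_distinct_umis(n_molecules: int, umi_space: int = UMI_SPACE) -> float:
    """Expected number of distinct UMIs observed when n_molecules each
    receive a uniformly random UMI (no sequencing errors assumed)."""
    return umi_space * (1.0 - (1.0 - 1.0 / umi_space) ** n_molecules)


for molecules in (1_000, 100_000, 10_000_000):
    seen = expected_distinct_umis(molecules)
    print(f"{molecules:>10,} molecules -> ~{seen:,.0f} distinct UMIs")
```

With UMI sequence as the only key, the count is capped at 65,536 no matter how many molecules are in the library, so a deeply sequenced bulk sample would be massively over-collapsed.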
Well, for bulk RNA-seq, you generally would deduplicate based on the locus a read mapped to in addition to the UMI.
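A toy comparison of the three possible deduplication keys: coordinates only, UMI only, and coordinates plus UMI. The `Read` record and its fields are a hypothetical minimal stand-in for an aligned read, and UMIs are matched exactly; dedicated tools such as UMI-tools additionally merge UMIs that differ by sequencing errors, which this sketch does not attempt.

```python
from typing import Callable, Hashable, Iterable, List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal representation of an aligned read.
    chrom: str
    start: int   # 5' alignment position (strand-aware)
    strand: str  # "+" or "-"
    umi: str


def deduplicate(reads: Iterable[Read], key: Callable[[Read], Hashable]) -> List[Read]:
    """Keep the first read seen for each distinct key."""
    seen = set()
    kept = []
    for read in reads:
        k = key(read)
        if k not in seen:
            seen.add(k)
            kept.append(read)
    return kept


def position_key(r: Read) -> Hashable:
    return (r.chrom, r.start, r.strand)


def umi_key(r: Read) -> Hashable:
    return r.umi


def position_umi_key(r: Read) -> Hashable:
    return (r.chrom, r.start, r.strand, r.umi)


reads = [
    Read("chr1", 1000, "+", "ACGTACGT"),  # molecule A
    Read("chr1", 1000, "+", "ACGTACGT"),  # PCR duplicate of A
    Read("chr1", 1000, "+", "TTGCAAGC"),  # molecule B, same start by chance
    Read("chr1", 5000, "+", "ACGTACGT"),  # molecule C, same UMI by chance
]

print(len(deduplicate(reads, position_key)))      # 2: B is lost
print(len(deduplicate(reads, umi_key)))           # 2: C is lost
print(len(deduplicate(reads, position_umi_key)))  # 3: A, B and C kept
```

Because a collision now requires the same locus and the same UMI, the effective key space is roughly the number of start positions times 65,536, which is why an 8-nt UMI is usually enough for bulk data once it is combined with the mapping position.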