I am a new postdoc and I was given a folder of fastq.gz files. I was told they are not demultiplexed, so I need to extract each sample's reads separately from each of these FASTQ files (each file contains reads from multiple subjects), save them as separate FASTQ files, and run the dada2 pipeline on them to get ASVs. My apologies if I am not using some terms correctly; I am very new to this. I have worked with ASV tables before, but I have never done demultiplexing. If you can tell me how to do it, or what software or platform I can use to separate these samples, I would appreciate your help.
Are the sample barcodes in the index reads, or are they internal to the read? Have they been pulled out of the read and moved to the read name? If the usual Illumina indices are used to multiplex, it is far easier to demultiplex while the FASTQs are being generated than to do it after the fact.
This is what the data looks like when I open a FASTQ file in the terminal. There is also a Barcode text file with columns for sample ID and barcode pair name.
MWI006 is a sample ID, and one FASTQ file contains many of these with different numbers, which means I need to demultiplex the samples.
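Before splitting, it can help to load the Barcode text file into a lookup table so you can later check that every expected sample actually received reads. This is a minimal sketch under an assumed format: one sample per line, the sample ID in the first whitespace-separated column, everything after it treated as the barcode pair name, and an optional header line. `load_barcodes` and `load_barcode_file` are illustrative names, not part of any tool.

```python
from pathlib import Path


def load_barcodes(lines, skip_header=False):
    """Map sample ID -> barcode pair name.

    Assumed (hypothetical) format: sample ID in the first whitespace-separated
    column, the rest of the line is the barcode pair name. Blank lines are skipped.
    """
    barcodes = {}
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue  # drop a column-name line if the file has one
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        if len(fields) < 2:
            raise ValueError(f"line {i + 1}: expected 2 columns, got {line!r}")
        sample_id, pair_name = fields[0], fields[1].strip()
        if sample_id in barcodes:
            raise ValueError(f"duplicate sample ID {sample_id!r}")
        barcodes[sample_id] = pair_name
    return barcodes


def load_barcode_file(path, skip_header=False):
    """Read the barcode file from disk (path supplied by the caller)."""
    with Path(path).open() as handle:
        return load_barcodes(handle, skip_header=skip_header)
```

Hypothetical usage: `barcodes = load_barcode_file("barcodes.txt", skip_header=True)`. Raising on duplicate IDs is a deliberate choice, since a repeated sample ID in the sheet usually points to a copy/paste error.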
That pic doesn't work for me; just copy and paste the text.
Sorry about that, I am pretty new to this forum.
I looked up what each line means and I understand it; the only part I don't get is NGCCTCTT|1|NCTGCATA|1 at the end of the first line. Can you help me with this? What does it mean?
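Assuming that trailing token holds the two index sequences, a minimal sketch for pulling it apart and tallying how many reads carry each index pair could look like this. The layout is an assumption, not confirmed here: the token is taken to be the last whitespace-separated field of the header, with pipe-separated fields 0 and 2 as the index sequences; what the `1` fields encode is left uninterpreted. `parse_index_token` and `tally_index_pairs` are hypothetical helper names.

```python
from collections import Counter


def parse_index_token(header):
    """Return (index1, index2) from a header ending in a token like 'NGCCTCTT|1|NCTGCATA|1'.

    Assumption: the token is the last whitespace-separated field, and
    pipe-separated fields 0 and 2 are the two index sequences.
    """
    fields = header.strip().split()
    if not fields:
        raise ValueError("empty header line")
    parts = fields[-1].split("|")
    if len(parts) != 4:
        raise ValueError(f"unexpected index token: {fields[-1]!r}")
    return parts[0], parts[2]


def tally_index_pairs(headers):
    """Count reads per (index1, index2) pair over an iterable of header lines."""
    return Counter(parse_index_token(h) for h in headers)
```

For a hypothetical header `@MWI006_1 NGCCTCTT|1|NCTGCATA|1`, `parse_index_token` returns `("NGCCTCTT", "NCTGCATA")`. The leading N in each index is the FASTQ symbol for a base the sequencer could not call.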
Those are probably the sequences of the two indices, but why didn't the people who made the FASTQs demultiplex them for you? Anyway, since for some reason the sample name is in the read name, you can write a little script in whatever language you like to split out the reads by sample name. If you have a modest number of samples, you can grep for the desired sample names one at a time.
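A single-pass alternative to grepping one sample at a time is a small script that reads each 4-line FASTQ record and writes it to a per-sample gzipped file. This is a minimal sketch, assuming the sample ID appears in the header line and matches the hypothetical pattern `MWI` followed by digits (based on MWI006); adjust `SAMPLE_RE` to your real IDs. The function names are illustrative.

```python
import gzip
import re
from pathlib import Path

# Hypothetical pattern: assumes sample IDs look like "MWI" + digits (e.g. MWI006)
# and appear in the read-name (header) line. Adjust to match your IDs.
SAMPLE_RE = re.compile(r"MWI\d+")


def sample_of(header):
    """Return the sample ID found in a FASTQ header, or 'unassigned'."""
    match = SAMPLE_RE.search(header)
    return match.group(0) if match else "unassigned"


def iter_fastq(lines):
    """Yield 4-line FASTQ records (header, seq, plus, qual) from an iterable of lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq, plus, qual = next(it, ""), next(it, ""), next(it, "")
        if not qual:
            raise ValueError(f"truncated FASTQ record at {header.strip()!r}")
        if not qual.endswith("\n"):
            qual += "\n"  # last line of a file may lack a newline
        yield header, seq, plus, qual


def split_fastq_by_sample(in_path, out_dir):
    """Write each read of a gzipped FASTQ to <out_dir>/<sample>.fastq.gz.

    Returns read counts per sample. One output handle stays open per sample,
    which is fine for a modest number of samples. Existing files are overwritten.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles, counts = {}, {}
    try:
        with gzip.open(in_path, "rt") as fin:
            for record in iter_fastq(fin):
                sample = sample_of(record[0])
                if sample not in handles:
                    handles[sample] = gzip.open(out_dir / f"{sample}.fastq.gz", "wt")
                handles[sample].write("".join(record))
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Hypothetical usage: `counts = split_fastq_by_sample("lane1_R1.fastq.gz", "split/R1")`. Use a separate output directory for each input file, since files with the same sample name are overwritten. If the data are paired-end, running it on the R1 and R2 files keeps each sample's reads in input order, so the pairs should stay in sync, provided both files carry the sample ID in the read names (worth checking). Reads without a matching ID land in `unassigned.fastq.gz`, which is a quick sanity check before handing the per-sample files to dada2.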
If you want to try doing this manually yourself, you might look at the posts here: How to subset fastq data based on leading nt of sequences?
That's not what the OP needs. Their indices are not embedded in the read.
Hi eli_bayat,
welcome to Biostars. No need to apologize for being new to the community; we all were at some point. As advice, it is recommended to add data and code examples as plain text and highlight them using the code button, which lets others easily copy/paste them, e.g. to test code one might suggest to you. For embedding images, please use the image button (the one to the right of the code button). You have to paste in the full link to the image from the image host, e.g. https://i.ibb.co/HF8PH8T/(...).png, to make sure it is properly embedded. I made the changes in this thread this time. Cheers!
Thanks! I appreciate it :)