Hi,
Background: We have WGS data of fungal samples.
Coverage: 100x
Platform: Illumina NovaSeq X (paired-end, 2 x 150 bp)
GOAL: De novo genome assemblies
The initial FastQC analysis of the raw data looks really good:
File     Total sequences   Total bases   Sequence length   GC (%)
Read1    90,264,516        13.5 Gbp      150 bp            47
Read2    90,264,516        13.5 Gbp      150 bp            48
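As a quick sanity check on the table: bases per file are reads x read length, and mean coverage is total bases / genome size. The sketch below reproduces the 13.5 Gbp figure and shows the genome size that the stated 100x would imply, assuming both files belong to a single sample and no trimming has been done (both assumptions; the function names are just illustrative).

```python
def total_bases(num_reads: int, read_length: int) -> int:
    """Bases in one untrimmed FASTQ file: reads x read length."""
    return num_reads * read_length


def coverage(total_bp: int, genome_size_bp: int) -> float:
    """Mean depth: total sequenced bases / genome size."""
    return total_bp / genome_size_bp


def implied_genome_size(total_bp: int, depth: float) -> float:
    """Genome size implied by a given mean depth: total bases / depth."""
    return total_bp / depth


reads_per_file = 90_264_516  # from the FastQC table
read_len = 150

per_file = total_bases(reads_per_file, read_len)
both_files = 2 * per_file  # Read1 + Read2, assumed to be one sample

print(f"Per file: {per_file / 1e9:.1f} Gbp")
print(f"Both files: {both_files / 1e9:.1f} Gbp")
print(f"Genome size implied by 100x: {implied_genome_size(both_files, 100) / 1e6:.0f} Mbp")
```

With the numbers above this prints 13.5 Gbp per file, 27.1 Gbp in total and an implied genome size of about 271 Mbp. Comparing that with the expected genome size of the species is a cheap way to confirm how the 100x figure was derived.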
The Per base sequence quality plot looks very good: all bases are in the green zone, and the blue line (mean quality) stays between 38 and 40 across the whole read, from base 1 to base 150.
Problem 1: The next module, Per tile sequence quality, fails in the FastQC report (red cross). The blue heatmap has some red streaks in it. After looking at QC Fail, I found that this is related to flow-cell issues, described here.
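To see which tiles are behind the red streaks, the per-tile plot can be approximated directly from the reads. This is a minimal sketch assuming Casava 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y ...), Phred+33 qualities and 150 bp reads. For each (lane, tile) it computes the mean quality at every position minus the mean over all tiles, which is the kind of deviation FastQC plots. The default threshold of 5 is only meant to resemble FastQC's failure criterion; check the FastQC documentation for the exact rule. All function names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)
TileKey = Tuple[str, str]      # (lane, tile)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip()
            yield header, seq, qual


def tile_of(header: str) -> TileKey:
    """Extract (lane, tile) from a Casava 1.8+ style Illumina read header."""
    fields = header[1:].split()[0].split(":")
    return fields[3], fields[4]


def per_tile_deviation(records: Iterable[Record], max_len: int = 150) -> Dict[TileKey, List[float]]:
    """Per tile: mean Phred (+33) at each position minus the all-tile mean there."""
    tile_sum = defaultdict(lambda: [0] * max_len)
    tile_n = defaultdict(lambda: [0] * max_len)
    all_sum = [0] * max_len
    all_n = [0] * max_len
    for header, _seq, qual in records:
        key = tile_of(header)
        for i, ch in enumerate(qual[:max_len]):
            q = ord(ch) - 33
            tile_sum[key][i] += q
            tile_n[key][i] += 1
            all_sum[i] += q
            all_n[i] += 1
    deviation = {}
    for key in tile_sum:
        deviation[key] = [
            tile_sum[key][i] / tile_n[key][i] - all_sum[i] / all_n[i] if tile_n[key][i] else 0.0
            for i in range(max_len)
        ]
    return deviation


def flag_bad_tiles(deviation: Dict[TileKey, List[float]], threshold: float = 5.0) -> List[TileKey]:
    """Tiles whose mean quality falls more than `threshold` below the overall mean at any position."""
    return sorted(key for key, devs in deviation.items() if min(devs) < -threshold)
```

Pure Python is slow on a 13.5 Gbp file, so running it on the first few million reads (for example with itertools.islice over read_fastq) is probably enough to locate the affected tiles.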
Questions:
- Is there anything I should do about this during read QC with fastp or another tool?
- For de novo and reference-guided genome assembly, is it important to remove reads from these regions, or can I just ignore these errors and move on with the downstream analysis?
Any help and views on this are welcome.
Do not lose too much sleep over this. There should be enough data from the blue (unaffected) regions for the assemblies. But if you must, you could scan the reads and remove those whose quality falls below the scores seen in the blue regions.
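If you do decide to filter, the logic can be sketched as below. This is a minimal illustration, not a replacement for fastp or similar tools, which already implement quality filtering and trimming. It assumes Casava 1.8+ headers, Phred+33 qualities, a set of bad tiles you have identified yourself (for example from the per-tile plot), and a mean-quality cut-off of 20 chosen arbitrarily; all function and file names are hypothetical. Pairs are kept or dropped together so that R1 and R2 stay in sync, which paired-end assemblers expect.

```python
import gzip
from typing import Iterator, Set, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)
TileKey = Tuple[str, str]      # (lane, tile)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from a plain or gzipped 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip()
            yield header, seq, qual


def write_record(fh: TextIO, rec: Record) -> None:
    header, seq, qual = rec
    fh.write(f"{header}\n{seq}\n+\n{qual}\n")


def tile_of(header: str) -> TileKey:
    """(lane, tile) from a Casava 1.8+ style header: @inst:run:flowcell:lane:tile:x:y ..."""
    fields = header[1:].split()[0].split(":")
    return fields[3], fields[4]


def mean_quality(qual: str) -> float:
    """Arithmetic mean Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0


def keep_pair(r1: Record, r2: Record, bad_tiles: Set[TileKey], min_mean_q: float) -> bool:
    """Keep a pair only if it is off the flagged tiles and both mates pass the quality cut."""
    if tile_of(r1[0]) in bad_tiles:  # mates come from the same cluster, hence the same tile
        return False
    return mean_quality(r1[2]) >= min_mean_q and mean_quality(r2[2]) >= min_mean_q


def filter_pairs(in1: str, in2: str, out1: str, out2: str,
                 bad_tiles: Set[TileKey], min_mean_q: float = 20.0) -> Tuple[int, int]:
    """Write surviving pairs as plain-text FASTQ; return (kept, total)."""
    kept = total = 0
    with open(out1, "w") as o1, open(out2, "w") as o2:
        # Assumes R1 and R2 list mates in the same order; zip stops at the shorter file.
        for r1, r2 in zip(read_fastq(in1), read_fastq(in2)):
            total += 1
            if keep_pair(r1, r2, bad_tiles, min_mean_q):
                write_record(o1, r1)
                write_record(o2, r2)
                kept += 1
    return kept, total
```

For example, filter_pairs("sample_R1.fastq.gz", "sample_R2.fastq.gz", "filt_R1.fastq", "filt_R2.fastq", {("1", "1101")}) would drop every pair from lane 1, tile 1101 plus any pair with a low-quality mate, and report how many pairs survived. If the kept fraction is high, that supports the point above that the affected tiles are unlikely to matter for assembly.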