I have fast5 files from MinION data and want to run a quality check on them. Are there any tools available for quality control of fast5 files?
As an alternative, I extracted fastq from the fast5 files and ran FastQC, but the results are not satisfactory: the quality scores are very low. I am not sure whether this is the right approach to assess MinION data.
It's not entirely clear what you want to do. What kind of QC would you like to investigate?
Anyway, I would like to promote a script I wrote, NanoPlot. It's meant for plots of reads (fastq) and alignments (bam) of Oxford Nanopore sequencing data. I'm looking forward to your feedback. A few examples can be found on my blog. But I recently decided to remove fast5 support, because the latest basecaller directly outputs fastq.
It seems your NanoPlot will do what I need. I am looking for something similar to what you show here: https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/ , namely the average read quality.
As you mention, you plotted per base sequence content and quality. What are the values on the y-axis? If they are Phred scores, Q30 is usually considered the cut-off for a good quality read. In your plots all the bases show a quality value of 8. What threshold are you using as the cut-off for good quality reads?
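For readers comparing Q8 with Q30: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10). The snippet below is only a small illustration of that standard conversion; the function names are made up for this example and are not part of NanoPlot or any other tool mentioned here.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that a base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)

# Compare the scores discussed in this thread.
for q in (8, 12, 17, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

So Q8 means roughly a 16% chance that a base is wrong, while Q30 means 0.1%.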
Please let me know how NanoPlot works for you and which issues you may encounter.
The per base sequence quality has improved quite a bit with the release of the latest Albacore basecaller; for that, see also this post.
About filtering reads, have a look at this post. I also wrote NanoFilt for filtering and trimming. I don't know what your application is, but using a cut-off of average basecall quality > 12 removes the worst quality reads. If you put the cut-off at 16 or 17 you get better reads, but you also lose quite a few of them. Have a look at the plot below to decide:
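To make the cut-off idea concrete, here is a minimal sketch of filtering a fastq file on average read quality. It is not NanoFilt's code. It assumes plain four-line fastq records with Phred+33 encoding, and it assumes the average is taken over per-base error probabilities rather than over the raw scores. All function names are hypothetical.

```python
import math
from io import StringIO

def read_fastq(handle):
    """Yield (header, sequence, quality_string) from a 4-line-per-record fastq."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def mean_quality(qual, offset=33):
    """Average read quality, obtained by averaging per-base error probabilities
    (an assumption of this sketch) and converting back to a Phred score."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))

def filter_reads(handle, min_quality=12):
    """Yield only the reads whose average quality is above min_quality."""
    for header, seq, qual in read_fastq(handle):
        if qual and mean_quality(qual) > min_quality:
            yield header, seq, qual

# Two toy reads: '5' encodes Q20 and ')' encodes Q8 in Phred+33.
example = StringIO(
    "@good\nACGT\n+\n5555\n"
    "@poor\nACGT\n+\n))))\n"
)
kept = [header for header, _, _ in filter_reads(example)]
print(kept)
```

With the default cut-off of 12, only the Q20 read passes. Raising `min_quality` to 16 or 17 works the same way and simply keeps fewer reads.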
Yes, I'll let you know how it looks for my data. It's RNA-seq data; are there any particular considerations for RNA-seq?
How did you prepare the library?
You would expect shorter fragments in RNA-seq, but that shouldn't be a real problem. I haven't tested it on RNA-seq, so this is an interesting test. I suggest you use GMAP for the alignment. Perhaps the percent identities you obtain will be lower since spliced alignment might introduce some mismapping.
I prefer STAR over GMAP, as STAR gives more specific alignments and has a lower false positive rate than GMAP. STAR has also given promising results on PacBio long reads.
Well, I've tried STAR for Oxford Nanopore sequencing, and my experience is different. I assume PacBio CCS reads?