Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on Aspergillus oryzae RIB40

PLoS One. 2013 May 7;8(5):e63673. doi: 10.1371/journal.pone.0063673. Print 2013.

Myco Umemura¹, Yoshinori Koyama, Itaru Takeda, Hiroko Hagiwara, Tsutomu Ikegami, Hideaki Koike, Masayuki Machida

PMID: 23667655
PMCID: PMC3646829
DOI: 10.1371/journal.pone.0063673

Myco Umemura et al. PLoS One. 2013.

Affiliation

¹ National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan.

Abstract

The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated by NGS platforms such as SOLiD and Illumina are well suited to mapping analysis. However, SOLiD reads, with lengths of <60 bp, have been considered too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only 50-bp SOLiD reads generated from mate-paired libraries with insert sizes of 2.8 or 1.9 kb. The assembled scaffolds showed an N50 value of 1.6 Mb, 22-fold higher than those obtained using only SOLiD short reads in other published reports. In addition, almost 99% of the reference genome was accurately covered by long aligned fragments of the assembled scaffolds. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies because of their potential medicinal, agricultural, and cosmetic properties, were also well reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological studies of fungi. We also investigated the effects of low-quality data filtering, library insert size, and k-mer size on assembly performance. For assembly, we recommend mildly filtered read data, for which the N50 was not greatly degraded, a library with an insert size of ∼2.0 kb, and a k-mer size of 33.
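The N50 statistic quoted above can be computed directly from a list of scaffold lengths. The following is a minimal sketch in Python, assuming the common definition: the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Toy scaffold lengths in bp (hypothetical, for illustration only).
toy_scaffolds = [6, 5, 4, 3, 2]
print(n50(toy_scaffolds))  # 5
```

With the toy input, the total is 20, so the running sum first reaches 10 at the second-longest scaffold, giving an N50 of 5.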

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Overview of our *de novo* genome assembly pipeline.**
The assembly block is performed using SOLiD *De Novo* Accessory Tools 2.0, developed by Life Technologies. The data filtering and analysis blocks are written in shell, Ruby, or Perl.

**Figure 2. Coverage of the reference genome sequence by the assembled scaffolds.**
(a) Dot-plot alignments of assembled scaffolds vs the reference genome sequence of *Aspergillus oryzae* RIB40. (b) Reference genome sequences aligned by assembled scaffold fragments with lengths of ≤10 kb (yellow), >10 kb (green), and >50 kb (red). The Roman numerals I-VIII indicate the chromosome index of the RIB40 genome.

**Figure 3. Reconstruction of gene regions in the assembled scaffolds.**
(a) Plot showing the number of known *Aspergillus oryzae* RIB40 genes that were not found in the assembled genome. (b) Percentages of high-scoring segment pair (HSP; solid) and identical bases (open) in the gene regions. The graph includes the results of the assemblies using lib2.8 and lib1.9 with either unfiltered (nofilter), no undetermined bases (nodot), or QV >10 data. For lib2.8.qv10 and lib1.9.qv10, the results using k-mers of 25 to 35 are included.

**Figure 4. Proportion of assembled scaffold fragments aligned to the *Aspergillus oryzae* RIB40 reference genome.**
The lengths of the aligned fragments are indicated by color (blue-green, >50 kb; purple, >10 kb; gray, ≤10 kb; and yellow, 0 or none). The graph includes the results of the assemblies using lib2.8 and lib1.9 with either unfiltered (nofilter), no undetermined bases (nodot), or QV >10 data. For lib2.8.qv10 and lib1.9.qv10, the results using k-mers of 25 to 35 are included.

**Figure 5. Cumulative lengths of assembled scaffolds (>95 bp).**
(a) Profiles of the lib2.8.nofilter.k31, lib2.8.nodot.k31, lib2.8.qv10.k31, lib1.9.nodot.k31, and lib1.9.qv10.k31 assemblies, which use lib2.8 or lib1.9 with either unfiltered (nofilter), no undetermined bases (nodot), or QV >10 data. (b) Profiles of lib2.8.qv10 and (c) lib1.9.qv10 with k-mer sizes varied from 25 to 35. The dashed grey line at 37.2 Mb in each graph denotes the size of the reference genome.
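A cumulative-length profile of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: scaffolds are accumulated from longest to shortest (the caption does not state the ordering), the >95 bp cutoff is applied as a simple length filter, and the function names and toy lengths are hypothetical. The 37.2 Mb reference size is the value given in the caption.

```python
from itertools import accumulate

# Reference genome size from the caption (37.2 Mb).
REFERENCE_SIZE_BP = 37_200_000


def cumulative_lengths(scaffold_lengths, min_length=96):
    """Running total of scaffold lengths, longest first, keeping scaffolds >95 bp."""
    kept = sorted((n for n in scaffold_lengths if n >= min_length), reverse=True)
    return list(accumulate(kept))


def fraction_of_reference(scaffold_lengths, reference_size=REFERENCE_SIZE_BP):
    """Total kept scaffold length as a fraction of the reference genome size."""
    curve = cumulative_lengths(scaffold_lengths)
    return curve[-1] / reference_size if curve else 0.0


# Toy scaffold lengths in bp (hypothetical); the 90 bp scaffold is filtered out.
toy = [2_000_000, 500_000, 90, 1_500_000]
print(cumulative_lengths(toy))  # [2000000, 3500000, 4000000]
```

Plotting the returned list against scaffold rank, with a horizontal line at `REFERENCE_SIZE_BP`, would give a curve of the general shape described in the caption.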

**Figure 6. R50 and N50 values with different k-mer sizes.**
(a) The R50 and (b) the N50 values for lib2.8.qv10 (closed square) and lib1.9.qv10 (open square). The R50 value corresponds to N50 using sequence fragments of the reference genome covered by highly accurate sequences of assembled scaffolds.
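The R50 definition in this caption can be illustrated with a short Python sketch. It assumes that the accurately covered regions of the reference are available as `(chromosome, start, end)` half-open intervals and that overlapping or touching intervals are merged into one fragment before the N50-style statistic is taken. Both the input format and the merging rule are assumptions made for illustration; the criterion the authors used for "highly accurate" coverage is not given here, and all names and coordinates below are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def n50(lengths):
    """Length at which the sorted running total first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def r50(alignments):
    """N50 over merged reference fragments; alignments are (chrom, start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((start, end))
    fragment_lengths = [
        end - start
        for intervals in by_chrom.values()
        for start, end in merge_intervals(intervals)
    ]
    return n50(fragment_lengths)


# Toy covered regions on two chromosomes (hypothetical coordinates).
toy = [("I", 0, 100), ("I", 50, 300), ("I", 400, 450), ("II", 0, 200)]
print(r50(toy))  # 300
```

In the toy input, the two overlapping regions on chromosome I merge into one 300 bp fragment, which alone accounts for more than half of the 550 bp covered, so the sketch reports 300.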

Cited by

Aspergillus oryzae as a Cell Factory: Research and Applications in Industrial Production.
Sun Z, Wu Y, Long S, Feng S, Jia X, Hu Y, Ma M, Liu J, Zeng B. J Fungi (Basel). 2024 Mar 26;10(4):248. doi: 10.3390/jof10040248. PMID: 38667919. Review.

Safety of the fungal workhorses of industrial biotechnology: update on the mycotoxin and secondary metabolite potential of Aspergillus niger, Aspergillus oryzae, and Trichoderma reesei.
Frisvad JC, Møller LLH, Larsen TO, Kumar R, Arnau J. Appl Microbiol Biotechnol. 2018 Nov;102(22):9481-9515. doi: 10.1007/s00253-018-9354-1. Epub 2018 Oct 6. PMID: 30293194. Review.

Taxonomy of Aspergillus section Flavi and their production of aflatoxins, ochratoxins and other mycotoxins.
Frisvad JC, Hubka V, Ezekiel CN, Hong SB, Nováková A, Chen AJ, Arzanlou M, Larsen TO, Sklenář F, Mahakarnchanakul W, Samson RA, Houbraken J. Stud Mycol. 2019 Jun;93:1-63. doi: 10.1016/j.simyco.2018.06.001. Epub 2018 Jul 31. PMID: 30108412.

Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.
Raethong N, Wong-Ekkabut J, Laoteng K, Vongsangnak W. Biomed Res Int. 2016;2016:8124636. doi: 10.1155/2016/8124636. Epub 2016 May 4. PMID: 27274991.

SATRAP: SOLiD Assembler TRAnslation Program.
Campagna D, Gasparini F, Franchi N, Manni L, Telatin A, Vitulo N, Ballarin L, Valle G. PLoS One. 2015 Sep 14;10(9):e0137436. doi: 10.1371/journal.pone.0137436. eCollection 2015. PMID: 26368549.

Grants and funding

The authors have no support or funding to report.
