Int J Mol Sci. 2022 Sep 16;23(18):10804.
doi: 10.3390/ijms231810804.

Chloroplast Genome Annotation Tools: Prolegomena to the Identification of Inverted Repeats

Ante Turudić et al. Int J Mol Sci. 2022.

Abstract

The development of next-generation sequencing technology and the increasing amount of sequencing data have brought the bioinformatic tools used in genome assembly into focus. The final step of the process is genome annotation, which works on assembled genome sequences to identify the location of genome features. In the case of organelle genomes, specialized annotation tools are used to identify organelle genes and structural features. Numerous annotation tools target chloroplast sequences. Most chloroplast DNA genomes have a quadripartite structure caused by two copies of a large inverted repeat. We investigated the strategies of six annotation tools (Chloë, Chloroplot, GeSeq, ORG.Annotate, PGA, Plann) for identifying inverted repeats and analyzed their success using publicly available complete chloroplast sequences of taxa belonging to the asterid and rosid clades. The annotation tools use two different approaches to identify inverted repeats, using existing general search tools or implementing stand-alone solutions. The chloroplast sequences studied show that there are different types of imperfections in the assembled data and that each tool performs better on some sequences than the others.
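To make the idea of a stand-alone inverted-repeat search concrete, the following is a minimal sketch and not the method of any of the six tools named above. It assumes a linear sequence containing only A, C, G, and T, looks for exact matches only, and uses a hypothetical function name and seed length `k`. Real chloroplast assemblies are circular and, as the abstract notes, often contain imperfections, so a practical tool would need to handle wrap-around and mismatches.

```python
# Sketch only: exact-match search for the longest inverted repeat (IR).
# Assumptions: linear sequence, alphabet restricted to A/C/G/T, no mismatches.

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_longest_inverted_repeat(seq, k=8):
    """Return (start_a, start_b, length) of the longest exact IR, or None.

    Coordinates are 0-based. Copy A is seq[start_a:start_a + length] and
    copy B is seq[start_b:start_b + length]; B equals the reverse
    complement of A, and the two copies do not overlap.
    """
    seq = seq.upper()
    n = len(seq)

    # Index every k-mer by its start positions.
    index = {}
    for i in range(n - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)

    best = None
    for i in range(n - k + 1):
        seed_rc = reverse_complement(seq[i:i + k])
        for j in index.get(seed_rc, ()):
            if j < i + k:
                continue  # require copy A to lie entirely before copy B
            # Only start from the outermost seed of a maximal match:
            # skip if the pair could still be extended outward.
            if i > 0 and j + k < n and PAIR.get(seq[i - 1]) == seq[j + k]:
                continue
            # Extend inward: A grows to the right, B grows to the left.
            a_end, b_start = i + k - 1, j
            while (a_end + 1 < b_start - 1
                   and PAIR.get(seq[a_end + 1]) == seq[b_start - 1]):
                a_end += 1
                b_start -= 1
            length = a_end - i + 1
            if best is None or length > best[2]:
                best = (i, b_start, length)
    return best
```

In a quadripartite layout, the region before copy A and after copy B would correspond to the large single-copy region and the region between the copies to the small single-copy region. The seed-and-extend design keeps the sketch short; the approach based on existing general search tools that the abstract mentions would instead delegate this step to an external aligner.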

Keywords: annotation; chloroplast genome; inverted repeats; repeat identification.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Availability of the complete chloroplast genome, publishing years, sequence lengths, and the outcome of IR identification using six annotation tools in different families of asterids and rosids. Represented families contain 20 or more sequences. (a) Total number of species (gray bar) and number of available complete chloroplast sequences (colored bar) in NCBI, where yellow represents 20–49, orange 50–99, red 100–199, and brown ≥200 sequences. (b) Violin plots of the number of sequences published in NCBI per year. (c) Box plots of complete chloroplast sequence lengths. (d) The proportion of different outcomes of IR identification: six annotation tools produced the same outcome (green), two different outcomes (yellow), and three or more different outcomes (orange). Outcomes were treated as the same if the lengths of the identified IR differed by less than 10 bp.
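The comparison rule in panel (d) can be illustrated with a short sketch. The caption only states that outcomes count as the same when IR lengths differ by less than 10 bp; the grouping strategy below (sort the lengths and open a new group once a length is 10 bp or more above the first length of the current group), the treatment of "no IR found" as its own outcome, and the function names are all assumptions made for illustration.

```python
# Sketch only: classify how many distinct IR outcomes a set of tools produced.
# Assumption: None means a tool reported no IR and counts as a separate outcome.

def count_outcomes(ir_lengths, tolerance=10):
    """Count distinct outcomes among per-tool IR lengths (in bp)."""
    if not ir_lengths:
        raise ValueError("at least one tool result is required")
    found = sorted(x for x in ir_lengths if x is not None)
    groups = 0
    anchor = None  # first (smallest) length of the current group
    for length in found:
        if anchor is None or length - anchor >= tolerance:
            groups += 1
            anchor = length
    if any(x is None for x in ir_lengths):
        groups += 1
    return groups


def outcome_category(ir_lengths, tolerance=10):
    """Map the number of distinct outcomes to the three panel (d) classes."""
    n = count_outcomes(ir_lengths, tolerance)
    if n == 1:
        return "same"
    if n == 2:
        return "two"
    return "three or more"
```

For example, under these assumptions six reported lengths that all fall within a 10 bp window would be classed as "same", while one tool deviating by 1000 bp would yield "two".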
Figure 2
Number of complete chloroplast sequences added to GenBank per year (leftmost chart) and the relative ratio of the three types of sequences identified using six annotation tools (and those annotated in NCBI): blue—identical IRs, orange—different IRs, and red—no IRs.
Figure 3
Box plots of the IR length distributions in complete chloroplast genome sequences grouped by type of IRs identified (identical or different) using six annotation tools (and those annotated in NCBI).
