RNA. 2003 May;9(5):507-17.
doi: 10.1261/rna.2193703.

A novel method for finding tRNA genes

Vickie Tsui et al. RNA. 2003 May.

Abstract

We describe a novel procedure for generating and optimizing pattern descriptors that can be used to find structural motifs in DNA or RNA sequences. The procedure combines a pattern-description language (based primarily on secondary structure alignment and conservation of some key nucleotides) with a scoring function that relies heavily on estimated folding free energies for the secondary structure of interest. For the cloverleaf secondary structure characteristic of tRNA, we show that a fairly simple pattern descriptor can find almost all known tRNA genes in both bacterial and eukaryotic genomes, and that false positives (sequences that match the pattern but that are probably not tRNAs) can be recognized by their high estimated folding free energies. A general procedure for optimizing descriptors (and hence for finding new structural motifs) is also described. For six bacterial, four eukaryotic, and four archaeal genome sequences, our results compare favorably with those of the more complex and specialized tRNAscan-SE algorithm. Prospects for using this general approach to find other RNA structural motifs are discussed.
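
The two-stage idea in the abstract (match a structural pattern, then separate likely true hits from false positives by estimated folding energy) can be illustrated with a toy sketch. This is not RNAMotif or its descriptor language: the hairpin pattern, the function names `find_hairpins` and `filter_by_energy`, and the per-pair energies in `PAIR_ENERGY` are all hypothetical stand-ins chosen for illustration, and the energy sum is far cruder than the nearest-neighbor model the paper refers to.

```python
# Toy sketch: pattern match first, then filter hits by a crude energy score.
# All names and numbers below are illustrative assumptions, not the paper's.

# Hypothetical per-base-pair energies in kcal/mol (more negative = more stable).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def find_hairpins(seq, stem_len=4, loop_min=3, loop_max=8):
    """Yield (start, loop_len, energy) for each window forming a fully paired stem."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(seq)
    for start in range(n):
        for loop_len in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop_len  # exclusive end of the window
            if end > n:
                break
            energy = 0.0
            for k in range(stem_len):
                # Pair the k-th base of the 5' strand with its 3' partner.
                pair = (seq[start + k], seq[end - 1 - k])
                if pair not in PAIR_ENERGY:
                    break  # mismatch: this window does not satisfy the pattern
                energy += PAIR_ENERGY[pair]
            else:
                yield start, loop_len, energy


def filter_by_energy(hits, cutoff):
    """Keep hits whose score is at or below the cutoff (i.e. sufficiently stable)."""
    return [hit for hit in hits if hit[2] <= cutoff]


if __name__ == "__main__":
    hits = list(find_hairpins("AAGGGCAAAAGCCCAA"))
    print(hits)                          # [(2, 4, -12.0)]
    print(filter_by_energy(hits, -10.0))  # [(2, 4, -12.0)]
```

In this sketch a high (less negative) score plays the role of the "high estimated folding free energies" that flag false positives; the cutoff value is arbitrary and would need to be chosen from the energy distribution of known positives.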

Figures

FIGURE 1. RNAMotif descriptor used to search for potential tRNA genes in bacterial, eukaryotic, and archaeal genomes, in graphic form (A) and in text form (B).
FIGURE 2. Nearest-neighbor energies of sequences found by RNAMotif as they adopt the secondary structure in Figure 1, for the following bacterial genomes: (A) Escherichia coli K-12, (B) E. coli O157:H7, (C) Bacillus subtilis, (D) Aquifex aeolicus, (E) Haemophilus influenzae Rd, and (F) Mycoplasma pneumoniae. (true tRNA) Sequences that were also found by tRNAscan-SE; (false pos.) sequences that were not found by tRNAscan-SE.
FIGURE 3. A sequence (and the corresponding cloverleaf structure) found by RNAMotif in the Escherichia coli O157:H7 genome that was not found by tRNAscan-SE. (Green) Conserved nucleotides included in the RNAMotif descriptor; (red) conserved nucleotides that were not included in the RNAMotif descriptor but were matched by this sequence; (blue) a conserved nucleotide that is violated by this sequence.
FIGURE 4. An optimized descriptor for the Escherichia coli genomes (both K-12 and O157:H7 strains).
FIGURE 5. Examples of lowest nearest-neighbor energy secondary structures from mfold, for sequences found by both RNAMotif and tRNAscan-SE.
FIGURE 6. Examples of lowest nearest-neighbor energy secondary structures from mfold, for sequences found only by RNAMotif.
FIGURE 7. (Top) Plot of the nearest-neighbor energies of sequences in Escherichia coli O157:H7 that were found by both tRNAscan-SE and RNAMotif (crosses), and sequences that were found only by RNAMotif (dots), as they adopt the secondary structure in Figure 1. (Bottom) The difference in nearest-neighbor energies between a sequence in cloverleaf structure and in its lowest-energy secondary structure, plotted for true tRNAs (filled circles) and false positives (open circles). The overlapping region is enclosed in dotted lines.
FIGURE 8. Nearest-neighbor energies of sequences found by RNAMotif as they adopt the secondary structure in Figure 1, for the following eukaryotic genomes: (A) Saccharomyces cerevisiae, (B) Arabidopsis thaliana, (C) Schizosaccharomyces pombe, and (D) Caenorhabditis elegans.
FIGURE 9. Nearest-neighbor energies of sequences found by RNAMotif as they adopt the secondary structure in Figure 1, for the following archaeal genomes: (A) Archaeoglobus fulgidus, (B) Pyrococcus abyssi, (C) Methanobacterium thermoautotrophicum, and (D) Pyrococcus furiosus.
FIGURE 10. Examples illustrating the process of optimizing the descriptors, starting from a descriptor without sequence requirements (A) to a descriptor with sequence requirements but allowing a mispair in each stem (B).
