BMC Genomics. 2008 Dec 3;9:582.
doi: 10.1186/1471-2164-9-582.

Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva

Xiang Guo et al. BMC Genomics. 2008.

Abstract

Background: Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species.

Results: Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to known regulatory motifs in other species. These results suggest that these two motifs are likely to represent transcription factor binding sites in Theileria.

Conclusion: Theileria genomes are highly compact, with selection seemingly favoring short introns and intergenic regions. Three over-represented sequence motifs were independently identified in intergenic regions of both Theileria species, and the evidence suggests that at least two of them play a role in transcriptional control in T. parva. These are prime candidates for experimental validation of transcription factor binding sites in this single-celled eukaryotic parasite. Sequences similar to two of these Theileria motifs are conserved in Plasmodium, hinting at the possibility of common regulatory machinery across the phylum Apicomplexa.

Figures

Figure 1
Distribution of length and GC content of non-coding DNA in T. parva. Histograms of intergenic region length (A) and intron length (B), and GC content (%) of intergenic regions (C) and introns (D). The length of intergenic regions fits a lognormal distribution (A), while GC content of both types of non-coding DNA is normally distributed (C, D). The histogram of intron length is bimodal, which probably results from the overlay of two independent distributions of small (mode = 55) and large (mode = 80) introns, much like what has been documented for other organisms [19]. Similar results are found in T. annulata (not shown).
Figure 2
Length, GC content and selective constraint distributions in three classes of intergenic regions in T. parva. Distributions of length (A), GC content (B) and selective constraint (C) per class of intergenic region (IGR) are depicted by boxplots. The three IGR classes (5'-5', 3'-5', 3'-3') are named according to the orientation of the flanking genes. Each box (interquartile range, IQR) contains the sample's 25% to 75% range (quartiles Q1 to Q3, respectively), with the bottleneck placed at the sample median. Horizontal tick marks show the range of all elements within Q1-1.5*IQR and Q3+1.5*IQR. Open circles mark data points outside this range, which are considered outliers. The width of the bottleneck (i.e., the length of the V-shaped notch) is an indication of the confidence of the median; a lack of overlap of the bottleneck between samples implies that the medians are significantly different at the ~95% confidence level. Similar results are found in T. annulata (not shown).
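The boxplot quantities described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say which software drew the plots, the function names `boxplot_summary` and `notches_overlap` are hypothetical, and the notch half-width 1.57*IQR/sqrt(n) is the common convention used by matplotlib (R uses 1.58), not a value taken from the paper. The sample values are made up.

```python
import math
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends, outliers and notch bounds for one sample."""
    data = sorted(values)
    # Q1, median, Q3 using the "inclusive" quantile method (an assumption).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    # Assumed convention for the notch: median +/- 1.57 * IQR / sqrt(n).
    half_notch = 1.57 * iqr / math.sqrt(len(data))
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
        "notch": (median - half_notch, median + half_notch),
    }


def notches_overlap(a, b):
    """True if the notch intervals of two summaries overlap."""
    return a["notch"][0] <= b["notch"][1] and b["notch"][0] <= a["notch"][1]


if __name__ == "__main__":
    # Hypothetical lengths for two IGR classes, for illustration only.
    class_a = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
    class_b = boxplot_summary(range(101, 111))
    print(class_a["outliers"], notches_overlap(class_a, class_b))
```

Non-overlapping notches are read here, as in the legend, as a rough indication that two medians differ at about the 95% level; it is a visual heuristic rather than a formal test.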
Figure 3
Length, GC content and selective constraint distributions across intron positions in T. parva. Distribution of length (A), GC content (B) and selective constraint (C) in introns of different ordinal numbers are depicted by boxplots. The last class averages across introns of position equal to or larger than 5. Graph description as in legend of Figure 2. In the boxplot of length distribution, a log scale is used for the vertical axis since intron lengths span a large range. Similar results are found in T. annulata (not shown).
Figure 4
Conserved motifs and their best matches in databases of known motifs. Pictogram representation of the top three MEME-derived motifs in T. parva and their most similar motifs in T. annulata (two left columns). The best matches to conserved motifs in T. parva among known motifs are shown (center), including name, database source, sequence logo, and STAMP E-value, which is a relative measure of similarity between two motifs based on simulated position specific score matrix models. Functional and structural annotations enriched in downstream genes of each T. parva motif are shown on the right.
Figure 5
Distribution of conserved sequence motifs with respect to putative transcription start sites in T. parva. The distances between the first base of each motif and the TSS were determined for all genes for which both types of information were available. Distances are binned in 10-bp intervals. The frequency is determined based on 132 sites for motif 1, 168 sites for motif 2, and 67 sites for motif 3.
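The 10-bp binning described in this legend can be illustrated with a minimal sketch. The function name `bin_distances` and the example distances are hypothetical; the only detail taken from the legend is the 10-bp bin width. Distances are assumed to be integers measured in base pairs from the first base of the motif to the TSS.

```python
from collections import Counter

BIN_WIDTH = 10  # bp, the interval size stated in the legend


def bin_distances(distances, bin_width=BIN_WIDTH):
    """Map each bin start (in bp) to the relative frequency of sites in it."""
    if not distances:
        return {}
    # Floor each distance to the start of its bin, e.g. 27 -> 20.
    counts = Counter((d // bin_width) * bin_width for d in distances)
    total = len(distances)
    return {start: counts[start] / total for start in sorted(counts)}


if __name__ == "__main__":
    # Made-up motif-to-TSS distances, for illustration only.
    for start, freq in bin_distances([3, 7, 12, 25, 29, 58]).items():
        print(f"{start}-{start + BIN_WIDTH - 1} bp: {freq:.2f}")
```

Dividing by the number of sites per motif is what makes the three distributions comparable despite their different site counts.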
Figure 6
Distribution of three conserved motifs in different partitions of the T. parva genome. The distribution of the three highest-scoring MEME-derived motifs in T. parva was determined in coding regions (CDS), 5'-5', 5'-3' and 3'-3' intergenic regions, and introns using the MAST algorithm. Relative frequency of each type of sequence with at least one occurrence of a motif is plotted as a function of the MAST E-value cutoff.
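The quantity plotted in Figure 6, the fraction of sequences with at least one motif occurrence at a given E-value cutoff, can be sketched as follows. This does not parse MAST output; it assumes the best (lowest) E-value per sequence has already been extracted, with `None` marking a sequence with no reported hit. The function name `fraction_with_hit` and all values are hypothetical.

```python
def fraction_with_hit(best_evalues, cutoffs):
    """For each cutoff, the fraction of sequences whose best E-value is <= cutoff.

    best_evalues holds one entry per sequence; None means no reported hit.
    """
    total = len(best_evalues)
    if total == 0:
        raise ValueError("need at least one sequence")
    return {
        cutoff: sum(1 for e in best_evalues if e is not None and e <= cutoff) / total
        for cutoff in cutoffs
    }


if __name__ == "__main__":
    # Made-up best E-values for four sequences of one partition.
    print(fraction_with_hit([1e-5, 0.02, None, 3.0], [1e-3, 0.1, 10]))
```

Running the same function once per partition (CDS, each intergenic class, introns) would give one curve per partition across the chosen cutoffs.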

References

    1. Norval RAI, Perry BD, Young AS. The epidemiology of Theileriosis in Africa. London: Academic Press; 1992.
    2. Dobbelaere DA, Kuenzi P. The strategies of the Theileria parasite: a new twist in host-pathogen interactions. Curr Opin Immunol. 2004;16:524–530. doi: 10.1016/j.coi.2004.05.009.
    3. Brown CGD, Stagg DA, Purnell RE, Kanhai GK, Payne RC. Infection and transformation of bovine lymphoid cells in vitro by infective particles of Theileria parva. Nature. 1973;245:101–103. doi: 10.1038/245101a0.
    4. Shiels B, Swan D, McKellar S, Aslam N, Dando C, Fox M, Ben-Miled L, Kinnaird J. Directing differentiation in Theileria annulata: old methods and new possibilities for control of apicomplexan parasites. Int J Parasitol. 1998;28:1659–1670. doi: 10.1016/S0020-7519(98)00131-3.
    5. van Noort V, Huynen MA. Combinatorial gene regulation in Plasmodium falciparum. Trends Genet. 2006;22:73–78. doi: 10.1016/j.tig.2005.12.002.
