PLoS Comput Biol. 2008 Jan;4(1):e5.
doi: 10.1371/journal.pcbi.0040005. Epub 2007 Nov 27.

In silico detection of sequence variations modifying transcriptional regulation

Malin C Andersen et al. PLoS Comput Biol. 2008 Jan.

Abstract

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Impact on TFBS Score of Mutations Inserted into Synthesized TFBS Sequences
The boxes correspond to score deltas for (from left to right) 1 bp substitutions, 2 bp substitutions at adjacent positions, two randomly placed 1 bp substitutions, 3 bp substitutions both in adjacent and at random positions, four randomly placed base pair substitutions, five randomly placed substitutions, one randomly placed 1 bp insertion, and one randomly placed 1 bp deletion.
Figure 2. Fractions of Regulatory and Background SNPs Overlapping Predicted TFBSs
SNPs were analyzed using all transcription factors in the JASPAR database, and using TFBS score delta thresholds between one and nine.
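As a rough illustration of a TFBS score delta, the sketch below scores both alleles of a SNP against a position weight matrix and takes the absolute difference. It assumes a log-odds scoring scheme with a uniform background and a pseudocount of 1; the count matrix is a made-up toy, not a JASPAR profile, and the function names are hypothetical rather than taken from RAVEN.

```python
import math

# Hypothetical 4-position count matrix (not a real JASPAR profile).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 1],
    "T": [1, 1, 0, 7],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        }
        pwm.append(column)
    return pwm


def score(pwm, site):
    """Sum the log-odds of each base of `site` (same length as the matrix)."""
    return sum(column[base] for column, base in zip(pwm, site))


def score_delta(pwm, ref_site, alt_site):
    """Absolute score difference between the two alleles (assumed definition)."""
    return abs(score(pwm, ref_site) - score(pwm, alt_site))


pwm = log_odds_matrix(COUNTS)
delta = score_delta(pwm, "ACGT", "ACTT")  # SNP G>T at the third position
print(round(delta, 3))
```

Under these assumptions, a SNP would count as affecting a predicted site when the delta meets a chosen threshold, which is the quantity varied between one and nine in the figure.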
Figure 3. Fractions of Regulatory and Background SNPs in Evolutionarily Conserved Regions
SNPs were given the mean phastCons scores from multiple alignments of human, chimp, mouse, rat, dog, chicken, fugu, and zebrafish in windows of 21 bp centered at the SNPs. The fractions of SNPs located within conserved regions were calculated for mean phastCons score thresholds between 0.1 and 0.9. For every threshold a Fisher's exact test was performed to test if there was a significantly different frequency of successes in the regulatory versus the background SNP sets; p-values are indicated above each pair of bars.
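The windowed conservation score and the per-threshold test described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the window is truncated at sequence edges (an assumption), and the two-sided Fisher's exact test is implemented directly from the hypergeometric distribution rather than taken from a statistics library.

```python
from math import comb


def window_mean(scores, center, width=21):
    """Mean of per-base scores in a window of `width` bp centered at `center`.

    The window is truncated at the ends of `scores` (an assumption).
    """
    half = width // 2
    lo = max(0, center - half)
    hi = min(len(scores), center + half + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Toy usage: rows are regulatory / background SNPs, columns are
# conserved / not conserved at one threshold (made-up counts).
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

For each threshold between 0.1 and 0.9, the table would be rebuilt by counting how many SNPs in each set have a window mean at or above that threshold.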
Figure 4. Distributions of Mean phastCons Scores for SNPs Located at Different Distances from the TSS
SNPs were given the mean phastCons scores from multiple alignments of human, chimp, mouse, rat, dog, chicken, fugu, and zebrafish in windows of 21 bp centered at the SNPs. For every interval, a Student's t-test was performed to check whether the distributions of phastCons values differed significantly between the regulatory and background SNPs; the p-values from these tests are indicated above each pair of boxes.
Figure 5. Combination of TFBS Analysis and Phylogenetic Footprinting
Sensitivity of the predictions is plotted versus 1-specificity for phastCons score thresholds of 0, 0.1, 0.2, etc., up to 0.9. The whole range of values is shown only for the red curve; for the other curves, values for phastCons score thresholds 0 and 0.1 are outside the area covered by the plot. The curves correspond to different TFBS score delta thresholds. In the left panel, the relative TFBS score threshold for the best matching allele was 80%; in the right panel, it was 90%.
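A curve of this kind can be sketched by calling a SNP a predicted regulatory variant only when both its conservation score and its TFBS score delta pass their thresholds, then sweeping the conservation threshold. The snippet below is a hedged illustration with made-up data; the function name, the use of `>=` for both cutoffs, and the tuple layout are assumptions, not details from the paper.

```python
def roc_points(regulatory, background, cons_thresholds, delta_threshold):
    """Return (threshold, sensitivity, 1 - specificity) for each conservation cutoff.

    Each SNP is a (mean_phastcons, tfbs_score_delta) pair. A SNP is called
    positive when both values meet their thresholds (assumed rule).
    """

    def called(snp, cons_threshold):
        cons, delta = snp
        return cons >= cons_threshold and delta >= delta_threshold

    points = []
    for t in cons_thresholds:
        true_pos = sum(called(s, t) for s in regulatory)
        false_pos = sum(called(s, t) for s in background)
        sensitivity = true_pos / len(regulatory)
        one_minus_specificity = false_pos / len(background)
        points.append((t, sensitivity, one_minus_specificity))
    return points


# Made-up example data: (mean phastCons, TFBS score delta) per SNP.
regulatory = [(0.9, 3.0), (0.5, 1.0)]
background = [(0.1, 4.0), (0.6, 2.5)]

for point in roc_points(regulatory, background, [0.0, 0.5, 0.8], delta_threshold=2.0):
    print(point)
```

Repeating the sweep for several values of `delta_threshold` would give one curve per TFBS score delta threshold, as in the figure.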
Figure 6. Distributions of TFBS Score Delta Values for Background SNPs, Regulatory SNPs, and Regulatory SNPs for Which the Affected TFBS Is Known
In the three leftmost boxes, the average score delta for all matches to any PWM in the JASPAR database was collected for every SNP. In the rightmost box, the score delta for the PWM corresponding to the verified TFBS was collected for every SNP.
Figure 7. Overview of the RAVEN Web Interface
(A) The search page. (B) The search results page, where a list of genes corresponding to the search query is displayed. (C) The reference sequence selection page, where the genomic location of the selected human sequence and the cDNAs that map to it are displayed. (D) The graphical results view. (E) Table view of SNPs predicted to affect TFBSs. (F) Selection of TFBS profiles from the JASPAR database. (G) Upload of private SNP sequences.
