2008 Jan 1;24(1):11-7.
doi: 10.1093/bioinformatics/btm547. Epub 2007 Nov 15.

Determination and validation of principal gene products

Michael L Tress et al. Bioinformatics.

Abstract

Motivation: Alternative splicing has the potential to generate a wide range of protein isoforms. For many computational applications and for experimental research, it is important to be able to concentrate on the isoform that retains the core biological function. For many genes this is far from clear.

Results: We have combined five methods into a pipeline that allows us to detect the principal variant for a gene. Most of the methods were based on conservation between species, at the level of both gene and protein. The five methods used were the conservation of exonic structure, the detection of non-neutral evolution, the conservation of functional residues, the existence of a known protein structure and the abundance of vertebrate orthologues. The pipeline was able to determine a principal isoform for 83% of a set of well-annotated genes with multiple variants.
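One way to picture how such a pipeline might combine its methods is sketched below. This is an illustration only, not the authors' implementation: the abstract does not say how the method outputs are merged, so the rule used here (a variant is the principal one when it is the only variant that no method rejects) is an assumption, as are the function names, method labels and variant identifiers.

```python
from typing import Dict, Optional, Set


def principal_variant(variants: Set[str],
                      rejections: Dict[str, Set[str]]) -> Optional[str]:
    """Return the only variant that no method rejected, else None.

    Assumed combination rule (not taken from the paper): a gene has a
    principal variant only when exactly one variant survives all methods.
    """
    rejected = set().union(*rejections.values())
    survivors = variants - rejected
    return next(iter(survivors)) if len(survivors) == 1 else None


def rejection_counts(rejections: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count how many methods rejected each variant."""
    counts: Dict[str, int] = {}
    for rejected in rejections.values():
        for variant in rejected:
            counts[variant] = counts.get(variant, 0) + 1
    return counts


# Hypothetical gene with three variants; the rejections are made up.
variants = {"001", "002", "003"}
rejections = {
    "exonic_structure": {"002"},
    "non_neutral_evolution": {"002", "003"},
    "functional_residues": set(),
    "known_structure": set(),
}

print(principal_variant(variants, rejections))
print(rejection_counts(rejections))
```

Keeping the per-method rejection sets separate, rather than a single merged list, makes it straightforward to see which variants were rejected by more than one method.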

Figures

Fig. 1
Human and mouse exonic structure for the six variants of SF1. Human exonic structure for the SF1 variants (locus AP001462.5) is shown on the left, mouse exonic structure on the right. Differences between human and mouse exonic structure are shown as white blocks and flagged by arrows. For variants 001 and 005 (the variant that codes for the SwissProt display sequence) exonic structure is not conserved.
Fig. 2
The visual SLR output for the three variants of the TH locus (AC132217.7). At the top, the horizontal dashed lines indicate which exon belongs to which transcript (from top to bottom, 001 in red, 002 in green and 003 in blue). The colour of the circles (SLR score) and bars (confidence intervals) denotes selection mode. Below this is a per-nucleotide measure of conservation, with abnormally fast sites coloured orange (third codon positions) or red (first or second codon positions). The black hatching at the foot is a record of the number of sequences available at each alignment position. The whole of the second coding exon, between positions 100 and 200, is clearly differently conserved, suggesting that it is under unusual selective pressures. This exon is present in variants 002 and 003. On the basis of the SLR output, we rejected the hypothesis that these two variants could give rise to the principal functional isoform.
Fig. 3
Sequence to structure mapping. Human neurexin 2 is 82% sequence identical to neurexin 1 over the second LNS/LG domain. From the alignments between the two neurexins, we were able to map the sequence of the four variants of the NRXN2 gene (AP001092.3) onto the neurexin 1 structure (2h0b). The SwissProt display sequence (from variant 001) has an insertion of 15 residues relative to the structure of neurexin 1. These 15 residues would need to be squeezed in between the residues marked in red and yellow on the structure, thus breaking a beta sheet. This variant is therefore unlikely to be the primary variant.
Fig. 4
Mapping functional residues. The output from firestar, showing the N-terminal end of the alignment between variant 001 of RAD50 (AC0040401.1, labelled as ‘Query’) and a sequence with known structure (1xexA). The top line indicates the residue number of the variant. Below the alignment, the numbers in coloured squares indicate locally conserved regions. The last two lines indicate the presence of ligand-binding residues for ATP and magnesium. These last two rows are colour-coded: the darker the colour, the more conserved the residue. The structure clearly has conserved ATP-binding residues and these are unlikely to have arisen by chance. Variant 002 (data not shown) is missing 139 residues from the N-terminus, including these functionally important residues, and was rejected as a candidate to be the principal variant.
Fig. 5
The numbers of variants discounted by each of the first four pipeline methods, showing where variants were rejected by more than one method.
