PLoS One. 2016 Jul 28;11(7):e0159627.
doi: 10.1371/journal.pone.0159627. eCollection 2016.

Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

Atul Kumar Upadhyay et al. PLoS One. 2016.

Abstract

3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. Proteins that undergo domain swapping have attracted much attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of the proteins in the whole human genome that tend to domain swap will aid many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to better understand the functional importance of these sequences. Finally, we developed hinge region prediction for a given putative domain-swapped sequence, using important physicochemical properties of amino acids.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Cartoon representation of 3D-domain swapping.
The region that connects the swapped domain with the non-swapped part of the protein is known as the “hinge region” and is marked in dark red. There is a newly formed interface between the non-swapped regions of the two monomeric units. The domain-swapped interface is present in both the monomer and the domain-swapped molecule.
Fig 2
Fig 2. Workflow of 3D-domain swap prediction and analysis of aggregation-related sequences from the human genome.
Out of 136 aggregation-related sequences, 99 were predicted to be involved in domain swapping, and their distribution across different Pfam domain families is plotted in the pie chart. All the positively predicted sequences were searched for their structural homologues.
Fig 3
Fig 3. Functional annotation of sequences from the human genome predicted to be domain swapped, at three different levels.
(A) Pfam protein families with the largest numbers of sequences from the human genome predicted to be domain swapped. (B) Biological pathways containing the largest numbers of these protein sequences. (C) Distribution of these sequences in different protein families.
Fig 4
Fig 4. Preferred Gene Ontology (GO) terms in positively predicted sequences from human genome.
The human genome is used as the reference and the WEGO plotting tool is used. Lists of the GO terms for cellular component and biological function, corresponding to the X-axis labels, are provided in S1 File.
Fig 5
Fig 5. Case study on three different proteins of known structures.
The blue circles mark experimentally known hinge regions (shown in red on the left) that agree with our predictions. (A) Seminal ribonuclease (PDB code: 11BA), (B) promyelocytic leukemia zinc finger protein PLZF (PDB code: 1BUO) and (C) SH3 domain (PDB code: 1AOJ). The complete list is provided in S3 Table.
Fig 6
Fig 6. Workflow to generate the negative dataset from monomeric structures in the Protein Data Bank (PDB).
The BRP approach was used to find sequences from Pfam families that do not have known examples of domain swapping (please see Methods for details). DIAL was used for prediction of domain swapping in the given sequences.
Fig 7
Fig 7. Overall workflow of the method used in this study and creation of positive and negative datasets.
Feature selection by WEKA, and prediction model creation by Random Forest and Support Vector Machine. Genome-wide association study of sequences predicted to undergo domain swapping in the human genome.
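
As a rough illustration of what enrichment means for the numbers reported above, the sketch below asks how surprising it would be to see 99 of 136 aggregation-related sequences predicted positive (Fig 2) if each had the genome-wide chance of about 44% (Abstract). This is a simple one-sided binomial test chosen for illustration; it is an assumption, not the test the authors used, and the name `binomial_upper_tail` is hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


n_subset = 136          # aggregation-related sequences (Fig 2)
k_positive = 99         # of those, predicted to domain swap (Fig 2)
background_rate = 0.44  # "nearly 44%" of human sequences predicted positive (Abstract)

expected = n_subset * background_rate               # positives expected by chance
fold = (k_positive / n_subset) / background_rate    # fold enrichment over background
p_value = binomial_upper_tail(k_positive, n_subset, background_rate)

print(f"expected positives: {expected:.1f}")
print(f"fold enrichment: {fold:.2f}")
print(f"one-sided binomial p-value: {p_value:.2e}")
```

A hypergeometric test would be the more usual choice when the total number of sequences in the genome is known; the binomial form is used here only because the source gives the background as a percentage.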

Grants and funding

This work was supported by NCBS (National Centre for Biological Sciences), TIFR (Tata Institute of Fundamental Research) fellowship, both to AKU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
