Bioinformatics. 2015 Aug 15;31(16):2653-9.
doi: 10.1093/bioinformatics/btv202. Epub 2015 Apr 8.

GS-align for glycan structure alignment and similarity measurement

Hui Sun Lee et al. Bioinformatics. 2015.

Abstract

Motivation: Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. Comparative structural studies of biological molecules provide useful insight into their biological relationships. However, most computational tools are designed for protein structures, and despite the importance of glycans, no tool is currently available for comparing glycan structures in a sequence order- and size-independent manner.

Results: A novel method, GS-align, is developed for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition. The optimal alignment is then determined by the maximum structural similarity score, GS-score, which is size-independent. Benchmark tests against the Protein Data Bank (PDB) N-linked glycan library and PDB homologous/non-homologous N-glycoprotein sets indicate that GS-align is a robust computational tool to align glycan structures and quantify their structural similarity. GS-align is also applied to template-based glycan structure prediction and monosaccharide substitution matrix generation to illustrate its utility.
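The maximum clique step mentioned above can be illustrated with a small, generic sketch. This is not the GS-align implementation: the abstract does not say how the graph is built, so the correspondence-graph construction, the distance tolerance `tol`, and the function names `correspondence_graph` and `max_clique` are assumptions made for illustration. Each node pairs one residue position from structure A with one from structure B, two nodes are connected when the corresponding intra-structure distances agree within `tol`, and the largest clique (found here with Bron-Kerbosch and pivoting) gives the largest mutually consistent set of residue pairings. The fragment superposition step is not shown.

```python
from itertools import combinations
from math import dist


def correspondence_graph(coords_a, coords_b, tol=1.0):
    """Build a graph whose nodes are candidate pairings (i, j).

    Two pairings (i, j) and (k, l) are connected when the distance
    between i and k in A matches the distance between j and l in B
    within `tol` (a hypothetical tolerance, in the coordinate units).
    """
    nodes = [(i, j) for i in range(len(coords_a)) for j in range(len(coords_b))]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue may be paired only once
        d_a = dist(coords_a[i], coords_a[k])
        d_b = dist(coords_b[j], coords_b[l])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one maximum clique (sorted) using Bron-Kerbosch with pivoting."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = list(clique)
            return
        if len(clique) + len(candidates) <= len(best):
            return  # this branch cannot beat the best clique found so far
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates.remove(v)
            excluded.add(v)

    expand([], set(adj), set())
    return sorted(best)
```

Because the graph uses distances only, a pairing found this way cannot distinguish a structure from its mirror image; a subsequent superposition step would be needed to check that.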

Availability and implementation: http://www.glycanstructure.org/gsalign.

Contact: wonpil@ku.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Schematic illustration of the alignment algorithm in GS-align
Fig. 2. The average raw GS-score (rGS-score) and GS-score of random glycan pairs as a function of glycan length
Fig. 3. Representative examples to illustrate the relationship between GS-score and structural similarity. Each glycan structure (blue) aligned to the target glycan (green) in PDB:1L6X_A is shown with its PDB ID_chain ID and GS-score
Fig. 4. Cumulative fraction of glycan structure similarity using GS-score for the homologous and non-homologous protein sets. The gray lines in each plot represent the 35 individual glycan sequences, and the thick solid lines represent the average over all glycan sequences
Fig. 5. Comparison of glycan similarity (GS-score) with glycoprotein similarity (TM-score) through PDB N-glycan library search. (A) TM-score versus GS-score plot. All PDB N-glycans and their parent proteins were structurally compared with the target glycan and its parent glycoprotein (PDB:1L6X_A), respectively. The green dotted line indicates a GS-score (0.69) whose P-value is 1 × 10⁻³. (B) An example where proteins show distinct folds, but the GS-score between their glycans is high. (C) An example where proteins show similar global folds, but the GS-score between their glycans is low. In these examples, the two pairs of glycans have identical coverage (0.8)
Fig. 6. An example of template-based glycan structure prediction. (A) Three fragment structures from the 1L6X_A glycan, each of which has four residues (stick representation in green), were individually used as the query structure to search for templates in the PDB library. Three best template glycans (line representation in blue) were identified based on GS-score for each query structure. (B) Structural similarity between the target glycan and a structure assembled using the three-fragment templates in (A). (C) Structural similarity between the target glycan (entire 1L6X_A glycan) and its best template
Fig. 7. An example using GS-align for deriving a monosaccharide substitution matrix. (A) A representative example where two different glycans have similar structures (GS-score = 0.90) but different sequences. GlcNAc: N-acetyl-d-glucosamine, Man: d-mannose, Lyx: d-lyxose. Two unmatched residues (Man versus Lyx) are marked with red asterisks. (B) The percentages of other monosaccharides that can substitute α-d-mannose in highly similar glycan structure pairs (GS-score ≥ 0.8). For comparison, the percentage of α-d-mannose itself is also included in the table
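The kind of tally described for Fig. 7B can be sketched in a few lines. This is an illustration only, not the authors' procedure: the input layout (a GS-score plus a list of aligned residue-name pairs per glycan pair), the function name `substitution_percentages`, and the rule that a pair with the reference residue on both sides is counted once are all assumptions. Only the GS-score ≥ 0.8 cutoff comes from the caption.

```python
from collections import Counter


def substitution_percentages(alignments, reference, min_score=0.8):
    """Percentage of each residue type aligned to `reference`.

    `alignments` is a list of (gs_score, aligned_pairs) tuples, where
    aligned_pairs holds (residue_a, residue_b) names (hypothetical layout).
    Only glycan pairs with gs_score >= min_score are counted.
    """
    counts = Counter()
    for gs_score, aligned_pairs in alignments:
        if gs_score < min_score:
            continue
        for res_a, res_b in aligned_pairs:
            if res_a == reference:
                counts[res_b] += 1  # also counts reference aligned to itself
            elif res_b == reference:
                counts[res_a] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy input with invented scores and pairings, for illustration only.
toy_alignments = [
    (0.90, [("GlcNAc", "GlcNAc"), ("Man", "Man"), ("Man", "Lyx")]),
    (0.85, [("Man", "Man"), ("Glc", "Man")]),
    (0.50, [("Man", "Gal")]),  # below the cutoff, ignored
]
toy_table = substitution_percentages(toy_alignments, "Man")
```

With this toy input, `toy_table` assigns 50% to Man and 25% each to Lyx and Glc; the low-scoring pair contributes nothing.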
