Nucleic Acids Res. 2004 May 11;32(8):2566-77.
doi: 10.1093/nar/gkh580. Print 2004.

Predicting transmembrane beta-barrels in proteomes

Henry R Bigelow et al. Nucleic Acids Res. 2004.

Abstract

Very few methods address the problem of predicting beta-barrel membrane proteins directly from sequence. One reason is that only very few high-resolution structures for transmembrane beta-barrel (TMB) proteins have been determined thus far. Here we introduce the design, statistics and results of a novel profile-based hidden Markov model for the prediction and discrimination of TMBs. The method carefully attempts to avoid over-fitting the sparse experimental data. While our model training and scoring procedures were very similar to a recently published work, the architecture and structure-based labelling were significantly different. In particular, we introduced a new definition of beta-hairpin motifs, explicit state modelling of transmembrane strands, and a log-odds whole-protein discrimination score. The resulting method reached an overall four-state (up-, down-strand, periplasmic-, outer-loop) accuracy as high as 86%. Furthermore, it accurately discriminated TMB from non-TMB proteins (45% coverage at 100% accuracy). This high precision enabled the application to 72 entirely sequenced Gram-negative bacteria. We found over 164 previously uncharacterized TMB proteins at high confidence. Database searches did not associate any of these proteins with membranes. We expect that the vast majority of our 164 predictions will eventually be verified experimentally. All proteome predictions and the PROFtmb prediction method are available at http://www.rostlab.org/services/PROFtmb/.
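
To make the four-state accuracy quoted above concrete, the following is a minimal sketch that scores a prediction per residue. The abstract does not say how the 86% figure was averaged, so the per-residue definition, the function name and the one-letter state codes (U, D, P, O) are assumptions made for illustration only.

```python
def four_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state.

    Hypothetical one-letter codes (not taken from the paper):
    'U' up-strand, 'D' down-strand, 'P' periplasmic loop, 'O' outer loop.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed labels must have equal length")
    if not observed:
        raise ValueError("at least one residue is required")
    # Count positions where the two label strings agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example with eight residues, one of which is mislabelled.
toy_accuracy = four_state_accuracy("UUPPDDOO", "UUPPDDOP")
```

A per-protein average or a segment-based measure would give different numbers; this sketch only illustrates the simplest reading.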

Figures

Figure 1
Model architecture and structure-based labelling. Dashed rectangles/arrows are shorthand for showing an arrow connection for each node in the rectangle. Strands are shown to depict the alternating lipid/water environments of the residues. State labels indicate two things: (i) the sequence label for which the state is valid and (ii) the set of emission parameters. Note that states sharing a label also share the same set of emission parameters. The structure-based labelling is illustrated for R.blastica porin [PDB code 1prn (49)].
Figure 2
Discrimination between TMB and non-TMB proteins. The ROC curves, truncated at the point where false positives = 2 × true positives (Materials and Methods), suggested a slight improvement of our model (dark lines with circles) over the previous work by Martelli et al. (36) (grey lines with crosses). Note that the thin lines indicate tests on the data set previously used (SetROCcomp), while the thick line marks our much larger set (SetROC). The larger set suggested lower performance (open versus filled circles). Standard deviations of ROCn scores (Materials and Methods) reveal the amount of noise.
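
The truncated ROC curve described in this caption can be sketched as follows. The exact procedure is given in the paper's Materials and Methods, which are not reproduced here, so this sketch assumes one plausible reading: proteins are ranked by score and the curve is cut once the number of false positives reaches twice the number of true TMBs in the set. The function name, the `fp_factor` parameter and the normalisation are illustrative assumptions, not the authors' code.

```python
def truncated_roc(scored, fp_factor=2):
    """Return (points, rocn) for a list of (score, is_tmb) pairs.

    points: cumulative (false_positives, true_positives) along the ranking.
    rocn:   area under the truncated curve, normalised to the range 0..1.
    Assumption: the cutoff is fp_factor * (number of true TMBs in the set).
    Ties in score are broken arbitrarily by the sort.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_tmb in ranked if is_tmb)
    max_fp = fp_factor * n_pos
    tp = fp = area = 0
    points = [(0, 0)]
    for _, is_tmb in ranked:
        if fp >= max_fp:
            break  # truncate the curve at the false-positive cutoff
        if is_tmb:
            tp += 1
        else:
            fp += 1
            area += tp  # each false positive adds a column of height tp
        points.append((fp, tp))
    # If the set holds fewer than max_fp negatives, this normalisation
    # underestimates the score; a real analysis would need to handle that.
    rocn = area / (max_fp * n_pos) if n_pos else 0.0
    return points, rocn


# Toy data: one true TMB ranked above two non-TMBs gives a perfect score.
toy_points, toy_rocn = truncated_roc([(3.0, True), (2.0, False), (1.0, False)])
```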
Figure 3
Threshold for accurate discrimination. Higher whole-protein discrimination scores (equation 6) yielded higher accuracy (correctly predicted TMBs/predicted TMBs) in discriminating between TMBs and non-TMBs (black line with filled circles). The flipside of this was low coverage (correctly predicted TMBs/observed TMBs) at high accuracy (dotted grey lines mark thresholds used in Fig. 4). Nevertheless, this analysis performed on the largest possible sequence-unique data set (SetROC) suggested an impressive performance: 100% accuracy at levels of 45% coverage. (Note that the density of proteins, indicated by symbols, was much higher toward lower whole-protein scores since most proteins in SetROC did not have beta-barrels.)
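
The two ratios defined in this caption can be computed for any score threshold as in the sketch below. The definitions of accuracy and coverage are taken from the caption; the function name, the input layout and the choice of `>=` at the threshold are assumptions.

```python
def accuracy_coverage(scored, threshold):
    """Accuracy and coverage of TMB discrimination at a score threshold.

    scored: list of (whole_protein_score, is_tmb) pairs.
    accuracy = correctly predicted TMBs / predicted TMBs
    coverage = correctly predicted TMBs / observed TMBs
    Either value is None when its denominator is zero.
    """
    # Proteins at or above the threshold count as predicted TMBs (assumption).
    predicted = [is_tmb for score, is_tmb in scored if score >= threshold]
    observed = sum(1 for _, is_tmb in scored if is_tmb)
    correct = sum(predicted)
    accuracy = correct / len(predicted) if predicted else None
    coverage = correct / observed if observed else None
    return accuracy, coverage


# Toy data, not taken from the paper: raising the threshold trades
# coverage for accuracy.
toy = [(12.0, True), (9.0, True), (7.0, False), (3.0, True), (1.0, False)]
strict = accuracy_coverage(toy, threshold=8)
lenient = accuracy_coverage(toy, threshold=2)
```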
Figure 4
Transmembrane barrels predicted in entire proteomes. For each proteome, we reported the numbers of proteins in each set (or intersection of sets). Sets IOM, IOM and their homologues (IOM_homo), outer membrane (OM) and OM_homo are disjoint (Materials and Methods), while the set ‘PROFtmb’ denotes all proteins achieving a whole-protein score above 8 (equation 6), and is not disjoint from the four sets above. Thus, categories in this graph are named according to the sets to which the proteins belong. For example, ‘PROFtmb 8→15’ denotes all previously un-annotated proteins achieving a score between 8 and 15, and similarly for ‘PROFtmb 15→20’ and ‘PROFtmb >20’. ‘OM + PROFtmb’ denotes all proteins annotated as outer membrane which also achieve a PROFtmb score above 8. ‘IOM’ denotes all proteins in IOM but with PROFtmb scores <8 (i.e. not in set PROFtmb). Note that all findings in typical Gram-positive bacteria constitute false positives. Categories IOM_homo and OM_homo without PROFtmb predictions are not reported, since many proteins in these sets are likely not generic TMBs.
The following proteomes had neither known IOMs nor yielded any PROFtmb predictions: Gram-negative: Archaeoglobus fulgidus, Blochmannia floridanus, Buchnera aphidicola_Sg, Buchnera sp., Halobacterium sp., Leptospira interrogans, Mesorhizobium loti, Methanobacterium thermoautotrophicum, Methanococcus jannaschii, Methanosarcina mazei, Nostoc sp., Pirellula sp., Pyrococcus abyssi, Pyrococcus furiosus, Pyrococcus horikoshii, Sinorhizobium meliloti, Sulfolobus solfataricus, Sulfolobus tokodaii, Thermoplasma volcanium, Thermosynechococcus elongatus, Ureaplasma urealyticum, Wigglesworthia brevipalpis; atypical Gram-positive: Mycobacterium bovis, Mycobacterium leprae; typical Gram-positive: Bacillus subtilis, Clostridium perfringens, Enterococcus faecalis, Listeria innocua, Mycoplasma gallisepticum, Mycoplasma genitalium, Mycoplasma penetrans, Mycoplasma pneumoniae, Mycoplasma pulmonis, Oceanobacillus iheyensis, Staphylococcus aureus, Streptococcus pyogenes.
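
The naming scheme for the Figure 4 categories can be illustrated with a small lookup function. This is a sketch under stated assumptions: the caption spells out only some categories, so the handling of scores exactly at 8, 15 or 20, the ‘IOM + PROFtmb’ and ‘*_homo + PROFtmb’ labels, and the plain ‘OM’ label are inferred by analogy rather than taken from the paper. The function name and argument layout are hypothetical.

```python
def figure4_category(score, annotation=None):
    """Map a protein to a Figure 4 style category label, or None if unreported.

    score:      whole-protein score of the protein.
    annotation: one of 'IOM', 'IOM_homo', 'OM', 'OM_homo', or None for
                previously un-annotated proteins.
    """
    in_proftmb = score > 8  # set 'PROFtmb': score above 8 (boundary assumed)
    if annotation is None:
        if not in_proftmb:
            return None  # un-annotated and not predicted: not shown
        if score > 20:
            return "PROFtmb >20"
        if score > 15:
            return "PROFtmb 15→20"
        return "PROFtmb 8→15"
    if in_proftmb:
        # 'OM + PROFtmb' is named in the caption; the others are by analogy.
        return f"{annotation} + PROFtmb"
    # Without a PROFtmb prediction, the *_homo sets are not reported.
    # 'IOM' is named in the caption; plain 'OM' is an inference.
    return annotation if annotation in ("IOM", "OM") else None


# Toy calls with made-up scores.
labels = [
    figure4_category(12.5),
    figure4_category(22.0),
    figure4_category(9.0, "OM"),
    figure4_category(5.0, "IOM"),
    figure4_category(5.0, "OM_homo"),
]
```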

References

    1. Schulz, G.E. (2000) beta-Barrel membrane proteins. Curr. Opin. Struct. Biol., 10, 443–447.
    2. Pautsch, A. and Schulz, G.E. (2000) High-resolution structure of the OmpA membrane domain. J. Mol. Biol., 298, 273–282.
    3. Forst, D., Welte, W., Wacker, T. and Diederichs, K. (1998) Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nature Struct. Biol., 5, 37–46.
    4. Wang, Y.F., Dutzler, R., Rizkallah, P.J., Rosenbusch, J.P. and Schirmer, T. (1997) Channel specificity: structural basis for sugar discrimination and differential flux rates in maltoporin. J. Mol. Biol., 272, 56–63.
    5. Koebnik, R., Locher, K.P. and Van Gelder, P. (2000) Structure and function of bacterial outer membrane proteins: barrels in a nutshell. Mol. Microbiol., 37, 239–253.