A series of PDB-related databanks for everyday needs

Wouter G Touw; Coos Baakman; Jon Black; Tim A H te Beek; E Krieger; Robbie P Joosten; Gert Vriend

doi:10.1093/nar/gku1028

. 2014 Oct 28;43(Database issue):D364–D368. doi: 10.1093/nar/gku1028

A series of PDB-related databanks for everyday needs

Wouter G Touw ¹, Coos Baakman ¹, Jon Black ¹, Tim A H te Beek ², E Krieger ¹, Robbie P Joosten ^1,³, Gert Vriend ^1,^*

PMCID: PMC4383885 PMID: 25352545

Abstract

We present a series of databanks (http://swift.cmbi.ru.nl/gv/facilities/) that hold information that is computationally derived from Protein Data Bank (PDB) entries and that might augment macromolecular structure studies. These derived databanks run parallel to the PDB, i.e. they have one entry per PDB entry. Several of the well-established databanks such as HSSP, PDBREPORT and PDB_REDO have been updated and/or improved. The software that creates the DSSP databank, for example, has been rewritten to better cope with π-helices. A large number of databanks have been added to aid computational structural biology; some examples are lists of residues that make crystal contacts, lists of contacting residues using a series of contact definitions or lists of residue accessibilities. PDB files are not the optimal presentation of the underlying data for many studies. We therefore made a series of databanks that hold PDB files in an easier to use or more consistent representation. The BDB databank holds X-ray PDB files with consistently represented B-factors. We also added several visualization tools to aid the users of our databanks.

INTRODUCTION

The Protein Data Bank (PDB) is the worldwide repository of macromolecular structures determined experimentally mainly by X-ray crystallography, NMR or electron microscopy (1–4). The more than 100.000 entries in the PDB form a valuable source of information, and PDB entries are used all over the world in a wide variety of research projects in academia and industry alike.

When the PDB was conceived in 1971 (5), the initiators could hardly have imagined that their punchcard based PDB file format (1,6) would survive for more than 40 years. Although the PDB has superseded its archaic punchcard compromise between inclusiveness and human readability by the PDBx/mmCIF file format (3,6), in practice the old format is still used by most software applications. The PDB format is the source of many problems, some of which are addressed by our databanks.

All databanks are available from http://swift.cmbi.ru.nl/gv/facilities/. This site also provides extensive documentation that includes help for downloading individual files or whole databanks.

UPDATE ON EXISTING DATABANKS

Table 1 lists the main databanks with a brief description of their content. In the following sections we will describe the progress on the existing systems since we previously reported on them (7).

Table 1. List of databanks, their content and location.

Existing databanks
DSSP	Secondary structure of proteins http://swift.cmbi.ru.nl/gv/dssp/
HSSP	Multiple sequence alignments of UniProtKB against PDB http://swift.cmbi.ru.nl/gv/hssp/
PDBFINDER and PDBFINDER2	Searchable PDB entry meta-data and derived information http://swift.cmbi.ru.nl/gv/pdbfinder/
PDBREPORT	Lists many types of anomalies and errors in structures http://swift.cmbi.ru.nl/gv/pdbreport/
PDB_REDO	Re-refined and rebuilt crystallographic structure models http://www.cmbi.ru.nl/pdb_redo/
PDB_SELECT	Quality-sorted sequence-unique PDB chains http://swift.cmbi.ru.nl/gv/select/
WHY_NOT	Explains why entries in any databank do not exist http://www.cmbi.ru.nl/WHY_NOT/
New databanks
BDB	PDB entries with a consistent B-factor representation http://www.cmbi.ru.nl/bdb/
WHAT IF Lists	Lists for protein structure bioinformaticians http://swift.cmbi.ru.nl/gv/lists/
YASARA Scenes	YASARA scenes showing protein structure properties http://www.cmbi.ru.nl/pdb-vis/

Open in a new tab

Existing databanks were published earlier (7).

Most of our systems have been prepared for the PDB's transition from the PDB file format to the mmCIF file format. Most databanks are now also available derived from PDB_REDO structure models.

DSSP is the de facto standard for the assignment of secondary structure elements in PDB entries. The DSSP (8) software has been rewritten to better recognize π-helices (8–10). The determination of π-helices still follows the original description by Kabsch and Sander (8), but the assignment of π-helices is now given precedence over the assignment of α-helices, which should prevent underestimating the number of π-helices (9,10).

HSSP (11–15) multiple sequence alignments (MSAs) are now created with an improved version of the original Sander and Schneider (11) algorithm. These files are available in the original HSSP format and in Pfam Stockholm format (16). The Stockholm format is used by applications like HMMER (17) and Jalview (18). Projects like BioJava (19), BioPerl (20) and Biopython (21) provide parsers for these Stockholm-formatted files.

The PDBFINDER and PDBFINDER2 databanks (22) are now created using a slightly modified algorithm that better deals with exceptions in PDB files. The changes to the DSSP and HSSP software have also been taken into account. Furthermore, the two single flat text files are now compiled from separate files for each PDB ID.

New developments in the PDB_REDO decision-making algorithms were described elsewhere (23). Many recent improvements in PDB_REDO focus on enabling user-friendly data-mining and visualization. A list of all significant structural changes like changed rotamers and flipped peptide planes (24) is available in an easy-to-mine format to quickly figure out whether PDB_REDO has changed residues-of-interest in a particular PDB entry. Model validation data such as WHAT_CHECK Z-scores, crystallographic R-factors, per-residue measures of fit to the crystallographic data (real-space R-factors (25), and real-space correlation coefficients (26)), as well as comprehensive descriptions of ligand quality and structural interactions (27) are now also provided. A description of all data from PDB_REDO entries is given in Supplementary Table S1.

PDB_SELECT (28) now also provides quality-sorted sequence-redundant lists. These lists do not include entries deemed unwanted for bioinformatics purposes, e.g. entries that contain too many severe errors, too many incomplete or non-canonical amino acids or homology models.

The WHY_NOT indexing algorithm has been adapted to deal with the many novel databanks.

A NEW MAJOR DATABANK

Macromolecules are not static. The displacement of atoms in crystal structures can be modeled at various levels of detail. B-factors are commonly used to model the displacement of single atoms, while translation, libration and screw-rotation (TLS) parameters model the displacement of groups of atoms. Unfortunately, the meaning of the B-factor values on the ATOM records of PDB files is not always unambiguous. For example, ‘residual’ rather than ‘full’ B-factors have been reported for thousands of PDB structure models for which both B-factors and TLS parameters had been refined. Residual B-factors do not include the contribution of the TLS motion (29). The Databank of PDB files with consistent B-factors (BDB) (30) homogenizes the B-factor representations in PDB files to aid the bioinformatics and protein engineering applications that depend on B-factors (e.g. (31–36)). For every crystallographic PDB entry there is a BDB entry. The files in the BDB are simply identical to those in the PDB if full B-factors have been reported, but they contain full B-factors calculated from the PDB file data if the meta-data in the PDB file suggest that this is necessary.

WHAT GOOD IS BEAUTY, IF IT IS NOT TO BE SEEN?

The main users of PDB files are bioscientists in fields as diverse as drug design, molecular biology or biofuel engineering. These researchers often are not aware of all problems that come with the use of PDB files; see, for example, the B-factor problems that we addressed with the BDB.

A problem that is ubiquitous for all structures solved by X-ray crystallography is the implicit description of symmetry related ions, waters and ligands (the absence of symmetry related macromolecules has been solved, for example, with PISA (37). Figure 1 illustrates this problem.

The problem illustrated in Figure 1 has been addressed in two databanks. One databank holds PDB files that include the symmetry-related waters. A second databank has been compiled from PDB files with a shell of symmetry-related residues.

Solvent exposed amino acid side chains tend to be mobile, and as a consequence are not observable in the electron density map that is at the basis of modeling atomic coordinates in X-ray crystallography. The absence of side chains is unlikely to remain unnoticed, but may cause problems for protein structure software or perturb structural analyses otherwise. We therefore made one databank in which we computationally filled in the missing side chains using the rotamer library that is also at the basis of WHAT IF's homology modeling module (38).

DATABANKS SPECIFICALLY FOR BIOINFORMATICIANS

Any aspiring protein structure bioinformatician will need to write or obtain a PDB file parser before he or she can start working on the intended research project. Writing a parser that can cope with a large enough fraction of all problems in PDB files can be a major practical problem. We made a large number of databanks to overcome this problem in many cases. These databanks include the per-residue molecular and solvent accessible surface area, the secondary structure in four states (helix, strand, turn, loop), the number of crystal contacts, torsion angles and backbone angles. Lists of salt bridges and metal-coordinating residues are also created.

Recently, several groups made breakthroughs in the field of ab initio protein structure prediction (39–41). The idea behind these methods is that correlations between the variability patterns of residue positions i and j in a multiple sequence alignment are indicative for a contact between those residues i and j. We believe that these studies could benefit from a better definition of what constitutes an inter amino acid contact. To support research in this field a large group of databanks has been made in which contacting amino acids are listed. In each databank contacts are defined in a different way (direct atomic contacts; Cα–Cα distances; side chain contacts only; etc.).

NEW VISUALIZATION TOOLS

The CMBI databanks provide a wealth of structure-related information. We aim to provide this information in files that are bioinformatician-friendly. Not all users, however, may feel equally comfortable writing scripts to show the number of crystal contacts per residue in 3D, to create entropy-variability (EV) plots from an HSSP alignment or to visualize the structural changes in a PDB_REDO optimized structure model. For convenience, and simply to speed up protein structure analyses, we created a set of visualization tools.

Optimized PDB_REDO structure models and their corresponding electron density maps are now directly available within the programs COOT (42) and CCP4mg (43), structure models are also directly available in YASARA (44). COOT additionally shows a list of all significant structural changes. A plugin for PyMOL (http://www.pymol.org/) to show structure models and their electron density is available from the PDB_REDO website. The required maps are generated on-the-fly in the CCP4 (45) format and are also supported by many other programs such as Jmol (46).

Combined PDB and BDB B-factor plots on the BDB website allow the user to rapidly see the corrections made to the PDB B-factors.

On our recently developed web tool pdb-vis (http://www.cmbi.ru.nl/pdb-vis/) secondary structure, symmetry contacts and several accessibility representations are visualized in 2D together with the protein sequence. PDBsum (47) provides many complementary pictures. Several types of 3D structure scenes are also available from pdb-vis, such as scenes of residues that make crystal contacts, or close-ups of metals and bound ligands. Scenes provide a convenient way of highlighting a specific structural feature or local region since the structure, visualization style and viewpoint are stored in a scene file. All scenes can always be inspected with the freely available molecular graphics program YASARA_View (44).

It has long been known that EV analysis of MSAs can elucidate the functional role of residues (48,49). EV values can be calculated from the HSSP alignment and they can be interpreted in a 3D context. We developed the visualization tool VASE (Visualization of Alignments, Structure and Entropy) (50) that connects the three components structure, alignment and entropy/variability in a single browser window. Selected residue positions in the HSSP MSA are color-coded in the 3D structure and vice versa (Figure 2). The web interface also shows the EV values (48) for the selected residues in a table or for all residues in an EV plot (Figure 2 inset).

Figure 2. — VASE screenshot. A selected position in the HSSP alignment (red column in the top panel) is shown in the structure (bottom right) and in the calculated entropy and variability table (bottom left). The inset shows an entropy-variability plot.

MATERIALS AND METHODS

DSSP files are produced with the newly written DSSP 2.2.1. HSSP files are produced using HSSP 2.0. PDBFINDER files currently have version 9.0. PDBREPORTs currently are produced with WHAT_CHECK 8.4 (51), but we are planning to release version 11.0 very soon. The PDBREPORT databank will be updated accordingly. The most recent version of PDB_REDO is 5.35. PDB_REDO is under active development and the PDB_REDO databank is continuously being renewed. At the time of writing all files have been created with PDB_REDO version 5.00 or newer which means that all structure models have undergone rebuilding of side chains and flipping of peptide plane orientations when needed (52). BDB files currently are created with version 0.6.5. Most other databanks are produced using the WHAT IF software (53).

AVAILABILITY

All CMBI's macromolecular structure databanks are freely available and can be accessed in many different ways. The single way of accessing PDB_SELECT lists is through the PDB_SELECT pages. WHY_NOT can be queried for single databank or WHY_NOT entries. Reversely, lists of all absent and annotated entries, present entries, obsolete entries, etc. are available from WHY_NOT. The previously described databanks are indexed by our search system MRS (http://mrs.cmbi.ru.nl/) (54), but the ‘Lists’ and scenes databanks are not. MRS also handles REST or SOAP web service requests. All databanks can be retrieved via rsync and ftp. The rsync protocol allows mirroring entire databanks or a subset of the databank, since all databanks are composed of individual files. Detailed instructions are provided at http://swift.cmbi.ru.nl/gv/facilities/.

The DSSP and HSSP web servers have been renewed and are provided at a single location: http://www.cmbi.ru.nl/xssp/. Existing DSSP and HSSP files can be accessed through this website. Secondary structure assignment of uploaded structure models is also possible via this xssp web server, and additionally via the WHAT IF web servers (http://swift.cmbi.ru.nl/) or the WHAT IF web services (55). A sample script explains the use of the xssp REST API. The ‘check model’ WHAT IF web server section constructs WHAT_CHECK reports from uploaded PDB files. PDB_REDO files can be created using the PDB_REDO web server (24). Some users might prefer creation of databank files on a local workstation. The software for creating BDB, DSSP, HSSP, WHAT IF lists, PDBREPORT, PDB_REDO and YASARA scene databanks is freely available.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

Acknowledgments

We thank our colleagues at the CMBI and at Bio-Prodict for many stimulating discussions and help with coding and checking.

FUNDING

NewProt, European Commission, FP7 Programme, KBBE-2011-5 [289350]; Research Programme 11319, STW; VENI, Netherlands Organization for Scientific Research (NWO) [722.011.011 to R.P.J.]. Funding for open access charge: CMBI.

Conflict of interest statement. None declared.

REFERENCES

1.Bernstein F.C., Koetzle T.F., Williams G.J.B., Meyer E.F., Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M. The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3. [DOI] [PubMed] [Google Scholar]
2.Berman H., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980. [DOI] [PubMed] [Google Scholar]
3.Berman H.M. Establishing the next generation of the protein data. The Winnower. 2014 doi:10.15200/winn.140076.68556. [Google Scholar]
4.Gutmanas A., Alhroub Y., Battle G.M., Berrisford J.M., Bochet E., Conroy M.J., Dana J.M., Fernandez Montecelo M.A., van Ginkel G., Gore S.P., et al. PDBe: Protein Data Bank in Europe. Nucleic Acids Res. 2014;42:D285–D291. doi: 10.1093/nar/gkt1180. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Protein Data Bank. Protein Data Bank. Nat. New Biol. 1971;233:223–223. [Google Scholar]
6.Berman H.M. The Protein Data Bank: a historical perspective. Acta Crystallogr. A. 2008;64:88–95. doi: 10.1107/S0108767307035623. [DOI] [PubMed] [Google Scholar]
7.Joosten R.P., Beek T.A.H., Krieger E., Hekkelman M.L., Hooft R.W.W., Schneider R., Sander C., Vriend G. A series of PDB related databases for everyday needs. Nucleic Acids Res. 2011;39:D411–D419. doi: 10.1093/nar/gkq1105. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
9.Cooley R.B., Arp D.J., Karplus P.A. Evolutionary origin of a secondary structure: π-helices as cryptic but widespread insertional variations of α-helices that enhance protein functionality. J. Mol. Biol. 2010;404:232–246. doi: 10.1016/j.jmb.2010.09.034. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Van der Kant R., Vriend G. Alpha-bulges in G protein-coupled receptors. Int. J. Mol. Sci. 2014;15:7841–7864. doi: 10.3390/ijms15057841. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Sander C., Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9:56–68. doi: 10.1002/prot.340090107. [DOI] [PubMed] [Google Scholar]
12.Sander C., Schneider R. The HSSP data base of protein structure-sequence alignments. Nucleic Acids Res. 1993;21:3105–3109. doi: 10.1093/nar/21.13.3105. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Schneider R., Sander C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1996;24:201–205. doi: 10.1093/nar/24.1.201. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Schneider R., de Daruvar A., Sander C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1997;25:226–230. doi: 10.1093/nar/25.1.226. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Dodge C. The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res. 1998;26:313–315. doi: 10.1093/nar/26.1.313. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J., et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Eddy S.R. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]
18.Waterhouse A.M., Procter J.B., Martin D.M.A., Clamp M., Barton G.J. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Prlić A., Yates A., Bliven S.E., Rose P.W., Jacobsen J., Troshin P. V, Chapman M., Gao J., Koh C.H., Foisy S., et al. BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics. 2012;28:2693–2695. doi: 10.1093/bioinformatics/bts494. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Stajich J.E., Block D., Boulez K., Brenner S.E., Chervitz S.A., Dagdigian C., Fuellen G., Gilbert J.G.R., Korf I., Lapp H., et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12:1611–1618. doi: 10.1101/gr.361602. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Cock P.J.A., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Hooft R.W.W., Sander C., Scharf M., Vriend G. The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput. Appl. Biosci. CABIOS. 1996;12:525–529. doi: 10.1093/bioinformatics/12.6.525. [DOI] [PubMed] [Google Scholar]
23.Joosten R.P., Joosten K., Murshudov G.N., Perrakis A. PDB_REDO: constructive validation, more than just looking for errors. Acta Crystallogr. Sect. D Biol. Crystallogr. 2012;68:484–496. doi: 10.1107/S0907444911054515. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Joosten R.P., Long F., Murshudov G.N., Perrakis A. The PDB_REDO server for macromolecular structure model optimization. IUCrJ. 2014;1:213–220. doi: 10.1107/S2052252514009324. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Jones T.A., Zou J.Y., Cowan S.W., Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. Sect. A Found. Crystallogr. 1991;47:110–119. doi: 10.1107/s0108767390010224. [DOI] [PubMed] [Google Scholar]
26.Brändén C.-I., Jones T.A. Between objectivity and subjectivity. Nature. 1990;343:687–689. [Google Scholar]
27.Cereto-Massagué A., Ojeda M.J., Joosten R.P., Valls C., Mulero M., Salvado M.J., Arola-Arnal A., Arola L., Garcia-Vallvé S., Pujadas G. The good, the bad and the dubious: VHELIBS, a validation helper for ligands and binding sites. J. Cheminform. 2013;5:36. doi: 10.1186/1758-2946-5-36. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Hooft R.W.W., Sander C., Vriend G. Verification of protein structures: side-chain planarity. J. Appl. Crystallogr. 1996;29:714–716. [Google Scholar]
29.Winn M.D., Isupov M.N., Murshudov G.N. Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr. Sect. D Biol. Crystallogr. 2001;57:122–133. doi: 10.1107/s0907444900014736. [DOI] [PubMed] [Google Scholar]
30.Touw W.G., Vriend G. BDB: Databank of PDB files with consistent B-factors. Protein Eng. Des. Sel. 2014;27:457–462. doi: 10.1093/protein/gzu044. [DOI] [PubMed] [Google Scholar]
31.Linding R., Jensen L.J., Diella F., Bork P., Gibson T.J., Russell R.B. Protein disorder prediction. Structure. 2003;11:1453–1459. doi: 10.1016/j.str.2003.10.002. [DOI] [PubMed] [Google Scholar]
32.Neuvirth H., Raz R., Schreiber G. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J. Mol. Biol. 2004;338:181–199. doi: 10.1016/j.jmb.2004.02.040. [DOI] [PubMed] [Google Scholar]
33.Reetz M.T., Carballeira J.D., Vogel A. Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability. Angew. Chem. Int. Ed. Engl. 2006;45:7745–7751. doi: 10.1002/anie.200602795. [DOI] [PubMed] [Google Scholar]
34.Chung J.-L., Wang W., Bourne P.E. Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins. 2006;62:630–640. doi: 10.1002/prot.20741. [DOI] [PubMed] [Google Scholar]
35.Schlessinger A., Yachdav G., Rost B. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics. 2006;22:891–893. doi: 10.1093/bioinformatics/btl032. [DOI] [PubMed] [Google Scholar]
36.Craig D.B., Dombkowski A.A. Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics. 2013;14:346. doi: 10.1186/1471-2105-14-346. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Krissinel E., Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 2007;372:774–797. doi: 10.1016/j.jmb.2007.05.022. [DOI] [PubMed] [Google Scholar]
38.De Filippis V., Sander C., Vriend G. Predicting local structural changes that result from point mutations. Protein Eng. 1994;7:1203–1208. doi: 10.1093/protein/7.10.1203. [DOI] [PubMed] [Google Scholar]
39.Hopf T.A., Colwell L.J., Sheridan R., Rost B., Sander C., Marks D.S. Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 2012;149:1607–1621. doi: 10.1016/j.cell.2012.04.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Jones D.T., Buchan D.W.A., Cozzetto D., Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28:184–190. doi: 10.1093/bioinformatics/btr638. [DOI] [PubMed] [Google Scholar]
41.Marks D.S., Hopf T.A., Sander C. Protein structure prediction from sequence variation. Nat. Biotechnol. 2012;30:1072–1080. doi: 10.1038/nbt.2419. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Emsley P., Lohkamp B., Scott W.G., Cowtan K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 2010;66:486–501. doi: 10.1107/S0907444910007493. [DOI] [PMC free article] [PubMed] [Google Scholar]
43.McNicholas S., Potterton E., Wilson K.S., Noble M.E.M. Presenting your structures: the CCP4mg molecular-graphics software. Acta Crystallogr. D Biol. Crystallogr. 2011;67:386–394. doi: 10.1107/S0907444911007281. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Krieger E., Vriend G. YASARA View-molecular graphics for all devices-from smartphones to workstations. Bioinformatics. 2014;30:2981–2982. doi: 10.1093/bioinformatics/btu426. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Winn M.D., Ballard C.C., Cowtan K.D., Dodson E.J., Emsley P., Evans P.R., Keegan R.M., Krissinel E.B., Leslie A.G.W., McCoy A., et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D Biol. Crystallogr. 2011;67:235–242. doi: 10.1107/S0907444910045749. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Hanson R.M. Jmol – a paradigm shift in crystallographic visualization. J. Appl. Crystallogr. 2010;43:1250–1260. [Google Scholar]
47.De Beer T.A.P., Berka K., Thornton J.M., Laskowski R.A. PDBsum additions. Nucleic Acids Res. 2014;42:D292–D296. doi: 10.1093/nar/gkt940. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Oliveira L., Paiva P.B., Paiva A.C.M., Vriend G. Identification of functionally conserved residues with the use of entropy-variability plots. Proteins. 2003;52:544–552. doi: 10.1002/prot.10490. [DOI] [PubMed] [Google Scholar]
49.Folkertsma S., van Noort P., Van Durme J., Joosten H.-J., Bettler E., Fleuren W., Oliveira L., Horn F., de Vlieg J., Vriend G. A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain. J. Mol. Biol. 2004;341:321–335. doi: 10.1016/j.jmb.2004.05.075. [DOI] [PubMed] [Google Scholar]
50.Vroling B., Sanders M., Baakman C., Borrmann A., Verhoeven S., Klomp J., Oliveira L., de Vlieg J., Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res. 2011;39:D309–D319. doi: 10.1093/nar/gkq1009. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Hooft R.W.W., Vriend G., Sander C., Abola E.E. Errors in protein structures. Nature. 1996;381:272. doi: 10.1038/381272a0. [DOI] [PubMed] [Google Scholar]
52.Joosten R.P., Joosten K., Cohen S.X., Vriend G., Perrakis A. Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank. Bioinformatics. 2011;27:3392–3398. doi: 10.1093/bioinformatics/btr590. [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Vriend G. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 1990;8:52–56. doi: 10.1016/0263-7855(90)80070-v. [DOI] [PubMed] [Google Scholar]
54.Hekkelman M.L., Vriend G. MRS: a fast and compact retrieval system for biological data. Nucleic Acids Res. 2005;33:W766–W769. doi: 10.1093/nar/gki422. [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Hekkelman M.L., te Beek T.A.H., Pettifer S.R., Thorne D., Attwood T.K., Vriend G. WIWS: a protein structure bioinformatics Web service collection. Nucleic Acids Res. 2010;38:W719–W723. doi: 10.1093/nar/gkq453. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B1] 1.Bernstein F.C., Koetzle T.F., Williams G.J.B., Meyer E.F., Brice M.D., Rodgers J.R., Kennard O., Shimanouchi T., Tasumi M. The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 1977;112:535–542. doi: 10.1016/s0022-2836(77)80200-3. [DOI] [PubMed] [Google Scholar]

[B2] 2.Berman H., Henrick K., Nakamura H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 2003;10:980. doi: 10.1038/nsb1203-980. [DOI] [PubMed] [Google Scholar]

[B3] 3.Berman H.M. Establishing the next generation of the protein data. The Winnower. 2014 doi:10.15200/winn.140076.68556. [Google Scholar]

[B4] 4.Gutmanas A., Alhroub Y., Battle G.M., Berrisford J.M., Bochet E., Conroy M.J., Dana J.M., Fernandez Montecelo M.A., van Ginkel G., Gore S.P., et al. PDBe: Protein Data Bank in Europe. Nucleic Acids Res. 2014;42:D285–D291. doi: 10.1093/nar/gkt1180. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5] 5.Protein Data Bank. Protein Data Bank. Nat. New Biol. 1971;233:223–223. [Google Scholar]

[B6] 6.Berman H.M. The Protein Data Bank: a historical perspective. Acta Crystallogr. A. 2008;64:88–95. doi: 10.1107/S0108767307035623. [DOI] [PubMed] [Google Scholar]

[B7] 7.Joosten R.P., Beek T.A.H., Krieger E., Hekkelman M.L., Hooft R.W.W., Schneider R., Sander C., Vriend G. A series of PDB related databases for everyday needs. Nucleic Acids Res. 2011;39:D411–D419. doi: 10.1093/nar/gkq1105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] 8.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]

[B9] 9.Cooley R.B., Arp D.J., Karplus P.A. Evolutionary origin of a secondary structure: π-helices as cryptic but widespread insertional variations of α-helices that enhance protein functionality. J. Mol. Biol. 2010;404:232–246. doi: 10.1016/j.jmb.2010.09.034. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10.Van der Kant R., Vriend G. Alpha-bulges in G protein-coupled receptors. Int. J. Mol. Sci. 2014;15:7841–7864. doi: 10.3390/ijms15057841. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] 11.Sander C., Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9:56–68. doi: 10.1002/prot.340090107. [DOI] [PubMed] [Google Scholar]

[B12] 12.Sander C., Schneider R. The HSSP data base of protein structure-sequence alignments. Nucleic Acids Res. 1993;21:3105–3109. doi: 10.1093/nar/21.13.3105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] 13.Schneider R., Sander C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1996;24:201–205. doi: 10.1093/nar/24.1.201. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14.Schneider R., de Daruvar A., Sander C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1997;25:226–230. doi: 10.1093/nar/25.1.226. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] 15.Dodge C. The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res. 1998;26:313–315. doi: 10.1093/nar/26.1.313. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J., et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17.Eddy S.R. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755. [DOI] [PubMed] [Google Scholar]

[B18] 18.Waterhouse A.M., Procter J.B., Martin D.M.A., Clamp M., Barton G.J. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19.Prlić A., Yates A., Bliven S.E., Rose P.W., Jacobsen J., Troshin P. V, Chapman M., Gao J., Koh C.H., Foisy S., et al. BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics. 2012;28:2693–2695. doi: 10.1093/bioinformatics/bts494. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] 20.Stajich J.E., Block D., Boulez K., Brenner S.E., Chervitz S.A., Dagdigian C., Fuellen G., Gilbert J.G.R., Korf I., Lapp H., et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12:1611–1618. doi: 10.1101/gr.361602. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21.Cock P.J.A., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22.Hooft R.W.W., Sander C., Scharf M., Vriend G. The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput. Appl. Biosci. CABIOS. 1996;12:525–529. doi: 10.1093/bioinformatics/12.6.525. [DOI] [PubMed] [Google Scholar]

[B23] 23.Joosten R.P., Joosten K., Murshudov G.N., Perrakis A. PDB_REDO: constructive validation, more than just looking for errors. Acta Crystallogr. Sect. D Biol. Crystallogr. 2012;68:484–496. doi: 10.1107/S0907444911054515. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] 24.Joosten R.P., Long F., Murshudov G.N., Perrakis A. The PDB_REDO server for macromolecular structure model optimization. IUCrJ. 2014;1:213–220. doi: 10.1107/S2052252514009324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25.Jones T.A., Zou J.Y., Cowan S.W., Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. Sect. A Found. Crystallogr. 1991;47:110–119. doi: 10.1107/s0108767390010224. [DOI] [PubMed] [Google Scholar]

[B26] 26.Brändén C.-I., Jones T.A. Between objectivity and subjectivity. Nature. 1990;343:687–689. [Google Scholar]

[B27] 27.Cereto-Massagué A., Ojeda M.J., Joosten R.P., Valls C., Mulero M., Salvado M.J., Arola-Arnal A., Arola L., Garcia-Vallvé S., Pujadas G. The good, the bad and the dubious: VHELIBS, a validation helper for ligands and binding sites. J. Cheminform. 2013;5:36. doi: 10.1186/1758-2946-5-36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28] 28.Hooft R.W.W., Sander C., Vriend G. Verification of protein structures: side-chain planarity. J. Appl. Crystallogr. 1996;29:714–716. [Google Scholar]

[B29] 29.Winn M.D., Isupov M.N., Murshudov G.N. Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr. Sect. D Biol. Crystallogr. 2001;57:122–133. doi: 10.1107/s0907444900014736. [DOI] [PubMed] [Google Scholar]

[B30] 30.Touw W.G., Vriend G. BDB: Databank of PDB files with consistent B-factors. Protein Eng. Des. Sel. 2014;27:457–462. doi: 10.1093/protein/gzu044. [DOI] [PubMed] [Google Scholar]

[B31] 31.Linding R., Jensen L.J., Diella F., Bork P., Gibson T.J., Russell R.B. Protein disorder prediction. Structure. 2003;11:1453–1459. doi: 10.1016/j.str.2003.10.002. [DOI] [PubMed] [Google Scholar]

[B32] 32.Neuvirth H., Raz R., Schreiber G. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J. Mol. Biol. 2004;338:181–199. doi: 10.1016/j.jmb.2004.02.040. [DOI] [PubMed] [Google Scholar]

[B33] 33.Reetz M.T., Carballeira J.D., Vogel A. Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability. Angew. Chem. Int. Ed. Engl. 2006;45:7745–7751. doi: 10.1002/anie.200602795. [DOI] [PubMed] [Google Scholar]

[B34] 34.Chung J.-L., Wang W., Bourne P.E. Exploiting sequence and structure homologs to identify protein-protein binding sites. Proteins. 2006;62:630–640. doi: 10.1002/prot.20741. [DOI] [PubMed] [Google Scholar]

[B35] 35.Schlessinger A., Yachdav G., Rost B. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics. 2006;22:891–893. doi: 10.1093/bioinformatics/btl032. [DOI] [PubMed] [Google Scholar]

[B36] 36.Craig D.B., Dombkowski A.A. Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics. 2013;14:346. doi: 10.1186/1471-2105-14-346. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B37] 37.Krissinel E., Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 2007;372:774–797. doi: 10.1016/j.jmb.2007.05.022. [DOI] [PubMed] [Google Scholar]

[B38] 38.De Filippis V., Sander C., Vriend G. Predicting local structural changes that result from point mutations. Protein Eng. 1994;7:1203–1208. doi: 10.1093/protein/7.10.1203. [DOI] [PubMed] [Google Scholar]

[B39] 39.Hopf T.A., Colwell L.J., Sheridan R., Rost B., Sander C., Marks D.S. Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 2012;149:1607–1621. doi: 10.1016/j.cell.2012.04.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] 40.Jones D.T., Buchan D.W.A., Cozzetto D., Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28:184–190. doi: 10.1093/bioinformatics/btr638. [DOI] [PubMed] [Google Scholar]

[B41] 41.Marks D.S., Hopf T.A., Sander C. Protein structure prediction from sequence variation. Nat. Biotechnol. 2012;30:1072–1080. doi: 10.1038/nbt.2419. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B42] 42.Emsley P., Lohkamp B., Scott W.G., Cowtan K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 2010;66:486–501. doi: 10.1107/S0907444910007493. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B43] 43.McNicholas S., Potterton E., Wilson K.S., Noble M.E.M. Presenting your structures: the CCP4mg molecular-graphics software. Acta Crystallogr. D Biol. Crystallogr. 2011;67:386–394. doi: 10.1107/S0907444911007281. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B44] 44.Krieger E., Vriend G. YASARA View-molecular graphics for all devices-from smartphones to workstations. Bioinformatics. 2014;30:2981–2982. doi: 10.1093/bioinformatics/btu426. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B45] 45.Winn M.D., Ballard C.C., Cowtan K.D., Dodson E.J., Emsley P., Evans P.R., Keegan R.M., Krissinel E.B., Leslie A.G.W., McCoy A., et al. Overview of the CCP4 suite and current developments. Acta Crystallogr. D Biol. Crystallogr. 2011;67:235–242. doi: 10.1107/S0907444910045749. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B46] 46.Hanson R.M. Jmol – a paradigm shift in crystallographic visualization. J. Appl. Crystallogr. 2010;43:1250–1260. [Google Scholar]

[B47] 47.De Beer T.A.P., Berka K., Thornton J.M., Laskowski R.A. PDBsum additions. Nucleic Acids Res. 2014;42:D292–D296. doi: 10.1093/nar/gkt940. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B48] 48.Oliveira L., Paiva P.B., Paiva A.C.M., Vriend G. Identification of functionally conserved residues with the use of entropy-variability plots. Proteins. 2003;52:544–552. doi: 10.1002/prot.10490. [DOI] [PubMed] [Google Scholar]

[B49] 49.Folkertsma S., van Noort P., Van Durme J., Joosten H.-J., Bettler E., Fleuren W., Oliveira L., Horn F., de Vlieg J., Vriend G. A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain. J. Mol. Biol. 2004;341:321–335. doi: 10.1016/j.jmb.2004.05.075. [DOI] [PubMed] [Google Scholar]

[B50] 50.Vroling B., Sanders M., Baakman C., Borrmann A., Verhoeven S., Klomp J., Oliveira L., de Vlieg J., Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res. 2011;39:D309–D319. doi: 10.1093/nar/gkq1009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B51] 51.Hooft R.W.W., Vriend G., Sander C., Abola E.E. Errors in protein structures. Nature. 1996;381:272. doi: 10.1038/381272a0. [DOI] [PubMed] [Google Scholar]

[B52] 52.Joosten R.P., Joosten K., Cohen S.X., Vriend G., Perrakis A. Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank. Bioinformatics. 2011;27:3392–3398. doi: 10.1093/bioinformatics/btr590. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B53] 53.Vriend G. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 1990;8:52–56. doi: 10.1016/0263-7855(90)80070-v. [DOI] [PubMed] [Google Scholar]

[B54] 54.Hekkelman M.L., Vriend G. MRS: a fast and compact retrieval system for biological data. Nucleic Acids Res. 2005;33:W766–W769. doi: 10.1093/nar/gki422. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B55] 55.Hekkelman M.L., te Beek T.A.H., Pettifer S.R., Thorne D., Attwood T.K., Vriend G. WIWS: a protein structure bioinformatics Web service collection. Nucleic Acids Res. 2010;38:W719–W723. doi: 10.1093/nar/gkq453. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

A series of PDB-related databanks for everyday needs

Wouter G Touw

Coos Baakman

Jon Black

Tim A H te Beek

E Krieger

Robbie P Joosten

Gert Vriend

Abstract

INTRODUCTION

UPDATE ON EXISTING DATABANKS

Table 1. List of databanks, their content and location.

A NEW MAJOR DATABANK

WHAT GOOD IS BEAUTY, IF IT IS NOT TO BE SEEN?

Figure 1.

DATABANKS SPECIFICALLY FOR BIOINFORMATICIANS

NEW VISUALIZATION TOOLS

Figure 2.

MATERIALS AND METHODS

AVAILABILITY

SUPPLEMENTARY DATA

Acknowledgments

FUNDING

REFERENCES

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

A series of PDB-related databanks for everyday needs

Wouter G Touw

Coos Baakman

Jon Black

Tim A H te Beek

E Krieger

Robbie P Joosten

Gert Vriend

Abstract

INTRODUCTION

UPDATE ON EXISTING DATABANKS

Table 1. List of databanks, their content and location.

A NEW MAJOR DATABANK

WHAT GOOD IS BEAUTY, IF IT IS NOT TO BE SEEN?

Figure 1.

DATABANKS SPECIFICALLY FOR BIOINFORMATICIANS

NEW VISUALIZATION TOOLS

Figure 2.

MATERIALS AND METHODS

AVAILABILITY

SUPPLEMENTARY DATA

Acknowledgments

FUNDING

REFERENCES

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases