The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data

Table 1.

Different types of phenotypic abnormalities covered by the HPO

‘Class’ of phenotype	HPO example
Morphological abnormality	Arachnodactyly (HP:0001166)
Abnormal process (organ)	Epistaxis (HP:0000421)
Abnormal process (cellular)	Abnormality of Krebs cycle metabolism (HP:0000816)
Abnormal laboratory finding	Glycosuria (HP:0003076)
Electrophysiological abnormality	Hypsarrhythmia (HP:0002521)
Abnormality by medical imaging	Choroid plexus cyst (HP:0002190)
Behavioral abnormality	Self-mutilation (HP:0000742)

Each class of the HPO has a unique and stable identifier (e.g. HP:0002145), a label and a list of synonyms. Most (6603, 65%) of the classes are accompanied by a detailed textual definition created by clinical experts (Figure 2).

Figure 2.

Statistics of the data from the HPO project from January 2009 to August 2013. The ontology statistics show quantities related to the file hp.obo. The annotation statistics clearly reflect the inclusion of Orphanet data in October 2012.

Additionally, HPO classes can now contain one or more references to other resources to promote interoperability among different biomedical research areas. In total, 39% (3956) of the HPO terms contain cross-references, and 98% of these references point to the Unified Medical Language System or Medical Subject Headings. Such references are especially helpful for linking to resources such as the Disease Ontology (8). Other cross-references include the International Classification of Diseases, 10th revision, and the European Paediatric Cardiac Coding list. Furthermore, flat files are made available that map HPO terms to other phenotype vocabularies such as Orphanet’s Signs and Symptoms (see Section HPO workflow and resources).

To achieve semantic interoperability with other ontologies from the OBO Foundry (7), the HPO project began in 2009 to create logical definitions for each HPO class. At the time of this writing, we have created these definitions for 46% (4591) of all HPO classes. These logical axioms define the phenotypic abnormalities based on classes from other OBO Foundry ontologies (e.g. anatomy, Gene Ontology process or cell type). They are formal descriptions that are machine processable and usable for automated logical inference and reasoning (9,10). For example, we have created the following logical definition of the HPO term Hypoglycemia (shown in Manchester syntax):

Class: Hypoglycemia
    EquivalentTo:
        ‘decreased concentration’
        and towards some ‘glucose’
        and inheres_in some ‘portion of blood’
        and qualifier some ‘abnormal’

Here, term identifiers are omitted and only term labels are shown for readability. In this example, the class Hypoglycemia is defined as equivalent to the intersection of all classes of things that are ‘A concentration which is lower relative to the normal’ (decreased concentration from PATO) and ‘deviate from the normal or average’ (abnormal from PATO), with respect to (towards) glucose and inhering in ‘blood’ [using the term portion of blood from the Foundational Model of Anatomy (11)]. Defining ontology terms in this way assists in automating ontology construction, and provides a tool for integrative computational analysis of human and model organism phenotypes against the background of the knowledge incorporated in ontologies such as Gene Ontology, Foundational Model of Anatomy and Chemical entities of biological interest (ChEBI) (12–15).

PHENOTYPE ANNOTATION DATA

We provide a large set of phenotype annotations, i.e. statements that link a particular term from the HPO to specific diseases or genes. Annotations are made to the most specific applicable HPO term, because all ancestor terms are implicitly annotated as well.
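The implicit annotation of ancestor terms can be illustrated with a minimal Python sketch. The hierarchy below is a hand-written toy fragment whose labels and parent links are illustrative assumptions, not an extract of hp.obo, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy is_a hierarchy (child -> parents). The labels and links are
# illustrative assumptions, not an extract of hp.obo.
PARENTS: Dict[str, List[str]] = {
    "Gait ataxia": ["Ataxia"],
    "Ataxia": ["Abnormality of the nervous system"],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}


def with_ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable via is_a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def implied_annotations(direct: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Expand direct (most specific) annotations to all implied terms."""
    implied: Set[str] = set()
    for term in direct:
        implied |= with_ancestors(term, parents)
    return implied


# A disease annotated only to the most specific term is implicitly
# annotated to every ancestor of that term as well.
print(sorted(implied_annotations({"Gait ataxia"}, PARENTS)))
```

Storing only the most specific term keeps the annotation files compact; the ancestor closure can be recomputed whenever the ontology changes.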

At the time of this writing, we provide 110 301 annotations to 7354 diseases listed in the Online Mendelian Inheritance in Man [OMIM, (16)] database, Orphanet (17) and DECIPHER (18). On average, each disease entry has 15 HPO annotations. For Orphanet entries that are exactly mapped to one OMIM entry, we merge the entries and record the provenance of the annotations.

The annotations of OMIM entries are a mixture of manual annotations performed by the HPO team and automated matching of the OMIM Clinical Synopsis to HPO term labels. The substantial increase in annotation data since 2008 is shown in Figure 2.

Each annotation may have several meta-attributes such as the age of onset, the frequency or a modifier. At the moment, 46 149 annotations have information on the frequency with which individuals with a given disease have a certain phenotypic feature. For instance, 9 of 43 persons with the disease sialidosis type II have cherry red spot of the macula (HP:0010729) (19). The majority of frequency annotations are currently derived from Orphanet, but a growing number are based on the manual annotation efforts of the HPO team. Furthermore, we provide a set of 303 negative annotations (NOT-modifier), which state that patients with a given disease are known not to have the clinical feature in question. The frequency and negation information may be important for differential diagnosis (20). For 361 annotations, details on the onset are provided. Note that the onset information may apply to a disease (e.g. Marfan syndrome has congenital onset) or to a single phenotype annotation [e.g. Kyphosis in Hurler syndrome (OMIM:607014) has the meta-annotation childhood onset].

Ontologies such as the HPO are not designed to capture quantitative information such as a blood glucose level of 146 mg/dl or an adult body height of 147 cm. Instead, HPO terms often express qualitative information about an excess or a reduction in quantity of the entity in question (e.g. Hypoglycemia and Tall stature). For some clinical manifestations, however, it has been found to be clinically useful to divide an entity into two or more categories. For instance, the degree of intellectual disability is often reported as one of the four categories Mild, Moderate, Severe and Profound. In these cases, the HPO aims to follow common clinical usage and provide corresponding terms defined according to clinical norms. Additionally, modifiers such as episodic or recurrent are possible. A summary of meta-annotations and their definitions can be found in Table 2.

Table 2.

Meta-information for HPO phenotype annotations

Meta-attribute	Possible values (explanation in brackets)
Qualifier/Modifier	not, mild (±2–3 SD from mean), moderate (±3–4 SD from mean), severe (±4–5 SD from mean), profound (±5 SD and greater from mean), secondary, chronic, (non)progressive, episodic, recurrent, bilateral, unilateral, distal, proximal, refractory and generalized
Evidence Code	ITM (inferred by text mining), IEA (inferred from electronic annotation), PCS (published clinical study), ICE (individual clinical experience), TAS (traceable author statement)
Onset modifier	Any term from HPO-subontology Age of onset
Frequency modifier	percentage value (e.g. 25%), n of m (e.g. 3/10 patients), very rare, rare, occasional, frequent, typical, variable, common, hallmark and obligate

The meaning/definition of the values is shown in brackets. (SD = standard deviation).
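The two numeric forms of the frequency modifier listed in Table 2 can be normalized to a common scale. The following Python sketch is one possible way to do this; the function name and the accepted spellings ("n/m", "n of m", an optional trailing "patients") are assumptions made for illustration and do not describe the parser used by the HPO project.

```python
import re
from typing import Optional


def frequency_to_fraction(value: str) -> Optional[float]:
    """Convert a numeric frequency modifier to a fraction between 0 and 1.

    Handles a percentage ("25%") and an n-of-m count ("3/10 patients",
    "9 of 43"). Categorical values such as "rare" return None, because
    no numeric definition is given for them in the text.
    """
    text = value.strip().lower()
    percent = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", text)
    if percent:
        return float(percent.group(1)) / 100.0
    count = re.fullmatch(r"(\d+)\s*(?:/|of)\s*(\d+)(?:\s+patients)?", text)
    if count:
        n, m = int(count.group(1)), int(count.group(2))
        if m == 0 or n > m:
            raise ValueError(f"invalid count: {value!r}")
        return n / m
    return None


# The sialidosis type II example from the text: 9 of 43 persons.
print(round(frequency_to_fraction("9 of 43"), 3))  # 0.209
```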

CLINICAL INTEGRATION AND USE

The HPO project is collaborating with many clinical groups to refine and extend current terms and annotations. A major effort was undertaken in 2012 with clinicians from the Deciphering Developmental Disorders (21) project to ensure that HPO reflects the needs of that project. Efforts were made to eliminate redundancies and to fill in gaps in the HPO coverage of organ systems, metabolism, neoplasms, neurology and behavior. Among other things, the Onset section of the HPO was revised to provide a small set of well-defined and non-overlapping terms based on published recommendations (22) (Table 3). Input and collaboration from other clinical groups are welcome.

Table 3.

Definitions of age-of-onset terms in the HPO

Onset of manifestations	Definition
Less than 1 year
Embryonal	<8 weeks’ gestation
Fetal	8 weeks’ gestation–birth
Neonatal	Birth–28 days
Infantile	28 days–1 year
More than 1 year
Childhood	1–5 years
Juvenile	5–15 years
Adults
Young adult	<40 years
Mid adult	40–60 years
Old age	>60 years
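The postnatal rows of Table 3 can be read as a simple lookup from age to onset term, sketched below in Python. Several details are assumptions, because the table does not state them: lower bounds are treated as inclusive, a year is taken as 365.25 days, Young adult is assumed to start at 15 years, and an age of exactly 60 years is assigned to Mid adult because Old age is given as >60 years. The function name is hypothetical, and prenatal onset (Embryonal, Fetal) is deliberately not handled.

```python
def postnatal_onset_category(age_years: float) -> str:
    """Map an age at onset (in years after birth) to a Table 3 term.

    Boundary handling is an assumption; see the accompanying text.
    """
    if age_years < 0:
        raise ValueError("prenatal onset (Embryonal, Fetal) is not handled here")
    if age_years < 28 / 365.25:  # birth to 28 days
        return "Neonatal"
    if age_years < 1:  # 28 days to 1 year
        return "Infantile"
    if age_years < 5:
        return "Childhood"
    if age_years < 15:
        return "Juvenile"
    if age_years < 40:  # lower bound of 15 years is assumed
        return "Young adult"
    if age_years <= 60:
        return "Mid adult"
    return "Old age"


for age in (0.01, 0.5, 3, 10, 25, 50, 70):
    print(age, postnatal_onset_category(age))
```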

Whole-exome sequencing (WES) is accelerating the pace of discovery of novel Mendelian disease genes, but many challenges remain. A standard strategy for WES data analysis is to compare variants found in multiple affected patients. Especially with autosomal dominant disorders, many unrelated individuals must be analyzed for this strategy to be successful (23). Therefore, one of the first tasks in WES disease gene discovery projects is to identify multiple patients with the same disease phenotype, which has been extremely successful in identifying novel disease genes even in diseases for which there was little or no previous knowledge about the characteristics of the disease gene. However, many of the Mendelian diseases still waiting to be discovered are very rare or difficult to diagnose clinically. To make progress on elucidating these disorders, it will likely be necessary to combine data from multiple centers to identify a sufficient number of patients with mutations in the same gene and comparable phenotypes—which is widely accepted as a necessary criterion for the identification of a novel disease gene.

This approach has been implemented successfully for copy-number variation (CNV) disorders in the International Standards for Cytogenomic Arrays Consortium’s publicly available database of CNVs identified during the course of routine clinical microarray testing (http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd37/; https://www.iscaconsortium.org). Recognizing that cataloging the phenotype information associated with each CNV would be key in trying to elucidate genotype–phenotype relationships, the group began using HPO terms (as opposed to free text) to describe the phenotypes in a manner that was generalizable (to maintain patient anonymity) as well as easily indexable and searchable for the clinical and research communities (24). Given the success of this approach, the International Standards for Cytogenomic Arrays has expanded its focus to include sequence variation, and, under the name International Collaboration for Clinical Genomics, will continue to use HPO terms to describe the phenotypes associated with results from additional testing modalities, including WES (25).

A similar approach is also being used by the DECIPHER project, which enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community to find clusters of rare cases having phenotype and structural rearrangement in common (18). The Deciphering Developmental Disorders project of the Wellcome Trust Sanger Institute was initiated to use new genomic technologies, in particular WES, to identify novel etiologies for developmental disorders. It focuses on severe and extreme developmental phenotypes affecting any organ system, which are coded using HPO.

An international collaborative study, the Biomedical Research Centres/Units Inherited Diseases Genetic Evaluation consortium, will use the HPO database to record detailed clinical phenotypes of patients with rare inherited disorders (www.bridgestudy.org). The HPO database that comprises phenotypes related to abnormalities in blood and blood-forming tissues has already facilitated detailed description of the clinical phenotypes of patients with bleeding and platelet disorders (Biomedical Research Centres/Units Inherited Diseases Genetic Evaluation-Bleeding and Platelet Disorders). The homogenization of these clinical phenotypes related to bleeding and platelet disorders will further assist in the clustering of data for detailed bioinformatics analysis of exome sequence data. These patients will be part of the NIHR Bioresource for Rare Diseases.

The European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA, http://www.ecaruca.net), initiated in 2003, is an online database that collects and provides detailed, curated clinical and molecular information on rare unbalanced chromosome aberrations that are considered to be likely causative for the patient’s phenotype (26). The objective of ECARUCA is to improve the knowledge of rare chromosome aberrations both for medical and research purposes. Currently, the database contains more than 4800 cases with HPO features characterizing these cases, and all these data are publicly available to professionals in genetics.

The Nijmegen Genetics Phenotype Database (NGPD, https://www.clinicalfeatures.eu/default.aspx) aims to collect detailed phenotype information of patients with unexplained intellectual disability and/or congenital anomalies using the HPO. The goal of the NGPD is to identify patients who have similar clinical features that are likely due to the same or a related genetic defect. The NGPD currently contains more than 8000 patients with 73 496 HPO features annotated to these patients (median seven features per patient). Computational approaches are currently being developed for the identification of clusters of phenotypically overlapping patients. Exome sequencing and targeted candidate gene analysis will ultimately provide a diagnosis for many of these patients.

Cartagenia (www.cartagenia.com), a genetics software solution provider that services diagnostic laboratories through a set of automated tools for variant interpretation, filtration, reporting and sharing, has standardized the phenotype functions for clinical patient record annotation of its BENCH laboratory platform on HPO. Using HPO brings several advantages: automated genotype–phenotype correlation, advanced search of patients within laboratories and in external databases (see earlier) and easy sharing of patient phenotype data among different consortia.

Interoperability between laboratories sharing case information has benefited from standardization on HPO. With more than 120 laboratories and clinics using Cartagenia BENCH in a routine setting, a number of consortia have emerged where not just genotype but also phenotype data are shared. Examples include a number of national consortia sharing variants and phenotype data (The Netherlands, France, UK and Norway) as well as disease-specific registries (for autism, primary immune deficiencies and cardiogenetics), ECARUCA, large prenatal case registries such as the UK-led NHS EACH study and a US-led study at Columbia University, which have set the phenotyping standard for other prenatal genotype–phenotype registries.

HPO WORKFLOW AND RESOURCES

As mentioned before, we use a continuous integration system (Hudson) for the management of stable releases of the HPO-related data (27) to ensure that users are provided with up-to-date and validated resources. To achieve this, only stable builds are made public, and any curation errors that lead to build failures are detected by our software and prevented from being propagated onto the public Web site. For different aspects of the data, we have generated different jobs and an overview of the job organization can be found in Table 4. The major focus is the phenotype ontology and the annotation data, but closely related projects such as the cross-species phenotype ontology Uberpheno (13) are available as well.

Table 4.

Content of and access to the stable releases of the data provided by the HPO project

Release category	URL of latest stable release for job (relative to http://compbio.charite.de/hudson/job/)	File(s) at URL	File description
HPO releases	hpo/lastStableBuild/	hp.obo, hp.owl	HPO in OBO/OWL format as generated by Oort.
		human-phenotype-ontology_xp.obo	Logical definitions of HPO terms.
		onet_hpo.tsv, LDDB2HPO-v2.csv, medraMapping.tsv	Mappings to other phenotype vocabularies, e.g. Orphanet, LDDB, MedDRA.
Disease annotations	hpo.annotations/lastStableBuild/	negative_phenotype_annotation.tab	Disease-HPO term associations asserted not to be associated with the corresponding disease.
		phenotype_annotation_hpoteam.tab	Manual and semi-automatic annotations of syndromes from OMIM and DECIPHER.
		phenotype_annotation.tab	Manual and semi-automatic annotations of OMIM and DECIPHER augmented with annotations to Orphanet syndromes.
Other data	hpo.annotations.monthly/lastStableBuild/	<source>_<freq>_genes_to_phenotype.txt	Mapping of human genes to phenotypic features (via disease-to-gene relationship). (<source> is one of ALL_SOURCES, OMIM, or ORPHANET; <freq> is either ALL_FREQ or TYPICAL).
		<source>_<freq>_phenotype_to_genes.txt	Mapping of phenotypic features to human genes (via disease-to-gene relationship). (<source> is one of ALL_SOURCES, OMIM, or ORPHANET; <freq> is either ALL_FREQ or TYPICAL).
		MYHPO_MM_YYYY.sql	MySQL dump of the HPO database, where MM and YYYY denote the month and year of release.
	hpo.diseasesimilarity/lastStableBuild/	matrices.tar.gz	Precomputed disease–disease similarity matrix for all diseases with annotations to HPO’s phenotypic abnormality subontology. Symmetric and asymmetric semantic similarity score.
	hpo.ontology.uberpheno/lastStableBuild/	crossSpeciesPheno.obo	Cross-species phenotype ontology (human, mouse, zebrafish).
	hpo.ontology.uberpheno/lastStableBuild/	HSgenes_crossSpeciesPhenoAnnotation.txt	Annotation of all human genes to terms in crossSpeciesPheno.obo (uses orthology to human genes obtained from MGI and ZFIN). See (13).
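The relative job paths and the file-name pattern in Table 4 can be expanded programmatically, as in the following Python sketch. It assumes that the spaces inside the <source> and <freq> placeholders are extraction artifacts and that the pattern is <source>_<freq>_<direction>.txt. The table does not give the exact location of a file below a job URL, so the sketch only builds the job URL and the file names and makes no network request. The function names are hypothetical.

```python
from itertools import product
from typing import List
from urllib.parse import urljoin

# Base URL and placeholder values as listed in Table 4.
BASE_URL = "http://compbio.charite.de/hudson/job/"
SOURCES = ("ALL_SOURCES", "OMIM", "ORPHANET")
FREQUENCIES = ("ALL_FREQ", "TYPICAL")
DIRECTIONS = ("genes_to_phenotype", "phenotype_to_genes")


def job_url(relative_path: str) -> str:
    """Resolve a job path from Table 4 against the Hudson base URL."""
    return urljoin(BASE_URL, relative_path)


def mapping_file_names(direction: str) -> List[str]:
    """List all <source>_<freq>_<direction>.txt names for one direction."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    return [f"{s}_{f}_{direction}.txt" for s, f in product(SOURCES, FREQUENCIES)]


print(job_url("hpo.annotations.monthly/lastStableBuild/"))
for name in mapping_file_names("genes_to_phenotype"):
    print(name)
```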

The HPO release (job hpo) is triggered whenever changes in any of the ontology or logical definition files are uploaded. For every build, the OBO Ontology Release Tool (Oort, https://code.google.com/p/owltools/wiki/OortIntro) is used to generate OBO- and OWL-format versions of the HPO. In addition, the GULO software (28) is used to generate a report on the overlap between the hierarchy inferred from the logical definitions and the manually asserted HPO hierarchy. This is used to incrementally improve both the logical definitions and the HPO structure.

Annotation data are also integrated in our Hudson build system (Table 4). Every HPO release triggers a rebuild of the annotation data (job hpo.annotations). This job pulls the latest manual annotation data (http://svn.code.sf.net/p/obo/svn/phenotype-commons/annotations/OMIM/by-disease/annotated/) and the latest Orphanet data (http://www.orphadata.org) and constructs one integrated disease annotation file. Again, only successful builds are made available, so that, for example, manually curated annotations are automatically checked for consistency before being offered to the public. The simplest check verifies the syntactic correctness of the input files. In addition, the generation of annotation files fails if there are annotations to obsolete terms, i.e. terms that have been marked for replacement by other HPO terms and thus should no longer be used for annotation. Another check confirms that annotation onset modifiers are correctly chosen from the Onset and clinical course subontology.
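Two of the consistency checks described above (no annotations to obsolete terms, onset modifiers taken from the onset subontology) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's build code: the Annotation record, the function names and the TOY identifiers are hypothetical, and the sets of obsolete and onset terms are supplied by hand rather than read from hp.obo.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Annotation:
    """Hypothetical minimal annotation record."""
    disease_id: str
    term_id: str
    onset_id: Optional[str] = None


def validate(annotations: Iterable[Annotation],
             obsolete_terms: Set[str],
             onset_terms: Set[str]) -> List[str]:
    """Return one error message per violated consistency rule."""
    errors: List[str] = []
    for a in annotations:
        if a.term_id in obsolete_terms:
            errors.append(f"{a.disease_id}: annotation to obsolete term {a.term_id}")
        if a.onset_id is not None and a.onset_id not in onset_terms:
            errors.append(f"{a.disease_id}: onset modifier {a.onset_id} "
                          "is not in the onset subontology")
    return errors


def build_or_fail(annotations, obsolete_terms, onset_terms) -> None:
    """Mimic a failing build: raise if any check reports an error."""
    errors = validate(annotations, obsolete_terms, onset_terms)
    if errors:
        raise ValueError("build failed:\n" + "\n".join(errors))


# Placeholder identifiers (TOY:...) are invented for this example.
OBSOLETE = {"TOY:obsolete_term"}
ONSET = {"TOY:childhood_onset"}
records = [
    Annotation("OMIM:603516", "HP:0002066"),
    Annotation("OMIM:603516", "TOY:obsolete_term", "TOY:not_an_onset"),
]
for message in validate(records, OBSOLETE, ONSET):
    print(message)
```

Collecting all errors before failing, rather than stopping at the first one, lets a curator fix every problem in a single pass.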

Once a month, several secondary files are created automatically by the Hudson build system. The job hpo.annotations.monthly creates a MySQL version of the HPO and the annotation data. It also constructs direct gene-to-phenotype mappings, which use known gene-to-disease relations (from morbidmap and Orphanet) and disease-to-phenotype relations from the job hpo.annotations. For example, the gene ATXN10 (Entrez ID 25814) will be associated with Gait ataxia (HP:0002066), because mutations in that gene cause Spinocerebellar ataxia (OMIM:603516), which is annotated to this HPO class. The files are constructed for different phenotype annotation sources (OMIM, Orphanet) and different frequency thresholds.
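The gene-to-phenotype mapping amounts to joining the gene-to-disease and disease-to-phenotype relations on the disease identifier. The Python sketch below reproduces the ATXN10 example from the text; the dictionary layout and function names are assumptions chosen for illustration and do not reflect the file formats of the actual build job.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Example relations taken from the text (ATXN10, OMIM:603516, HP:0002066).
GENE_TO_DISEASES: Dict[str, List[str]] = {"ATXN10": ["OMIM:603516"]}
DISEASE_TO_TERMS: Dict[str, List[str]] = {"OMIM:603516": ["HP:0002066"]}


def genes_to_phenotypes(gene_to_diseases: Dict[str, List[str]],
                        disease_to_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Join both relations on the disease ID to get gene -> HPO terms."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for gene, diseases in gene_to_diseases.items():
        for disease in diseases:
            result[gene].update(disease_to_terms.get(disease, ()))
    return dict(result)


def phenotypes_to_genes(gene_terms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert gene -> terms into term -> genes."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for gene, terms in gene_terms.items():
        for term in terms:
            result[term].add(gene)
    return dict(result)


g2p = genes_to_phenotypes(GENE_TO_DISEASES, DISEASE_TO_TERMS)
print(g2p)                       # {'ATXN10': {'HP:0002066'}}
print(phenotypes_to_genes(g2p))  # {'HP:0002066': {'ATXN10'}}
```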

Other jobs generate the data used by the Phenomizer (hpo.annotations.monthly.phenomizer), a precomputed disease–disease similarity matrix (hpo.diseasesimilarity), as well as the cross-species phenotype resource Uberpheno (hpo.ontology.uberpheno).

Besides these files, the information of the HPO project can also be accessed in other ways. The HPO Web site offers an individual page for each HPO term (e.g. http://www.human-phenotype-ontology.org/hpoweb/showterm?id=HP:0000127), each of which displays the term label, synonyms, definition and links to genes and diseases. The PhenExplorer is a Web-based application that offers much of the same functionality in a graphical user interface. The HPO is being increasingly used as a basis for integrating phenotypic abnormalities into computational algorithms for diagnostics and research. For instance, Phenomizer (29) and BOQA (20) can be used to assist clinical differential diagnosis in human genetics, and MouseFinder (30), Monarch (http://monarchinitiative.org), PhenoDigm (14) and PhenomeNET (12) enable searches for novel disease genes based on the analysis of model-organism phenotypes. The HPO has been used to integrate phenotypic information into computational analysis of the distribution of proteins in the postsynaptic density of the human neocortex (31), to derive a disease–disease similarity measure for the prediction of novel drug indications (32) and to analyze overrepresentation of phenotypes associated with individual protein domains (33). A summary of tools and applications using data from the HPO project is given in Table 5.

Table 5.

Tools and applications using HPO

Tool	Reference/URL
Differential diagnosis and exome analysis
Phenomizer	(29)
BOQA	(20)
Exomiser	http://www.sanger.ac.uk/resources/databases/exomiser/
Clinical data management and analysis
Cartagenia	http://www.cartagenia.com/
ECARUCA	(26)
DECIPHER	(18)
PhenoTips	(34)
Cross-species phenotype analysis
PhenoDigm	(14)
MouseFinder	(30)
Monarch	http://monarchinitiative.org
PhenomeNET	(12)
Uberpheno	(13)

The HPO project offers a number of files intended to help users apply these kinds of data in their own research. A Hudson job (hpo.diseasesimilarity) creates a precomputed disease similarity matrix, which contains all diseases that have annotations to the HPO subontology ‘phenotypic abnormality’. The similarity between two diseases is computed as a semantic similarity measure over the HPO annotations of the two diseases (6). Both a symmetric and an asymmetric version of the disease similarity matrix are calculated (29,35).
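
To make the difference between the symmetric and asymmetric versions concrete, the following Python sketch computes both on a toy example. It assumes one common formulation that the text above does not spell out: term similarity is the information content of the most informative common ancestor, the one-sided score averages the best match of each term of the first disease within the second, and the symmetric score is the mean of both directions. The ontology, the disease names D1 to D3 and all function names are hypothetical.

```python
import math

# Hypothetical toy ontology (child -> parents); not real HPO structure.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "a1": {"a"},
    "a2": {"a"},
}

# Hypothetical disease annotations.
ANNOTATIONS = {
    "D1": {"a1"},
    "D2": {"a2", "b"},
    "D3": {"b"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log of the fraction of diseases annotated to t or below."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        # Annotations propagate upwards to every ancestor.
        induced = set().union(*(ancestors(t) for t in terms))
        for term in induced:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def one_sided(query, target):
    """Asymmetric score: mean best match of each query term in target."""
    best = [max(term_similarity(q, t) for t in target) for q in query]
    return sum(best) / len(best)


def symmetric(d1, d2):
    """Symmetric score: mean of the two one-sided scores."""
    first, second = ANNOTATIONS[d1], ANNOTATIONS[d2]
    return 0.5 * (one_sided(first, second) + one_sided(second, first))


for d1 in sorted(ANNOTATIONS):
    row = [round(symmetric(d1, d2), 3) for d2 in sorted(ANNOTATIONS)]
    print(d1, row)
```

In this toy data, the one-sided score from D1 to D2 differs from the score from D2 to D1, because D2 carries an extra annotation that finds no good match in D1; the symmetric score averages the two.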

The HPO tracker at http://purl.obolibrary.org/obo/hp/tracker can be used to request new classes or to suggest structural changes of the HPO subsumption hierarchy.

Classes of the HPO and associated diseases and genes can be accessed using persistent URLs of the form http://purl.obolibrary.org/obo/HP_<ID>, where <ID> represents the numeric identifier of the HPO class. Further information on HPO-related publications and general announcements can be found on the HPO Web site at http://www.human-phenotype-ontology.org.
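
As a small illustration of the persistent URL scheme, the following Python sketch converts an HPO identifier such as HP:0000127 into its PURL. The helper name hpo_purl is hypothetical, and the seven-digit pattern is an assumption based on the identifiers shown in this article.

```python
import re

PURL_PREFIX = "http://purl.obolibrary.org/obo/HP_"

# Assumption: identifiers look like "HP:" followed by seven digits,
# as in the examples given in this article.
_HPO_ID = re.compile(r"HP:(\d{7})")


def hpo_purl(term_id: str) -> str:
    """Build the persistent URL for an HPO class identifier."""
    match = _HPO_ID.fullmatch(term_id)
    if match is None:
        raise ValueError(f"not an HPO identifier: {term_id!r}")
    return PURL_PREFIX + match.group(1)


print(hpo_purl("HP:0000127"))
```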

FUTURE DEVELOPMENTS

Development of the HPO has continued apace since its initial publication in 2008 (6). The HPO has focused on providing a well-defined, comprehensive and interoperable resource for computational analysis of human disease phenotypes and has served as a basis for a wide range of tools for analysis in clinical and research settings. While the initial focus of the HPO was on rare, mainly Mendelian diseases, HPO annotations are now also available for CNV diseases, and a pilot project to explore the development of annotations for common diseases is currently underway.

Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described (36). Deep phenotype analysis is an essential component of the emerging field of precision medicine, which aims to provide the best available care for each patient based on stratification into disease subclasses with a common biological basis of disease. The HPO aims to provide a powerful and manually curated resource to support efforts to discover disease subclasses, and to translate this knowledge into clinical care, by providing the means to capture, store and exchange phenotypic data. The clinical data that have been captured in this fashion are computable and can be easily integrated into computational algorithms for translational biomedical research.

FUNDING

The Deutsche Forschungsgemeinschaft [DFG RO 2005/4-2]; Bundesministerium für Bildung und Forschung [BMBF project number 0313911]; the European Community’s Seventh Framework Programme [Grant Agreement 602300; SYBIL]. Additional support was received from the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under [Contract No. DE-AC02-05CH11231]; the MGD grant from the National Institutes of Health [HG000330]; the ZFIN grant from the National Institutes of Health [U41-HG002659]; National Institutes of Health [R01-HG004838 and R24-OD011883]; National Institute for Health Research University College London Hospitals Biomedical Research Centre. Funding for open access charge: Institutional support.

Conflict of interest statement. None declared.

REFERENCES

Amberger, Bocchini, Hamosh. A new face and new challenges for online mendelian inheritance in man (OMIM®). Hum. Mutat. 2011, pp. 564–567.

Doelken, Köhler, Mungall, Gkoutos, Ruef, Smith, Smedley, Bauer, Klopocki, Schofield, et al. Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish. Dis. Model Mech. 2013, pp. 358–372.

Biesecker. Phenotype matters. Nat. Genet. 2004, pp. 323–324.

Robinson. Deep phenotyping for precision medicine. Hum. Mutat. 2012, pp. 777–780.

Gene Ontology Consortium. The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, pp. D258–D261.

Robinson, Köhler, Bauer, Seelow, Horn, Mundlos. The human phenotype ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 2008, pp. 610–615.

Smith, Ashburner, Rosse, Bard, Bug, Ceusters, Goldberg, Eilbeck, Ireland, Mungall, et al. The OBO foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 2007, pp. 1251–1255.

Schriml, Arze, Nadendla, Chang YWW, Mazaitis, Felix, Feng, Kibbe. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012, pp. D940–D946.

Gkoutos, Mungall, Dolken, Ashburner, Lewis, Hancock, Schofield, Kohler, Robinson. Entity/quality-based logical definitions for the human skeletal phenome using PATO. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2009, pp. 7069–7072.

Martone, Maynard, Mungall, Lewis, Imam. A knowledge based approach to matching human neurodegenerative disease and animal models. Front. Neuroinforma. 2013.

Rosse, Mejino JLV. A reference ontology for biomedical informatics: the foundational model of anatomy. J. Biomed. Inform. 2003, pp. 478–500.

Hoehndorf, Schofield, Gkoutos. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res. 2011, p. e119.

Köhler, Dölken, Ruef, Washington SBN, Westerfield, Gkoutos, Schofield, Smedley, Robinson, Mungall. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000 Res. 2013.

Smedley, Oellrich, Köhler, Ruef, Project SMG, Westerfield, Robinson, Lewis, Mungall. Phenodigm: analyzing curated annotations to associate animal models with human diseases. Database 2013, vol. 2013, p. bat025.

Washington, Haendel, Mungall, Ashburner, Westerfield, Lewis. Linking human diseases to animal models using ontology-based phenotype annotation. PLoS Biol. 2009, p. e1000247.

Amberger, Bocchini, Scott, Hamosh. McKusick’s online mendelian inheritance in man (OMIM). Nucleic Acids Res. 2009, pp. D793–D796.

Rath, Olry, Dhombres, Brandt, Urbero, Ayme. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 2012, pp. 803–808.

Firth, Richards, Bevan, Clayton, Corpas, Rajan, Vooren, Moreau, Pettett, Carter. DECIPHER: database of chromosomal imbalance and phenotype in humans using ENSEMBL resources. Am. J. Hum. Genet. 2009, pp. 524–533.

Caciotti, Rocco, Filocamo, Grossi, Traverso, d’Azzo, Cavicchi, Messeri, Guerrini, Zammarchi, et al. Type II sialidosis: review of the clinical spectrum and identification of a new splicing defect with chitotriosidase assessment in two patients. J. Neurol. 2009, vol. 256, pp. 1911–1915.

Bauer, Köhler, Schulz, Robinson. Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics 2012, pp. 2502–2508.

Firth, Wright, Study. The deciphering developmental disorders (DDD) study. Dev. Med. Child Neurol. 2011, pp. 702–703.

Firth, Hurst. Oxford Desk Reference – Clinical Genetics. 2005. Oxford, UK: Oxford University Press.

http://bio-ontologies.knowledgeblog.org/405

Robinson, Krawitz, Mundlos. Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin. Genet. 2011, pp. 127–132.

Riggs, Jackson, Miller, Vooren. Phenotypic information in genomic variant databases enhances clinical care and research: the international standards for cytogenomic arrays consortium experience. Hum. Mutat. 2012, pp. 787–796.

Riggs, Wain, Riethmaier, Savage, Smith-Packard, Kaminsky, Rehm, Martin, Ledbetter, Faucett. Towards a universal clinical genomics database: the 2012 international standards for cytogenomic arrays consortium meeting. Hum. Mutat. 2013, pp. 915–919.

van Silfhout ATV, van Ravenswaaij CMA, Hehir-Kwa, Verwiel ETP, Dirks, van Vooren, Schinzel, de Vries BBA, de Leeuw. An update on ECARUCA, the European cytogeneticists association register of unbalanced chromosome aberrations. Eur. J. Med. Genet. 2013, pp. 471–474.

Mungall, Dietze, Carbon, Ireland, Bauer, Lewis. Continuous Integration of Open Biological Ontology Libraries. 2012.