Bioinformatics. 2012 Nov 1;28(21):2724-31.
doi: 10.1093/bioinformatics/bts525. Epub 2012 Sep 3.

JEnsembl: a version-aware Java API to Ensembl data systems

Trevor Paterson et al. Bioinformatics. 2012.

Abstract

Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schemas. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications or for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new bioinformatics tools with which to access, analyse and visualize Ensembl data.

Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes, and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive), thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed.

Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).

Figures

Fig. 1. JEnsembl architecture. Schematic diagram of the modular JEnsembl architecture, where schema-versioned MyBatis configurations in the ensembl-config module are mapped to DatasourceAware objects using the MyBatis data mapping framework. Connection to external Ensembl datasources is via the MySQL JDBC connector.
Fig. 2. Data mapping between database releases and schema versions. (A) The configuration file hierarchy in the ensembl-config module. The ensembldb, ensembldb-archives and ensemblgenomes properties files hold JDBC connection parameters, while schema_version_mappings specifies which MyBatis configurations are to be used for each Ensembl release version. The base Configuration.xml and Database.xml files configure connection at the datasource level, while release-specific MyBatis mappings are held in database type-specific directories: schema/XX/compara, core, funcgen and variation. Rules specified in a Configuration.xml file in each directory allow a release configuration to use mapping files from different directories. (B) Abridged listing of schema_version_mappings properties, showing how the appropriate mappings of database type and version to MyBatis configuration directories are specified. Core and Compara mappings were developed for release 57 and are backwards compatible to release 51. Variation mappings were introduced from version 62, and Core mapping rules were updated at release 65.
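The release-to-configuration lookup described in this caption can be sketched in a few lines of Java. This is a minimal illustration, not the JEnsembl implementation: the class name `SchemaVersionResolver`, the property key syntax (`<type>.<first release>`) and the directory values are all assumptions made for the example. Only the release numbers (51, 57, 62, 65) and the `schema/XX/<type>` directory pattern come from the caption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.Properties;
import java.util.TreeMap;

/** Illustrative only: a hypothetical resolver, not part of the JEnsembl API. */
final class SchemaVersionResolver {

    // Hypothetical property syntax: <database type>.<first release covered> = <mapping directory>.
    // The real schema_version_mappings file may be laid out quite differently.
    static final String EXAMPLE_MAPPINGS = String.join("\n",
            "core.51=schema/57/core",          // release 57 mappings, backwards compatible to 51
            "core.65=schema/65/core",          // Core mapping rules updated at release 65
            "compara.51=schema/57/compara",
            "variation.62=schema/62/variation" // Variation mappings introduced from 62
    );

    // For each database type: first release covered -> mapping directory.
    private final Map<String, NavigableMap<Integer, String>> byType = new HashMap<>();

    SchemaVersionResolver(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            int dot = key.lastIndexOf('.');
            if (dot < 1) {
                throw new IllegalArgumentException("Expected <type>.<release>, got: " + key);
            }
            String type = key.substring(0, dot);
            int firstRelease = Integer.parseInt(key.substring(dot + 1));
            byType.computeIfAbsent(type, t -> new TreeMap<>())
                  .put(firstRelease, props.getProperty(key));
        }
    }

    /**
     * Returns the mapping directory whose first covered release is the
     * greatest one not exceeding the requested release, if any.
     * No upper bound is checked in this sketch.
     */
    Optional<String> resolve(String type, int release) {
        NavigableMap<Integer, String> mappings = byType.get(type);
        if (mappings == null) {
            return Optional.empty();
        }
        Map.Entry<Integer, String> entry = mappings.floorEntry(release);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        SchemaVersionResolver resolver = new SchemaVersionResolver(EXAMPLE_MAPPINGS);
        for (int release : new int[] {50, 51, 64, 65, 66}) {
            System.out.println("core release " + release + " -> "
                    + resolver.resolve("core", release).orElse("(no mapping)"));
        }
    }
}
```

Using a sorted map with a floor lookup means a new schema version needs only one extra line of configuration, and every release without its own entry falls back to the most recent earlier mapping.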
Fig. 3. Example usage of JEnsembl Java API (v1.12). The Species ‘ecoli’ retrieved in the final code block is actually a CollectionSpecies because it is retrieved from the ‘escherichia_shigella_collection_core’ databases. CollectionSpecies are slightly less reliable access points than normal Species as there is no guarantee of stable species, strain names and aliases between releases.
Fig. 4. Code illustrating JEnsembl API retrieving chromosomal coordinates for a human gene (Ensembl ID ENSG00000153551) for 18 different Ensembl Releases currently available at the ENSEMBLDB datasource (i.e. MySQL databases at ensembldb.ensembl.org:5306). The results reflect different coordinates of this gene in assembly builds 36 and 37. The increase in apparent gene size between release 55 and 56 (highlighted) is due to the addition of further transcripts to the gene model.
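For comparison, the following sketch shows the kind of per-release bookkeeping that a client would otherwise do by hand with plain JDBC, using none of the JEnsembl classes. The host and port are taken from the caption. Everything else is an assumption for illustration: the class name `EnsemblReleaseDatabases`, the `<species>_core_<release>_<assembly>` database naming pattern, and the `anonymous` login mentioned in the comments. Running the lookup also requires a MySQL JDBC driver on the classpath and network access, so `main` only prints the search patterns.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain-JDBC helper, not part of the JEnsembl API. */
final class EnsemblReleaseDatabases {

    static final String HOST = "ensembldb.ensembl.org"; // from the Fig. 4 caption
    static final int PORT = 5306;                        // from the Fig. 4 caption

    /** JDBC URL of the server itself, with no database selected. */
    static String serverUrl() {
        return "jdbc:mysql://" + HOST + ":" + PORT + "/";
    }

    /**
     * LIKE pattern for the core database of one species and release.
     * Assumes names of the form <species>_core_<release>_<assembly>; the
     * assembly suffix is left as a wildcard because it changes between builds.
     */
    static String corePattern(String species, int release) {
        if (!species.matches("[a-z_]+")) {
            throw new IllegalArgumentException("Unexpected species name: " + species);
        }
        return species + "_core_" + release + "_%";
    }

    /** Lists the databases on an open connection that match the pattern. */
    static List<String> findCoreDatabases(Connection connection, String species, int release)
            throws SQLException {
        List<String> names = new ArrayList<>();
        // The pattern is built only from a validated species name and an int.
        String sql = "SHOW DATABASES LIKE '" + corePattern(species, release) + "'";
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            while (rows.next()) {
                names.add(rows.getString(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // To query for real (assumed anonymous read-only login, driver on classpath):
        //   Connection c = java.sql.DriverManager.getConnection(serverUrl(), "anonymous", "");
        System.out.println("Server: " + serverUrl());
        for (int release = 54; release <= 57; release++) {
            System.out.println(corePattern("homo_sapiens", release));
        }
    }
}
```

Each release lives in its own database with its own schema version, which is why a version-aware mapping layer is needed before the same gene query can be issued against all of them.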
Fig. 5. JEnsembl plug-in for Savant genome browser. (A) The user selects the desired species and release version from those available at the selected datasource (Ensembl, EnsemblGenomes or EnsemblGenomes-Bacterial). (B) A single chromosome/assembly is selected from those available for the chosen species/release. The chromosome is imported either as a simple coordinate skeleton or with the associated colour-coded genomic sequence. Currently, the only feature annotation that can be imported from the datasource is the gene track, which Savant shows aligned with the DNA sequence.
Fig. 6. The ArkMAP application uses JEnsembl for retrieving maps and homologies from Ensembl datasources. ArkMAP can be used to draw genetic maps loaded from ArkDB, Ensembl or local datasources. Here the first 8 Mb of a bovine ePCR map has been loaded from ArkDB, where Ark Markers have been mapped on the Btau4 assembly. The JEnsembl API was then used to retrieve and align the cognate gene-annotated chromosome 1 assembly from Ensembl release 54. JEnsembl was then used to retrieve a more recent (release 66) gene-annotated assembly, which is aligned to the old assembly. Finally, JEnsembl was used to search for human gene homologies with the bovine genes in this region, and the region of conserved synteny on human chromosome 21 was aligned with the bovine chromosome (with colour-coded homology relationships).
