cogent3/cogent3

cogent3 is a mature Python library for analysing genomic sequence data. We endeavour to provide a first-class experience within Jupyter notebooks, and the algorithms also support parallel execution on compute systems with thousands of processors.

📣 Feature Announcements 📣

New core data types improve efficiency and flexibility

The cogent3 development team 👾 have been hard at work modernising the core internals 💪🛠.

The grand rewrite of the alignment classes is ready for use! The new approach gives us the foundation for major performance improvements in the future. As with the moltype, alphabet, genetic code and SequenceCollection, you can select the new class via make_aligned_seqs() or load_aligned_seqs() by specifying new_type=True. The new classes are not yet the default and are not fully integrated into the existing code. Their API can also differ from that of the classes they replace.

We encourage experimentation in cases where integration with old objects is NOT required and look forward to any feedback!

Faster pairwise genetic distance calculations 🚀

We have completely rewritten a subset of the genetic distance calculators. They are now available only through the distance_matrix() method of the new-type Alignment class. Single-CPU performance is faster than before, and parallel execution is now supported.
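
To make concrete what a pairwise genetic distance matrix contains, here is a minimal pure-Python sketch. It is not cogent3's implementation and does not call Alignment.distance_matrix(). The function names (p_distance, jc69_distance, distance_matrix) are hypothetical, and the Jukes-Cantor (JC69) correction is used only because it is the simplest textbook model.

```python
import math
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping positions with a gap or ambiguity."""
    valid = set("ACGT")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in valid and b in valid
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq1, seq2):
    """Jukes-Cantor distance: expected substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences are too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(aligned):
    """Symmetric mapping of (name1, name2) -> distance for every pair of sequences."""
    dists = {}
    for n1, n2 in combinations(aligned, 2):
        d = jc69_distance(aligned[n1], aligned[n2])
        dists[(n1, n2)] = dists[(n2, n1)] = d
    return dists


# Toy alignment, made up for illustration only.
toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAT", "c": "ACGAACGTTT"}
dmat = distance_matrix(toy)
```

Each pair is computed independently of every other pair, which is what makes this kind of calculation straightforward to run in parallel.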

Faster sequence coevolution measures 🚀

We have completely rewritten all the mutual-information-based coevolution statistic calculators. Single-CPU performance is orders of magnitude faster than the old implementation, and parallel execution is now supported. The existing Alignment.coevolution() method uses the new calculators, so you don't need to do anything different to benefit from them.
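
For readers unfamiliar with the statistic, the following pure-Python sketch computes plain mutual information between two alignment columns as MI(X, Y) = H(X) + H(Y) - H(X, Y). It is an illustration only: it is not the cogent3 algorithm, it does not call Alignment.coevolution(), and the names column_entropy and mutual_information are hypothetical.

```python
import math
from collections import Counter


def column_entropy(symbols):
    """Shannon entropy (in bits) of the observed symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mutual_information(col_x, col_y):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for two equal-length columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    joint = list(zip(col_x, col_y))
    return column_entropy(col_x) + column_entropy(col_y) - column_entropy(joint)


# Toy alignment, made up for illustration: one sequence per row.
seqs = ["ACGA", "ATGA", "CCGT", "CTGT"]
cols = list(zip(*seqs))  # transpose rows into alignment columns

mi_linked = mutual_information(cols[0], cols[3])  # columns vary together
mi_unlinked = mutual_information(cols[0], cols[1])  # columns vary independently
```

In this toy data the first and last columns change together, so they share one bit of information, while the first and second columns share none.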

Faster sequence format parsers 💨

We have faster implementations of the parsers for the FASTA and GenBank sequence formats. These are used by our standard loading mechanisms. If you just want to get the contents of files in those formats as standard Python types, use cogent3.parser.fasta.iter_fasta_records() or cogent3.parser.genbank.iter_genbank_records().
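
The text above does not say what these functions yield. As an illustration of FASTA records held in standard Python types, the sketch below assumes (label, sequence) pairs of strings. The function iter_records is a hypothetical stand-in written for this example; it is not the cogent3 parser and makes no claim about its signature or speed.

```python
import io


def iter_records(lines):
    """Yield (label, sequence) string pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)  # sequence may span several lines
    if label is not None:
        yield label, "".join(chunks)


# Made-up FASTA content, read from memory rather than from a file.
fasta_text = ">seq1 human\nACGT\nTTGA\n>seq2 mouse\nACGA\n"
records = dict(iter_records(io.StringIO(fasta_text)))
```

Because the function is a generator, records are produced one at a time, so a large file never needs to be held in memory in full.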

Supporting third-party apps as plugins 🔌

Cogent3 now provides support for plugins! Third-party developers can deploy their code as cogent3 apps with just a few lines. See the demo project.

Post any questions you have in cogent3 discussions.

The developers of Cogent3 and IQ-TREE2 announce piqtree2 🎉

Speaking of plugins, our first major third-party plugin is piqtree2. Try it out and give us feedback.

Who is it for?

Anyone who wants to analyse sequence divergence using robust statistical models

cogent3 is unique in providing numerous non-stationary Markov models for modelling sequence evolution, including codon models. cogent3 also includes an extensive collection of time-reversible models (again including novel codon models). We have done more than just invent these new methods: we have also established the most robust algorithms for implementing them and their suitability for real data. Additionally, there are novel signal processing methods focussed on statistical estimation of integer period signals.

🎬 Demo non-reversible substitution model
cogent3-demo-composable.mp4

Anyone who wants to undertake exploratory genomic data analysis

Beyond our novel methods, cogent3 provides an extensive suite of capabilities for manipulating and analysing sequence data. You can manipulate sequences by their annotations, e.g.

🎬 Demo sequences with annotations
cogent3-demo-new-ann.mp4

You can also read standard tabular and biological data formats, perform multiple sequence alignment using any cogent3 substitution model, reconstruct phylogenies, manipulate trees and tabular data, visualise phylogenies and much more.

Beginner friendly approach to genomic data analysis

Our cogent3.app module provides a very different approach to using the library capabilities. Expertise in structural programming concepts is not essential!

🎬 Demo friendly coding
cogent3-demo-composable.mp4

Installation

For most users we recommend

$ pip install "cogent3[extra]"

which installs support for data visualisation and Jupyter notebooks.

If you're running on a high-performance computing system we recommend

$ pip install cogent3

which skips the data visualisation and notebook support.

To install the development version directly from GitHub

$ pip install git+https://github.com/cogent3/cogent3.git@develop#egg=cogent3

Project Information

cogent3 is released under the BSD-3 license. Documentation is at cogent3.org and the code is on GitHub. If you would like to contribute (and we hope you do!), we have created a companion c3dev GitHub repo which provides details on how to contribute and some useful tools for doing so.

Project History

cogent3 is a descendant of PyCogent. While there is much in common with PyCogent, the amount of change has been substantial, motivating the name change to cogent3. This name was chosen because cogent was always the import name (dating back to PyEvolve in 2004) and because the library is Python 3 only.

Given this history, we are grateful to the multitude of individuals who have made contributions over the years. Many of these contributors were also co-authors on the original PyEvolve and PyCogent publications. Individual contributions can be seen by using "view git blame" on individual lines of code on GitHub, through git log in the terminal, and more recently the changelog.

Funding

Cogent3 has received funding support from the Australian National University and an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.
