Eukfinder is a modular pipeline for classifying WGS metagenomic data and recovering potential eukaryotic sequences. It supports both Illumina short reads (Eukfinder_short) and assemblies or long-read data (Eukfinder_long).
Key Features:
- Automated classification of potential eukaryotic sequences.
- Flexible design for short-read, long-read, or assembly data.
- Optional binning workflow for refining nuclear and mitochondrial genomes.
- Customizable databases for different environments (e.g., gut, ocean, soil).
Eukfinder has two modes of operation, selected according to the type of input:
- (a) Illumina short-read workflow (Eukfinder_short): Short reads are first classified into five taxonomic categories (Archaeal, Bacterial, Viral, Eukaryotic, and Unknown) using Centrifuge (DB1) and PLAST (DB2). Reads classified as 'Eukaryotic' or 'Unknown' are assembled into contigs with metaSPAdes. These contigs are then reclassified with Centrifuge and PLAST. Contigs assigned to 'Eukaryotic' or 'Unknown' are combined and treated as potential eukaryotic sequences, which can be passed on to binning and genome recovery.
- (b) Metagenome-assembled contigs or long-read workflow (Eukfinder_long): For metagenome-assembled contigs or long-read sequencing data generated on Nanopore or PacBio platforms, the workflow performs a single round of classification to select 'Eukaryotic' and 'Unknown' contigs. The selected contigs are combined and treated as potential eukaryotic sequences, ready for binning and downstream analysis.
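The selection logic shared by the two modes can be sketched in a few lines of Python. This is only an illustration of the control flow described above, not Eukfinder's actual code: `select_candidates`, `short_workflow`, `long_workflow`, and the `classify` and `assemble` callables are hypothetical names, and the classifier is assumed to return one of the five categories per sequence ID. How Centrifuge and PLAST results are merged into a single category is not modelled here.

```python
from typing import Callable, Dict, List

# The five categories named in the workflow description.
CATEGORIES = {"Archaeal", "Bacterial", "Viral", "Eukaryotic", "Unknown"}
# Categories carried forward as potential eukaryotic sequences.
KEEP = {"Eukaryotic", "Unknown"}

# Hypothetical type aliases: a classifier maps sequence IDs to categories,
# an assembler maps read IDs to contig IDs.
Classifier = Callable[[List[str]], Dict[str, str]]
Assembler = Callable[[List[str]], List[str]]


def select_candidates(labels: Dict[str, str]) -> List[str]:
    """Return IDs labelled 'Eukaryotic' or 'Unknown', in input order."""
    unexpected = set(labels.values()) - CATEGORIES
    if unexpected:
        raise ValueError(f"unexpected categories: {sorted(unexpected)}")
    return [seq_id for seq_id, category in labels.items() if category in KEEP]


def short_workflow(reads: List[str], classify: Classifier,
                   assemble: Assembler) -> List[str]:
    """Two rounds: classify reads, assemble kept reads, reclassify contigs."""
    kept_reads = select_candidates(classify(reads))
    contigs = assemble(kept_reads)
    return select_candidates(classify(contigs))


def long_workflow(contigs: List[str], classify: Classifier) -> List[str]:
    """Single round: classify contigs or long reads and keep candidates."""
    return select_candidates(classify(contigs))
```

In this sketch the only difference between the two modes is the extra classify-then-assemble round that short reads go through before the contig-level selection.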
Schematic representation of Eukfinder pipeline:
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
Zhao, D., Salas-Leiva, D.E., Williams, S.K., Dunn, K.A. and Roger, A.J., 2023. Eukfinder: a pipeline to retrieve microbial eukaryote genomes from metagenomic sequencing data. bioRxiv, pp.2023-12.
Dandan Zhao (d.zhao@dal.ca), Dayana Salas (ds2000@cam.ac.uk)