Can this be used as a replacement for MetaEuk's eukaryotic gene prediction capabilities? #64

Open
@jolespin

Description

I've been developing a metagenomic/metatranscriptomics software suite called VEBA (https://github.com/jolespin/veba) that natively handles eukaryotic binning of metagenome-assembled genomes and exon-aware gene prediction.

Currently I'm using MetaEuk, but I'm running into significant resource requirements for larger eukaryotic genomes (especially algae from targeted cultured assemblies).

I've seen the sensitivity and distant homology issues mentioned, so I thought it would be appropriate to ask here.

I have a general microeukaryotic protein database that I've compiled from various source repositories and clustered at 100%, 90%, and 50% identity, similar to UniRef (explained in Table 2 of https://academic.oup.com/nar/article/52/14/e63/7697622).

In this database, there will be many proteins that are not related to the target genome.

My questions:

  • Given a genome whose lineage I do not know a priori, can I use miniprot with this "general" microeukaryotic protein database?
  • Can I use miniprot for exon-aware gene predictions as I do with MetaEuk?
  • Can this be used with fragmented genomes?

If so, are there any parameters I should adjust to help with any of those scenarios?
