Question

Pangenome of plant

7 months ago

analyst ▴ 50

Dear all,

I need to construct a pangenome for a plant species for which I have 10 assemblies of different varieties.

Could anyone please suggest which tools are used to construct a plant pangenome from assemblies?

pangenome plant • 1.0k views

ADD COMMENT • link 6 months ago by analyst ▴ 50

score 5 · Accepted Answer · 2024-03-26

The options available depend on the size, quality and complexity of the genome assemblies, as well as on your goals for the analysis. I would say there are two basic approaches:

gene-based analysis: when you only care about the gene content and not intergenic regions. This requires that the input assemblies have been annotated, i.e. have matching GFF files with gene models. This approach is faster and is potentially less penalized by fragmented or extremely repeat-rich assemblies. Examples of tools for this kind of analysis are OrthoFinder (protein-based), GENESPACE (protein- and GFF-based), GET_HOMOLOGUES-EST (nucleotide-based, with pangenome-specific analysis) and GET_PANGENES (nucleotide- and collinearity-based, with pangenome-specific analysis).
whole-genome analysis: when you want to carry out presence-absence analysis of all genomic regions. A recent paper (https://doi.org/10.3389/fpls.2024.1371222) compares three construction strategies (iterative individual, iterative pooling, and map-to-pan). For this kind of analysis I have tested minigraph on rice data, but I know from our own work that minimap-based approaches struggle with large, repeat-rich genomes such as barley and wheat. I haven't tested https://github.com/ComparativeGenomicsToolkit/cactus yet. I am currently testing https://github.com/maize-genetics/phg_v2 with barley data.
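To make the gene-based route more concrete, here is a minimal Python sketch of the downstream step: once a tool has grouped genes into orthogroups or clusters, each cluster can be classified by how many varieties carry it. The input layout (a dict of cluster ID to set of variety names), the function name and the core/shell/cloud cutoffs are all assumptions for illustration, not the output format or defaults of any of the tools named above.

```python
from collections import Counter

def classify_orthogroups(presence, n_genomes, core_frac=1.0, cloud_max=1):
    """Label each gene cluster by occupancy across genomes.

    presence   : dict mapping cluster ID -> set of genome names containing it
                 (hypothetical layout, derived from your clustering tool)
    n_genomes  : total number of assemblies in the analysis
    core_frac  : minimum fraction of genomes for a cluster to count as core
    cloud_max  : maximum number of genomes for a cluster to count as cloud
    """
    classes = {}
    for cluster_id, genomes in presence.items():
        occupancy = len(genomes)
        if occupancy >= core_frac * n_genomes:
            classes[cluster_id] = "core"
        elif occupancy <= cloud_max:
            classes[cluster_id] = "cloud"
        else:
            classes[cluster_id] = "shell"
    return classes

# Toy example with three varieties (made-up IDs).
toy_presence = {
    "OG0001": {"varA", "varB", "varC"},
    "OG0002": {"varA", "varC"},
    "OG0003": {"varB"},
}
toy_classes = classify_orthogroups(toy_presence, n_genomes=3)
print(Counter(toy_classes.values()))
```

With 10 assemblies you may prefer a softer core definition (e.g. core_frac=0.9) so that a gene missed in one fragmented assembly is not dropped from the core.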

score 3 · Accepted Answer · 2024-03-27

7 months ago

colindaven 6.8k

We maintain a community awesome-list on pangenomics here: https://github.com/colindaven/awesome-pangenomes

The sections on construction tools, toolkits and file formats are particularly important. Be aware that pangenomic analysis is nowhere near as advanced as reference-based analysis yet.

ADD COMMENT • link 7 months ago by colindaven 6.8k

Thank you so much colindaven!

Can you please suggest which tool from awesome-pangenomes should be preferred for a diploid plant, and which works best for a polyploid plant like wheat?

ADD REPLY • link 7 months ago by analyst ▴ 50

Honestly, none of them are ready for whole-genome wheat.

I would look at target regions exclusively and define your goals before you start.

Possibly the most robust method to start with, if you can create a graph, is:

PGGB -> ODGI PAV -> presence-absence variations
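As a rough illustration of the last step in that chain, the sketch below turns per-region coverage fractions into binary presence/absence calls and lists the regions that vary between samples. It assumes you have already extracted, for each region and sample, the fraction of the region's bases that the sample covers; the dict layout, the function names and the 0.5 cutoff are hypothetical and do not reproduce the output format of any specific tool.

```python
def call_pav(coverage, min_fraction=0.5):
    """Convert coverage fractions into 1 (present) / 0 (absent) calls.

    coverage     : dict mapping region name -> {sample name: fraction in [0, 1]}
                   (hypothetical layout, built from your PAV table)
    min_fraction : assumed cutoff; a region counts as present at or above it
    """
    return {
        region: {sample: int(frac >= min_fraction) for sample, frac in per_sample.items()}
        for region, per_sample in coverage.items()
    }

def variable_regions(pav):
    """Return regions that are present in some samples and absent in others."""
    return [region for region, calls in pav.items() if len(set(calls.values())) > 1]

# Toy example with made-up regions and varieties.
toy_coverage = {
    "chr1:0-10000":     {"varA": 0.98, "varB": 0.95, "varC": 0.99},
    "chr1:10000-20000": {"varA": 0.92, "varB": 0.10, "varC": 0.00},
}
toy_pav = call_pav(toy_coverage)
print(variable_regions(toy_pav))
```

The cutoff is worth tuning per dataset: a strict value flags partially deleted regions as absent, while a lenient one only reports near-complete absences.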

The fastest and most efficient alignment method by far is minigraph, but its output is rGFA (a reference-anchored subset of GFA without embedded sample paths) rather than a full GFA with paths, so it cannot be used by most downstream tools.

ADD REPLY • link 7 months ago by colindaven 6.8k

score 3 · Accepted Answer · 2024-03-27

There is no one tool that fits best. There is the homology-based approach, using protein-coding genes, but this relies on high-quality annotations. If you are using lift-over annotations you'll miss novel isoforms. I think @b.contreras.moreira gave the best summary of the existing tools. Achieving a decent pangenome these days requires integrating multiple approaches.

For an assembly-based approach, I've used Progressive Minigraph-cactus, but it is limited by a reference-based alignment call and by the quality of the input genome assemblies. Ideally, all genome assemblies going into a pangenome pipeline should be chromosome-scaffolded and haplotype-resolved. Overall, it's a good tool that is still under development by a highly skilled and motivated team. Plus, Singularity makes it easy to install.

Analysing a pangenome assembly is challenging and not at all straightforward. If you're looking for something quick and easy, I'd suggest the homology-based approach, for example a gene presence-absence variation analysis.
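One quick way to gauge whether an assembly is anywhere near chromosome-scale before feeding it to a graph pipeline is to look at the number of sequences and the N50. The following is a minimal pure-Python sketch; the function names are my own and the toy FASTA is made up. For real genomes you would open the FASTA file (possibly gzipped) instead of the in-memory string.

```python
import io

def sequence_lengths(handle):
    """Return the length of every record in a FASTA stream, in file order."""
    lengths, current = [], None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

# Toy assembly: two "chromosomes" and one small unplaced scaffold.
toy_fasta = io.StringIO(">chr1\nACGTACGTAC\n>chr2\nACGTAC\n>scaffold_3\nACG\n")
toy_lengths = sequence_lengths(toy_fasta)
print(len(toy_lengths), "sequences, N50 =", n50(toy_lengths))
```

If the sequence count is far above the expected chromosome number and the N50 is small relative to chromosome size, the assembly is probably still fragmented, and scaffolding it first may be worthwhile.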