A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study

doi:10.1186/s12864-024-10931-w

Review

BMC Genomics. 2024 Oct 31;25(1):1022.

Harpreet Kaur¹, Laura M Shannon², Deborah A Samac³

Affiliations

¹ Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA. kaurh@umn.edu.
² Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA.
³ USDA-ARS, Plant Science Research Unit, St. Paul, MN, 55108, USA.

PMID: 39482604
PMCID: PMC11526573
DOI: 10.1186/s12864-024-10931-w

Harpreet Kaur et al. BMC Genomics. 2024.

Abstract

Background: The concept of pangenomics and the importance of structural variants are gaining recognition within the plant genomics community. Owing to advances in sequencing and computational technology, it has become feasible to sequence the entire genomes of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available for only a few crops, including wheat, cotton, rapeseed, and potatoes.

Main body: In this review, we explore the various methods used in crop pangenome development and discuss the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross-pollinated, autotetraploid forage crop species, is used as an example to discuss the concerns and challenges posed by polyploid crop species. We conducted a comparative analysis of linear and graph-based methods by constructing an alfalfa graph pangenome from three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we aligned five different gene sequences against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable differences in the genomic variation captured by each pipeline.

Conclusion: Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. The development of user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. However, challenges remain with graph-based pangenomes, including compatibility with other tools, extraction of sequence for regions of interest, and visualization of the genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.

Keywords: Alfalfa; Autotetraploid; Crop pangenome; Graph-based pangenome; Polyploids.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
The pangenome development methods. Different colors represent different genes in panel a and different assemblies in panels b and c. Grey areas represent colinear segments between assemblies. a Gene-based pangenome approach, b Map-to-pan approach, c Iterative mapping approach

**Fig. 2**
Graph-based pangenome development approach. In the reference Graphical Fragment Assembly (rGFA) format, used by the Minigraph and Minigraph-Cactus pipelines, the origin of a segment can be traced back to the linear genome used to build the graph, and each segment/sequence is associated with only one origin. In the GFA (v. 1.1 and 1.0) format, used by PanGenome Graph Building (PGGB) and k-mer based approaches, tracing the origin back to the linear genome is difficult. Segments in a pangenome graph are DNA sequences. Links/edges connect the segments/sequences to each other and represent overlapping sequences between two segments. Links/edges can be bidirected and describe the possible ways of walking through the nodes. Nodes are groups of segments connected by edges through which multiple paths are possible
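
The segment and link records described in this caption can be illustrated with a minimal parsing sketch. It is not taken from the article: the toy graph, the segment names, and the helper names `parse_gfa` and `origin` are hypothetical. It assumes tab-separated `S` (segment) and `L` (link) lines, and that rGFA segments carry the `SN` (stable sequence name), `SO` (stable offset), and `SR` (rank) tags, which is what allows a segment to be traced back to one linear genome.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    name: str
    sequence: str
    tags: dict = field(default_factory=dict)


def parse_gfa(text):
    """Parse S (segment) and L (link) lines of a GFA/rGFA string."""
    segments = {}
    links = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "S":
            tags = {}
            # Optional tags have the form KEY:TYPE:VALUE, e.g. SO:i:100
            for tag in fields[3:]:
                key, tag_type, value = tag.split(":", 2)
                tags[key] = int(value) if tag_type == "i" else value
            segments[fields[1]] = Segment(fields[1], fields[2], tags)
        elif fields[0] == "L":
            # (from segment, from orientation, to segment, to orientation)
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


def origin(segment):
    """Return (stable name, offset, rank) for an rGFA segment, else None."""
    try:
        return (segment.tags["SN"], segment.tags["SO"], segment.tags["SR"])
    except KeyError:
        return None  # plain GFA segment: origin is not recorded on the S line


# Hypothetical toy graph: a single bubble (s1 -> s2 or s3 -> s4).
EXAMPLE = "\n".join([
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tGG\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "S\ts3\tTTA\tSN:Z:sampleB_chr1\tSO:i:100\tSR:i:1",
    "S\ts4\tCCA\tSN:Z:chr1\tSO:i:6\tSR:i:0",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
])

segments, links = parse_gfa(EXAMPLE)
for seg in segments.values():
    print(seg.name, origin(seg))
```

In this toy graph, segment `s3` carries rank 1 and a different stable name, marking it as sequence contributed by a non-reference assembly. A plain GFA segment without these tags makes `origin` return `None`, which mirrors the difficulty of tracing origins noted in the caption.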

**Fig. 3**
Flowcharts explaining gene-based and linear pangenome development pipelines or software. The software tools used for linear pangenome development are listed in Table 3

**Fig. 4**
Flowchart explaining the five different pangenome graph building pipelines, their comparisons, and the downstream analyses. In the output and downstream analyses section, blue boxes are software tools and other boxes are the outputs of these tools. The VG augment function, shown in the yellow highlighted box, only works with VG output, which should be generated using a reference genome and a VCF file in VGToolkit. A double-sided arrow means that the function works both ways, i.e., VG format can be converted into rGFA and vice versa. Other linear and graph alignment and variant calling tools are listed in Table 3

**Fig. 5**
Dotplots comparing Medicago sativa assemblies. The ZhongmuNo.1 consensus genome assembly (y-axis) is compared to the a XinJiang DaYe, b ZhongmuNo.4, and c CADL genome assemblies. Nucleotide-level alignments were generated using MUMmer (v. 4.0.0beta2). Dotplots showing unique alignments were generated using the web version of Assemblytics

**Fig. 6**
GO enrichment of the highly-impacted-by-variant genes. a SNPs identified in ZhongmuNo.1 reference genome using SyRI v.1.6.3 separately for each of the homologs for two reference genomes, ‘XinJiang DaYe’ and ‘ZhongmuNo.4’ and later combined using BCFtools v.1.16, b Structural Variants (< 50 bp) identified using SyRI v.1.6.3 and web version of Assemblytics separately for each homolog and later combined using SURVIVOR v.0.1.0

**Fig. 7**
Alignment of five different genes against graph-based pangenomes in alfalfa using BLAST. The blue highlighted regions show the aligned regions on different nodes in the graphs. The five different genes include *Medicago sativa* palmate-like pentafoliata 1 (*MsPALM1*; GenBank accession: HM038483.1), *M. truncatula* PHD finger protein male sterility 1 (*MtMS1*; GenBank accession: XM_003613725.3), *Glycine max* caffeic acid 3-O-methyltransferase (*GmCOMT*; KEGG gene database reference number: gmx:100780100), *M. sativa* chromoplast heme oxygenase 1 (*MsHO1*; GenBank accession: HM212768.1), and *M. sativa* leghemoglobin 3 (*MsLb3*; GenBank accession: M91077.1). These figures were generated with BANDAGE (v. 0.8.1) software using an in-built BLAST feature with default parameters
