Review
BMC Genomics. 2024 Oct 31;25(1):1022.
doi: 10.1186/s12864-024-10931-w.

A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study

Harpreet Kaur et al.

Abstract

Background: The concept of pangenomics and the importance of structural variants are gaining recognition within the plant genomics community. Owing to advances in sequencing and computational technology, it has become feasible to sequence the entire genomes of many individuals of a single species at reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available for only a few crops, including wheat, cotton, rapeseed, and potatoes.

Main body: In this review, we explore the various methods used in crop pangenome development and discuss the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross-pollinated, autotetraploid forage crop, is used as an example to discuss the concerns and challenges posed by polyploid crop species. We conducted a comparative analysis of linear and graph-based methods by constructing an alfalfa graph pangenome from three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we aligned five different gene sequences against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable differences in the genomic variation captured by each pipeline.

Conclusion: Pangenome resources are proving invaluable, offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. The development of user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. However, challenges remain for graph-based pangenomes, including compatibility with other tools, extraction of sequences for regions of interest, and visualization of the genetic variation captured in pangenome graphs. These issues call for further refinement of tools and pipelines to address the complexities of polyploid, highly heterozygous, and cross-pollinated species.

Keywords: Alfalfa; Autotetraploid; Crop pangenome; Graph-based pangenome; Polyploids.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The pangenome development methods. Different colors represent different genes in (a) and different assemblies in (b) and (c). Grey areas represent colinear segments between assemblies. a Gene-based pangenome approach, b Map-to-pan approach, c Iterative mapping approach
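In a gene-based pangenome (panel a), genes predicted in each assembly are typically grouped into gene families, and each family is then classified by how many accessions carry it: families found in every accession form the core genome, and the rest are dispensable. The following is a minimal sketch of only that final classification step, assuming the clustering into families has already been done and is summarized as a presence/absence mapping. The function name, family IDs, and presence data are hypothetical.

```python
from typing import Dict, Set


def classify_gene_families(
    presence: Dict[str, Set[str]], accessions: Set[str]
) -> Dict[str, str]:
    """Label each gene family 'core' (found in every accession) or 'dispensable'.

    `presence` (hypothetical input) maps a gene-family ID to the set of
    accessions in which at least one member of that family was detected.
    """
    labels: Dict[str, str] = {}
    for family, found_in in presence.items():
        # Core only if every accession is in the family's presence set.
        labels[family] = "core" if accessions <= found_in else "dispensable"
    return labels


if __name__ == "__main__":
    # Hypothetical presence/absence data for three alfalfa assemblies.
    assemblies = {"ZhongmuNo.1", "XinJiang DaYe", "ZhongmuNo.4"}
    pav = {
        "fam001": {"ZhongmuNo.1", "XinJiang DaYe", "ZhongmuNo.4"},
        "fam002": {"ZhongmuNo.1", "ZhongmuNo.4"},
        "fam003": {"XinJiang DaYe"},
    }
    labels = classify_gene_families(pav, assemblies)
    for fam in sorted(labels):
        print(fam, labels[fam])
    # fam001 core
    # fam002 dispensable
    # fam003 dispensable
```

Published studies often subdivide the dispensable set further (for example, families private to a single accession). That refinement would be an extra threshold on `len(found_in)`.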
Fig. 2
Graph-based pangenome development approach. In the reference Graphical Fragment Assembly (rGFA) format, used by the Minigraph and Minigraph-Cactus pipelines, the origin of each segment can be traced back to the linear genome used to build the graph, and each segment/sequence is associated with only one origin. In the GFA (v. 1.0 and 1.1) format, used by PanGenome Graph Builder (PGGB) and k-mer-based approaches, tracing a segment back to its linear genome is difficult. Segments in a pangenome graph are DNA sequences. Links/edges connect segments to each other and represent overlapping sequences between two segments. Links/edges can be bidirected and describe the possible ways of walking through the nodes. Nodes are groups of segments connected by edges through which multiple paths are possible
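The rGFA/GFA distinction can be made concrete with a minimal Python sketch that reads S (segment) and L (link) records from GFA text. In rGFA, each segment carries the SN (stable sequence name), SO (offset on that sequence), and SR (rank, 0 for the reference) tags, and these tags are what let a segment be traced back to a single linear genome. Plain GFA 1.x segments usually lack them, so `origin` stays `None`. Links are treated as bidirected: a link from a(+) to b(+) also permits walking b(-) to a(-). The four-segment "bubble" graph is hypothetical, other record types (H, P, W) are ignored, and the walk enumeration is a naive depth-first search meant only for small graphs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str, str, str, str]  # from, from_orient, to, to_orient, overlap
Node = Tuple[str, str]                 # (segment name, orientation)


@dataclass
class Segment:
    name: str
    seq: str
    # (SN, SO, SR): stable sequence name, offset, rank; present only in rGFA
    origin: Optional[Tuple[str, int, int]] = None


def parse_gfa(text: str) -> Tuple[Dict[str, Segment], List[Link]]:
    """Parse S and L records; all other record types are skipped."""
    segments: Dict[str, Segment] = {}
    links: List[Link] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            tags = {}
            for tag in fields[3:]:
                key, typ, value = tag.split(":", 2)
                tags[key] = int(value) if typ == "i" else value
            origin = None
            if all(k in tags for k in ("SN", "SO", "SR")):
                origin = (tags["SN"], tags["SO"], tags["SR"])
            segments[fields[1]] = Segment(fields[1], fields[2], origin)
        elif fields[0] == "L":
            a, oa, b, ob, overlap = fields[1:6]
            links.append((a, oa, b, ob, overlap))
    return segments, links


def flip(orient: str) -> str:
    return "-" if orient == "+" else "+"


def build_adjacency(links: List[Link]) -> Dict[Node, List[Node]]:
    """Each link a(oa)->b(ob) also allows the reverse walk b(flip ob)->a(flip oa)."""
    adj: Dict[Node, List[Node]] = defaultdict(list)
    for a, oa, b, ob, _overlap in links:
        adj[(a, oa)].append((b, ob))
        adj[(b, flip(ob))].append((a, flip(oa)))
    return adj


def walks(adj: Dict[Node, List[Node]], start: Node, end: Node) -> List[List[str]]:
    """Enumerate walks from start to end that never revisit an oriented segment."""
    found: List[List[str]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append([name + orient for name, orient in path])
            continue
        for nxt in adj.get(path[-1], []):
            if nxt not in path:
                stack.append(path + [nxt])
    return found


# Hypothetical rGFA bubble: s2 (reference allele) vs s3 (allele from a second assembly).
EXAMPLE_RGFA = "\n".join([
    "S\ts1\tACGT\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tT\tSN:Z:chr1\tSO:i:4\tSR:i:0",
    "S\ts3\tG\tSN:Z:asm2_chr1\tSO:i:4\tSR:i:1",
    "S\ts4\tCCA\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
])

if __name__ == "__main__":
    segs, links = parse_gfa(EXAMPLE_RGFA)
    print(segs["s3"].origin)  # ('asm2_chr1', 4, 1)
    print(sorted(walks(build_adjacency(links), ("s1", "+"), ("s4", "+"))))
    # [['s1+', 's2+', 's4+'], ['s1+', 's3+', 's4+']]
```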
Fig. 3
Flowcharts explaining gene-based and linear pangenome development pipelines and software. The software used for linear pangenome development is listed in Table 3
Fig. 4
Flowchart explaining the five different pangenome graph building pipelines, their comparisons, and the downstream analyses. In the output and downstream analyses section, blue boxes are software tools and other boxes are the outputs of these tools. The VG augment function (highlighted yellow box) works only with VG output, which should be generated from a reference genome and a VCF file in VGToolkit. A double-sided arrow means that the conversion works both ways, i.e., VG format can be converted into rGFA and vice versa. Other linear and graph alignment and variant calling tools are listed in Table 3
Fig. 5
Dotplots comparing Medicago sativa assemblies. The ZhongmuNo.1 consensus genome assembly (y-axis) is compared to the a XinJiang DaYe, b ZhongmuNo.4, and c CADL genome assemblies. Nucleotide-level alignments were generated using MUMmer (v. 4.0.0beta2). Dotplots showing unique alignments were generated using the web version of Assemblytics
Fig. 6
GO enrichment of genes highly impacted by variants. a SNPs identified against the ZhongmuNo.1 reference genome using SyRI v.1.6.3, separately for each homolog, for two genomes, ‘XinJiang DaYe’ and ‘ZhongmuNo.4’, and later combined using BCFtools v.1.16, b Structural variants (< 50 bp) identified using SyRI v.1.6.3 and the web version of Assemblytics separately for each homolog and later combined using SURVIVOR v.0.1.0
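Panel b combines structural-variant calls made separately for each homolog. SNPs can be merged by exact position and allele, but SV breakpoints from independent comparisons rarely coincide exactly, so merging tools such as SURVIVOR group calls whose breakpoints fall within a distance threshold. The sketch below is a simplified, hypothetical illustration of that idea and not SURVIVOR's actual algorithm. It assumes all calls are already on a shared coordinate system (for example, the ZhongmuNo.1 reference). It merges calls of the same type on the same chromosome when both start and end lie within `max_dist` of a cluster's first call, and it records which call sets support each merged SV. The class names, source labels, and the default `max_dist` of 1000 bp are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS"
    source: str  # hypothetical label of the homolog-level call set


@dataclass
class MergedSV:
    chrom: str
    start: int
    end: int
    svtype: str
    sources: Set[str] = field(default_factory=set)


def merge_sv_calls(calls: List[SVCall], max_dist: int = 1000) -> List[MergedSV]:
    """Greedy breakpoint-distance merging (simplified; not SURVIVOR's exact method)."""
    merged: List[MergedSV] = []
    for sv in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.chrom == sv.chrom
            and last.svtype == sv.svtype
            and abs(sv.start - last.start) <= max_dist
            and abs(sv.end - last.end) <= max_dist
        ):
            # Breakpoints close to the cluster's first call: treat as the same SV.
            last.sources.add(sv.source)
        else:
            merged.append(MergedSV(sv.chrom, sv.start, sv.end, sv.svtype, {sv.source}))
    return merged


if __name__ == "__main__":
    calls = [
        SVCall("chr1", 100, 600, "DEL", "hom1"),
        SVCall("chr1", 150, 640, "DEL", "hom2"),
        SVCall("chr1", 5000, 5400, "DEL", "hom3"),
        SVCall("chr1", 120, 600, "INS", "hom1"),
    ]
    for m in merge_sv_calls(calls, max_dist=100):
        print(m.chrom, m.start, m.end, m.svtype, sorted(m.sources))
    # chr1 100 600 DEL ['hom1', 'hom2']
    # chr1 5000 5400 DEL ['hom3']
    # chr1 120 600 INS ['hom1']
```

Comparing each call only to the first call of the current cluster keeps clusters from drifting, at the cost of splitting long chains of slightly shifted calls. Real merging tools expose more options, such as minimum support and strand agreement.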
Fig. 7
Alignment of five different genes against graph-based pangenomes in alfalfa using BLAST. The blue highlighted regions show the aligned regions on different nodes in the graphs. The five genes are Medicago sativa palmate-like pentafoliata 1 (MsPALM1; GenBank accession: HM038483.1), M. truncatula PHD finger protein male sterility 1 (MtMS1; GenBank accession: XM_003613725.3), Glycine max caffeic acid 3-O-methyltransferase (GmCOMT; KEGG gene database reference number: gmx:100780100), M. sativa chromoplast heme oxygenase 1 (MsHO1; GenBank accession: HM212768.1), and M. sativa leghemoglobin 3 (MsLb3; GenBank accession: M91077.1). These figures were generated with BANDAGE (v. 0.8.1) using its built-in BLAST feature with default parameters


