HiFi-SR is a Python-based pipeline for detecting plant mitochondrial structural rearrangements by mapping PacBio high-fidelity (HiFi) reads, or Circular Consensus Sequencing (CCS) data, to a reference genome (i.e., the hypothetical master circle DNA). HiFi-SR also includes useful scripts for organellar genome analyses.
We will continuously make upgrades and modifications to hifisr to enhance its functionality and performance.
The pipeline has been tested on Ubuntu 24.04. It should also work on other Linux distributions, such as CentOS.
# /mnt/software/sys/python/python-3.13.1/bin is the directory containing your python executable
# /mnt/software/scripts/hifisr is the virtual environment directory
mkdir /mnt/software/scripts/hifisr
/mnt/software/sys/python/python-3.13.1/bin/python3.13 -m venv /mnt/software/scripts/hifisr
source /mnt/software/scripts/hifisr/bin/activate
pip install -i https://mirrors.aliyun.com/pypi/simple/ biopython pysam pandas numpy openpyxl matplotlib
mkdir deps && cd deps
touch soft_paths.txt
# Contents of soft_paths.txt: a TAB-delimited file containing each software name and the path to its executable.
# If you already have the software installed, add the paths directly to the file.
# Otherwise, you can install new versions of them.
python /mnt/software/scripts/hifisr/bin/python
minimap2 /mnt/software/scripts/hifisr/deps/minimap2-2.28_x64-linux/minimap2
samtools /mnt/software/scripts/hifisr/deps/samtools/samtools-1.21/bin/samtools
seqkit /mnt/software/scripts/hifisr/deps/seqkit
mecat /mnt/software/scripts/hifisr/deps/MECAT2/Linux-amd64/bin/mecat.pl
blastn /mnt/software/scripts/hifisr/deps/ncbi-blast-2.16.0+/bin/blastn
bcftools /mnt/software/scripts/hifisr/deps/bcftools/bcftools-1.21/bin/bcftools
bamtools /mnt/software/scripts/hifisr/deps/bamtools-2.5.2/bin/bamtools
meryl /mnt/software/scripts/hifisr/deps/meryl-1.4.1/bin/meryl
winnowmap /mnt/software/scripts/hifisr/deps/Winnowmap-2.03/bin/winnowmap
pigz /mnt/software/scripts/hifisr/deps/pigz/pigz
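Before running the pipeline, it can help to confirm that every entry in soft_paths.txt points at a real executable. The following is a minimal sketch, assuming one `name<TAB>path` pair per line; the function names `read_soft_paths` and `missing_executables` are hypothetical helpers for illustration and are not part of hifisr.

```python
import os


def read_soft_paths(path):
    """Parse a TAB-delimited file of `name<TAB>executable path` lines into a dict."""
    paths = {}
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comment lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 TAB-delimited fields")
            name, exe = fields
            paths[name.strip()] = exe.strip()
    return paths


def missing_executables(paths):
    """Return the sorted names whose path is not an existing, executable file."""
    return sorted(
        name
        for name, exe in paths.items()
        if not (os.path.isfile(exe) and os.access(exe, os.X_OK))
    )
```

For example, `missing_executables(read_soft_paths("soft_paths.txt"))` returns an empty list when every listed tool is in place.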
cd /mnt/software/scripts/hifisr/deps
curl -L https://github.com/lh3/minimap2/releases/download/v2.28/minimap2-2.28_x64-linux.tar.bz2 | tar -jxvf -
wget -c https://github.com/samtools/samtools/releases/download/1.21/samtools-1.21.tar.bz2
tar -xjf samtools-1.21.tar.bz2
cd samtools-1.21
autoheader
autoconf -Wno-syntax
./configure --prefix=/mnt/software/scripts/hifisr/deps/samtools/samtools-1.21
make -j 20
make install
cd ..
rm -rf samtools-1.21 samtools-1.21.tar.bz2
# choose the correct executable for your platform
wget -c http://app.shenwei.me/data/seqkit/seqkit_linux_amd64.tar.gz
tar -zxf seqkit_linux_amd64.tar.gz
rm seqkit_linux_amd64.tar.gz
git clone https://github.com/xiaochuanle/MECAT2.git
cd MECAT2
make -j 20
cd /mnt/software/scripts/hifisr/deps
wget -c https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.16.0+-x64-linux.tar.gz
tar -zxf ncbi-blast-2.16.0+-x64-linux.tar.gz && rm ncbi-blast-2.16.0+-x64-linux.tar.gz
wget -c https://github.com/samtools/bcftools/releases/download/1.21/bcftools-1.21.tar.bz2
tar -xjf bcftools-1.21.tar.bz2
cd bcftools-1.21
./configure --prefix=/mnt/software/scripts/hifisr/deps/bcftools/bcftools-1.21
make -j 20
make install
cd ..
rm -rf bcftools-1.21 bcftools-1.21.tar.bz2
wget -c https://github.com/pezmaster31/bamtools/archive/refs/tags/v2.5.2.zip
unzip v2.5.2.zip && rm v2.5.2.zip
cd bamtools-2.5.2
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/mnt/software/scripts/hifisr/deps/bamtools-2.5.2 ..
make -j 20
make install
cd /mnt/software/scripts/hifisr/deps
wget -c https://github.com/marbl/meryl/releases/download/v1.4.1/meryl-1.4.1.Linux-amd64.tar.xz
tar -xJf meryl-1.4.1.Linux-amd64.tar.xz && rm meryl-1.4.1.Linux-amd64.tar.xz
wget -c https://github.com/marbl/Winnowmap/archive/refs/tags/v2.03.tar.gz
tar -zxf v2.03.tar.gz && cd Winnowmap-2.03/
make -j 20
cd /mnt/software/scripts/hifisr/deps
wget -c https://zlib.net/pigz/pigz.tar.gz
tar -zxf pigz.tar.gz && rm pigz.tar.gz
cd pigz
make -j 20
pip install hifisr==0.3.0
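A quick way to confirm that the virtual environment is complete is to check that the Python dependencies installed with pip are importable. This is only a sketch: `missing_modules` is a hypothetical helper, and the list holds the import names of the packages named in the pip command (biopython is imported as `Bio`).

```python
import importlib.util

# Import names for the pip-installed dependencies; biopython is imported as "Bio".
REQUIRED_MODULES = ["Bio", "pysam", "pandas", "numpy", "openpyxl", "matplotlib"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the python from the activated virtual environment so that the check reflects the environment the pipeline will use.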
# create your working folder
mkdir -p /mnt/software/scripts/results && cd /mnt/software/scripts/results
# Download the references Col_mito.fa, Col_plastid.fa.
# rotate the reference
python rot_ref.py /mnt/software/scripts/hifisr/deps/soft_paths.txt mito Col_mito.fa
python rot_ref.py /mnt/software/scripts/hifisr/deps/soft_paths.txt plastid Col_plastid.fa
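How rot_ref.py selects the new start position is not described here. As a general illustration only, rotating a circular sequence moves the start coordinate without changing the sequence content; `rotate_circular` below is a hypothetical helper and not the script's actual implementation.

```python
def rotate_circular(seq, new_start):
    """Rotate a circular sequence so that 0-based index `new_start` becomes position 0."""
    if not seq:
        return seq
    k = new_start % len(seq)  # wrap around, so any integer offset is valid
    return seq[k:] + seq[:k]


if __name__ == "__main__":
    # The rotated sequence has the same length and the same bases, just a new origin.
    print(rotate_circular("ACGTTG", 2))
```

Because the molecule is circular, the rotated and unrotated references describe the same genome; only the coordinates reported for mapped reads shift.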
Analysis of an example wild-type Arabidopsis thaliana dataset, Col-CEN (ERR6210723, 14.6 Gb; Naish et al., 2021, Science)
# Download the data as Col-CEN.fastq
python get_mtpt_reads.py /mnt/software/scripts/hifisr/deps/soft_paths.txt /mnt/software/scripts/results/mito_rotated_293434.fasta /mnt/software/scripts/results/plastid_rotated_61049.fasta /mnt/software/scripts/results/Col-CEN.fastq ATHiFi001 32
All | mitochondria | plastid |
---|---|---|
# generate a sample of about 4000 reads each for the mitochondrial and plastid genomes
seqkit sample -p 0.2 /mnt/software/scripts/results/ATHiFi001/reads/ATHiFi001_mito_f1k.fastq > /mnt/software/scripts/results/ATHiFi001/reads/sample_ATHiFi001_mito_f1k.fastq
seqkit sample -p 0.01 /mnt/software/scripts/results/ATHiFi001/reads/ATHiFi001_plastid_f1k.fastq > /mnt/software/scripts/results/ATHiFi001/reads/sample_ATHiFi001_plastid_f1k.fastq
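`seqkit sample -p` takes a proportion rather than a read count, so the value to use depends on how many reads each file contains. The sketch below derives a proportion for a target of about 4000 reads; both function names are hypothetical, and the read counter assumes an uncompressed FASTQ with exactly four lines per record.

```python
def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file, assuming 4 lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def sampling_proportion(total_reads, target_reads=4000):
    """Proportion to pass to `seqkit sample -p` to obtain roughly `target_reads` reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Never ask for more than the whole file.
    return min(1.0, target_reads / total_reads)
```

Since sampling by proportion is random, the resulting number of reads will be close to the target but usually not exactly equal to it.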
# Estimation of variant frequencies
python get_variant_frequency.py /mnt/software/scripts/hifisr/deps/soft_paths.txt ATHiFi001 mito run_1 /mnt/software/scripts/results/mito_rotated_293434.fasta /mnt/software/scripts/results/ATHiFi001/reads/sample_ATHiFi001_mito_f1k.fastq 32
python get_variant_frequency.py /mnt/software/scripts/hifisr/deps/soft_paths.txt ATHiFi001 plastid run_1 /mnt/software/scripts/results/plastid_rotated_61049.fasta /mnt/software/scripts/results/ATHiFi001/reads/sample_ATHiFi001_plastid_f1k.fastq 32
mitochondria | plastid |
---|---|
Recursive identification of large (> 1 kb) and intermediate-sized (50 bp - 1 kb) repeat groups in the reference.
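The two size classes named above can be written down as a simple length rule. This sketch covers only the size thresholds, not the recursive identification itself; `classify_repeat` is a hypothetical helper, 1 kb is taken as 1000 bp, and the "small" label for repeats under 50 bp is an assumption added for completeness.

```python
def classify_repeat(length_bp):
    """Assign a repeat to a size class based on its length in base pairs."""
    if length_bp > 1000:
        return "large"  # > 1 kb
    if length_bp >= 50:
        return "intermediate"  # 50 bp to 1 kb, inclusive
    return "small"  # below 50 bp; this label is an assumption


if __name__ == "__main__":
    for length in (30, 50, 1000, 1001):
        print(length, classify_repeat(length))
```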
- Yi Zou, Weidong Zhu, Daniel B. Sloan, Zhiqiang Wu. (2022). Long-read sequencing characterizes mitochondrial and plastid genome variants in Arabidopsis msh1 mutants. The Plant Journal 112 (3), 738–755. https://doi.org/10.1111/tpj.15976
- Yi Zou, Weidong Zhu, Yingke Hou, Daniel B. Sloan, Zhiqiang Wu. (2025). The evolutionary dynamics of organellar pan-genomes in Arabidopsis thaliana. bioRxiv 2025.01.20.633836; doi: https://doi.org/10.1101/2025.01.20.633836