Changes for v1.1.1 (#170)
* Update default_params.config

Added the site_representation_cutoff

* Create variant_table_to_fasta.py

Added the script which replaces the sed commands used to create the FASTA file for phylogeny

* Update default_params.config

Moved site representation to EXPERIENCED USERS section

* Added the resistance database for TBProfiler version 5

* Update magma-env-1.yml

* Update setup_conda_envs.sh

* Update default_params.config

* Update build.sh

* Update Dockerfile

* Update summarize_resistance.py

Added the command line argument for the structural variant results directory. They are not yet added to the summary file, as this requires testing with real data.

* replace sed -> python scripts [ci skip]

* tweak for python2 [ci skip]

* document the different GVCF files

* fix typo [ci skip]

* Added the structural variant workflow and the resistance profiling of these variants.

* cleanup

* Update setup_conda_envs.sh

* Update default_params.config

* Update rename_vcf_chrom.py

* Update rename_vcf_chrom.py

* interim commit [ci skip]

* tweak variants to fasta [ci skip]

* accommodate the new design for structural variants [ci skip]

* fix imports [ci skip]

* fix input to workflow [ci skip]

* dev [ci skip]

* dev [ci skip]

* build and push new containers for v1.1.1 [ci skip]

* Fixed the summarize resistance script and added the structural variants to it

* add back the bc dependency [ci skip]

* Update magma-env-1.yml

Make sure to use xlsxwriter 3.1.1

* minimal change, add bc to container-2 only [ci skip]

* Update CHANGELOG.md

* Changed the permissions on some files

* Fixed a typo in the script causing structural variants to not show up

* add the default lineage reference files GVCF [ci skip]

* tweak comments [ci skip]

* tweak comments in the config file [ci skip]

* fixed the filtering bug

* finalize filtering bug fix

* added sample filtering for the structural variant workflow

* Update structural_variants_analysis_wf.nf

* Moved the vcf filenames from bcftools merge into a file such that the command does not become impossibly long

* Added multithreading to tbprofiler

* Fixed the file listing samples so that it actually contains the newlines

* removed sorting from merge channels

* Fixed bug in the bcftools merge where the input file was unavailable on AWS

* tweak the readme [ci skip]

* add params specific to iqtree [ci skip]

* Update iqtree.nf

fixed flag for standard bootstrapping

* switch the script to python3 [ci skip]

* add view for file filtering logic [ci skip]

* refactor the location of view [ci skip]

* refactor the location of view [ci skip]

* refactor the location of view [ci skip]

* revert to binary invocation of the script in SNPEFF [ci skip]

* use generic python [ci skip]
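
Two of the entries above concern passing the VCF filenames to `bcftools merge` through a file rather than on the command line, with one path per line. The Nextflow process that does this is not shown on this page, so the following is only a minimal sketch of the idea in Python. The function names, the sample paths, and the output name are hypothetical; `--file-list`, `-O` and `-o` are existing `bcftools merge` options.

```python
import os
import tempfile


def write_vcf_list(vcf_paths, list_path):
    """Write one VCF path per line, each terminated by a newline."""
    with open(list_path, 'w') as handle:
        for path in vcf_paths:
            handle.write(path + '\n')


def build_merge_command(list_path, output_path):
    """Build a bcftools merge command whose length does not grow with the number of samples."""
    return ['bcftools', 'merge', '--file-list', list_path, '-O', 'z', '-o', output_path]


# Hypothetical sample paths, made up for illustration only.
vcf_paths = ['sample_{}.vcf.gz'.format(i) for i in range(1000)]
list_path = os.path.join(tempfile.mkdtemp(), 'vcfs.txt')
write_vcf_list(vcf_paths, list_path)

# The command is only constructed here, not executed.
command = build_merge_command(list_path, 'merged.vcf.gz')
print(' '.join(command))
```

The command stays at eight arguments regardless of how many samples are merged, which is the property the commit message is after.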

---------

Co-authored-by: LennertVerboven <lennert.verboven@uantwerpen.be>
Co-authored-by: Tim H. Heupink <tim.heupink@uantwerpen.be>
Co-authored-by: vrennie <113892099+vrennie@users.noreply.github.com>
4 people authored Sep 9, 2023
1 parent 86f2188 commit c84ee9e
Showing 48 changed files with 18,834 additions and 4,594 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,5 @@
Created a parallel workflow for mapping without using the strict seed length for use in the structural variant workflow.

Updated TBProfiler to version 5.0.0 and recreated the resistance database to work with the new version.

Updated the summarize resistance script to include the structural variants in the Excel output.
31 changes: 23 additions & 8 deletions README.md
@@ -12,12 +12,27 @@ MAGMA (**M**aximum **A**ccessible **G**enome for **M**tb **A**nalysis) is a pipe
- MAGMA parameters (`default_parameters.config`)
- Hardware requirements (`conf/standard.config`)
- Execution (software) requirements (`conf/docker.config` or `conf/conda.config`)
- An (optional) GVCF reference dataset for ~600 samples is provided for augmenting smaller datasets


# (Optional) GVCF for analyzing small number of samples
# (Optional) GVCF datasets

We also provide some reference GVCF files which you could use for specific use-cases.

- For small datasets (20 samples or fewer), we recommend that you download the `EXIT_RIF GVCF` files from https://zenodo.org/record/8054182, which contain a GVCF reference dataset of ~600 samples for augmenting smaller datasets

- For including Mtb lineages and outgroup (M. canettii) in the phylogenetic tree, you can download the `LineagesAndOutgroup` files from https://zenodo.org/record/8233518


```
use_ref_exit_rif_gvcf = false
ref_exit_rif_gvcf = "/path/to/FILE.g.vcf.gz"
ref_exit_rif_gvcf_tbi = "/path/to/FILE.g.vcf.gz.tbi"
```

> :note: **Custom GVCF dataset**:
For creating a custom GVCF dataset, you can refer to the discussion [here](https://github.com/TORCH-Consortium/MAGMA/issues/162).

You can download the `EXIT_RIF GVCF` files from https://zenodo.org/record/8054182

## Tutorials and Presentations

@@ -91,7 +106,7 @@ Which could be provided to the pipeline using `-params-file` parameter as shown
```console
nextflow run 'https://github.com/TORCH-Consortium/MAGMA' \
-profile conda_local \
-r v1.0.1 \
-r v1.1.1 \
-params-file my_parameters_1.yml
```
@@ -139,9 +154,9 @@ We provide [two docker containers](https://github.com/orgs/TORCH-Consortium/pack
Although you don't need to pull the containers manually, should you need to, you could use the following commands to pull the pre-built containers

```console
docker pull ghcr.io/torch-consortium/magma/magma-container-1:1.1.0
docker pull ghcr.io/torch-consortium/magma/magma-container-1:1.1.1
docker pull ghcr.io/torch-consortium/magma/magma-container-2:1.1.0
docker pull ghcr.io/torch-consortium/magma/magma-container-2:1.1.1
```


@@ -154,7 +169,7 @@ Here's the command which should be used
nextflow run 'https://github.com/torch-consortium/magma' \
-params-file my_parameters_2.yml \
-profile docker \
-r v1.0.1
-r v1.1.1
```

> :bulb: **Hint**: <br>
@@ -189,7 +204,7 @@ You can then include this configuration as part of the pipeline invocation comma
```console
nextflow run 'https://github.com/torch-consortium/magma' \
-profile docker \
-r v1.0.1 \
-r v1.1.1 \
-c custom.config \
-params-file my_parameters_2.yml
```
1 change: 1 addition & 0 deletions bin/reformat_lofreq.py
@@ -45,6 +45,7 @@ def write_vcf(filename, df, header):
args = vars(parser.parse_args())

vcf, header, not_empty = read_vcf(args['lofreq_vcf_file'])
header = '\n'.join([i for i in header.split('\n') if 'lofreq' not in i])
if not_empty:
vcf['FORMAT'] = 'GT:AD:DP:GQ:PL'
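
The header filter in this hunk drops every header line that contains the substring `lofreq`. A minimal, self-contained sketch of that behaviour follows; the wrapper function name and the example header are hypothetical, and only the list-comprehension logic is taken from the line shown above.

```python
def drop_lofreq_header_lines(header):
    """Remove every header line that contains the (case-sensitive) substring 'lofreq'."""
    return '\n'.join(line for line in header.split('\n') if 'lofreq' not in line)


# Hypothetical VCF header, made up for illustration only.
example_header = '\n'.join([
    '##fileformat=VCFv4.0',
    '##source=lofreq call --call-indels sample.bam',
    '##reference=ref.fasta',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
])

cleaned_header = drop_lofreq_header_lines(example_header)
print(cleaned_header)
```

Because the match is a plain substring test, any header line mentioning `lofreq` anywhere (for example in a file path) is removed as well.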

6 changes: 3 additions & 3 deletions bin/rename_vcf_chrom.py
100644 → 100755
@@ -1,4 +1,4 @@
#! /usr/bin/env python3
#! /usr/bin/env python

'''Original author Jody Phelan at https://github.com/jodyphelan/pathogen-profiler/blob/master/scripts/rename_vcf_chrom.py'''
import sys
@@ -30,11 +30,11 @@ def cmd_out(cmd,verbose=1):
stderr.close()

def main(args):
generator = cmd_out(f"bcftools view {args.vcf}") if args.vcf else sys.stdin
generator = cmd_out("bcftools view " + args.vcf) if args.vcf else sys.stdin
convert = dict(zip(args.source,args.target))
for l in generator:
if l[0]=="#":
sys.stdout.write(l)
sys.stdout.write(l.strip()+"\n")
else:
row = l.strip().split()
row[0] = convert[row[0]]
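
The loop above passes header lines through with normalized line endings and rewrites the first column of every record using a source-to-target mapping. The rest of the function is collapsed on this page, so the sketch below is an assumption-laden reconstruction: the function name and the example chromosome names are hypothetical, and joining the row back with tabs is assumed rather than confirmed by the diff.

```python
def rename_chromosomes(lines, source, target):
    """Return the lines with column 1 of each record mapped from source to target names."""
    convert = dict(zip(source, target))
    renamed = []
    for line in lines:
        if line[0] == '#':
            # Header lines are kept; trailing whitespace is normalized to one newline.
            renamed.append(line.strip() + '\n')
        else:
            row = line.strip().split()
            row[0] = convert[row[0]]
            # Assumption: records are written back tab-separated.
            renamed.append('\t'.join(row) + '\n')
    return renamed


# Hypothetical input, made up for illustration only.
example_lines = [
    '##fileformat=VCFv4.2  \n',
    '#CHROM\tPOS\tID\tREF\tALT\n',
    'Chromosome\t761155\t.\tC\tT\n',
]
result = rename_chromosomes(example_lines, ['Chromosome'], ['NC_000962.3'])
print(''.join(result), end='')
```

A record whose chromosome name is missing from the mapping raises `KeyError`, which matches the direct dictionary lookup in the original loop.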
318 changes: 165 additions & 153 deletions bin/summarize_resistance.py

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions bin/variant_table_to_fasta.py
@@ -0,0 +1,30 @@
#! /usr/bin/env python3

import sys
import argparse

def main(args):
    table = []
    with open(args.table, 'r') as table_file:
        table.append(table_file.readline().strip().split('\t')) # Get the headerline without modifying
        # Process the actual variants
        for idx, l in enumerate(table_file):
            l = l.strip().split('\t')
            l = [i.replace('*', '-').replace('.', '-') for i in l]
            if l.count('-')/len(l) < (1-args.site_representation_cutoff):
                table.append(l)
            else:
                pass
    with open(args.output_fasta, 'w') as fasta_file:
        for l in list(map(list, zip(*table))):
            fasta_file.write('>{}\n{}\n'.format(l[0].replace('.GT', ''), ''.join(l[1:])))



parser = argparse.ArgumentParser(description='tbprofiler script',formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('table', type=str, help='The input table to convert (stdin if empty)')
parser.add_argument('output_fasta', type=str, help='The output fasta file')
parser.add_argument('site_representation_cutoff', type=float, help='Minimum fraction of samples that need to have a call at a site before it is considered')
parser.set_defaults(func=main)
args = parser.parse_args()
args.func(args)
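
To illustrate how `site_representation_cutoff` affects the output, here is a small in-memory sketch of the same filter and transposition, without file I/O. The function names and the four-sample table are hypothetical; the replacement of `*` and `.` by `-`, the strict `<` comparison, and the removal of the `.GT` suffix follow the script above.

```python
def clean_calls(calls):
    """Represent deleted ('*') and missing ('.') calls as gaps."""
    return [c.replace('*', '-').replace('.', '-') for c in calls]


def keep_site(calls, site_representation_cutoff):
    """Keep a site only if its fraction of gaps is strictly below 1 - cutoff."""
    cleaned = clean_calls(calls)
    return cleaned.count('-') / len(cleaned) < (1 - site_representation_cutoff)


def table_to_fasta(rows, site_representation_cutoff):
    """Transpose the kept sites so that every sample column becomes one FASTA record."""
    header, sites = rows[0], rows[1:]
    kept = [clean_calls(s) for s in sites if keep_site(s, site_representation_cutoff)]
    columns = zip(*([header] + kept))
    return ''.join('>{}\n{}\n'.format(col[0].replace('.GT', ''), ''.join(col[1:]))
                   for col in columns)


# Hypothetical variant table: one header row, then one row per site.
rows = [
    ['S1.GT', 'S2.GT', 'S3.GT', 'S4.GT'],
    ['A', 'A', 'A', 'A'],   # no gaps
    ['C', '.', 'C', 'T'],   # 25% gaps
    ['G', '*', '.', 'G'],   # 50% gaps
]
fasta_strict = table_to_fasta(rows, 0.95)  # keeps only the first site
fasta_loose = table_to_fasta(rows, 0.7)    # also keeps the second site
print(fasta_loose)
```

Because the comparison is strict, a cutoff of exactly `1.0` yields a threshold of `0` and removes every site, including fully called ones.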
26 changes: 12 additions & 14 deletions conda_envs/magma-env-1.yml
@@ -2,18 +2,16 @@ name: magma-env-1
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bioconda::gatk4=4.2.6.1
- conda-forge::r-ggplot2=3.3.5
- conda-forge::pandas=1.5.1
- conda-forge::xlsxwriter=3.0.3
- bioconda::datamash=1.1.0
- bioconda::delly=0.8.7
- bioconda::lofreq=2.1.5
- bioconda::tb-profiler=4.1.1
- bioconda::multiqc=1.11
- bioconda::fastqc=0.11.8
- bioconda::fastq_utils=0.25.1
- conda-forge::bc=1.07.1
- conda-forge::sed=4.8
- conda-forge::grep=3.11
- gatk4=4.2.6.1
- r-ggplot2=3.3.5
- pandas=1.5.1
- xlsxwriter=3.1.1
- datamash=1.1.0
- delly=0.8.7
- lofreq=2.1.5
- tb-profiler=5.0.0
- multiqc=1.11
- fastqc=0.11.8
- fastq_utils=0.25.1
23 changes: 11 additions & 12 deletions conda_envs/magma-env-2.yml
@@ -2,16 +2,15 @@ name: magma-env-2
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::python=2.7
- bioconda::bwa=0.7.17
- bioconda::samtools=1.9
- bioconda::iqtree=2.1.2
- bioconda::snp-dists=0.8.2
- bioconda::snp-sites=2.4.0
- bioconda::bcftools=1.9
- bioconda::snpeff=4.3.1t
- bioconda::clusterpicker=1.2.3
- conda-forge::bc=1.07.1
- conda-forge::sed=4.8
- conda-forge::grep=3.11
- python=2.7
- bwa=0.7.17
- samtools=1.9
- iqtree=2.1.2
- snp-dists=0.8.2
- snp-sites=2.4.0
- bcftools=1.9
- snpeff=4.3.1t
- clusterpicker=1.2.3
- bc=1.07.1
2 changes: 1 addition & 1 deletion conda_envs/setup_conda_envs.sh
100644 → 100755
@@ -20,7 +20,7 @@ cp -r ../resources/resistance_db_who ./
cd resistance_db_who

echo "INFO: Load the database within tb-profiler"
tb-profiler load_library resistance_db_who
tb-profiler load_library ./resistance_db_who

echo "INFO: Remove the local copy of the database folder"
cd ..
4 changes: 2 additions & 2 deletions conf/docker.config
@@ -6,12 +6,12 @@ process {

withName:
'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.1"
}

withName:
'BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.1"
}

}
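
The two `withName` selectors above are regular expressions that assign each process to one of the two containers. As a rough illustration of which names end up where, the sketch below mimics the selection in Python. It assumes the pattern has to match the whole process name, which is how Nextflow is understood to apply these selectors but is not confirmed by this diff. The process names in the example are hypothetical, and Python's `re` module stands in for the Java regex engine Nextflow uses.

```python
import re

# Patterns and container tags copied from the config above.
SELECTORS = {
    'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*':
        'ghcr.io/torch-consortium/magma/magma-container-1:1.1.1',
    'BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*':
        'ghcr.io/torch-consortium/magma/magma-container-2:1.1.1',
}


def container_for(process_name):
    """Return the container of the first selector that matches the full name, else None."""
    for pattern, container in SELECTORS.items():
        if re.fullmatch(pattern, process_name):
            return container
    return None


# Hypothetical process names, made up for illustration only.
for name in ['GATK_HAPLOTYPE_CALLER', 'BCFTOOLS_MERGE', 'CUSTOM_STEP']:
    print(name, '->', container_for(name))
```

Under this assumption, a process whose name starts with none of the listed prefixes gets no container from these two blocks.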
4 changes: 2 additions & 2 deletions conf/podman.config
@@ -6,12 +6,12 @@ process {

withName:
'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.1"
}

withName:
'BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.1"
}

}