-
Notifications
You must be signed in to change notification settings - Fork 2
genome-vendor/strelka
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Strelka Workflow -- 1.0.11 --------------- Christopher Saunders (csaunders@illumina.com) Strelka is an analysis package designed to detect somatic SNVs and small indels from the aligned sequencing reads of matched tumor-normal samples. For additional documentation and mailing list info, please see: https://sites.google.com/site/strelkasomaticvariantcaller WORKFLOW SUMMARY: The inputs of the workflow are BAM files for each sample containing aligned sequencing reads. These must be sorted and indexed. PCR duplicate removal is recommended. Strelka does its own read realignment around indels so it is not necessary to pre-process the data with any such step. The output of the workflow are a pair of VCF files describing the somatic SNVs and indels. The workflow takes advantage of the VCF file format to report both filtered and passed variants, but it is strongly recommended that only the passed variants (or a subset thereof) be used. INSTALLATION: The strelka workflow is designed to run on *nux-like platforms. It has been specifically tested on the following distributions: * Amazon Linux AMI 2011.09 * Centos 5.3 * Centos 6.2 * Mac OS X 10.5 * Ubuntu 11.10 * Ubuntu 12.04 LTS For Centos/Amazon Linux and other rpm-based distributions, the following packages (and their dependencies) are known requirements on top of the base Amazon Linux AMI 2011.09: * make * gcc * gcc-c++ * zlib * zlib-devel For Ubuntu and other Debian package distributions, the following packages (and their dependencies) are known requirements on top of the base Ubuntu 11.10 distribution: * g++ * zlib1g-dev Other compilation and runtime prerequisites are included in the distribution tarball. These included prerequisites are: * boost 1.44 (modified to include only the subset of the library used by strelka) * samtools 0.1.18 (modified to remove the 'curses' library requirement) * vcftools r837 * codemin 1.0.2 * tabix 0.2.5 (makefile modified to reorder libz library to last place, as required for Ubuntu 12 build) The build process for strelka workflow will compile the included prerequisite packages, ancillary workflow programs and the strelka variant caller itself. To build the workflow, go to the root of the strelka_workflow directory tree and use configure and make as in the example below: Example build: """ tar -xzf strelka_workflow-1.2.3.tar.gz cd strelka_workflow-1.2.3 ./configure --prefix=/path/to/install/to make """ If the installation succeeds, all files for the above example will be installed to "/path/to/install/to". The final installation step is to run the workflow on the demonstration data included in the distribution. This verifies that the workflow completes without error and produces the expected answer. To do this, first complete the installation steps above. Next, given a strelka installation path of $STRELKA_INSTALL_DIR, execute the following script: $STRELKA_INSTALL_DIR/bin/demo/run_demo.bash The demo will create a 'strelkaDemoAnalysis' directory under the current working directory, then it will run an analysis for a very small chromosome segment. The demo should complete in less than a minute. If the demo script succeeds, you should see the following lines as the final output: """ **** No differences between expected and computed results. **** Demo/verification successfully completed """ CONFIGURATION & ANALYSIS: A strelka run is accomplished in two steps: (1) configuration and (2) analysis. The configuration is a fast step designed to setup the run and perform some preliminary validation (eg. check that the chromosome names match in the BAM header and reference genome). The configuration will create a makefile to control the analysis and a directory structure to hold all intermediate and final results. All configuration output is written to the analysis directory. This step requires a configuration initialization file containing strelka's default parameters. Template configuration files can be found in the etc/ subdirectory under the strelka_workflow installation root (see example run setup below). Following configuration, the makefile can be used to run the analysis itself using make, qmake, or a similar tool. For whole genome sequence data from human samples, strelka requires approximately 1 core-hour per 2x of combined sample reference coverage e.g. analyzing a human tumor-normal sample pair sequenced to 60x and 40x respectively should take approximately 50 core-hours. Example run: """ # example location where strelka was installed: # STRELKA_INSTALL_DIR=/opt/strelka_workflow # example location where analysis will be run: # WORK_DIR=/data/myWork # Step 1. Move to working directory: # cd $WORK_DIR # Step 2. Copy configuration ini file from default template set to a local copy, # possibly edit settings in local copy of file: # cp $STRELKA_INSTALL_DIR/etc/strelka_config_eland_default.ini config.ini # Step 3. Configure: # $STRELKA_INSTALL_DIR/bin/configureStrelkaWorkflow.pl --normal=/data/normal.bam --tumor=/data/tumor.bam --ref=/data/reference/hg19.fa --config=config.ini --output-dir=./myAnalysis # Step 4. Run Analysis # This example is run using 8 cores on the local host: # cd ./myAnalysis make -j 8
About
Unofficial repo for software vendoring or packaging purposes
Resources
Stars
Watchers
Forks
Packages 0
No packages published