GWAlpha.jl

Lab Website	Build Status	Documentation

A Julia package for building genomic prediction models and performing genome-wide association on quantitative traits by inferring additive allelic effects using pool sequencing data (Pool-seq; i.e. allele frequencies).

Installation

Install dependencies:

Julia
R
R packages:
- glmnet
- RColorBrewer

Installation in Julia:

using Pkg
Pkg.add(PackageSpec(url="https://github.com/jeffersonfparil/GWAlpha.jl.git", rev="master"))
using GWAlpha

Usage

PoolGPAS(;filename_sync::String, filename_phen::String, maf::Float64=0.001, depth::Int64=1, model::String=["GWAlpha", "MIXED", "GLMNET"][1], filename_random_covariate=nothing, random_covariate::String=["FST", "RELATEDNESS"][1], varcomp_est::String=["ML", "REML"][1], glmnet_alpha::Float64=collect(range(0.0,1.0,step=0.01,))[1], fpr::Float64=0.01, plot::Bool=false)

Inputs

filename_sync [String]: filename of the genotype data in synchronized pileup format (.sync)
filename_phen [String]: filename of the phenotype data in one of two formats:

.py extension for iterative maximum likelihood estimation i.e. MODEL="GWAlpha", e.g.:

	Pheno_name='Phenotype Name';
	sig=0.06724693662723039;		# standard deviation
	MIN=0.0;				# minimum phenotype value
	MAX=0.424591738712776;			# maximum phenotype value
	perc=[0.2,0.4,0.6,0.8];			# cummulative pool sizes percentiles excluding the last pool
	q=[0.16,0.20,0.23,0.27,0.42];		# phenotype values corresponding to each percentile

.csv extension for comma-separated headerless pool sizes and corresponding mean phenotypic values, e.g.:

	200.0,0.11988952929875112
	200.0,0.18030259365994225
	200.0,0.21548030739673382
	200.0,0.24966378482228616
	200.0,0.31328530259365983

maf [Float64]: minimum allele frequency threshold (default=0.001)
depth [Int64]: minimum sequencing depth threshold (default=1)
model [String]: model to use (default="GWAlpha")
- "GWAlpha" - iterative maximum likelihood estimation
- "MIXED" - linear mixed model
- "GLMNET" - elastic-net penalization
filename_random_covariate [String]: filename of a precomputed headerless square symmetric matrix of pool relatedness (default=nothing)
random_covariate [String]: type of relatedness matrix to compute if filename_random_covariate==nothing (default="FST")
- "FST" - pairwise estimation of fixation indices using Pool-seq data using Weir and Cockerham, 1984 method (additionally Hivert et al, 2018 method is also available: see ?GWAlpha.relatedness_module.Fst_pairwise)
- "RELATEDNESS" - simple standardized relatedness matrix XX'/p, where X is the allele frequency matrix (Pool-seq data) and p is the number of columns of X
glmnet_alpha [Float64]: elastic-net penalty (default=0.00 or ridge regression penalty)
varcomp_est [String]: variance component estimation method
- "ML" - maximum likelihood estimation
- "REML" - restricted maximum likelihood estimation
glmnet_alpha [Float64]: elastic-net penalty (from 0.00 for ridge regression up to 1.00 for LASSO)
fpr [Float64]: false positive rate for computing the Bonferroni threshold in significance testing (default=0.01)
plot [Bool]: generate Manhattan and quantile-quantile (QQ) plots, and save in a portable network graphics (.png) format (default=false)

Outputs

Additive allelic effects array (header: CHROM, POS, ALLELE, FREQ, ALPHA, PVALUES, LOD) written into a comma-separated (.csv) file
- "GWAlpha": string(join(split(filename_sync, ".")[1:(end-1)], '.'), "-GWAlpha-OUTPUT.csv")
- "MIXED": string(join(split(filename_sync_filtered, ".")[1:(end-1)], '.'), "-", model, varcomp_est, "_", random_covariate, "-OUTPUT.csv")
- "GLMNET": string(join(split(filename_sync_filtered, ".")[1:(end-1)], '.'), "-", model, "_ALPHA", glmnet_alpha, "-OUTPUT.csv")
Random covariate effects vector (headerless: RANDOM_EFFECTS) written into a comma-separated (.csv) file
- "GWAlpha": nothing
- "MIXED": string(join(split(filename_sync_filtered, ".")[1:(end-1)], '.'), "-", model, "_", random_covariate, "-RANEF-OUTPUT.csv")
- "GLMNET": nothing
Manhattan and QQ plots in .png format
- string(join(split(filename_output_csv, ".")[1:(end-1)], '.'), ".png")
Parsing, filtering, and relatedness matrix outputs:
- Parsed sync data into a .csv file of allele frequency data (headerless: CHROM, POS, ALLELE, FREQ_POOL1, ..., FREQ_POOLn): string(join(split(filename_sync, ".")[1:(end-1)], '.'), "_ALLELEFREQ.csv")
- Filtered sync data into a sync file: filename_sync_filtered = string(join(split(filename_sync, ".")[1:(end-1)], "."), "_MAF", maf, "_DEPTH", depth, ".sync")
- Pairwise pool relatedness (square, symmetric and headerless): string(join(split(filename_sync_filtered, ".")[1:(end-1)], '.'), "_COVARIATE_", random_covariate, ".csv")

Examples

### Input files:
filename_sync = "test/test.sync"
filename_phen_py = "test/test.py"
filename_phen_csv = "test/test.csv"
### Single-threaded execution:
using GWAlpha
@time OUT_GWAS = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_py, maf=0.001, depth=10, model="GWAlpha", fpr=0.01, plot=true)
@time OUT_MIXEDML_FST = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_csv, maf=0.001, depth=10, model="MIXED", random_covariate="FST", fpr=0.01, plot=true)
@time OUT_MIXEDREML_RELATEDNESS = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_csv, maf=0.001, depth=10, model="MIXED", random_covariate="RELATEDNESS", varcomp_est="REML", fpr=0.01, plot=true)
@time OUT_GLMNET_RIDGE = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_csv, maf=0.001, depth=10, model="GLMNET", glmnet_alpha=0.00, fpr=0.01, plot=true)
@time OUT_GLMNET_LASSO = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_csv, maf=0.001, depth=10, model="GLMNET", glmnet_alpha=1.00, fpr=0.01, plot=true)
### Multi-threaded execution (parallel execution only applicable to model=="GWAlpha"):
using Distributed
Distributed.addprocs(length(Sys.cpu_info())-1)
@everywhere using GWAlpha
@time OUT_GWAS = GWAlpha.PoolGPAS(filename_sync=filename_sync, filename_phen=filename_phen_py, maf=0.001, depth=10, model="GWAlpha", fpr=0.01, plot=true)

Details

The GWAlpha model is defined as α = W(μₐₗₗₑₗₑ-μₐₗₜₑᵣₙₐₜᵢᵥₑ)/σᵧ, where:

μ is the mean of the beta distribution, Beta(θ) where θ={θ₁,θ₂}
θ is estimated via maximum likelihood L(θ|Q) ∝ πᵢ₌₁₋ₖf(qᵢ|θ)
Q = {q₁,...,qₖ} is the cumulative sum of allele frequencies across increasing-phenotypic-value-sorted pools where k is the number of pools
E(allele|θ) = Beta_cdf(yᵢ',θ) - Beta_cdf(yᵢ₋₁',θ), where yᵢ' ∈ Y'
Y' is the inverse quantile-normalized into phenotype data such that Y' ∈ [0,1]
W = 2√{E(allele)*(1-E(allele))} is the penalization for low allele frequency

The linear mixed model is defined as y = Xb + Zu + e, where:

X [n,p] is the centered matrix of allele frequencies
Z [n,n] is the square symmetric matrix of relatedness
y [n,1] is the centered vector of phenotypic values
no intercept is explicitly fitted but implicitly set at the mean phenotypic value as a consequence of centering y
u ~ N(0, σ²uI)
e ~ N(0, σ²eI)
y ~ N(0, V)
V = (Z (σ²uI) Z') + (σ²eI)
variance components (σ²e, σ²u) are estimated via maximum likelihood (ML) or restricted maximum likelihood (REML)
fixed effects (b) are estimated by solving: (X' * V⁻¹) * (X * X' * V⁻¹)⁻¹ * y
random effects (u) are estimated by solving: (σ²uI) * Z' * V⁻¹ * (y - (X*b))

Empirical p-values were calculated by modelling the additive allelic effects (α) using a normal distribution with mean and variance estimated using maximum likelihood.

Citation

Fournier-Level A, Robin C, Balding DJ (2016). GWAlpha: Genome-Wide estimation of additive effects (Alpha) based on trait quantile distribution from pool-sequencing experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 332 Commits
legacy		legacy
misc		misc
src		src
test		test
.travis.yml		.travis.yml
LICENSE.md		LICENSE.md
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GWAlpha.jl

Installation

Usage

Inputs

Outputs

Examples

Details

Citation

About

Releases

Packages

Languages

License

jeffersonfparil/GWAlpha.jl

Folders and files

Latest commit

History

Repository files navigation

GWAlpha.jl

Installation

Usage

Inputs

Outputs

Examples

Details

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages