- R (packages: edgeR)
- Python (packages: pandas)
The following steps show how to run the DDR method on RNA-Seq data.
Step 0 formats the count table so that the first n columns hold the group 1 samples (triple-negative, TN) and the remaining columns hold the group 2 samples (other, OT). Rows are Ensembl gene IDs (ENSG).
Rscript step0_preprocess.R
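The column layout that step 0 produces can be sketched in pandas as below. This is not step0_preprocess.R. It is a minimal illustration that assumes a hypothetical `groups` mapping from each sample name to "TN" or "OT"; the R script may obtain group membership differently.

```python
from __future__ import annotations

import pandas as pd


def order_by_group(counts: pd.DataFrame, groups: dict[str, str]) -> tuple[pd.DataFrame, int]:
    """Put group 1 (TN) columns first and group 2 (OT) columns after them.

    `groups` is a hypothetical sample -> label mapping ("TN" or "OT").
    Returns the reordered table and n, the number of TN columns.
    """
    bad = [c for c in counts.columns if groups.get(c) not in ("TN", "OT")]
    if bad:
        raise ValueError(f"missing or invalid group label for samples: {bad}")
    tn = [c for c in counts.columns if groups[c] == "TN"]
    ot = [c for c in counts.columns if groups[c] == "OT"]
    return counts[tn + ot], len(tn)


# Toy counts: rows are ENSG ids, columns are samples (values are made up).
counts = pd.DataFrame(
    {"s1": [10, 0], "s2": [5, 3], "s3": [7, 1]},
    index=["ENSG00000000003", "ENSG00000000005"],
)
ordered, n = order_by_group(counts, {"s1": "OT", "s2": "TN", "s3": "TN"})
print(list(ordered.columns), n)  # ['s2', 's3', 's1'] 2
```

The returned n is the value later passed to step 3 as the group 1 sample count.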
Step 1 normalizes the count table and calculates the covariance, standard deviation, mean and MFC:
Rscript step1_calculateStats.R RNASeq
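A rough sketch of the normalization and per-group statistics in step 1, assuming counts-per-million (CPM) scaling, which the file name 'ref_cpm.csv' suggests. step1_calculateStats.R uses edgeR, whose normalization may differ (for example, TMM scaling factors), and the covariance and MFC are left out because their exact definitions are not given here.

```python
import pandas as pd


def cpm(counts: pd.DataFrame) -> pd.DataFrame:
    """Counts per million: divide each sample (column) by its library size."""
    lib_size = counts.sum(axis=0)
    return counts.div(lib_size, axis=1) * 1e6


def group_stats(norm: pd.DataFrame, n: int) -> pd.DataFrame:
    """Per-gene mean and standard deviation within each group.

    Assumes the step 0 layout: first n columns are TN, the rest are OT.
    """
    tn, ot = norm.iloc[:, :n], norm.iloc[:, n:]
    return pd.DataFrame({
        "mean_TN": tn.mean(axis=1),
        "sd_TN": tn.std(axis=1),  # sample SD (ddof=1)
        "mean_OT": ot.mean(axis=1),
        "sd_OT": ot.std(axis=1),
    })


# Toy counts; every sample has a library size of 1000.
counts = pd.DataFrame(
    {"tn1": [100, 900], "tn2": [300, 700], "ot1": [500, 500], "ot2": [200, 800]},
    index=["geneA", "geneB"],
)
norm = cpm(counts)
stats = group_stats(norm, n=2)
print(stats)
```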
Step 2 determines a set of reference genes. The output file 'ref_cpm.csv' stores the expression levels of these reference genes.
Rscript step2_findRef.R RNASeq
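The criterion step2_findRef.R uses to choose reference genes is not described here. Purely to illustrate the idea of a stable reference set, the sketch below swaps in a common heuristic: among expressed genes, keep the k with the lowest coefficient of variation (SD divided by mean) across samples. `k` and `min_mean` are hypothetical parameters, not options of the R script.

```python
import pandas as pd


def pick_reference_genes(cpm_table: pd.DataFrame, k: int, min_mean: float = 1.0) -> pd.DataFrame:
    """Illustrative only: the k most stable expressed genes by coefficient of variation."""
    mean = cpm_table.mean(axis=1)
    expressed = cpm_table[mean >= min_mean]  # drop barely expressed genes
    cv = expressed.std(axis=1) / expressed.mean(axis=1)
    ref_ids = cv.nsmallest(k).index
    return cpm_table.loc[ref_ids]


# Toy CPM table (rows: genes, columns: samples).
cpm_table = pd.DataFrame(
    {"a": [10.0, 10.0, 0.1], "b": [10.0, 30.0, 0.2], "c": [10.0, 20.0, 0.3]},
    index=["g1", "g2", "g3"],
)
ref = pick_reference_genes(cpm_table, k=1)
print(list(ref.index))  # ['g1']
# ref.to_csv("ref_cpm.csv") would write a file shaped like the step 2 output.
```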
Step 3 takes the number of group 1 samples as its first argument. The example dataset has 115 samples in group 1:
Rscript step3_overlapFisher.R 115 RNASeq
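How the two step 3 arguments relate to the step 0 layout can be sketched as follows. `parse_step3_args` and `split_groups` are hypothetical helpers that mirror the command line and column convention described above; they are not part of the R scripts.

```python
import pandas as pd

VALID_TYPES = ("RNASeq", "microarray")


def parse_step3_args(argv: list) -> tuple:
    """Mirror of the step 3 command line: <n_group1> <RNASeq|microarray>."""
    if len(argv) != 2:
        raise ValueError("usage: <n_group1> <RNASeq|microarray>")
    n_group1, data_type = int(argv[0]), argv[1]
    if data_type not in VALID_TYPES:
        raise ValueError(f"data type must be one of {VALID_TYPES}")
    return n_group1, data_type


def split_groups(table: pd.DataFrame, n_group1: int):
    """First n_group1 columns are group 1 (TN); the rest are group 2 (OT)."""
    if not 0 < n_group1 < table.shape[1]:
        raise ValueError("each group needs at least one sample")
    return table.iloc[:, :n_group1], table.iloc[:, n_group1:]


n_group1, data_type = parse_step3_args(["115", "RNASeq"])
toy = pd.DataFrame([[1.0] * 120], index=["geneA"])  # toy: 115 TN + 5 OT columns
tn, ot = split_groups(toy, n_group1)
print(tn.shape[1], ot.shape[1], data_type)  # 115 5 RNASeq
```

A wrong count silently mixes samples from the two groups, so it is worth checking n against the step 0 output.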
Output files:
- final_out.csv: Normalized count data
- ref_cpm.csv: Expression of reference genes
- overlap_test_fdr_1_[RNASeq|microarray].csv: Differentially expressed genes with FDR < 0.1
- overlap_test_fdr_05_[RNASeq|microarray].csv: Differentially expressed genes with FDR < 0.05
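The two overlap_test files differ only in the FDR cutoff. Assuming the p-values are adjusted with the Benjamini-Hochberg procedure (the same definition as R's `p.adjust(method = "BH")`; the source does not say which adjustment step3_overlapFisher.R uses), the cutoffs can be sketched like this. The p-values are toy numbers, not results.

```python
import pandas as pd


def bh_fdr(pvalues: pd.Series) -> pd.Series:
    """Benjamini-Hochberg adjusted p-values: min over j >= i of m * p_(j) / j, capped at 1."""
    m = len(pvalues)
    order = pvalues.sort_values(ascending=False)  # largest p first
    ranks = pd.Series(range(m, 0, -1), index=order.index)  # ranks m .. 1
    adj = (order * m / ranks).cummin().clip(upper=1.0)
    return adj.reindex(pvalues.index)


p = pd.Series([0.01, 0.04, 0.03, 0.5], index=["g1", "g2", "g3", "g4"])
fdr = bh_fdr(p)
deg_fdr_1 = fdr[fdr < 0.1]    # genes for an overlap_test_fdr_1_* style table
deg_fdr_05 = fdr[fdr < 0.05]  # genes for an overlap_test_fdr_05_* style table
print(sorted(deg_fdr_1.index), sorted(deg_fdr_05.index))  # ['g1', 'g2', 'g3'] ['g1']
```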
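For microarray data, use the microarray preprocessing script in step 0 and pass 'microarray' instead of 'RNASeq' to the later steps:
Rscript step0_preprocess_microarray.R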
Rscript step1_calculateStats.R microarray
Rscript step2_findRef.R microarray
Rscript step3_overlapFisher.R 115 microarray
Rscript DDR_Ref.R
If pandas is installed under a Python interpreter other than the default one, pass that interpreter's path as an argument:
Rscript DDR_Ref.R PYTHON_PATH
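To find a suitable PYTHON_PATH, a small check such as the one below can be run with each candidate interpreter (saved, for example, under the hypothetical name check_pandas.py). If it reports a pandas version, the printed path is a candidate value for PYTHON_PATH.

```python
import sys


def describe_interpreter():
    """Return this interpreter's path and its pandas version (None if pandas is missing)."""
    try:
        import pandas
        pandas_version = pandas.__version__
    except ImportError:
        pandas_version = None
    return sys.executable, pandas_version


if __name__ == "__main__":
    path, version = describe_interpreter()
    if version is None:
        print(f"{path}: pandas not installed")
    else:
        print(f"{path}: pandas {version}")
```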