- R (packages: edgeR)
- Python (packages: pandas)
The following steps show how to run the DDR method on RNASeq data.
Step 0 formats the count table so that the first n columns hold the group1 samples (triple negative, TN) and the remaining columns hold the group2 samples (other, OT). Each row is a gene, identified by its Ensembl gene ID (ENSG).
Rscript src/step0_preprocess.R
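The column reordering that Step 0 performs can be sketched as follows. This is a minimal illustration, not the logic of step0_preprocess.R: it assumes the table is held as a dict mapping each ENSG id to a list of counts, and that a per-sample TN/OT annotation is available. The labels mapping and the function name reorder_count_table are hypothetical, and the counts are made up.

```python
from typing import Dict, List, Tuple


def reorder_count_table(
    samples: List[str],
    counts: Dict[str, List[int]],
    labels: Dict[str, str],
) -> Tuple[List[str], Dict[str, List[int]], int]:
    """Move TN (group1) columns in front of OT (group2) columns.

    samples: column names of the count table
    counts:  ENSG id -> counts, one value per column in `samples`
    labels:  sample name -> "TN" or "OT" (hypothetical annotation)
    Returns the new column order, the reordered table and n (number of TN columns).
    """
    unknown = [s for s in samples if labels.get(s) not in ("TN", "OT")]
    if unknown:
        raise ValueError(f"samples without a TN/OT label: {unknown}")
    tn = [i for i, s in enumerate(samples) if labels[s] == "TN"]
    ot = [i for i, s in enumerate(samples) if labels[s] == "OT"]
    order = tn + ot  # group1 columns first, original order kept within each group
    new_samples = [samples[i] for i in order]
    new_counts = {gene: [row[i] for i in order] for gene, row in counts.items()}
    return new_samples, new_counts, len(tn)


# Made-up toy table: one gene, three samples.
samples = ["s1", "s2", "s3"]
counts = {"ENSG00000000003": [10, 20, 30]}
labels = {"s1": "OT", "s2": "TN", "s3": "TN"}
print(reorder_count_table(samples, counts, labels))
# → (['s2', 's3', 's1'], {'ENSG00000000003': [20, 30, 10]}, 2)
```

The returned n is the group1 size that the Step 3 command takes as its first argument.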
Step 1 normalizes the count table and calculates the covariance, standard deviation, mean and MFC.
Rscript src/step1_calculateStats.R RNASeq
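A minimal sketch of the statistics named in Step 1, assuming counts-per-million (CPM) normalization. The 'ref_cpm.csv' file name suggests CPM, but this is not confirmed here, and edgeR may additionally apply TMM normalization factors, which this sketch omits. It computes the per-group mean and sample standard deviation (first n1 columns = group1) and the covariance of two genes across samples. MFC is not defined in this README, so it is left out. The toy counts are made up.

```python
import math
from typing import Dict, List


def cpm(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Counts per million: scale every sample (column) by its total count."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def mean(x: List[float]) -> float:
    return sum(x) / len(x)


def sd(x: List[float]) -> float:
    """Sample standard deviation (n - 1 denominator); needs at least 2 values."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def cov(x: List[float], y: List[float]) -> float:
    """Sample covariance of two genes measured on the same samples."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Made-up toy counts: 4 samples, the first n1 are group1 (TN).
counts = {"geneA": [10, 30, 20, 40], "geneB": [90, 70, 80, 60]}
n1 = 2
norm = cpm(counts)
for gene, values in norm.items():
    g1, g2 = values[:n1], values[n1:]
    print(gene, "TN mean/sd:", mean(g1), sd(g1), "OT mean/sd:", mean(g2), sd(g2))
print("cov(geneA, geneB):", cov(norm["geneA"], norm["geneB"]))
```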
Step 2 determines a reference set of genes. The output file 'ref_cpm.csv' stores the expression levels of these reference genes.
Rscript src/step2_findRef.R RNASeq
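The README does not say how the reference genes are chosen. Purely as a hypothetical stand-in, the sketch below ranks genes by the coefficient of variation of their normalized expression, keeps the k most stable ones and retains their expression values, the kind of content 'ref_cpm.csv' is described as holding. Do not read this as the criterion used by step2_findRef.R; the function names and values are illustrative only.

```python
import math
from typing import Dict, List


def coefficient_of_variation(x: List[float]) -> float:
    """Sample SD divided by the mean; infinite for genes with zero mean."""
    m = sum(x) / len(x)
    if m <= 0:
        return math.inf
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1)) / m


def pick_reference_genes(norm: Dict[str, List[float]], k: int) -> List[str]:
    """Hypothetical criterion: the k genes whose expression varies least."""
    return sorted(norm, key=lambda g: coefficient_of_variation(norm[g]))[:k]


# Made-up normalized expression (e.g. CPM) for 3 genes x 3 samples.
norm = {
    "geneA": [100.0, 300.0, 200.0],
    "geneB": [500.0, 510.0, 490.0],
    "geneC": [50.0, 52.0, 48.0],
}
refs = pick_reference_genes(norm, k=2)
ref_expr = {g: norm[g] for g in refs}  # analogous content to ref_cpm.csv
print(refs)
# → ['geneB', 'geneC']
```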
Step 3 takes the number of group1 samples as its first argument. The example dataset has 115 samples in group1:
Rscript src/step3_overlapFisher.R 115 RNASeq
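The Step 3 script name refers to a Fisher test. As a reference point, here is a one-sided Fisher's exact test on a 2x2 table using only the standard library; it computes the same quantity as scipy.stats.fisher_exact(table, alternative="greater"). How step3_overlapFisher.R builds its tables is not described here: the per-gene layout in the comments (group1 and group2 as rows, which is one reason a script would need the group1 size) is an assumption, and the counts are made up.

```python
from math import comb
from typing import List


def fisher_exact_greater(table: List[List[int]]) -> float:
    """One-sided Fisher's exact test for [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the hypergeometric
    distribution with the table's row and column totals held fixed.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # comb(n, k) is 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / total


# Hypothetical per-gene table (made-up counts):
#                 gene "up"   gene not "up"
# group1 (TN)         8             2
# group2 (OT)         3             7
print(fisher_exact_greater([[8, 2], [3, 7]]))
```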
The pipeline writes the following output files:
- final_out.csv: Normalized count data
- ref_cpm.csv: Expression of reference genes
- overlap_test_fdr_1_[RNASeq|microarray].csv: Differentially expressed genes with FDR < 0.1
- overlap_test_fdr_05_[RNASeq|microarray].csv: Differentially expressed genes with FDR < 0.05
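The two FDR cut-offs can be illustrated with the Benjamini-Hochberg adjustment (the "BH" method of R's p.adjust). That the pipeline uses BH is an assumption; the README only states the thresholds. The gene names and p-values below are made up.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH-adjusted p-values, returned in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


genes = ["g1", "g2", "g3", "g4"]
pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
fdr_1 = [g for g, q in zip(genes, fdr) if q < 0.1]    # cf. overlap_test_fdr_1_*.csv
fdr_05 = [g for g, q in zip(genes, fdr) if q < 0.05]  # cf. overlap_test_fdr_05_*.csv
print(fdr_1, fdr_05)
# → ['g1', 'g2', 'g3'] ['g1']
```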
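To run the pipeline on microarray data instead, use the microarray preprocessing script for Step 0 and pass 'microarray' as the data type to the remaining steps:
Rscript src/step0_preprocess_microarray.R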
Rscript src/step1_calculateStats.R microarray
Rscript src/step2_findRef.R microarray
Rscript src/step3_overlapFisher.R 115 microarray
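Because the two platforms differ only in the Step 0 script and the data-type argument, the four commands can be generated by a small driver. This Python sketch (the function names are hypothetical) rebuilds the invocations listed above and, in run_pipeline, runs them with subprocess, stopping at the first failing step. It assumes it is started from the repository root so the relative src/ paths resolve.

```python
import subprocess
from typing import List


def pipeline_commands(platform: str, n_group1: int) -> List[List[str]]:
    """Return the four Rscript invocations from this README for one platform."""
    if platform not in ("RNASeq", "microarray"):
        raise ValueError("platform must be 'RNASeq' or 'microarray'")
    step0 = (
        "src/step0_preprocess.R"
        if platform == "RNASeq"
        else "src/step0_preprocess_microarray.R"
    )
    return [
        ["Rscript", step0],
        ["Rscript", "src/step1_calculateStats.R", platform],
        ["Rscript", "src/step2_findRef.R", platform],
        ["Rscript", "src/step3_overlapFisher.R", str(n_group1), platform],
    ]


def run_pipeline(platform: str, n_group1: int) -> None:
    for cmd in pipeline_commands(platform, n_group1):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Only print the commands; call run_pipeline("microarray", 115) to execute them.
    for cmd in pipeline_commands("microarray", 115):
        print(" ".join(cmd))
```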
DDR_Ref.R is run with:
Rscript DDR_Ref.R
If pandas is installed under a different Python interpreter, pass the path to that interpreter as an argument:
Rscript DDR_Ref.R PYTHON_PATH
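To find a suitable PYTHON_PATH, run the following with the interpreter you intend to use; it prints that interpreter's path if pandas can be imported from it. This is a convenience sketch, not part of the repository, and the file name you save it under is up to you.

```python
import importlib.util
import sys


def pandas_available() -> bool:
    """True if pandas can be imported by the interpreter running this script."""
    return importlib.util.find_spec("pandas") is not None


if __name__ == "__main__":
    if pandas_available():
        # Pass this path to DDR_Ref.R as PYTHON_PATH.
        print(sys.executable)
    else:
        print(f"pandas is not installed for {sys.executable}", file=sys.stderr)
```

If it prints a path, use that path in place of PYTHON_PATH in the command above.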