Skip to content

Latest commit

 

History

History

webserver

The scripts in "RNASeq_pipeline.sh" are used to preprocess the RNA-Seq data(from "fastq" to "reads") in four major steps: quality control; mapping; bam file sorting; quantification. After that, the scripts in "HTSeq_stat.R" were used for merging the expression of immune receptor gene families and data normalization.

Notes:
1, For paired end sequencing data, the name of two fastq files should be ended with "_1.fastq" and "_2.fastq"; for single end sequencing data, the name for the single fastq file should ended with ".fastq". However, you can also modify our script according to your data.

2, You can download the lattest gtf and fastq file to build the reference data because there is duplicate gene symbols in previous versions.
   
3, You should better evaluate the data quality by checking the result in the directory of "old_fastqc" and "new_fastqc". The low quality data (For examble: high duplicating rates, low sequencing length et.al) will be removed from the raw data as usual RNA-Seq analysis.
4, You should make sure the common genes bewteen the signature matrix and tissue expression profile is enough. You'd better process the raw sequencing data according to our pipeline here.

5, The transcriptomal matrices should be separated by ",", "/t" or " " and the duplicated gene names should be removed before submit your job to our server.

6, For RNA-Seq data, reads counts calculated from HTSeq are used in our model. FPKM and TPM are not suggested in our web server.

7,If you cannnot access our webserver from the link:http://wap-lab.org:3200/immune/, you can open it from: http://218.4.234.74:3200/immune/

8, If the result page URL emailed to you can not open, please do not close the work page and download the result from it untill the job is finished.

9, Software used in our paper were as follow,
  FastQC v0.11.4
  Trimmomatic-0.35
  STAR-2.5.2a 
  samtools-1.3.1
  RSeQC-2.6.3
  HTseq 0.6.1p1
  
10, Signature matrix developed for RNA-Seq data was included in this directory, named "SignatureMatrix.rnaseq.csv".


11, NA should not appeared in the first column of Gene names and Duplicated gene names should be removed before submitting your data.

12, Number of signature genes in your sample should not be too small, because we should make sure that signature genes specific for each cell type is enough.

13, Sometimes the server will start to run and then stop at a few seconds when there are too many works in our computer cluster. You can try it again in some other times.

14, If the number of samples included in the submitted expression matrix was too large, the task will stop in a few seconds. Please split your data into some smaller files.


To view the online publication, please click here:
http://journal.frontiersin.org/article/10.3389/fimmu.2018.01286/full?&utm_source=Email_to_authors_&utm_medium=Email&utm_content=T1_11.5e1_author&utm_campaign=Email_publication&field=&journalName=Frontiers_in_Immunology&id=381701

For using our RNA-Seq data analysis tools, please cite:
Chen Z, Quan L, Huang A, Zhao Q, Yuan Y, Yuan X, Shen Q, Shang J, Ben Y, Qin FX-F and Wu A (2018) seq-ImmuCC: Cell-Centric View of Tissue Transcriptome Measuring Cellular Compositions of Immune Microenvironment From Mouse RNA-Seq Data. Front. Immunol. 9:1286. doi: 10.3389/fimmu.2018.01286.