Hi all,
I am pretty new to data analysis and could really use some help with the downstream portion. For example, I would like to use deepTools to plot the locations of SNPs relative to transcription start sites (TSS) and transcription end sites (TES), which requires a bigWig file. My SNP file is in a format that seems specific to VarScan2. I can use awk to rearrange the output into BED format, but I don't know what to put in the "value" (fourth) column of the bedGraph format, and what I have read online about it is pretty vague and confusing. Once I have the bedGraph file, I would convert it to bigWig with the UCSC bedGraphToBigWig utility. Any suggestions are much appreciated, including a different workflow that would give me the end result I need. I am only interested in extracting A, G, or T to C substitution sites.
My current workflow is the following:
- Adaptor trimming with trim_galore
- Deduplication with BBMap clumpify.sh
- Alignment with Bowtie2
- Samtools sort, index, and mpileup
- Variant calling with VarScan2
- Isolate A, G, or T to C substitutions from the SNP data (the only changes I am interested in for this workflow)
^ Here is some of my output data, with a description of each column in order. Does any of it correspond to the "value" column of a bedGraph file? If not, how do I obtain that value for my data?
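In case it helps to show what I am after, here is a minimal Python sketch of the conversion I have in mind. It rests on several assumptions that may not hold: that the VarScan2 output is tab-separated with a header row containing columns named Chrom, Position, Ref, VarAllele and VarFreq, that Position is 1-based, and that VarFreq looks like "45.5%". The function name varscan_to_bedgraph and the choice of the variant allele frequency as the "value" are just my guesses; a constant 1 per SNP site is the other option I could think of.

```python
import csv


def varscan_to_bedgraph(lines, value="freq"):
    """Yield bedGraph lines (chrom, start, end, value) for A/G/T -> C SNPs.

    `lines` is any iterable of tab-separated text lines with a header row.
    Column names are an ASSUMPTION about the VarScan2 SNP output.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        ref = row["Ref"].upper()
        alt = row["VarAllele"].upper()
        # Keep only sites where the reference base A, G or T is called as C.
        if alt != "C" or ref not in {"A", "G", "T"}:
            continue
        pos = int(row["Position"])
        if value == "freq":
            # ASSUMPTION: VarFreq is a percentage string such as "45.5%".
            score = float(row["VarFreq"].rstrip("%")) / 100.0
        else:
            # Alternative: mark presence of a SNP with a constant value.
            score = 1.0
        # bedGraph is 0-based, half-open, so a 1-based position p becomes
        # the interval [p - 1, p).
        yield f"{row['Chrom']}\t{pos - 1}\t{pos}\t{score:g}"


if __name__ == "__main__":
    # Toy input made up purely for illustration, not real data.
    demo = [
        "Chrom\tPosition\tRef\tVarAllele\tVarFreq\n",
        "chr1\t100\tT\tC\t50%\n",
        "chr1\t200\tA\tG\t25%\n",
    ]
    for line in varscan_to_bedgraph(demo):
        print(line)
```

As far as I understand, the result would still need to be sorted by chromosome and start position before bedGraphToBigWig will accept it. Whether allele frequency or a simple presence flag is the more sensible "value" for a deepTools plot is exactly the part I am unsure about.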
Thanks in advance!
Jacob