I need to transform a tab-delimited file like this:
IPR018351 GRMZM2G458776
IPR005731 GRMZM2G047513
IPR005732 GRMZM2G087165 GRMZM2G146818 GRMZM2G427404
IPR018355 GRMZM2G082642 GRMZM2G310283 GRMZM2G406977 GRMZM5G886785
into a list of vectors in R, or into a MgsaSets object from the mgsa R package.
Here's what I have tried.
Putative solution 1.
- Read my file into R
x=read.table("../tymczasowe/x",sep="\t",row.names=1,fill=T)
- Transform it into a list of vectors
x_list=split(x,row(x))
I should mention that my longest line has 1616 fields, so I moved it to the first line of my original file to make read.table read it correctly. The split command caused R to terminate. I tried this procedure on a much smaller example and it worked fine.
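A minimal sketch of one way to get the list of vectors without going through read.table and split, assuming every line is a set name followed by tab-separated gene IDs. The temporary file and the names path, fields and sets are only for illustration; the last step reuses the new("MgsaSets", sets=...) call from this question and only runs if mgsa is installed.

```r
# Sample input written to a temporary file; in practice point `path`
# at the real tab-delimited file instead.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "IPR018351\tGRMZM2G458776",
  "IPR005731\tGRMZM2G047513",
  "IPR005732\tGRMZM2G087165\tGRMZM2G146818\tGRMZM2G427404",
  "IPR018355\tGRMZM2G082642\tGRMZM2G310283\tGRMZM2G406977\tGRMZM5G886785"
), path)

# Read the raw lines and split each one on tabs; rows may have any length,
# so the longest line does not need to come first.
fields <- strsplit(readLines(path), "\t", fixed = TRUE)

# Drop blank lines, if there are any.
fields <- fields[lengths(fields) > 0]

# The first field of each line is the set name, the rest are its genes.
sets <- lapply(fields, function(f) f[-1])
names(sets) <- vapply(fields, function(f) f[1], character(1))

str(sets)

# Optional step, assuming the mgsa package is installed.
if (requireNamespace("mgsa", quietly = TRUE)) {
  library(mgsa)
  annoIP <- new("MgsaSets", sets = sets)
}
```

Because each line is split independently, no padding cells are created, so the list holds only the genes that are actually in the file.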
I also tried to transform my data.frame into a MgsaSets object:
annoIP=new("MgsaSets",sets=as.data.frame(t(x)))
The command appeared to succeed, but it produced one entry more than I expected (one extra gene), and I don't know what this additional entry looks like (I'm not very experienced with S4 objects).
I then tried to run the analysis:
xwyn=mgsa(xprb,annoIP)
(xprb is just a list of genes to analyze) and I got this error:
Error in mgsa.trampoline(o, sets[!isempty], n, alpha = alpha, beta = beta, :
Set index to high (must not exceed 'n')
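One possible source of the extra entry, offered only as a guess: with fill=TRUE, read.table pads short rows with empty strings, and after transposing, every set carries those padding cells along with its genes. The sketch below uses two of the sample lines (the names txt, padded, all_values and genes are illustrative) to show the empty string appearing among the values. Whether mgsa counts it as a gene, or whether it has anything to do with the error above, is an assumption I have not verified.

```r
# Two sample lines of different length, joined into one text block.
txt <- paste(
  "IPR018351\tGRMZM2G458776",
  "IPR005732\tGRMZM2G087165\tGRMZM2G146818\tGRMZM2G427404",
  sep = "\n"
)

# Same call as in the question, reading from text instead of a file.
x <- read.table(text = txt, sep = "\t", row.names = 1, fill = TRUE,
                stringsAsFactors = FALSE)

# Transpose so that each set becomes a column, as in the question.
padded <- as.data.frame(t(x), stringsAsFactors = FALSE)

# Collect every distinct value across all sets, padding included.
all_values <- unique(unlist(padded, use.names = FALSE))
print(all_values)

# Keep only real gene IDs by dropping NA and empty strings.
genes <- all_values[!is.na(all_values) & nzchar(all_values)]
print(genes)
```

If the padding is the culprit, filtering it out of each set before building the object should bring the gene count back to the expected number.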
Putative solution 2.
I tried to create a MgsaSets object for the mgsa package by generating the appropriate code from my file and pasting it into the command line. The problem is that code like this works for small files:
x=new("MgsaSets",sets=list(IPR001844=c("AC215201.3_FG005","GRMZM2G009871"
,"GRMZM2G015989")
,IPR005732=c("GRMZM2G087165","GRMZM2G146818","GRMZM2G427404")
...
,IPR018816=c("GRMZM2G072156","GRMZM2G566688")))
but doesn't work for my big file, probably because it is too big/long. I get the error message
Error: unexpected ',' in ","
after every transition to the next line of my pasted code, e.g. at
,IPR023193=c("GRMZM5G877500")
Now I really have no idea how to create the desired object.
Thanks, the code did the work :)