Export ensembl efetch in comma-delimited format
3.4 years ago

Hello

I am using ensembl to extract taxonomic data of some sequences. I tried to save the output into a bash variable with three columns and then extract the three columns afterwards:

hit=$(efetch -db taxonomy -id $tax -format native -mode xml | xtract -pattern TaxaSet -element ScientificName | head -1 | cut -f 1,3,7)
fam=$(echo $hit | awk '{print $(NF)})
kin=$(echo $hit | awk '{print $(NF-1)})

but fam and kin are empty. It does work if I run the whole efetch/xtract pipeline once per column:

hit=$(efetch -db taxonomy -id $tax -format native -mode xml | xtract -pattern TaxaSet -element ScientificName | head -1 | cut -f 1)
fam=$(efetch -db taxonomy -id $tax -format native -mode xml | xtract -pattern TaxaSet -element ScientificName | head -1 | cut -f 3)
kin=$(efetch -db taxonomy -id $tax -format native -mode xml | xtract -pattern TaxaSet -element ScientificName | head -1 | cut -f 7)

Is there a way to export efetch/xtract in comma-delimited so cutting the results is easier?

Thank you

format efetch ensembl • 794 views
3.4 years ago
GenoMax 145k

I am using ensembl to extract taxonomic data of some sequences

Small nitpick, but you are using Entrez Direct (EDirect), not Ensembl.

The following works for me, using your example code with taxID 9606. I think you are missing the closing single quote in your awk commands.

$ hit=$(efetch -db taxonomy -id 9606 -format native -mode xml | xtract -pattern TaxaSet -element ScientificName | head -1 | cut -f 1,3,7)
$ echo $hit
Homo sapiens Eukaryota Bilateria

$ fam=$(echo $hit | awk '{print $NF}')
$ echo $fam
Bilateria

$ kin=$(echo $hit | awk '{print $(NF-1)}')
$ echo $kin
Eukaryota
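
If the goal is just easier splitting, one option is to keep the tabs and tell awk to split on them, which also keeps "Homo sapiens" together as a single field. A minimal sketch, assuming the pipeline output is a single tab-delimited line; the hard-coded hit value below is only a stand-in for the real efetch/xtract output so the snippet runs offline, and name and csv are variable names introduced here for illustration:

```shell
# Stand-in for: efetch ... | xtract ... | head -1 | cut -f 1,3,7
# (hard-coded so the sketch does not need network access)
hit=$(printf 'Homo sapiens\tEukaryota\tBilateria')

# Quote "$hit" so the tabs survive, and split on tabs only with -F'\t'
name=$(printf '%s\n' "$hit" | awk -F'\t' '{print $1}')
kin=$(printf '%s\n' "$hit" | awk -F'\t' '{print $(NF-1)}')
fam=$(printf '%s\n' "$hit" | awk -F'\t' '{print $NF}')

# If a comma-delimited line is really wanted, translate the tabs afterwards
csv=$(printf '%s\n' "$hit" | tr '\t' ',')

echo "$name"
echo "$kin"
echo "$fam"
echo "$csv"
```

Note that an unquoted echo $hit turns the tabs into spaces, which is why the species name splits into two words in the output above; quoting the variable avoids that.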