6.3 years ago
saadleeshehreen
Hi,
I am trying to run Gubbins: https://sanger-pathogens.github.io/gubbins/ First, I aligned the files with progressiveMauve. It produced .xmfa files, and I understood that I had to convert them to proper FASTA. I then followed the instructions at https://sourceforge.net/p/mauve/mailman/message/35156599/ and chose the second option:
Use this script: https://github.com/kjolley/seq_scripts/blob/master/xmfa2fasta.pl
perl xmfa2fasta.pl --file inputfile.xmfa > outputfile.fasta
I got a FASTA file, but when I tried to run Gubbins with the following commands, I got these error messages:
run_gubbins.py -o t.fasta
The following arguments are required: alignment_filename
run_gubbins.py t.fasta
Error with the input FASTA file: It is in the wrong format so check its an alignment
How can I solve the problem?
Hello,
What does your fasta file look like? As the error message says, there must be something wrong with it. But without an example, it will be quite hard to figure out what.
fin swimmer
But sometimes it has NNNNNN, ----------------, etc.
How big is the file? We may need to see more of it since the error might be in just one or two of the sequences.
Since gubbins expects alignments, it is probably testing to see if all your sequences are the same length, which may not be the case.
Run this command on your file to find out if they're all equal length:
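One way to run such a check is sketched below in Python. This is a minimal sketch, not necessarily the command originally suggested here; the function names `fasta_lengths` and `is_alignment` are hypothetical, and it assumes a plain FASTA file with `>` header lines.

```python
def fasta_lengths(lines):
    """Return a list of (header, sequence_length) for FASTA text given as lines."""
    records = []  # each entry is [header, running_length]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new record; keep the full header so duplicates stay visible
            records.append([line[1:].strip(), 0])
        elif records:
            # Sequence line: gaps (-) and Ns count towards the aligned length
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


def is_alignment(records):
    """True if every record has the same length (or there are no records)."""
    return len({length for _, length in records}) <= 1


def report(path):
    """Print each record's length and whether all lengths agree."""
    with open(path) as handle:
        records = fasta_lengths(handle)
    for name, length in records:
        print(f"{length}\t{name}")
    print("equal lengths:", is_alignment(records))
```

Calling `report("t.fasta")` would list one length per record. If the lengths differ, or there are far more records than genomes, the file is not a single multiple alignment.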
The full error message is as follows:
As I suspected. There is a problem with converting from XMFA to Fasta.
You can represent the data in both formats, but FASTA is 'dumb' in comparison, so the converted file will just have all of the sequences that the XMFA contained stuck together.
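To illustrate the difference, here is a rough Python sketch of what a block-aware conversion would need to do. It assumes the XMFA layout in which each block ends with a line starting with `=` and each entry header looks like `> 1:1-1000 + genome.fasta`, where the number before the colon is the genome index. The function names `parse_xmfa_blocks` and `concatenate_core_blocks` are hypothetical, and the sketch ignores block order and strand, so treat it as a way to inspect the file rather than a finished converter.

```python
import re

# Entry header, e.g. "> 2:5-104 - genome2.fasta"; only the genome index is used
HEADER = re.compile(r">\s*(\d+):")


def parse_xmfa_blocks(lines):
    """Return a list of blocks, each a dict {genome_index: aligned_sequence}."""
    blocks, block, current = [], {}, None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#..." file header
        if line.startswith("="):
            # End of the current alignment block
            if block:
                blocks.append({g: "".join(parts) for g, parts in block.items()})
            block, current = {}, None
        elif line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unrecognised XMFA header: {line}")
            current = int(match.group(1))
            block[current] = []
        elif current is not None:
            block[current].append(line)
    if block:  # file did not end with "="
        blocks.append({g: "".join(parts) for g, parts in block.items()})
    return blocks


def concatenate_core_blocks(blocks, n_genomes):
    """Join only the blocks that contain every genome, one string per genome."""
    wanted = set(range(1, n_genomes + 1))
    core = [b for b in blocks if set(b) == wanted]
    return {g: "".join(b[g] for b in core) for g in sorted(wanted)}
```

Blocks that contain only some of the genomes are dropped here. Writing every entry of every block out as its own FASTA record instead gives records of different lengths, which is not a single alignment.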
I would re-align with a tool that doesn't require these conversions. I've tried to do similar things in the past and gotten stuck along the road somewhere, though I can't recall exactly where now.
If your sequences are closely related try: https://omictools.com/multiple-genome-aligner-tool
If they aren't, it's going to be difficult. Multiple sequence alignment of large sequences is something of an unsolved problem in bioinformatics.
If you can tell us what exactly you want to do/show, maybe there are more efficient ways.
Also, please post errors in full in future. That error tells you exactly what the problem is, so all the effort in this thread so far could have been avoided.
I tried to run Gubbins after the conversion. It behaved OK initially and generated some files, but then it stopped with this error message:
"Failed while running gubbins. Please ensure you have enough free memory"
It was running on the server with just 4 genomes, because when I tried with 2 it gave an error message saying that the analysis needs 3 or more genomes.
How do you ensure there is enough free memory on the server? Any opinion?
How much memory do you have available? I feel like I’ve seen that error before but I can’t remember what the solutions were. I can speak to the authors and ask them, though.
I would probably use a tool other than progressiveMauve, since XMFA format is a little peculiar anyway, if I recall. I think it just gives you aligned blocks which you would then have to concatenate together. I might be wrong on this, though, as it's a while since I looked at it.
Please let me know the name of that tool, if you recall :)
What are you aligning? Is it whole genomes? And how many?
For my work, I need to align 100 whole genomes of different bacteria, but this time I just tested with two of them. I downloaded the sequences from NCBI and aligned them with progressiveMauve. Next, I plan to run Gubbins.