BLASTP - Searching for homologous sequences of small peptides in several species
0
0
Entering edit mode
23 months ago
瑞宏 • 0

Hello!

I now have some protein sequences of a number of species (chicken, human, mouse, ...). I want to use BLASTP to find the homologous sequence of a small peptide "KPWLRVALCPG" in these species. For the convenience of searching directly for multiple species, I have combined the protein sequences of these species into a single file called "merge.fa" and created a blast database.

I try to filter out peptides with a bit-score > 20 and I use a formula (E = query_sequence_length * total_database_length / 2^bit-score) to calculate the corresponding e_value for a bit-score of 20.

I first searched for merge.fa, using the following command.

blastp -db merge.fa -query test.fa -out test.merge.0 -task blastp-short -outfmt 0 -evalue 5283 -word_size 2 -matrix PAM30

Another search was carried out on chicken.fa, using the following command.

blastp -db chicken.fa -query test.fa -out test.chicken.0 -task blastp-short -outfmt 0 -evalue 44 -word_size 2 -matrix PAM30

However, I found two problems.

  • Firstly, the bit-score threshold was not the expected "20", but "18".
  • Secondly, I compared the search results for "chicken.fa" with those for "merge.fa" and found that many of the results for "chicken.fa" had disappeared!

I am very confused, can anyone give me some advice? Thanks.

small peptide blastp • 1.0k views
ADD COMMENT
1
Entering edit mode

What happens if you use the command:

OLD_FSC=1 blastp -db merge.fa -query test.fa [...]

related: Blast E Value Calculation

ADD REPLY
0
Entering edit mode

I'm sorry, it doesn't work. The BLASTP threw an exception:

ERROR: Too many positional arguments (1), the offending value: OLD_FSC
ADD REPLY
1
Entering edit mode

the OLD_FSC=1 should come before blastp as you're setting an environment variable. See

https://www.ncbi.nlm.nih.gov/books/NBK131777/#_Blast_ReleaseNotes_BLAST_2_2_26_January_

ADD REPLY
0
Entering edit mode

Thanks a million. It works.

I set the environment variable by typing the following command:

export OLD_FSC=1

This is the result of the search for "merge.fa". The lowest bit-score was 19.7 instead of 20, but its e_value of "5281" was close enough to my expectation of "5283", so it is reasonable to assume that the problem has been solved.

The result of the search for "merge.fa"

It is also working in the search results for "chicken.fa":

enter image description here

ADD REPLY
0
Entering edit mode

I would like to add a few words to the second problem.

The image below (Fig.1) shows the rows containing "chicken" in the search results for "merge.fa". There are three columns, indicating the sequence name, bit-score and e_value. Fig.1, It shows the rows containing "chicken" in the search results for "merge.fa". There are three columns, indicating the sequence name, bit-score and e_value.

The image below (Fig.2) shows the search results for "chicken.fa". Red boxes indicate the protein sequences that appeared in the search results for "merge.fa", and red arrows indicate the protein sequences that satisfy the bit-score requirement but don't appear in the search results for "merge.fa" Fig.2, It shows the search result for "chicken.fa". There are three columns, indicating the sequence name, bit-score and e_value. Red boxes indicate the protein sequences that appeared in the search results for "merge.fa", and red arrows indicate the protein sequences that satisfy the bit-score requirement but don't appear in the search results for "merge.fa"

Why do the protein sequences that satisfy the bit-score requirements not appear in the search results for "merge.fa"?

ADD REPLY

Login before adding your answer.

Traffic: 992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6