Tag: Basic Local Alignment Search Tool (BLAST)

Updated Bacterial and Archaeal Reference Genome Collection now Available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 20,403 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). We changed the selection criteria, including upgrades for type and complete assemblies, so this update contains many more changes than previous updates.

What’s New?

2,298 species have an updated reference
1,123 species are represented in this collection for the first time
1,125 species have a better reference assembly than in the April 2024 set
50 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Continue reading “Updated Bacterial and Archaeal Reference Genome Collection now Available!” →

Get Faster, More Focused Search Results with NCBI’s New BLAST Core Nucleotide Database (core_nt)

Effective August 2024, core_nt will become the default

Interested in faster nucleotide BLAST searches with more focused search results? As previously announced, NCBI has been re-evaluating the BLAST nucleotide database (nt) to make it more compact and more efficient. Thanks to your feedback, NCBI’s BLAST is excited to introduce the core nucleotide database (core_nt), an alternative to the default nt database that contains better-defined content and is less than half the size.

Benefits of BLAST core_nt over nt

Enables faster searches
Returns similar top results for most searches
Reduces redundancy for some highly represented organisms
Is easier to download and requires less storage space for standalone BLAST

Continue reading “Get Faster, More Focused Search Results with NCBI’s New BLAST Core Nucleotide Database (core_nt)” →

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference).

What’s New?

413 species are represented in this collection for the first time
198 species are represented by a better assembly
27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Continue reading “Now Available! Updated Bacterial and Archaeal Reference Genomes Collection” →

Cleaner BLAST Databases for More Accurate Results

Removing contaminated sequences using NCBI quality assurance tools

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use the NCBI quality assurance tools listed below to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

Continue reading “Cleaner BLAST Databases for More Accurate Results” →

BLAST FASTA Files Will No Longer Be Available on the FTP Site Effective April 2024

Easily generate BLAST FASTA files yourself!

In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases by using the BLAST utility blastdbcmd that comes with the standalone BLAST programs. This gives you the flexibility to generate organism-specific FASTA files using NCBI’s taxonomy IDs for specific organisms or groups.
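As a rough illustration of the blastdbcmd approach described above, the Python sketch below assembles a blastdbcmd command line that either dumps a whole database or restricts the output to given taxonomy IDs. The helper names (build_blastdbcmd_args, export_fasta), the output file names, and the example taxonomy ID 9606 (human) are illustrative assumptions, not taken from the post. It also assumes BLAST+ is installed, that a formatted database such as nt is already available locally, and that the database is in a format that supports taxonomy filtering.

```python
import shutil
import subprocess


def build_blastdbcmd_args(db, out_path, taxids=None):
    """Return the blastdbcmd argument list for a FASTA export (hypothetical helper)."""
    # FASTA is blastdbcmd's default output format, so no -outfmt is passed.
    args = ["blastdbcmd", "-db", db, "-out", out_path]
    if taxids:
        # Keep only sequences assigned to these NCBI taxonomy IDs.
        args += ["-taxids", ",".join(str(t) for t in taxids)]
    else:
        # No taxonomy filter: export every sequence in the database.
        args += ["-entry", "all"]
    return args


def export_fasta(db, out_path, taxids=None):
    """Run blastdbcmd; requires BLAST+ on PATH and a local formatted database."""
    if shutil.which("blastdbcmd") is None:
        raise RuntimeError("blastdbcmd not found on PATH; install BLAST+ first")
    subprocess.run(build_blastdbcmd_args(db, out_path, taxids), check=True)


if __name__ == "__main__":
    # Only print the command here; call export_fasta(...) to actually run it.
    print(" ".join(build_blastdbcmd_args("nt", "human_nt.fasta", taxids=[9606])))
    # prints: blastdbcmd -db nt -out human_nt.fasta -taxids 9606
```

Keeping the command construction separate from the subprocess call makes it easy to inspect the exact command before running it against a large database.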

See the examples below and the BLAST Command Line Applications User Manual for more details on the standalone BLAST programs and working with the BLAST databases.

Continue reading “BLAST FASTA Files Will No Longer Be Available on the FTP Site Effective April 2024” →

Updated Bacterial and Archaeal Reference Genome Collection is Available!

Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.

The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness, and the quality of the RefSeq annotation.

Continue reading “Updated Bacterial and Archaeal Reference Genome Collection is Available!” →

Using NCBI Data and Tools for Your Research Project

Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project. 

How and why do you use our resources? Check out the example below:

Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae).

Continue reading “Using NCBI Data and Tools for Your Research Project” →

Faster and Focused Searches with BLAST+ 2.15.0

New version now available

Do you use NCBI’s standalone BLAST tool (BLAST+)? The latest version of BLAST+ is now available and includes two exciting new features! You can now run searches faster and focus your searches by organism more easily.

Continue reading “Faster and Focused Searches with BLAST+ 2.15.0” →

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that.

Continue reading “Comparing Yeast Species Used in Beer Brewing and Bread Making” →

BLAST ClusteredNR Database is Now Available for Download!

Now available! You can download the ClusteredNR protein database, previously only available on the BLAST web application. As recently introduced, our ClusteredNR database lets you get quicker BLAST results and see how your hits are distributed across a wider range of organisms and evolutionary distances. The package includes the ClusteredNR BLAST database, an SQLite3 database, and several scripts for accessing cluster information and members.

Features & Benefits

Reduced redundancy
Faster searches
More diverse proteins and organisms in your BLAST results

Continue reading “BLAST ClusteredNR Database is Now Available for Download!” →