Tag: GenBank

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.  Continue reading “Quick & Easy Access to Mpox Data Through NCBI Virus”

NCBI’s PopSet Database to Retire Effective January 2025

Beginning in January 2025, NCBI’s PopSet database will no longer be available.

While PopSet web pages (example) will no longer be accessible, individual sequences of PopSet will still be searchable and accessible in Nucleotide as independent records (example).  A link under ‘Related information’ on a GenBank record page will also let users access other sequences of the same set. Continue reading “NCBI’s PopSet Database to Retire Effective January 2025”

Coming Soon! Rapid Access to Influenza Data

Improved Influenza GenBank submission process

Do you submit flu sequences to GenBank? Thanks to community feedback, NCBI is excited to announce that we are improving the influenza GenBank submission process. We continue to play a key role in providing the biomedical community free and easy access to genome sequences from viruses. To further advance public health research, in the coming weeks we will begin to expedite the release of influenza data. This means you will see accession numbers assigned rapidly and data becoming publicly accessible within hours. In addition, we will automatically process all influenza genomes to produce standardized, consistent annotation, which saves you time and benefits the researchers who find your data valuable. Continue reading “Coming Soon! Rapid Access to Influenza Data”

Now Available: Assembled Genomes for Influenza Viruses and Improved Functionality of NCBI Virus

NCBI Virus now offers genomes for viruses such as Influenza A by using an automated process to group segments from the same samples. We group these segments into genomes based on metadata for the sample including species, isolate name, host organism, collection date, and location. Newly released GenBank records are added daily. 
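The grouping idea described above can be sketched in a few lines of Python. This is only an illustration, not NCBI's actual pipeline: the record layout, the dictionary keys, the accessions, and the exact-match rule on all five metadata fields are assumptions made for the example.

```python
from collections import defaultdict

# Metadata fields named in the post; the dictionary keys are hypothetical.
GROUPING_FIELDS = ("species", "isolate", "host", "collection_date", "location")


def group_segments(records):
    """Group segment records whose sample metadata match exactly."""
    genomes = defaultdict(list)
    for record in records:
        # Records with identical metadata are treated as one sample.
        key = tuple(record[field] for field in GROUPING_FIELDS)
        genomes[key].append(record["accession"])
    return dict(genomes)


# Made-up records for illustration; these are not real GenBank accessions.
sample_a = {
    "species": "Influenza A virus",
    "isolate": "A/example/1/2024",
    "host": "Homo sapiens",
    "collection_date": "2024-01-15",
    "location": "USA",
}
sample_b = dict(sample_a, isolate="A/example/2/2024")

records = [
    dict(sample_a, accession="SEG0001"),
    dict(sample_a, accession="SEG0002"),
    dict(sample_b, accession="SEG0003"),
]

genomes = group_segments(records)
for key, accessions in genomes.items():
    print(key[1], accessions)
```

Here the two records that share every metadata value end up in one group, and the record with a different isolate name forms its own group. A real process would also have to handle missing or inconsistently formatted metadata, which this sketch ignores.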

Access these genome assemblies through NCBI Virus using the new NCBI Virus Assembly tab above the Results Table as shown below. Continue reading “Now Available: Assembled Genomes for Influenza Viruses and Improved Functionality of NCBI Virus”

RefSeq Release 225 Now Available!

Check out RefSeq release 225, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of July 8, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 448,507,905 records
  • 334,845,613 proteins
  • 63,542,774 RNAs
  • Sequences from 152,668 organisms

The release is provided in several directories as a complete dataset and also as divided by logical groupings. Continue reading “RefSeq Release 225 Now Available!”

GenBank Release 261.0 is Available!

GenBank release 261.0 (6/18/2024) is now available on the NCBI FTP site. This release has 32.04 trillion bases and 4.51 billion records. 


The current release has:

  • 251,094,334 traditional records containing 3,387,240,663,231 base pairs of sequence data
  • 3,380,877,515 WGS records containing 27,900,199,328,333 base pairs of sequence data
  • 746,753,803 bulk-oriented TSA records containing 695,405,769,319 base pairs of sequence data
  • 135,446,337 bulk-oriented TLS records containing 54,512,778,803 base pairs of sequence data 
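As a quick sanity check, the four category counts can be summed and compared with the headline totals for this release. The short Python sketch below does that; the variable names are illustrative, and the WGS base-pair count is read as 27,900,199,328,333.

```python
# (records, base pairs) per category, copied from the list above.
release_261 = {
    "traditional": (251_094_334, 3_387_240_663_231),
    "WGS": (3_380_877_515, 27_900_199_328_333),
    "TSA": (746_753_803, 695_405_769_319),
    "TLS": (135_446_337, 54_512_778_803),
}

total_records = sum(records for records, _ in release_261.values())
total_bases = sum(bases for _, bases in release_261.values())

print(f"{total_records / 1e9:.2f} billion records")  # 4.51 billion records
print(f"{total_bases / 1e12:.2f} trillion bases")    # 32.04 trillion bases
```

The sums match the 4.51 billion records and 32.04 trillion bases stated for the release.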

Continue reading “GenBank Release 261.0 is Available!”

New Data Available! Access Avian Influenza A (H5N1) Virus Sequences at NCBI

Sequence data from the ongoing avian influenza A (H5N1) virus outbreak in cattle are now available through NLM’s NCBI resources NCBI Virus and NCBI Datasets.

These data were submitted by the U.S. Department of Agriculture (USDA), U.S. Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), Iowa State University, and St. Jude Children’s Research Hospital. Continue reading “New Data Available! Access Avian Influenza A (H5N1) Virus Sequences at NCBI”

International Nucleotide Sequence Database Collaboration (INSDC) Introduces Enhanced Website

Aims to broaden INSDC membership and attract diverse new members

The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) and the other Founding Members of the International Nucleotide Sequence Database Collaboration (INSDC) have enhanced the INSDC website, www.insdc.org, to provide comprehensive information on how interested parties from around the world can evaluate their readiness to participate in the INSDC. This effort supports INSDC’s aim to broaden membership and attract qualified nucleotide sequence databases. Web content now includes a formalized Founders Arrangement and a Membership Arrangement, along with other updated information about the INSDC mission, vision, governance, and technical documentation. INSDC encourages interested parties to visit the website to learn more. Continue reading “International Nucleotide Sequence Database Collaboration (INSDC) Introduces Enhanced Website”

Automated Lineage Definitions Now Available in NCBI Virus SARS-CoV-2 Variants Overview

Recently, the NCBI Virus SARS-CoV-2 Variants Overview moved from a manual to an automated process for selecting the mutations required to define a lineage (e.g., Omicron, BA.2, JN.1). With this update, the SARS-CoV-2 Variants Overview provides coverage for all SARS-CoV-2 lineages and is no longer limited to lineages with CDC status. The SARS-CoV-2 Variants Overview website reports results from analyzing both GenBank and unassembled Sequence Read Archive (SRA) sequence data. It allows you to view geographic and frequency trends of records assigned to Pango lineages and to search for sequence records using lineage-defining or other mutations (example shown in Figure 1). Continue reading “Automated Lineage Definitions Now Available in NCBI Virus SARS-CoV-2 Variants Overview”

GenBank Release 260.0 is Available!

GenBank release 260.0 (4/19/2024) is now available on the NCBI FTP site. This release has 31.18 trillion bases and 4.46 billion records.

The current release has:

  • 250,803,006 traditional records containing 3,213,818,003,787 base pairs of sequence data
  • 3,333,621,823 WGS records containing 27,225,116,587,937 base pairs of sequence data
  • 741,066,498 bulk-oriented TSA records containing 689,648,317,082 base pairs of sequence data
  • 135,115,766 bulk-oriented TLS records containing 53,492,243,256 base pairs of sequence data

Continue reading “GenBank Release 260.0 is Available!”
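For readers who want to see how much GenBank grew between the two releases listed on this page, the category counts from the 260.0 and 261.0 posts can be totaled and subtracted. This is a small Python sketch with illustrative names; the figures are copied from the two posts, with the 261.0 WGS base-pair count read as 27,900,199,328,333.

```python
# (records, base pairs) per category, copied from the two release posts.
release_260 = {
    "traditional": (250_803_006, 3_213_818_003_787),
    "WGS": (3_333_621_823, 27_225_116_587_937),
    "TSA": (741_066_498, 689_648_317_082),
    "TLS": (135_115_766, 53_492_243_256),
}
release_261 = {
    "traditional": (251_094_334, 3_387_240_663_231),
    "WGS": (3_380_877_515, 27_900_199_328_333),
    "TSA": (746_753_803, 695_405_769_319),
    "TLS": (135_446_337, 54_512_778_803),
}


def totals(release):
    """Return (total records, total base pairs) for one release."""
    records = sum(r for r, _ in release.values())
    bases = sum(b for _, b in release.values())
    return records, bases


records_260, bases_260 = totals(release_260)
records_261, bases_261 = totals(release_261)

print(f"New records: {records_261 - records_260:,}")
print(f"New base pairs: {bases_261 - bases_260:,}")
```

By these figures, release 261.0 adds 53,564,896 records and 855,283,387,624 base pairs over release 260.0.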