Changes to SRA Data Access on Amazon Web Services (AWS)

Cost-effective alternatives for accessing SRA data  

Important note! The storage tier for Sequence Read Archive (SRA) data available through Amazon Web Services (AWS) commercial buckets is transitioning to Infrequent Access. This change is projected to be complete by the end of September 2024. To mitigate the cost impact of this change, we recommend adjusting your data access workflow to use the SRA Toolkit for accessing SRA data.

Please note that this change does not affect SRA data access from Google Cloud Platform (GCP) or NCBI servers.
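
As a rough illustration of a toolkit-based workflow, the sketch below builds the two SRA Toolkit command lines commonly used to fetch a run and convert it to FASTQ (`prefetch` followed by `fasterq-dump`). The accession and output directory are hypothetical placeholders, and the helper names are ours, not part of the toolkit. Nothing is downloaded unless `run_if_available` is called explicitly.

```python
import shutil
import subprocess


def sra_toolkit_commands(accession, outdir="."):
    """Return the prefetch and fasterq-dump command lines for one SRA run.

    `accession` and `outdir` are supplied by the caller; the values used in
    the usage note below are placeholders for illustration only.
    """
    return [
        ["prefetch", accession],                         # download the run
        ["fasterq-dump", accession, "--outdir", outdir],  # convert to FASTQ
    ]


def run_if_available(commands):
    """Run each command in order, but only if the SRA Toolkit is on PATH."""
    for command in commands:
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} not found; is the SRA Toolkit installed?")
        subprocess.run(command, check=True)
```

For example, `run_if_available(sra_toolkit_commands("SRR000001", "reads"))` would fetch the placeholder run SRR000001 and write FASTQ files into a `reads` directory, assuming the toolkit is installed and configured.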

Coming Soon! Improving Representation of Functional Data in ClinVar

NCBI is improving the way that functional data are submitted to ClinVar and how they are represented in the XML format and on the website. Almost half of the variants in ClinVar are variants of uncertain significance (VUS). It’s unclear what clinical action to take for these variants, creating a challenge for clinicians. One potential way to resolve VUS is to develop functional assays to determine the effect the variant has on the gene product, at either the transcript or the protein level. While ClinVar can currently accept functional data, we are striving to make submission easier and more efficient and to make the data easier to find and use.

Submitting High-Throughput Sequence Data to Gene Expression Omnibus (GEO)

Submit your transcriptomic and epigenomic data to Gene Expression Omnibus (GEO)! GEO is a public functional genomics data repository that relies on your data submissions. We are pleased to announce a new submission interface to improve your experience.  

What’s new? 
  • A web interface for uploading your GEO metadata  
  • Metadata immediately validated for format and completeness 
  • Errors reported instantly with how-to-fix instructions 
  • Faster submission processing 

New Milestone! NCBI Pathogen Detection Reaches 2 Million Isolates

NCBI’s Pathogen Detection resource collects, analyzes, and reports on bacterial and fungal isolate genome sequences for outbreak identification and tracking. Pathogen Detection is also central to the surveillance of antimicrobial resistance, virulence, and stress resistance for 97 pathogenic taxa covering 753 species, and now includes analysis results for over 2 million isolates!

How does Pathogen Detection work?

Pathogen Detection provides two major automated real-time analyses:

  1. It quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks.
  2. As part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens genomic sequences using AMRFinderPlus to identify the antimicrobial resistance, stress response, and virulence genes found in bacterial genomic sequences. This enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence. 

NCBI Taxonomy Updates to Yeasts

As previously announced, NCBI is continually making improvements to our Taxonomy resource in response to new data and changes in biological nomenclature. We recently made classification changes to budding yeasts and allies (Saccharomycotina), a group that comprises more than 1,200 species and exhibits levels of genomic diversity similar to those of plants and animals. This update affects more than six million records. Check out our new Taxonomy browser in NCBI Datasets.

Now Available: GenBank Release 262.0!

GenBank release 262.0 (8/22/2024) is now available on the NCBI FTP site. This release has 34.10 trillion bases and 4.76 billion records.

The current release has: 

  • 251,998,350 traditional records containing 3,675,462,701,077 base pairs of sequence data
  • 3,569,715,357 WGS records containing 29,643,594,176,326 base pairs of sequence data
  • 755,907,377 bulk-oriented TSA records containing 706,085,554,263 base pairs of sequence data
  • 187,321,998 bulk-oriented TLS records containing 77,026,446,552 base pairs of sequence data 
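
As a quick consistency check, the per-division counts listed above can be summed and compared with the headline totals. The sketch below uses only the figures quoted in this post; the dictionary and variable names are ours.

```python
# Record and base-pair counts per division, as quoted for release 262.0.
records = {
    "traditional": 251_998_350,
    "WGS": 3_569_715_357,
    "TSA": 755_907_377,
    "TLS": 187_321_998,
}
bases = {
    "traditional": 3_675_462_701_077,
    "WGS": 29_643_594_176_326,
    "TSA": 706_085_554_263,
    "TLS": 77_026_446_552,
}

total_records = sum(records.values())
total_bases = sum(bases.values())

# Express the totals in the units used by the headline figures.
print(f"{total_bases / 1e12:.2f} trillion bases")    # 34.10 trillion bases
print(f"{total_records / 1e9:.2f} billion records")  # 4.76 billion records
```

The sums come to 34,102,168,878,218 bases and 4,764,943,082 records, which round to the 34.10 trillion bases and 4.76 billion records stated for the release.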

NCBI Hidden Markov Models (HMM) Release 16.0 Now Available!

Download release 16.0 of the NCBI protein profile Hidden Markov models (HMMs) used by the Prokaryotic Genome Annotation Pipeline (PGAP)! Search this collection against your favorite prokaryotic proteins to identify their function using the HMMER sequence analysis package.
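
One way such a search might look with HMMER's `hmmsearch` program is sketched below. The file names are hypothetical placeholders (not the actual names of the release files), the E-value cutoff is an arbitrary illustrative choice, and the helper functions are ours. The parser assumes the standard `--tblout` layout, in which the first six whitespace-separated columns are target name, target accession, query name, query accession, full-sequence E-value, and full-sequence score.

```python
def build_hmmsearch_command(hmm_library, protein_fasta, table_out,
                            evalue=1e-5, cpus=2):
    """Build an hmmsearch command line that writes a per-target hit table."""
    return [
        "hmmsearch",
        "--tblout", table_out,  # tabular per-target output
        "-E", str(evalue),      # report hits at or below this E-value
        "--cpu", str(cpus),     # number of worker threads
        hmm_library,            # profile HMM file (placeholder name)
        protein_fasta,          # protein sequences to search
    ]


def parse_tblout(lines):
    """Yield (protein, hmm, evalue, score) from hmmsearch --tblout lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # For hmmsearch, the target is the sequence and the query is the HMM.
        yield fields[0], fields[2], float(fields[4]), float(fields[5])
```

The command list can be passed to `subprocess.run(command, check=True)` once HMMER is installed, for example with `build_hmmsearch_command("ncbi_hmms.hmm", "my_proteins.faa", "hits.tbl")`, and the resulting table read back with `parse_tblout(open("hits.tbl"))`.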

What’s New?

Release 16.0 contains:

  • 17,078 HMMs maintained by NCBI
  • 406 new HMMs since release 15.0
  • Revised GO term assignments, after comparing and evaluating the GO terms of NCBI HMMs against those of the corresponding InterPro entries across a substantial number of HMMs (added: 307; deleted: 39; updated: 1,482)

Quick & Easy Access to Mpox Data Through NCBI Virus

The World Health Organization (WHO) declared the recent upsurge of the mpox virus to be a public health emergency of international concern. Having timely viral genome data freely and widely available enables researchers to explore how this virus differs from viruses isolated and sequenced in the past. Therefore, NCBI’s GenBank is expediting the release of mpox data by annotating gene and coding region features as part of the submission process.

NCBI’s First-Ever BioEd Summit Was a Success!

NCBI hosted its first-ever BioEd Summit: Crafting Student-Centric Curricula with NCBI resources. This week-long, in-person event for science educators from across the U.S. was held on the National Institutes of Health (NIH) campus in Bethesda, MD, from August 5 to 9, 2024.

Event Details 

During the week, educators participated in morning sessions including interactive workshops on NCBI educational curricular design, the use of various NCBI resources in teaching, and detailed hands-on discussions and practice with NCBI tools. A panel discussion on employing novel, data-driven, active learning exercises in science classes with leaders from several institutions including:

NCBI’s PopSet Database to Retire Effective January 2025

Beginning in January 2025, NCBI’s PopSet database will no longer be available.

While PopSet web pages (example) will no longer be accessible, individual sequences from PopSet will still be searchable and accessible in Nucleotide as independent records (example). A link under ‘Related information’ on a GenBank record page will also let users access the other sequences in the same set.