
Pretrained AI Models

Accelerate AI development with world-class customizable pretrained models from NVIDIA.

Streamline AI Application Development

NVIDIA pretrained AI models are a collection of 600+ highly accurate models built by NVIDIA researchers and engineers using representative public and proprietary datasets for domain-specific tasks. They enable developers to build AI applications efficiently and quickly.
These models are optimized for GPUs across cloud, embedded, and edge deployments, delivering high performance in your production environment of choice.

A Model for Every Task

Get started today with highly accurate, customizable models that span diverse use cases and domains, including computer vision, speech, language understanding, molecule generation, and more.

Generative Models

NVIDIA is at the forefront of generative AI research, launching groundbreaking models like StyleGAN, GauGAN, eDiff-I, and many more. These generative models are pretrained for efficient enterprise application development.

StyleGAN3

StyleGAN3 is a cutting-edge generative model for high-quality image synthesis and enables generation of photorealistic training data. It offers unparalleled control over image style and content, making it ideal for creative and enterprise applications.

Explore StyleGAN3

EG3D

EG3D, short for Efficient Geometry-Aware 3D, is a generative adversarial network-based pretrained model that produces high-quality 3D geometry in complex environments with improved computational efficiency. EG3D is a powerful tool for developers seeking to generate, in real time, multi-view-consistent images and 3D geometry for creative AI applications.

Explore EG3D

Megatron 530B LLM

The Megatron-Turing NLG 530B model is a generative language model developed by NVIDIA and Microsoft using DeepSpeed and Megatron; at its release, it was the largest and most powerful monolithic transformer language model of its kind. With 530 billion parameters, it can generate high-quality text for a variety of tasks such as translation, question answering, and summarization.

Explore Megatron 530B LLM

Language Models

Language models are revolutionizing application development for downstream natural language tasks like text generation, summarization, chatbots, question answering, translation, and more. These models use innovative architectures and frameworks to achieve high accuracy on tasks of widely varying complexity.

Megatron-LM Based Models

Built on the Megatron architecture developed by the Applied Deep Learning Research team at NVIDIA, this is a series of language models trained in the style of GPT, BERT, and T5. These models deliver improved performance on downstream tasks like question answering and summarization and, with controlled pretraining, also excel at complex tasks like generating fluent, consistent, and coherent stories.

Explore Megatron LM

BERT-Based Models

BERT-based models are pretrained on massive amounts of text data using the bidirectional transformer architecture, which allows them to capture context from both the left and the right of each token. This pretraining enables these models to perform well on various natural language processing tasks, such as sentiment analysis and sentence prediction, without requiring task-specific architecture modifications or extensive fine-tuning.
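A minimal sketch of what "bidirectional" means in practice, assuming the standard transformer convention of a 0/1 attention mask (1 = the token may attend to that position): a BERT-style encoder lets every token see every other token, whereas a left-to-right GPT-style model uses a causal mask. The helper names below are illustrative only and are not part of any NVIDIA library.

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """BERT-style encoder: every token may attend to every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """GPT-style decoder: token i may attend only to tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for name, mask in [("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))]:
        print(name)
        for row in mask:
            print(" ".join(map(str, row)))
```

The only difference is which positions are visible: in the bidirectional mask, the first token already "sees" the last one, which is what lets the encoder use right-hand context.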

Explore All BERT Models

ELECTRA-Based Model

This transformer-based language model uses the novel ELECTRA pretraining method, in which the model learns to detect which input tokens have been replaced, to achieve state-of-the-art performance on a range of natural language processing tasks. Its smaller size and faster training time compared with similar models make it ideal for a range of applications, including chatbots, virtual assistants, and more.
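The replaced-token-detection objective behind ELECTRA can be sketched as follows. This is a toy illustration under stated assumptions: uniform random replacement stands in for ELECTRA's small generator network (in practice a masked language model), and `corrupt` and `rtd_labels` are hypothetical helper names. The discriminator is then trained to predict these per-token labels.

```python
import random


def corrupt(tokens: list[str], replace_prob: float, vocab: list[str], rng: random.Random) -> list[str]:
    """Toy stand-in for ELECTRA's generator: swap some tokens at random.

    The real generator is a small masked language model that samples
    plausible replacements; uniform random choice is an assumption here.
    """
    return [rng.choice(vocab) if rng.random() < replace_prob else tok for tok in tokens]


def rtd_labels(original: list[str], corrupted: list[str]) -> list[int]:
    """Per-token discriminator targets: 1 = replaced, 0 = original.

    If a sampled token happens to equal the original, it counts as original.
    """
    return [int(o != c) for o, c in zip(original, corrupted)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["the", "chef", "cooked", "the", "meal"]
    noisy = corrupt(sentence, replace_prob=0.3, vocab=["ate", "meal", "chef", "a"], rng=rng)
    print(noisy)
    print(rtd_labels(sentence, noisy))
```

Because every token receives a label, the discriminator learns from all positions in the input rather than only from the masked ones.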

Explore ELECTRA-Based Model

Computer Vision Models

With computer vision, devices can understand the world through images and videos. Computer vision models can be used for image classification, object detection and tracking, object recognition, semantic segmentation, and instance segmentation.

PeopleNet

PeopleNet is a computer vision model developed using NVIDIA TAO for real-time pedestrian detection and tracking in urban environments, with high accuracy and low latency. It divides the image into a grid and predicts object locations relative to the grid cells. Its high performance and optimization make it ideal for use in smart cities, autonomous vehicles, and intelligent video analytics systems.
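The grid idea can be sketched as follows. This is a simplified illustration, not PeopleNet's actual training code: it assumes an output stride of 16 pixels (each grid cell covers a 16x16 patch) and assigns each object to the single cell containing its box center. `Box`, `grid_shape`, and `responsible_cell` are hypothetical names, and the 960x544 input size in the example is likewise an assumption.

```python
from dataclasses import dataclass

STRIDE = 16  # assumed output stride: each grid cell covers a 16x16 pixel patch


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float


def grid_shape(width: int, height: int, stride: int = STRIDE) -> tuple[int, int]:
    """Number of grid columns and rows for an input image."""
    return width // stride, height // stride


def responsible_cell(box: Box, stride: int = STRIDE) -> tuple[int, int]:
    """(row, col) of the grid cell containing the box center.

    Simplification: real detectors may spread targets over several cells.
    """
    cx = (box.x1 + box.x2) / 2
    cy = (box.y1 + box.y2) / 2
    return int(cy // stride), int(cx // stride)


if __name__ == "__main__":
    print(grid_shape(960, 544))                       # (60, 34): columns, rows
    print(responsible_cell(Box(100, 200, 140, 300)))  # (15, 7)
```

Predicting per cell keeps the output a fixed-size tensor regardless of how many people are in the frame, which suits real-time inference.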

Explore PeopleNet

Bi3D Proximity Segmentation

Bi3D is a binary depth classification network that classifies whether objects are closer or farther than a given distance. The idea behind Bi3D is that it is faster and easier to classify an object as closer or farther than a certain distance than to regress its actual distance accurately. This makes it an ideal model for building collision avoidance applications, similar to those used in current industrial autonomous mobile robot (AMR) systems.
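A minimal sketch of how binary "closer than d?" decisions can drive collision avoidance and, when several planes are checked, bracket an object's depth. It assumes the network outputs, per pixel, a probability that the surface is closer than (or farther than) a given plane. The 0.5 threshold, `min_pixels`, and the helper names are hypothetical choices, not Bi3D's actual interface.

```python
def should_stop(prob_closer: list[list[float]], min_pixels: int = 1, threshold: float = 0.5) -> bool:
    """Trigger a stop if enough pixels are classified as closer than the safety plane."""
    hits = sum(p >= threshold for row in prob_closer for p in row)
    return hits >= min_pixels


def depth_interval(prob_farther: list[float], planes: list[float], threshold: float = 0.5) -> tuple[float, float]:
    """Bracket one pixel's depth using binary decisions at increasing plane distances.

    prob_farther[k] is the assumed probability that the pixel lies beyond planes[k].
    """
    k = 0
    while k < len(planes) and prob_farther[k] >= threshold:
        k += 1  # still farther than this plane; check the next one
    lower = planes[k - 1] if k > 0 else 0.0
    upper = planes[k] if k < len(planes) else float("inf")
    return lower, upper


if __name__ == "__main__":
    print(should_stop([[0.1, 0.2], [0.7, 0.1]]))                 # True
    print(depth_interval([0.9, 0.8, 0.2], [1.0, 2.0, 3.0]))      # (2.0, 3.0)
```

A robot that only needs to know "is anything inside my safety margin?" can stop at the single-plane check, which is the cost saving the binary formulation is meant to provide.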

Explore Bi3D Proximity Segmentation Model

Fine-Tuned SegFormer

Based on the SegFormer paper from NVIDIA Research, these fine-tuned models pair a transformer encoder with a segmentation decoder to accurately identify and segment objects of interest in images or videos for semantic segmentation tasks. They're suitable for applications that require precise visual understanding, such as autonomous driving, medical imaging analysis, and surveillance.

Explore Fine-Tuned SegFormer Models

Speech Models

Speech AI deals with recognizing and transcribing audio into text and with synthesizing speech from text. It includes automatic speech recognition (ASR) and speech synthesis.

Conformer

Conformer, short for convolution-augmented transformer, is used for ASR tasks. These models combine transformer and CNN architectures and achieve high accuracy on speech benchmarks. The pretrained models span 10+ languages, including German, Italian, Japanese, and Kinyarwanda, making them ideal for customized speech applications such as live captioning, digital human services, and voice assistants.

Explore All Conformer Models

Time Delay Neural Network Model

ECAPA TDNN is a highly accurate pretrained model for speaker identification and verification, based on a time delay neural network (TDNN). It produces robust speaker embeddings under both close-talking and distant-talking conditions, identifying a speaker by the characteristics of how they speak. The model is used for speaker diarization in speech AI applications such as understanding medical conversations, video captioning, and many more.
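Speaker verification with embeddings is commonly done by scoring two embeddings with cosine similarity and comparing the score against a tuned threshold. The sketch below assumes that approach; the 0.7 threshold and the toy 3-dimensional vectors are placeholders (real speaker embeddings have far more dimensions, and thresholds are calibrated on held-out data), and `same_speaker` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled: list[float], test: list[float], threshold: float = 0.7) -> bool:
    """Accept the claimed identity if the embeddings are similar enough (threshold is assumed)."""
    return cosine_similarity(enrolled, test) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.3]   # embedding from an enrollment recording (toy values)
    attempt = [0.8, 0.2, 0.35]   # embedding from a new recording (toy values)
    print(same_speaker(enrolled, attempt))
```

For diarization, the same similarity score can instead be used to cluster embeddings from short segments of a conversation, so that segments with similar embeddings are attributed to the same speaker.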

Explore ECAPA TDNN Model

FastPitch and HiFiGAN

The combination of FastPitch and HiFiGAN delivers end-to-end speech synthesis: the FastPitch model produces a mel spectrogram from raw text, and HiFiGAN generates audio from that mel spectrogram. Together, these pretrained models are ideal for a wide range of text-to-speech (TTS) applications such as audiobooks, voice cloning, and music generation.
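The hand-off between the two models can be sketched with a little arithmetic. The sketch assumes a 22,050 Hz sample rate and a hop length of 256 audio samples per mel frame (a common TTS configuration, not necessarily the one these checkpoints use): FastPitch must predict enough mel frames to cover the utterance, and HiFiGAN upsamples each frame back into `HOP_LENGTH` audio samples. The HTK-style mel formula shows how frequencies map onto the mel axis of the spectrogram.

```python
import math

SAMPLE_RATE = 22050  # assumed sample rate in Hz
HOP_LENGTH = 256     # assumed number of audio samples per mel frame


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: approximates perceived pitch by compressing high frequencies."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frames_for_duration(seconds: float) -> int:
    """Mel frames the spectrogram generator (FastPitch) must emit for this much audio."""
    return math.ceil(seconds * SAMPLE_RATE / HOP_LENGTH)


def samples_from_frames(n_frames: int) -> int:
    """Audio samples the vocoder (HiFiGAN) produces by upsampling each frame."""
    return n_frames * HOP_LENGTH


if __name__ == "__main__":
    n = frames_for_duration(2.0)
    print(n, samples_from_frames(n))  # 173 44288
    print(round(hz_to_mel(1000.0), 1))
```

Splitting the work this way lets each model be trained and swapped independently, as long as both agree on the spectrogram settings.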

Explore FastPitch and HiFiGAN Models

Biology Models

Pharma companies, biotech startups, and pioneering biology researchers are developing AI applications for medical imaging, drug discovery, predicting biomolecular data, generating new molecules, and expanding the horizons of healthcare innovation using state-of-the-art pretrained models.

MegaMolBART

MegaMolBART is a model that understands chemistry and can be used for a variety of cheminformatics applications in drug discovery. This model is ideal for reaction prediction, molecular optimization, and de novo molecular generation.

Explore MegaMolBART Model

BioBERT

BioBERT is a pretrained language model designed for biomedical text mining and natural language processing tasks. Based on the popular BERT architecture, it is fine-tuned on high-quality biomedical datasets, allowing for accurate identification of chemical and protein entities in text. The model can be used for various applications in the biomedical field and clinical research around chemical-protein interactions.

Explore BioBERT Models

Spleen Segmentation From CT Images

The spleen segmentation model is pretrained for volumetric (3D) segmentation of the spleen from CT images. This model is trained using an award-winning technique for medical segmentation.

Explore Spleen Segmentation Model

Recommender Models

Recommender systems predict the "rating" or "preference" a user would give to an item. These models are used in ecommerce and retail for personalized merchandising, in media and entertainment for personalized content, and in banking and other services for personalization.

DLRM-Based Model

The Deep Learning Recommendation Model (DLRM) is designed to make use of both categorical and numerical inputs. It draws on two primary perspectives, recommendation systems and predictive analytics, to deliver accurate results for ad click-through-rate prediction, ad ranking, and personalization.
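How DLRM combines the two input types can be sketched in plain Python: categorical features are looked up in embedding tables, numerical features pass through a bottom MLP, pairwise dot products between the resulting vectors capture feature interactions, and a top MLP turns everything into a click probability. This toy version is an illustration built on assumptions, not NVIDIA's implementation: it uses single-layer MLPs, random untrained weights, and a 4-dimensional embedding, and `ToyDLRM` is a hypothetical name.

```python
import math
import random

EMB_DIM = 4  # toy embedding size; production models use much larger dimensions


def make_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x: list[float], w: list[list[float]]) -> list[float]:
    """x (length n) times w (n rows x m cols) -> vector of length m."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def relu(v: list[float]) -> list[float]:
    return [max(0.0, t) for t in v]


class ToyDLRM:
    def __init__(self, n_dense: int, cardinalities: list[int], seed: int = 0):
        rng = random.Random(seed)
        # One embedding table per categorical feature.
        self.tables = [make_matrix(c, EMB_DIM, rng) for c in cardinalities]
        # Bottom "MLP" (one layer here) maps numerical features into embedding space.
        self.bottom = make_matrix(n_dense, EMB_DIM, rng)
        n_vecs = 1 + len(cardinalities)
        n_pairs = n_vecs * (n_vecs - 1) // 2
        # Top "MLP" sees the dense vector plus every pairwise interaction.
        self.top = make_matrix(EMB_DIM + n_pairs, 1, rng)

    def predict(self, dense: list[float], categorical: list[int]) -> float:
        vecs = [relu(linear(dense, self.bottom))]
        vecs += [table[idx] for table, idx in zip(self.tables, categorical)]
        # Pairwise dot-product feature interactions.
        interactions = [
            sum(a * b for a, b in zip(vecs[i], vecs[j]))
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
        ]
        logit = linear(vecs[0] + interactions, self.top)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> click probability


if __name__ == "__main__":
    model = ToyDLRM(n_dense=3, cardinalities=[10, 5])
    print(model.predict([0.5, 1.2, -0.3], [7, 2]))
```

In real deployments the embedding tables dominate model size, which is why categorical features are handled by lookups rather than by dense layers.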

Explore DLRM-Based Model

SIM Model

The Search-based Interest Model (SIM) is a system that predicts user behavior based on sequences of previous interactions. The original model has a cascaded two-stage search mechanism that enhances SIM's ability to model lifelong sequential behavior data in both scalability and accuracy. The model is further pretrained to improve predictions, enabling development of effective real-time recommenders and advertising systems.
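The cascaded two-stage search can be sketched as follows. The sketch assumes the simpler "hard search" variant of the first stage (filter the lifelong history by the target item's category) and plain dot-product softmax attention in the second. The `Behavior` record, the recency cut-off `k`, and the function names are illustrative assumptions; the production model's attention is more elaborate.

```python
import math
from dataclasses import dataclass


@dataclass
class Behavior:
    """One past user interaction (toy record)."""
    item_id: int
    category: str
    timestamp: int
    embedding: list[float]


def general_search(history: list[Behavior], target_category: str, k: int) -> list[Behavior]:
    """Stage 1 (hard search): keep behaviors in the target's category, most recent first."""
    hits = [b for b in history if b.category == target_category]
    hits.sort(key=lambda b: b.timestamp, reverse=True)
    return hits[:k]


def exact_search(candidates: list[Behavior], target_embedding: list[float]) -> list[float]:
    """Stage 2: softmax attention over retrieved behaviors -> user interest vector."""
    if not candidates:
        return [0.0] * len(target_embedding)
    scores = [sum(x * y for x, y in zip(b.embedding, target_embedding)) for b in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * b.embedding[d] for w, b in zip(weights, candidates))
        for d in range(len(target_embedding))
    ]


if __name__ == "__main__":
    history = [
        Behavior(1, "shoes", 10, [1.0, 0.0]),
        Behavior(2, "books", 20, [0.0, 1.0]),
        Behavior(3, "shoes", 30, [0.0, 1.0]),
    ]
    retrieved = general_search(history, "shoes", k=50)
    print([b.item_id for b in retrieved])  # [3, 1]
    print(exact_search(retrieved, [0.5, 0.5]))
```

The cheap first stage shrinks a very long history to a short, relevant list, so the more expensive attention in the second stage stays affordable in real time.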

Explore SIM Model

AI Pretrained Models from the NGC Catalog

The NGC™ catalog offers production-ready pretrained AI models.

Explore NGC Models

Transparent Model Resumes

Just as a resume provides a snapshot of a candidate's skills and experience, a model resume does the same for a model. Many pretrained models list critical parameters such as batch size, training epochs, and accuracy, giving you the transparency and confidence to pick the right model for your use case.

Experience AI Models on-the-fly with NVIDIA NGC Playground

The AI Playground offers an easy-to-use interface that lets you quickly try generative AI models directly from your browser. You can test the models with your own data or adjust the hyperparameters to customize the model's behavior. You can also download the models and run them on Windows machines with RTX GPUs.


Learn More


Cloud and On-Prem SDK Integration

The pretrained models can be integrated into industry-specific software development kits such as NVIDIA Clara™ for healthcare, NVIDIA Isaac™ for robotics, and NVIDIA Riva for conversational AI—making them easier to use in your applications and services.

Customize and Adapt Models Faster With NVIDIA SDKs

NVIDIA Train, Adapt, and Optimize (TAO) is an AI-model-adaptation platform that simplifies and accelerates the creation of production-ready models for AI applications. By fine-tuning pretrained models with custom data, developers can produce highly accurate computer vision and language understanding models in hours rather than months. This eliminates the need for large training runs and deep AI expertise.

NVIDIA NeMo™ is an open-source framework for developers to build and train state-of-the-art conversational AI models.

Learn More

Supercharge Your Production AI

NVIDIA AI Enterprise, an end-to-end, secure, cloud-native suite of AI software, includes access to unencrypted NVIDIA pretrained models and the model weights for a wide range of use cases. Developers can view the weights and biases of the model, which can help in model explainability and understanding model bias. Additionally, unencrypted models are easier to debug and integrate into custom AI apps.

Enterprise support is included with NVIDIA AI Enterprise to help ensure business continuity and keep AI projects on track.

Learn More

Accelerate your AI development with pretrained models from the NGC catalog.

Get Started