Implement a Vision Transformer (ViT) from scratch and make predictions on the CIFAR-10 dataset (image classification)

Hanhpt23/Vision-Transformer-Pytorch-from-Scratch

Implementing a Vision Transformer from scratch on the CIFAR-10 dataset (image classification)

Highlights:

In this repository, we will:

  1. Explore the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale".
  2. Implement the Vision Transformer (ViT) model from scratch.
  3. Train the model and make predictions on the CIFAR-10 dataset.
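The "16x16 words" in the paper's title refers to cutting an image into fixed-size patches that play the role of word tokens. CIFAR-10 images are only 32x32 RGB, so 16x16 patches would yield just 4 tokens; the patch size of 4 used below is an assumption for illustration, not necessarily the value this repository uses. The following is a minimal NumPy sketch of the patching step (the repository itself uses PyTorch, and `patchify` is a hypothetical helper written for this example):

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (raster order).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H, W, C) -> (H/P, P, W/P, P, C): split each spatial axis into (grid index, offset in patch)
    x = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    # Group the two grid axes first, then the within-patch axes: (H/P, W/P, P, P, C)
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector
    return x.reshape(-1, patch_size * patch_size * c)


# A CIFAR-10-sized dummy image: 32x32 pixels, 3 color channels
image = np.random.rand(32, 32, 3)
patches = patchify(image, patch_size=4)  # assumed patch size
print(patches.shape)  # (64, 48): 8x8 = 64 patches, each 4*4*3 = 48 values
```

Smaller patches give a longer token sequence (more fine-grained detail) at a higher attention cost, which is why a patch size smaller than 16 is a common choice for 32x32 inputs.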

Understand the ViT paper

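Vision Transformers (ViTs) are a neural network architecture introduced as an alternative to traditional convolutional neural networks (CNNs) for computer vision tasks. They apply the self-attention mechanism, which had already proven successful in natural language processing, to image data: the image is split into fixed-size patches, each patch is flattened and linearly projected into an embedding, and the resulting sequence of patch embeddings (plus position embeddings) is processed by a standard Transformer encoder, much like a sequence of word tokens.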
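To make the self-attention idea concrete, here is a minimal NumPy sketch of how flattened patches become tokens and how every token attends to every other one. It assumes a single attention head, random (untrained) weights, and the illustrative sizes noted in the comments; following the ViT paper, a class token is prepended to the patch sequence. The repository's PyTorch model will differ in details such as multi-head attention, layer normalization, and MLP blocks, and `softmax` and `self_attention` are hypothetical helpers written for this example.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (N, D) sequence of token embeddings
    w_q, w_k, w_v: (D, D_head) projection matrices
    Returns (output of shape (N, D_head), attention weights of shape (N, N)).
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N): similarity of every token pair
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
# Assumed sizes: 64 patches of 4x4x3 = 48 values from a 32x32 RGB image, 32-dim embeddings
num_patches, patch_dim, embed_dim = 64, 48, 32

# Flattened patches -> linear patch embedding
patches = rng.standard_normal((num_patches, patch_dim))
w_embed = rng.standard_normal((patch_dim, embed_dim)) * 0.02
tokens = patches @ w_embed

# Prepend a class token and add position embeddings (random here; learned in a real model)
cls_token = rng.standard_normal((1, embed_dim)) * 0.02
tokens = np.concatenate([cls_token, tokens], axis=0)        # (65, 32)
tokens = tokens + rng.standard_normal(tokens.shape) * 0.02  # position embeddings

w_q, w_k, w_v = (rng.standard_normal((embed_dim, embed_dim)) * 0.02 for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (65, 32) (65, 65)
```

Because the attention weights form a full N x N matrix, every patch can draw on information from anywhere in the image within a single layer, unlike the local receptive field of a convolution; the price is a cost that grows quadratically with the number of patches.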
