Implementing a Vision Transformer from scratch and training it on CIFAR-10 (image classification)

Highlights:

In this repository, we will:

Explore the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., ICLR 2021).
Implement the Vision Transformer (ViT) model from scratch.
Train the model and make predictions on the CIFAR-10 dataset.
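Training and prediction on CIFAR-10 follow the standard PyTorch supervised loop. The sketch below assumes PyTorch and torchvision. The function names `cifar10_loaders`, `train_one_epoch`, and `evaluate` are hypothetical, and the 0.5 per-channel normalization is a simple illustrative choice rather than the repository's setting. Any `nn.Module` that maps a batch of 3x32x32 images to 10 logits (such as a ViT) can be passed as `model`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def cifar10_loaders(root="./data", batch_size=128):
    """Hypothetical helper: build CIFAR-10 train/test loaders with torchvision."""
    from torchvision import datasets, transforms  # imported lazily; needs torchvision

    # Assumption: a simple [-1, 1] normalization, not tuned per-channel statistics.
    tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    train = datasets.CIFAR10(root, train=True, download=True, transform=tf)
    test = datasets.CIFAR10(root, train=False, download=True, transform=tf)
    return (DataLoader(train, batch_size=batch_size, shuffle=True),
            DataLoader(test, batch_size=batch_size))


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Run one pass over `loader` and return the mean cross-entropy loss."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    total, n = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # logits vs. integer class labels
        loss.backward()
        optimizer.step()
        total += loss.item() * images.size(0)
        n += images.size(0)
    return total / n


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return top-1 accuracy of `model` on `loader`."""
    model.eval()
    correct, n = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)  # predicted class = largest logit
        correct += (preds == labels).sum().item()
        n += labels.size(0)
    return correct / n


# Usage on random CIFAR-10-shaped tensors with a stand-in linear classifier
# (replace it with a ViT and the loaders above for real training).
x, y = torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))
loader = DataLoader(TensorDataset(x, y), batch_size=8)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss = train_one_epoch(model, loader, opt)
acc = evaluate(model, loader)
```

Keeping the dataset loading separate from the loop means the same two functions serve both training and the final prediction pass on the test split.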

Understand the ViT paper

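Vision Transformers (ViTs) are a neural network architecture introduced as an alternative to convolutional neural networks (CNNs) for computer vision tasks. They apply the self-attention mechanism, which has been highly successful in natural language processing, to image data: an image is split into fixed-size patches (16x16 pixels in the paper's title), each patch is linearly projected into an embedding, and the resulting sequence of patch embeddings is processed by a standard Transformer encoder, much as a sentence is processed as a sequence of word tokens.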
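The mechanism described above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's own implementation: it follows the paper's general recipe (patch embedding, a learnable class token, learnable position embeddings, pre-norm Transformer encoder blocks with multi-head self-attention, and a linear classification head). The sizes used here (4x4 patches on 32x32 CIFAR-10 images, width 192, depth 6, 3 heads) are illustrative assumptions, not values taken from the repository.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to `dim`."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, dim=192):
        super().__init__()
        assert img_size % patch_size == 0, "image size must be divisible by patch size"
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to flattening
        # each patch and applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.proj(x)                        # (B, dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)     # (B, N, dim)


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.scale = (dim // heads) ** -0.5     # 1 / sqrt(head_dim)
        self.qkv = nn.Linear(dim, dim * 3)      # queries, keys, values in one matmul
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, D // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        attn = attn.softmax(dim=-1)             # attention weights over tokens
        y = (attn @ v).transpose(1, 2).reshape(B, N, D) # merge heads
        return self.out(y)


class EncoderBlock(nn.Module):
    def __init__(self, dim, heads, mlp_ratio=4.0, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = MultiHeadSelfAttention(dim, heads)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(hidden, dim), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.norm1(x))        # pre-norm residual attention
        x = x + self.mlp(self.norm2(x))         # pre-norm residual MLP
        return x


class ViT(nn.Module):
    # Default hyperparameters are illustrative assumptions for 32x32 CIFAR-10 inputs.
    def __init__(self, img_size=32, patch_size=4, in_chans=3, num_classes=10,
                 dim=192, depth=6, heads=3, mlp_ratio=4.0, dropout=0.0):
        super().__init__()
        self.patch_embed = PatchEmbedding(img_size, patch_size, in_chans, dim)
        n = self.patch_embed.num_patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))      # learnable [CLS]
        self.pos_embed = nn.Parameter(torch.zeros(1, n + 1, dim))  # learnable positions
        nn.init.trunc_normal_(self.cls_token, std=0.02)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        self.blocks = nn.Sequential(
            *[EncoderBlock(dim, heads, mlp_ratio, dropout) for _ in range(depth)]
        )
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                            # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)    # (B, 1, dim)
        x = torch.cat([cls, x], dim=1) + self.pos_embed    # (B, N + 1, dim)
        x = self.blocks(x)
        return self.head(self.norm(x[:, 0]))               # classify from [CLS]


model = ViT()
logits = model(torch.randn(2, 3, 32, 32))  # logits.shape == (2, 10)
```

With 32x32 inputs and 4x4 patches, the encoder sees (32 / 4)^2 = 64 patch tokens plus one class token, i.e. 65 tokens per image. Using the paper's 16x16 patches on 32x32 images would leave only 4 patches, which is why a smaller patch size is assumed here.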

Repository: Hanhpt23/Vision-Transformer-Pytorch-from-Scratch

Folders and files

Latest commit

History

Repository files navigation

Implement vision transformer from scratch + Cifar-10 dataset (image classification)

Highlights:

Understand the ViT paper

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages