This repository contains a simple implementation of the Vision Transformer (ViT) model described in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Alexey Dosovitskiy et al. The Vision Transformer treats an image as a sequence of fixed-size patches and applies a standard transformer architecture to those patch tokens for image classification.
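As a minimal sketch of the "image as a sequence of patches" idea, the snippet below splits an image into non-overlapping 16x16 patches and flattens each patch into a token vector, which is the first step before ViT's linear projection and transformer layers. The helper name `patchify` is illustrative, not part of this repository's API:

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. the token sequence ViT feeds to its patch-embedding projection.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Reshape into a (grid_h, patch, grid_w, patch, C) grid of patches...
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    # ...group the two grid axes together, then flatten each patch.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * c)

# A 224x224 RGB image yields 14x14 = 196 tokens of 16*16*3 = 768 values each.
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = patchify(img)
print(tokens.shape)  # (196, 768)
```

In the full model, each 768-dimensional patch vector is then linearly projected to the transformer's hidden dimension and combined with positional embeddings.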
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al.