Revisions and implementations of modern Convolutional Neural Networks architectures in TensorFlow and Keras

Nyandwi/ModernConvNets

Modern Convolutional Neural Network Architectures

Revision of the designs and implementation of modern Convolutional Neural Network architectures

Introduction to Convolutional Neural Networks

Convolutional Neural Networks (ConvNets or CNNs) are a class of neural networks used mostly for image recognition tasks.

A standard CNN architecture is typically made of three main types of layers: convolution, max-pooling, and fully connected layers. Convolution layers are the main component of CNNs. They extract features from images using filters. Pooling layers downsample the activation (or feature) maps produced by convolutional layers. Downsampling can also be achieved with strided convolutions, but max-pooling layers have no learnable parameters and they introduce translational invariance, which improves model generalization at the cost of spatial inductive bias. Fully connected layers are used for classification (matching learned features with their respective labels).
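The trade-off between max-pooling and strided convolutions can be checked with a little arithmetic. The sketch below is plain Python rather than TensorFlow, and the helper names (`conv_output_size`, `conv_params`, `max_pool_2x2`) and the 32x32, 64-channel feature map are assumptions chosen only for illustration. It shows that both options halve the spatial size, that only the strided convolution adds learnable parameters, and that a one-pixel shift inside a pooling window leaves the pooled output unchanged.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_channels, out_channels, kernel, bias=True):
    """Learnable parameters of a 2D convolution with a square kernel."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2D list (height and width must be even)."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]


# Both options halve a 32x32 feature map to 16x16.
pooled_size = conv_output_size(32, kernel=2, stride=2)              # 2x2 max-pooling, stride 2
strided_size = conv_output_size(32, kernel=3, stride=2, padding=1)  # 3x3 convolution, stride 2

# Max-pooling learns nothing; the strided convolution adds weights and biases.
pool_param_count = 0
strided_param_count = conv_params(in_channels=64, out_channels=64, kernel=3)

# Shifting the strong activation by one pixel inside a pooling window
# leaves the pooled output unchanged (local translation invariance).
original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
shifted = [[0, 9, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
same_after_pooling = max_pool_2x2(original) == max_pool_2x2(shifted)

print(pooled_size, strided_size, strided_param_count, same_after_pooling)
```

The invariance shown here is only local: a shift that moves the activation across a pooling-window boundary does change the output.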

Examples of CNN architectures that follow the above structure are AlexNet and VGG. Most modern CNN architectures go beyond a plain combination of convolution, max-pooling, and fully connected layers. For example, ResNet and similar networks use residual connections.
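A residual connection adds a block's input back to its output, so the block computes `x + F(x)` instead of `F(x)` alone. The following is a minimal sketch in plain Python on a list of floats, not the repository's TensorFlow implementation; `residual_block`, `zero_transform`, and `halve` are hypothetical names, and the transforms stand in for the convolutional layers a real block would contain.

```python
def residual_block(x, transform):
    """Return x + transform(x), the skip connection used in residual networks."""
    fx = transform(x)
    if len(fx) != len(x):
        # The element-wise sum only works when shapes match.
        raise ValueError("transform must preserve the input size")
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_transform(x):
    """A transform that contributes nothing, so the block acts as the identity."""
    return [0.0 for _ in x]


def halve(x):
    """A stand-in for learned layers: scales every value by 0.5."""
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_transform)  # input passes through unchanged
residual_out = residual_block(x, halve)           # input plus the transformed input

print(identity_out, residual_out)
```

Because the input always passes through the skip path, a block whose transform outputs zeros behaves as the identity, which is one common intuition for why very deep residual networks remain trainable.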

As you go through the materials in this repository, you will learn more about those architectures and how they are implemented. For more about CNNs, check the Further Learning section!

ConvNet Architectures

On Choosing a ConvNet Architecture

The computer vision community is blessed with many vision architectures that work well across many platforms and hardware. But having many options means it is not easy to choose an architecture that suits a given problem. How can you choose a CNN architecture for your problem?

The first rule of thumb is that you should not try to design your own architecture from scratch. If you are working on a generic problem, it never hurts to start with ResNet-50. If you are building a mobile visual application with limited computational resources, try MobileNets (or other mobile-friendly architectures like ShuffleNetv2 or ESPNetv2).

For a better trade-off between accuracy and computational efficiency, try EfficientNetV2 or the more recent ConvNeXt!
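The rules of thumb above can be written down as a small lookup. This is only a sketch of the recommendations in this section, not a selection algorithm: the function name `suggest_starting_architectures` and its two flags are hypothetical, and the returned names are starting points to experiment with.

```python
def suggest_starting_architectures(mobile=False, prioritize_efficiency=False):
    """Map coarse project constraints to the starting points suggested above."""
    if mobile:
        # Limited computational resources: mobile-friendly architectures.
        return ["MobileNets", "ShuffleNetv2", "ESPNetv2"]
    if prioritize_efficiency:
        # Better trade-off between accuracy and computational efficiency.
        return ["EfficientNetV2", "ConvNeXt"]
    # Generic problem: a well-understood default.
    return ["ResNet-50"]


generic_choice = suggest_starting_architectures()
mobile_choice = suggest_starting_architectures(mobile=True)
efficient_choice = suggest_starting_architectures(prioritize_efficiency=True)

print(generic_choice, mobile_choice, efficient_choice)
```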

That said, choosing an architecture (or a learning algorithm) is a no-free-lunch scenario. There is no universal architecture, and no single architecture is guaranteed to work for all datasets and problems. It's all experimentation. It's all trying!

If you are a visionary or like to stay on the bleeding edge of the field, take a look at vision transformers! We don't know yet, but they might be the successors to CNNs!

Important Notes

The implementations of CNN architectures contained in this repository are not optimized for training. Their purpose is to show how those networks were designed, the principal components that make them up, and how they evolved over time. LeNet-5 (LeCun, 1998) had 5 layers with learnable weights. AlexNet (Krizhevsky, 2012) had 8. A few years later, Residual Networks (He, 2015) set the trend by showing that it is possible to train networks of over 100 layers. Today, residual networks are still among the most widely used architectures across a wide range of visual tasks, and they have influenced the design of language architectures. The computer vision research community is very vibrant. Understanding how architectures are designed is not a necessity, but it is one of the good ways to stay on top of this fast-changing field!

If you want to use ConvNets to solve a visual recognition task such as image classification or object detection, you can get up and running quickly by getting the models (and their pretrained weights) from tools like Keras, TensorFlow Hub, torchvision, timm (PyTorch Image Models), GluonCV, and OpenMMLab.

Reference Implementations

Further Learning

If you would like to learn more about CNNs, below are a few amazing resources:

Citation

If you find this repository helpful, you are welcome to cite it:

author: Jean de Dieu Nyandwi
title: Modern Convolutional Neural Networks Architectures
year: 2022
publisher: GitHub
url: https://github.com/Nyandwi/convnets-architectures

I had so much joy learning, revising, and implementing CNN architectures. I hope you will enjoy going through the materials in this repository as much as I did!

For any suggestion, comment, or simply anything, you can reach out through email, Twitter or LinkedIn.
