Convolutional Neural Networks (CNNs) are widely used for image classification. Recently, a range of techniques has been applied to predicting a person's age and gender from an image. Predicting whether a person is Male or Female is comparatively easy, but predicting age from a face alone is much harder. Looks can be deceiving: some people appear older than they are, and others appear younger. Prior work ranges from localizing facial features and using their sizes and ratios to applying constraints, such as aging, for age estimation. However, such constrained techniques do not generalize well to in-the-wild images of human faces. For this project, we explored the Adience dataset, which contains highly unconstrained and complex in-the-wild human faces.
Age and gender classification with CNNs was addressed by Tal Hassner et al. in their 2015 CVPR paper, Age and Gender Classification using Convolutional Neural Networks. This project implements their method with a few customizations as part of my research work.
You can run the project in either a virtualenv or a Docker container. We used a virtualenv.
The required libraries are listed below. We recommend installing them with pip, although conda can also be used.
torch>=1.5.0
torchvision>=0.6.0
opencv-python>=4.2.0.34
opencv-contrib-python>=4.2.0.34
numpy>=1.18.3
Pillow>=7.1.1
h5py>=2.10.0
matplotlib>=3.2.1
tqdm>=4.45.0
Use the requirements.txt file to install the dependencies. From inside your virtualenv, run the following command:
pip install -r requirements.txt
The dataset consists of ~19,000 aligned images. We used the aligned face images along with the 5 helper fold files: fold_0_data.txt, fold_1_data.txt, fold_2_data.txt, fold_3_data.txt and fold_4_data.txt. Since the data is not aggregated into a single package, we wrote a small Process script that reads the images with OpenCV, converts them from BGR to RGB (more about this in this post by Satya Mallick) and splits the dataset into train and test sets. We randomly shuffled the data, used 90% for training and 10% for testing, and packaged the training and testing data together with the gender and age labels into a single .h5 file. To download the h5 file, use this link.
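The shuffle-and-split step described above can be sketched as follows. This is a minimal illustration, not the actual Process script: the function name `train_test_split_indices` and the fixed seed are assumptions, and it works on sample indices only, leaving out image loading, the BGR to RGB conversion and the .h5 export.

```python
import random


def train_test_split_indices(num_samples, train_fraction=0.9, seed=0):
    """Shuffle sample indices with a fixed seed and split them into train/test."""
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    cut = round(num_samples * train_fraction)
    return indices[:cut], indices[cut:]


# Example: 90% of the indices go to training, the remaining 10% to testing.
train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

Splitting indices rather than the images themselves keeps the images and their age and gender labels aligned: the same index lists can be applied to every array before it is written out.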
Two gender categories, Male and Female, are used throughout the dataset, while the ages are chunked into smaller age groups. The Adience dataset has 8 age groups: (0-2), (4-6), (8-12), (15-20), (25-32), (38-43), (48-53), (60-100). As these ranges show, not every age is covered, and many samples carried labels outside these 8 groups. Since ~1200 images were mislabelled in this way, we introduced four more age categories: (21-24), (33-37), (44-47) and (54-59). Our Process script reassigns the correct labels and removes unwanted samples. We now have 12 classes instead of the original 8. While the paper published by Tal Hassner performed well with 8 classes, our model has some bias because of the imbalanced class distribution and the lack of samples for the newly added classes.
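To make the 12-class labelling concrete, here is a small sketch of how a numeric age could be mapped to a class index. It assumes the label has already been parsed into a single integer age; `AGE_GROUPS` and `age_to_class` are illustrative names, and the Process script may handle labels differently.

```python
# The 8 original Adience groups plus the 4 groups added in this project,
# in ascending order. Bounds are inclusive.
AGE_GROUPS = [
    (0, 2), (4, 6), (8, 12), (15, 20), (21, 24), (25, 32),
    (33, 37), (38, 43), (44, 47), (48, 53), (54, 59), (60, 100),
]


def age_to_class(age):
    """Return the index of the group containing `age`, or None if no group does."""
    for index, (low, high) in enumerate(AGE_GROUPS):
        if low <= age <= high:
            return index
    return None


# Ages that fall in a gap between groups (for example 3 or 13) map to None,
# so such samples can be dropped or handled separately.
print(age_to_class(22), age_to_class(13))
```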
(Download the aligned images and place them in the adience directory downloaded from the drive link above.) To run the Process script on the raw images and export an h5 file:
python process.py --path=data/adience --save=data/adience/adience.h5
Model: Nvidia V100-SXM2
GPU count: 0 – 4
GPU Memory: 32.4805 GB / GPU
Clock: 1.290 GHz with max boost of 1.530 GHz
Use
watch -n 0.1 nvidia-smi
to keep a check on GPU usage during training. For getting GPU specific information use,
nvidia-smi -q -d CLOCK
We maintain a config.yaml file that holds the training configuration. If you are using an NVIDIA GPU, list the device ids under the [GPU][DEVICES] tag. (Note: always provide a list, not a single integer. For a single GPU use [0]; to use more GPUs for data parallelism, add their ids to the DEVICES list.)
We have curated a common script to train both Age and Gender models. To train the model for Gender classification use -
python train.py --age-gender=gender
and to train on age use
python train.py --age-gender=age
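As a rough illustration of how one script can serve both tasks, the sketch below parses the `--age-gender` flag and looks up the number of output classes. It is not the actual train.py: the `NUM_CLASSES` table and `parse_args` helper are assumptions, with the class counts taken from this README (2 genders, 12 age groups).

```python
import argparse

# Number of output classes per task, as described in this README.
NUM_CLASSES = {"gender": 2, "age": 12}


def parse_args(argv=None):
    """Parse the task selector; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train an age or gender classifier")
    parser.add_argument(
        "--age-gender",
        choices=sorted(NUM_CLASSES),
        required=True,
        help="which label to train on",
    )
    return parser.parse_args(argv)


# argparse stores "--age-gender" under the attribute name "age_gender".
args = parse_args(["--age-gender=age"])
print(args.age_gender, NUM_CLASSES[args.age_gender])
```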
To get pretrained models for transfer learning, download the age.pt and gender.pt files, which were trained for 60 epochs. Note: Since these models were trained with the 4 added classes, they might have significant bias in them.
Model outputs are saved in the output folder. Each training run writes its output to a folder named <BATCH_SIZE>_output_<NUM_GPUS>, where <BATCH_SIZE> is the batch size of the run and <NUM_GPUS> is the number of GPUs used for training, for example 64_output_3.
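The naming scheme can be expressed in a couple of lines. This is only a sketch of the convention; `run_output_dir` is a hypothetical helper, not a function from the project.

```python
import os


def run_output_dir(batch_size, num_gpus, root="output"):
    """Build the per-run folder path, e.g. output/64_output_3."""
    return os.path.join(root, f"{batch_size}_output_{num_gpus}")


print(run_output_dir(64, 3))
```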
Every training run outputs a statistics file containing the runtime parameter dictionary and logs, along with accuracy and loss curves and the best model.
In our case there are 2 statistics files, one each for gender and age classification, 2 sets of accuracy and loss curves, and 2 models holding the best parameters for the corresponding runs.
NOTE: If you don't have a GPU, set the [GPU][STATUS] flag to False. However, our implementation checks for a GPU by default and automatically switches to the CPU if none is available.
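The device selection logic can be sketched as below, assuming the GPU section of config.yaml has been loaded into a dict with `STATUS` and `DEVICES` keys. `resolve_devices` is a hypothetical helper and not the project's actual code; it is written without a PyTorch dependency so the decision logic is easy to follow.

```python
def resolve_devices(gpu_cfg, cuda_available):
    """Return ("cpu", []) or ("cuda", device_ids) from a GPU config dict."""
    # Fall back to the CPU if the GPU is disabled in the config or not present.
    if not gpu_cfg.get("STATUS", False) or not cuda_available:
        return "cpu", []
    devices = gpu_cfg.get("DEVICES", [])
    # DEVICES must always be a list, even for a single GPU.
    if not isinstance(devices, list) or not devices:
        raise ValueError("GPU.DEVICES must be a non-empty list, e.g. [0]")
    return "cuda", devices


print(resolve_devices({"STATUS": True, "DEVICES": [0, 1, 2]}, cuda_available=True))
print(resolve_devices({"STATUS": True, "DEVICES": [0]}, cuda_available=False))
```

In a PyTorch script, `cuda_available` would typically come from `torch.cuda.is_available()`, and a list with more than one id could be passed to `torch.nn.DataParallel(model, device_ids=devices)`.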
Model | Batch Size | # GPUs | # Epochs | Train Acc | Val Acc | Test Acc | Train Loss | Valid Loss | Test Loss |
---|---|---|---|---|---|---|---|---|---|
Age | 64 | 3 | 60 | 0.976946 | 0.702474 | 0.630011 | 0.067266 | 1.616894 | 1.208523 |
Gender | 64 | 3 | 60 | 0.999731 | 0.934218 | 0.886597 | 0.001065 | 0.367484 | 0.886597 |
Loss and Accuracy curves for both models are present in the 64_output_3 directory in the output folder.
Note: More experimentation is required; we cannot yet conclude that the above results are state-of-the-art.
To generate sample images, use
python sample.py --input=../images/woman.jpg --output=woman.png
We get really good predictions for both Age and Gender, as can be seen in the images below
Here is another case: pictures of me and my brother, from our younger days to when we were older
However, it's not always perfect. Here is a case where it gave pretty weird results
In case there are no predictions, we get something like this
If you are interested in running this in real time, run the above command without any arguments
python sample.py
With the above command, you get real-time inference on the input stream from the camera. The real-time inference shown above was done using pretrained Caffe models for age and gender. I am still working on getting the PyTorch models production ready!
This work was heavily inspired by and derived from Adrian Rosebrock's and Satya Mallick's articles on age and gender classification of human images. Thanks also to Professor Dr. Subrata Das for giving us this awesome project to experiment with and research as part of our coursework.