This work performs unsupervised domain adaptation for gaze estimation, from MPIIFaceGaze (source domain) to ColumbiaGaze (target domain).
If images come from different distributions, the feature extractor maps them to different clusters in feature space.
A conditional GAN is used to pull the clusters together.
The feature extractor of the original model acts as the generator G(x), where x is the input image.
An external MLP acts as the discriminator D, which classifies whether an extracted feature G(x) comes from the source domain or the target domain; the domain label is the one-hot encoding Y(x).
In each epoch, the discriminator is optimized first; its objective is to minimize ||D(G(x))-Y(x)|| for x in both domains.
Then the generator is optimized to confuse the discriminator; its objective is to minimize ||D(G(x))-Y'|| for x in the target domain, where Y' is the one-hot code of the source domain.
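Written out, the two objectives might look as follows. The symbols \theta_D and \theta_G (discriminator and feature-extractor parameters) and S, T (source and target image sets) are names introduced here for illustration, and summing over samples is an assumption; a mean would serve equally well.

```latex
\begin{aligned}
\text{discriminator step:}\quad & \min_{\theta_D} \sum_{x \in S \cup T} \bigl\lVert D(G(x)) - Y(x) \bigr\rVert \\
\text{generator step:}\quad     & \min_{\theta_G} \sum_{x \in T} \bigl\lVert D(G(x)) - Y' \bigr\rVert
\end{aligned}
```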
In this way, images from the target domain are mapped to a cluster closer to the source domain's cluster in feature space.
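The alternating schedule can be illustrated with a dependency-free toy. This is only a sketch under strong assumptions, not the project's implementation: the feature extractor is replaced by a 1-D affine map, the MLP by a single logistic unit, the norm is taken to be Euclidean, gradients come from finite differences, and all names (`adapt`, `step`, `d_loss`, `g_loss`) and the sample values are hypothetical.

```python
import math

SOURCE, TARGET = (1.0, 0.0), (0.0, 1.0)  # one-hot domain codes Y(x)

def G(x, g):
    """Toy stand-in for the feature extractor: 1-D affine map, g = [w, b]."""
    return g[0] * x + g[1]

def D(f, d):
    """Toy stand-in for the MLP: logistic unit returning (p_source, p_target)."""
    p = 1.0 / (1.0 + math.exp(-(d[0] * f + d[1])))
    return (p, 1.0 - p)

def dist(u, v):
    """Euclidean norm ||u - v|| (the choice of norm is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def d_loss(d, g, xs_src, xs_tgt):
    """Mean ||D(G(x)) - Y(x)|| over samples from both domains."""
    terms = [dist(D(G(x, g), d), SOURCE) for x in xs_src]
    terms += [dist(D(G(x, g), d), TARGET) for x in xs_tgt]
    return sum(terms) / len(terms)

def g_loss(g, d, xs_tgt):
    """Mean ||D(G(x)) - Y'|| over target samples, with Y' the source code."""
    return sum(dist(D(G(x, g), d), SOURCE) for x in xs_tgt) / len(xs_tgt)

def step(loss, params, lr=0.1, eps=1e-5):
    """One gradient-descent step using central finite differences."""
    grad = []
    for i in range(len(params)):
        hi = params[:i] + [params[i] + eps] + params[i + 1:]
        lo = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss(hi) - loss(lo)) / (2 * eps))
    return [p - lr * gr for p, gr in zip(params, grad)]

def adapt(xs_src, xs_tgt, epochs=50):
    """Alternate one discriminator update and one generator update per epoch."""
    g, d = [1.0, 0.0], [0.0, 0.0]
    for _ in range(epochs):
        d = step(lambda p: d_loss(p, g, xs_src, xs_tgt), d)  # discriminator first
        g = step(lambda p: g_loss(p, d, xs_tgt), g)          # then generator
    return g, d

# Made-up 1-D "images": two clusters standing in for the two datasets.
xs_src = [-0.2, 0.0, 0.1, 0.3]
xs_tgt = [2.8, 3.0, 3.1, 3.3]
g, d = adapt(xs_src, xs_tgt)
```

In a real setup the finite-difference `step` would be replaced by backpropagation with separate optimizers for the discriminator and the feature extractor; it is used here only to keep the sketch self-contained.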
The feature extractor parameters are then frozen, and the classifier is trained on the source domain.
Since the adapted feature extractor maps both domains to nearby features, training on the source domain can improve performance on the target domain.
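The final stage can be sketched in the same toy setting. The assumptions are again loud ones: the frozen extractor is a fixed affine map, the head is a least-squares linear map standing in for the classifier (whose actual form is not specified here), and `train_head`, `mse`, and the labelled source samples are hypothetical.

```python
def features(x, g):
    """Frozen toy feature extractor; g = (w, b) is read but never updated."""
    return g[0] * x + g[1]

def mse(h, xs, ys, g):
    """Mean squared error of the linear head h = [w, b] on frozen features."""
    return sum((h[0] * features(x, g) + h[1] - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def train_head(xs, ys, g, lr=0.05, epochs=200):
    """Fit only the head by gradient descent; the extractor stays fixed."""
    h = [0.0, 0.0]
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            f = features(x, g)
            err = h[0] * f + h[1] - y
            grad_w += 2 * err * f / n
            grad_b += 2 * err / n
        h = [h[0] - lr * grad_w, h[1] - lr * grad_b]
    return h

# Made-up labelled source samples; a tuple keeps the extractor immutable.
g_frozen = (1.0, 0.0)
xs_source = [0.0, 1.0, 2.0, 3.0]
ys_source = [1.0, 3.0, 5.0, 7.0]
head = train_head(xs_source, ys_source, g_frozen)
```

Only `head` changes during this stage; target images never need labels, because they pass through the same frozen extractor at test time.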