FaceNet-distributed-training

Distributed training with Kubeflow pipeline

This example demonstrates how to use Kubeflow end to end to run distributed training with TensorFlow's MultiWorkerMirroredStrategy on a local Kubernetes cluster, and how to compute image similarity with the FaceNet algorithm and OpenCV face recognition.

Goals

This tutorial covers the following:

  • Use TensorFlow MultiWorkerMirroredStrategy to demonstrate distributed training.
  • Use the FaceNet algorithm and OpenCV to compute image similarity.

By the end of this tutorial, you will know how to:

  • Create a new cluster and deploy Kubeflow on Kubernetes.
  • Install and set up NFS (Network File System) to mount your volume.
  • Deploy a StorageClass and a PVC (PersistentVolumeClaim) to enable dynamic volume binding.
  • Use a Kubeflow pipeline and TensorFlow (CPU) to train a distributed model on the cluster.
  • Use a Flask server and a web UI to serve the trained model.
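For the StorageClass and PVC step, a minimal sketch of a PersistentVolumeClaim that relies on dynamic provisioning might look like the following. It builds the manifest in Python so it can be generated from a script. The claim name `facenet-pvc`, the StorageClass name `nfs-storage`, the `10Gi` size, and the `ReadWriteMany` access mode are assumptions for illustration, not values taken from this project.

```python
import json


def make_pvc_manifest(name: str, storage_class: str, size: str = "10Gi") -> dict:
    """Build a PersistentVolumeClaim that requests a volume from a StorageClass.

    With dynamic provisioning, the StorageClass's provisioner (assumed here to be
    an NFS provisioner) creates a matching PersistentVolume and binds it to the claim.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets several training workers mount the same NFS-backed volume.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names: replace them with the StorageClass actually deployed on your cluster.
    manifest = make_pvc_manifest("facenet-pvc", "nfs-storage")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply` accepts JSON as well as YAML, the printed manifest can be piped straight in, for example `python make_pvc.py | kubectl apply -f -` (the script name is hypothetical).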

FaceNet and Model Training

In this tutorial, an algorithm called FaceNet is used to implement face recognition on Kubernetes (K8s). FaceNet learns to map each face image to a feature vector (embedding) in a Euclidean space, so that the Euclidean distance between two embeddings measures the similarity between the two faces: the smaller the distance, the more similar the faces. The model is optimized with the triplet loss function.
Before training, triplets are selected from the dataset, as shown in Figure 1. Each triplet consists of an anchor, a positive (another image of the same person as the anchor), and a negative (an image of a different person). Because the model is trained in Euclidean space, the distance between two points directly corresponds to the similarity between the two face images. As shown in Figure 2, training reduces the distance between the anchor and the positive and increases the distance between the anchor and the negative.
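The triplet selection and the triplet loss described above can be sketched in plain Python as follows. This is a minimal illustration, not this project's training code: it samples triplets at random (FaceNet itself mines hard triplets), uses the triplet loss form max(‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α, 0) on squared Euclidean distances, and the margin α = 0.2 and the `images_by_person` layout are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sample_triplet(images_by_person: Dict[str, List[str]],
                   rng: random.Random) -> Tuple[str, str, str]:
    """Randomly pick (anchor, positive, negative) image names.

    Assumes at least one person has two or more images and at least two people exist.
    """
    people_with_pairs = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    person = rng.choice(people_with_pairs)
    # Anchor and positive: two different images of the same person.
    anchor, positive = rng.sample(images_by_person[person], 2)
    # Negative: any image of a different person.
    other = rng.choice([p for p in images_by_person if p != person])
    negative = rng.choice(images_by_person[other])
    return anchor, positive, negative


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    data = {"alice": ["alice_1.jpg", "alice_2.jpg"], "bob": ["bob_1.jpg"]}
    print(sample_triplet(data, rng))
    # Easy triplet: the negative is already far away, so the loss is zero.
    print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))
    # Hard triplet: the negative is close to the anchor, so the loss is positive.
    print(triplet_loss([1.0, 0.0], [0.9, 0.1], [0.8, 0.2]))
```

Minimizing this loss over many triplets is what pulls anchor and positive together and pushes the negative away, which is the effect illustrated in Figure 2.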

Figure 1. Example of a triplet.


Figure 2. Distances before and after training.

Test Results

After training, the model is deployed with a Flask server and a web interface. As shown in Figure 3, the user can select a face image from the test images for similarity calculation. After the user clicks the "predict" button, the input image is compared against the images in our image database: the similarity between the input and each database image is calculated, and the three most similar images are ranked. Each ranked result contains the distance between the two images and the name of the recognized person. You can use the images in the test-image folder as input.
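The ranking step can be sketched as a nearest-neighbour search over embeddings. This sketch assumes that every database image has already been turned into an embedding by the trained FaceNet model and stored as a `(person_name, embedding)` pair; face detection with OpenCV and model inference are not shown, and the function names here are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k_matches(query: Sequence[float],
                  database: List[Tuple[str, Sequence[float]]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (name, distance) pairs closest to the query embedding."""
    scored = [(name, euclidean(query, embedding)) for name, embedding in database]
    # nsmallest returns results sorted by ascending distance.
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 2-D embeddings for illustration; real FaceNet embeddings have many more dimensions.
    db = [("alice", [0.0, 0.0]), ("bob", [3.0, 4.0]),
          ("carol", [1.0, 0.0]), ("dave", [0.0, 2.0])]
    for rank, (name, dist) in enumerate(top_k_matches([0.0, 0.0], db), start=1):
        print(f"rank {rank}: {name} (distance {dist:.2f})")
```

A database may hold several images per person, which is why it is a list of pairs rather than a dictionary keyed by name.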

Figure 3. Example test result.

Steps:

  1. Create a new cluster and deploy Kubeflow on a local Kubernetes cluster

  2. Install and set up NFS

  3. Set up a StorageClass and PVC

  4. Distributed training on Kubeflow pipeline
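MultiWorkerMirroredStrategy discovers the other workers through the `TF_CONFIG` environment variable, a JSON document that lists the cluster and identifies the current task. The sketch below only shows what that variable contains for each worker; the worker addresses are hypothetical, and if the workers are launched through Kubeflow's TFJob operator, `TF_CONFIG` is set for you.

```python
import json
from typing import List


def build_tf_config(worker_addresses: List[str], index: int) -> str:
    """Return the TF_CONFIG JSON for the worker at position `index`.

    Every worker receives the same cluster list but a different task index.
    """
    if not 0 <= index < len(worker_addresses):
        raise ValueError("index must refer to one of the listed workers")
    config = {
        "cluster": {"worker": worker_addresses},
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical host:port pairs, e.g. one Kubernetes service per worker pod.
    workers = ["facenet-worker-0:2222", "facenet-worker-1:2222"]
    for i in range(len(workers)):
        print(f"worker {i}: TF_CONFIG={build_tf_config(workers, i)}")
    # Inside each worker, the variable must be set before the strategy is created:
    #   os.environ["TF_CONFIG"] = build_tf_config(workers, i)
    #   strategy = tf.distribute.MultiWorkerMirroredStrategy()
```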