JavaScript/WebGL lightweight and robust face tracking library designed for augmented reality face filters
This JavaScript library detects and tracks the face in real time from the webcam video feed captured with WebRTC. It is then possible to overlay 3D content for augmented reality applications. We provide various demonstrations using the main WebGL 3D engines. This repository includes the release versions of the 3D engines so that the demos work with a fixed version (they are in /libs/<name of the engine>/).
This library is lightweight and does not include any 3D engine or third party library. We want to keep it framework agnostic, so the outputs of the library are raw: whether a face is detected or not, the position and the scale of the detected face, and the rotation Euler angles. Thanks to the featured helpers, examples and boilerplates, you can quickly work in a higher level context (motion head tracking, face filter, face replacement...). We continuously add new demonstrations, so stay tuned! Also, feel free to open an issue if you have any question or suggestion.
- Features
- Architecture
- Demonstrations
- Specifications
- Hosting
- About the tech
- Articles and tutorials
- License
- See also
- References
Here are the main features of the library:
- face detection,
- face tracking,
- face rotation detection,
- mouth opening detection,
- multiple faces detection and tracking,
- very robust for all lighting conditions,
- video acquisition with HD video ability,
- mobile friendly,
- interfaced with 3D engines like THREE.JS, BABYLON.JS, A-FRAME,
- interfaced with more accessible APIs like CANVAS, CSS3D.
- /demos/: source code of the demonstrations, sorted by 2D/3D engine used,
- /dist/: heart of the library:
  - jeelizFaceFilter.js: main minified script,
  - jeelizFaceFilterES6.js: main minified script for ES6 use (with import or require),
  - NNC.json: file storing the neural network parameters, loaded by the main script,
  - NNC<xxx>.json: alternative neural network models,
- /helpers/: scripts which can help you to use this library in some specific use cases,
- /libs/: 3rd party libraries and 3D engines used in the demos.
These demonstrations are included in this repository, so they are released under the FaceFilter license. You will probably find among them the perfect starting point to build your own face based augmented reality application:
- BABYLON.JS based demos:
  - Boilerplate (displays a cube on the user's head): live demo, source code
- THREE.JS based demos - specific README about THREE.js based demo problems:
  - Boilerplates:
    - Boilerplate (displays a cube on the user's head): live demo, source code
    - Same boilerplate but using dist/NNClight.json as neural net: live demo, source code
    - Same boilerplate but using dist/NNCwideAngles.json as neural net: live demo, source code
    - Boilerplate with ES6 instead of ES5: live demo, source code
    - Multiple face tracking: live demo, source code
    - GLTF fullscreen demo with HD video: live demo, source code
    - Boilerplate with 2 canvas: 1 for FaceFilter and 1 for THREE.JS (not recommended): live demo, source code
  - AR 3D demos:
    - Daft Punk (put on the iconic helmet): live demo
    - Star Wars: Darth Vader: live demo
    - Harry Potter (say "Lumos!"): live demo
    - Halloween Spiders (you've got a spider in your mouth): live demo, source code
    - Werewolf (turn yourself into a werewolf): live demo, source code
    - Angel/Demon (discover which of the angel or the demon will win in this animated scene): live demo, source code
    - Anonymous mask and video effect: live demo, source code
    - Rupy Motorcycle Helmet VTO: live demo, source code
    - Dog: live demo, source code
    - Butterflies animation: live demo, source code
    - Clouds above the head: live demo, source code
    - Casa-de-Papel mask: live demo, source code
    - Miel Pops glasses and bees: live demo, source code
    - Football makeup: live demo, source code
    - Tiger face filter with mouth opening detection (strong WTF effect): live demo, source code
    - Fireworks - particles: live demo, source code
  - Face painting or deformation:
    - Face deformation: live demo, source code
    - Face cel shading: live demo, source code
  - Demos linked with tutorials:
    - Luffy's Hat: live demo, source code part 1, tutorial part 1, source code part 2, tutorial part 2
    - Statue Of Liberty: live demo, source code, interactive tutorial
    - Matrix: live demo, source code, tutorial in French, tutorial in English
  - Misc:
    - Head controlled navigation: live demo, source code
    - Glasses virtual try-on: live demo, source code
- A-FRAME based demos:
  - Boilerplate (displays a cube on the user's head): live demo, source code
- CSS3D based demos:
  - Boilerplate (displays a <DIV> element on the user's head): live demo, source code
  - Comedy glasses demo: live demo, source code
- Canvas2D based demos:
  - Draw on the face with the mouse: live demo, source code
  - 2D face detection and tracking - only 30 lines of code!: live demo, source code, JSfiddle
  - 2D face detection and tracking from a video file instead of webcam video: live demo, source code
- CESIUM.JS based demos:
  - 3D view of the Earth with head controlled navigation: live demo, source code, article about the demo
- Face replacement demos:
  - Insert your face into portrait art painting or film posters: live demo, source code
  - Insert your face into an animated gif: live demo, specific README, source code
  - The traditional faceSwap, fullscreen and with color correction: live demo, source code
- Head motion control:
  - PACMAN game with head controlled navigation: live demo, source code
  - Head controlled mouse cursor: live demo, source code
If you have not bought a webcam yet, a screenshot video of some of these examples is available on Youtube. You can also subscribe to the Jeeliz Youtube channel or to the @StartupJeeliz Twitter account to be kept informed of our cutting edge developments.
These amazing applications rely on this library for face detection and tracking:
- Demos made by Movable Ink:
- Demos made by Sansho:
If you have developed an application or a fun demo using this library, we would love to see it and add a link here! Just contact us on Twitter @StartupJeeliz or LinkedIn.
Here we describe how to use this library. Although we plan to add new features, we will keep it backward compatible.
On your HTML page, you first need to include the main script between the tags <head> and </head>:
<script type="text/javascript" src="dist/jeelizFaceFilter.js"></script>
Then you should include a <canvas> HTML element in the DOM, between the tags <body> and </body>. The width and height properties of the <canvas> element should be set. They define the resolution of the canvas, and the final rendering will be computed using this resolution. Be careful not to enlarge the canvas too much with its CSS properties without increasing its resolution, otherwise it may look blurry or pixelated. We advise setting the resolution to the actual canvas size. Do not forget to call JEEFACEFILTERAPI.resize() if you resize the canvas after the initialization step. We strongly encourage you to use our helper /helpers/JeelizResizer.js to set the width and height of the canvas (see Optimization/Canvas and video resolutions section).
<canvas width="600" height="600" id='jeeFaceFilterCanvas'></canvas>
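As a minimal sketch of the advice above, the helper below derives the width and height to assign to the canvas from its displayed size. The name computeCanvasResolution and the optional pixel ratio argument are assumptions made for this illustration, they are not part of the library. The JeelizResizer helper remains the recommended way to do this.

```javascript
// Hypothetical helper (not part of the library): compute the resolution
// to assign to canvas.width / canvas.height from the displayed CSS size.
// pixelRatio is optional and defaults to 1 (resolution = displayed size).
function computeCanvasResolution(cssWidth, cssHeight, pixelRatio) {
  const ratio = (typeof pixelRatio === 'number' && pixelRatio > 0) ? pixelRatio : 1;
  return {
    width: Math.round(cssWidth * ratio),
    height: Math.round(cssHeight * ratio)
  };
}

// Browser-only usage, skipped when there is no DOM
if (typeof document !== 'undefined') {
  const canvas = document.getElementById('jeeFaceFilterCanvas');
  if (canvas) {
    const rect = canvas.getBoundingClientRect();
    const resolution = computeCanvasResolution(rect.width, rect.height);
    canvas.width = resolution.width;
    canvas.height = resolution.height;
    // If the API is already initialized, JEEFACEFILTERAPI.resize()
    // should be called after this point.
  }
}
```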
This canvas will be used by WebGL both for the computation and the 3D rendering. When your page is loaded you should launch this function:
JEEFACEFILTERAPI.init({
  canvasId: 'jeeFaceFilterCanvas',
  NNCpath: '../../../dist/', //path to JSON neural network model (NNC.json by default)
  callbackReady: function(errCode, spec){
    if (errCode){
      console.log('AN ERROR HAPPENS. ERROR CODE =', errCode);
      return;
    }
    //[init scene with spec...]
    console.log('INFO: JEEFACEFILTERAPI IS READY');
  }, //end callbackReady()

  //called at each render iteration (drawing loop)
  callbackTrack: function(detectState){
    //render your scene here
    //[... do something with detectState]
  } //end callbackTrack()
});//end init call
The init method also accepts these optional arguments:
- <boolean> followZRot: Allow full rotation around the depth axis. Default value: false. See Issue 42 for more details,
- <integer> maxFacesDetected: Only for multiple face detection - maximum number of faces which can be detected and tracked. Should be between 1 (no multiple detection) and 8,
- <integer> animateDelay: Used only in normal rendering mode (not in slow rendering mode). It sets the number of milliseconds during which the browser waits at the end of the rendering loop before starting another detection. If you use the canvas of this API as a secondary element (for example in the PACMAN or EARTH NAVIGATION demos) you should set a small animateDelay value (for example 2 milliseconds) in order to avoid rendering lags,
- <function> onWebcamAsk: Function launched just before asking the user to allow webcam sharing,
- <function> onWebcamGet: Function launched just after the user has accepted to share the video. It is called with the video element as argument,
- <dict> videoSettings: override WebRTC specified video settings, which are by default:
{
  //'videoElement': not set by default. <video> element used.
  //  If you specify this parameter, all other settings will be useless:
  //  it means that you fully handle the video aspect
  //'deviceId': not set by default
  'facingMode': 'user', //to use the rear camera, set to 'environment'
  'idealWidth': 800, //ideal video width in pixels
  'idealHeight': 600, //ideal video height in pixels
  'minWidth': 480, //min video width in pixels
  'maxWidth': 1280, //max video width in pixels
  'minHeight': 480, //min video height in pixels
  'maxHeight': 1280, //max video height in pixels
  'rotate': 0 //rotation in degrees. Possible values: 0, 90, -90, 180
},
- <dict> scanSettings: override face scan settings - see set_scanSettings(...) method for more information,
- <dict> stabilizationSettings: override tracking stabilization settings - see set_stabilizationSettings(...) method for more information.
If the user has a mobile device in portrait display mode, the width and height of these parameters are automatically inverted for the first camera request. If that request does not succeed, the width and height are inverted again.
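To illustrate how such settings relate to a camera request, here is a hedged sketch that translates a videoSettings dictionary into the constraints object accepted by navigator.mediaDevices.getUserMedia(). The function toMediaConstraints and its isPortrait flag are hypothetical. How the library builds its own constraints internally is not documented here, and the rotate and videoElement settings are ignored by this sketch.

```javascript
// Hypothetical helper (not part of the library): build getUserMedia
// constraints from a videoSettings dictionary, using the default values
// listed above when a setting is not provided.
function toMediaConstraints(videoSettings, isPortrait) {
  const s = Object.assign({
    facingMode: 'user',
    idealWidth: 800, idealHeight: 600,
    minWidth: 480, maxWidth: 1280,
    minHeight: 480, maxHeight: 1280
  }, videoSettings);

  let width = { min: s.minWidth, ideal: s.idealWidth, max: s.maxWidth };
  let height = { min: s.minHeight, ideal: s.idealHeight, max: s.maxHeight };

  // Portrait display: invert the width and height constraints
  if (isPortrait) {
    const swap = width;
    width = height;
    height = swap;
  }

  const video = { width: width, height: height };
  if (s.deviceId) {
    // A specific camera was requested, facingMode is then not needed
    video.deviceId = { exact: s.deviceId };
  } else {
    video.facingMode = s.facingMode;
  }
  return { audio: false, video: video };
}
```

In a browser, the result could be passed to navigator.mediaDevices.getUserMedia(...), for example with toMediaConstraints({ idealWidth: 1280, idealHeight: 720 }, false).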
The initialization function (callbackReady in the code snippet) will be called with an error code (errCode). It can have these values:
- false: no error occurs,
- "GL_INCOMPATIBLE": WebGL is not available, or this WebGL configuration is not enough (there is no WebGL2, or there is WebGL1 without OES_TEXTURE_FLOAT or OES_TEXTURE_HALF_FLOAT extension),
- "ALREADY_INITIALIZED": the API has been already initialized,
- "NO_CANVASID": no canvas or canvas ID was specified,
- "INVALID_CANVASID": cannot find the <canvas> element in the DOM,
- "INVALID_CANVASDIMENSIONS": the dimensions width and height of the canvas are not specified,
- "WEBCAM_UNAVAILABLE": cannot get access to the webcam (the user has no webcam, or has not accepted to share the device, or the webcam is already busy),
- "GLCONTEXT_LOST": the WebGL context was lost. If the context is lost after the initialization, the callbackReady function will be launched a second time with this value as error code,
- "MAXFACES_TOOHIGH": the maximum number of detected and tracked faces, specified by the optional init argument maxFacesDetected, is too high.
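The error codes above can be mapped to messages shown to the end user. The sketch below is one possible way to do it. The function name describeInitError and the message texts are illustrative assumptions, only the error codes come from this documentation.

```javascript
// Illustrative messages for the error codes listed above
const INIT_ERROR_MESSAGES = {
  GL_INCOMPATIBLE: 'WebGL is not available or not sufficient on this device.',
  ALREADY_INITIALIZED: 'The API has already been initialized.',
  NO_CANVASID: 'No canvas or canvas ID was specified.',
  INVALID_CANVASID: 'The canvas element was not found in the DOM.',
  INVALID_CANVASDIMENSIONS: 'The width and height of the canvas are not set.',
  WEBCAM_UNAVAILABLE: 'Cannot access the webcam (missing, refused or busy).',
  GLCONTEXT_LOST: 'The WebGL context was lost.',
  MAXFACES_TOOHIGH: 'The maxFacesDetected value is too high.'
};

// Hypothetical helper: returns null when there is no error (errCode is false),
// otherwise a message suitable for display.
function describeInitError(errCode) {
  if (!errCode) {
    return null;
  }
  if (Object.prototype.hasOwnProperty.call(INIT_ERROR_MESSAGES, errCode)) {
    return INIT_ERROR_MESSAGES[errCode];
  }
  return 'Unknown error code: ' + errCode;
}
```

It could be called at the top of callbackReady, before initializing the scene.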
We detail here the arguments of the callback functions like callbackReady or callbackTrack. The references to these objects do not change, for memory optimization purposes. So you should copy their property values if you want to keep them unchanged outside the scope of the callback functions.
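Because the same object is reused between calls, keeping a reference to a callback argument is not enough to keep its values. A minimal sketch of a copy helper follows. The name snapshotState is hypothetical, and the helper only copies one level of properties.

```javascript
// Hypothetical helper: copy the properties of a callback argument so the
// values survive after the callback returns. Typed arrays and plain arrays
// are cloned, other values are copied as they are.
function snapshotState(state) {
  const copy = {};
  Object.keys(state).forEach(function(key) {
    const value = state[key];
    const isArrayLike = Array.isArray(value)
      || (ArrayBuffer.isView(value) && typeof value.slice === 'function');
    copy[key] = isArrayLike ? value.slice() : value;
  });
  return copy;
}
```

For example, calling snapshotState(detectState) inside callbackTrack gives an object which is not modified by the next render iteration.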
The initialization callback function (callbackReady in the code snippet) is called with a second argument, spec, if there is no error. spec is a dictionary having these properties:
- <WebGLRenderingContext> GL: the WebGL context. The rendering 3D engine should use this WebGL context,
- <canvas> canvasElement: the <canvas> element,
- <WebGLTexture> videoTexture: a WebGL texture displaying the webcam video. It matches the dimensions of the canvas. It can be used as a background,
- <int> maxFacesDetected: the maximum number of detected faces.
At each render iteration a callback function is executed (callbackTrack in the code snippet). It has one argument (detectState) which is a dictionary with these properties:
- <float> detected: the face detection probability, between 0 and 1,
- <float> x, <float> y: the 2D coordinates of the center of the detection frame in the viewport (each between -1 and 1, x from left to right and y from bottom to top),
- <float> s: the scale along the horizontal axis of the detection frame, between 0 and 1 (1 for the full width). The detection frame is always square,
- <float> rx, <float> ry, <float> rz: the Euler angles of the head rotation in radians,
- <Float32Array> expressions: array listing the facial expression coefficients:
  - expressions[0]: mouth opening coefficient (0 → mouth closed, 1 → mouth fully opened)
In multiface detection mode, detectState is an array. Its size is equal to the maximum number of detected faces and each element of this array has the format described just before.
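The coordinates above are expressed in the viewport, not in pixels. As a sketch, the function below converts a detection state into a square in canvas pixel coordinates (origin at the top left, y pointing down). The function name is hypothetical, and taking s = 1 to mean a frame as wide as the canvas is our reading of the description above, not a confirmed detail of the library.

```javascript
// Hypothetical helper: convert the viewport coordinates of a detection
// state (x, y in [-1, 1], s in [0, 1]) to a square in canvas pixels.
function detectStateToPixels(detectState, canvasWidth, canvasHeight) {
  // Assumption: s = 1 corresponds to a frame as wide as the canvas
  const size = detectState.s * canvasWidth;
  const centerX = (detectState.x + 1) * 0.5 * canvasWidth;
  // y goes from bottom to top in the viewport, so it is flipped here
  const centerY = (1 - detectState.y) * 0.5 * canvasHeight;
  return {
    centerX: centerX,
    centerY: centerY,
    size: size,
    left: centerX - size / 2,
    top: centerY - size / 2
  };
}
```

With a Canvas2D context ctx, the frame could then be drawn using ctx.strokeRect(box.left, box.top, box.size, box.size).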
After the initialization (i.e. after callbackReady is launched), these methods are available:
- JEEFACEFILTERAPI.resize(): should be called after resizing the <canvas> element to adapt the cut of the video,
- JEEFACEFILTERAPI.toggle_pause(<boolean> isPause): pause/resume,
- JEEFACEFILTERAPI.toggle_slow(<boolean> isSlow): toggle the slow rendering mode. Because this API consumes a lot of GPU resources, it may slow down other elements of the application. If the user opens a CSS menu for example, the CSS transitions and the DOM update can be slow. With this function you can slow down the rendering in order to relieve the GPU. The tracking and the 3D rendering will also be slower, but this is not a problem if the user is focusing on other elements of the application. We encourage you to enable the slow mode as soon as the user's attention is focused on a different part of the canvas,
- JEEFACEFILTERAPI.set_animateDelay(<integer> delay): change the animateDelay (see init() arguments),
- JEEFACEFILTERAPI.set_inputTexture(<WebGLTexture> tex, <integer> width, <integer> height): replace the video input with a WebGL Texture instance. The dimensions of the texture, in pixels, should be provided,
- JEEFACEFILTERAPI.reset_inputTexture(): come back to the user's video as input texture,
- JEEFACEFILTERAPI.get_videoDevices(<function> callback): should be called before the init method. 2 arguments are provided to the callback function:
  - <array> mediaDevices: an array with all the devices found. Each device is a JavaScript object having a deviceId string attribute. This value can be provided to the init method to use a specific webcam. If an error happens, this value is set to false,
  - <string> errorLabel: if an error happens, the label of the error. It can be: NOTSUPPORTED, NODEVICESFOUND or PROMISEREJECTED.
- JEEFACEFILTERAPI.set_scanSettings(<object> scanSettings): override scan settings. scanSettings is a dictionary with the following properties:
  - <float> minScale: min width of the face search window, relative to the width of the video. Default value: 0.15,
  - <float> maxScale: max width of the face search window, relative to the width of the video. Default value: 0.6,
  - <float> borderWidth: size of the left and right margins, relative to the width of the window. Default value: 0.2,
  - <float> borderHeight: size of the top and bottom margins, relative to the height of the window. Default value: 0.2,
  - <int> nStepsX: number of detection steps for each scan line. Default: 6,
  - <int> nStepsY: number of scan lines. Default: 5,
  - <int> nStepsScale: number of detection steps for the scale. Default: 3,
  - <int> nDetectsPerLoop: number of detections per drawing loop. -1 for an adaptive value. Default: -1
- JEEFACEFILTERAPI.set_stabilizationSettings(<object> stabilizationSettings): override detection stabilization settings. The output of the neural network is always noisy, so we need to stabilize it using a floating average to avoid shaking artifacts. The internal algorithm first computes a stabilization factor k between 0 and 1. If k==0.0, the detection is bad and we favor responsiveness over stabilization. It happens when the user is moving quickly, rotating the head or when the detection is bad. On the contrary, if k is close to 1, the detection is good and the user does not move a lot so we can stabilize a lot. stabilizationSettings is a dictionary with the following properties:
  - [<float> minValue, <float> maxValue] translationFactorRange: multiply k by a factor kTranslation depending on the translation speed of the head (relative to the viewport). kTranslation=0 if translationSpeed<minValue and kTranslation=1 if translationSpeed>maxValue. The regression is linear. Default value: [0.0015, 0.005],
  - [<float> minValue, <float> maxValue] rotationFactorRange: analogous to translationFactorRange but for rotation speed. Default value: [0.003, 0.02],
  - [<float> minValue, <float> maxValue] qualityFactorRange: analogous to translationFactorRange but for the head detection coefficient. Default value: [0.9, 0.98],
  - [<float> minValue, <float> maxValue] alphaRange: it specifies how to apply k. Between 2 successive detections, we blend the previous detectState values with the current detection values using a mixing factor alpha. alpha=<minValue> if k<0.0 and alpha=<maxValue> if k>1.0. Between the 2 values, the variation is quadratic.
- JEEFACEFILTERAPI.update_videoElement(<video> vid, <function|False> callback): replace the video element used for the face detection (which can be provided via videoSettings.videoElement) with another video element. A callback function can be called when it is done.
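To make the range parameters of set_stabilizationSettings more concrete, here is a small sketch of the two kinds of mapping described for them: a linear factor clamped between 0 and 1, and a quadratic variation of alpha between its two bounds. The function names are hypothetical, and using k squared as the quadratic curve is an assumption, since the exact formula used internally is not given here.

```javascript
// Clamp a value between 0 and 1
function clamp01(value) {
  return Math.min(1, Math.max(0, value));
}

// Linear factor as described for translationFactorRange:
// 0 below range[0], 1 above range[1], linear in between.
function linearFactor(value, range) {
  return clamp01((value - range[0]) / (range[1] - range[0]));
}

// Mixing factor alpha as described for alphaRange:
// alphaRange[0] when k <= 0, alphaRange[1] when k >= 1.
// Assumption: the quadratic variation is modeled with k * k.
function computeAlpha(k, alphaRange) {
  const kClamped = clamp01(k);
  return alphaRange[0] + (alphaRange[1] - alphaRange[0]) * kClamped * kClamped;
}

// Example with the default translationFactorRange and a hypothetical
// alphaRange (its default value is not listed in this document)
const kTranslation = linearFactor(0.003, [0.0015, 0.005]);
const alpha = computeAlpha(kTranslation, [0.25, 0.75]);
console.log('kTranslation =', kTranslation, 'alpha =', alpha);
```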
We strongly recommend using the JeelizResizer helper to size the canvas to the display size, in order not to compute more pixels than required. This helper also computes the best camera resolution, which is the closest to the actual canvas size. If the camera resolution is too high compared to the canvas resolution, your application will be unnecessarily slowed down because it is quite costly to refresh the WebGL texture for each video frame. And if the video resolution is too low compared to the canvas resolution, the image will be blurry. You can take a look at the THREE.js boilerplate to see how it is used. To use the helper, you first need to include it in the HTML code:
<script type="text/javascript" src="https://appstatic.jeeliz.com/faceFilter/JeelizResizer.js"></script>
Then in your main script, before initializing Jeeliz FaceFilter, you should call it to size the canvas to the best resolution and to find the optimal video resolution:
JeelizResizer.size_canvas({
  canvasId: 'jeeFaceFilterCanvas',
  callback: function(isError, bestVideoSettings){
    JEEFACEFILTERAPI.init({
      videoSettings: bestVideoSettings,
      //...
      //...
    });
  }
});
Take a look at the source code of this helper (in helpers/JeelizResizer.js) to get more information.
A few tips:
- In terms of optimization, the WebGL based demos are better optimized than the Canvas2D demos, which are still better optimized than the CSS3D demos.
- Try to use resources that are as light as possible. Each texture image should have the lowest possible resolution. Use mipmapping for texture minification filtering.
- The more effects you use, the slower it will be. Add the 3D effects gradually to check that they do not penalize the frame rate too much.
- Use low polygon meshes.
It is possible to detect and track several faces at the same time. To enable this feature, you only have to specify the optional init parameter maxFacesDetected. Its maximum value is 8. If you are tracking, for example, 8 faces at the same time, the detection will be slower because there is 8 times less computing power per tracked face. If you have set this value to 8 but only 1 face is detected, it should not slow down too much compared to single face tracking.
If multiple face tracking is enabled, the callbackTrack function is called with an array of detection states (instead of being executed with a simple detection state). The detection state format is still the same.
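Since callbackTrack receives either a single detection state or an array of them, it can be convenient to handle both cases the same way. The sketch below normalizes the argument and keeps only the entries whose detection probability is high enough. The function name and the default threshold of 0.5 are assumptions chosen for the example.

```javascript
// Hypothetical helper: return the faces currently detected, whether
// detectState is a single detection state or an array of states.
function getDetectedFaces(detectState, threshold) {
  const states = Array.isArray(detectState) ? detectState : [detectState];
  // Assumption: 0.5 is an arbitrary default, tune it for your use case
  const minProbability = (typeof threshold === 'number') ? threshold : 0.5;
  const faces = [];
  states.forEach(function(state, index) {
    if (state.detected >= minProbability) {
      // Keep the slot index so each face can be matched to its 3D object
      faces.push({ index: index, state: state });
    }
  });
  return faces;
}
```

Inside callbackTrack, getDetectedFaces(detectState).forEach(...) can then update one 3D object per detected face.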
You can use our Three.js multiple faces detection helper, helpers/JeelizThreejsHelper.js, to get started and test this example. The main script has only 60 lines of code!
It is possible to use a 3D engine other than BABYLON.JS or THREE.JS. If you have done so, we would be interested in adding your demonstration to this repository (or linking to your code). Just open a pull request.
The 3D engine should share the WebGL context with the FaceFilter API. The WebGL context is created by Jeeliz Face Filter. The background video texture is given directly as a WebGLTexture object, so it is usable only on the FaceFilter WebGL context. It would be more costly in terms of computing time to have a second WebGL context for the 3D rendering, because at each new video frame we would have to transfer the video data from the <video> element to the 2 WebGL contexts: the Jeeliz Face Filter WebGL context for processing, and the 3D engine WebGL context for rendering. Fortunately, with BABYLON.JS or THREE.JS, it is easy to specify an already initialized WebGL context.
Since July 2018 it is possible to change the neural network. When calling JEEFACEFILTERAPI.init({...}), set the NNCpath value to the path of a specific neural network file:
JEEFACEFILTERAPI.init({
  NNCpath: '../../dist/NNClight.json'
  //...
})
It is also possible to directly give the content of the NNC JSON file by using the NNC property instead of NNCpath.
We provide several neural network models:
- dist/NNC.json: this is the default neural network. Good tradeoff between size and performance,
- dist/NNCwideAngles.json: this neural network is better at detecting wide head angles (but less accurate for small angles),
- dist/NNClight.json: this is a light version of the neural network. The file is half the size and it runs faster, but it is less accurate for large head rotation angles,
- dist/NNCveryLight.json: even lighter than the previous version: 250Kbytes, and very fast. But it is not very accurate and not robust to all lighting conditions,
- dist/NNCviewTop.json: this neural net is perfect if the camera has a bird's eye view (if you use this library for a kiosk setup for example),
- dist/NNCdeprecated.json: this is a deprecated version of the neural network (since 2018-07-25).
/dist/jeelizFaceFilterES6.js is exactly the same as /dist/jeelizFaceFilter.js except that it works with ES6, so you can import it directly using:
import 'dist/jeelizFaceFilterES6.js'
or using require (see issue #72):
const faceFilter = require('./lib/jeelizFaceFilterES6.js')
faceFilter.init({
  //you can also provide the canvas directly
  //using the canvas property instead of canvasId:
  canvasId: 'jeeFaceFilterCanvas',
  NNCpath: '../../../dist/', //path to JSON neural network model (NNC.json by default)
  callbackReady: function(errCode, spec){
    if (errCode){
      console.log('AN ERROR HAPPENS. ERROR CODE =', errCode);
      return;
    }
    //[init scene with spec...]
    console.log('INFO: JEEFACEFILTERAPI IS READY');
  }, //end callbackReady()

  //called at each render iteration (drawing loop)
  callbackTrack: function(detectState){
    //render your scene here
    //[... do something with detectState]
  } //end callbackTrack()
});//end init call
This API requires the user's webcam video feed through the MediaStream API. So your application should be hosted by an HTTPS server (even with a self-signed certificate). With some web browsers it won't work at all over unsecure HTTP, even locally.
For development purposes we provide a simple and minimalist HTTPS server to check out the demos or develop your very own filters. To launch it, execute in the bash console:
python2 httpsServer.py
It requires Python 2.X. Then open in your web browser https://localhost:4443.
You can use our hosted and up to date version of the library, available here:
https://appstatic.jeeliz.com/faceFilter/jeelizFaceFilter.js
It uses the neural network NNC.json hosted in the same path. The helpers used in these demos (all scripts in /helpers/) are also hosted on https://appstatic.jeeliz.com/faceFilter/.
It is served through a content delivery network (CDN) using gzip compression.
If you host the scripts yourself, be careful to enable gzip HTTP/HTTPS compression for JSON and JS files. Indeed, the neural network JSON file, dist/NNC.json, is quite heavy, but compresses very well with GZIP. You can check the gzip compression of your server here.
The neural network file, dist/NNC.json, is loaded using an ajax XMLHttpRequest after calling JEEFACEFILTERAPI.init(). This loading takes place after the user has accepted to share the camera, so this quite heavy file is not loaded if the user refuses to share it or if there is no webcam available. The loading can be faster if you systematically preload dist/NNC.json using a service worker or a simple raw XMLHttpRequest just after the HTML page loading. The file will then already be in the browser cache when the Jeeliz FaceFilter API requests it.
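A preload along these lines could look like the sketch below. It uses the Fetch API rather than a raw XMLHttpRequest, which is a substitution made for brevity. The function name preloadNeuralNet and its optional second argument (used to inject a replacement for fetch) are hypothetical. Whether the later request made by the library is actually served from the browser cache depends on the HTTP cache headers sent by your server.

```javascript
// Hypothetical helper: request the neural network file early so that it
// may already be in the browser HTTP cache when the library asks for it.
// Resolves to true if the file was fetched successfully, false otherwise.
function preloadNeuralNet(url, fetchImpl) {
  const doFetch = fetchImpl || function(target) {
    if (typeof fetch !== 'function') {
      return Promise.reject(new Error('fetch is not available'));
    }
    return fetch(target);
  };
  return doFetch(url)
    .then(function(response) {
      return Boolean(response && response.ok);
    })
    .catch(function() {
      // A failed preload is not fatal: the library will load the file itself
      return false;
    });
}

// Browser-only usage, just after the page is loaded
if (typeof window !== 'undefined') {
  window.addEventListener('load', function() {
    preloadNeuralNet('dist/NNC.json');
  });
}
```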
This API uses Jeeliz WebGL Deep Learning technology to detect and track the user's face using a neural network. The accuracy is adaptive: the better the hardware, the more detections are processed per second. Everything is done client-side.
- If WebGL2 is available, it uses WebGL2 and no specific extension is required,
- If WebGL2 is not available but WebGL1 is, we require either the OES_TEXTURE_FLOAT extension or the OES_TEXTURE_HALF_FLOAT extension,
- If WebGL2 is not available, and if WebGL1 is not available or neither OES_TEXTURE_FLOAT nor OES_TEXTURE_HALF_FLOAT is implemented, the user's device is not compatible.
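The three rules above can be written as a small decision function. The sketch below is an illustration of these rules, not the check performed by the library itself. The function names and the shape of the capabilities object are assumptions. The probe should be run on a temporary canvas, not on the canvas given to the library.

```javascript
// Hypothetical helper: apply the compatibility rules to a capabilities
// object of the form { webgl2: boolean, webgl1: boolean, extensions: [] }.
function checkWebGLSupport(capabilities) {
  if (capabilities.webgl2) {
    return { compatible: true, mode: 'WebGL2' };
  }
  if (capabilities.webgl1) {
    // Extension names are compared case-insensitively
    const names = (capabilities.extensions || []).map(function(name) {
      return String(name).toLowerCase();
    });
    const hasFloat = names.indexOf('oes_texture_float') !== -1;
    const hasHalfFloat = names.indexOf('oes_texture_half_float') !== -1;
    if (hasFloat || hasHalfFloat) {
      return { compatible: true, mode: 'WebGL1' };
    }
  }
  return { compatible: false, mode: null };
}

// Browser-only: read the capabilities from a temporary canvas element
function readWebGLCapabilities(canvas) {
  if (canvas.getContext('webgl2')) {
    return { webgl2: true, webgl1: true, extensions: [] };
  }
  const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
  return {
    webgl2: false,
    webgl1: Boolean(gl),
    extensions: gl ? (gl.getSupportedExtensions() || []) : []
  };
}
```

In a browser, checkWebGLSupport(readWebGLCapabilities(document.createElement('canvas'))) gives a result before the library is initialized.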
In all cases, WebRTC should be implemented in the web browser, otherwise the FaceFilter API will not be able to get the webcam video feed. Here are the compatibility tables from caniuse.com: WebGL1, WebGL2, WebRTC.
If a compatibility error is triggered, please post an issue on this repository. If this is a problem with the webcam access, please first retry after closing all applications which could use your device (Skype, Messenger, other browser tabs and windows, ...). Please include:
- a screenshot of webglreport.com - WebGL1 (about your WebGL1 implementation),
- a screenshot of webglreport.com - WebGL2 (about your WebGL2 implementation),
- the log from the web console,
- the steps to reproduce the bug, and screenshots.
We are currently writing a series of tutorials for the API, starting with some very basic filters and moving on to harder ones.
- Creating a Snapchat-like face filter using Jeeliz FaceFilter API and THREE.JS:
  - Part 1: Creating your first filter
  - Part 2: User interactions and particles
- Build a multifacial face filter: interactive step by step tutorial hosted on WebGL Academy where you learn to build a Statue of Liberty using THREE.js and this library
- Tutorial: Matrix theme face filter: French version, English translation
Apache 2.0. This application is free for both commercial and non-commercial use.
We appreciate attribution by including the Jeeliz logo and a link to the Jeeliz website in your application or desktop website. Of course we do not expect a large link to Jeeliz over your face filter, but if you can put the link in the credits/about/help/footer section it would be great.
Our newest deep learning based library is called Weboji. It detects 11 facial expressions in real time from the webcam video feed. They are then reproduced on an avatar, either in 3D with a THREE.JS renderer or in 2D with an SVG renderer (so you can use it even if you are not a 3D developer). You can access the GitHub repository here.
If you just want to detect if the user is looking at the screen or not, Jeeliz Glance Tracker is what you are looking for. It can be useful to play and pause a video whether the user is watching or not. This library needs fewer resources and the neural network file is much lighter.
If you want to use this library for glasses virtual try-on (sunglasses, spectacles, ski masks), you can take a look at Jeeliz VTO widget. It includes a high quality and lightweight 3D engine which implements the following features: deferred shading, PBR, raytraced shadows, normal mapping, ... It also reconstructs the lighting environment around the user (ambient and directional lighting). But the glasses come from a database hosted on our servers. If you want to add some models, please contact us.