diff --git a/index.html b/index.html
index 7faf0c5d0..28468e22f 100644
--- a/index.html
+++ b/index.html
@@ -238,26 +238,84 @@
- Code [coming soon]
-
+ Code [coming soon]
+
|
-
-
-
+ We present an end-to-end framework for recovering a full 3D mesh
+ of a human body from a single RGB image. We use the generative
+ human body model SMPL,
+ which parameterizes the mesh by 3D joint angles and a
+ low-dimensional linear shape space.
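+
+ As a rough illustration of this parameterization, here is a minimal
+ sketch (our own, not the released code; the dimensions follow the
+ SMPL paper: 23 body joints plus a global rotation, each a 3D
+ axis-angle vector, and 10 linear shape coefficients):
+
+ import numpy as np
+
+ # Sketch of the SMPL parameterization (dimensions from the SMPL paper).
+ N_JOINTS = 23                  # body joints, each a 3D axis-angle rotation
+ POSE_DIM = 3 * (N_JOINTS + 1)  # 72: joint angles plus the global root rotation
+ SHAPE_DIM = 10                 # coefficients of the linear shape space
+
+ pose = np.zeros(POSE_DIM)      # theta: relative 3D joint angles
+ shape = np.zeros(SHAPE_DIM)    # beta: low-dimensional shape coefficients
+
+ # SMPL is a differentiable function M(pose, shape) -> mesh vertices,
+ # so a regressor only has to predict this compact 82-D vector.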
+
+
+ Estimating a 3D mesh opens the door to a wide range of applications such as foreground and
+ part segmentation and dense correspondences that are beyond
+ what is practical with a simple skeleton. The output mesh can be
+ immediately used by animators, modified, measured, manipulated
+ and retargeted. Our output is also holistic: we always infer
+ the full 3D body even in cases of occlusions and
+ truncations.
+
+ There are several challenges in training such a model in an end-to-end
+ manner:
+
+ - First is the lack of large-scale ground truth 3D
+ annotation for in-the-wild images. Existing datasets with
+ accurate 3D annotations are captured in constrained
+ environments
+ (HumanEva, Human3.6M,
+ MPI-INF-3DHP). Models trained on these datasets do not generalize
+ well to the richness of images in the real world.
+
+
+ - Second is the inherent ambiguity of the single-view 2D-to-3D
+ mapping: multiple 3D body configurations can explain the same 2D
+ projection, and many of these configurations are not
+ anthropometrically reasonable, such as impossible joint angles
+ or extremely skinny bodies. In addition, estimating the camera
+ explicitly introduces an additional scale ambiguity between the
+ size of the person and the camera distance (a small numeric
+ sketch follows this list).
+
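+ To make the scale ambiguity concrete, here is a small numeric sketch
+ under a weak-perspective camera (our illustration, not the paper's
+ code): a person twice as large, placed so that the camera scale
+ halves, projects to exactly the same 2D keypoints.
+
+ import numpy as np
+
+ def weak_perspective(X, s, t):
+     """Project 3D points X (N, 3) to 2D: x = s * X[:, :2] + t."""
+     return s * X[:, :2] + t
+
+ X = np.random.randn(14, 3)                     # 3D joints of a person
+ near_small = weak_perspective(X, 2.0, 0.0)     # original size, close to camera
+ far_big = weak_perspective(2.0 * X, 1.0, 0.0)  # doubled size, half the scale
+ assert np.allclose(near_small, far_big)        # identical 2D keypoints
+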
+ In this work we propose a novel approach to mesh reconstruction that
+ addresses both of these challenges. The key insight is that even though
+ we don't have large-scale paired 2D-to-3D labels for in-the-wild images, we have
+ a lot of unpaired data: large-scale 2D keypoint
+ annotations of in-the-wild images
+ (LSP, MPII, COCO, etc.) and a
+ separate large-scale dataset of 3D meshes of people with various
+ poses and shapes from MoCap. Our key contribution is to take
+ advantage of these unpaired 2D keypoint annotations and 3D
+ scans in a conditional generative adversarial manner.
+
+ The idea is that, given an image, the network has to infer the 3D
+ mesh parameters and the camera such that the 3D keypoints match the
+ annotated 2D keypoints after projection. To deal with ambiguities,
+ these parameters are sent to a discriminator network, whose task is
+ to determine if the 3D parameters correspond to bodies of real
+ humans or not. Hence the network is encouraged to output parameters
+ on the human manifold, and the discriminator acts as weak
+ supervision. The network implicitly learns the angle limits for each
+ joint and is discouraged from making people with unusual body
+ shapes.
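+
+ A minimal sketch of this reprojection constraint (hypothetical names;
+ the paper regresses a weak-perspective camera together with the SMPL
+ parameters and uses a visibility-masked L1 loss on the keypoints):
+
+ import numpy as np
+
+ def reprojection_loss(joints_3d, cam_s, cam_t, keypoints_2d, vis):
+     """Visibility-masked L1 distance between projected 3D joints
+     and annotated 2D keypoints.
+
+     joints_3d:    (K, 3) joints of the predicted mesh
+     cam_s, cam_t: predicted weak-perspective scale and 2D translation
+     keypoints_2d: (K, 2) annotated keypoints; vis: (K,) 0/1 mask
+     """
+     projected = cam_s * joints_3d[:, :2] + cam_t
+     return np.sum(vis[:, None] * np.abs(projected - keypoints_2d))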
+
+
+ We take advantage of the structure of the body model and propose a
+ factorized adversarial prior. We show that we can train a model even
+ without using any paired 2D-to-3D training data (pink meshes are all
+ results of this unpaired model). Even without using any paired
+ 2D-to-3D supervision, HMR produces reasonable 3D
+ reconstructions. This is most exciting because it opens up
+ possibilities for learning 3D from large amounts of 2D data.
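+
+ As a sketch of what "factorized" means here (our reading: one small
+ discriminator per joint rotation, plus one for shape and one for the
+ full pose, rather than a single monolithic network; the layer sizes
+ are illustrative):
+
+ import torch
+ import torch.nn as nn
+
+ class FactorizedDiscriminator(nn.Module):
+     def __init__(self, n_joints=23, shape_dim=10):
+         super().__init__()
+         # one tiny discriminator per joint, on its flattened 3x3 rotation
+         self.per_joint = nn.ModuleList(
+             [nn.Sequential(nn.Linear(9, 32), nn.ReLU(), nn.Linear(32, 1))
+              for _ in range(n_joints)])
+         self.shape = nn.Sequential(
+             nn.Linear(shape_dim, 10), nn.ReLU(), nn.Linear(10, 1))
+         self.full_pose = nn.Sequential(
+             nn.Linear(9 * n_joints, 64), nn.ReLU(), nn.Linear(64, 1))
+
+     def forward(self, joint_rots, betas):
+         # joint_rots: (B, n_joints, 9) rotations, betas: (B, shape_dim)
+         scores = [d(joint_rots[:, i]) for i, d in enumerate(self.per_joint)]
+         scores.append(self.shape(betas))
+         scores.append(self.full_pose(joint_rots.flatten(1)))
+         return torch.cat(scores, dim=1)  # (B, n_joints + 2) real/fake scores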
+
+
+ Please see the paper for more details.
+
-
-
-
-
-
-
-
-
-