<script src="https://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">google.load("jquery", "1.3.2");</script>
<style type="text/css">
body {
font-family: "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
font-weight:400;
font-size:16px;
margin-left: auto;
margin-right: auto;
/* width: 1100px; */
}
table{width:80%}
.content{
margin:auto;
width: 80%
}
h1 {
font-weight:300;
}
.disclaimerbox {
background-color: #eee;
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
padding: 20px;
}
video.header-vid {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.header-img {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.rounded {
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
a:link,a:visited
{
color: #1367a7;
text-decoration: none;
}
a:hover {
color: #208799;
}
td.dl-link {
height: 160px;
text-align: center;
font-size: 22px;
}
.vert-cent {
position: relative;
top: 50%;
transform: translateY(-50%);
}
hr
{
border: 0;
height: 1px;
background-image: linear-gradient(to right, rgba(10, 10, 10, 0),
rgba(10, 10, 10, 0.75), rgba(10, 10, 10, 0));
margin: 1em 0 1em 0;
}
</style>
<html>
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-10550309-6"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-10550309-6');
</script>
<title>Human Mesh Recovery</title>
<meta property="og:title" content="HMR" />
</head>
<body>
<br>
<center>
<span style="font-size:30px">End-to-end Recovery of Human Shape and Pose
</span>
<br><br> <span style="font-size:20px">CVPR 2018</span>
</center>
<br>
<table align=center width=900px>
<tr>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="http://www.cs.berkeley.edu/~kanazawa/">Angjoo Kanazawa</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="https://ps.is.tuebingen.mpg.de/person/black">Michael
J Black</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="https://www.cs.umd.edu/~djacobs/">David
W. Jacobs</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="http://www.eecs.berkeley.edu/~malik/">Jitendra Malik</a></span>
</center>
</td>
</tr>
</table>
<!-- <br> -->
<table align=center width=700px>
<tr>
<td align=center width=100px>
<center>
<span style="font-size:19px">University of California, Berkeley</span><br>
<span style="font-size:19px">MPI for Intelligent Systems, Tübingen, Germany<br> University of Maryland, College Park
</span></center>
</td>
</tr>
</table>
<br>
<table align=center style="width:80%">
<tr>
<!-- <td width=600px> -->
<td>
<center>
<a href="./resources/images/teaser.png"><img src="./resources/images/teaser.png" height="400px"></a><br>
</center>
</td>
</tr>
<tr>
<td>
<!-- <center> -->
<span style="font-size:14px"><i><span style="font-weight:bold">Human
Mesh Recovery (HMR): end-to-end adversarial learning of human pose and shape.</span> We present a real-time framework for recovering the 3D joint angles and shape of the body from a single RGB image. The bottom row shows results from a model trained without any paired 2D-to-3D supervision. We infer the full 3D body even in the case of occlusions and truncations. Note that we capture head and limb orientations.</i></span>
<!-- </center> -->
</td>
</tr>
</table>
<!-- <hr class = "divider"> -->
<div class='content'>
<br>
We present Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full
3D mesh of a human body from a single RGB image.
In contrast to most current methods that compute 2D or 3D joint
locations, we produce a richer and more useful mesh representation that is
parameterized by shape and 3D joint angles. The main objective is to minimize
the reprojection loss of keypoints, which allows our model to be
trained using images <i>in-the-wild</i> that only have
ground-truth 2D annotations.
However, the reprojection loss alone is highly underconstrained.
In this work we address this problem by introducing an adversary trained to
tell whether a human body parameter is real or not using a large database of
3D human meshes. We show that HMR can be trained with and <b>without</b> using
any paired 2D-to-3D supervision. We do not rely on intermediate 2D
keypoint detection and infer 3D pose and shape parameters directly
from image pixels. Our model runs in real-time given a bounding box
containing the person. We demonstrate our approach on various images <i>in-the-wild</i>, outperforming previous optimization-based
methods that output 3D meshes, and show competitive results on tasks
such as 3D joint location estimation and part segmentation.
</div>
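The reprojection objective described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the released code: the function names, the weak-perspective camera convention, and the per-keypoint visibility masking are our own assumptions for the sketch.

```python
import numpy as np

def reproject(joints3d, scale, trans):
    """Weak-perspective projection (assumed camera model): scale the
    x,y of each 3D joint and add a 2D image-plane translation."""
    return scale * joints3d[:, :2] + trans

def reprojection_loss(joints3d, joints2d, vis, scale, trans):
    """L1 reprojection loss over visible keypoints only, so images
    with partial 2D annotations still provide a training signal."""
    proj = reproject(joints3d, scale, trans)
    residual = np.abs(proj - joints2d) * vis[:, None]  # mask hidden joints
    return residual.sum() / max(vis.sum(), 1)
```

Masking by visibility is what lets in-the-wild images with truncated or occluded annotations contribute to training.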
<br><br>
<hr>
<table align=center width=650>
<center><h1>Paper</h1></center>
<tr>
<td style="padding:1em"><a href="https://arxiv.org/abs/1712.06584"><img style="height:180px" src="./resources/images/paper_thumb.png"/></a></td>
<td style="padding:1em"><span style="font-size:14pt">Angjoo Kanazawa, Michael
J. Black, David W. Jacobs, Jitendra Malik.<br><br>
End-to-end Recovery of Human Shape and Pose<br><br>
CVPR 2018.<br> </span>
</td>
</tr>
</table>
<br>
<table align=center width=500px>
<tr>
<td>
<span style="font-size:14pt">
<center>
<a href="https://arxiv.org/pdf/1712.06584.pdf">[pdf]</a>
<a href="./resources/bibtex.txt">[bibtex]</a>
<a href="https://github.com/akanazawa/hmr">[code, model and data!]</a>
</center>
</span>
</td>
</tr>
</table>
<br>
<hr>
<center><h1>Video</h1></center>
<table align=center width=1000px>
<tr>
<td>
<center>
<iframe width="840" height="472" src="https://www.youtube.com/embed/bmMV9aJKa-c" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
</center>
</td>
</tr>
</table>
<br>
<hr>
<center><h1>Overview</h1></center>
<table align=center width=1000px>
<tr>
<td>
<center>
<img class="rounded" style="height:300px" src="./resources/images/overview.png"/>
</center>
<div class='content'>
<span style="font-size:14px"><i><span style="font-weight:bold">Overview
of the proposed framework.</span> An image is passed
through a convolutional encoder and then to an
iterative 3D regression module that infers the
latent 3D representation of the human that
minimizes the joint reprojection error.
The 3D parameters are also sent to the discriminator D, whose goal is to tell whether the 3D human came from real data or not.</i></span>
</div>
</td>
</tr>
<tr>
<td><center> <br>
<span style="font-size:20px"><a href='https://github.com/akanazawa/hmr'>Code</a> </span>
<br>
</center>
</td>
</tr>
</table>
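The iterative 3D regression module in the overview can be sketched as iterative error feedback: instead of regressing the parameters in one shot, the network starts from a mean parameter vector and predicts additive updates. The sketch below is purely illustrative; the linear "regressor", the feature dimension, and the number of iterations are stand-ins, not the released architecture.

```python
import numpy as np

PARAM_DIM = 85  # 3 camera + 72 pose + 10 shape in the paper's SMPL setup
FEAT_DIM = 16   # stand-in for the image-encoder feature dimension

def regressor(features, params, W):
    """Stand-in for the learned regressor: maps the concatenated
    [image features, current parameters] to a parameter *update*
    (a single linear map here, purely for illustration)."""
    return W @ np.concatenate([features, params])

def iterative_regression(features, W, theta_mean, n_iter=3):
    """Iterative error feedback: start from the mean parameters and
    apply additive refinements rather than regressing in one shot."""
    theta = theta_mean.copy()
    for _ in range(n_iter):
        theta = theta + regressor(features, theta, W)
    return theta
```

Feeding the current estimate back into the regressor lets each step correct the residual error of the previous one.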
<br>
<div class='content'>
We present an end-to-end framework for recovering a full 3D mesh
of a human body from a single RGB image. We use the generative
human body model <a href="http://smpl.is.tue.mpg.de/">SMPL</a>,
which parameterizes the mesh by 3D joint angles and a
low-dimensional linear shape space.
Estimating a 3D mesh opens the door to a wide range of applications such as foreground and
part segmentation and dense correspondences that are beyond
what is practical with a simple skeleton. The output mesh can be
immediately used by animators, modified, measured, manipulated
and retargeted. Our output is also holistic – we always infer
the full 3D body even in case of occlusions and
truncations. <br>
<br>
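The SMPL parameterization above boils down to a compact vector. As a minimal sketch (the dimensions follow the paper's setup; the slicing convention and function name are our own), the 85-dimensional output splits into camera, pose, and shape:

```python
import numpy as np

def split_params(theta):
    """Split an 85-dim parameter vector into camera, pose and shape
    (assumed layout: 3 camera + 72 pose + 10 shape)."""
    cam = theta[:3]        # weak-perspective scale + 2D translation
    pose = theta[3:75]     # 24 joints x 3 axis-angle values
    shape = theta[75:85]   # low-dimensional linear shape coefficients
    return cam, pose, shape
```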
There are several challenges in training such a model in an end-to-end
manner:
<ol>
<li> First, there is a lack of large-scale ground-truth 3D
annotation for <i>in-the-wild</i> images. Existing datasets with
accurate 3D annotations are captured in constrained
environments
(<a href="http://humaneva.is.tue.mpg.de/">HumanEva</a>,
<a href="http://vision.imar.ro/human3.6m/description.php">Human3.6M</a>,
<a href="http://gvv.mpi-inf.mpg.de/3dhp-dataset/">MPI-INF-3DHP</a>).
Models trained on these datasets do not generalize
well to the richness of images in the real world.</li>
<li> Second, the single-view 2D-to-3D mapping is inherently
ambiguous: many 3D configurations can explain the same 2D
keypoints, and many of them are not anthropometrically reasonable,
such as impossible joint angles
or extremely skinny bodies. In addition, estimating the camera explicitly introduces an additional scale ambiguity between the size of the person and the camera distance.</li>
</ol>
In this work we propose a novel approach to mesh reconstruction that
addresses both of these challenges. The key insight is that even though
we don't have large-scale paired 2D-to-3D labels for images in-the-wild, we have
many <i>unpaired</i> datasets: large-scale 2D keypoint
annotations of in-the-wild images
(<a href="http://sam.johnson.io/research/lsp.html">LSP</a>
, <a href="http://human-pose.mpi-inf.mpg.de/">MPII</a>
, <a href="http://cocodataset.org/#keypoints-challenge2017">COCO</a>
, etc) and a
separate large-scale dataset of 3D meshes of people with various
poses and shapes from MoCap. Our key contribution is to take
advantage of these <i>unpaired</i> 2D keypoint annotations and 3D
scans in a conditional generative adversarial manner. <br>
The idea is that, given an image, the network has to infer the 3D
mesh parameters and the camera such that the 3D keypoints match the
annotated 2D keypoints after projection. To deal with ambiguities,
these parameters are sent to a discriminator network, whose task is
to determine if the 3D parameters correspond to bodies of real
humans or not. Hence the network is encouraged to output parameters
on the human manifold and the discriminator acts as a weak
supervision. The network implicitly learns the angle limits for each
joint and is discouraged from making people with unusual body
shapes.
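The adversarial objective that implements this weak supervision can be sketched as a least-squares GAN loss on the body parameters. This is an illustrative sketch, not the released training code: the function name is ours, and we assume the least-squares formulation with real-parameter scores pushed to 1 and generated ones to 0.

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """Least-squares adversarial losses on body parameters: the
    discriminator pushes scores for real MoCap parameters toward 1
    and for network outputs toward 0; the encoder is rewarded when
    its outputs score 1, i.e. look like real human bodies."""
    d_loss = np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)
    g_loss = np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss
```

Because the discriminator only ever sees parameters, not images, the 2D keypoint data and the 3D MoCap data never need to come from the same examples.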
<br>
<br>
We take advantage of the structure of the body model and propose a
factorized adversarial prior. We show that we can train a model even
<i>without</i> using any paired 2D-to-3D training data (pink meshes are all
results of this unpaired model). Even without using any paired
2D-to-3D supervision, HMR produces reasonable 3D
reconstructions. This is most exciting because it opens up
possibilities for learning 3D from large amounts of 2D data.
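The factorized adversarial prior can be sketched as a small ensemble of discriminators rather than one monolithic network: one per joint rotation, one for the shape coefficients, and one for the full pose. The names, the number of joints, and the callable-discriminator interface below are illustrative assumptions.

```python
import numpy as np

def factorized_prior_scores(joint_rotations, shape,
                            joint_discs, shape_disc, full_pose_disc):
    """Factorized adversarial prior: a separate small discriminator
    for each joint rotation, one for shape, and one for the whole
    pose, so each only has to model a simple distribution."""
    scores = [d(rot) for d, rot in zip(joint_discs, joint_rotations)]
    scores.append(shape_disc(shape))                       # body shape
    scores.append(full_pose_disc(joint_rotations.reshape(-1)))  # joint pose
    return np.array(scores)
```

Per-joint discriminators can learn the admissible rotation limits of each joint independently, while the full-pose discriminator captures how joints co-articulate.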
<br>
<br>
Please see
the <a href="https://arxiv.org/pdf/1712.06584.pdf">paper</a> for
more details.
</div>
<hr>
<div class='content'>
<h3>Concurrent Work</h3>
Concurrently and independently from us, a number of groups have
proposed closely related deep-learning-based approaches for recovering SMPL. Many place a
similar emphasis on resolving the lack of ground-truth 3D data
in interesting and
different ways! Here is a partial list:
<ul>
<li>Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina
Fragkiadaki,
NIPS'17. <a href="https://sites.google.com/view/selfsupervisedlearningofmotion/">Self-supervised
Learning of Motion Capture</a></li>
<li>Jun Kai Vince Tan, Ignas Budvytis, Roberto Cipolla, BMVC'17. <a href="http://mi.eng.cam.ac.uk/~cipolla/publications/inproceedings/2017-BMVC-3D-body-indirect.pdf">Indirect deep structured learning for 3D human body shape and pose prediction</a></li>
<li>Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, Kostas Daniilidis, CVPR'18. <a href="https://www.seas.upenn.edu/~pavlakos/projects/humanshape/">Learning to Estimate 3D Human Pose and Shape from a Single Color Image</a></li>
<li>Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev,
and Cordelia Schmid, ECCV'18. <a href="https://www.di.ens.fr/willow/research/bodynet/">BodyNet: Volumetric Inference of 3D Human Body Shapes</a></li>
<li> Mohamed Omran, Christoph Lassner, Gerard Pons-Moll, Peter
Gehler, Bernt Schiele, 3DV'18. <a href="http://virtualhumans.mpi-inf.mpg.de/papers/omran2018NBF/omran2018NBF.pdf">Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation</a></li>
</ul>
SMPL models recovered by any of these approaches could be
improved by using them as an initialization for our optimization-based
approach, SMPLify, proposed at ECCV 2016:
<ul>
<li>Federica Bogo*, Angjoo Kanazawa*, Christoph Lassner, Peter
Gehler, Javier Romero, Michael Black, ECCV 2016. <a href="http://smplify.is.tue.mpg.de">Keep it SMPL: Automatic
Estimation of 3D Human Pose and Shape from a Single Image</a>.</li>
</ul>
</div>
<hr>
<!-- <br> -->
<!-- <center><h1>Acknowledgements</h1></center> -->
<div class='content'>
<h3>Acknowledgements</h3>
We thank Naureen Mahmood for providing MoShed
datasets and mesh retargeting for character animation, Dushyant
Mehta for his assistance on MPI-INF-3DHP, and Shubham Tulsiani,
Abhishek Kar, Saurabh Gupta, David Fouhey and Ziwei Liu for helpful
discussions. This research was supported in part by <a href="http://bair.berkeley.edu/">BAIR</a> and NSF Award IIS-1526234.
This webpage template is taken
from <a href="https://shubhtuls.github.io/drc/">humans
working on 3D</a> who borrowed it
from some <a href="https://richzhang.github.io/colorization/">colorful folks</a>.
</div>
<br><br>
</body>
</html>