<script src="https://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript">google.load("jquery", "1.3.2");</script>
<style type="text/css">
body {
font-family: "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
font-weight:400;
font-size:16px;
margin-left: auto;
margin-right: auto;
/* width: 1100px; */
}
table{width:80%}
.content{
margin:auto;
width: 80%
}
h1 {
font-weight:300;
}
.disclaimerbox {
background-color: #eee;
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
padding: 20px;
}
video.header-vid {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.header-img {
height: 140px;
border: 1px solid black;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
img.rounded {
border: 1px solid #eeeeee;
border-radius: 10px ;
-moz-border-radius: 10px ;
-webkit-border-radius: 10px ;
}
a:link,a:visited
{
color: #1367a7;
text-decoration: none;
}
a:hover {
color: #208799;
}
td.dl-link {
height: 160px;
text-align: center;
font-size: 22px;
}
.vert-cent {
position: relative;
top: 50%;
transform: translateY(-50%);
}
hr
{
border: 0;
height: 1px;
background-image: linear-gradient(to right, rgba(10, 10, 10, 0),
rgba(10, 10, 10, 0.75), rgba(10, 10, 10, 0));
margin: 1em 0 1em 0;
}
</style>
<html>
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-10550309-6"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-10550309-6');
</script>
<title>Human Mesh Recovery</title>
<meta property="og:title" content="HMR" />
</head>
<body>
<br>
<center>
<span style="font-size:30px">End-to-end Recovery of Human Shape and Pose
</span>
<br><br> <span style="font-size:20px">CVPR 2018</span>
</center>
<br>
<table align=center width=900px>
<tr>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="http://www.cs.berkeley.edu/~kanazawa/">Angjoo Kanazawa</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="https://ps.is.tuebingen.mpg.de/person/black">Michael
J Black</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="https://www.cs.umd.edu/~djacobs/">David
W. Jacobs</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:20px"><a href="http://www.eecs.berkeley.edu/~malik/">Jitendra Malik</a></span>
</center>
</td>
</tr>
</table>
<!-- <br> -->
<table align=center width=700px>
<tr>
<td align=center width=100px>
<center>
<span style="font-size:19px">University of California, Berkeley</span><br>
<span style="font-size:19px">MPI for Intelligent Systems, Tübingen, Germany<br> University of Maryland, College Park
</span></center>
</td>
</tr>
</table>
<br>
<table align=center style="width:80%">
<tr>
<!-- <td width=600px> -->
<td>
<center>
<a href="./resources/images/teaser.png"><img src="./resources/images/teaser.png" height="400px"></a><br>
</center>
</td>
</tr>
<tr>
<td>
<!-- <center> -->
<span style="font-size:14px"><i><span style="font-weight:bold">Human
Mesh Recovery (HMR): end-to-end adversarial learning of human pose and shape.</span> We present a real-time framework for recovering the 3D joint angles and shape of the body from a single RGB image. The bottom row shows results from a model trained without any paired 2D-to-3D supervision. We infer the full 3D body even in the case of occlusions and truncations. Note that we capture head and limb orientations.</i></span>
<!-- </center> -->
</td>
</tr>
</table>
<!-- <hr class = "divider"> -->
<div class='content'>
<br>
We present Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full
3D mesh of a human body from a single RGB image.
In contrast to most current methods that compute 2D or 3D joint
locations, we produce a richer and more useful mesh representation that is
parameterized by shape and 3D joint angles. The main objective is to minimize
the reprojection loss of keypoints, which allows our model to be
trained using images <i>in-the-wild</i> that only have
ground-truth 2D annotations.
However, the reprojection loss alone is highly underconstrained.
In this work we address this problem by introducing an adversary trained to
tell whether a human body parameter is real or not using a large database of
3D human meshes. We show that HMR can be trained with and <b>without</b> using
any paired 2D-to-3D supervision. We do not rely on intermediate 2D
keypoint detection and infer 3D pose and shape parameters directly
from image pixels. Our model runs in real-time given a bounding box
containing the person. We demonstrate our approach on various images <i>in-the-wild</i>, outperforming previous optimization-based
methods that output 3D meshes, and show competitive results on tasks
such as 3D joint location estimation and part segmentation.
</div>
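The reprojection objective described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the released code: the function names, the weak-perspective camera convention, and the per-keypoint visibility masking are our own assumptions for the sketch.

```python
import numpy as np

def reproject(joints3d, scale, trans):
    """Weak-perspective projection (assumed camera model): scale the
    x,y of each 3D joint and add a 2D image-plane translation."""
    return scale * joints3d[:, :2] + trans

def reprojection_loss(joints3d, joints2d, vis, scale, trans):
    """L1 reprojection loss over visible keypoints only, so images
    with partial 2D annotations still provide a training signal."""
    proj = reproject(joints3d, scale, trans)
    residual = np.abs(proj - joints2d) * vis[:, None]  # mask hidden joints
    return residual.sum() / max(vis.sum(), 1)
```

Masking by visibility is what lets in-the-wild images with truncated or occluded annotations contribute to training.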
<br><br>
<hr>
<table align=center width=650>
<center><h1>Paper</h1></center>
<tr>
<td style="padding:1em"><a href="https://arxiv.org/abs/1712.06584"><img style="height:180px" src="./resources/images/paper_thumb.png"/></a></td>
<td style="padding:1em"><span style="font-size:14pt">Angjoo Kanazawa, Michael
J. Black, David W. Jacobs, Jitendra Malik.<br><br>
End-to-end Recovery of Human Shape and Pose<br><br>
CVPR 2018.<br> </span>
</td>
</tr>
</table>
<br>
<table align=center width=500px>
<tr>
<td>
<span style="font-size:14pt">
<center>
<a href="https://arxiv.org/pdf/1712.06584.pdf">[pdf]</a>
<a href="./resources/bibtex.txt">[bibtex]</a>
<a href="https://github.com/akanazawa/hmr">[code, model and data!]</a>
</center>
</span>
</td>
</tr>
</table>
<br>
<hr>
<center><h1>Video</h1></center>
<table align=center width=1000px>
<tr>
<td>
<center>
<iframe width="840" height="472" src="https://www.youtube.com/embed/bmMV9aJKa-c" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
</center>
</td>
</tr>
</table>
<br>
<hr>
<center><h1>Overview</h1></center>
<table align=center width=1000px>
<tr>
<td>
<center>
<img class="rounded" style="height:300px" src="./resources/images/overview.png"/>
</center>
<div class='content'>
<span style="font-size:14px"><i><span style="font-weight:bold">Overview
of the proposed framework.</span> An image is passed
through a convolutional encoder and then to an
iterative 3D regression module that infers the
latent 3D representation of the human that
minimizes the joint reprojection error.
The 3D parameters are also sent to the discriminator D, whose goal is to tell whether the 3D human came from real data or not.</i></span>
</div>
</td>
</tr>
<tr>
<td><center> <br>
<span style="font-size:20px"><a href='https://github.com/akanazawa/hmr'>Code</a> </span>
<br>
</center>
</td>
</tr>
</table>
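The iterative 3D regression module in the overview can be sketched as iterative error feedback: instead of regressing the parameters in one shot, the network starts from a mean parameter vector and predicts additive updates. The sketch below is purely illustrative; the linear "regressor", the feature dimension, and the number of iterations are stand-ins, not the released architecture.

```python
import numpy as np

PARAM_DIM = 85  # 3 camera + 72 pose + 10 shape in the paper's SMPL setup
FEAT_DIM = 16   # stand-in for the image-encoder feature dimension

def regressor(features, params, W):
    """Stand-in for the learned regressor: maps the concatenated
    [image features, current parameters] to a parameter *update*
    (a single linear map here, purely for illustration)."""
    return W @ np.concatenate([features, params])

def iterative_regression(features, W, theta_mean, n_iter=3):
    """Iterative error feedback: start from the mean parameters and
    apply additive refinements rather than regressing in one shot."""
    theta = theta_mean.copy()
    for _ in range(n_iter):
        theta = theta + regressor(features, theta, W)
    return theta
```

Feeding the current estimate back into the regressor lets each step correct the residual error of the previous one.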
<br>
<div class='content'>
We present an end-to-end framework for recovering a full 3D mesh
of a human body from a single RGB image. We use the generative
human body model <a href="http://smpl.is.tue.mpg.de/">SMPL</a>,
which parameterizes the mesh by 3D joint angles and a
low-dimensional linear shape space.
Estimating a 3D mesh opens the door to a wide range of applications such as foreground and
part segmentation and dense correspondences that are beyond
what is practical with a simple skeleton. The output mesh can be
immediately used by animators, modified, measured, manipulated
and retargeted. Our output is also holistic – we always infer
the full 3D body even in case of occlusions and
truncations. <br>
<br>
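The SMPL parameterization above boils down to a compact vector. As a minimal sketch (the dimensions follow the paper's setup; the slicing convention and function name are our own), the 85-dimensional output splits into camera, pose, and shape:

```python
import numpy as np

def split_params(theta):
    """Split an 85-dim parameter vector into camera, pose and shape
    (assumed layout: 3 camera + 72 pose + 10 shape)."""
    cam = theta[:3]        # weak-perspective scale + 2D translation
    pose = theta[3:75]     # 24 joints x 3 axis-angle values
    shape = theta[75:85]   # low-dimensional linear shape coefficients
    return cam, pose, shape
```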
There are several challenges in training such a model in an end-to-end
manner:
<ol>
<li> First, there is a lack of large-scale ground-truth 3D
annotation for <i>in-the-wild</i> images. Existing datasets with
accurate 3D annotations are captured in constrained
environments
(<a href="http://humaneva.is.tue.mpg.de/">HumanEva</a>,
<a href="http://vision.imar.ro/human3.6m/description.php">Human3.6M</a>,
<a href="http://gvv.mpi-inf.mpg.de/3dhp-dataset/">MPI-INF-3DHP</a>).
Models trained on these datasets do not generalize
well to the richness of images in the real world.</li>
<li> Second, the single-view 2D-to-3D mapping is inherently
ambiguous: many 3D configurations can explain the same 2D
keypoints, and many of them are not anthropometrically reasonable,
such as impossible joint angles
or extremely skinny bodies. In addition, estimating the camera explicitly introduces an additional scale ambiguity between the size of the person and the camera distance.</li>
</ol>
In this work we propose a novel approach to mesh reconstruction that
addresses both of these challenges. The key insight is that even though
we don't have large-scale paired 2D-to-3D labels for images in-the-wild, we have
many <i>unpaired</i> datasets: large-scale 2D keypoint
annotations of in-the-wild images
(<a href="http://sam.johnson.io/research/lsp.html">LSP</a>
, <a href="http://human-pose.mpi-inf.mpg.de/">MPII</a>
, <a href="http://cocodataset.org/#keypoints-challenge2017">COCO</a>
, etc) and a
separate large-scale dataset of 3D meshes of people with various
poses and shapes from MoCap. Our key contribution is to take
advantage of these <i>unpaired</i> 2D keypoint annotations and 3D
scans in a conditional generative adversarial manner. <br>
The idea is that, given an image, the network has to infer the 3D
mesh parameters and the camera such that the 3D keypoints match the
annotated 2D keypoints after projection. To deal with ambiguities,
these parameters are sent to a discriminator network, whose task is
to determine if the 3D parameters correspond to bodies of real
humans or not. Hence the network is encouraged to output parameters
on the human manifold and the discriminator acts as a weak
supervision. The network implicitly learns the angle limits for each
joint and is discouraged from making people with unusual body
shapes.
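The adversarial objective that implements this weak supervision can be sketched as a least-squares GAN loss on the body parameters. This is an illustrative sketch, not the released training code: the function name is ours, and we assume the least-squares formulation with real-parameter scores pushed to 1 and generated ones to 0.

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """Least-squares adversarial losses on body parameters: the
    discriminator pushes scores for real MoCap parameters toward 1
    and for network outputs toward 0; the encoder is rewarded when
    its outputs score 1, i.e. look like real human bodies."""
    d_loss = np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)
    g_loss = np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss
```

Because the discriminator only ever sees parameters, not images, the 2D keypoint data and the 3D MoCap data never need to come from the same examples.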
<br>
<br>
We take advantage of the structure of the body model and propose a
factorized adversarial prior. We show that we can train a model even
<i>without</i> using any paired 2D-to-3D training data (pink meshes are all
results of this unpaired model). Even without using any paired
2D-to-3D supervision, HMR produces reasonable 3D
reconstructions. This is most exciting because it opens up
possibilities for learning 3D from large amounts of 2D data.
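The factorized adversarial prior can be sketched as a small ensemble of discriminators rather than one monolithic network: one per joint rotation, one for the shape coefficients, and one for the full pose. The names, the number of joints, and the callable-discriminator interface below are illustrative assumptions.

```python
import numpy as np

def factorized_prior_scores(joint_rotations, shape,
                            joint_discs, shape_disc, full_pose_disc):
    """Factorized adversarial prior: a separate small discriminator
    for each joint rotation, one for shape, and one for the whole
    pose, so each only has to model a simple distribution."""
    scores = [d(rot) for d, rot in zip(joint_discs, joint_rotations)]
    scores.append(shape_disc(shape))                       # body shape
    scores.append(full_pose_disc(joint_rotations.reshape(-1)))  # joint pose
    return np.array(scores)
```

Per-joint discriminators can learn the admissible rotation limits of each joint independently, while the full-pose discriminator captures how joints co-articulate.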
<br>
<br>
Please see
the <a href="https://arxiv.org/pdf/1712.06584.pdf">paper</a> for
more details.
</div>
<hr>
<div class='content'>
<h3>Concurrent Work</h3>
Concurrently and independently from us, a number of groups have
proposed closely related deep-learning-based approaches for recovering SMPL. Many place a
similar emphasis on resolving the lack of ground-truth 3D data
in interesting and
different ways! Here is a partial list:
<ul>
<li>Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina
Fragkiadaki,
NIPS'17. <a href="https://sites.google.com/view/selfsupervisedlearningofmotion/">Self-supervised
Learning of Motion Capture</a></li>
<li>Jun Kai Vince Tan, Ignas Budvytis, Roberto Cipolla, BMVC'17. <a href="http://mi.eng.cam.ac.uk/~cipolla/publications/inproceedings/2017-BMVC-3D-body-indirect.pdf">Indirect deep structured learning for 3D human body shape and pose prediction</a></li>
<li>Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, Kostas Daniilidis, CVPR'18. <a href="https://www.seas.upenn.edu/~pavlakos/projects/humanshape/">Learning to Estimate 3D Human Pose and Shape from a Single Color Image</a></li>
<li>Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev,
and Cordelia Schmid, ECCV'18. <a href="https://www.di.ens.fr/willow/research/bodynet/">BodyNet: Volumetric Inference of 3D Human Body Shapes</a></li>
<li> Mohamed Omran, Christoph Lassner, Gerard Pons-Moll, Peter
Gehler, Bernt Schiele, 3DV'18. <a href="http://virtualhumans.mpi-inf.mpg.de/papers/omran2018NBF/omran2018NBF.pdf">Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation</a></li>
</ul>
SMPL models recovered by any of these approaches could be
improved by using them as an initialization for our optimization-based
approach, SMPLify, proposed at ECCV 2016:
<ul>
<li>Federica Bogo*, Angjoo Kanazawa*, Christoph Lassner, Peter
Gehler, Javier Romero, Michael Black, ECCV 2016. <a href="http://smplify.is.tue.mpg.de">Keep it SMPL: Automatic
Estimation of 3D Human Pose and Shape from a Single Image</a>.</li>
</ul>
</div>
<hr>
<!-- <br> -->
<!-- <center><h1>Acknowledgements</h1></center> -->
<div class='content'>
<h3>Acknowledgements</h3>
We thank Naureen Mahmood for providing MoShed
datasets and mesh retargeting for character animation, Dushyant
Mehta for his assistance on MPI-INF-3DHP, and Shubham Tulsiani,
Abhishek Kar, Saurabh Gupta, David Fouhey and Ziwei Liu for helpful
discussions. This research was supported in part by <a href="http://bair.berkeley.edu/">BAIR</a> and NSF Award IIS-1526234.
This webpage template is taken
from <a href="https://shubhtuls.github.io/drc/">humans
working on 3D</a> who borrowed it
from some <a href="https://richzhang.github.io/colorization/">colorful folks</a>.
</div>
<br><br>
</body>
</html>