Speed up Model tests by 20% #5574

datumbox · 2022-03-09T11:39:12Z

We focus on the test_classification_model, test_detection_model, test_quantized_classification_model, test_segmentation_model, and test_video_model tests and reduce their execution time by about 20%.

This is achieved by:

Reducing the input size for very large models
Avoiding re-estimation of model outputs when possible
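The second point can be illustrated with a minimal, framework-free sketch. It assumes the test has already run the model once in eager mode and hands that result to the scriptability check, so the check does not repeat the eager forward pass. `CountingModel` and `check_scriptable` are hypothetical stand-ins, not the actual helpers in test/test_models.py; only the idea of an optional `eager_out` argument comes from this PR.

```python
class CountingModel:
    """Hypothetical stand-in for an expensive model: counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return [2 * v for v in x]


def check_scriptable(model, scripted, args, eager_out=None):
    """Compare eager and 'scripted' outputs, reusing eager_out when given."""
    if eager_out is None:
        # Fallback: pay for the eager forward pass only if the caller
        # did not already compute it.
        eager_out = model(*args)
    script_out = scripted(*args)
    assert eager_out == script_out, "eager and scripted outputs differ"


model = CountingModel()
scripted = CountingModel()  # pretend this is the compiled version
x = [1, 2, 3]

out = model(x)  # the test's own forward pass
check_scriptable(model, scripted, (x,), eager_out=out)
print(model.calls)  # 1: the eager model ran once, not twice
```

Keeping `eager_out` optional means existing callers that do not have a precomputed output keep working unchanged.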

Before: 629.21 sec

Run: https://app.circleci.com/pipelines/github/pytorch/vision/15497/workflows/51d7e30b-86ff-4c71-af68-7cb3438f25c9/jobs/1252313

62.17s call     test/test_models.py::test_quantized_classification_model[mobilenet_v3_large]
46.86s call     test/test_models.py::test_quantized_classification_model[mobilenet_v2]
43.47s call     test/test_models.py::test_quantized_classification_model[resnext101_32x8d]
32.26s call     test/test_models.py::test_quantized_classification_model[shufflenet_v2_x0_5]
22.26s call     test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
18.74s call     test/test_models.py::test_quantized_classification_model[googlenet]
16.09s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_l]
13.48s call     test/test_models.py::test_quantized_classification_model[shufflenet_v2_x1_0]
12.84s call     test/test_models.py::test_classification_model[cpu-efficientnet_b7]
11.11s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_m]
10.93s call     test/test_models.py::test_classification_model[cpu-vit_l_16]
10.48s call     test/test_models.py::test_classification_model[cpu-efficientnet_b6]
10.15s call     test/test_models.py::test_classification_model[cpu-vit_l_32]
9.12s call     test/test_models.py::test_classification_model[cpu-densenet201]
8.58s call     test/test_models.py::test_classification_model[cpu-efficientnet_b5]
8.48s call     test/test_models.py::test_detection_model[cpu-maskrcnn_resnet50_fpn]
8.40s call     test/test_models.py::test_classification_model[cpu-densenet161]
7.45s call     test/test_models.py::test_classification_model[cpu-densenet169]
7.27s call     test/test_models.py::test_classification_model[cpu-regnet_y_32gf]
7.18s call     test/test_models.py::test_detection_model[cpu-keypointrcnn_resnet50_fpn]
7.10s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_s]
6.94s call     test/test_models.py::test_classification_model[cpu-efficientnet_b4]
6.66s call     test/test_models.py::test_classification_model[cpu-convnext_large]
6.49s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_320_fpn]
6.02s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_resnet50_fpn]
5.99s call     test/test_models.py::test_quantized_classification_model[resnet18]
5.81s call     test/test_models.py::test_classification_model[cpu-efficientnet_b3]
5.37s call     test/test_models.py::test_classification_model[cpu-regnet_y_1_6gf]
5.35s call     test/test_models.py::test_classification_model[cpu-regnet_y_16gf]
5.30s call     test/test_models.py::test_detection_model[cpu-ssdlite320_mobilenet_v3_large]
5.24s call     test/test_models.py::test_classification_model[cpu-regnet_x_32gf]
5.24s call     test/test_models.py::test_detection_model[cpu-ssd300_vgg16]
5.12s call     test/test_models.py::test_classification_model[cpu-wide_resnet101_2]
5.01s call     test/test_models.py::test_classification_model[cpu-efficientnet_b2]
4.99s call     test/test_models.py::test_classification_model[cpu-densenet121]
4.87s call     test/test_models.py::test_classification_model[cpu-efficientnet_b1]
4.61s call     test/test_models.py::test_classification_model[cpu-regnet_y_3_2gf]
4.45s call     test/test_models.py::test_classification_model[cpu-resnet152]
4.44s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_fpn]
4.38s call     test/test_models.py::test_classification_model[cpu-regnet_y_8gf]
4.25s call     test/test_models.py::test_classification_model[cpu-resnext101_32x8d]
4.20s call     test/test_models.py::test_classification_model[cpu-regnet_x_16gf]
4.11s call     test/test_models.py::test_classification_model[cpu-convnext_base]
4.09s call     test/test_models.py::test_classification_model[cpu-regnet_y_400mf]
3.82s call     test/test_models.py::test_classification_model[cpu-vit_b_16]
3.69s call     test/test_models.py::test_quantized_classification_model[inception_v3]
3.63s call     test/test_models.py::test_classification_model[cpu-efficientnet_b0]
3.63s call     test/test_models.py::test_detection_model[cpu-fcos_resnet50_fpn]
3.55s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_resnet101]
3.52s call     test/test_models.py::test_classification_model[cpu-vit_b_32]
3.45s call     test/test_models.py::test_classification_model[cpu-vgg19_bn]
3.40s call     test/test_models.py::test_classification_model[cpu-regnet_x_8gf]
3.40s call     test/test_models.py::test_detection_model[cpu-retinanet_resnet50_fpn]
3.20s call     test/test_models.py::test_segmentation_model[cpu-fcn_resnet101]
3.17s call     test/test_models.py::test_classification_model[cpu-vgg16_bn]
3.15s call     test/test_models.py::test_classification_model[cpu-convnext_small]
3.13s call     test/test_models.py::test_classification_model[cpu-vgg19]
3.12s call     test/test_models.py::test_classification_model[cpu-inception_v3]
3.05s call     test/test_models.py::test_classification_model[cpu-resnet101]
3.02s call     test/test_models.py::test_classification_model[cpu-vgg13_bn]
3.00s call     test/test_models.py::test_classification_model[cpu-regnet_y_800mf]
2.98s call     test/test_models.py::test_classification_model[cpu-vgg16]
2.95s call     test/test_models.py::test_classification_model[cpu-vgg11_bn]
2.92s call     test/test_models.py::test_classification_model[cpu-regnet_x_3_2gf]
2.89s call     test/test_models.py::test_classification_model[cpu-vgg13]
2.87s call     test/test_models.py::test_classification_model[cpu-vgg11]
2.83s call     test/test_models.py::test_classification_model[cpu-wide_resnet50_2]
2.71s call     test/test_models.py::test_quantized_classification_model[resnet50]
2.63s call     test/test_models.py::test_classification_model[cpu-googlenet]
2.60s call     test/test_models.py::test_classification_model[cpu-mobilenet_v3_large]
2.56s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x0_5]
2.56s call     test/test_models.py::test_classification_model[cpu-regnet_x_400mf]
2.49s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_mobilenet_v3_large]
2.33s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x2_0]
2.30s call     test/test_models.py::test_classification_model[cpu-mobilenet_v2]
2.29s call     test/test_models.py::test_classification_model[cpu-mnasnet1_3]
2.29s call     test/test_models.py::test_segmentation_model[cpu-lraspp_mobilenet_v3_large]
2.25s call     test/test_models.py::test_classification_model[cpu-mobilenet_v3_small]
2.20s call     test/test_models.py::test_classification_model[cpu-regnet_x_1_6gf]
2.19s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x1_0]
2.16s call     test/test_models.py::test_classification_model[cpu-resnext50_32x4d]
2.15s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x1_5]
2.15s call     test/test_models.py::test_classification_model[cpu-mnasnet0_75]
2.12s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_resnet50]
2.12s call     test/test_models.py::test_classification_model[cpu-mnasnet1_0]
2.04s call     test/test_models.py::test_classification_model[cpu-mnasnet0_5]
2.04s call     test/test_models.py::test_video_model[cpu-r2plus1d_18]
2.03s call     test/test_models.py::test_video_model[cpu-r3d_18]
2.01s call     test/test_models.py::test_classification_model[cpu-regnet_x_800mf]
1.95s call     test/test_models.py::test_segmentation_model[cpu-fcn_resnet50]
1.92s call     test/test_models.py::test_classification_model[cpu-convnext_tiny]
1.75s call     test/test_models.py::test_classification_model[cpu-resnet50]
1.49s call     test/test_models.py::test_video_model[cpu-mc3_18]
1.29s call     test/test_models.py::test_classification_model[cpu-resnet34]
0.99s call     test/test_models.py::test_classification_model[cpu-alexnet]
0.95s call     test/test_models.py::test_classification_model[cpu-resnet18]
0.54s call     test/test_models.py::test_classification_model[cpu-squeezenet1_0]
0.39s call     test/test_models.py::test_classification_model[cpu-squeezenet1_1]

After: 507.28 sec

Run: https://app.circleci.com/pipelines/github/pytorch/vision/15520/workflows/2cc25bdc-fe10-4705-b942-c206a9d12fad/jobs/1254266

44.83s call     test/test_models.py::test_quantized_classification_model[mobilenet_v3_large]
33.70s call     test/test_models.py::test_quantized_classification_model[mobilenet_v2]
33.10s call     test/test_models.py::test_quantized_classification_model[resnext101_32x8d]
27.79s call     test/test_models.py::test_quantized_classification_model[shufflenet_v2_x1_0]
26.34s call     test/test_models.py::test_quantized_classification_model[shufflenet_v2_x0_5]
17.15s call     test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
13.76s call     test/test_models.py::test_quantized_classification_model[googlenet]
10.88s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_l]
10.11s call     test/test_models.py::test_classification_model[cpu-vit_l_16]
8.01s call     test/test_models.py::test_classification_model[cpu-vit_l_32]
7.95s call     test/test_models.py::test_classification_model[cpu-efficientnet_b7]
7.34s call     test/test_models.py::test_classification_model[cpu-densenet201]
7.09s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_m]
6.89s call     test/test_models.py::test_classification_model[cpu-efficientnet_b6]
6.56s call     test/test_models.py::test_classification_model[cpu-densenet169]
6.56s call     test/test_models.py::test_classification_model[cpu-densenet161]
6.48s call     test/test_models.py::test_classification_model[cpu-convnext_large]
5.76s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_fpn]
5.47s call     test/test_models.py::test_classification_model[cpu-efficientnet_b5]
5.27s call     test/test_models.py::test_classification_model[cpu-regnet_y_32gf]
4.80s call     test/test_models.py::test_classification_model[cpu-efficientnet_v2_s]
4.78s call     test/test_models.py::test_quantized_classification_model[resnet18]
4.71s call     test/test_models.py::test_detection_model[cpu-maskrcnn_resnet50_fpn]
4.67s call     test/test_models.py::test_classification_model[cpu-densenet121]
4.58s call     test/test_models.py::test_classification_model[cpu-resnet152]
4.49s call     test/test_models.py::test_classification_model[cpu-regnet_x_32gf]
4.44s call     test/test_models.py::test_classification_model[cpu-efficientnet_b4]
4.38s call     test/test_models.py::test_classification_model[cpu-wide_resnet101_2]
4.17s call     test/test_models.py::test_detection_model[cpu-keypointrcnn_resnet50_fpn]
4.08s call     test/test_models.py::test_quantized_classification_model[inception_v3]
4.07s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_mobilenet_v3_large_320_fpn]
3.90s call     test/test_models.py::test_detection_model[cpu-fasterrcnn_resnet50_fpn]
3.90s call     test/test_models.py::test_detection_model[cpu-ssd300_vgg16]
3.87s call     test/test_models.py::test_classification_model[cpu-regnet_y_16gf]
3.86s call     test/test_models.py::test_classification_model[cpu-convnext_base]
3.80s call     test/test_models.py::test_classification_model[cpu-efficientnet_b3]
3.68s call     test/test_models.py::test_classification_model[cpu-regnet_y_1_6gf]
3.52s call     test/test_models.py::test_classification_model[cpu-resnext101_32x8d]
3.52s call     test/test_models.py::test_classification_model[cpu-vgg19_bn]
3.41s call     test/test_models.py::test_classification_model[cpu-regnet_y_8gf]
3.40s call     test/test_models.py::test_detection_model[cpu-ssdlite320_mobilenet_v3_large]
3.34s call     test/test_models.py::test_classification_model[cpu-efficientnet_b2]
3.32s call     test/test_models.py::test_classification_model[cpu-convnext_small]
3.21s call     test/test_models.py::test_classification_model[cpu-vit_b_16]
3.17s call     test/test_models.py::test_classification_model[cpu-vgg19]
3.14s call     test/test_models.py::test_classification_model[cpu-vgg16_bn]
3.14s call     test/test_models.py::test_classification_model[cpu-efficientnet_b1]
3.11s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_resnet101]
3.10s call     test/test_models.py::test_classification_model[cpu-regnet_y_3_2gf]
3.06s call     test/test_models.py::test_classification_model[cpu-regnet_x_8gf]
3.06s call     test/test_models.py::test_classification_model[cpu-vgg16]
3.02s call     test/test_models.py::test_classification_model[cpu-resnet101]
3.01s call     test/test_models.py::test_classification_model[cpu-regnet_x_16gf]
2.99s call     test/test_models.py::test_classification_model[cpu-vgg13_bn]
2.99s call     test/test_models.py::test_segmentation_model[cpu-fcn_resnet101]
2.85s call     test/test_models.py::test_classification_model[cpu-inception_v3]
2.82s call     test/test_models.py::test_classification_model[cpu-regnet_y_400mf]
2.79s call     test/test_models.py::test_classification_model[cpu-vgg11]
2.77s call     test/test_models.py::test_classification_model[cpu-vgg13]
2.76s call     test/test_models.py::test_classification_model[cpu-vit_b_32]
2.75s call     test/test_models.py::test_classification_model[cpu-vgg11_bn]
2.74s call     test/test_models.py::test_detection_model[cpu-retinanet_resnet50_fpn]
2.68s call     test/test_models.py::test_classification_model[cpu-regnet_x_3_2gf]
2.53s call     test/test_models.py::test_classification_model[cpu-wide_resnet50_2]
2.46s call     test/test_models.py::test_classification_model[cpu-efficientnet_b0]
2.44s call     test/test_models.py::test_classification_model[cpu-regnet_x_400mf]
2.43s call     test/test_models.py::test_detection_model[cpu-fcos_resnet50_fpn]
2.39s call     test/test_models.py::test_quantized_classification_model[resnet50]
2.26s call     test/test_models.py::test_classification_model[cpu-regnet_y_800mf]
2.24s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x0_5]
2.23s call     test/test_models.py::test_classification_model[cpu-resnext50_32x4d]
2.20s call     test/test_models.py::test_classification_model[cpu-googlenet]
2.17s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_mobilenet_v3_large]
2.17s call     test/test_models.py::test_video_model[cpu-r2plus1d_18]
2.04s call     test/test_models.py::test_classification_model[cpu-regnet_x_1_6gf]
2.03s call     test/test_models.py::test_classification_model[cpu-convnext_tiny]
2.00s call     test/test_models.py::test_classification_model[cpu-mnasnet1_3]
1.97s call     test/test_models.py::test_classification_model[cpu-mobilenet_v2]
1.95s call     test/test_models.py::test_classification_model[cpu-mobilenet_v3_large]
1.94s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x1_0]
1.93s call     test/test_models.py::test_segmentation_model[cpu-deeplabv3_resnet50]
1.92s call     test/test_models.py::test_segmentation_model[cpu-fcn_resnet50]
1.90s call     test/test_models.py::test_classification_model[cpu-regnet_x_800mf]
1.84s call     test/test_models.py::test_classification_model[cpu-resnet50]
1.79s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x2_0]
1.79s call     test/test_models.py::test_video_model[cpu-r3d_18]
1.72s call     test/test_models.py::test_classification_model[cpu-mnasnet0_75]
1.70s call     test/test_models.py::test_segmentation_model[cpu-lraspp_mobilenet_v3_large]
1.68s call     test/test_models.py::test_classification_model[cpu-mnasnet0_5]
1.66s call     test/test_models.py::test_classification_model[cpu-mnasnet1_0]
1.63s call     test/test_models.py::test_classification_model[cpu-shufflenet_v2_x1_5]
1.49s call     test/test_models.py::test_video_model[cpu-mc3_18]
1.47s call     test/test_models.py::test_classification_model[cpu-mobilenet_v3_small]
1.32s call     test/test_models.py::test_classification_model[cpu-resnet34]
1.13s call     test/test_models.py::test_classification_model[cpu-alexnet]
0.94s call     test/test_models.py::test_classification_model[cpu-resnet18]
0.65s call     test/test_models.py::test_classification_model[cpu-squeezenet1_0]
0.38s call     test/test_models.py::test_classification_model[cpu-squeezenet1_1]
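For anyone who wants to compare the two reports above, here is a small sketch that parses duration lines in the format shown and computes the relative reduction. The helper names `parse_durations` and `reduction` are made up for illustration, and only two lines from each report are included as sample input.

```python
def parse_durations(report: str) -> dict:
    """Map test id -> seconds from lines like '62.17s call test/...::name'."""
    durations = {}
    for line in report.strip().splitlines():
        seconds, phase, test_id = line.split(maxsplit=2)
        if phase == "call":
            durations[test_id] = float(seconds.rstrip("s"))
    return durations


def reduction(before: float, after: float) -> float:
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


before = parse_durations("""
62.17s call     test/test_models.py::test_quantized_classification_model[mobilenet_v3_large]
22.26s call     test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
""")
after = parse_durations("""
44.83s call     test/test_models.py::test_quantized_classification_model[mobilenet_v3_large]
17.15s call     test/test_models.py::test_classification_model[cpu-regnet_y_128gf]
""")

for test_id, old in before.items():
    print(f"{test_id}: {reduction(old, after[test_id]):.1f}% less time")

# Totals reported above: 629.21 sec before, 507.28 sec after.
print(f"total: {reduction(629.21, 507.28):.1f}% less time")  # about 19.4%
```

On the reported totals this gives a reduction of about 19.4%, which matches the "20%" in the title after rounding.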

facebook-github-bot · 2022-03-09T11:39:19Z

💊 CI failures summary and remediations

As of commit 01be6a5 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


NicolasHug

Thanks @datumbox, I only took a brief look. Minor question, but otherwise LGTM.

NicolasHug · 2022-03-09T18:26:16Z

test/test_models.py

-        eager_out = nn_module(*args)
+    if eager_out is None:
+        with torch.no_grad(), freeze_rng_state():
+            if unwrapper:


Is this line needed? It looks like eager_out wasn't unwrapped before.

datumbox

It's not mandatory for most of the existing models, but only because of implementation details. Since this is a general-purpose tool for checking JIT-scriptability, I opted for consistently unwrapping the output in all places.

FYI the reason many models don't have to get unwrapped is due to idioms like this:

vision/torchvision/models/googlenet.py

Lines 163 to 179 in 4ae20e5

@torch.jit.unused
def eager_outputs(self, x: Tensor, aux2: Tensor, aux1: Optional[Tensor]) -> GoogLeNetOutputs:
    if self.training and self.aux_logits:
        return _GoogLeNetOutputs(x, aux2, aux1)
    else:
        return x  # type: ignore[return-value]

def forward(self, x: Tensor) -> GoogLeNetOutputs:
    x = self._transform_input(x)
    x, aux1, aux2 = self._forward(x)
    aux_defined = self.training and self.aux_logits
    if torch.jit.is_scripting():
        if not aux_defined:
            warnings.warn("Scripted GoogleNet always returns GoogleNetOutputs Tuple")
        return GoogLeNetOutputs(x, aux2, aux1)
    else:
        return self.eager_outputs(x, aux2, aux1)

New non-detection models don't use this idiom any more (returning different output depending on jit/training flag), so I think it's safer to handle it explicitly.
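To make the mismatch concrete, here is a torch-free simulation of the idiom quoted above: the eager path returns the bare output in eval mode, while the scripted path always returns the full tuple. Everything in it (`FakeOutputs`, `eager_forward`, `scripted_forward`, `unwrap`, `outputs_match`) is hypothetical and only mimics the behavior; it also assumes the unwrapper is a no-op on an output that is already bare, which is one possible way to apply it to both sides safely.

```python
from collections import namedtuple

# Hypothetical stand-in for a named-tuple output type such as GoogLeNetOutputs.
FakeOutputs = namedtuple("FakeOutputs", ["logits", "aux2", "aux1"])


def eager_forward(x):
    """Mimics the eager path in eval mode: returns the bare logits."""
    return [v + 1 for v in x]


def scripted_forward(x):
    """Mimics the scripted path: always returns the full tuple."""
    return FakeOutputs([v + 1 for v in x], None, None)


def unwrap(out):
    """Return the main output whether or not it is wrapped in a tuple."""
    return out.logits if isinstance(out, FakeOutputs) else out


def outputs_match(eager_out, script_out, unwrapper=None):
    """Compare outputs, applying the same unwrapper to both sides."""
    if unwrapper is not None:
        eager_out = unwrapper(eager_out)
        script_out = unwrapper(script_out)
    return eager_out == script_out


x = [1, 2, 3]
print(outputs_match(eager_forward(x), scripted_forward(x)))          # False
print(outputs_match(eager_forward(x), scripted_forward(x), unwrap))  # True
```

Applying the unwrapper to both outputs means the comparison no longer relies on the eager path happening to return a bare value.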

jdsgomes

LGTM! Thanks for this change - great improvement!

github-actions · 2022-03-09T19:34:32Z

Hey @datumbox!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

Summary:
* Measuring execution times of models.
* Speed up models by avoiding re-estimation of eager output
* Fixing linter
* Reduce input size for big models
* Speed up jit check method.
* Add simple jitscript fallback check for flaky models.
* Restore pytest filtering
* Fixing linter

Reviewed By: vmoens

Differential Revision: D34878998

fbshipit-source-id: 37bfa05aac0d28d59d3320119147446006bff75c

pytorch-bot bot added the ciflow/default label Mar 9, 2022

facebook-github-bot added the cla signed label Mar 9, 2022

datumbox marked this pull request as draft March 9, 2022 11:39

Measuring execution times of models.

17593a5

datumbox force-pushed the hackathon/speedup_tests branch from 9214973 to 17593a5 Compare March 9, 2022 14:12

datumbox added 6 commits March 9, 2022 14:52

Speed up models by avoiding re-estimation of eager output

1875444

Fixing linter

af29b07

Reduce input size for big models

516eda3

Speed up jit check method.

2a3a70d

Add simple jitscript fallback check for flaky models.

4edb317

Restore pytest filtering

1221e12

datumbox force-pushed the hackathon/speedup_tests branch from db77373 to 1221e12 Compare March 9, 2022 18:11

datumbox changed the title ~~[WIP] Speed up CI tests~~ Speed up Model tests by 20% Mar 9, 2022

Fixing linter

d510f5c

datumbox added module: ci module: tests labels Mar 9, 2022

datumbox marked this pull request as ready for review March 9, 2022 18:17

NicolasHug approved these changes Mar 9, 2022

jdsgomes approved these changes Mar 9, 2022

Merge branch 'main' into hackathon/speedup_tests

01be6a5

datumbox merged commit d0dede0 into pytorch:main Mar 9, 2022

datumbox deleted the hackathon/speedup_tests branch March 9, 2022 19:34

datumbox added the enhancement label Mar 9, 2022

This was referenced Mar 9, 2022

ci: Limit scope of unittest to one python version #5479

Open

Improve test of backbone utils #5552

Merged
