Demoing hot to use html5ever through kuchiki to speed up html parsing and css-selecting.
parse_file
and parse_text
return a parsed Document
, which then lets you select elements by css selectors using the select
method. All elements are returned as strings
Create a python 3.6+ venv and activate it. Install html-py-ever in there (python setup.py install
). To get a readable benchmark, run test/run_all.py
. To get a real benchmark, run pytest test_parsing.py
or pytest test_selector.py
. Both have a --benchmark-histogram
option.
Running on Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz with python 3.6 and rustc 1.30.0-nightly (aaa170beb 2018-08-31)
run_all.py
monty-python.html 1400
Parse lxml 0.013675s 0.114107s 8.344x
Parse py 0.013675s 0.191262s 13.986x
Select lxml 0.004283s 0.001122s 3.818x
Select py 0.004047s 0.001122s 3.608x
empty.html 0
Parse lxml 0.000050s 0.000250s 5.027x
Parse py 0.000050s 0.000091s 1.834x
Select lxml 0.000047s 0.000011s 4.452x
Select py 0.000034s 0.000011s 3.263x
small.html 0
Parse lxml 0.000050s 0.000408s 8.221x
Parse py 0.000050s 0.000341s 6.860x
Select lxml 0.000048s 0.000006s 7.700x
Select py 0.000116s 0.000006s 18.739x
rust.html 733
Parse lxml 0.034088s 0.269182s 7.897x
Parse py 0.034088s 0.423923s 12.436x
Select lxml 0.006814s 0.004962s 1.373x
Select py 0.006792s 0.004962s 1.369x
python.html 1518
Parse lxml 0.134979s 1.440968s 10.675x
Parse py 0.134979s 2.271023s 16.825x
Select lxml 0.036732s 0.006711s 5.474x
Select py 0.036882s 0.006711s 5.496x
test_parsing.py
------------------------------------------------------------------------------------------------------------------- benchmark: 10 tests -------------------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_parsing_rust[empty.html] 6.1110 (1.0) 513.7940 (1.0) 7.4792 (1.0) 9.5990 (1.0) 6.3950 (1.0) 0.2948 (1.0) 649;4746 133,704.3206 (1.0) 27203 1
test_bench_parsing_rust[small.html] 19.3520 (3.17) 788.8010 (1.54) 22.1472 (2.96) 16.8692 (1.76) 19.8700 (3.11) 0.5373 (1.82) 393;1818 45,152.4211 (0.34) 16177 1
test_bench_parsing_python[empty.html] 57.6250 (9.43) 38,060.2320 (74.08) 72.3809 (9.68) 457.3842 (47.65) 59.6890 (9.33) 3.0377 (10.31) 11;948 13,815.7902 (0.10) 6939 1
test_bench_parsing_python[small.html] 290.9070 (47.60) 2,750.8890 (5.35) 345.1972 (46.15) 178.1737 (18.56) 301.0480 (47.08) 26.8838 (91.21) 103;362 2,896.8951 (0.02) 2477 1
test_bench_parsing_rust[monty-python.html] 12,943.2440 (>1000.0) 21,217.3930 (41.30) 13,930.9700 (>1000.0) 1,687.9115 (175.84) 13,393.0260 (>1000.0) 493.4407 (>1000.0) 6;7 71.7825 (0.00) 65 1
test_bench_parsing_rust[rust.html] 27,254.8300 (>1000.0) 44,283.6160 (86.19) 29,939.0300 (>1000.0) 3,770.0365 (392.75) 28,366.1800 (>1000.0) 2,199.8490 (>1000.0) 4;4 33.4012 (0.00) 30 1
test_bench_parsing_rust[python.html] 117,097.9310 (>1000.0) 139,946.1370 (272.38) 124,982.5736 (>1000.0) 7,679.8512 (800.07) 124,375.9720 (>1000.0) 10,055.3265 (>1000.0) 2;0 8.0011 (0.00) 8 1
test_bench_parsing_python[monty-python.html] 181,122.6270 (>1000.0) 221,371.7280 (430.86) 191,845.8776 (>1000.0) 16,849.9999 (>1000.0) 186,777.4470 (>1000.0) 15,766.5518 (>1000.0) 1;1 5.2125 (0.00) 5 1
test_bench_parsing_python[rust.html] 384,658.8340 (>1000.0) 423,217.7400 (823.71) 406,878.9022 (>1000.0) 17,625.0831 (>1000.0) 413,173.2850 (>1000.0) 31,943.3840 (>1000.0) 1;0 2.4577 (0.00) 5 1
test_bench_parsing_python[python.html] 2,195,261.3770 (>1000.0) 2,249,598.2990 (>1000.0) 2,221,196.6530 (>1000.0) 23,091.9237 (>1000.0) 2,212,574.4390 (>1000.0) 38,692.2310 (>1000.0) 2;0 0.4502 (0.00) 5 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_selector.py
------------------------------------------------------------------------------------------------------------ benchmark: 10 tests -------------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_selector_rust[empty.html] 1.3180 (1.0) 63.8790 (1.0) 1.5361 (1.0) 0.9402 (1.0) 1.4420 (1.0) 0.0630 (1.0) 1884;6084 651,005.8803 (1.0) 84775 1
test_bench_selector_rust[small.html] 1.5300 (1.16) 112.5220 (1.76) 1.7647 (1.15) 1.0319 (1.10) 1.6590 (1.15) 0.0630 (1.00) 2215;7135 566,666.9515 (0.87) 96507 1
test_bench_selector_python[empty.html] 20.1260 (15.27) 532.0720 (8.33) 22.9150 (14.92) 12.8876 (13.71) 20.8190 (14.44) 0.5280 (8.38) 818;1965 43,639.4426 (0.07) 18434 1
test_bench_selector_python[small.html] 26.5540 (20.15) 890.5700 (13.94) 29.7362 (19.36) 14.8236 (15.77) 27.4300 (19.02) 0.7265 (11.53) 762;2109 33,629.0076 (0.05) 17413 1
test_bench_selector_rust[monty-python.html] 691.8140 (524.90) 2,925.7400 (45.80) 851.7575 (554.50) 222.7539 (236.93) 802.9160 (556.81) 79.2970 (>1000.0) 43;69 1,174.0430 (0.00) 843 1
test_bench_selector_rust[rust.html] 1,220.5940 (926.10) 6,789.2340 (106.28) 1,509.8102 (982.90) 540.7908 (575.20) 1,352.9600 (938.25) 361.6030 (>1000.0) 8;6 662.3349 (0.00) 240 1
test_bench_selector_python[monty-python.html] 3,851.9600 (>1000.0) 8,077.7510 (126.45) 4,260.0542 (>1000.0) 675.4977 (718.48) 4,063.3380 (>1000.0) 216.4488 (>1000.0) 20;26 234.7388 (0.00) 245 1
test_bench_selector_python[rust.html] 6,437.3910 (>1000.0) 11,348.6070 (177.66) 7,033.6536 (>1000.0) 1,050.6394 (>1000.0) 6,739.6810 (>1000.0) 363.3680 (>1000.0) 12;13 142.1736 (0.00) 151 1
test_bench_selector_rust[python.html] 6,504.3130 (>1000.0) 12,934.9650 (202.49) 7,557.5249 (>1000.0) 1,398.7101 (>1000.0) 6,976.7700 (>1000.0) 965.8090 (>1000.0) 17;16 132.3185 (0.00) 143 1
test_bench_selector_python[python.html] 36,145.0260 (>1000.0) 46,582.5100 (729.23) 38,058.3009 (>1000.0) 2,960.4055 (>1000.0) 36,630.3450 (>1000.0) 1,389.9710 (>1000.0) 4;5 26.2755 (0.00) 23 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------