Compiler Performance: Benchmark Definitions

The compiler performance tracking issue (#48547) defines the four main usage scenarios that we strive to support well. To measure how well we are actually doing, we define a benchmark for each scenario. Eventually, [perf.rust-lang.org](https://perf.rust-lang.org) will provide a graph for each scenario that shows how compile times change over time.


Methodology
-----------
Compiler performance in each scenario is measured as the sum of all build times for a given set of projects. The build settings depend on the usage scenario. The set of projects should include small, medium, and large projects.
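As a minimal sketch of this aggregate, assuming each build is recorded with its wall-clock time, the score for one scenario is a plain sum. The `BuildResult` type and `aggregate_score` function are hypothetical names. In line with the Open Questions section, a failed build is represented as `None` and left out of the sum.

```rust
use std::time::Duration;

/// Result of building one benchmark project (hypothetical type for this sketch).
#[derive(Debug, Clone)]
pub struct BuildResult {
    pub project: &'static str,
    /// `None` when the project failed to compile with this compiler version.
    pub time: Option<Duration>,
}

/// Aggregate score for one scenario: the sum of all successful build times.
/// Skipping a failed build lowers the total, so scores from different
/// compiler versions are only comparable if the same projects built.
pub fn aggregate_score(results: &[BuildResult]) -> Duration {
    results.iter().filter_map(|r| r.time).sum()
}
```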


Benchmarks
----------

### FROM-SCRATCH - Compiling a project from scratch

Compile the listed projects with each of the following combinations:

 - non-optimized & non-incremental
 - non-optimized & incremental (w/ empty cache)
 - optimized (-Ccodegen-units=8, no LTO) & non-incremental
 - optimized (no LTO) & incremental (w/ empty cache)
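One way to drive these four combinations is through Cargo's environment variables. This is an assumption about how a harness could be wired up, not a description of how perf.rust-lang.org actually runs its builds: `BuildConfig` and `FROM_SCRATCH` are hypothetical names, while `CARGO_INCREMENTAL` and the `CARGO_PROFILE_RELEASE_*` variables are real Cargo settings (profile overrides via environment variables require a reasonably recent Cargo). The empty cache would come from running `cargo clean` before each build.

```rust
/// One FROM-SCRATCH build configuration (hypothetical model for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BuildConfig {
    pub optimized: bool,
    pub incremental: bool,
}

/// The four FROM-SCRATCH combinations, in the order listed above.
pub const FROM_SCRATCH: [BuildConfig; 4] = [
    BuildConfig { optimized: false, incremental: false },
    BuildConfig { optimized: false, incremental: true },
    BuildConfig { optimized: true, incremental: false },
    BuildConfig { optimized: true, incremental: true },
];

impl BuildConfig {
    /// Arguments for the `cargo` invocation.
    pub fn cargo_args(&self) -> Vec<&'static str> {
        if self.optimized {
            vec!["build", "--release"]
        } else {
            vec!["build"]
        }
    }

    /// Environment variables to set for the `cargo` invocation.
    pub fn cargo_env(&self) -> Vec<(&'static str, &'static str)> {
        // Force incremental on or off regardless of the profile's default.
        let mut env = vec![("CARGO_INCREMENTAL", if self.incremental { "1" } else { "0" })];
        if self.optimized {
            // No LTO in any FROM-SCRATCH configuration ("off" disables it fully).
            env.push(("CARGO_PROFILE_RELEASE_LTO", "off"));
            if !self.incremental {
                // Only the optimized, non-incremental case pins codegen units.
                env.push(("CARGO_PROFILE_RELEASE_CODEGEN_UNITS", "8"));
            }
        }
        env
    }
}
```

Setting `CARGO_INCREMENTAL` explicitly in both directions keeps the profile defaults (incremental on for `dev`, off for `release`) from leaking into the measurement.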

Projects:
 - style-servo
 - script-servo
 - encoding-rs
 - clap-rs
 - regex
 - helloworld
 - crates.io
 - hyper
 - html5ever
 - tokio-webpush-simple
 - inflate
 - syn
 - futures
 - piston-image
 - ripgrep
 - webrender
 - cargo
 - winapi
 - stm32f103xx


### SMALL-CHANGE - Recompiling a project after a small change

For this scenario, we re-compile the project incrementally with a full cache
after a `println!()` statement has been added somewhere.

 - non-optimized & incremental (w/ full cache)
     - style-servo
     - script-servo
     - encoding-rs (`cargo test --lib --no-run`)
     - clap-rs (`cargo test --no-run`)
     - regex (`cargo test --lib --no-run`)
     - crates.io
     - syn (`cargo test --no-run`)
     - futures (`cargo test --test=all --no-run`)
     - tokio-webpush-simple
     - ripgrep
     - webrender
 - optimized (no LTO) & incremental (w/ full cache)
     - style-servo
     - script-servo
     - tokio-webpush-simple
     - crates.io
     - ripgrep
     - webrender
     - cargo
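The source change itself can be scripted. The text above only says the `println!()` goes "somewhere", so placing it at the top of a named function's body is an assumption, and `inject_println` is a hypothetical helper. It is a naive text search rather than a parser: it matches only `fn <name>(`, so a generic function such as `fn foo<T>(` is not found, and it assumes the first `{` after the signature opens the body.

```rust
/// Inserts a `println!()` statement at the start of the body of the first
/// function declared as `fn <fn_name>(`. Returns `None` if no such function
/// (or no opening brace after it) is found.
pub fn inject_println(src: &str, fn_name: &str) -> Option<String> {
    let needle = format!("fn {}(", fn_name);
    let sig_start = src.find(&needle)?;
    // Byte index of the first `{` after the signature; `{` is ASCII, so
    // `brace + 1` is always a valid char boundary.
    let brace = sig_start + src[sig_start..].find('{')?;
    let mut out = String::with_capacity(src.len() + 48);
    out.push_str(&src[..=brace]);
    out.push_str("\n    println!(\"perf benchmark change\");");
    out.push_str(&src[brace + 1..]);
    Some(out)
}
```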

### RLS - Continuously re-compiling a project for the Rust Language Server

For this scenario, we run `cargo check` incrementally with a full cache
after a `println!()` statement has been added somewhere.

 - `cargo check`, non-optimized & incremental (w/ full cache)
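A minimal sketch of the RLS measurement step, assuming it is driven from Rust through `std::process::Command`: build the `cargo check` invocation with incremental compilation forced on, run it once to fill the cache, apply the `println!()` change, then time a second run. `rls_check_command` and `time_run` are hypothetical helpers; only the second, timed run would count toward the score.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Builds (but does not run) a `cargo check` invocation for `project_dir`,
/// forcing incremental compilation so that repeated runs reuse the cache.
pub fn rls_check_command(project_dir: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("check")
        .current_dir(project_dir)
        .env("CARGO_INCREMENTAL", "1");
    cmd
}

/// Runs `cmd` to completion and returns its wall-clock time, or `None` if it
/// could not be spawned or exited unsuccessfully.
pub fn time_run(cmd: &mut Command) -> Option<Duration> {
    let start = Instant::now();
    let status = cmd.status().ok()?;
    status.success().then(|| start.elapsed())
}
```

A `Command` can be run more than once, so the same value serves for both the warm-up run and the measured run.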

Projects:
 - style-servo
 - script-servo
 - encoding-rs
 - clap-rs
 - regex
 - helloworld
 - crates.io
 - hyper
 - html5ever
 - tokio-webpush-simple
 - inflate
 - syn
 - futures
 - piston-image
 - ripgrep
 - webrender
 - cargo

 **NOTE**: This is a rather crude method for measuring RLS performance, since
 many more variables need to be taken into account here. For example, the RLS
 invokes the compiler differently, which allows data to be kept in memory that
 would otherwise be written to disk. It also produces "save-analysis" data,
 which `cargo check` does not; generating that data can take a significant
 amount of time and should therefore be measured too. Consequently, the RLS
 benchmarks need more discussion.

### DIST - Compiling a project for maximum runtime performance

For this scenario, we compile the projects from scratch, with maximum
optimizations:

 - optimized (-Copt-level=3, full LTO), non-incremental
 - optimized (-Copt-level=3, whole-crate-graph ThinLTO), non-incremental
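The two DIST configurations can be expressed as Cargo profile overrides, assuming a Cargo-driven harness that runs `cargo build --release`. `dist_env` is a hypothetical helper. In Cargo's `lto` profile setting, `"fat"` means full LTO and `"thin"` means ThinLTO across the whole crate graph, as opposed to Cargo's default, which only applies ThinLTO locally within each crate.

```rust
/// Environment overrides for a DIST build with `cargo build --release`.
/// `thin_lto = false` selects full ("fat") LTO; `true` selects ThinLTO
/// over the whole crate graph.
pub fn dist_env(thin_lto: bool) -> [(&'static str, &'static str); 3] {
    [
        // DIST builds are non-incremental.
        ("CARGO_INCREMENTAL", "0"),
        // opt-level = 3 for the release profile (already Cargo's release default).
        ("CARGO_PROFILE_RELEASE_OPT_LEVEL", "3"),
        ("CARGO_PROFILE_RELEASE_LTO", if thin_lto { "thin" } else { "fat" }),
    ]
}
```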

Projects:
 - style-servo
 - script-servo
 - crates.io
 - tokio-webpush-simple
 - inflate
 - ripgrep
 - webrender
 - cargo
 - stm32f103xx



Open Questions
--------------

 - [ ] The sum of all build times might be too crude a metric. Sometimes a crate does not compile at all with a specific compiler version. Only successful builds should go into the aggregate score, but leaving a project out lowers the sum, so totals from different compiler versions are no longer directly comparable. Is there a metric that intrinsically corrects for missing individual scores?
 - [ ] How to better measure performance in the RLS case?


Please provide your feedback on how well you think the above benchmarks actually measure what people care about when using the Rust compiler. I expect these definitions to undergo a few cycles of iteration before we are satisfied with them.

cc @rust-lang/wg-compiler-performance

Compiler Performance: Benchmark Definitions #48750