Skip to content

jl_cumulative_compile_time_ns_before tracks compilation time per thread rather than per task #41271

Open
@rgankema

Description

For my work at RelationalAI I'm trying to gather metrics on how much time is spent in run-time compilation while evaluating queries. I was hoping to use jl_cumulative_compile_time_ns_[before/after] for this, a little bit like this:

function evaluate_query(query_fn)
    time_before = time_ns()
    compile_before = jl_cumulative_compile_time_ns_before()
    
    query_fn()
    
    compile_after = jl_cumulative_compile_time_ns_after()
    time_after = time_ns()
    inc_metric!(metrics, "query_eval_total_time", time_after - time_before)
    inc_metric!(metrics, "query_jl_compile_time", compile_after - compile_before)
end

This works fine if there are no background tasks and queries run sequentially. It also works fine in a multi-threaded scenario, as long as only a single task runs at-a-time at any given thread, because jl_cumulative_compile_time_ns_[before/after] seem to use separate counters per thread. However, it breaks when multiple tasks are multiplexed on the same thread. For instance, if one call of evaluate_query has a very long running query_fn, and other tasks that do work that incur compilation time are running at the same time on that thread, that first evaluate_query call will also record all the compilation time triggered by those others tasks. This means that we're overestimating how much time we spent in compilation for that given query, potentially by a very large margin.

To see this behavior in action, please consider the following MRE:

using Base.Threads: @async

function do_something_and_measure_compilation_time(a, b)
    compile_before = Base.cumulative_compile_time_ns_before()

    # Function that we want to measure.
    x = do_something(a, b)

    compile_after = Base.cumulative_compile_time_ns_after()
    compile_duration_s = (compile_after - compile_before) / 1_000_000_000
    @show compile_duration_s
    return x
end

# Simulates a long running function that costs a lot more time to run than to compile.
function do_something(a, b)
    x = foo(a, a + 3, b, b + 5)
    sleep(5)
    return x
end

# Something to compile.
function foo(x1, x2, y1, y2)
    xs = [x1:x2...]
    ys = [y1:y2...]
    return sum(xs) * max(xs...) * sum(ys) * max(ys...)
end

# Do a couple of runs. We should see compilation time at first, but that should quickly go
# away.
do_something_and_measure_compilation_time(1, 2)
do_something_and_measure_compilation_time(1, 2)
do_something_and_measure_compilation_time(1, 2)
do_something_and_measure_compilation_time(1, 2)

# Call foo with types we haven't seen before, in the background.
@async begin
    sleep(1)
    for T1 in [Int32, Int16, Int8]
        for T2 in [Int32, Int16, Int8]
            @time foo(zero(T1), one(T1), zero(T2), one(T2))
        end
    end
end

# Final run, this time we'll measure compilation time from the above, even though it's
# not compilation that comes from this task.
do_something_and_measure_compilation_time(1, 2)

This outputs:

compile_duration_s = 0.024834394
compile_duration_s = 0.002964853
compile_duration_s = 0.0
compile_duration_s = 0.0
  0.037290 seconds (87.25 k allocations: 5.084 MiB, 99.79% compilation time)
  0.077509 seconds (169.89 k allocations: 10.369 MiB, 99.61% compilation time)
  0.068803 seconds (170.97 k allocations: 10.438 MiB, 99.69% compilation time)
  0.008911 seconds (4.97 k allocations: 271.748 KiB, 99.42% compilation time)
  0.009805 seconds (4.92 k allocations: 266.718 KiB, 99.52% compilation time)
  0.014783 seconds (4.97 k allocations: 269.733 KiB, 98.73% compilation time)
  0.009418 seconds (4.97 k allocations: 270.107 KiB, 99.38% compilation time)
  0.009743 seconds (4.97 k allocations: 269.671 KiB, 99.47% compilation time)
  0.013891 seconds (4.92 k allocations: 266.905 KiB, 99.33% compilation time)
compile_duration_s = 0.298222365

As you can see, the last call to do_something_and_measure_compilation_time recorded the compilation time of the background task. Unfortunately that behavior makes it unusable for the use-case described above. We'd therefore like to request that jl_cumulative_compile_time_ns_[before/after] is changed such that it keeps a counter per task, rather than per thread. Thanks!

Metadata

Assignees

No one assigned

    Labels

    multithreadingBase.Threads and related functionality

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions