Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add fenv cache to task struct #51288

Merged
merged 3 commits into from
Oct 16, 2024
Merged

add fenv cache to task struct #51288

merged 3 commits into from
Oct 16, 2024

Conversation

simonbyrne
Copy link
Contributor

Fixes #51277. I know @vchuravy is skeptical, but I wanted to try it out.

To give some examples:

julia> t = Base.Rounding.setrounding_raw(Float64, Base.Rounding.to_fenv(RoundDown)) do
           Task(() -> println(rounding(Float64)))
       end
Task (runnable) @0x000000010dff04c0

julia> rounding(Float64)
RoundingMode{:Nearest}()

julia> wait(schedule(t))
RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()

julia> rounding(Float64)
RoundingMode{:Nearest}()

julia> Base.Rounding.setrounding_raw(Float64, Base.Rounding.to_fenv(RoundDown)) do
           Threads.@threads :static for i = 1:Threads.nthreads()
               println(Threads.threadid() =>  rounding(Float64))
           end
       end
1 => RoundingMode{:Down}()
2 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()
4 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()
3 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()

@brenhinkeller brenhinkeller added maths Mathematical functions multithreading Base.Threads and related functionality labels Sep 14, 2023
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work

@@ -510,6 +510,8 @@ JL_NO_ASAN static void ctx_switch(jl_task_t *lastt)
jl_set_pgcstack(&t->gcstack);
jl_signal_fence();
lastt->ptls = NULL;
fegetenv(&lastt->fenv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come before the task switch, as some task switch implementations will clobber it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where specifically should it go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, it looks like the fesetenv is the one that needed to move

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should it go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to push to this branch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done. I don't know about the propagating behavior though. I think we might want to start every task with a call to set fesetenv(FE_DFL_ENV)?

@vtjnash vtjnash requested a review from vchuravy February 1, 2024 16:05
Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not convinced that this is a good thing to do. We have not yet specified what fenv means on the language level and if we want to support it in these cases. Echoing #51277 (comment)

Doesn't longjmp/setjmp save fenv? E.g. why do we have to do manually?

@vtjnash
Copy link
Member

vtjnash commented Feb 1, 2024

I don't think so. That is a very significant problem for anyone using this API, but I don't think it means it is wrong to do this PR (e.g. https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/jmpbuf-offsets.h and https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/__longjmp.S have no space to be capable of storing it)

@vchuravy
Copy link
Member

vchuravy commented Feb 2, 2024

Two notes:

The C standard defines fenv as thread-local, I do agree that it would make sense to define it as a scoped value (as implemented here) and a task would copy the fence from its parent.

I do still maintain that doing so is preemptive without adding language level support for strictfp and constrained floating point ops as defined by LLVM for this purpose.

Also see https://internals.rust-lang.org/t/equivalent-of-pragma-stdc-fenv-access-on/14934

I am okay with merging this partial support under the understanding that there is a declaration of intent to actually develop the full language semantics and that setting or accessing fenv remains UB.

Because right now messing with fenv and then entering the compiler that does some constant evaluation sounds like a recipe for causing hard to track down bugs.

@simonbyrne simonbyrne marked this pull request as ready for review October 16, 2024 05:43
@vtjnash vtjnash dismissed vchuravy’s stale review October 16, 2024 20:51

might not be good to do, but it at least isn't worse than what we already do

@vtjnash vtjnash merged commit 12aa9de into master Oct 16, 2024
6 of 8 checks passed
@vtjnash vtjnash deleted the sb/task-fenv branch October 16, 2024 20:54
@oscardssmith
Copy link
Member

might not be good to do, but it at least isn't worse than what we already do

I disagree with this. This has a real (if small) costs and unclear benefit.

@vtjnash
Copy link
Member

vtjnash commented Oct 16, 2024

Fair: I didn't expect AMD to seem so bad at this.

  CPU: 128 × AMD EPYC 7513 32-Core Processor
julia> @btime Base.Rounding.setrounding_raw(Float64, Base.Rounding.rounding_raw(Float64))
  19.679 ns (0 allocations: 0 bytes)

julia> t = @eval @task while true; yieldto($(current_task())); end
Task (runnable) @0x00007fae5e5b1090

julia> @btime yieldto(t)
  134.496 ns (0 allocations: 0 bytes)
  CPU: 10 × Apple M2 Pro
julia> @btime Base.Rounding.setrounding_raw(Float64, Base.Rounding.rounding_raw(Float64))
  5.916 ns (0 allocations: 0 bytes)
0

julia> @btime yieldto(t)
  67.197 ns (0 allocations: 0 bytes)
  CPU: 8 × Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
julia> @btime Base.Rounding.setrounding_raw(Float64, Base.Rounding.rounding_raw(Float64))
  6.380 ns (0 allocations: 0 bytes)
0

julia> @btime yieldto(t)
  175.516 ns (0 allocations: 0 bytes)

KristofferC pushed a commit that referenced this pull request Oct 21, 2024
Fixes #51277, though we give no guarantee that it keeps working this
way, or that calling `setrounding_raw` won't lead to other undefined
behavior.

To give some examples:

```julia
julia> t = Base.Rounding.setrounding_raw(Float64, Base.Rounding.to_fenv(RoundDown)) do
           Task(() -> println(rounding(Float64)))
       end
Task (runnable) @0x000000010dff04c0

julia> rounding(Float64)
RoundingMode{:Nearest}()

julia> wait(schedule(t))
RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()

julia> rounding(Float64)
RoundingMode{:Nearest}()

julia> Base.Rounding.setrounding_raw(Float64, Base.Rounding.to_fenv(RoundDown)) do
           Threads.@threads :static for i = 1:Threads.nthreads()
               println(Threads.threadid() =>  rounding(Float64))
           end
       end
1 => RoundingMode{:Down}()
2 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()
4 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()
3 => RoundingMode{:Down}() # currently gives RoundingMode{:Nearest}()
```
@JeffBezanson
Copy link
Member

For me this more than doubled task switch time, from 120ns to 290ns. This at least needs to be discussed.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Dec 5, 2024
JeffBezanson added a commit that referenced this pull request Jan 7, 2025
vchuravy pushed a commit that referenced this pull request Jan 8, 2025
This reverts commit 12aa9de.

un-fixes #51277 

Alternatively, we could add a flag for whether the state is non-default
so we know whether `fesetenv` is necessary, if this behavior is
considered desireable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions multithreading Base.Threads and related functionality triage This should be discussed on a triage call
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add floating point environment to task state
6 participants