Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace several intrinsics with Julia equivalents #22202

Merged
merged 3 commits into from
Jul 8, 2017
Merged

Conversation

timholy
Copy link
Member

@timholy timholy commented Jun 3, 2017

While thinking about #18521/#20005 a bit more, I realized that perhaps an easier approach would be to ensure that every creation of certain exception types occurred from Julia code. To cover both InexactError and DomainError (probably our two most irritating errors), that required Julia replacements for 4 intrinsics.

I've checked that the LLVM code generation is identical, but it's probably a good idea to @nanosoldier runbenchmarks((ALL, vs = ":master").

@timholy
Copy link
Member Author

timholy commented Jun 3, 2017

@nanosoldier runbenchmarks(ALL, vs = ":master")

base/math.jl Outdated
@@ -435,8 +435,14 @@ Compute sine and cosine of `x`, where `x` is in radians.
return res
end

sqrt(x::Float64) = sqrt_llvm(x)
sqrt(x::Float32) = sqrt_llvm(x)
@inline function sqrt(x::Float64)
Copy link
Contributor

@musm musm Jun 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use dispatch for the f32 and f64 version?

src/julia.h Outdated
@@ -521,7 +521,6 @@ extern JL_DLLEXPORT jl_value_t *jl_stackovf_exception;
extern JL_DLLEXPORT jl_value_t *jl_memory_exception;
extern JL_DLLEXPORT jl_value_t *jl_readonlymemory_exception;
extern JL_DLLEXPORT jl_value_t *jl_diverror_exception;
extern JL_DLLEXPORT jl_value_t *jl_domain_exception;
extern JL_DLLEXPORT jl_value_t *jl_overflow_exception;
extern JL_DLLEXPORT jl_value_t *jl_inexact_exception;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand, but if nothing else throws inexact exceptions from C can this be removed (like for domain_exception)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like you're right. There are no uses of cvt_iintrinsic_checked after this, so even though it looks like there is another use of jl_inexact_exception in runtime_intrinsics, it seems to be dead code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have that removed in the next PR, but I might as well move that to this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of that next PR, I'm noticing that the elegant mechanism here doesn't seem to work as I'm suspecting it was intended:

be1(a, i) = (if i > 1 throw(BoundsError(a, i)) end; @inbounds ret = a[1]; ret)
be2(a, i) = (if i > 1 Base.throw_boundserror(a, i) end; @inbounds ret = a[1]; ret)
a = rand(5)

julia> @code_llvm be1(a, 2)

define double @julia_be1_62876(i8** dereferenceable(40), i64) #0 !dbg !5 {
top:
  %ptls_i8 = call i8* asm "movq %fs:0, $0;\0Aaddq $$-10928, $0", "=r,~{dirflag},~{fpsr},~{flags}"() #3
  %ptls = bitcast i8* %ptls_i8 to i8****
  %2 = alloca [3 x i8**], align 8
  %.sub = getelementptr inbounds [3 x i8**], [3 x i8**]* %2, i64 0, i64 0
  %3 = getelementptr [3 x i8**], [3 x i8**]* %2, i64 0, i64 2
  store i8** null, i8*** %3, align 8
  %4 = bitcast [3 x i8**]* %2 to i64*
  store i64 2, i64* %4, align 8
  %5 = getelementptr [3 x i8**], [3 x i8**]* %2, i64 0, i64 1
  %6 = bitcast i8* %ptls_i8 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = bitcast i8*** %5 to i64*
  store i64 %7, i64* %8, align 8
  store i8*** %.sub, i8**** %ptls, align 8
  %9 = icmp slt i64 %1, 2
  br i1 %9, label %L5, label %if

if:                                               ; preds = %top
  %10 = call i8** @jlsys_Type_59491(i8** inttoptr (i64 140329698287600 to i8**), i8** nonnull %0, i64 %1)
  store i8** %10, i8*** %3, align 8
  call void @jl_throw(i8** %10)
  unreachable

L5:                                               ; preds = %top
  %11 = bitcast i8** %0 to double**
  %12 = load double*, double** %11, align 8
  %13 = load double, double* %12, align 8
  %14 = load i64, i64* %8, align 8
  store i64 %14, i64* %6, align 8
  ret double %13
}

julia> @code_llvm be2(a, 2)

define double @julia_be2_62879(i8** dereferenceable(40), i64) #0 !dbg !5 {
top:
  %2 = icmp slt i64 %1, 2
  br i1 %2, label %L5, label %if

if:                                               ; preds = %top
  call void @julia_throw_boundserror_62880(i8** nonnull %0, i64 %1)
  call void @llvm.trap()
  unreachable

L5:                                               ; preds = %top
  %3 = bitcast i8** %0 to double**
  %4 = load double*, double** %3, align 8
  %5 = load double, double* %4, align 8
  ret double %5
}

Obviously I can introduce throw_inexacterror if necessary.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

base/math.jl Outdated
sqrt_llvm(x)
end
@inline function sqrt(x::Float32)
x < 0.0f0 && throw(DomainError())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to just use x < zero(x) and combine the two methods?

@ararslan ararslan added maths Mathematical functions performance Must go faster labels Jun 3, 2017
@timholy timholy force-pushed the teh/purge_intrinsics branch from c3aa2b8 to 9d0d30a Compare June 3, 2017 22:13
@timholy
Copy link
Member Author

timholy commented Jun 3, 2017

So, some of those regressions do not seem to be reproducible, but others are real; one of the most affected is

x = 1+2im
y = UInt(1)+UInt(2)im

myplus(x, y) = x+y

If you inspect @code_llvm x+y, you get identical code with this branch as with master; but @code_llvm myplus(x,y) fails to inline + on this branch (but not master). This seem to arise because our inliner-worthiness algorithm counts expressions, and an intrinsic that generates the same LLVM as julia code nevertheless "looks" simpler by hiding the instructions from the inliner. So while I can force-inline the convert operation itself, things that use the convert will be more heavily penalized.

I'm not exactly sure how to handle this. It seems that the right solution is to improve the inliner (e.g., make it depend on LLVM IR rather than julia expressions), but that also seems like a big job.

@timholy
Copy link
Member Author

timholy commented Jul 8, 2017

Rebased on top of #22210, at least that specific problem is resolved.

Time for another @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @jrevels

@timholy timholy force-pushed the teh/purge_intrinsics branch from 36eb30e to d5ec1c6 Compare July 8, 2017 09:02
@timholy
Copy link
Member Author

timholy commented Jul 8, 2017

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy
Copy link
Member Author

timholy commented Jul 8, 2017

Looks good. The only two of these that are reproducible are ["io","read","readstring"] and ["scalar","intfuncs",("nextpow2","Int64","-")], and locally these are ~10% effects. They are due to very small differences in cost accounting ending up on one side or the other of the inlining threshold (e.g., cost 92 vs 105). If we really cared we could bump the threshold a bit.

@timholy timholy merged commit 0c9a858 into master Jul 8, 2017
@timholy timholy deleted the teh/purge_intrinsics branch July 8, 2017 14:08
ararslan added a commit that referenced this pull request Jul 8, 2017
The only uses of the function were removed in #22202, so building with
Clang previously emitted an unused function warning.
ararslan added a commit that referenced this pull request Jul 9, 2017
The only uses of the function were removed in #22202, so building with
Clang previously emitted an unused function warning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants