Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster rand!(::MersenneTwister, ::Array{Bool}) #33721

Merged
merged 4 commits into from
May 4, 2020
Merged

Conversation

rfourquet
Copy link
Member

This uses the same optimizations as for other bits types,
and gives equivalent performance as for UInt8 (at least
7x to 9x speedup in few tested cases).

This seems to work, but I'm not sure whether it's valid to write arbitrary bit patterns at the memory pointed to by pointer(b::Array{Bool}).

@rfourquet rfourquet added performance Must go faster randomness Random number generation and the Random stdlib labels Oct 30, 2019
@chriselrod
Copy link
Contributor

chriselrod commented Oct 30, 2019

Seems like you would have to specifically reinterpret a pointer to the Bool array to notice the difference:

julia> bar(b) = @inbounds reinterpret(UInt8, b[1])
bar (generic function with 1 method)

julia> unsafe_load(reinterpret(Ptr{UInt8}, pointer(b)))
0xff

julia> bar(b)
0x01

At any getindex (like in bar), seems that there is always a & 1 to eliminate the top bits.

julia> @code_llvm bar(b)

;  @ REPL[110]:1 within `bar'
; Function Attrs: uwtable
define i8 @julia_bar_19418(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) #0 {
top:
; ┌ @ array.jl:758 within `getindex'
   %1 = addrspacecast %jl_value_t addrspace(10)* %0 to %jl_value_t addrspace(11)*
   %2 = bitcast %jl_value_t addrspace(11)* %1 to i8 addrspace(13)* addrspace(11)*
   %3 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %2, align 8
   %4 = load i8, i8 addrspace(13)* %3, align 1
   %5 = and i8 %4, 1
; └
  ret i8 %5
}

@rfourquet
Copy link
Member Author

I'm not sure I feel comfortable with the fact that this implementation would be observable. It's easy to expect that at most 1 bit will be set per byte. On slack @jakobnissen was giving the example of loading 8 Bool at a time and calling count_ones on the resulting UInt64 as an optimized way to sum 8 Bools, which would break here.

@rfourquet
Copy link
Member Author

Ok, so I updated with a specific algorithm for Array{Bool} (instead of using the one for Array{UInt8}), which avoids writing non-zero bits in other positions than the LSB of each byte. rand!(::Array{Bool}) then gets slightly faster than rand!(::Array{UInt8}).

@jakobnissen
Copy link
Contributor

@rfourquet Looks good to me, except I think you forgot to protect the array from reallocation/garbage collection with GC.@preserve.

@rfourquet rfourquet force-pushed the rf/MT-rand-bang-Bool branch from ceb2cd2 to 13a7c0a Compare October 31, 2019 11:44
@vtjnash
Copy link
Member

vtjnash commented Oct 31, 2019

Writing garbage is technically supposed to be OK because the array may have not been zero-init, we need to handle that case gracefully. But usually best to stick with valid bit representations. (especially when it’s faster!)

@rfourquet
Copy link
Member Author

Ah right, I didn't about uninitialized arrays. What makes the algo faster now is that its specialized, masking to not write garbage still costs maybe 5% performance, which seems like a fine compromise to me.

@chethega
Copy link
Contributor

chethega commented Nov 1, 2019

For example, copy_to_bitarray_chunks!(Bc::Vector{UInt64}, pos_d::Int, C::Array{Bool}, pos_s::Int, numbits::Int) in base/bitarray.jl assumes that at most the lowest bit of each Bool is set, and will produce garbage otherwise.

We could of course make that more robust, but usercode might make the same assumptions.

@rfourquet
Copy link
Member Author

Bump

@rfourquet
Copy link
Member Author

I added some test code for this change in another PR, as said test code revealed a bug bug in RandomDevice.
I will merge in a couple of days if no objection.

rfourquet added 4 commits May 2, 2020 19:02
This uses the same optimizations as for other bits types,
and gives equivalent performance as for `UInt8` (at least
7x to 9x speedup in few tested cases).
@rfourquet rfourquet force-pushed the rf/MT-rand-bang-Bool branch from b3513f5 to 751d0b7 Compare May 2, 2020 17:14
@rfourquet
Copy link
Member Author

@nanosoldier runtests(ALL, vs =":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@rfourquet rfourquet merged commit a1f6448 into master May 4, 2020
@rfourquet rfourquet deleted the rf/MT-rand-bang-Bool branch May 4, 2020 10:04
@rfourquet
Copy link
Member Author

The three PkgEval failures seem unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants