Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rand(::String) #22224

Merged
merged 7 commits into from
Jun 15, 2017
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
add rand(::String)
  • Loading branch information
rfourquet committed Jun 13, 2017
commit 459a12698639c4280696459b5cbf0821038fdc60
43 changes: 41 additions & 2 deletions base/random.jl
Original file line number Diff line number Diff line change
Expand Up @@ -256,8 +256,9 @@ globalRNG() = GLOBAL_RNG

Pick a random element or array of random elements from the set of values specified by `S`; `S` can be

* an indexable collection (for example `1:n` or `['x','y','z']`), or
* an `Associative` or `AbstractSet` object, or
* an indexable collection (for example `1:n` or `['x','y','z']`),
* an `Associative` or `AbstractSet` object,
* a string (considered as a collection of characters),
Copy link
Member

@stevengj stevengj Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need "or" at the end of this line (or at the beginning of the next line)

* a type: the set of values to pick from is then equivalent to `typemin(S):typemax(S)` for
integers (this is not applicable to [`BigInt`](@ref)), and to ``[0, 1)`` for floating
point numbers;
Expand Down Expand Up @@ -709,6 +710,44 @@ end
rand(rng::AbstractRNG, r::AbstractArray{T}, dims::Dims) where {T} = rand!(rng, Array{T}(dims), r)
rand(rng::AbstractRNG, r::AbstractArray, dims::Integer...) = rand(rng, r, convert(Dims, dims))

# rand from a string

rand(rng::AbstractRNG, s::AbstractString) = rand_iter(rng, s, Base.iteratorsize(s))
rand(s::AbstractString) = rand(GLOBAL_RNG, s)

## optimization for String

isvalid_unsafe(s::String, i) = !Base.is_valid_continuation(unsafe_load(pointer(s), i))

function rand(rng::AbstractRNG, s::String)
rg = Base.Random.RangeGenerator(1:s.len)
while true
pos = rand(Base.Random.GLOBAL_RNG, rg)
isvalid_unsafe(s, pos) && return s[pos]
end
end

## rand from an iterator

rand_iter(rng, s, ::Base.IteratorSize) = throw(ArgumentError("iterator must have a known length"))

function rand_iter(rng::AbstractRNG, s, ::Union{Base.HasShape,Base.HasLength})::eltype(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is the return type annotation necessary here? c is the value that will be returned and isn't that already guaranteed to have type eltype(s) (or at least <:eltype(s) if eltype(s) is abstract) given that it's an element of s?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inference probably can't determine that i == pos is guaranteed to be true for at least one value of the enumerate loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Tony is spot on, and also initially I thought to propose this PR working generally on iterators, and the return type annotation could serve as an additional way to check whether the argument was complying to the iterator interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, nice. Thanks for the explanation, both of you.

pos = rand(rng, 1:length(s))
for (i, c) in enumerate(s)
i == pos && return c
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little skeptical of the rand_iter function and AbstractString support. People using rand with other string types will probably not expect this function to be O(n); it seems preferable to throw a MethodError rather than to use this implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I had the same hesitation and had planned to update the doc today to warn about the complexity. Then I think that with a proper warning those convenience methods (similarly for AbstractSet and Associative at #22228) are not harmful and can be handy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also do rejection sampling: randomly choose and index from start(s):endof(s) and retry if it's not a valid start of a character – i.e. the same thing you do for String.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great! for some reason, I skimmed too fast on the generic code of isvalid and thought it was O(n).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was confused not about isvalid, but about the fact that I thought I needed to compute the length somehow, which is O(n) in the generic method. Thanks again @StefanKarpinski, I updated the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StefanKarpinski would you mind having a look at this updated generic implementation, to confirm it complies with AbstractString API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to do i = rand(rng, g); isvalid(s, i) && return i since various string types are expected to implement that without the try/catch. The try/catch only exists as a last resort fall-back, and I believe it is more targeted and only catches UnicodeError exceptions (if it's not already, it should be).

Copy link
Member Author

@rfourquet rfourquet Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks (I believe the idea being that try/catch an exception is expected to occur, and slow down the computation when the isvalid call would generally be faster). I will update. Note that the generic isvalid catches all exceptions (not only UnicodeError). I'm not familiar with strings, but I can open a corresponding simple PR if it helps.


## rand from a string for arrays
# we use collect(str), which is most of the time more efficient than specialized methods
# (except maybe for very small arrays)
rand!(rng::AbstractRNG, A::AbstractArray, str::AbstractString) = rand!(rng, A, collect(str))
rand!(A::AbstractArray, str::AbstractString) = rand!(GLOBAL_RNG, A, str)
rand(rng::AbstractRNG, str::AbstractString, dims::Dims) = rand!(rng, Array{eltype(str)}(dims), str)
rand(rng::AbstractRNG, str::AbstractString, d1::Integer, dims::Integer...) = rand(rng, str, convert(Dims, tuple(d1, dims...)))
rand(str::AbstractString, dims::Dims) = rand(GLOBAL_RNG, str, dims)
rand(str::AbstractString, d1::Integer, dims::Integer...) = rand(GLOBAL_RNG, str, d1, dims...)

## random BitArrays (AbstractRNG)

function rand!(rng::AbstractRNG, B::BitArray)
Expand Down
16 changes: 10 additions & 6 deletions test/random.jl
Original file line number Diff line number Diff line change
Expand Up @@ -319,8 +319,9 @@ for rng in ([], [MersenneTwister(0)], [RandomDevice()])
1:100 => Int,
rand(Int, 100) => Int,
Int => Int,
Float64 => Float64]

Float64 => Float64,
"qwèrtï" => Char,
GenericString("qwèrtï") => Char]
b2 = big(2)
u3 = UInt(3)
for f in [rand, randn, randexp]
Expand All @@ -346,10 +347,13 @@ for rng in ([], [MersenneTwister(0)], [RandomDevice()])
a0 = rand(rng..., C) ::T
a1 = rand(rng..., C, 5) ::Vector{T}
a2 = rand(rng..., C, 2, 3) ::Array{T, 2}
a3 = rand(rng..., C, b2, u3) ::Array{T, 2}
a4 = rand!(rng..., Array{T}(5), C) ::Vector{T}
a5 = rand!(rng..., Array{T}(2, 3), C) ::Array{T, 2}
for a in [a0, a1..., a2..., a3..., a4..., a5...]
a3 = rand(rng..., C, (2, 3)) ::Array{T, 2}
a4 = rand(rng..., C, b2, u3) ::Array{T, 2}
a5 = rand!(rng..., Array{T}(5), C) ::Vector{T}
a6 = rand!(rng..., Array{T}(2, 3), C) ::Array{T, 2}
@test size(a1) == (5,)
@test size(a2) == size(a3) == (2, 3)
for a in [a0, a1..., a2..., a3..., a4..., a5..., a6...]
if C isa Type
@test a isa C
else
Expand Down