Change find() to return the same index type as pairs() #24774

Merged 4 commits on Jan 10, 2018
16 changes: 11 additions & 5 deletions NEWS.md
@@ -362,13 +362,19 @@ This section lists changes that do not have deprecation warnings.
trait; see its documentation for details. Types which support subtraction (operator
`-`) must now implement `widen` for hashing to work inside heterogeneous arrays.

* `AbstractSet` objects are now considered equal by `==` and `isequal` if all of their
* `findn(x::AbstractVector)` now returns a 1-tuple with the vector of indices, to be
consistent with higher order arrays ([#25365]).

* `find` now returns the same type of indices as `keys`/`pairs` for `AbstractArray`,
`AbstractDict`, `AbstractString`, `Tuple` and `NamedTuple` objects ([#24774]).
In particular, this means that it returns `CartesianIndex` objects for matrices
and higher-dimensional arrays instead of linear indices as was previously the case.
Use `Int[LinearIndices(size(a))[i] for i in find(f, a)]` to compute linear indices.

* `AbstractSet` objects are now considered equal by `==` and `isequal` if all of their
elements are equal ([#25368]). This has required changing the hashing algorithm
for `BitSet`.

* `findn(x::AbstractVector)` now return a 1-tuple with the vector of indices, to be
consistent with higher order arrays ([#25365]).

* the default behavior of `titlecase` is changed in two ways ([#23393]):
+ characters not starting a word are converted to lowercase;
a new keyword argument `strict` is added which
@@ -377,7 +383,6 @@ This section lists changes that do not have deprecation warnings.
to get the old behavior (only "space" characters are considered as
word separators), use the keyword `wordsep=isspace`.


Library improvements
--------------------

@@ -1155,6 +1160,7 @@ Command-line option changes
[#24713]: https://github.com/JuliaLang/julia/issues/24713
[#24714]: https://github.com/JuliaLang/julia/issues/24714
[#24715]: https://github.com/JuliaLang/julia/issues/24715
[#24774]: https://github.com/JuliaLang/julia/issues/24774
[#24781]: https://github.com/JuliaLang/julia/issues/24781
[#24785]: https://github.com/JuliaLang/julia/issues/24785
[#24786]: https://github.com/JuliaLang/julia/issues/24786
86 changes: 43 additions & 43 deletions base/array.jl
@@ -1720,48 +1720,60 @@ findlast(testf::Function, A) = findprev(testf, A, endof(A))
"""
find(f::Function, A)

Return a vector `I` of the linear indices of `A` where `f(A[I])` returns `true`.
Return a vector `I` of the indices or keys of `A` where `f(A[I])` returns `true`.
If there are no such elements of `A`, return an empty array.

Indices or keys are of the same type as those returned by [`keys(A)`](@ref)
and [`pairs(A)`](@ref) for `AbstractArray`, `AbstractDict`, `AbstractString`,
`Tuple` and `NamedTuple` objects, and are linear indices starting at `1`
for other iterables.

# Examples
```jldoctest
julia> x = [1, 3, 4]
3-element Array{Int64,1}:
1
3
4

julia> find(isodd, x)
2-element Array{Int64,1}:
1
2

julia> A = [1 2 0; 3 4 0]
2×3 Array{Int64,2}:
1 2 0
3 4 0

julia> find(isodd, A)
2-element Array{Int64,1}:
1
2
2-element Array{CartesianIndex{2},1}:
CartesianIndex(1, 1)
CartesianIndex(2, 1)

julia> find(!iszero, A)
4-element Array{Int64,1}:
1
2
3
4
4-element Array{CartesianIndex{2},1}:
CartesianIndex(1, 1)
CartesianIndex(2, 1)
CartesianIndex(1, 2)
CartesianIndex(2, 2)

julia> d = Dict(:A => 10, :B => -1, :C => 0)
Dict{Symbol,Int64} with 3 entries:
:A => 10
:B => -1
:C => 0

julia> find(x -> x >= 0, d)
2-element Array{Symbol,1}:
:A
:C

julia> find(isodd, [2, 4])
0-element Array{Int64,1}
```
"""
function find(testf::Function, A)
    # use a dynamic-length array to store the indices, then copy to a non-padded
    # array for the return
    tmpI = Vector{Int}()
    inds = _index_remapper(A)
    for (i,a) = enumerate(A)
        if testf(a)
            push!(tmpI, inds[i])
        end
    end
    I = Vector{Int}(uninitialized, length(tmpI))
    copyto!(I, tmpI)
    return I
end
_index_remapper(A::AbstractArray) = linearindices(A)
_index_remapper(iter) = OneTo(typemax(Int)) # safe for objects that don't implement length
find(testf::Function, A) = collect(first(p) for p in _pairs(A) if testf(last(p)))

_pairs(A::Union{AbstractArray, AbstractDict, AbstractString, Tuple, NamedTuple}) = pairs(A)
_pairs(iter) = zip(OneTo(typemax(Int)), iter) # safe for objects that don't implement length

"""
find(A)
@@ -1786,22 +1798,10 @@ julia> find(falses(3))
```
"""
function find(A)
nnzA = count(t -> t != 0, A)
Member Author

Due to issues with inference of the returned eltype (see commit message), I've taken the radical approach of simply using `collect` instead of the custom loops. This means we no longer compute the length of the result before filling it. Benchmarks will be needed to determine the best approach, but at first sight it isn't obvious to me that making two passes over the data is a good tradeoff. Is it?

Member

Historically it was worth it, but it might be worth benchmarking again. See also https://discourse.julialang.org/t/push-and-interfacing-to-the-runtime-library/7461, which suggests that two passes might still be faster (growing an array with `push!` is 3x slower than growing it with `setindex!`).

Member

See also https://discourse.julialang.org/t/half-vectorization/7399/3, which benchmarks some conditional comprehensions with somewhat alarming results. I think you really need to benchmark this change before we can decide.
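
The comparison under discussion could be benchmarked with something like the sketch below. It is only an illustration, written for Julia 1.x (where `uninitialized` became `undef`), and restricted to vectors for simplicity. The names `find_two_pass` and `find_single_pass` are hypothetical and are not functions from this PR; `@elapsed` gives only a crude timing, so a dedicated benchmarking tool would be needed before drawing conclusions.

```julia
# Two-pass variant: count the matches first, then fill a preallocated vector.
# (Hypothetical helper, mirroring the structure of the old implementation.)
function find_two_pass(f, A::AbstractVector)
    n = count(f, A)
    I = Vector{Int}(undef, n)
    cnt = 1
    for (i, a) in pairs(A)
        if f(a)
            I[cnt] = i
            cnt += 1
        end
    end
    return I
end

# Single-pass variant: let `collect` grow the result as it iterates.
# (Hypothetical helper, mirroring the structure of the new implementation.)
find_single_pass(f, A) = collect(first(p) for p in pairs(A) if f(last(p)))

x = rand(1:10, 10^6)

# Both variants must agree before timings mean anything.
@assert find_two_pass(isodd, x) == find_single_pass(isodd, x)

# Warm up (compile) both, then take crude timings.
find_two_pass(isodd, x); find_single_pass(isodd, x)
t_two = @elapsed find_two_pass(isodd, x)
t_one = @elapsed find_single_pass(isodd, x)
println("two passes: ", t_two, " s; single pass: ", t_one, " s")
```

The relative timings will depend on the Julia version, the element type and the fraction of matching elements, so no expected output is shown.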

I = Vector{Int}(uninitialized, nnzA)
cnt = 1
inds = _index_remapper(A)
warned = false
for (i,a) in enumerate(A)
if !warned && !(a isa Bool)
depwarn("In the future `find(A)` will only work on boolean collections. Use `find(x->x!=0, A)` instead.", :find)
warned = true
end
if a != 0
I[cnt] = inds[i]
cnt += 1
end
if !(eltype(A) === Bool) && !all(x -> x isa Bool, A)
depwarn("In the future `find(A)` will only work on boolean collections. Use `find(x->x!=0, A)` instead.", :find)
end
return I
collect(first(p) for p in _pairs(A) if last(p) != 0)
end

find(x::Bool) = x ? [1] : Vector{Int}()
2 changes: 1 addition & 1 deletion base/sparse/sparsematrix.jl
@@ -1276,7 +1276,7 @@ function find(p::Function, S::SparseMatrixCSC)
end
sz = size(S)
I, J = _findn(p, S)
return Base._sub2ind(sz, I, J)
return CartesianIndex.(I, J)
end
find(p::Base.OccursIn, x::SparseMatrixCSC) =
invoke(find, Tuple{Base.OccursIn, AbstractArray}, p, x)
6 changes: 6 additions & 0 deletions test/arrayops.jl
@@ -457,6 +457,12 @@ end
@test findnext(equalto(0x00), [0x00, 0x01, 0x00], 2) == 3
@test findprev(equalto(0x00), [0x00, 0x01, 0x00], 2) == 1
end
@testset "find with Matrix" begin
    A = [1 2 0; 3 4 0]
    @test find(isodd, A) == [CartesianIndex(1, 1), CartesianIndex(2, 1)]
    @test find(!iszero, A) == [CartesianIndex(1, 1), CartesianIndex(2, 1),
                               CartesianIndex(1, 2), CartesianIndex(2, 2)]
end
@testset "find with general iterables" begin
s = "julia"
@test find(c -> c == 'l', s) == [3]
7 changes: 7 additions & 0 deletions test/dict.jl
@@ -757,3 +757,10 @@ end
end
@test map(string, keys(d)) == Set(["1","3"])
end

@testset "find" begin
    @test @inferred find(equalto(1), Dict(:a=>1, :b=>2)) == [:a]
    @test @inferred sort(find(equalto(1), Dict(:a=>1, :b=>1))) == [:a, :b]
    @test @inferred isempty(find(equalto(1), Dict()))
    @test @inferred isempty(find(equalto(1), Dict(:a=>2, :b=>3)))
end
5 changes: 5 additions & 0 deletions test/namedtuple.jl
@@ -208,3 +208,8 @@ abstr_nt_22194_3()
@test Base.structdiff((a=1, b=2, z=20), NamedTuple{(:b,)}) == (a=1, z=20)
@test typeof(Base.structdiff(NamedTuple{(:a, :b), Tuple{Int32, Union{Int32, Nothing}}}((1, Int32(2))),
(a=0,))) === NamedTuple{(:b,), Tuple{Union{Int32, Nothing}}}

@test @inferred find(equalto(1), (a=1, b=2)) == [:a]
@test @inferred find(equalto(1), (a=1, b=1)) == [:a, :b]
@test @inferred isempty(find(equalto(1), NamedTuple()))
@test @inferred isempty(find(equalto(1), (a=2, b=3)))
4 changes: 4 additions & 0 deletions test/strings/search.jl
@@ -324,3 +324,7 @@ end
@test findnext(equalto('('), "(⨳(", 2) == 5
@test findlast(equalto('('), "(⨳(") == 5
@test findprev(equalto('('), "(⨳(", 2) == 1

@test @inferred find(equalto('a'), "éa") == [3]
@test @inferred find(equalto('€'), "€€") == [1, 4]
@test @inferred isempty(find(equalto('é'), ""))
7 changes: 7 additions & 0 deletions test/tuple.jl
@@ -364,3 +364,10 @@ end
@testset "issue 24707" begin
@test eltype(Tuple{Vararg{T}} where T<:Integer) >: Integer
end

@testset "find" begin
    @test @inferred find(equalto(1), (1, 2)) == [1]
    @test @inferred find(equalto(1), (1, 1)) == [1, 2]
    @test @inferred isempty(find(equalto(1), ()))
    @test @inferred isempty(find(equalto(1), (2, 3)))
end
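
For readers who want to try the index types these tests check, here is a minimal sketch. It assumes Julia 1.x, where `find` was later renamed `findall` and `equalto` became `isequal`, so it does not run on the revision this PR targets. The variable names are illustrative only. The linear-index conversion follows the recipe given in the NEWS entry.

```julia
A = [1 2 0; 3 4 0]

# Matrices yield CartesianIndex objects, the same key type as keys(A)/pairs(A).
cart = findall(isodd, A)

# Convert to linear indices when the old behavior is needed.
lin = Int[LinearIndices(size(A))[i] for i in cart]

# Strings yield byte indices, so multi-byte characters shift later positions.
str_idx = findall(isequal('a'), "éa")

# Dicts and named tuples yield their keys; plain tuples yield integer positions.
dict_keys = findall(isequal(1), Dict(:a => 1, :b => 2))
nt_keys = findall(isequal(1), (a = 1, b = 2))
tup_idx = findall(isequal(1), (1, 1))
```

Each result has the element type of the corresponding `keys(...)` iterator.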