
OneHotMatrix causes a 'scalar getindex disallowed' error on GPU #1006

Closed
tanhevg opened this issue Jan 27, 2020 · 9 comments


tanhevg commented Jan 27, 2020

I think this issue is causing many reported bugs that complain about "slowness" of OneHotMatrix

I suspect the underlying issue is that OneHotVector is not properly adapted for GPU storage, and OneHotMatrix stores a vector of OneHotVectors. A workaround would be to change OneHotMatrix to store a Vector{Int} and a size, as proposed by #578. It would then be easy to adapt that vector for GPU storage.
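A minimal sketch of that layout (hypothetical type and field names, not Flux code):

struct IndexedOneHotMatrix{I<:AbstractVector{<:Integer}} <: AbstractMatrix{Bool}
    height::Int   # number of rows, i.e. the label count
    indices::I    # indices[j] is the hot row of column j; may be a CuArray
end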

MWE:

using Flux, CuArrays
CuArrays.allowscalar(false)   # make scalar indexing of GPU arrays an error
using Flux: onehotbatch

ohb = onehotbatch(rand(1:10, 100), 1:10) |> gpu;   # 10×100 OneHotMatrix
dl = Dense(10, 5) |> gpu;

dl(ohb);   # triggers the scalar getindex below

ERROR: scalar getindex is disallowed
Stacktrace:
 [1] assertscalar(::String) at /data/.julia/packages/GPUArrays/1wgPO/src/indexing.jl:14
 [2] getindex at /data/.julia/packages/GPUArrays/1wgPO/src/indexing.jl:54 [inlined]
 [3] iterate at ./abstractarray.jl:914 [inlined]
 [4] iterate at ./abstractarray.jl:912 [inlined]
 [5] checkindex at ./abstractarray.jl:572 [inlined]
 [6] checkbounds_indices at ./abstractarray.jl:529 [inlined] (repeats 2 times)
 [7] checkbounds at ./abstractarray.jl:482 [inlined]
 [8] checkbounds at ./abstractarray.jl:503 [inlined]
 [9] _getindex at ./multidimensional.jl:669 [inlined]
 [10] getindex at ./abstractarray.jl:981 [inlined]
 [11] *(::CuArray{Float32,2,Nothing}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1,Nothing}}) at /data/.julia/packages/Flux/2i5P1/src/onehot.jl:30
 [12] (::Dense{typeof(identity),CuArray{Float32,2,Nothing},CuArray{Float32,1,Nothing}})(::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1,Nothing}}) at /data/.julia/packages/Flux/2i5P1/src/layers/basic.jl:102
 [13] top-level scope at none:0
@DhairyaLGandhi (Member)

Could you try with #764?


tanhevg commented Jan 27, 2020

@dhairyagandhi96, #764 makes no difference; it changes onecold, which is not called in this example.


mrchaos commented Feb 3, 2020

julia> ohb = float.(onehotbatch(rand(1:10, 100), 1:10)) |> gpu
10×100 CuArray{Float32,2,Nothing}:
....
or
julia> ohb = cu.(onehotbatch(rand(1:10, 100), 1:10)) |> gpu
10×100 CuArray{Float32,2,Nothing}:
....

julia> dl(ohb)
5×100 CuArray{Float32,2,Nothing}:
....

@DhairyaLGandhi (Member)

Right, I misread that; apologies. Minimizing shouldn't be hard. It's probably in the matmul, but we'd need to cross-check.


tanhevg commented Feb 3, 2020

As far as I understand, the point of having OneHotMatrix as a separate type is to replace the expensive matmul with an inexpensive indexing operation. This optimisation is lost in the workaround proposed by @mrchaos.
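For reference, the specialisation at frame [11] of the stack trace reduces the product to column selection; a paraphrase (field name ix from the old OneHotVector, not the verbatim Flux source):

# Paraphrase of the Base.:* specialisation at onehot.jl:30:
Base.:*(A::AbstractMatrix, B::OneHotMatrix) = A[:, map(v -> v.ix, B.data)]
# On the GPU this indexing path hits checkbounds, which iterates the
# index collection element by element — the disallowed scalar getindex.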

It also does not currently work properly with CuArrays, as demonstrated in the OP. I suppose the following code from onehot.jl is meant to make it work, but evidently it does not.

adapt_structure(T, xs::OneHotMatrix) = OneHotMatrix(xs.height, adapt(T, xs.data))

import .CuArrays: CuArray, cudaconvert
import Base.Broadcast: BroadcastStyle, ArrayStyle
BroadcastStyle(::Type{<:OneHotMatrix{<:CuArray}}) = ArrayStyle{CuArray}()
cudaconvert(x::OneHotMatrix{<:CuArray}) = OneHotMatrix(x.height, cudaconvert(x.data))

I am not 100% sure of the correct way to use the Adapt.jl functions to achieve the desired behaviour (the docs are a bit scarce). One workaround I can think of: instead of holding a vector of OneHotVectors inside OneHotMatrix, hold an array of indices (Vector{Int}), as sketched above.
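Continuing the hypothetical IndexedOneHotMatrix sketch from earlier in the thread, the cheap matmul survives and Adapt only has to move one array:

using Adapt

Base.size(m::IndexedOneHotMatrix) = (m.height, length(m.indices))
Base.getindex(m::IndexedOneHotMatrix, i::Integer, j::Integer) = m.indices[j] == i

# W * X stays a column selection; CuArrays implements A[:, indices]
# with a gather kernel, so no scalar getindex is needed.
Base.:*(A::AbstractMatrix, B::IndexedOneHotMatrix) = A[:, B.indices]

# Adapt only converts the one index vector (e.g. Vector{Int} -> CuArray):
Adapt.adapt_structure(T, m::IndexedOneHotMatrix) =
    IndexedOneHotMatrix(m.height, adapt(T, m.indices))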

@CarloLucibello (Member)

In any case, we don't have this problem when computing crossentropies (which is the main reason why OneHotMatrix is there).

using Flux, CuArrays
CuArrays.allowscalar(false)
using Flux: onehotbatch

ohb = onehotbatch(rand(1:10, 100), 1:10) |> gpu;
ŷ = CuArrays.rand(size(ohb)...)
Flux.crossentropy(ŷ, ohb)
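(My reading of why this path is fine, not verified line by line: the loss only broadcasts against the one-hot mask, and the BroadcastStyle definition quoted above routes OneHotMatrix{<:CuArray} broadcasts to the CuArray style. Schematically:)

# Schematic of the shape of the computation, not Flux's exact code:
# the one-hot matrix only ever appears in broadcasts, never in getindex.
loss(ŷ, y) = -sum(y .* log.(ŷ)) / size(y, 2)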

I'm not sure whether OneHotMatrix was ever meant to be a fully-fledged AbstractMatrix to use as an input to a model.


tanhevg commented Mar 2, 2020

> I'm not sure whether OneHotMatrix was ever meant to be a fully-fledged AbstractMatrix to use as an input to a model.

One-hot encoding is a standard technique in language modelling and other deep-learning applications. When training such models on the GPU with Flux, one currently has to fall back to dense arrays of zeros and ones. This is not ideal: either OneHotMatrix should be fixed to support one-hot encoding on the GPU (it seems a natural fit), or a separate type should be provided.
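For concreteness, a small CPU example of the encoding in question, using Flux's onehotbatch/onecold API (vocabulary chosen arbitrarily):

using Flux: onehotbatch, onecold

vocab = ['h', 'e', 'l', 'o']
x = onehotbatch(collect("hello"), vocab)   # 4×5 OneHotMatrix
onecold(x, vocab)                          # recovers ['h', 'e', 'l', 'l', 'o']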


tanhevg commented Mar 12, 2020

#958


tanhevg commented Sep 24, 2020

Fixed by JuliaGPU/CUDA.jl#90. Make sure to use Julia 1.5, and CUDA.jl instead of CuArrays.

tanhevg closed this as completed Sep 24, 2020