[BUG] AttributeError: 'Parameter' object has no attribute 'scale' #3242
Description
Describe the bug
I tried to apply the DeepSpeed `InferenceEngine` to GPT-J-6B but ran into `AttributeError: 'Parameter' object has no attribute 'scale'`. I can successfully speed up GPT-2 and GPT-NEO this way, and searching the internet turned up no similar reports, so I am not sure what happened. Inference works fine when I use the model loaded directly from Hugging Face; the error only appears after applying the `InferenceEngine` to the model.
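For reference, the plain Hugging Face path that works for me looks roughly like this (a minimal sketch, assuming a single CUDA device):

```python
# Plain Hugging Face inference (no DeepSpeed) — this works in my environment.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

tokens = tokenizer.encode("This is a sample prompt", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(tokens)  # returns logits and past_key_values without error
```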
To Reproduce
Steps to reproduce the behavior:
```python
import torch
import deepspeed
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

text = "This is a sample prompt"
tokens = tokenizer.encode(text, return_tensors="pt").to(ds_model.module.device)
_ = model(tokens)  # raises the AttributeError below
```
Below is the error traceback:

```
------------------------------------------------------
Free memory : 3.268677 (GigaBytes)
Total memory: 15.772339 (GigaBytes)
Requested memory: 0.546875 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f2732000000
------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 161, in forward
    output = self.mlp(attention_output, input, inp_norm, self.attention.attn_ob)
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 65, in forward
    output = self.fused_gemm_gelu(input=residual_norm,
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'
```
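Reading the last frame, `gelu_gemm.py` passes `weight.scale` unconditionally, and I assume that attribute only exists when the weights have gone through a quantization path; a vanilla fp16 `Parameter` does not have it. A standalone sketch of the failing access (my own illustration, not DeepSpeed code):

```python
# A plain fp16 Parameter carries no quantization `scale` attribute,
# which seems to be exactly what the injected kernel expects to find.
import torch

weight = torch.nn.Parameter(torch.randn(4, 4, dtype=torch.float16))
try:
    _ = weight.scale  # what gelu_gemm.py line 26 appears to do
except AttributeError as e:
    print(e)  # 'Parameter' object has no attribute 'scale'
```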
Expected behavior
I would expect the inference call to succeed and return `logits` and `past_key_values`.
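Concretely, continuing from the repro script above, something like this (a sketch):

```python
# Expected outcome: same output type as the plain Hugging Face model.
out = ds_model(tokens)
print(out.logits.shape)          # (batch, seq_len, vocab_size) logits
print(len(out.past_key_values))  # one cache entry per transformer layer
```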
ds_report output
```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['MYPATH1/pytorch/torch']
torch version .................... 2.1.0a0+gitb8580b0
deepspeed install path ........... ['MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.1+cc67f22f, cc67f22f, master
torch cuda version ............... 12.0
torch hip version ................ None
nvcc version ..................... 12.0
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.0
```
Screenshots
No screenshots available
System info (please complete the following information):
- OS: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1051-aws x86_64)
- GPU count and types: 8x V100, but only 1 used in this experiment
- DeepSpeed version: '0.9.1+cc67f22f', installed with `pip install git+...`
- Hugging Face Transformers version: '4.29.0.dev0', installed with `pip install git+...`
- Python version: 3.9.16
- CUDA version: 12.0
Docker context
Not using Docker.
Additional context
None.