[BUG] AttributeError: 'Parameter' object has no attribute 'scale' #3242
Description
Describe the bug
I tried to apply the DeepSpeed `InferenceEngine` to GPT-J-6B but ran into `AttributeError: 'Parameter' object has no attribute 'scale'`. I can successfully speed up GPT-2 and GPT-NEO this way, and searching the internet turned up no similar reports, so I am not sure what happened. Inference works fine when I use the model loaded directly from Hugging Face; the error only appears after applying the `InferenceEngine` to the model.
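For reference, the plain Hugging Face path that works for me looks roughly like this (a minimal sketch, assuming a single CUDA device):

```python
# Plain Hugging Face inference (no DeepSpeed) — this works in my environment.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

tokens = tokenizer.encode("This is a sample prompt", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(tokens)  # returns logits and past_key_values without error
```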
To Reproduce
Steps to reproduce the behavior:
```python
import torch
import deepspeed
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

text = "This is a sample prompt"
tokens = tokenizer.encode(text, return_tensors="pt").to(ds_model.module.device)
_ = model(tokens)  # raises the AttributeError below
```
Below is the error traceback:

```
------------------------------------------------------
Free memory : 3.268677 (GigaBytes)
Total memory: 15.772339 (GigaBytes)
Requested memory: 0.546875 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f2732000000
------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 161, in forward
    output = self.mlp(attention_output, input, inp_norm, self.attention.attn_ob)
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 65, in forward
    output = self.fused_gemm_gelu(input=residual_norm,
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'
```
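Reading the last frame, `gelu_gemm.py` passes `weight.scale` unconditionally, and I assume that attribute only exists when the weights have gone through a quantization path; a vanilla fp16 `Parameter` does not have it. A standalone sketch of the failing access (my own illustration, not DeepSpeed code):

```python
# A plain fp16 Parameter carries no quantization `scale` attribute,
# which seems to be exactly what the injected kernel expects to find.
import torch

weight = torch.nn.Parameter(torch.randn(4, 4, dtype=torch.float16))
try:
    _ = weight.scale  # what gelu_gemm.py line 26 appears to do
except AttributeError as e:
    print(e)  # 'Parameter' object has no attribute 'scale'
```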
Expected behavior
I would expect the inference call to succeed and return `logits` and `past_key_values`.
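Concretely, continuing from the repro script above, something like this (a sketch):

```python
# Expected outcome: same output type as the plain Hugging Face model.
out = ds_model(tokens)
print(out.logits.shape)          # (batch, seq_len, vocab_size) logits
print(len(out.past_key_values))  # one cache entry per transformer layer
```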
ds_report output
```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['MYPATH1/pytorch/torch']
torch version .................... 2.1.0a0+gitb8580b0
deepspeed install path ........... ['MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.1+cc67f22f, cc67f22f, master
torch cuda version ............... 12.0
torch hip version ................ None
nvcc version ..................... 12.0
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.0
```
Screenshots
No screenshots available
System info (please complete the following information):
- OS: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1051-aws x86_64)
- GPU count and types: 8x V100, but only 1 used in this experiment
- DeepSpeed version: '0.9.1+cc67f22f', installed with `pip install git+...`
- Hugging Face Transformers version: '4.29.0.dev0', installed with `pip install git+...`
- Python version: 3.9.16
- CUDA version: 12.0
Docker context
Not using Docker.
Additional context
None.