
My graphics card has insufficient memory. Can I use system RAM together with GPU memory to run it? #32

Open
Jackxwb opened this issue May 5, 2024 · 1 comment


Jackxwb commented May 5, 2024

My graphics card has insufficient memory. Can I use system RAM together with GPU memory to run it?
My computer:

  • Windows 10
  • GTX 1063 (GTX 1060, 3 GB)
  • 24 GB DDR4 RAM

Error output when running:

python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
    gradio_launch(model_path=args.path, load_in_4bit=True)
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
    interpreter = StreamingLlamaCodeInterpreter(
  File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
    self.model = LlamaForCausalLM.from_pretrained(
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 3119, in from_pretrained
    raise ValueError(
ValueError:
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.

The model is Seungyoun/codellama-7b-instruct-pad.

I tried changing LlamaCodeInterpreter.py:79 to the following code, but running it raised: TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'

self.model = LlamaForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    torch_dtype=torch.float16,
    load_in_8bit_fp32_cpu_offload=True,
)
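The TypeError occurs because `from_pretrained` does not recognize `load_in_8bit_fp32_cpu_offload` and forwards it to the model constructor. In more recent transformers releases the offload switch lives on `BitsAndBytesConfig` instead (as `llm_int8_enable_fp32_cpu_offload`, which despite its name also governs 4-bit loading), and `max_memory` caps per-device usage. A minimal sketch of that approach, assuming such a transformers version and with illustrative (not measured) memory caps:

```python
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Assumption: a transformers version whose BitsAndBytesConfig exposes
# llm_int8_enable_fp32_cpu_offload; modules that cannot be quantized
# onto the GPU are then kept on the CPU instead of raising ValueError.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Cap GPU usage and let the remainder spill into system RAM.
# The "2GiB"/"16GiB" figures are illustrative assumptions for a
# 3 GB card with 24 GB of RAM, not measured values.
max_memory = {0: "2GiB", "cpu": "16GiB"}

self.model = LlamaForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    max_memory=max_memory,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)
```

Note that layers dispatched to the CPU run unquantized in fp32, so generation will be much slower than an all-GPU setup.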

Complete log of that run:

python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
    gradio_launch(model_path=args.path, load_in_4bit=True)
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
    interpreter = StreamingLlamaCodeInterpreter(
  File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
    self.model = LlamaForCausalLM.from_pretrained(
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 2959, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'
@yarou1025

For the Gradio warnings, try: pip install gradio==4.44.0
