
OnlineDPO error when using DeepSpeed ZeRO-3 #7

Open
@August-murr

Description

System Info

transformers 4.47.0
triton 3.0.0
trl 0.12.1
trove-classifiers 2024.10.21.16
truststore 0.8.0
typer 0.14.0
types-dataclasses 0.6.6
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.2
tzlocal 5.2
ujson 5.10.0
urllib3 2.2.2
utils 1.0.2
uvicorn 0.32.1
uvloop 0.21.0
virtualenv 20.28.0
vllm 0.6.3
vllm-flash-attn 2.6.1

trl env
Copy-paste the following information when reporting an issue:

Platform: Linux-5.4.143-2-velinux1-amd64-x86_64-with-glibc2.35
Python version: 3.11.9
PyTorch version: 2.4.0
CUDA device(s): NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB, NVIDIA A100-SXM4-80GB
Transformers version: 4.47.0
Accelerate version: 1.1.1
Accelerate config: not found
Datasets version: 3.1.0
HF Hub version: 0.26.3
TRL version: 0.12.1
bitsandbytes version: 0.45.0
DeepSpeed version: 0.16.1
Diffusers version: not installed
Liger-Kernel version: not installed
LLM-Blender version: 0.0.2
OpenAI version: 1.57.0
PEFT version: 0.13.2

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

import json

import numpy as np
from torch.utils.data import Dataset

# `logger` and `TC` (template constants such as TC.DS_EOS_TOKEN) come from the
# surrounding project and are not shown in this report.


class UnifiedDPODataset(Dataset):
    """
    Unified DPO dataset.
    """
    def __init__(self, file, tokenizer, max_seq_length, max_prompt_length, template,
                 maximum_es_score, minimum_es_score, bool_training: bool):
        self.tokenizer = tokenizer
        self.template_name = template.template_name
        # may be None
        self.system_format = template.system_format
        self.user_format = template.user_format
        self.assistant_format = template.assistant_format
        self.system = template.system

        self.max_seq_length = max_seq_length
        self.max_prompt_length = max_prompt_length
        logger.info('Loading data: {}'.format(file))
        with open(file, 'r', encoding='utf-8') as f:
            raw_data_list = f.readlines()
            # filter the data by the es_score key
            for check_data in raw_data_list:
                try:
                    json.loads(check_data)
                except json.decoder.JSONDecodeError as e:
                    print(f'JSONDecodeError={e.args},check_data={check_data}')
            # data_list = [json.loads(data) for data in raw_data_list
            #     if float(json.loads(data)['es_score']) >= minimum_es_score
            #     and float(json.loads(data)['es_score']) <= maximum_es_score]
            if bool_training:
                data_list = []
                for data_str_iter in raw_data_list:
                    data_json_iter = json.loads(data_str_iter)
                    if isinstance(data_json_iter['es_score'], dict):
                        es_score = max([float(elem) for elem in list(data_json_iter['es_score'].values())])
                    else:
                        es_score = float(data_json_iter['es_score'])
                    if minimum_es_score <= es_score <= maximum_es_score:
                        data_list.append(data_json_iter)
            else:
                data_list = [json.loads(data_str_iter) for data_str_iter in raw_data_list]
        logger.info(f"Use template {self.template_name} for training, bool_training={bool_training}, "
                    f"there are {len(data_list)} examples in the dataset, raw example count={len(raw_data_list)}")
        self.data_list = data_list

    def __len__(self):
        return len(self.data_list)

    def build_prompt_input_ids(self, system, history):
        """
        chatglm2: [gMASK]sop [Round 1]\n\n问:{input1}\n\n答:{target1}</s>[Round 2]\n\n问:{input2}\n\n答:{target2}</s>...
        chatglm3: [gMASK]sop <|system|>xxx<|user|>xxx<|assistant|>xxx<eos>
        others: {system_format}{user_format}{assistant_format}{user_format}{assistant_format}...
        """
        # chatglm models have special prefix tokens
        if self.template_name in ['chatglm2', 'chatglm3']:
            prompt_input_ids = self.tokenizer.get_prefix_tokens()
        else:
            prompt_input_ids = []
        prompt = ''
        # collect system information
        if self.system_format is not None:
            system = system if system is not None else self.system
            # system message is not empty
            if system is not None:
                if self.template_name == 'chatglm3':
                    prompt_input_ids += [self.tokenizer.get_command("<|system|>")] + self.tokenizer.encode(system, add_special_tokens=False)
                    system_text = system  # chatglm3 has no text-level system template
                else:
                    system_text = self.system_format.format(content=system)
                    prompt_input_ids += self.tokenizer.encode(system_text, add_special_tokens=False)
                prompt += system_text
        # collect history
        # concatenate the multi-turn user/assistant prompts and input_ids
        for i, conv in enumerate(history):
            role = conv['role'].strip()
            content = conv['content'].strip()

            assert role != 'system', 'there should not be more than one system information'
            text_iter = ''
            if role == 'user':
                if self.template_name == 'chatglm2':
                    human = self.user_format.format(content=content, idx=i // 2 + 1)
                    input_ids = self.tokenizer.encode(human, add_special_tokens=False)
                elif self.template_name == 'chatglm3':
                    input_ids = [self.tokenizer.get_command("<|user|>")] + \
                                self.tokenizer.encode(content, add_special_tokens=False) + \
                                [self.tokenizer.get_command("<|assistant|>")]
                    human = content  # chatglm3 has no text-level user template
                else:
                    human = self.user_format.format(content=content, stop_token=self.tokenizer.eos_token)
                    input_ids = self.tokenizer.encode(human, add_special_tokens=False)
                text_iter = human
            elif role == 'assistant':
                if self.template_name in ['chatglm2', 'chatglm3']:
                    input_ids = self.tokenizer.encode(content, add_special_tokens=False) + [self.tokenizer.eos_token_id]
                    assistant = content  # chatglm templates add no extra assistant text
                else:
                    assistant = self.assistant_format.format(content=content, stop_token=self.tokenizer.eos_token)
                    input_ids = self.tokenizer.encode(assistant, add_special_tokens=False)
                text_iter = assistant
            else:
                raise Exception('role error')
            prompt_input_ids += input_ids
            prompt += text_iter

        return prompt_input_ids, prompt

    def __getitem__(self, index):
        data = self.data_list[index]
        # data = json.loads(data)
        chosen = data['chosen']
        rejected = data['rejected']
        assert len(chosen) == len(rejected)

        # check whether the first message is a system message
        if chosen[0]['role'] == 'system':
            system = chosen[0]['content'].strip()
            history = chosen[1:-1]  # conversation history
            chosen, rejected = chosen[-1], rejected[-1]
        else:
            # user/assistant turns only; for a single turn the history is empty
            system = None
            history = chosen[:-1]  # conversation history
            # chosen/rejected are the last turn, i.e. the assistant replies
            chosen, rejected = chosen[-1], rejected[-1]

        # build prompt
        # build the system + history part
        prompt_input_ids, prompt = self.build_prompt_input_ids(system, history)

        # build response
        if self.template_name in ['chatglm2', 'chatglm3']:
            chosen_input_ids = self.tokenizer.encode(chosen['content'], add_special_tokens=False) + [self.tokenizer.eos_token_id]
            rejected_input_ids = self.tokenizer.encode(rejected['content'], add_special_tokens=False) + [self.tokenizer.eos_token_id]
        else:
            # text corresponding to the chosen content
            chosen = self.assistant_format.format(content=chosen['content'], stop_token=self.tokenizer.eos_token)
            # text corresponding to the rejected content
            rejected = self.assistant_format.format(content=rejected['content'], stop_token=self.tokenizer.eos_token)

            chosen_input_ids = self.tokenizer.encode(chosen, add_special_tokens=False)
            rejected_input_ids = self.tokenizer.encode(rejected, add_special_tokens=False)

        # truncate by max_seq_length
        # TODO: overly long samples should also be truncated/filtered when the corpus is generated
        longer_response_length = max(len(chosen_input_ids), len(rejected_input_ids))
        # if combined sequence is too long, truncate the prompt
        if len(prompt_input_ids) + longer_response_length > self.max_seq_length:
            # take the larger of the static max_prompt_length and max_seq_length - longer_response_length
            max_prompt_length = max(self.max_prompt_length, self.max_seq_length - longer_response_length)
            # truncate
            prompt_input_ids = prompt_input_ids[-max_prompt_length:]
        # if that's still too long, truncate the response
        # ?? in what case is it still too long? (when max_prompt_length alone exceeds max_seq_length - longer_response_length)
        if len(prompt_input_ids) + longer_response_length > self.max_seq_length:
            chosen_input_ids = chosen_input_ids[: self.max_seq_length - len(prompt_input_ids)]
            rejected_input_ids = rejected_input_ids[: self.max_seq_length - len(prompt_input_ids)]
        chosen_content_of_assist_len = len(chosen_input_ids)
        reject_content_of_assist_len = len(rejected_input_ids)
        chosen_labels = [-100] * len(prompt_input_ids) + chosen_input_ids
        chosen_input_ids = prompt_input_ids + chosen_input_ids
        rejected_labels = [-100] * len(prompt_input_ids) + rejected_input_ids
        rejected_input_ids = prompt_input_ids + rejected_input_ids
        assert len(chosen_labels) == len(chosen_input_ids)
        assert len(rejected_labels) == len(rejected_input_ids)
        if np.random.random() < 0.01:
            info_msg = f'longer_response_length={longer_response_length},prompt_input_ids len={len(prompt_input_ids)},' + \
                f'chosen answer length={chosen_content_of_assist_len},rejected answer length={reject_content_of_assist_len},' + \
                f'chosen length after prepending prompt={len(chosen_input_ids)},rejected length after prepending prompt={len(rejected_input_ids)},' + \
                f'prompt_input_ids={prompt_input_ids},\n chosen_input_ids={chosen_input_ids},\n rejected_input_ids={rejected_input_ids},\n' + \
                f'chosen_labels={chosen_labels},\n rejected_labels={rejected_labels}'
            print(info_msg)
        inputs = dict(
            prompt_input_ids=prompt_input_ids,
            prompt_attention_mask=[1] * len(prompt_input_ids),
            chosen_input_ids=chosen_input_ids,
            chosen_attention_mask=[1] * len(chosen_input_ids),
            chosen_labels=chosen_labels,
            rejected_input_ids=rejected_input_ids,
            rejected_attention_mask=[1] * len(rejected_input_ids),
            rejected_labels=rejected_labels,
            prompt=prompt
        )
        return inputs

    # no-op to match the DPOTrainer dataset interface
    def map(self, func, **kwargs):
        return self

    def select(self, index_list):
        select_data_list = []
        for index in index_list:
            data_iter = self.data_list[index]
            select_data_list.append(data_iter)
        return select_data_list

class UnifiedOnlineDPODataset(UnifiedDPODataset):
    def __init__(self, file, tokenizer, max_seq_length, template,
                 maximum_es_score, minimum_es_score, bool_training: bool):
        max_prompt_length = max_seq_length
        super(UnifiedOnlineDPODataset, self).__init__(file=file, tokenizer=tokenizer, max_seq_length=max_seq_length,
                                                      max_prompt_length=max_prompt_length, template=template,
                                                      maximum_es_score=maximum_es_score, minimum_es_score=minimum_es_score,
                                                      bool_training=bool_training)

    def __getitem__(self, index):
        data = self.data_list[index]
        # build prompt
        # build the system + history part
        # check whether the first message is a system message
        # chosen = data['chosen']
        # if chosen[0]['role'] == 'system':
        #     system = chosen[0]['content'].strip()
        #     history = chosen[1:-1]  # conversation history
        #     chosen = chosen[-1]
        # else:
        #     # user/assistant turns only; for a single turn the history is empty
        #     system = None
        #     history = chosen[:-1]  # conversation history
        #     # chosen/rejected are the last turn, i.e. the assistant replies
        #     chosen = chosen[-1]
        # prompt_input_ids, prompt = self.build_prompt_input_ids(system, history)

        prompt = data['prompt']
        groundtruth = data['groundtruth']
        # self.build_prompt_input_ids(system, history)
        # prompt_input_ids = self.tokenizer.encode(prompt, add_special_tokens=False) + [self.tokenizer.eos_token_id]
        prompt_input_ids = self.tokenizer.encode(prompt, add_special_tokens=False)
        # TODO: assert fim_end
        assert groundtruth.endswith(TC.DS_EOS_TOKEN)
        assert not prompt.endswith(TC.DS_EOS_TOKEN)
        # system = None

        # truncate by max_seq_length
        # TODO: overly long samples should also be truncated/filtered when the corpus is generated
        # if the combined sequence is too long, truncate the prompt
        if len(prompt_input_ids) > self.max_prompt_length:
            # truncate
            prompt_input_ids = prompt_input_ids[-self.max_prompt_length:]
            decoded_prompt = self.tokenizer.decode(prompt_input_ids, skip_special_tokens=False)
            double_decoded_prompt_ids = self.tokenizer.encode(decoded_prompt, add_special_tokens=False)
            # check that decode(encode(prompt)) round-trips after truncation
            try:
                zipo_decode_tuple_list = list(zip(prompt[-self.max_prompt_length:][::-1], decoded_prompt[-self.max_prompt_length:][::-1]))[::-1]
                zipo_decode_id_tuple_list = list(zip(prompt_input_ids[::-1], double_decoded_prompt_ids[::-1]))[::-1]
                assert decoded_prompt[-self.max_prompt_length:] == prompt[-self.max_prompt_length:]
            except AssertionError:
                # print(f'decoded_prompt[-self.max_prompt_length:]=\n{decoded_prompt[-self.max_prompt_length:]},\nprompt[-self.max_prompt_length:]={prompt[-self.max_prompt_length:]},')
                print(f'decoded_prompt[-self.max_prompt_length:]={decoded_prompt[-self.max_prompt_length:]},\n' +
                      f'prompt[-self.max_prompt_length:]={prompt[-self.max_prompt_length:]},\n' +
                      f'zipo_decode_tuple_list={zipo_decode_tuple_list},\nzipo_decode_id_tuple_list={zipo_decode_id_tuple_list}')
            prompt = decoded_prompt
            # prompt = self.tokenizer.convert_ids_to_tokens(prompt_input_ids)
        inputs = dict(
            prompt_input_ids=prompt_input_ids,
            prompt_attention_mask=[1] * len(prompt_input_ids),
            prompt=prompt,
            groundtruth=groundtruth
        )
        return inputs
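
The jsonl files consumed by these datasets are not attached. Based on the `__init__`/`__getitem__` code above, each line appears to need the fields shown below; the values here are invented placeholders, not data from the report:

```python
import json

# Hypothetical records matching the fields the datasets above read (values are made up).
offline_record = {
    "es_score": 0.83,  # may also be a dict of scores; the max value is used for filtering
    "chosen": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi, how can I help?"},
    ],
    # must have the same length as "chosen"; only the last assistant turn differs
    "rejected": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Go away."},
    ],
}

online_record = {
    "es_score": 0.83,
    "prompt": "<already templated prompt text>",
    # __getitem__ asserts that groundtruth ends with the project's TC.DS_EOS_TOKEN
    "groundtruth": "<reference completion ending with TC.DS_EOS_TOKEN>",
}

# one JSON object per line
with open("sample.jsonl", "w", encoding="utf-8") as f:
    for record in (offline_record, online_record):
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```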

Expected behavior


[rank4]: Traceback (most recent call last):
[rank4]: File "/lpai-running/code/firefly-zyy-dev/339ecc/shells/../train_onlinedpo.py", line 251, in <module>
[rank4]: main()
[rank4]: File "/lpai-running/code/firefly-zyy-dev/339ecc/shells/../train_onlinedpo.py", line 195, in main
[rank4]: train_result=trainer.train()
[rank4]: ^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 2164, in train
[rank4]: return inner_training_loop(
[rank4]: ^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 2522, in _inner_training_loop
[rank4]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/lpai-running/code/firefly-zyy-dev/339ecc/models/online_dpo_trainer.py", line 480, in training_step
[rank4]: output = unwrapped_model.generate(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank4]: return func(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 2252, in generate
[rank4]: result = self._sample(
[rank4]: ^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
[rank4]: outputs = model_forward(**model_inputs, return_dict=True)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]: return self._call_impl(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
[rank4]: result = forward_call(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1163, in forward
[rank4]: outputs = self.model(
[rank4]: ^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]: return self._call_impl(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank4]: return forward_call(*args, **kwargs)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 883, in forward
[rank4]: causal_mask = self._update_causal_mask(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 993, in _update_causal_mask
[rank4]: causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1060, in _prepare_4d_causal_attention_mask_with_cache_position
[rank4]: causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
[rank4]: RuntimeError: The size of tensor a (4137) must match the size of tensor b (4138) at non-singleton dimension 0
[rank4]: Exception raised from infer_size_impl at /opt/conda/conda-bld/pytorch_1720538435607/work/aten/src/ATen/ExpandUtils.cpp:31 (most recent call first):
[rank4]: C++ CapturedTraceback:
[rank4]: #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::string> const> (), c10::SetStackTraceFetcher(std::function<std::string ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
 
[rank4]: #40 do_call_core from /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
[rank4]: #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #42 method_vectorcall from /usr/local/src/conda/python-3.11.9/Objects/classobject.c:59
[rank4]: #43 _PyVectorcall_Call from /usr/local/src/conda/python-3.11.9/Objects/call.c:257
[rank4]: #44 do_call_core from /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
[rank4]: #45 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #46 method_vectorcall from /usr/local/src/conda/python-3.11.9/Objects/classobject.c:59
[rank4]: #47 _PyVectorcall_Call from /usr/local/src/conda/python-3.11.9/Objects/call.c:257
[rank4]: #48 do_call_core from /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
[rank4]: #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #50 method_vectorcall from /usr/local/src/conda/python-3.11.9/Objects/classobject.c:59
[rank4]: #51 _PyVectorcall_Call from /usr/local/src/conda/python-3.11.9/Objects/call.c:257
[rank4]: #52 do_call_core from /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
[rank4]: #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #54 _PyVectorcall_Call from /usr/local/src/conda/python-3.11.9/Objects/call.c:257
[rank4]: #55 do_call_core from /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
[rank4]: #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #57 method_vectorcall from /usr/local/src/conda/python-3.11.9/Objects/classobject.c:59
[rank4]: #58 _PyVectorcall_Call from /usr/local/src/conda/python-3.11.9/Objects/call.c:257
[rank4]: #59 partial_call from /usr/local/src/conda/python-3.11.9/Modules/_functoolsmodule.c:324
[rank4]: #60 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.11.9/Objects/call.c:214
[rank4]: #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_call.h:92
[rank4]: #62 _PyEval_EvalFrameDefault from /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
[rank4]: #63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
[rank4]: #64 PyEval_EvalCode from /usr/local/src/conda/python-3.11.9/Python/ceval.c:1148
[rank4]: #65 run_eval_code_obj from /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1741
[rank4]: #66 run_mod from /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1762
[rank4]: #67 pyrun_file from /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:1657
[rank4]: #68 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:440
[rank4]: #69 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.11.9/Python/pythonrun.c:79
[rank4]: #70 pymain_run_file_obj from /usr/local/src/conda/python-3.11.9/Modules/main.c:360
[rank4]: #71 Py_BytesMain from /usr/local/src/conda/python-3.11.9/Modules/main.c:734
[rank4]: #72 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
[rank4]: #73 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
[rank4]: #74 _start from ??:0
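
For reference, the failing line in `_prepare_4d_causal_attention_mask_with_cache_position` multiplies the causal mask (one row per input token) in place by a comparison broadcast from `cache_position`. The snippet below only reproduces that broadcasting failure in isolation, using the sizes from the error message where they are known; it is not the training setup, just an illustration of why a one-position mismatch between the mask and `cache_position` raises this error:

```python
import torch

# Sizes taken from the error message above; tensors are placeholders, not real model state.
sequence_length = 4137               # rows of the causal mask built for this generate() step
target_length = 4137                 # columns (key/value length); exact value is not in the traceback
causal_mask = torch.zeros(sequence_length, target_length)
cache_position = torch.arange(4138)  # one position longer than the mask has rows

# Same pattern as transformers/models/llama/modeling_llama.py:1060
causal_mask *= torch.arange(target_length) > cache_position.reshape(-1, 1)
# RuntimeError: The size of tensor a (4137) must match the size of tensor b (4138)
# at non-singleton dimension 0
```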

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete
