Skip to content

Ariya12138/Evaluator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 

Repository files navigation

Evaluator

复用了FlashRAG评测生成答案质量的Evaluator,做到了即插即用。 支持metrics: rouge-1,rouge-2, rouge-l, bleu-1, bleu-2, bleu-3, bleu-4, f1, recall, precision。(其他的还没测)

数据集格式如下:
generated_response_file.jsonl

{“sample_id": $sample_id, "predicted_response": str}

golden_response_file.jsonl

{“sample_id": $sample_id, "golden_response": str}

如果有其他数据集格式,可以在dataset.py中的Dataset类的_load_data中重新实现如何读取数据

    def _load_data(self, golden_file_path: str, user_file_path: str) -> List[Item]:
        """Load data from the provided dataset_path or directly download the file(TODO)."""
        if not os.path.exists(golden_file_path):
            # TODO: auto download: self._download(self.dataset_name, dataset_path)
            raise FileNotFoundError(f"Dataset file {golden_file_path} not found.")
        
        if not os.path.exists(user_file_path):
            raise FileNotFoundError(f"Dataset file {user_file_path} not found.")

        data = []
        with open(golden_file_path, "r", encoding="utf-8") as fg, open(user_file_path, "r", encoding="utf-8") as fu:
            for line1, line2 in zip(fg, fu):
                golden_dict = json.loads(line1)
                user_dict = json.loads(line2)
                assert golden_dict["sample_id"] == user_dict["sample_id"]
                
                if user_dict["predicted_response"] == None:
                    continue
                item_dict = {}
                item_dict["sample_id"] = golden_dict["sample_id"]
                temp_user_response = process_response(user_dict["predicted_response"])
                temp_golden_response = process_response(golden_dict["golden_response"])
                if temp_user_response == None or temp_golden_response == None:
                    continue
                item_dict["golden_response"] = [temp_golden_response]
                item_dict["predicted_response"] = temp_user_response
                item = Item(item_dict)
                data.append(item)
        

        return data

需要允许评测脚本的时候,直接在命令行中执行

python evaluating.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published