Evaluator

复用了FlashRAG评测生成答案质量的Evaluator，做到了即插即用。支持metrics: rouge-1,rouge-2, rouge-l, bleu-1, bleu-2, bleu-3, bleu-4, f1, recall, precision。（其他的还没测）

数据集格式如下：
generated_response_file.jsonl

{“sample_id": $sample_id, "predicted_response": str}

golden_response_file.jsonl

{“sample_id": $sample_id, "golden_response": str}

如果有其他数据集格式，可以在dataset.py中的Dataset类的_load_data中重新实现如何读取数据

    def _load_data(self, golden_file_path: str, user_file_path: str) -> List[Item]:
        """Load data from the provided dataset_path or directly download the file(TODO)."""
        if not os.path.exists(golden_file_path):
            # TODO: auto download: self._download(self.dataset_name, dataset_path)
            raise FileNotFoundError(f"Dataset file {golden_file_path} not found.")
        
        if not os.path.exists(user_file_path):
            raise FileNotFoundError(f"Dataset file {user_file_path} not found.")

        data = []
        with open(golden_file_path, "r", encoding="utf-8") as fg, open(user_file_path, "r", encoding="utf-8") as fu:
            for line1, line2 in zip(fg, fu):
                golden_dict = json.loads(line1)
                user_dict = json.loads(line2)
                assert golden_dict["sample_id"] == user_dict["sample_id"]
                
                if user_dict["predicted_response"] == None:
                    continue
                item_dict = {}
                item_dict["sample_id"] = golden_dict["sample_id"]
                temp_user_response = process_response(user_dict["predicted_response"])
                temp_golden_response = process_response(golden_dict["golden_response"])
                if temp_user_response == None or temp_golden_response == None:
                    continue
                item_dict["golden_response"] = [temp_golden_response]
                item_dict["predicted_response"] = temp_user_response
                item = Item(item_dict)
                data.append(item)
        

        return data

需要允许评测脚本的时候，直接在命令行中执行

python evaluating.py

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
generation		generation
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Evaluator

About

Releases

Packages

Languages

Ariya12138/Evaluator

Folders and files

Latest commit

History

Repository files navigation

Evaluator

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages