- Table of Contents
- Overview
- Installation
- Easy Start
- Train your Own Model with OmniEvent
- Supported Datasets & Models & Contests
- Consistent Evaluation
- Experiments
- Citation
[2024.12] In the spring of next year, we will build a new agent system based on LLMs and small models optimized for IE tasks. The system will be more powerful and general than the models in OmniEvent. The OmniEvent repository will only serve as introductory code for EE, and major updates will no longer be made in the future.
[2024.10] We recently released a series of LLMs (ADELIE) trained for information extraction, which includes event extraction tasks. Although its performance underperforms specialized small models, such as BERT, its general capabilities and ability to learn from schemas in context are impressive. Welcome to try! Link: https://huggingface.co/THU-KEG/ADELIE-SFT-1.5B.
OmniEvent is a powerful open-source toolkit for event extraction, including event detection and event argument extraction. We comprehensively cover various paradigms and provide fair and unified evaluations on widely-used English and Chinese datasets. Modular implementations make OmniEvent highly extensible.
-
Comprehensive Capability
- Support to do Event Extraction at once, and also to independently do its two subtasks: Event Detection, Event Argument Extraction.
- Cover various paradigms: Token Classification, Sequence Labeling, MRC(QA) and Seq2Seq.
- Implement Transformer-based (BERT, T5, etc.) and classical (DMCNN, CRF, etc.) models.
- Both Chinese and English are supported for all event extraction sub-tasks, paradigms and models.
-
Unified Benchmark & Evaluation
- Various datasets are processed into a unified format.
- Predictions of different paradigms are all converted into a unified candidate set for fair evaluations.
- Four evaluation modes (gold, loose, default, strict) well cover different previous evaluation settings.
-
Modular Implementation
- All models are decomposed into four modules:
- Input Engineering: Prepare inputs and support various input engineering methods like prompting.
- Backbone: Encode text into hidden states.
- Aggregation: Fuse hidden states (e.g., select [CLS], pooling, GCN) to the final event representation.
- Output Head: Map the event representation to the final outputs, such as Linear, CRF, MRC head, etc.
- You can combine and reimplement different modules to design and implement your own new model.
- All models are decomposed into four modules:
-
Big Model Training & Inference
- Efficient training and inference of big event extraction models are supported with BMTrain.
-
Easy to Use & Highly Extensible
- Open datasets can be downloaded and processed with a single command.
- Fully compatible with 🤗 Transformers and its Trainer.
- Users can easily reproduce existing models and build customized models with OmniEvent.
This repository is tested on Python 3.9+, Pytorch 1.12.1+. OmniEvent can be installed with pip as follows:
pip install OmniEvent
If you want to install the repository from local source, you can install as follows:
pip install .
And if you want to edit the repositoy, you can
pip install -e .
OmniEvent provides several off-the-shelf models for the users. Examples are shown below.
Make sure you have installed OmniEvent as instructed above. Note that it may take a few minutes to download checkpoint at the first time.
>>> from OmniEvent.infer import infer
>>> # Even Extraction (EE) Task
>>> text = "2022年北京市举办了冬奥会"
>>> results = infer(text=text, task="EE")
>>> print(results[0]["events"])
[
{
"type": "组织行为开幕", "trigger": "举办", "offset": [8, 10],
"arguments": [
{ "mention": "2022年", "offset": [9, 16], "role": "时间"},
{ "mention": "北京市", "offset": [81, 89], "role": "地点"},
{ "mention": "冬奥会", "offset": [0, 4], "role": "活动名称"},
]
}
]
>>> text = "U.S. and British troops were moving on the strategic southern port city of Basra \
Saturday after a massive aerial assault pounded Baghdad at dawn"
>>> # Event Detection (ED) Task
>>> results = infer(text=text, task="ED")
>>> print(results[0]["events"])
[
{ "type": "attack", "trigger": "assault", "offset": [113, 120]},
{ "type": "injure", "trigger": "pounded", "offset": [121, 128]}
]
>>> # Event Argument Extraction (EAE) Task
>>> results = infer(text=text, triggers=[("assault", 113, 120), ("pounded", 121, 128)], task="EAE")
>>> print(results[0]["events"])
[
{
"type": "attack", "trigger": "assault", "offset": [113, 120],
"arguments": [
{ "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
{ "mention": "British", "offset": [9, 16], "role": "attacker"},
{ "mention": "Saturday", "offset": [81, 89], "role": "time"}
]
},
{
"type": "injure", "trigger": "pounded", "offset": [121, 128],
"arguments": [
{ "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
{ "mention": "Saturday", "offset": [81, 89], "role": "time"},
{ "mention": "British", "offset": [9, 16], "role": "attacker"}
]
}
]
OmniEvent can help users easily train and evaluate their customized models on specific datasets.
We show a step-by-step example of using OmniEvent to train and evaluate an Event Detection model on ACE-EN dataset in the Seq2Seq paradigm. More examples are shown in examples.
We provide standard data processing scripts for several commonly-used datasets. Checkout the details in scripts/data_processing.
dataset=ace2005-en # the dataset name
cd scripts/data_processing/$dataset
bash run.sh
We keep track of the configurations of dataset, model and training parameters via a single *.yaml
file. See ./configs for details.
>>> from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
>>> from OmniEvent.input_engineering.seq2seq_processor import type_start, type_end
>>> parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
>>> model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/ace-en.yaml")
>>> training_args.output_dir = 'output/ACE2005-EN/ED/seq2seq/t5-base/'
>>> data_args.markers = ["<event>", "</event>", type_start, type_end]
OmniEvent supports various backbones. The users can specify the model and tokenizer in the config file and initialize them as follows.
>>> from OmniEvent.backbone.backbone import get_backbone
>>> from OmniEvent.model.model import get_model
>>> backbone, tokenizer, config = get_backbone(model_type=model_args.model_type,
model_name_or_path=model_args.model_name_or_path,
tokenizer_name=model_args.model_name_or_path,
markers=data_args.markers,
new_tokens=data_args.markers)
>>> model = get_model(model_args, backbone)
OmniEvent prepares the DataProcessor
and the corresponding evaluation metrics for different task and paradigms.
Note that the metrics here are paradigm-dependent and are not used for the final unified evaluation.
>>> from OmniEvent.input_engineering.seq2seq_processor import EDSeq2SeqProcessor
>>> from OmniEvent.evaluation.metric import compute_seq_F1
>>> train_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.train_file)
>>> eval_dataset = EDSeq2SeqProcessor(data_args, tokenizer, data_args.validation_file)
>>> metric_fn = compute_seq_F1
OmniEvent adopts Trainer from 🤗 Transformers for training and evaluation.
>>> from OmniEvent.trainer_seq2seq import Seq2SeqTrainer
>>> trainer = Seq2SeqTrainer(
args=training_args,
model=model,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
compute_metrics=metric_fn,
data_collator=train_dataset.collate_fn,
tokenizer=tokenizer,
)
>>> trainer.train()
Since the metrics in Step 4 depend on the paradigm, it is not fair to directly compare the performance of models in different paradigms.
OmniEvent evaluates models of different paradigms in a unified manner, where the predictions of different models are converted to predictions on the same candidate sets and then evaluated.
>>> from OmniEvent.evaluation.utils import predict, get_pred_s2s
>>> from OmniEvent.evaluation.convert_format import get_trigger_detection_s2s
>>> logits, labels, metrics, test_dataset = predict(trainer=trainer, tokenizer=tokenizer, data_class=EDSeq2SeqProcessor,
data_args=data_args, data_file=data_args.test_file,
training_args=training_args)
>>> # paradigm-dependent metrics
>>> print("{} test performance before converting: {}".formate(test_dataset.dataset_name, metrics["test_micro_f1"]))
ACE2005-EN test performance before converting: 66.4215686224377
>>> preds = get_pred_s2s(logits, tokenizer)
>>> # convert to the unified prediction and evaluate
>>> pred_labels = get_trigger_detection_s2s(preds, labels, data_args.test_file, data_args, None)
ACE2005-EN test performance after converting: 67.41016109045849
For those datasets whose test set annotations are not public, such as MAVEN and LEVEN, OmniEvent provide scripts to generate submission files. See dump_result.py for details.
Continually updated. Welcome to add more!
Language | Domain | Task | Dataset |
---|---|---|---|
English | General | ED | MAVEN |
General | ED EAE | ACE-EN | |
General | ED EAE | ACE-DYGIE | |
General | ED EAE | RichERE (KBP+ERE) | |
Chinese | Legal | ED | LEVEN |
General | ED EAE | DuEE | |
General | ED EAE | ACE-ZH | |
Financial | ED EAE | FewFC |
- Paradigm
- Token Classification (TC)
- Sequence Labeling (SL)
- Sequence to Sequence (Seq2Seq)
- Machine Reading Comprehension (MRC)
- Backbone
- CNN / LSTM
- Transformers (BERT, T5, etc.)
- Aggregation
- Select [CLS]
- Dynamic/Max Pooling
- Marker
- GCN
- Head
- Linear / CRF / MRC heads
OmniEvent provides corresponding remedies for the three discrepancies in event extraction evaluation, as suggested in our ACL 2023 paper.
We provide several preprocessing scripts in scripts/data_processing
. For ACE 2005, we provide three mainstream scripts: ace2005-dygie
, ace2005-oneie
, and ace2005-en
. Users can easily use the scripts to process the original data into a unified data format.
We implement the output standardization in OmniEvent/evaluation/convert_format.py
. Specifically, users can use corresponding functions to convert the output of different paradigms into the output space of the token classification method.
As suggested in OmniEvent/evaluation/README.md
, we provide several evaluation modes for evaluating event argument extraction. We recommend the strict mode for comparable evaluation. And we provide a unified extracted trigger set for pipeline evaluation of different event argument extraction methods. The triggers are extracted by an advanced ED model: CLEVE. The extracted triggers for different datasets (ACE 2005, RichERE, and TACKBP 2014-2017) are placed in here.
We implement and evaluate state-of-the-art methods on some popular benchmarks using OmniEvent, and the results are shown in our ACL 2023 paper "The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation".
If our codes help you, please cite us:
@inproceedings{peng2023devil,
title={The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation},
author={Peng, Hao and Wang, Xiaozhi and Yao, Feng and Zeng, Kaisheng and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Shen, Weixing},
booktitle={Findings of ACL 2023},
year={2023}
}