Re-DocRED Dataset

DocRED is a widely used benchmark for document-level relation extraction. However, the DocRED dataset contains a significant percentage of false negative examples (incomplete annotation). We revised 4,053 documents from DocRED to address these problems and release the result as the Re-DocRED dataset.

The Re-DocRED Dataset resolved the following problems of DocRED:

Resolved the incompleteness problem by supplementing a large number of relation triples.
Addressed the logical inconsistencies in DocRED.
Corrected the coreferential errors within DocRED.

Statistics of Re-DocRED

The Re-DocRED dataset is located in the ./data directory. Its statistics are shown below:

	#Train	#Dev	#Test
# Documents	3,053	500	500
# Avg. Triples	28.1	34.6	34.9
# Avg. Entities	19.4	19.4	19.6
# Avg. Sents	7.9	8.2	7.9
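
The per-split figures above can be recomputed with a short script. The following is a minimal sketch, not the authors' tooling: it assumes each file in ./data is a JSON list of documents in the original DocRED schema (`sents`, `vertexSet`, `labels`), and the helper names `load_documents` and `split_statistics` are hypothetical. The sample document is hand-made for illustration and is not taken from the dataset.

```python
import json
from pathlib import Path


def load_documents(path):
    """Load a JSON file that holds a list of document dicts (assumed layout)."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def split_statistics(documents):
    """Return the document count and per-document averages for one split."""
    count = len(documents)
    if count == 0:
        return {"documents": 0, "avg_triples": 0.0,
                "avg_entities": 0.0, "avg_sents": 0.0}
    return {
        "documents": count,
        # Assumption: each entry in "labels" is one relation triple.
        "avg_triples": sum(len(d.get("labels", [])) for d in documents) / count,
        # Assumption: each entry in "vertexSet" is one entity (a list of mentions).
        "avg_entities": sum(len(d["vertexSet"]) for d in documents) / count,
        # Assumption: each entry in "sents" is one tokenized sentence.
        "avg_sents": sum(len(d["sents"]) for d in documents) / count,
    }


# Hand-made sample in the assumed schema (illustrative, not real dataset content).
sample = [
    {
        "title": "Example",
        "sents": [
            ["Alice", "lives", "in", "Paris", "."],
            ["Paris", "is", "in", "France", "."],
        ],
        "vertexSet": [
            [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"}],
            [
                {"name": "Paris", "sent_id": 0, "pos": [3, 4], "type": "LOC"},
                {"name": "Paris", "sent_id": 1, "pos": [0, 1], "type": "LOC"},
            ],
            [{"name": "France", "sent_id": 1, "pos": [3, 4], "type": "LOC"}],
        ],
        "labels": [
            {"h": 0, "t": 1, "r": "P551", "evidence": [0]},
            {"h": 1, "t": 2, "r": "P17", "evidence": [1]},
        ],
    }
]

stats = split_statistics(sample)
print(stats)
```

To run it on a real split, pass the result of `load_documents` on a file from ./data to `split_statistics` and compare the averages with the table. The file names inside ./data are not listed here, so check the directory before running.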

Citation

If you find our work useful, please cite it as:
