corpora
Here are 158 public repositories matching this topic...
A collection of corpora for named entity recognition (NER) and other entity recognition tasks. These annotated datasets cover a variety of languages, domains, and entity types.
Updated Nov 29, 2024 - Python
Data repository for pretrained NLP models and NLP corpora.
Updated Mar 16, 2018 - Python
WeChat Official Accounts corpus
Updated Jan 7, 2019
A collaborative catalog of NLP resources for Indic languages
Updated Dec 14, 2024
Official source for Spanish language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
Updated Jul 27, 2023 - Python
A web-based engine for creating and annotating textual corpora
Updated Aug 26, 2023 - PHP
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Updated Jan 5, 2021 - Python
Unannotated Spanish 3 Billion Words Corpora
Updated Oct 20, 2022 - Python
Automatic document categorization assigns a category to a text based on the information it contains. We'll follow several supervised machine learning approaches.
Updated Jan 1, 2019 - Python
An R package for dynamic exploration of text collections
Updated Sep 11, 2024 - R
An advanced, extensible web front-end for the Manatee-open corpus search engine
Updated Dec 13, 2024 - TypeScript
The Official Repository for 👉 CCAE: A Corpus of Chinese-based Asian Englishes @ NLPCC 2023
Updated Dec 6, 2023 - Python
Named Entity Recognition for biomedical entities
Updated Jan 11, 2023 - Python
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Updated Dec 19, 2023 - PHP
Reading the data from OPIEC - an Open Information Extraction corpus
Updated Jun 12, 2019 - Java