UP
Collection of files for working with NER models
Author: Juraj Dedič
Scripts needed for conversion to spaCy DocBin
Also includes a JSON converter for other training pipelines
Training of a BERT model using HuggingFace
Evaluation for BERT / RoBERTa
Training of RoBERTa using PyTorch
It did not achieve better results than BERT,
so it is no longer used
Evaluation of the spaCy NER model
Training of the spaCy NER and SpanCat models
The training config was changed many times; the best one is described in the thesis
spacyNER_to_SpanCat.ipynb
The data formats for training NER and SpanCat models differ;
this notebook converts the NER data into the format needed to train SpanCat
Evaluation of SpanCat model
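As an illustration of the NER to SpanCat conversion mentioned in the list above, here is a minimal sketch, not the contents of the actual notebook. In spaCy, NER annotations live in `doc.ents`, while SpanCat reads a span group from `doc.spans` (its default key is `"sc"`). The function names, the file paths passed in, and the `"en"` language code are assumptions made for this example.

```python
def ents_to_span_group(doc, spans_key="sc"):
    """Copy the entity spans of one Doc into doc.spans[spans_key]."""
    # SpanCat trains on a span group, not on doc.ents.
    doc.spans[spans_key] = list(doc.ents)
    return doc


def convert_docbin(in_path, out_path, lang="en", spans_key="sc"):
    """Read an NER DocBin file and write a copy usable for SpanCat training."""
    # spaCy is imported here so the helper above can be used without it.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # assumption: the language code is a placeholder
    source = DocBin().from_disk(in_path)
    target = DocBin()
    for doc in source.get_docs(nlp.vocab):
        target.add(ents_to_span_group(doc, spans_key))
    target.to_disk(out_path)
```

A call such as `convert_docbin("train_ner.spacy", "train_spancat.spacy")` (hypothetical file names) would then produce the SpanCat training file. The original entities are left in place, since the SpanCat component does not read them.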
Jupyter notebooks were mostly used to work with the models
I do not personally have the hardware to train the models,
so Kaggle was used, which supports Jupyter notebooks
The models were created with the help of tutorials (linked in the thesis)
The first thing implemented was the data conversion script
It converts the XML files with human transcripts to spaCy DocBin
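A rough sketch of what such a conversion can look like is given below. The XML schema of the transcripts is not described here, so the layout used (an `<utterance>` element with inline `<tag label="...">` children), the label name, and the function names are all hypothetical. Only the spaCy calls (`spacy.blank`, `Doc.char_span`, `DocBin.add`, `DocBin.to_disk`) are the library's real API.

```python
import xml.etree.ElementTree as ET


def parse_utterance(xml_string):
    """Return (text, [(start, end, label), ...]) for one annotated utterance.

    Assumes a hypothetical layout with flat, non-nested inline <tag> elements.
    """
    root = ET.fromstring(xml_string)
    parts = [root.text or ""]
    offset = len(parts[0])
    spans = []
    for child in root:
        inner = child.text or ""
        spans.append((offset, offset + len(inner), child.get("label")))
        parts.append(inner)
        offset += len(inner)
        tail = child.tail or ""  # plain text that follows the closing tag
        parts.append(tail)
        offset += len(tail)
    return "".join(parts), spans


def write_docbin(examples, out_path, lang="en"):
    """Store (text, spans) pairs as entity-annotated Docs in a DocBin file."""
    # spaCy is imported here so the parser above can be used without it.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # assumption: the language code is a placeholder
    doc_bin = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            span = doc.char_span(start, end, label=label)
            if span is not None:  # None means the offsets cut through a token
                ents.append(span)
        doc.ents = ents
        doc_bin.add(doc)
    doc_bin.to_disk(out_path)
```

Spans whose character offsets do not line up with token boundaries are skipped in this sketch; a real converter would probably log or repair them instead.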
Next, the spaCy NER model was trained
The models went through multiple iterations, and the datasets changed as well
BERT and RoBERTa models were also trained using PyTorch and HuggingFace
These models were not used later because the spaCy NER pipeline outperformed them
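The metric behind this comparison is not stated here (the thesis has the details). A common way to compare NER models is exact-match precision, recall and F1 over entity spans, which could be sketched as follows. The function name, the tuple layout and the labels are assumptions for illustration.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    # A prediction counts only if boundaries and label both match exactly.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: one exact match, one prediction with a wrong boundary.
gold = [(0, 2, "CALLSIGN"), (3, 5, "COMMAND")]
predicted = [(0, 2, "CALLSIGN"), (3, 4, "COMMAND")]
precision, recall, f1 = span_prf(gold, predicted)
```

Scoring every model with the same function on the same held-out data keeps the comparison fair across different libraries.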
After that, the spaCy SpanCat model was trained
The detected classes were also extended to include commands and values