Pre-processed natural language data for per-position tagging by high-performing machine learning algorithm.

leonardramsey/MUST-CNN-RST


MUST-CNN-RST

The Multilayer Shift-and-Stitch Deep Convolutional Neural Network (MUST-CNN) is a machine learning algorithm developed by the University of Virginia's Bioinformatics Laboratory. More information on the algorithm can be found in its GitHub repo.

To test whether this algorithm can process natural language data efficiently, four Python files were written to pre-process text from the Rhetorical Structure Theory data collection. Pre-processing was done with the Natural Language Toolkit (NLTK). The following Python files can be found in the RST\data\RSTtrees-WSJ-main-1.0 directory:

  • 0_Dict_Build.py
  • 1_EDU_tag.py
  • 1_EDU_word.py
  • RST_ALL_EDUs.py
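
The scripts themselves are not reproduced here. As a rough illustration of what per-position tagging data can look like, the sketch below builds a token dictionary and emits one label per token. Everything in it is an assumption made for illustration: the function names `build_vocab` and `tag_positions`, the `<unk>` entry, the boundary labelling scheme (1 for the first token of an elementary discourse unit, 0 otherwise), and the sample text are all hypothetical. It also uses plain whitespace splitting as a stand-in for NLTK tokenization so that it runs with the standard library alone.

```python
# Hypothetical sketch: not taken from the repository's scripts.
from collections import Counter


def build_vocab(edus, min_count=1):
    """Map each token to an integer id; id 0 is reserved for unknown tokens."""
    # Whitespace splitting is a stand-in for NLTK tokenization (assumption).
    counts = Counter(tok for edu in edus for tok in edu.split())
    vocab = {"<unk>": 0}
    # Sort tokens so that id assignment is deterministic.
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def tag_positions(edus, vocab):
    """Return two parallel lists: token ids and one label per position.

    Label 1 marks the first token of a unit, 0 marks every other token
    (an assumed labelling scheme, chosen only for illustration).
    """
    ids, labels = [], []
    for edu in edus:
        for i, tok in enumerate(edu.split()):
            ids.append(vocab.get(tok, 0))
            labels.append(1 if i == 0 else 0)
    return ids, labels


# Made-up sample units, not drawn from the data collection.
edus = ["The company said", "it expects profits to rise"]
vocab = build_vocab(edus)
ids, labels = tag_positions(edus, vocab)
```

The two lists stay aligned position by position, which is the shape a per-position tagger consumes: one input id and one target label for every token.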

All of the processed text files are included in this directory alongside the code files. The repo also contains additional text files.
