CollocateR is a package for the statistical programming language R. The package increasingly, if imperfectly, uses functions and workflows from the tidyverse and tidytext packages.
CollocateR serves a simple purpose: it finds the collocates of a keyword (the node) in context in text files and calculates their statistical significance. The tests are based on those set out in Barnbrook et al.'s Collocation: Applications and Implications (Palgrave, 2013), and the formulae follow those explained on the British National Corpus homepage.
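To illustrate what finding "collocates in context" involves, the base-R sketch below tokenises a text, locates every occurrence of a node word, and records the positions of the tokens up to `window` words to its left and right. The function `find_collocates()` and its `window` argument are hypothetical names used only for this illustration; the sketch does not reproduce CollocateR's save_collocates(), which also stores a hashed version of the node and may tokenise differently (for example via tidytext). The regular-expression tokeniser here is a crude stand-in.

```r
# Hypothetical illustration only; this is not CollocateR's save_collocates().
find_collocates <- function(text, node, window = 4) {
  stopifnot(window >= 1)
  # Crude tokenisation: lower-case, then split on runs of characters
  # that are not letters, digits or apostrophes
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  tokens <- tokens[tokens != ""]
  n <- length(tokens)
  # Every position at which the node occurs
  node_pos <- which(tokens == tolower(node))
  # Up to `window` positions to the left and right of each node occurrence,
  # clipped to the document; overlapping windows can repeat a position
  left <- as.integer(unlist(lapply(node_pos, function(i) {
    p <- (i - window):(i - 1)
    p[p >= 1]
  })))
  right <- as.integer(unlist(lapply(node_pos, function(i) {
    p <- (i + 1):(i + window)
    p[p <= n]
  })))
  list(tokens = tokens, node = node, left = left, right = right,
       word_length = n)
}

res <- find_collocates("The cat sat on the mat. The dog sat on the cat.",
                       "sat", window = 2)
res$left   # positions 1 2 7 8
res$right  # positions 4 5 10 11
```

Tallying the tokens at those positions, for example with `table(res$tokens[c(res$left, res$right)])`, gives the observed in-context frequencies that significance tests such as PMI or the z-score compare against whole-document frequencies.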
The package provides the following functions:
- save_collocates: Returns a list containing a tokenised version of the original document, a record of the node in original and hashed form, lists of left and right collocate locations, and the document's word_length.
- get_freqs: Counts the frequency of each collocate, both in the node's context and across the document as a whole.
- pmi: A pointwise mutual information significance test, comparing the probability of the node and a collocate occurring together with the probability of their occurring independently.
- npmi: As above, but normalised so that all results lie between 1 (perfect collocation) and -1 (the terms never collocate), with 0 indicating independence.
- z-score: A significance test comparing how often a collocate occurs near the node with how often it would be expected to occur there, given its frequency across the whole text.
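The following sketch shows, in plain R, how the three tests listed above are commonly computed from raw counts. It assumes probabilities are estimated directly from document counts: `o` is the number of node/collocate co-occurrences within the window, `f_n` and `f_c` are the document frequencies of node and collocate, `n` is the document length in tokens, and `span` is the total number of window positions around each node (left plus right). PMI is log2(p(n,c) / (p(n)p(c))); NPMI divides PMI by -log2 p(n,c); the z-score follows a common Berry-Rogghe-style formulation. The function names are hypothetical, and CollocateR's own pmi, npmi and z-score functions may adjust for window size or estimate probabilities differently.

```r
# Hypothetical helpers illustrating common formulae; not CollocateR's API.
# o   : observed co-occurrences of node and collocate within the window
# f_n : frequency of the node in the document
# f_c : frequency of the collocate in the document
# n   : total number of tokens in the document
pmi_sketch <- function(o, f_n, f_c, n) {
  # log2 of the joint probability over the product of the marginal ones
  log2((o / n) / ((f_n / n) * (f_c / n)))
}

npmi_sketch <- function(o, f_n, f_c, n) {
  # Terms that never co-occur take the lower bound of -1 by convention
  if (o == 0) return(-1)
  # Dividing by -log2 p(n,c) scales PMI into the range [-1, 1]
  pmi_sketch(o, f_n, f_c, n) / -log2(o / n)
}

z_score_sketch <- function(o, f_n, f_c, n, span) {
  # Probability of meeting the collocate at a position not occupied by the node
  p <- f_c / (n - f_n)
  # Expected co-occurrences over all window positions around every node
  e <- p * f_n * span
  (o - e) / sqrt(e * (1 - p))
}

pmi_sketch(10, 20, 50, 10000)   # log2(100), roughly 6.64
npmi_sketch(10, 20, 50, 10000)  # 2/3
z_score_sketch(10, 20, 50, 10000, span = 8)
```

Because NPMI divides by -log2 p(n,c), a pair whose every occurrence is a co-occurrence (`o == f_n == f_c`) scores exactly 1, which is what makes scores comparable across collocates of different frequencies.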
Development checklist (implemented and planned features):
- save_collocates
- pmi
- npmi
- z-score
- MI Cubed
- log_log
- log_likelihood
- Import other elements
README generated with readme2tex.