AOCHILDESNouns

Research code for replicating corpus analyses in the following publication:

Using lexical context to discover the noun category: Younger children have it easier

Research Question

Do nouns form a better category in speech to younger vs. older children?

One way to study the structure of data is to decompose it into linearly separable and orthogonal dimensions, which can be done with singular value decomposition (SVD). Below is a visualisation of the noun co-occurrence matrix of speech to children under 900 days old, projected on the first singular dimension, then the first + second, then the first + second + third, and so on, adding one dimension with each new animation frame.

Replication

Clone the repository, then install the requirements (preferably into a Python 3.7 virtual environment):

pip install -r requirements.txt

Optionally, edit the conditions to replicate in aochildesnouns/params.py, then run:

python3 main.py

Technical Notes

Nouns

Nouns were obtained by:

collecting all words tagged by spaCy as nouns in a spaCy-tokenized American-English CHILDES corpus
excluding words that are not among the 4,000 most frequent words in the corpus
excluding onomatopoeia, interjections, single characters, gerunds, and proper names
excluding misspelled words
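
The filtering steps above can be sketched as follows. This is a minimal illustration and not the repository's code: the function `select_nouns`, the `excluded` set, and the toy tokens are hypothetical, and the (word, tag) pairs are assumed to have been produced beforehand by a tagger such as spaCy. Categories that need manual judgment (onomatopoeia, interjections, gerunds, proper names, misspellings) are assumed to be supplied as a hand-curated exclusion set.

```python
from collections import Counter


def select_nouns(tagged_tokens, excluded, max_rank=4000):
    """Return candidate nouns from an iterable of (word, tag) pairs.

    excluded: hand-curated set of words to drop (hypothetical helper input)
    max_rank: keep only words among the `max_rank` most frequent words
    """
    tagged_tokens = list(tagged_tokens)
    # Frequency of every word in the corpus, regardless of its tag.
    word_counts = Counter(word for word, _ in tagged_tokens)
    frequent = {word for word, _ in word_counts.most_common(max_rank)}
    # Words tagged as a noun at least once.
    tagged_as_noun = {word for word, tag in tagged_tokens if tag == "NOUN"}
    return {
        word
        for word in tagged_as_noun
        if word in frequent
        and len(word) > 1  # drop single characters
        and word not in excluded
    }


# Toy input; the tags are made up for illustration.
toy_tokens = [
    ("the", "DET"), ("dog", "NOUN"), ("dog", "NOUN"), ("woof", "NOUN"),
    ("b", "NOUN"), ("runs", "VERB"), ("ball", "NOUN"), ("the", "DET"),
]
toy_nouns = select_nouns(toy_tokens, excluded={"woof"})
print(sorted(toy_nouns))
```

Keeping the manual exclusions in a separate set, rather than encoding them as rules, avoids heuristics such as dropping every word ending in "-ing", which would also remove legitimate nouns like "thing".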

What does the largest singular value mean?

Assume we are talking about a lexical co-occurrence matrix, where each context type is associated with a unique column in the matrix, and target words are associated with unique rows.
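
As a concrete picture of this layout, the sketch below builds a small co-occurrence matrix with one row per target word and one column per context type. The definition of "context" used here (the single word immediately preceding the target) is an assumption made for illustration, and the function name `cooccurrence_matrix` is hypothetical; the analyses in the repository may define contexts differently.

```python
def cooccurrence_matrix(tokens, targets):
    """Count how often each context type immediately precedes each target.

    Returns (matrix, row_labels, column_labels), where matrix[r][c] is the
    number of times context column_labels[c] preceded target row_labels[r].
    """
    row_labels = sorted(set(targets))
    row_of = {target: i for i, target in enumerate(row_labels)}
    col_of = {}  # context type -> column index, assigned on first sight
    counts = {}  # (row, column) -> co-occurrence count
    for left, word in zip(tokens, tokens[1:]):
        if word in row_of:
            col = col_of.setdefault(left, len(col_of))
            key = (row_of[word], col)
            counts[key] = counts.get(key, 0) + 1
    matrix = [
        [counts.get((r, c), 0) for c in range(len(col_of))]
        for r in range(len(row_labels))
    ]
    column_labels = sorted(col_of, key=col_of.get)
    return matrix, row_labels, column_labels


toy_corpus = "the dog sees a ball and the ball hits the dog".split()
matrix, rows, cols = cooccurrence_matrix(toy_corpus, targets=["dog", "ball"])
for label, row in zip(rows, matrix):
    print(label, row)
```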

The first singular dimension of a lexical co-occurrence matrix can be thought of as a vector whose elements are lexical frequencies, which best fits the observed frequencies of context types across all target types. Its associated singular value indicates how well this single distribution describes the full co-occurrence matrix. This value will be larger when all (not just pairwise) targets have similar context type distributions, and will be smaller if all targets have dissimilar context type distributions. Thus, a larger first singular value, relative to the other singular values, means that the co-occurrence matrix, projected on the first singular dimension, will be a better approximation of the original co-occurrence matrix.
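
The toy example below illustrates this intuition. It is a sketch under stated assumptions, not the analysis used in the repository: the two matrices are made up, and "relative to the other singular values" is quantified here as the squared first singular value divided by the sum of all squared singular values (which equals the squared Frobenius norm of the matrix). The first singular value is estimated with plain power iteration so that the example needs only the standard library; a routine such as `numpy.linalg.svd` should yield the same value.

```python
import math


def first_singular_value(matrix, n_iter=200):
    """Estimate the largest singular value by power iteration on A^T A."""
    n_cols = len(matrix[0])
    # For non-negative count matrices, a uniform start vector cannot be
    # orthogonal to the leading right singular vector.
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        w = [sum(row[j] * u[i] for i, row in enumerate(matrix))     # w = A^T u
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return math.sqrt(sum(x * x for x in u))  # ||A v|| for unit-length v


def first_dimension_share(matrix):
    """Share of the squared Frobenius norm captured by the first dimension."""
    total = sum(x * x for row in matrix for x in row)
    if total == 0:
        return 0.0
    s1 = first_singular_value(matrix)
    return s1 * s1 / total


# Every target (row) has the same context distribution, up to scale.
similar = [[4, 2, 2], [8, 4, 4], [2, 1, 1]]
# Every target occurs with its own context type only.
dissimilar = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]

share_similar = first_dimension_share(similar)        # approximately 1.0
share_dissimilar = first_dimension_share(dissimilar)  # approximately 1/3
print(round(share_similar, 3), round(share_dissimilar, 3))
```

When all rows share one context distribution, the matrix has rank 1 and the first singular dimension reproduces it exactly. When the rows do not overlap at all, the three singular values are equal and the first dimension captures only one third of the total.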

Compatibility

Developed on Ubuntu 18.04 with Python 3.7.
