This project was developed under a previous phase of the Yale Digital Humanities Lab. Now a part of Yale Library’s Computational Methods and Data department, the Lab no longer includes this project in its scope of work. As such, it will receive no further updates.
Visualize large collections of text data with WebGL
pip install wordmap
To create a visualization from a directory of text files, you can call wordmap as follows:
wordmap --texts "data/*.txt"
That process creates a visualization in ./web that can be viewed if you start a local web server:
# python 2
python -m SimpleHTTPServer 7090
# python 3
python -m http.server 7090
After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
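If you would rather start the server from a script, the Python 3 command above can be approximated with the standard library. This is a minimal sketch and not part of wordmap: make_server is a hypothetical helper, and it assumes Python 3.7 or later and that the directory being served is the one containing ./web.

```python
import functools
import http.server


def make_server(directory=".", port=7090):
    """Build an HTTP server that serves `directory` on localhost:`port`.

    `make_server` is a hypothetical helper for illustration only.
    """
    # Bind the handler to a fixed directory (supported since Python 3.7).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


# To serve until interrupted with Ctrl+C:
#     make_server().serve_forever()
```

Serving the parent directory rather than ./web itself keeps the /web/ path in the URL above valid.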
The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:
--texts
A glob of files to process
--encoding
The encoding of input files
--max_n
The maximum number of words/docs to include in the visualization
--layouts
The layouts to render {umap, tsne, grid, img, obj}
--obj_file
An .obj file that should be used to create the obj layout
--img_file
A .png or .jpg file that should be used to create the img layout
--n_components
The number of dimensions to use when creating the layouts
--tsne_perplexity
The perplexity value to use when creating TSNE layout
--umap_n_neighbors
The n_neighbors value to use when creating UMAP layout
--umap_min_distance
The min_distance value to use when creating the UMAP layout
--model_type
The model type to use {word2vec}
--use_cache
Boolean that, if True, will load saved layouts from models
--model_name
The name to use when saving a model to disk
--model
A persisted model to use to create layouts
--size
The number of dimensions to include in Word2Vec vectors
--window
The number of words to include in windows when creating a Word2Vec model
--iter
The maximum number of iterations to run the created model
--min_count
The minimum occurrences of each word to be included in the Word2Vec model
--workers
The number of computer cores to use when processing input data
--verbose
If true, logs progress during layout construction
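Before a long run, it can help to check which files a --texts pattern matches. The sketch below uses Python's standard glob module. preview_texts is a hypothetical helper, and it is an assumption that wordmap expands the pattern the same way the glob module does.

```python
import glob


def preview_texts(pattern, limit=5):
    """Return the number of files matching `pattern` and the first few paths.

    Hypothetical helper; wordmap itself may expand globs differently.
    """
    paths = sorted(glob.glob(pattern))
    return len(paths), paths[:limit]


count, sample = preview_texts("data/*.txt")
print(f"{count} files match")
for path in sample:
    print(path)
```

If the count is zero, check the pattern and the working directory before invoking wordmap.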
Examples:
Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:
wordmap --texts "data/*.txt" \
--layouts umap tsne grid
Create a wordmap using a saved Word2Vec model with 3 dimensions and a maximum of 10000 words:
wordmap --model "1563222036.model" \
--n_components 3 \
--max_n 10000
Create a wordmap with several layouts, each with multiple parameter steps:
python wordmap/wordmap.py \
--texts "data/philosophical_transactions/*.txt" \
--layouts tsne umap grid \
--tsne_perplexity 5 25 100 \
--umap_n_neighbors 2 20 200 \
--umap_min_dist 0.01 0.1 1.0 \
--n_clusters 10 25 \
--iter 100
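To get a feel for how quickly multiple parameter steps add up, the sketch below enumerates the UMAP values from the last example. This README does not say whether wordmap renders every combination or pairs the values some other way, so the Cartesian product here is an assumption, and umap_combinations is a hypothetical helper.

```python
from itertools import product


def umap_combinations(n_neighbors, min_dist):
    """List every (n_neighbors, min_dist) pair.

    Assumes the tool evaluates the full Cartesian product, which this
    README does not confirm.
    """
    return list(product(n_neighbors, min_dist))


combos = umap_combinations([2, 20, 200], [0.01, 0.1, 1.0])
print(len(combos))  # 9
```

Under that assumption, three values for each of the two flags would yield nine UMAP layouts, so adding steps can increase processing time substantially.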