This repository holds the code for the paper:
- Xin Du and Kumiko Tanaka-Ishii. "Semantic Field of Words Represented as Nonlinear Functions", NeurIPS 2022.
We proposed a new word representation in a functional space rather than a vector
space, called FIeld REpresentation (FIRE). Each word is represented as a pair:
a set of locations and a nonlinear function over a low-dimensional space, called
the word's semantic field. FIRE represents word polysemy by the multimodality of
the semantic field. The similarity between two words or sentences is measured by
the overlap between their semantic fields.
Overlapped semantic fields of `river` and `financial`, and their locations. The shape resembles that of `bank` in the image above, indicating FIRE's property of compositionality.
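The compositionality above can be illustrated with a toy sketch, which is not the trained model: here a word's semantic field is modeled as a sum of Gaussian bumps centered at its locations, and composition as addition of fields. All locations, widths, and values below are hypothetical.

```python
import numpy as np

def field(locations, x, width=0.5):
    """Evaluate a multimodal field (a sum of Gaussian bumps) at points x.

    Toy stand-in for a FIRE semantic field; the real fields are
    nonlinear functions parameterized by neural networks.
    """
    x = np.asarray(x, dtype=float)[..., None]        # shape (..., 1)
    centers = np.asarray(locations, dtype=float)     # shape (k,)
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2)).sum(axis=-1)

river = [-1.0]       # hypothetical 1-D location for `river`
financial = [1.0]    # hypothetical 1-D location for `financial`

xs = np.linspace(-3, 3, 121)
composed = field(river, xs) + field(financial, xs)   # additive composition
# The composed field is bimodal, with one mode near each
# constituent's location -- a field shaped like a polysemous word.
```

The additive composition is what makes the overlapped fields of `river` and `financial` resemble the multimodal field of a polysemous word such as `bank`.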
A challenge in implementing FIRE is to parallelize the evaluation of functions.
The usual way of using a neural network NN is to process one data batch at a time,
that is, parallelization over data: $\text{NN}(x_1), \text{NN}(x_2), \dots$
In FIRE-based language models, we instead require the parallelization of both neural networks and data. The desired behavior should include:
- plain mode: $\text{NN}_1(x_1), \text{NN}_2(x_2), \dots$
- cross mode:
  - $\text{NN}_1(x_1), \text{NN}_1(x_2), \dots$
  - $\text{NN}_2(x_1), \text{NN}_2(x_2), \dots$
  - $\cdots$
In other words, the separate neural networks must be batchified, just as column
vectors are indexed from a matrix and recombined into a new matrix. We call this
process "stacking and slicing". We provide one solution in this repository;
please see the `StackSlicing` class.
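As a minimal sketch of the idea, and not the repository's `StackSlicing` implementation, the parameters of several small networks can be stacked along a leading axis so that both modes reduce to batched tensor products. All shapes and names below are hypothetical.

```python
import numpy as np

# Hypothetical sketch: stack the parameters of n small one-hidden-layer
# networks along a leading axis and evaluate them with batched tensor
# products (einsum) instead of a Python loop over networks.
rng = np.random.default_rng(0)
n, d, h = 3, 2, 4                  # number of networks, input dim, hidden width

W1 = rng.normal(size=(n, h, d))    # first-layer weights, one slice per network
b1 = rng.normal(size=(n, h))       # first-layer biases
w2 = rng.normal(size=(n, h))       # second-layer weights (scalar output)
X = rng.normal(size=(n, d))        # one input per network

def plain_mode(W1, b1, w2, X):
    """NN_i(x_i): pair network i with input i."""
    hidden = np.tanh(np.einsum('nhd,nd->nh', W1, X) + b1)
    return np.einsum('nh,nh->n', w2, hidden)          # shape (n,)

def cross_mode(W1, b1, w2, X):
    """NN_i(x_j): every network applied to every input."""
    hidden = np.tanh(np.einsum('nhd,jd->njh', W1, X) + b1[:, None, :])
    return np.einsum('nh,njh->nj', w2, hidden)        # shape (n, n)

plain = plain_mode(W1, b1, w2, X)
cross = cross_mode(W1, b1, w2, X)
# The diagonal of the cross-mode result equals the plain-mode result.
```

The same stacking pattern applies to deeper networks; each extra layer adds one more batched product over the stacked parameter slices.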
We selected a subset of the "Core" WordNet dataset and constructed a list of
542 strongly polysemous / strongly monosemous words.
See `/data/wordnet-542.txt`.