This repository contains the code for the paper "FIFAKT: Each Encounter Counts: Modeling Language Learning and Forgetting", published at LAK23: 13th International Learning Analytics and Knowledge Conference. Authors: Boxuan Ma, Gayan Prasad Hettiarachchi, Sora Fukui, and Yuji Ando. (LINK)
If you find this repository useful, please cite our work:
@inproceedings{ma2023each,
title={Each Encounter Counts: Modeling Language Learning and Forgetting},
author={Ma, Boxuan and Hettiarachchi, Gayan Prasad and Fukui, Sora and Ando, Yuji},
booktitle={LAK23: 13th International Learning Analytics and Knowledge Conference},
pages={79--88},
year={2023}
}
Language learning applications usually estimate the learner's language knowledge over time to provide personalized practice content for each learner at the optimal timing. However, accurately predicting language knowledge or linguistic skills is much more challenging than predicting math or science knowledge, as many language tasks involve memorization and retrieval. Learners must memorize a large number of words and meanings, which are prone to be forgotten without practice. Although a few studies consider forgetting when modeling learners' language knowledge, they tend to apply traditional models, consider only partial information about forgetting, and ignore linguistic features that may significantly influence learning and forgetting. This paper focuses on modeling and predicting learners' knowledge by considering their forgetting behavior and linguistic features in language learning. Specifically, we first explore the existence of forgetting behavior and cross-effects in real-world language learning datasets through empirical studies. Based on these findings, we propose a model for predicting the probability of recalling a word given a learner's practice history. The model uses the attention mechanism to incorporate key information related to forgetting, question formats, and semantic similarities between words. Experiments on two real-world datasets show that the proposed model improves performance compared to baselines. Moreover, the results indicate that combining multiple types of forgetting information and item format improves performance. In addition, we find that incorporating semantic features, such as word embeddings, to model similarities between words in a learner's practice history and their effects on memory also improves the model.
The Duolingo data we used can be downloaded from Duolingo. We also use Duolingo's half-life regression (HLR) code as a baseline in our experiments.
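For readers unfamiliar with the HLR baseline, the following is a minimal sketch of the half-life regression prediction rule: recall probability p = 2^(-delta / h), with estimated half-life h = 2^(theta . x). It is not the Duolingo implementation and not the code in this repository; the function names, feature names, and weight values are hypothetical and chosen only for illustration.

```python
import math


def hlr_half_life(weights, features):
    """Estimated half-life in days: h = 2 ** (theta . x)."""
    dot = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** dot


def hlr_recall_probability(delta_days, half_life_days):
    """Predicted recall probability: p = 2 ** (-delta / h)."""
    return 2.0 ** (-delta_days / half_life_days)


# Hypothetical weights and features (assumptions, not values from the paper).
weights = {"bias": 1.0, "sqrt_correct": 0.5, "sqrt_wrong": -0.5}
features = {
    "bias": 1.0,
    "sqrt_correct": math.sqrt(4),  # 4 past correct answers
    "sqrt_wrong": math.sqrt(1),    # 1 past wrong answer
}

h = hlr_half_life(weights, features)

# By definition of half-life, recall probability is 0.5 when delta equals h.
p_at_half_life = hlr_recall_probability(delta_days=h, half_life_days=h)

# Recall probability decays as the time since the last practice grows.
p_soon = hlr_recall_probability(delta_days=0.5, half_life_days=h)
p_late = hlr_recall_probability(delta_days=10.0, half_life_days=h)

print(f"half-life: {h:.3f} days, p(soon)={p_soon:.3f}, p(late)={p_late:.3f}")
```

In this formulation, more past correct answers lengthen the estimated half-life (given a positive weight), so the predicted recall probability decays more slowly.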
The Tagetomo data we used is private and belongs to Obunsha Inc., so we cannot make it public.
Some of the code in this repository is based on code from Benoît Choffin's DAS3H repository and Brian Zylich's Linguistic Skill Modeling repository.