This repository contains the Python code used for the experiments from our EDM 2019 paper: DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills. Authors: Benoît Choffin, Fabrice Popineau, Yolaine Bourda, and Jill-Jênn Vie.
Code for this repository is partly borrowed from jilljenn's ktm repository.
It is recommended to use a virtual environment for running our experiments. In order to use embedding dimensions d > 0, libfm needs to be installed as well:
git clone https://github.com/srendle/libfm
cd libfm && git reset --hard 91f8504a15120ef6815d6e10cc7dee42eebaab0f && make all
Three open access datasets were used for our experiments:
- ASSISTments 2012-2013 (assistments12)
- Bridge to Algebra 2006-2007 (bridge_algebra06)
- Algebra I 2005-2006 (algebra05)
The two last datasets come from the KDD Cup 2010 EDM Challenge. Datasets need to be downloaded and put inside each corresponding data folder in data. The main dataset (train
for KDD Cup) should each time be renamed "data" + corresponding extension name.
To preprocess each of the datasets:
python prepare_data.py --dataset <dataset codename> --min_interactions 10 --remove_nan_skills
To encode sparse features on which the ML models will train, encode.py
is used. The preprocessed dataset is automatically selected. For instance, DAS3H is "users, items, skills, wins, attempts, tw_kc":
python encode.py --dataset <dataset codename> --users --items --skills --wins --attempts --tw_kc
users | items | skills | wins | fails | attempts | tw_kc | tw_items | |
---|---|---|---|---|---|---|---|---|
DAS3H | x | x | x | x | x | x | ||
DASH | x | x | x | x | x | |||
IRT/MIRT | x | x | ||||||
PFA | x | x | x | |||||
AFM | x | x |
A faster script for encoding DAS3H time windows is available here.
Code for running the experiments is in das3h.py
. For instance, for performing cross-validation for DAS3H with embedding dimension d=5, on ASSISTments12:
python das3h.py data/assistments12/X-uiswat1.npz --dataset assistments12 --d 5 --users --items --skills --wins --attempts --tw_kc
Algebra 2005-2006 (PSLC DataShop) dataset:
model | dim | AUC | ACC | NLL |
---|---|---|---|---|
DAS3H | 0 | 0.826 ± 0.003 | 0.815 ± 0.007 | 0.414 ± 0.011 |
DAS3H | 5 | 0.818 ± 0.004 | 0.812 ± 0.007 | 0.421 ± 0.011 |
DAS3H | 20 | 0.817 ± 0.005 | 0.811 ± 0.004 | 0.422 ± 0.007 |
DASH | 5 | 0.775 ± 0.005 | 0.802 ± 0.010 | 0.458 ± 0.012 |
DASH | 20 | 0.774 ± 0.005 | 0.803 ± 0.014 | 0.456 ± 0.017 |
DASH | 0 | 0.773 ± 0.002 | 0.801 ± 0.004 | 0.454 ± 0.006 |
IRT | 0 | 0.771 ± 0.007 | 0.800 ± 0.009 | 0.456 ± 0.015 |
MIRTb | 20 | 0.770 ± 0.007 | 0.800 ± 0.006 | 0.460 ± 0.007 |
MIRTb | 5 | 0.770 ± 0.004 | 0.800 ± 0.008 | 0.459 ± 0.011 |
PFA | 0 | 0.744 ± 0.004 | 0.782 ± 0.003 | 0.481 ± 0.004 |
AFM | 0 | 0.707 ± 0.005 | 0.774 ± 0.004 | 0.499 ± 0.006 |
PFA | 20 | 0.670 ± 0.010 | 0.748 ± 0.005 | 1.008 ± 0.047 |
PFA | 5 | 0.664 ± 0.010 | 0.735 ± 0.013 | 1.107 ± 0.079 |
AFM | 20 | 0.644 ± 0.005 | 0.750 ± 0.005 | 0.817 ± 0.076 |
AFM | 5 | 0.640 ± 0.007 | 0.742 ± 0.009 | 0.941 ± 0.056 |
ASSISTments 2012-2013 dataset:
model | dim | AUC | ACC | NLL |
---|---|---|---|---|
DAS3H | 5 | 0.744 ± 0.002 | 0.737 ± 0.001 | 0.531 ± 0.001 |
DAS3H | 20 | 0.740 ± 0.001 | 0.736 ± 0.002 | 0.533 ± 0.003 |
DAS3H | 0 | 0.739 ± 0.001 | 0.736 ± 0.001 | 0.534 ± 0.002 |
DASH | 0 | 0.703 ± 0.002 | 0.719 ± 0.003 | 0.557 ± 0.004 |
DASH | 5 | 0.703 ± 0.001 | 0.720 ± 0.001 | 0.557 ± 0.001 |
DASH | 20 | 0.703 ± 0.002 | 0.720 ± 0.002 | 0.557 ± 0.002 |
IRT | 0 | 0.702 ± 0.001 | 0.719 ± 0.001 | 0.558 ± 0.001 |
MIRTb | 20 | 0.701 ± 0.001 | 0.720 ± 0.001 | 0.558 ± 0.001 |
MIRTb | 5 | 0.701 ± 0.002 | 0.719 ± 0.001 | 0.558 ± 0.001 |
PFA | 5 | 0.669 ± 0.002 | 0.709 ± 0.002 | 0.577 ± 0.002 |
PFA | 20 | 0.668 ± 0.002 | 0.709 ± 0.003 | 0.578 ± 0.003 |
PFA | 0 | 0.668 ± 0.002 | 0.708 ± 0.001 | 0.579 ± 0.002 |
AFM | 5 | 0.610 ± 0.001 | 0.699 ± 0.002 | 0.597 ± 0.001 |
AFM | 20 | 0.609 ± 0.001 | 0.699 ± 0.003 | 0.597 ± 0.003 |
AFM | 0 | 0.608 ± 0.002 | 0.697 ± 0.002 | 0.598 ± 0.002 |
Bridge to Algebra 2006-2007 (PSLC DataShop):
model | dim | AUC | ACC | NLL |
---|---|---|---|---|
DAS3H | 5 | 0.791 ± 0.005 | 0.848 ± 0.002 | 0.369 ± 0.005 |
DAS3H | 0 | 0.790 ± 0.004 | 0.846 ± 0.002 | 0.371 ± 0.004 |
DAS3H | 20 | 0.776 ± 0.023 | 0.838 ± 0.019 | 0.387 ± 0.027 |
DASH | 0 | 0.749 ± 0.002 | 0.840 ± 0.005 | 0.393 ± 0.007 |
DASH | 20 | 0.747 ± 0.003 | 0.840 ± 0.001 | 0.399 ± 0.002 |
IRT | 0 | 0.747 ± 0.002 | 0.839 ± 0.004 | 0.393 ± 0.007 |
DASH | 5 | 0.747 ± 0.003 | 0.840 ± 0.002 | 0.399 ± 0.002 |
MIRTb | 5 | 0.746 ± 0.002 | 0.840 ± 0.004 | 0.398 ± 0.006 |
MIRTb | 20 | 0.746 ± 0.004 | 0.839 ± 0.005 | 0.399 ± 0.007 |
PFA | 20 | 0.746 ± 0.003 | 0.839 ± 0.002 | 0.397 ± 0.004 |
PFA | 5 | 0.744 ± 0.007 | 0.838 ± 0.003 | 0.402 ± 0.007 |
PFA | 0 | 0.739 ± 0.003 | 0.835 ± 0.005 | 0.406 ± 0.008 |
AFM | 5 | 0.706 ± 0.002 | 0.836 ± 0.003 | 0.411 ± 0.004 |
AFM | 20 | 0.706 ± 0.002 | 0.836 ± 0.003 | 0.412 ± 0.004 |
AFM | 0 | 0.692 ± 0.002 | 0.833 ± 0.004 | 0.423 ± 0.006 |