This work received partial support from a research agreement between Kent State University and iLambda Inc., as well as from the National Science Foundation under Grant IIS-2142675.
@inproceedings{li2020kdd,
author = {Li, Dong and Jin, Ruoming and Gao, Jing and Liu, Zhi},
title = {On Sampling Top-K Recommendation Evaluation},
year = {2020},
isbn = {9781450379984},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3394486.3403262},
doi = {10.1145/3394486.3403262},
booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages = {2114–2124},
numpages = {11},
keywords = {recall, top-k, recommender systems, evaluation metric, hit ratio},
location = {Virtual Event, CA, USA},
series = {KDD '20}
}
@article{li2023aaai,
title={Towards Reliable Item Sampling for Recommendation Evaluation},
volume={37},
url={https://ojs.aaai.org/index.php/AAAI/article/view/25561},
DOI={10.1609/aaai.v37i4.25561},
number={4},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Li, Dong and Jin, Ruoming and Liu, Zhenming and Ren, Bin and Gao, Jing and Liu, Zhi},
year={2023}, month={Jun.}, pages={4409-4416}
}
@article{jin2021aaai,
title={On Estimating Recommendation Evaluation Metrics under Sampling},
volume={35},
url={https://ojs.aaai.org/index.php/AAAI/article/view/16537},
DOI={10.1609/aaai.v35i5.16537},
number={5},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Jin, Ruoming and Li, Dong and Mudrak, Benjamin and Gao, Jing and Liu, Zhi},
year={2021}, month={May}, pages={4147-4154}
}
Generate 100 repeated test sets with a fixed sample set size (e.g., n = 100):
cd ./data
python fix_sampling.py
Afterwards, this generates a ./fix_sample_100 folder containing 100 different test sets for each dataset.
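For intuition, here is a minimal sketch of what this repeated fixed-size sampling step could look like. The function name `build_fixed_sample_sets`, the `user_pos_items` dictionary, and the output file layout are illustrative assumptions, not the exact logic of fix_sampling.py.

```python
# Hypothetical sketch of repeated fixed-size sampling (not the exact fix_sampling.py logic).
import os
import numpy as np

def build_fixed_sample_sets(user_pos_items, num_items, n=100, repeats=100,
                            out_dir="./fix_sample_100", seed=0):
    """For each repeat, draw n non-interacted items per user and save one file per repeat.

    user_pos_items: dict mapping user id -> set of item ids the user interacted with.
    """
    rng = np.random.default_rng(seed)
    os.makedirs(out_dir, exist_ok=True)
    for r in range(repeats):
        samples = {}
        for user, pos_items in user_pos_items.items():
            negatives = set()
            # Rejection-sample until n distinct non-interacted items are drawn.
            while len(negatives) < n:
                cand = int(rng.integers(0, num_items))
                if cand not in pos_items:
                    negatives.add(cand)
            samples[user] = sorted(negatives)
        # One sampled test set per repeat (file name is illustrative only).
        np.save(os.path.join(out_dir, f"test_sample_{r}.npy"), samples)
```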
cd ./models
Train the models; for example, run NeuMF_train.ipynb.
After a model is trained, we can generate the rank of each test item among all items and among the sampled set; run NeuMF_repeat.ipynb.
This generates a ./fix_sample_100 folder containing 100 different rank files for each dataset.
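As a rough illustration (assumed data layout, not the notebook's exact code), the rank of a user's test item among all items and among the sampled set can be derived from the model scores as follows:

```python
# Hedged sketch: derive the global rank and the sample rank of a user's test item
# from the model's scores. `scores_all` is assumed to be indexed by item id.
import numpy as np

def ranks_for_user(scores_all, target_item, sampled_items):
    """scores_all: 1-D array of model scores for every item (index = item id);
    target_item: id of the held-out test item;
    sampled_items: ids of the n sampled items for this user."""
    target_score = scores_all[target_item]
    # Rank = 1 + number of items scored strictly higher than the target item.
    global_rank = 1 + int(np.sum(scores_all > target_score))
    sample_rank = 1 + int(np.sum(scores_all[sampled_items] > target_score))
    return global_rank, sample_rank
```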
cd ./estimators
Run MLE.ipynb, for instance, to estimate P(R) from the rank files based on the sampled set. The outputs are saved in the ../save_PR folder.
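To make the estimation step concrete, below is a rough EM-style sketch of a maximum-likelihood estimate of P(R) from sampled ranks. The binomial relation between the sampled rank r and the global rank R (target plus n - 1 items drawn uniformly from the remaining N - 1) is an assumption of this sketch; the exact estimators used in MLE.ipynb are described in the papers cited above.

```python
# Hedged sketch of an EM-style MLE of the global-rank distribution P(R) from sample ranks.
# Not optimized for very large N (it materializes a users-by-N likelihood matrix).
import numpy as np
from scipy.stats import binom

def estimate_PR(sample_ranks, N, n, iters=200):
    """sample_ranks: observed ranks r_u within the sampled set (1-based);
    N: total number of items; n: sampled-set size (target + n - 1 sampled items)."""
    R = np.arange(1, N + 1)                      # candidate global ranks
    p = (R - 1) / (N - 1)                        # chance a sampled item beats the target
    # Likelihood matrix L[u, R-1] = P(r_u | R) under the binomial assumption.
    L = np.stack([binom.pmf(r - 1, n - 1, p) for r in sample_ranks])
    PR = np.full(N, 1.0 / N)                     # uniform initialization
    for _ in range(iters):
        post = L * PR                            # E-step: unnormalized posterior over R
        post /= post.sum(axis=1, keepdims=True)
        PR = post.mean(axis=0)                   # M-step: re-estimate P(R)
    return PR
```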
The PICK_WINNER and Relative_Error files are used to generate the final (table) results in our paper.
The output files are stored in the ./table/results folder.
Note that, due to file-size limits, we did not include all output files here.
- Test/evaluation stage, not training stage. Adaptive sampling estimation works like a normal evaluation method: it computes metrics for different models and evaluates/compares their performance.
- When there are too many items (e.g., millions), computing the exact rank of an item among all items is significantly inefficient.
- Global Evaluation: assume there is a user-defined rank function (after the model is trained), R = rank(u, i), where R is the rank of the target item i among all items for user u.
- Sampling Evaluation: sometimes the rank is computed only within a small sampled item set (to avoid ranking all items), and metrics based on this sample rank can deviate from the global metrics. Adaptive sampling can help rectify this issue given only the sampling rank.
3.1 Use 'adaptive/adaptive_sampling.py' to obtain the sample rank.
3.2 Use 'adaptive/adaptive_estimator.py' to estimate the global rank (distribution).
Once P(R) is obtained from 3.2, one can use 'NDCG_K' from 'estimator.utils.py' to approximate the global NDCG metric, etc.; see the sketch below.
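For example, here is a minimal sketch of turning the estimated P(R) into top-K metrics (intended to mirror what an NDCG_K-style helper does, but not the repository's exact code). With one relevant item per user, a user whose item has global rank R contributes 1/log2(R+1) to NDCG@K when R <= K, so the expected metric is a sum over the estimated rank distribution.

```python
# Hedged sketch: approximate global top-K metrics from the estimated rank distribution P(R).
import numpy as np

def ndcg_at_k(PR, K):
    """PR[R-1] = estimated probability that the target item's global rank is R."""
    ranks = np.arange(1, K + 1)
    # With a single relevant item, IDCG@K = 1, so NDCG@K reduces to the expected DCG.
    return float(np.sum(PR[:K] / np.log2(ranks + 1)))

def recall_at_k(PR, K):
    """Recall@K (= hit ratio with one target item) is the probability mass of ranks <= K."""
    return float(np.sum(PR[:K]))
```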