Balancing the trade-off between accuracy and diversity in recommender systems with personalized explanations based on Linked Open Data
This repository contains the source code used in the experiments of the following papers:
The first paper (published in Knowledge-Based Systems, available on ScienceDirect) proposes a reordering algorithm that aims to improve or maintain the accuracy of a collaborative filtering recommendation engine while also providing more diversity, coverage and fairness, and that can generate personalized explanations for the user based on the Wikidata Linked Open Data.
Zanon, André Levi, Leonardo Chaves Dutra da Rocha, and Marcelo Garcia Manzato. "Balancing the trade-off between accuracy and diversity in recommender systems with personalized explanations based on Linked Open Data." Knowledge-Based Systems 252 (2022): 109333.
The second paper proposes an approach to explaining recommendations based on graph embeddings trained on the Wikidata Linked Open Data. A cosine similarity is computed between a user embedding and a path embedding: the user embedding is the sum pooling of the embeddings of the user's interacted items, and the path embedding is the sum pooling of the embeddings of the items and edges that connect an interacted item to a recommended one. The path most similar to the user is chosen to be displayed.
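Below is a minimal sketch of this path-selection step, assuming the item, entity and relation embeddings are available as NumPy vectors; the function and variable names are illustrative, not the repository's API.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors (small epsilon avoids division by zero).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def pick_explanation_path(interacted_item_embs, candidate_paths):
    """Choose the KG path whose embedding is most similar to the user embedding.

    interacted_item_embs: list of np.ndarray, embeddings of the user's interacted items.
    candidate_paths: list of lists of np.ndarray; each inner list holds the embeddings
        of the nodes and edges along one path from an interacted item to a recommended item.
    """
    user_emb = np.sum(interacted_item_embs, axis=0)                  # sum pooling of interacted items
    path_embs = [np.sum(path, axis=0) for path in candidate_paths]   # sum pooling per path
    scores = [cosine(user_emb, p) for p in path_embs]
    best = int(np.argmax(scores))
    return best, scores[best]
```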
Zanon, André Levi, Leonardo Chaves Dutra da Rocha, and Marcelo Garcia Manzato. "Model-Agnostic Knowledge Graph Embedding Explanations for Recommender Systems". The 2nd World Conference on eXplainable Artificial Intelligence (2024)
In the third paper we compared the results of different graph embedding models under the explanation paradigm of the previous paper.
Zanon, André Levi, Leonardo Chaves Dutra da Rocha, and Marcelo Garcia Manzato. "O impacto de estratégias de embeddings de grafos na explicabilidade de sistemas de recomendação." Proceedings of the Brazilian Symposium on Multimedia and the Web (WebMedia). 2024.
In the fourth paper we compared the results of all the previous papers with results obtained using Large Language Models. In this method we provide paths from the knowledge graph to the LLM and extract the explanation quality metrics of the path chosen by the LLM.
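A minimal sketch of this LLM path-selection step is given below, assuming the openai 0.27.x client listed in the requirements; the prompt and helper names are illustrative and not the repository's exact code.

```python
import os
import openai

openai.api_key = os.environ["OPEN_AI_KEY"]

def choose_path_with_llm(user_items, candidate_paths, model="gpt-4o-mini"):
    """Ask the LLM to pick the most convincing explanation path among KG candidates.

    user_items: names of items the user interacted with.
    candidate_paths: human-readable path strings, e.g.
        "Pulp Fiction -> director -> Quentin Tarantino -> director -> Kill Bill".
    """
    prompt = (
        "A user liked: " + "; ".join(user_items) + "\n"
        "Candidate explanation paths:\n"
        + "\n".join(f"{i}: {p}" for i, p in enumerate(candidate_paths))
        + "\nAnswer only with the number of the single most convincing path."
    )
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(response["choices"][0]["message"]["content"].strip())
```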
If this repository is useful to you, please cite us:
@article{zanon2022balancing,
title = {Balancing the trade-off between accuracy and diversity in
recommender systems with personalized explanations based on
Linked Open Data},
author = {Zanon, Andr{\'e} Levi and
da Rocha, Leonardo Chaves Dutra and
Manzato, Marcelo Garcia},
journal = {Knowledge-Based Systems},
volume = {252},
pages = {109333},
year = {2022},
publisher = {Elsevier}
}
@inproceedings{zanon2024model,
title = {Model-agnostic knowledge graph embedding explanations for recommender systems},
author = {Zanon, Andr{\'e} Levi and da Rocha, Leonardo Chaves Dutra and Manzato, Marcelo Garcia},
booktitle = {World Conference on Explainable Artificial Intelligence},
pages = {3--27},
year = {2024},
organization = {Springer}
}
@inproceedings{webmedia,
author = {André Zanon and Leonardo Rocha and Marcelo Manzato},
title = {O Impacto de Estratégias de Embeddings de Grafos na Explicabilidade de Sistemas de Recomendação},
booktitle = {Proceedings of the 30th Brazilian Symposium on Multimedia and the Web},
location = {Juiz de Fora/MG},
year = {2024},
pages = {231--239},
publisher = {SBC},
address = {Porto Alegre, RS, Brasil},
doi = {10.5753/webmedia.2024.241857},
url = {https://sol.sbc.org.br/index.php/webmedia/article/view/30317}
}
📁 datasets: folder with the MovieLens 100k and LastFM datasets, the cross-validation folds, and the experiment outputs and results for all folds
📁 generated_files: metadata files generated from Wikidata for the items of both datasets
📁 preprocessing: source code for extracting the Wikidata metadata and creating the cross-validation folds
📁 recommenders: implementation of the recommendation engines and of the proposed reordering approach. Each file implements one recommendation engine, except the Neural Collaborative Filtering algorithm, which has two classes with the NCF prefix. A base class shared by all recommenders is also included.
📄 main.py: entry point to run the experiments via command line arguments
📄 evaluation_utils.py: source code for evaluating the recommendation engines
📄 requirements.txt: list of library requirements to run the code
The embedding models used in the WebMedia paper are available at https://tinyurl.com/2p969fe3
Please add them to the models folder inside each dataset folder, i.e., ./datasets/ml-latest-small/models/ and ./datasets/hetrec2011-lastfm-2k/models/
To run the models with the Large Language Models gpt-4o-mini and gpt-3.5-turbo-1106, you must generate an OpenAI API key and create a .env file at the root of the project with a key named OPEN_AI_KEY. The same is required for the Llama 70B model, but with a Groq API key stored under the name GROQ_API_KEY.
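A minimal .env example (the values below are placeholders, not real keys):

```
OPEN_AI_KEY=sk-your-openai-key
GROQ_API_KEY=gsk-your-groq-key
```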
The files props_wikidata_movilens_small.csv and props_artists_id.csv contain the Wikidata metadata extracted with the SPARQLWrapper 1.8.5 library for the MovieLens 100k dataset and the LastFM artist dataset, respectively. For MovieLens we extracted metadata for 97% of the available movies, and for LastFM for 66% of the available artists.
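For illustration, a minimal SPARQLWrapper query against the Wikidata endpoint looks like the sketch below; the query itself is an example and not the exact one used for the extraction.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
# Example: fetch the director (P57) and genre (P136) of one movie entity.
sparql.setQuery("""
    SELECT ?prop ?value WHERE {
      wd:Q172241 ?prop ?value .          # Q172241 = The Shawshank Redemption
      VALUES ?prop { wdt:P57 wdt:P136 }
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["prop"]["value"], binding["value"]["value"])
```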
All the generated files and results are available in this repository for the MovieLens 100k and LastFM databases. Below are the libraries and the command line arguments needed to reproduce the results in those two folders.
- numpy 1.21.1
- pandas 1.0.4
- scipy 1.5.2
- networkx 2.5.1
- pygini 1.0.1
- sklearn 0.20.3
- openpyxl 3.0.7
- gensim 4.2.0
- node2vec 0.4.3
- requests 2.25.1
- pytorch 1.13.1
- sparqlwrapper 1.8.5
- caserecommender 1.1.0
- pykeen 1.5.0
- openai 0.27.8
- groq 0.9.0
To install the libraries used in this project, use the command:
pip install -r requirements.txt
Or create a conda environment with the following command:
conda env create -f requirements.yml
After this step it is necessary to install the CaseRecommender library with the command:
pip install -U git+git://github.com/caserec/CaseRecommender.git
We used Anaconda to run the experiments. The Python version used was 3.7.3.
You can run experiments with command line arguments.
The documentation of each argument follows below, along with example values taken from the commands used in the experiments:
- mode: Set 'run' to run accuracy experiments, 'validate' to run statistical relevance tests on ranking metrics, 'explanation' to run explanation experiments, 'validate_expl' to run statistical relevance tests on explanation metrics, or 'maut' to run the multi-attribute utility theory comparison of explanation algorithms;
- dataset: Either 'ml' for the small MovieLens dataset or 'lastfm' for the LastFM dataset;
- begin: Fold at which to start the experiment;
- end: Fold at which to end the experiment;
- alg: Algorithms to run, separated by spaces. E.g.: "MostPop BPRMF UserKNN PageRank NCF EASE". Only works on the 'run' and 'maut' modes;
- reord: Algorithms to reorder, separated by spaces. E.g.: "MostPop BPRMF UserKNN PageRank NCF EASE". Only works on the 'run' mode;
- nreorder: Number of recommendations to reorder. Only works on the 'run' mode;
- pitems: Share of interacted items used to build the user semantic profile (see the profile-selection sketch after this list). Only works on the 'run' mode;
- policy: Policy to extract the set of items used to build the semantic profile: 'all' for all items, 'last' for the last interacted, 'first' for the first interacted, 'random' for random items. Only works on the 'run' mode;
- baseline: Name of the file (without extension) of the baseline whose results are validated. E.g.: 'bprmf'. Only works on the 'validate' mode;
- sufix: Reordering suffix appended to the result file name after the baseline string. E.g.: 'path[policy=last_items=01_reorder=10_hybrid]'. Only works on the 'validate' mode;
- metrics: Metrics to run the statistical tests on, separated by spaces. E.g.: "MAP AGG_DIV NDCG GINI ENTROPY COVERAGE". Only works on the 'validate' mode;
- method: Statistical relevance test. Either 'ttest', 'wilcoxon' or 'both'. Only works on the 'validate' mode;
- save: Boolean argument to save (or not) the result to a file. Only works on the 'validate' mode;
- fold: Fold to consider when generating explanations. Only works on the 'explanation' mode;
- min: Minimum number of user interacted items to explain. Works on the 'explanation' mode;
- max: Maximum number of user interacted items to explain. Works on the 'explanation' mode;
- max_users: Maximum number of users to generate explanations for. Works on the 'explanation' mode;
- reordered_recs: Whether to explain the baseline or the reordered algorithm. Works on the 'explanation' mode;
- expl_alg: Algorithm used to explain recommendations. Either explod, explod_v2, pem, diverse or rotate. Works only on the 'explanation' mode;
- n_explain: Number of recommendations to explain. Min: 1, max: 10. Works only on the 'explanation' mode;
- expl_algs: List of explanation algorithms whose output explanations are compared. Options are explod, explod_v2, pem, diverse or rotate. Works only on the 'maut' mode.
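As a rough illustration of how the policy and pitems arguments interact (referenced in the list above), the sketch below selects the share of a user's interaction history used to build the semantic profile; this is an assumption about the behaviour, not the repository's exact code.

```python
import random

def select_profile_items(interactions, pitems=0.1, policy="last"):
    """Pick the subset of a user's interacted items used to build the semantic profile.

    interactions: item ids ordered from oldest to newest interaction.
    pitems: fraction of the history to keep (e.g. 0.1 keeps 10% of the items).
    policy: 'all', 'last', 'first' or 'random', as in the --policy argument.
    """
    if policy == "all":
        return list(interactions)
    k = max(1, int(len(interactions) * pitems))
    if policy == "last":
        return interactions[-k:]
    if policy == "first":
        return interactions[:k]
    if policy == "random":
        return random.sample(interactions, k)
    raise ValueError(f"unknown policy: {policy}")
```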
In summary, there are three main commands: 'run', which is responsible for running an experiment; 'validate', which runs a statistical relevance test comparing a baseline with the proposed reordering; and 'explanation', which generates recommendations for a fold just like 'run' but prints to the console the item names, the semantic profile and the explanation paths.
To run the MovieLens experiments use the following command line:
python main.py --mode=run --dataset=ml --begin=0 --end=9 --alg="MostPop BPRMF UserKNN PageRank NCF EASE" --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last
To run the Lastfm experiments use the following command line:
python main.py --mode=run --dataset=lastfm --begin=0 --end=9 --alg="MostPop BPRMF UserKNN PageRank NCF EASE" --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last
To run a statistical relevance test for the ranking metrics, considering all folds, use the following command. Here the bprmf baseline is compared to the proposed reordering with the 'last' items policy, 0.1 as the fraction of the interaction history used to build the user profile, and reordering of the baseline's top 10:
python main.py --mode=validate --dataset=lastfm --sufix=path[policy=last_items=01_reorder=10_hybrid] --baseline=bprmf --method="both" --save=1 --metrics="MAP AGG_DIV NDCG GINI ENTROPY COVERAGE"
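For reference, the paired t-test and the Wilcoxon signed-rank test over per-fold metric values can be computed with SciPy as sketched below; the per-fold numbers are hypothetical.

```python
from scipy import stats

# Hypothetical per-fold NDCG values for the baseline and for the reordered ranking.
baseline_ndcg  = [0.31, 0.29, 0.33, 0.30, 0.32, 0.28, 0.31, 0.30, 0.29, 0.32]
reordered_ndcg = [0.33, 0.30, 0.35, 0.31, 0.34, 0.29, 0.33, 0.32, 0.30, 0.34]

t_stat, t_p = stats.ttest_rel(baseline_ndcg, reordered_ndcg)  # paired t-test
w_stat, w_p = stats.wilcoxon(baseline_ndcg, reordered_ndcg)   # Wilcoxon signed-rank test
print(f"t-test p={t_p:.4f}, Wilcoxon p={w_p:.4f}")
```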
To run the explanation experiments on the MovieLens dataset with the explod_v2 explanation algorithm, run the following command. To compare results with the ExpLOD algorithm, change the expl_alg parameter to explod, or to pem for the PEM algorithm. To run with the reordered explanations of the KBS paper, change the reordered_recs parameter to 1:
python main.py --mode=explanation --dataset=ml --begin=0 --end=9 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=explod_v2 --reordered_recs=0 --n_explain=5
To generate explanation samples with the pem explanation algorithm on the MovieLens dataset for the PageRank algorithm, for a maximum of 2 users having at least 0 and at most 20 interactions, run the command below. To compare results with the ExpLOD algorithm, change the expl_alg parameter to explod. Change the reordered_recs parameter to explain either the reordered recommendations or the baseline algorithm:
python main.py --mode=explanation --dataset=ml --begin=0 --end=1 --reord="PageRank" --nreorder=10 --pitems=0.1 --policy=last --min=0 --max=20 --max_users=2 --expl_alg=pem --reordered_recs=0
To run a statistical relevance test on the explanation metrics, considering all ten folds, for the PageRank algorithm and the MovieLens dataset, comparing the PEM, ExpLOD v2 and ExpLOD algorithms, use the command:
python main.py --mode=validate_expl --baseline=wikidata_page_rank8020 --dataset=ml --reordered_recs=0
To run the Multi-Attribute Utility Theory (MAUT) comparison between explanation algorithms for a given recommendation algorithm and set of explanation metrics, run the following command:
python main.py --mode=maut --dataset=ml --expl_algs="explod explod_v2 pem rotate" --alg=ease --expl_metrics="LIR SEP ETD" --n_explain=5
To run the experiments from the paper: "Balancing the trade-off between accuracy and diversity in recommender systems with personalized explanations based on Linked Open Data", run the following commands:
First, run the ranking of the base algorithms and the reordering with the following commands:
python main.py --mode=run --dataset=ml --begin=0 --end=9 --alg="MostPop BPRMF UserKNN PageRank NCF EASE" --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last
python main.py --mode=run --dataset=lastfm --begin=0 --end=9 --alg="MostPop BPRMF UserKNN PageRank NCF EASE" --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last
Then compare the algorithms with the following command to perform the statistical validation. Change the dataset and baseline according to the documentation above.
python main.py --mode=validate --dataset=lastfm --sufix=path[policy=last_items=01_reorder=10_hybrid] --baseline=bprmf --method="both" --save=1 --metrics="MAP AGG_DIV NDCG GINI ENTROPY COVERAGE"
To run the experiments from the paper: "Model-Agnostic Knowledge Graph Embedding Explanations for Recommender Systems", run the following commands:
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="EASE" --nreorder=5 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=rotate --reordered_recs=0 --n_explain=5
In the paper we ran this command for both datasets and with all recommendation algorithms (MostPop BPRMF UserKNN PageRank NCF EASE) in the reord param. The n_explain was 1 and 5. The expl_alg was explod, explod_v2, pem and rotate. Only fold 0 was used in the paper's experiments (begin and end should be 0 for all experiments), therefore a 90/10 split for training and testing.
Then, to run the MAUT method, run the command:
python main.py --mode=maut --dataset=ml --expl_algs="explod explod_v2 pem rotate" --alg=ease --expl_metrics="LIR SEP ETD" --n_explain=5
In this command we run MAUT for the ml dataset, comparing the explanation algorithms listed in expl_algs for the ease recommender, using the metrics LIR SEP ETD as attributes. In the paper we ran it for both datasets, all recommendation algorithms (MostPop BPRMF UserKNN PageRank NCF EASE) and n_explain values of 1 and 5.
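For intuition, a generic MAUT aggregation (min-max normalize each metric across the explanation algorithms, then take a weighted sum) can be sketched as below; this is a textbook formulation with equal weights by default, not necessarily the repository's exact implementation.

```python
def maut_scores(metric_table, weights=None):
    """Rank explanation algorithms by a weighted sum of min-max normalized metrics.

    metric_table: dict mapping algorithm name -> dict of metric name -> value,
        e.g. {"explod": {"LIR": 0.41, "SEP": 0.22, "ETD": 0.35}, ...}
    weights: optional dict of metric name -> weight (defaults to equal weights).
    """
    metrics = sorted({m for values in metric_table.values() for m in values})
    weights = weights or {m: 1.0 / len(metrics) for m in metrics}

    # Min-max normalize each metric across algorithms so they become comparable.
    lo = {m: min(v[m] for v in metric_table.values()) for m in metrics}
    hi = {m: max(v[m] for v in metric_table.values()) for m in metrics}

    def norm(m, x):
        return (x - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0

    scores = {
        alg: sum(weights[m] * norm(m, values[m]) for m in metrics)
        for alg, values in metric_table.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical usage with made-up metric values:
# print(maut_scores({"explod": {"LIR": 0.41, "SEP": 0.22, "ETD": 0.35},
#                    "pem":    {"LIR": 0.38, "SEP": 0.30, "ETD": 0.31}}))
```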
To run the experiments from the paper: "O Impacto de Estratégias de Embeddings de Grafos na Explicabilidade de Sistemas de Recomendação", run the following commands:
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=5 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=webmedia_transe --reordered_recs=0 --n_explain=5
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=5 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=webmedia_complex --reordered_recs=0 --n_explain=5
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=5 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=webmedia_rotate --reordered_recs=0 --n_explain=5
We also ran these three models for the LastFM dataset, which requires changing the dataset parameter from ml to lastfm. In this paper we only ran with n_explain=5.
To compare the results using maut, use the command:
python main.py --mode=maut --dataset=ml --expl_algs="explod explod_v2 pem rotate webmedia_transe webmedia_complex webmedia_rotate" --alg=ease --expl_metrics="LIR SEP ETD" --n_explain=5
Change the alg param to each of the recommenders and the dataset parameter to ml and lastfm.
To run the experiments with LLMs, run the following commands:
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=llm_gpt-3.5-turbo-1106 --reordered_recs=0 --n_explain=5
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=llm_gpt-4o-mini --reordered_recs=0 --n_explain=5
python main.py --mode=explanation --dataset=ml --begin=0 --end=0 --reord="MostPop BPRMF UserKNN PageRank NCF EASE" --nreorder=10 --pitems=0.1 --policy=last --min=0 --max=0 --max_users=0 --expl_alg=llm_llama3-70b-8192 --reordered_recs=0 --n_explain=5
Each of these commands executes the experiments for the MovieLens dataset with one of three LLMs: Llama 70B, GPT 3.5 and GPT-4o mini. We also ran these three models for the LastFM dataset, which requires changing the dataset parameter from ml to lastfm. In this paper we only ran with n_explain=5.
To compare the results using maut, use the command:
python main.py --mode=maut --dataset=ml --expl_algs="explod explod_v2 pem webmedia_complex webmedia_rotate webmedia_transe rotate llm_gpt-3.5-turbo-1106 llm_gpt-4o-mini llm_llama3-70b-8192" --alg=mostpop --expl_metrics="SEP ETD LIR" --n_explain=5
Change the alg param to each of the recommenders and the dataset parameter to ml and lastfm.
All results for all the papers are in this repository. To find them, open the datasets folder and then choose the MovieLens or LastFM dataset folder. The folds folder contains the 10 folds used.
- For each fold folder there is an output folder:
  - In its root there are the ranked items for every user, for a recommender alone or for a recommender plus the reordering algorithm. Reordered files are identified by the prefix lod_reorder_path at the beginning of the file name, followed by the parameters of the reordering algorithm.
  - In the explanations folder there are the paths extracted for every user of the dataset by an explanation algorithm. The file name encodes the parameters used: the explanation algorithm, whether the recommendations were reordered, the number of recommendations to explain, and the recommender that generated the recommendations.
- For each fold folder there is also a results folder:
  - In its root there are the ranking metrics for a recommender alone or for a recommender plus the reordering algorithm. Reordered files are identified by the prefix lod_reorder_path at the beginning of the file name, followed by the parameters of the reordering algorithm.
  - In the explanations folder there are the path metrics for each explanation algorithm. The file name encodes the parameters used: the explanation algorithm, whether the recommendations were reordered, the number of recommendations to explain, and the recommender that generated the recommendations.