Item Indexing Methods for Recommendation Foundation Models: A Reproducibility Study

random indexing

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_random.log' \
        --model_dir 'model/pretrain_t5_small_beauty_random.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation random_number \
        --data_order random \
        --random_initialization_embedding \
        --min_random_number 1000 \
        --max_random_number 13000

independent indexing

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_independent.log' \
        --model_dir 'model/pretrain_t5_small_beauty_independent.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation no_tokenization \
        --data_order random

title indexing

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_title.log' \
        --model_dir 'model/pretrain_t5_small_beauty_title.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation title \
        --data_order random

sequential indexing (time sensitive)

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_sequential_time_sensitive.log' \
        --model_dir 'model/pretrain_t5_small_beauty_sequential_time_sensitive.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation None \
        --data_order remapped_sequential \
        --remapped_data_order original

collaborative indexing

First run `CID_generation.py` to generate the files this method needs; the script requires `remapped_sequential_data.txt` as input.
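
Because the generation script depends on `remapped_sequential_data.txt`, it can help to confirm the file exists before starting. The following is a minimal sketch, not part of the repository: the function name `check_cid_input`, the `CID_INPUT` variable, and the `data/beauty/` path are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical location of the input file; adjust to where your data lives.
CID_INPUT="${CID_INPUT:-data/beauty/remapped_sequential_data.txt}"

# Succeeds and prints a confirmation if the file exists,
# otherwise reports the missing path on stderr and fails.
check_cid_input() {
  if [ ! -f "$1" ]; then
    echo "missing input file: $1" >&2
    return 1
  fi
  echo "found $1"
}
```

A possible use is `check_cid_input "$CID_INPUT" && python CID_generation.py`. Whether `CID_generation.py` expects command-line arguments is not stated here, so check the script before running it this way.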

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_CF.log' \
        --model_dir 'model/pretrain_t5_small_beauty_CF.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation CF \
        --data_order remapped_sequential \
        --remapped_data_order original \
        --cluster_size 500 \
        --cluster_number 20

semantic indexing

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_semantics.log' \
        --model_dir 'model/pretrain_t5_small_beauty_semantics.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation content_based \
        --data_order random

hybrid indexing (CID+IID)

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
  --master_port 123227 \
  main.py \
     --distributed --multiGPU \
     --task beauty \
        --seed 2022 \
        --warmup_prop 0.05 \
        --lr 1e-3 \
        --clip 1.0 \
        --model_type 't5-small' \
        --epochs 20 \
        --gpu '0,1' \
        --logging_step 1000 \
        --logging_dir 'log/pretrain_t5_small_beauty_CID+IID.log' \
        --model_dir 'model/pretrain_t5_small_beauty_CID_IID.pt' \
        --train_sequential_item_batch 64 \
        --whole_word_embedding shijie \
        --item_representation CF \
        --data_order remapped_sequential \
        --cluster_size 500 \
        --cluster_number 20 \
        --last_token_no_repetition
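
The seven commands above share the same base flags and differ only in the indexing-specific options and output names. The sketch below collects those differences in one place and prints the resulting command. It is an illustration rather than part of the repository: the function name `build_cmd` and the short method names are assumptions, and it uses a single tag for both the log and the checkpoint, so file names may differ slightly from the commands above (for example, the hybrid log is named `CID_IID` rather than `CID+IID`). The port is an optional second argument; note that 123227 lies outside the 0 to 65535 range of TCP ports, so some setups may need a different value.

```shell
#!/usr/bin/env bash
# Sketch: print the training command for one indexing method.
# build_cmd and the method names are illustrative, not defined by the repository.
build_cmd() {
  local method="$1"          # random | independent | title | sequential | collaborative | semantic | hybrid
  local port="${2:-123227}"  # value used in the commands above; override if it is rejected
  local tag extra
  case "$method" in
    random)
      tag="random"
      extra="--item_representation random_number --data_order random --random_initialization_embedding --min_random_number 1000 --max_random_number 13000" ;;
    independent)
      tag="independent"
      extra="--item_representation no_tokenization --data_order random" ;;
    title)
      tag="title"
      extra="--item_representation title --data_order random" ;;
    sequential)
      tag="sequential_time_sensitive"
      extra="--item_representation None --data_order remapped_sequential --remapped_data_order original" ;;
    collaborative)
      tag="CF"
      extra="--item_representation CF --data_order remapped_sequential --remapped_data_order original --cluster_size 500 --cluster_number 20" ;;
    semantic)
      tag="semantics"
      extra="--item_representation content_based --data_order random" ;;
    hybrid)
      tag="CID_IID"
      extra="--item_representation CF --data_order remapped_sequential --cluster_size 500 --cluster_number 20 --last_token_no_repetition" ;;
    *)
      echo "unknown indexing method: $method" >&2
      return 1 ;;
  esac
  # Base flags copied from the commands above; only tag and extra vary.
  echo "CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch" \
       "--master_port $port main.py --distributed --multiGPU" \
       "--task beauty --seed 2022 --warmup_prop 0.05 --lr 1e-3 --clip 1.0" \
       "--model_type t5-small --epochs 20 --gpu 0,1 --logging_step 1000" \
       "--logging_dir log/pretrain_t5_small_beauty_${tag}.log" \
       "--model_dir model/pretrain_t5_small_beauty_${tag}.pt" \
       "--train_sequential_item_batch 64 --whole_word_embedding shijie" \
       "$extra"
}
```

Calling `build_cmd title` prints the command for inspection; `eval "$(build_cmd title)"` would run it. Printing first keeps the sketch safe to try on a machine without GPUs.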

Wenyueh/LLM-RecSys-ID
