great job #3
Comments
Hi. Our repo hasn't provided the offline evaluation code so far. We will provide it as soon as possible. |
Hi, you can now reproduce the repo using the 'offline' branch. Just read the 'Evaluation' part of the updated README.md file. |
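Is this the online version or the offline one? |
Hello, please refer to the README file. It describes the environment and how to run the code in detail. |
Does each user_id have one session_id? |
Hello, how should the sr-gnn recall results be evaluated?
Is this normal? I also can't quite make sense of ndcg and the rest. |
This is normal. The second round has 3 phases in total: 7, 8, and 9. You are currently running phase 7. After all 3 phases have run, the evaluation code provided on the official website is run to compute ndcg and hitrate. |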
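For readers unsure what the two metrics measure, here is a minimal sketch of hit rate and NDCG at a cutoff k. It assumes a single held-out item per user; the function name, the toy data, and the default cutoff of 50 are illustrative assumptions and are not taken from the official evaluation script.

```python
import math

def hitrate_and_ndcg_at_k(recommendations, answers, k=50):
    """Average hit rate@k and NDCG@k over users with a single held-out item.

    recommendations: dict user_id -> ranked list of item ids (best first)
    answers: dict user_id -> the one held-out item id
    """
    hits, ndcg = 0.0, 0.0
    for user, true_item in answers.items():
        top_k = recommendations.get(user, [])[:k]
        if true_item in top_k:
            rank = top_k.index(true_item)  # 0-based position in the list
            hits += 1.0
            # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2).
            ndcg += 1.0 / math.log2(rank + 2)
    n = len(answers)
    return (hits / n, ndcg / n) if n else (0.0, 0.0)

recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
truth = {"u1": "b", "u2": "z"}
hitrate, ndcg = hitrate_and_ndcg_at_k(recs, truth, k=3)
print(hitrate, ndcg)
```

Hit rate only asks whether the held-out item made the list, while NDCG also rewards placing it near the top.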
To evaluate sr-gnn on its own, change cf_methods = {'item-cf', 'bi-graph', 'swing', 'user-cf'} to cf_methods = {}, so that only the sr-gnn results are read.
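One detail about that suggestion: in Python `{}` is an empty dict rather than an empty set, although both behave the same for membership tests. The sketch below illustrates the idea with a hypothetical `select_recall_sources` helper; it is not the repository's code, and how the sources are actually gated there is an assumption.

```python
def select_recall_sources(available, cf_methods, use_sr_gnn=True):
    """Return the names of recall sources to read (hypothetical helper)."""
    selected = [name for name in available if name in cf_methods]
    if use_sr_gnn:
        selected.append("sr-gnn")
    return selected

available = ["item-cf", "bi-graph", "swing", "user-cf"]

# All collaborative-filtering sources plus sr-gnn.
print(select_recall_sources(available, {"item-cf", "bi-graph", "swing", "user-cf"}))

# An empty collection disables every CF source, leaving only sr-gnn.
print(select_recall_sources(available, set()))
print(select_recall_sources(available, {}))  # {} is an empty dict, but `in` behaves the same
```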
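How should data like train_click.csv be interpreted? Could you explain it? An example is shown below.
|
Why does running recall_main.py produce the error below?
I only used the v1 version of sr-gnn.
The rec_path there is the online one, but eval uses the offline one. |
When running srgnn_main.py, set mode to offline in the conf.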
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Fri, Jul 10, 2020 21:16 PM
Subject: Re: [xuetf/KDD_CUP_2020_Debiasing_Rush] great job (#3)
Why does running recall_main.py produce the error below?

    train/validate split done...
    create offline eval answer done...
    begin read item df...
    108916
    train_path=user_data/offline_underexpose_train, test_path=user_data/offline_underexpose_test
    (2643000, 4)
    (1223242, 4)
    using multi_processing
    phase: 7
    train_path=user_data/offline_underexpose_train, test_path=user_data/offline_underexpose_test, target_phase=7
    drop duplicates...
    recall-source-num=0 0
    read sr-gnn results....
    sr-gnn begin...
    sr-gnn rec path=user_data/sr-gnn/offline/7/data/standard_rec.txt
    Traceback (most recent call last):
      File "my_sr_gnn_eval2.py", line 62, in <module>
        recall_methods={'sr-gnn'})
      File "/data1/xulm1/debiasing_rush/code/recall/do_recall_multi_processing.py", line 115, in do_multi_recall_results_multi_processing
        standard_sr_gnn_recall_item_dict = read_sr_gnn_results(phase, prefix='standard', adjust_type=adjust_type)
      File "/data1/xulm1/debiasing_rush/code/recall/sr_gnn/read_sr_gnn_results.py", line 54, in read_sr_gnn_results
        with open(sr_gnn_rec_path) as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'user_data/sr-gnn/offline/7/data/standard_rec.txt'

I only used the v1 version of sr-gnn.
I looked at what actually runs (showing part of the code):

    def sr_nn_version_1(phase, item_cnt):
        model_path = './models/v1/{}/{}'.format(mode, phase)
        file_path = '{}/{}/data'.format(sr_gnn_root_dir, phase)
        sr_gnn_lib_path = 'code/recall/sr_gnn/lib'
        if os.path.exists(model_path):
            print('model_path={} exists, delete'.format(model_path))
            shutil.rmtree(model_path)
        if not os.path.exists(model_path):
            os.makedirs(model_path)
        os.system("python3 {sr_gnn_lib_path}/my_main_.py --task train --node_count {item_cnt} "
                  "--checkpoint_path {model_path}/session_id --train_input {file_path}/train_item_seq_enhanced.txt "
                  "--test_input {file_path}/test_item_seq.txt --gru_step 2 --epochs 10 "
                  "--lr 0.001 --lr_dc 2 --dc_rate 0.1 --early_stop_epoch 3 "
                  "--hidden_size 256 --batch_size 256 --max_len 20 --has_uid True "
                  "--feature_init {file_path}/item_embed_mat.npy --sigma 8 ".format(sr_gnn_lib_path=sr_gnn_lib_path,
                                                                                    item_cnt=item_cnt,
                                                                                    model_path=model_path,
                                                                                    file_path=file_path))
        # generate rec
        checkpoint_path = find_checkpoint_path(phase, version='v1')
        prefix = 'standard_'
        rec_path = '{}/{}rec.txt'.format(file_path, prefix)
        print("WOC"*20)
        print(rec_path)
        os.system("python3 {sr_gnn_lib_path}/my_main_.py --task recommend --node_count {item_cnt} "
                  "--checkpoint_path {checkpoint_path} --item_lookup {file_path}/item_lookup.txt "
                  "--recommend_output {rec_path} --session_input {file_path}/test_user_sess.txt "
                  "--gru_step 2 --hidden_size 256 --batch_size 256 --rec_extra_count 50 --has_uid True "
                  "--feature_init {file_path}/item_embed_mat.npy "
                  "--max_len 10 --sigma 8".format(sr_gnn_lib_path=sr_gnn_lib_path,
                                                  item_cnt=item_cnt,
                                                  checkpoint_path=checkpoint_path,
                                                  file_path=file_path,
                                                  rec_path=rec_path))

    for phase in range(start_phase, now_phase+1):
        print('phase={}'.format(phase))
        sr_nn_version_1(phase, phase_item_cnt_dict[phase])

The rec_path there is the online one, but eval uses the offline one:

    user_data/sr-gnn/online/7/data/standard_rec.txt

So does something need to be changed somewhere? Here?

    is_use_whole_click = True if mode == 'online' else False # True if online

|
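The traceback in this exchange comes down to the recommendation file being written under one mode and read under another. Below is a small sketch of keeping the two in sync, assuming the directory layout seen in the log (`user_data/sr-gnn/<mode>/<phase>/data/standard_rec.txt`). `sr_gnn_rec_path` and `check_rec_file` are hypothetical helpers, not functions from the repository.

```python
import os

def sr_gnn_rec_path(mode, phase, prefix="standard", root="user_data/sr-gnn"):
    """Build the rec-file path from the mode so training and evaluation agree."""
    if mode not in ("online", "offline"):
        raise ValueError("mode must be 'online' or 'offline', got {!r}".format(mode))
    return "{}/{}/{}/data/{}_rec.txt".format(root, mode, phase, prefix)

def check_rec_file(mode, phase):
    """Fail early, with a hint, if the file for this mode is missing."""
    path = sr_gnn_rec_path(mode, phase)
    if not os.path.exists(path):
        other = "online" if mode == "offline" else "offline"
        hint = ""
        if os.path.exists(sr_gnn_rec_path(other, phase)):
            hint = " (a file exists for mode={!r}; was sr-gnn run with the wrong mode?)".format(other)
        raise FileNotFoundError("missing {}{}".format(path, hint))
    return path

print(sr_gnn_rec_path("offline", 7))
print(sr_gnn_rec_path("online", 7))
```

Deriving both the write path and the read path from the same `mode` value is what the reply above amounts to: run sr-gnn with mode set to offline before evaluating offline.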
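I changed it to offline; the relevant code and results are as follows:
There is still a problem at the eval stage.
Why is that? |
Here are the results for the v1 version; they were read successfully during eval: |
The metrics for the v1 version don't seem to be displayed. Is something wrong? |
Please read the code carefully. Evaluation runs only after the results of the multiple recall sources have been merged, not on each source separately. If you want to evaluate each source on its own, you can modify the code. |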
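To make that reply concrete, here is a rough sketch of merging several recall sources per user before scoring, with an optional per-source view. The score-summing merge rule and every name in it are assumptions for illustration; the repository may combine its sources differently.

```python
from collections import defaultdict

def merge_recall_sources(sources, top_k=50):
    """Merge {source: {user: [(item, score), ...]}} into {user: [item, ...]}.

    Scores of the same item coming from different sources are summed
    (an assumed rule, chosen only to keep the example short).
    """
    merged = defaultdict(lambda: defaultdict(float))
    for per_user in sources.values():
        for user, scored_items in per_user.items():
            for item, score in scored_items:
                merged[user][item] += score
    result = {}
    for user, items in merged.items():
        ranked = sorted(items.items(), key=lambda kv: (-kv[1], str(kv[0])))
        result[user] = [item for item, _ in ranked[:top_k]]
    return result

def hitrate(recs, answers):
    """Fraction of users whose held-out item appears in their list."""
    return sum(answers[u] in recs.get(u, []) for u in answers) / len(answers)

sources = {
    "item-cf": {"u1": [("a", 0.9), ("b", 0.2)]},
    "sr-gnn": {"u1": [("b", 0.8), ("c", 0.5)], "u2": [("d", 0.7)]},
}
answers = {"u1": "c", "u2": "x"}

# Evaluate the merged result once ...
print("merged", hitrate(merge_recall_sources(sources), answers))
# ... or evaluate each source on its own by merging a single-source dict.
for name, per_user in sources.items():
    print(name, hitrate(merge_recall_sources({name: per_user}), answers))
```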
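When I train with mode set to offline there seems to be far less data, only 800-odd batches, whereas with online there are several thousand total batches. What is the reason for this? Many thanks. |
Offline runs use the data of a single phase, while online runs use all of the data (the gap between the two is fairly stable). Please read the Evaluation section of README.md carefully.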
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Sat, Jul 11, 2020 15:45 PM
|
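As a rough illustration of why the batch counts differ, the sketch below trains on a single phase in offline mode and on every phase in online mode. The `(phase, user_id, item_id)` row layout, the helper name, and the batch size are assumptions made for this example only.

```python
import math

def select_training_clicks(clicks, mode, target_phase):
    """Pick training rows: one phase when offline, all phases when online."""
    if mode == "offline":
        return [row for row in clicks if row[0] == target_phase]
    if mode == "online":
        return list(clicks)
    raise ValueError("unknown mode: {!r}".format(mode))

# Toy click log: (phase, user_id, item_id)
clicks = [(p, u, u * 10 + p) for p in (7, 8, 9) for u in range(1000)]
batch_size = 256

for mode in ("offline", "online"):
    rows = select_training_clicks(clicks, mode, target_phase=7)
    print(mode, len(rows), "rows ->", math.ceil(len(rows) / batch_size), "batches")
```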
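Hello, is there a detailed explanation of the dataset anywhere? Could you give a link? Many thanks. |
The README gives the link to the official website.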
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Sun, Jul 12, 2020 14:05 PM
|
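I see that underexpose_user_feat.csv doesn't seem to be used in your data. Was this file not provided by the organizers? |
A question about getting online_topk after setting offline:
What this produces is the merged data of the online train set and the offline test set. Is it OK to do that? |
hi, dear
The normalization here is done per row, while each column is a feature. Why normalize by row? An example is given below:
This shows that one 2-norm is computed per row. |
Pay attention to what the features mean: a 128-dimensional image vector and a 128-dimensional text vector. The image and text vectors are normalized separately.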
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Tue, Jul 14, 2020 20:37 PM
hi, dear
confused about the norm,

    def process_item_feat(item_feat_df):
        processed_item_feat_df = item_feat_df.copy()
        # norm
        txt_item_feat_np = processed_item_feat_df[txt_dense_feat].values
        img_item_feat_np = processed_item_feat_df[img_dense_feat].values
        txt_item_feat_np = txt_item_feat_np / np.linalg.norm(txt_item_feat_np, axis=1, keepdims=True)
        img_item_feat_np = img_item_feat_np / np.linalg.norm(img_item_feat_np, axis=1, keepdims=True)
        processed_item_feat_df[txt_dense_feat] = pd.DataFrame(txt_item_feat_np, columns=txt_dense_feat)
        processed_item_feat_df[img_dense_feat] = pd.DataFrame(img_item_feat_np, columns=img_dense_feat)
        return processed_item_feat_df

The normalization here is done per row, while each column is a feature. Why normalize by row? An example is given below:

    >>> xx=np.random.randn(3,4)
    >>> xx
    array([[ 0.18874834,  0.37971162,  0.8287003 , -0.95896989],
           [-0.07977954,  0.04206023, -0.23647192, -0.36731412],
           [ 1.77722951,  0.68746666, -1.77812892,  0.54136854]])
    >>> np.linalg.norm(xx, axis=1, keepdims=True)
    array([[1.33647832],
           [0.4460633 ],
           [2.66194994]])

This shows that one 2-norm is computed per row.
|
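The reply can be illustrated with a small NumPy sketch. Each row is one item's embedding, so dividing by the row-wise L2 norm (`axis=1`) gives every item a unit-length vector, and dot products between items become cosine similarities. Normalizing down the columns would instead rescale each embedding dimension across items. The toy shapes and the zero-vector guard are assumptions added for the example; they are not part of the quoted `process_item_feat`.

```python
import numpy as np

def l2_normalize_rows(mat, eps=1e-12):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.maximum(norms, eps)

rng = np.random.default_rng(0)
txt_vec = rng.standard_normal((3, 4))  # 3 items, toy 4-dim "text" vectors
img_vec = rng.standard_normal((3, 4))  # 3 items, toy 4-dim "image" vectors

# Normalize the two modalities separately, then place them side by side.
item_feat = np.hstack([l2_normalize_rows(txt_vec), l2_normalize_rows(img_vec)])

# Every item now has unit norm within each modality.
print(np.linalg.norm(item_feat[:, :4], axis=1))
print(np.linalg.norm(item_feat[:, 4:], axis=1))

# With unit-length rows, a plain dot product is the cosine similarity.
txt_unit = item_feat[:, :4]
print(txt_unit @ txt_unit.T)
```

Normalizing each modality on its own keeps one of them from dominating a similarity score simply because its raw vectors happen to have larger magnitudes.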
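I can see that. What I mean is that your approach normalizes over the rows, axis=1, |
Could you tell me what this part means?
In particular the
Please take a look, many thanks. |
Hello, is this function for filling in the items that have no txt or img features? |
Hello, can I put the data of phases 7, 8, and 9 together for prediction? |
Hi, |
Please read the code carefully: search the whole project for where the Faiss code is used, check whether that code is actually executed, and confirm whether Faiss is installed.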
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Mon, Jul 20, 2020 17:04 PM
Hi,
faiss isn't even imported, so why is there no error? That's really strange.
Could you explain how you managed that?
This is in the notebook file Rush_0615.ipynb
|
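Two generic Python facts are relevant to this exchange, sketched below with the standard library only. A name such as `faiss` is looked up when the line using it runs, so a function that references an unimported module raises nothing until it is called. Separately, `importlib.util.find_spec` reports whether a module is installed without importing it. The function name `build_index` is hypothetical.

```python
import importlib.util

def build_index(vectors):
    # `faiss` is resolved only when this line executes, not when the function is defined.
    return faiss.IndexFlatIP(len(vectors[0]))  # noqa: F821 (deliberately not imported)

# Defining the function above raised no error. Calling it does, because `faiss`
# was never imported into this module's namespace.
try:
    build_index([[0.1, 0.2]])
except NameError as exc:
    print("NameError on call:", exc)

# Check whether the package is installed, without importing it.
faiss_installed = importlib.util.find_spec("faiss") is not None
print("faiss installed:", faiss_installed)
```

In a notebook, an earlier cell that imported the module would also make the name available to later cells, since all cells share one namespace.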
Hello, yes, you can do that. You can try using more data in your own project; in the competition, doing so made the results worse.
…---Original---
From: "VideoRecSys"<notifications@github.com>
Date: Sat, Jul 18, 2020 20:12 PM
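Hello, can I put the data of phases 7, 8, and 9 together for prediction?
That is, stop distinguishing between phases and recommend items to all users directly from the training set. Is that OK?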
|
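The result of merging everything into a single phase is as follows:
Is this normal? |
I see that the users in the training set and the test set are separate and different. How can I get recommendations for all users? |
Could you tell me what causes this? Do I need to tune some parameter?
|
Hello, |
Hello, the training speed seems a bit slow. What can I do about it? |
Another strange problem: if I train without a validation set and run inference right after training, the results are wrong, |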
hi, dear
well done
will try to reproduce the repo
btw, are there any metrics for the recall stage?
thx