jianzhu/dl-rerank

dl-rerank (alpha)

Deep learning powered personalized re-ranking solution

User Interest Modeling Strategy

Given the list of items to rank, we use DIN (Deep Interest Network) to model a user's diverse interests. DIEN is another good solution to this problem, but because it relies on an RNN, it needs a lot of engineering optimization to reach good performance; SRU may be a candidate replacement.
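
To make the idea concrete, here is a minimal sketch of target-aware pooling in the spirit of DIN. It is not this repository's implementation: the function name `din_pooling` and the toy embeddings are hypothetical, and a scaled dot product is swapped in for the learned activation unit (a small MLP) that DIN uses to score each behavior.

```python
import math

def din_pooling(behaviors, target):
    """Target-aware sum pooling over a user's behavior embeddings.

    Assumption: a scaled dot product stands in for DIN's learned
    activation unit, which is a small MLP in the paper.
    """
    dim = len(target)
    # One relevance score per historical behavior, conditioned on the candidate item.
    weights = [sum(b * t for b, t in zip(beh, target)) / math.sqrt(dim)
               for beh in behaviors]
    # Weighted sum without softmax normalization, so the magnitude of the
    # result keeps the intensity of the user's interest.
    return [sum(w * beh[i] for w, beh in zip(weights, behaviors))
            for i in range(dim)]

behaviors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy clicked-item embeddings
user_for_a = din_pooling(behaviors, target=[1.0, 0.0])
user_for_b = din_pooling(behaviors, target=[0.0, 1.0])
```

The same behavior history yields a different user vector for each candidate item, which is what lets the model represent diverse interests.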

Reference

DIN: Deep Interest Network for Click-Through Rate Prediction
DIEN: Deep Interest Evolution Network for Click-Through Rate Prediction
SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

Item Modeling Strategy

After modeling user interest, we have a vectorized user representation targeted at each item, a vectorized representation of the item list, and a click / no-click label for each item. To model the (personalized user representation, item representation, context, label) relation precisely, we need to take the whole item list into account.

With item list info, we can compute a precise vectorized representation for each (personalized user representation, item representation) pair. To stay within the computation budget, we can apply a dense transformation before applying the Transformer for self-attention. The Transformer could also be used for user interest modeling (BST).
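
The following sketch shows how self-attention lets every item representation absorb information from the rest of the list. It assumes a single head with identity query/key/value projections; a real Transformer layer learns these projections and adds feed-forward and normalization sublayers. The names `self_attention` and `item_list` are illustrative only.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(items):
    """Single-head scaled dot-product self-attention over an item list.

    Assumption: identity query/key/value projections (a real layer learns them).
    """
    dim = len(items[0])
    out = []
    for q in items:
        # Similarity of this item to every item in the list, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in items]
        attn = softmax(scores)
        # Context-aware representation: attention-weighted mix of all items.
        out.append([sum(w * v[i] for w, v in zip(attn, items)) for i in range(dim)])
    return out

item_list = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy item vectors
contextual = self_attention(item_list)
```

Because the cost of this step grows with the vector width, projecting the inputs to a smaller dimension first (the dense transformation mentioned above) reduces the work done inside the attention layers.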

Convolutional kernels give us another path to self-attention: we can use convolution, lightweight convolution, or the Transformer and lightweight convolution together, a combination named Long-Short Range Attention.
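
As a rough illustration of lightweight convolution, the sketch below applies one softmax-normalized kernel to every channel of a sequence. It assumes a single head shared by all channels, an odd kernel width, and zero padding; the LightConv paper uses several heads and learned kernels. The function name `light_conv` is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def light_conv(seq, kernel):
    """Depthwise 1-D convolution with a softmax-normalized, channel-shared kernel.

    seq: list of T vectors of size D; kernel: K raw weights (K odd).
    Assumption: one head shared by all channels, zero padding at both ends.
    """
    k = len(kernel)
    weights = softmax(kernel)  # normalize over the kernel width
    pad = k // 2
    dim = len(seq[0])
    zero = [0.0] * dim
    padded = [zero] * pad + seq + [zero] * pad
    return [[sum(weights[j] * padded[t + j][i] for j in range(k))
             for i in range(dim)]
            for t in range(len(seq))]

smoothed = light_conv([[3.0], [6.0], [9.0]], kernel=[0.0, 0.0, 0.0])
```

With equal raw weights the kernel becomes a moving average over a window of three positions, so each item only mixes with its immediate neighbors, in contrast to self-attention, which looks at the whole list.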

Reference

Transformer: Attention Is All You Need
PRM-Rerank: Personalized Re-ranking for Recommendation
BST: Behavior Sequence Transformer for E-commerce Recommendation in Alibaba
ConvSeq2Seq: Convolutional Sequence to Sequence Learning
LightConv: Pay Less Attention with Light Weight and Dynamic Convolutions
LSRA: Lite Transformer with Long-Short Range Attention
GLU: GLU Variants Improve Transformer

Query & Item text Modeling

We model the match between query and item text fields with a convolutional neural network.

Reference

TextCNN: Convolutional Neural Networks for Sentence Classification
RankCNN: Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks

Multi-task learning

When developing a complex machine learning application, we need to consider multiple objectives, such as click, add to basket, and buy. Multi-task learning gives us a way to learn multiple objectives simultaneously.

There are two types of multi-task learning: hard parameter sharing and soft parameter sharing. Here we use MMoE2, which is a soft parameter sharing method. Because we use the Transformer to model inter-item relations and the Transformer is computationally expensive, we use it as the shared bottom layer; this architecture has also been tested by MT-DNN.
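
The core of the multi-gate mixture-of-experts idea can be sketched as follows: every task owns a softmax gate that decides how to mix the outputs of shared experts. This is a simplified illustration, not the repository's code. In a real model the gate logits come from a learned linear layer on the shared input, whereas here they are fixed, hypothetical numbers, and the name `mmoe_combine` is made up.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mmoe_combine(expert_outputs, gate_logits_per_task):
    """Mix shared expert outputs with one softmax gate per task.

    expert_outputs: list of E vectors of size D.
    gate_logits_per_task: dict mapping task name to E raw gate scores
    (assumption: fixed numbers here, produced by a learned layer in practice).
    """
    dim = len(expert_outputs[0])
    mixed = {}
    for task, logits in gate_logits_per_task.items():
        gate = softmax(logits)
        mixed[task] = [sum(g * e[i] for g, e in zip(gate, expert_outputs))
                       for i in range(dim)]
    return mixed

experts = [[1.0, 0.0], [0.0, 1.0]]                     # toy expert outputs
gates = {"click": [5.0, 0.0], "buy": [0.0, 5.0]}       # toy gate logits
task_inputs = mmoe_combine(experts, gates)
```

Each entry of `task_inputs` would then feed that task's own tower. Since the gates are soft, tasks can share experts to different degrees instead of being forced onto one shared representation.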

For regression objectives such as dwell time, whose range is not between 0 and 1, we have two ways to cope:

  1. apply a log10 transformation to dwell time, then scale it with min-max normalization
  2. bucketize dwell time to turn the regression problem into a classification problem, use the predicted probabilities as class weights, take the weighted sum of the class values as the final result, and normalize it by the largest bucket's class value. This method is somewhat similar to McRank
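
Both options above can be sketched in a few lines. The function names, the choice of bucket index as the class value, and the assumption that dwell time is strictly positive with known minimum and maximum (for example, taken from the training data) are illustrative and not taken from the repository.

```python
import math

def log_minmax(dwell_seconds, min_seconds, max_seconds):
    """Option 1: log10 transform followed by min-max scaling to [0, 1].

    Assumption: all values are > 0 and the min/max bounds are known.
    """
    lo, hi = math.log10(min_seconds), math.log10(max_seconds)
    return (math.log10(dwell_seconds) - lo) / (hi - lo)

def expected_bucket_score(probs):
    """Option 2: probability-weighted sum of bucket indices, scaled to [0, 1].

    probs[k] is the predicted probability of bucket k (0 = shortest dwell).
    Assumption: the class value of a bucket is its index.
    """
    expected = sum(k * p for k, p in enumerate(probs))
    return expected / (len(probs) - 1)  # divide by the largest bucket index

target = log_minmax(10.0, min_seconds=1.0, max_seconds=100.0)
score = expected_bucket_score([0.25, 0.5, 0.25])
```

Both functions return a value in [0, 1], so the regression objective ends up on the same scale as the probability outputs of the click-style tasks.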

Performance (3 tasks)
hidden_size=256, kernel_size=3, batch_size=32, layer_num=3, filter_size=1024
qtxt_filters=32, qtxt_kernel_sizes='2,3', ttxt_filters=32, ttxt_kernel_sizes='2,3', ctxt_filters=16, ctxt_kernel_sizes='2,3'
hardware: (os) macOS 10.13.4; (cpu) Core i7 2.3 GHz; (mem) 16 GB

| transformer | flatten transformer | lite transformer | light conv |
| --- | --- | --- | --- |
| 21 ms/sample | 19.3 ms/sample | 20.8 ms/sample | 19.2 ms/sample |

Reference

Survey: An Overview of Multi-Task Learning in Deep Neural Networks
MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
MMoE2: Recommending What Video to Watch Next: A Multitask Ranking System
SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning
MT-DNN: Multi-Task Deep Neural Networks for Natural Language Understanding
McRank: McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Important Details

Position Bias Modeling

  1. Training Phase: randomly mask the show position of 10% of items as unknown
  2. Evaluation Phase: set each item's show position to unknown
  3. Modeling Strategy: use a shallow tower to model position bias
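
The training and evaluation rules above can be sketched as one helper. The reserved id `UNKNOWN_POSITION = 0`, the function name `mask_positions`, and the assumption that real show positions start at 1 are hypothetical choices made for this example.

```python
import random

UNKNOWN_POSITION = 0  # hypothetical id reserved for "position unknown"

def mask_positions(positions, mask_rate=0.1, training=True, rng=random):
    """Replace show positions with the unknown id.

    Training: each position is masked independently with probability mask_rate.
    Evaluation: every position is masked, since the final slot is not known yet.
    Assumption: real positions are 1-based, so 0 is free to mean "unknown".
    """
    if not training:
        return [UNKNOWN_POSITION] * len(positions)
    return [UNKNOWN_POSITION if rng.random() < mask_rate else p
            for p in positions]

train_positions = mask_positions([1, 2, 3, 4, 5], mask_rate=0.1, training=True)
eval_positions = mask_positions([1, 2, 3, 4, 5], training=False)
```

Seeing the unknown id during training means the position input at evaluation time is a value the model has already learned to handle.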

Ranking Position Modeling

  1. Item position: given by the ranking phase
  2. Modeling Strategy: add the item position embedding to the other item features

Embedding

  1. supports shared embeddings

Mini-batch aware Regularization

  1. supports mini-batch aware regularization for sparse categorical features
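
A minimal sketch of the idea, following the description in the DIN paper: the L2 penalty is computed only for embedding rows whose feature ids appear in the current mini-batch, and each row is scaled by the inverse of that id's frequency in the whole dataset. The function name, the dict-based embedding table, and the coefficient `lam` are assumptions made for illustration.

```python
def mini_batch_l2(embeddings, batch_ids, global_counts, lam=0.01):
    """L2 penalty restricted to feature ids seen in the current mini-batch.

    embeddings: dict id -> embedding vector.
    batch_ids: feature ids occurring in this mini-batch (duplicates allowed).
    global_counts: dict id -> number of occurrences in the full dataset.
    Assumption: lam is a hand-picked regularization coefficient.
    """
    penalty = 0.0
    for fid in set(batch_ids):  # rows absent from the batch are never touched
        sq_norm = sum(w * w for w in embeddings[fid])
        penalty += sq_norm / global_counts[fid]  # frequent ids are penalized less
    return lam * penalty

embeddings = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [100.0, 100.0]}
counts = {"a": 5, "b": 25, "c": 1}
penalty = mini_batch_l2(embeddings, ["a", "b", "a"], counts, lam=1.0)
```

Since id `"c"` is not in the batch, its large embedding contributes nothing, which avoids updating every row of a huge sparse embedding table on each step.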

Dimension Reduction
When modeling user behavior or item info, we usually deal with billions of categorical features. To control training and serving cost, we can apply feature selection, use the hashing trick to reduce the dimension of each type of categorical feature, or combine the two. Here we implement a modeling strategy based on feature selection; to use the hashing trick for feature reduction instead, use categorical_column_with_hash_bucket.
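
For intuition, the hashing trick can be sketched in plain Python as below. This only illustrates the concept: it uses MD5 from the standard library, which is not the hash function behind `categorical_column_with_hash_bucket`, and the helper name `hash_bucket` is hypothetical.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map an arbitrary categorical string to one of num_buckets ids.

    Assumption: MD5 is used purely for illustration; it gives a stable
    mapping across processes, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

bucket = hash_bucket("item_123", num_buckets=1000)
```

The embedding table then needs only `num_buckets` rows regardless of how many distinct values exist, at the cost of occasional collisions between unrelated values.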

Engineering Related

XLA: supports XLA
Mixed Precision: supports mixed precision; this feature requires tf >= 2.2.0
Distributed Training: supports the parameter-server distributed training strategy
