The eShopOnWebML app comes with a previously trained model for product recommendation, but you can train your own model based on your own data.
The console application project ProductRecommendation.Train can be used to generate the product recommendation model. Follow these steps to generate the model:
- Set the VS default startup project: Set ProductRecommendation.Train as the startup project in Visual Studio.
- (Optional) Generate your own input training data: The assets/inputs folder contains the default training file, orderItems.csv. This file contains the data arranged in 3 columns: CustomerId, ProductId and Quantity. If you want to use your own training file, follow the same schema and replace the current default training file.
- Run the training console app: Hit F5 in Visual Studio. At the end of the execution, the output will be similar to this screenshot:
- Copy the model file into the Infrastructure project: By default, when the execution finishes, the model is saved at assets/output/productRecommendation.zip. Copy the model file into src / Infrastructure / Setup / model using the same name.
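If you prepare your own training file, it can help to check that every row follows the three-column schema before training. The following is a minimal sketch of such a check, not the sample's own loader: the `OrderItemRow` type and `OrderItemsCsvSketch` class are hypothetical, and the comma delimiter and header row are assumptions about the file layout.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type mirroring the three columns described above.
public record OrderItemRow(string CustomerId, string ProductId, float Quantity);

public static class OrderItemsCsvSketch
{
    // Parses lines of the form "CustomerId,ProductId,Quantity".
    // Assumes a comma delimiter and a header on the first line.
    public static List<OrderItemRow> Parse(IEnumerable<string> lines)
    {
        return lines
            .Skip(1) // assumed header row
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .Select(fields => new OrderItemRow(
                fields[0].Trim(),
                fields[1].Trim(),
                float.Parse(fields[2], CultureInfo.InvariantCulture)))
            .ToList();
    }
}
```

To check a file on disk, the lines could be supplied with `File.ReadLines(path)`; a row with a missing or non-numeric Quantity then fails with an exception instead of silently entering the training data.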
The model training source code is located at src / ProductRecommendation.Train / Model / ModelBuilder.cs.
Before creating the model, we need to pre-process the input data. The method we will use can only make binary recommendations, whereas our Label feature (Quantity) is a continuous variable. The pre-processing step transforms this continuous variable into a categorical variable with 2 states: recommend / not recommend (true / false).
There are several methods for discretizing a continuous variable. In this case we set a threshold and transform values greater than or equal to the threshold to true (do recommend), and all other values to false (do not recommend). The threshold used is the mean quantity per product.
This transformation is implemented by the method PreProcess(). As a result, one column named Recommend is added, holding Quantity as a discretized value (true / false).
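The thresholding described above can be sketched in a few lines of LINQ. This is an illustration under stated assumptions, not the sample's actual PreProcess() implementation: the `OrderItem` and `LabeledOrderItem` types and the `PreProcessSketch` class are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; the sample's own classes may differ.
public record OrderItem(string CustomerId, string ProductId, float Quantity);
public record LabeledOrderItem(string CustomerId, string ProductId, float Quantity, bool Recommend);

public static class PreProcessSketch
{
    // Labels a row true when its quantity is at or above
    // the mean quantity of the same product.
    public static List<LabeledOrderItem> PreProcess(IEnumerable<OrderItem> items)
    {
        var rows = items.ToList();

        // Threshold per product: mean quantity over all rows of that product.
        var meanByProduct = rows
            .GroupBy(r => r.ProductId)
            .ToDictionary(g => g.Key, g => g.Average(r => r.Quantity));

        return rows
            .Select(r => new LabeledOrderItem(
                r.CustomerId, r.ProductId, r.Quantity,
                Recommend: r.Quantity >= meanByProduct[r.ProductId]))
            .ToList();
    }
}
```

For example, if product P1 appears in three rows with quantities 1, 2 and 6, its mean is 3, so only the row with quantity 6 is labeled true.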
var pipeline = new LearningPipeline();

pipeline.Add(CollectionDataSource.Create(salesData));

pipeline.Add(new CategoricalHashOneHotVectorizer(
    (nameof(SalesRecommendationData.ProductId),
     nameof(SalesRecommendationData.ProductId) + "_OH")) { HashBits = 18 });

pipeline.Add(new CategoricalHashOneHotVectorizer(
    (nameof(SalesRecommendationData.CustomerId),
     nameof(SalesRecommendationData.CustomerId) + "_OH")) { HashBits = 18 });

pipeline.Add(new ColumnConcatenator("Features",
    nameof(SalesRecommendationData.ProductId) + "_OH",
    nameof(SalesRecommendationData.CustomerId) + "_OH"));

pipeline.Add(new FieldAwareFactorizationMachineBinaryClassifier()
{
    LearningRate = 0.05F,
    Iters = 1,
    LambdaLinear = 0.0002F
});
The training pipeline consists of the following components:
- CollectionDataSource.Create: The preprocessed data can be used directly as input for the pipeline.
- CategoricalHashOneHotVectorizer: CustomerId and ProductId are transformed using a One Hot Encoding variant based on hashing.
- ColumnConcatenator: Data needs to be combined into a single column (by default, named Features) as a prior step before the learner starts executing.
- FieldAwareFactorizationMachineBinaryClassifier: The learner used by the pipeline. This algorithm evaluates the interaction between CustomerId and ProductId, and can be used with sparse data.
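To make the hashing-based one-hot encoding and the concatenation step more concrete, here is a rough sketch of the idea. It does not reproduce ML.NET's internals: the `HashedOneHotSketch` class is hypothetical, FNV-1a stands in for whatever hash function the library uses, and the layout (two blocks of 2^18 slots placed side by side) is an assumption based on HashBits = 18.

```csharp
using System;
using System.Text;

public static class HashedOneHotSketch
{
    public const int HashBits = 18;
    public const int Slots = 1 << HashBits; // 262,144 slots per encoded column

    // FNV-1a, used only as a simple stand-in hash for this illustration.
    public static int SlotFor(string value)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            hash ^= b;
            hash *= 16777619;
        }
        // Keep the lowest HashBits bits so the index falls in [0, Slots).
        return (int)(hash & (Slots - 1));
    }

    // Returns the two non-zero positions of the concatenated feature vector:
    // the product block comes first, the customer block is offset by Slots.
    public static (int ProductIndex, int CustomerIndex) Encode(string productId, string customerId)
    {
        return (SlotFor(productId), Slots + SlotFor(customerId));
    }
}
```

Under these assumptions each row becomes a vector of length 2 * Slots with exactly two non-zero entries, which is why a learner that handles sparse data is a natural fit. Distinct ids can collide in the same slot; a larger HashBits value makes that less likely at the cost of a wider vector.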
After building the pipeline, we train the recommendation model:
var model = pipeline.Train<SalesData, SalesPrediction>();
Finally, we save the recommendation model to local disk:
await model.WriteAsync(modelLocation);
Additionally, we evaluate the accuracy of the model. This accuracy is measured using the BinaryClassificationEvaluator, and the Accuracy and AUC metrics are displayed.
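For readers unfamiliar with the two metrics, the sketch below shows what they measure on a handful of predictions. It is a plain C# illustration, not the BinaryClassificationEvaluator itself; the `BinaryMetricsSketch` class is a hypothetical name, and the pairwise AUC computation is chosen for clarity rather than efficiency.

```csharp
using System;
using System.Collections.Generic;

public static class BinaryMetricsSketch
{
    // Fraction of examples whose predicted label matches the actual label.
    public static double Accuracy(IReadOnlyList<bool> actual, IReadOnlyList<bool> predicted)
    {
        int correct = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (actual[i] == predicted[i]) correct++;
        }
        return (double)correct / actual.Count;
    }

    // AUC as the fraction of (positive, negative) pairs in which the positive
    // example receives the higher score; ties count as half a win.
    public static double Auc(IReadOnlyList<bool> actual, IReadOnlyList<double> scores)
    {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (!actual[i]) continue;           // i ranges over positives
            for (int j = 0; j < actual.Count; j++)
            {
                if (actual[j]) continue;        // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? double.NaN : wins / pairs;
    }
}
```

With actual labels (true, false, true, false), scores (0.9, 0.4, 0.35, 0.2) and predictions obtained by thresholding the scores at 0.5, three of the four predictions are correct, so Accuracy is 0.75. Three of the four positive/negative pairs are ranked correctly, so AUC is also 0.75.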
The model created in the previous step is used to make recommendations for users. When a user logs in to the website, the homepage first displays the products recommended for that user, based on previous purchases.
The source code of the prediction core is in src / Infrastructure / Services / ProductRecommendationService.cs, inside the method GetRecommendationsForUserAsync().
public async System.Threading.Tasks.Task<IEnumerable<string>> GetRecommendationsForUserAsync
    (string user, string[] products, int recommendationsInPage)
{
    var model = await PredictionModel.ReadAsync<SalesData, SalesPrediction>(modelLocation);

    var crossPredictions = from product in products
                           select new SalesData { CustomerId = user, ProductId = product };

    var predictions = model.Predict(crossPredictions).ToArray();

    return predictions.Where(p => p.Recommendation.IsTrue)
        .OrderByDescending(p => p.Probability)
        .Select(p => p.ProductId)
        .Take(recommendationsInPage);
}
The method receives as parameters the user and the products we need to check. It then creates SalesData objects (one per product received, always using the same customer). The model returns the probability and the label (recommended / not recommended), so the method returns only the recommended predictions, ordered by descending probability and limited to the first recommendationsInPage predictions.
When running the web app, in order to see the recommendations, you first need to authenticate as a demo user with these credentials:
User: demouser@microsoft.com
Password: Pass@word1
The app generates the recommendations for that particular user (based on that user's order history compared to orders from other users) by using the ML.NET model and shows the first 6 recommendations on top of the regular product catalog, as in the following screenshot:
After cloning or downloading the web app sample, you should be able to run it immediately using an in-memory database. That database is used for handling the Product Catalog and other typical entities. If you wish to use the sample with a persistent SQL Server database, you will need to modify the setup as explained in the original eShopOnWeb repo, here: https://github.com/dotnet-architecture/eShopOnWeb
The eShopOnWeb dataset is based on a public Online Retail Dataset from UCI: http://archive.ics.uci.edu/ml/datasets/online+retail
Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).