HusamQ/Amazon_Vine_Analysis

Amazon Vine Analysis

Amazon Product Review ETL.

This analysis studies the Amazon Vine program, a paid program that Amazon offers to customers for reviewing products listed on Amazon.com.

ETL

Using PySpark, the data was extracted from an AWS S3 bucket, transformed, and loaded into an AWS RDS instance, where it was accessed through pgAdmin. SQL was then used to determine whether there is any bias toward favorable reviews from Vine members in the dataset.
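The pipeline described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual notebook: the S3 path, the table name `vine_table`, the column names (`review_id`, `star_rating`, `helpful_votes`, `total_votes`, `vine`, `verified_purchase`), and the connection settings are all hypothetical. Reading from S3 and writing over JDBC also require the Hadoop AWS connector and a PostgreSQL JDBC driver to be available to Spark.

```python
def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL for an RDS endpoint (values are placeholders)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(s3_path: str, jdbc_url: str, user: str, password: str) -> None:
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("vine-etl-sketch").getOrCreate()

    # Extract: read a tab-separated review file (hypothetical path and layout).
    raw = spark.read.option("header", True).option("sep", "\t").csv(s3_path)

    # Transform: keep only the assumed columns needed for the Vine analysis
    # and cast the numeric ones, since every CSV column is read as a string.
    vine_df = raw.select(
        col("review_id"),
        col("star_rating").cast("int"),
        col("helpful_votes").cast("int"),
        col("total_votes").cast("int"),
        col("vine"),
        col("verified_purchase"),
    )

    # Load: append into a PostgreSQL table on RDS (hypothetical table name).
    vine_df.write.jdbc(
        url=jdbc_url,
        table="vine_table",
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "org.postgresql.Driver",
        },
    )
```

Keeping the URL construction in its own small function makes the connection string easy to check before any Spark job is started.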

Result

This analysis focuses only on reviews that gave 5 stars and were marked as helpful for these products. There are 14,630 participants with more than 20 total reviews.

[Image: Numberofvine]

  • Vine reviews [Image: morethan20reviews]

  • Non-Vine reviews [Image: Totalofunpaid]

  • Percentage of Vine reviews: the total number of helpful 5-star Vine reviews divided by the total number of helpful 5-star reviews. [Image: Percentageofvine]

  • Percentage of non-Vine reviews: the total number of helpful 5-star non-Vine reviews divided by the total number of helpful 5-star reviews. [Image: Percentageofnonvine]
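The two percentages defined in the list above amount to a grouped count divided by the overall count of helpful 5-star reviews. The sketch below shows one way to express that in SQL. It runs against Python's built-in `sqlite3` module with made-up rows, not against the PostgreSQL database or the real dataset, and the table name, column names, and the helpfulness rule (more than 20 total votes and at least half of them helpful) are assumptions made for illustration.

```python
import sqlite3

# Made-up sample rows: (review_id, star_rating, helpful_votes, total_votes, vine)
SAMPLE_ROWS = [
    ("r1", 5, 30, 32, "Y"),
    ("r2", 5, 25, 40, "N"),
    ("r3", 5, 50, 55, "N"),
    ("r4", 5, 21, 30, "N"),
    ("r5", 4, 40, 45, "Y"),  # excluded: not 5 stars
    ("r6", 5, 3, 25, "N"),   # excluded: fewer than half of the votes are helpful
    ("r7", 5, 10, 12, "N"),  # excluded: 20 or fewer total votes
]

# Assumed filter: 5 stars, more than 20 total votes, helpful ratio >= 0.5.
QUERY = """
WITH filtered AS (
    SELECT vine
    FROM vine_table
    WHERE star_rating = 5
      AND total_votes > 20
      AND 1.0 * helpful_votes / total_votes >= 0.5
)
SELECT vine,
       COUNT(*) AS review_count,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM filtered) AS pct_of_total
FROM filtered
GROUP BY vine
ORDER BY vine
"""


def vine_percentages(rows):
    """Return (vine flag, count, percentage) for helpful 5-star reviews."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE vine_table ("
            "review_id TEXT, star_rating INTEGER, helpful_votes INTEGER, "
            "total_votes INTEGER, vine TEXT)"
        )
        conn.executemany("INSERT INTO vine_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for flag, count, pct in vine_percentages(SAMPLE_ROWS):
        print(flag, count, pct)
```

With the made-up rows, three unpaid reviews and one Vine review pass the filter, giving 75% and 25%. These figures only demonstrate the calculation and say nothing about the real Grocery dataset.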

Summary

Based on this dataset for Grocery products, the majority of helpful reviews came from non-Vine (unpaid) reviewers. As a result, I don't think the reviews for these products are biased by Vine participants.

To measure the Vine program more accurately, a further analysis would need to examine the effect of unpaid reviews with no verified purchase on any of the products.
