This project explores the pricing dynamics of cosmetic products by analyzing their ingredients, brand influence, and customer ratings. Leveraging a dataset sourced from Sephora, the project uses ingredient scoring, brand analysis, and data visualization to uncover key insights that can guide consumers and businesses in making informed decisions within the beauty industry.
The dataset, sourced from the Comparing Cosmetics by Ingredients GitHub repository, contains over 1,400 cosmetic products with details on ingredients, prices, customer ratings, and skin type suitability.
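The loading step is not described here, so the following is only a minimal sketch of reading such a CSV with Python's standard `csv` module. It assumes the file has the columns `Label`, `Brand`, `Name`, `Price`, `Rank` and `Ingredients`, and that `Rank` holds the customer rating. Both are assumptions, so check them against the actual file. The function name `load_products` and the lowercase keys are hypothetical.

```python
import csv
import io


def load_products(file_obj):
    """Read product rows from a CSV file object into dicts with typed fields.

    Assumed (hypothetical) columns: Label, Brand, Name, Price, Rank, Ingredients.
    Extra columns, such as skin-type flags, are ignored.
    """
    products = []
    for row in csv.DictReader(file_obj):
        products.append({
            "label": row["Label"],            # product category, e.g. "Moisturizer"
            "brand": row["Brand"],
            "name": row["Name"],
            "price": float(row["Price"]),
            "rating": float(row["Rank"]),     # assumed to be the customer rating
            "ingredients": row["Ingredients"],  # one comma-separated string
        })
    return products


# Usage with an in-memory sample (hypothetical data); for a real file use
# open("path/to/file.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "Label,Brand,Name,Price,Rank,Ingredients\n"
    'Moisturizer,BRAND X,Cream,42,4.5,"Water, Glycerin"\n'
)
products = load_products(sample)
```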
The 20 most common ingredients found in the dataset include:
- Water: 944 occurrences
- Glycerin: 901 occurrences
- Phenoxyethanol: 728 occurrences
- Butylene Glycol: 693 occurrences
- Sodium Hyaluronate: 402 occurrences
- Caprylyl Glycol: 401 occurrences
- Dimethicone: 382 occurrences
- Xanthan Gum: 378 occurrences
- Ethylhexylglycerin: 375 occurrences
- Tocopheryl Acetate: 351 occurrences
- Citric Acid: 322 occurrences
- Tocopherol: 310 occurrences
- Caprylic/Capric Triglyceride: 288 occurrences
- Potassium Sorbate: 276 occurrences
- Disodium EDTA: 266 occurrences
- Carbomer: 264 occurrences
- Fragrance: 256 occurrences
- Sodium Hydroxide: 255 occurrences
- Limonene: 248 occurrences
- Sodium Benzoate: 247 occurrences
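Counts like these can be produced by splitting each product's ingredient string and tallying names. The following is a minimal sketch using `collections.Counter`. It assumes that ingredients are comma-separated, that names are compared case-insensitively, and that each ingredient counts once per product. None of these choices is confirmed by this README, and names that themselves contain commas would be split incorrectly. The function `count_ingredients` is hypothetical.

```python
from collections import Counter


def count_ingredients(ingredient_lists, top_n=20):
    """Return the top_n ingredients by number of products that list them."""
    counts = Counter()
    for raw in ingredient_lists:
        # Split on commas, trim whitespace and trailing periods, lowercase.
        names = {part.strip(" .").lower() for part in raw.split(",") if part.strip(" .")}
        # A set counts each ingredient at most once per product.
        counts.update(names)
    return counts.most_common(top_n)


# Hypothetical sample ingredient strings.
sample = [
    "Water, Glycerin, Phenoxyethanol.",
    "Water, Glycerin, Fragrance",
    "Glycerin, Dimethicone",
]
print(count_ingredients(sample, top_n=2))  # [('glycerin', 3), ('water', 2)]
```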
The analysis involved several key steps.
Ingredient Scoring: Each ingredient was researched and scored on its effectiveness and safety, and these scores were used to build an ingredient-quality metric for each product. For instance:
- High Quality: Sodium Hyaluronate, Tocopherol (+3)
- Moderate Quality: Glycerin, Citric Acid (+2)
- Low Quality: Phenoxyethanol, Fragrance (+1)
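The README does not say how per-ingredient scores are combined into a product score. The following is a minimal sketch that assumes a simple sum, although a mean is an equally plausible choice. The score table contains only the six example ingredients above. Every other ingredient defaults to 0, and that default is also an assumption. `INGREDIENT_SCORES` and `ingredient_score` are hypothetical names.

```python
# Hypothetical score table built only from the examples in this README;
# the project's full table is not reproduced here.
INGREDIENT_SCORES = {
    "sodium hyaluronate": 3,
    "tocopherol": 3,
    "glycerin": 2,
    "citric acid": 2,
    "phenoxyethanol": 1,
    "fragrance": 1,
}


def ingredient_score(raw, scores=INGREDIENT_SCORES, default=0):
    """Sum the scores of a product's comma-separated ingredients.

    Ingredients missing from `scores` contribute `default` (assumed 0).
    """
    names = [part.strip(" .").lower() for part in raw.split(",") if part.strip(" .")]
    return sum(scores.get(name, default) for name in names)


# Water is unscored (0), Glycerin 2, Sodium Hyaluronate 3, Fragrance 1.
print(ingredient_score("Water, Glycerin, Sodium Hyaluronate, Fragrance."))  # 6
```

A sum rewards long ingredient lists, while a mean does not. Which aggregation is used could noticeably change the quality-versus-price picture.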
Visualization of Ingredient Quality vs. Price: A scatter plot of ingredient-quality score against product price was used to check for any visible trend between higher-quality ingredients and higher prices.
Observation: The plot shows no strong visual trend linking higher ingredient scores to higher prices. The points are widely scattered, which suggests that factors other than ingredient quality play a significant role in determining prices.
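A scatter plot answers this question visually. A correlation coefficient puts a single number on the same relationship. The README does not say which coefficient was used, so the following is a minimal sketch that assumes Pearson's r, written with the standard library only. The same function would apply to the ratings-versus-price check described under the other factors below. `pearson` is a hypothetical helper, and the sample values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical ingredient scores and prices.
scores = [6, 3, 8, 5]
prices = [40.0, 55.0, 38.0, 120.0]
print(round(pearson(scores, prices), 2))
```

A value near 0 matches the scattered-points observation. Pearson's r only captures linear trends, so a rank-based coefficient such as Spearman's could be a useful cross-check.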
Exploring Other Factors: Brand, product category, and customer ratings were each analyzed for their impact on pricing:
- Brand Influence: Compared average prices across brands.
- Product Category Impact: Examined how product type (e.g., moisturizers, treatments) affects price.
- Customer Ratings: Measured the correlation between customer ratings and product prices.
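Both the brand and the category comparisons reduce to averaging price within groups. The following is a minimal standard-library sketch of that step. It assumes that each product is a dict with a numeric `"price"` key and a grouping key such as `"brand"` or `"label"`. Those key names are hypothetical. With pandas, a `groupby(...)["price"].mean()` would do the same job.

```python
from collections import defaultdict


def mean_price_by(products, key):
    """Average price per group, sorted from most to least expensive."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product in products:
        group = product[key]
        totals[group] += product["price"]
        counts[group] += 1
    averages = {group: totals[group] / counts[group] for group in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records; "label" stands for the product category.
sample = [
    {"brand": "A", "label": "Moisturizer", "price": 10},
    {"brand": "A", "label": "Treatment", "price": 30},
    {"brand": "B", "label": "Treatment", "price": 50},
]
print(mean_price_by(sample, "brand"))  # [('B', 50.0), ('A', 20.0)]
print(mean_price_by(sample, "label"))  # [('Treatment', 40.0), ('Moisturizer', 10.0)]
```

Averages from brands or categories with only a few products are noisy, so reporting the group size next to each mean would help when ranking brands.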
Key findings:
- Ingredient Quality vs. Price: The scatter plot shows no strong relationship between ingredient scores and product prices, suggesting that other factors are more influential.
- Brand Influence: Luxury brands such as LA MER and SK-II command higher prices, often independent of ingredient quality, which highlights the weight of brand reputation.
- Product Category Impact: Treatment products are the most expensive category on average, likely reflecting their specialized nature.
- Customer Ratings: Product prices and customer ratings are only very weakly correlated, suggesting that customer satisfaction is not a primary driver of pricing.
Contributions are welcome! Please fork this repository and submit a pull request with your enhancements. Ensure that your contributions align with the project's goals and include relevant tests.
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- Data sourced from Sephora.
- Visualization libraries: Matplotlib, Seaborn.
- Special thanks to Satyam9090 for their help with data preprocessing.