This project was designed to demonstrate my proficiency with PostgreSQL and to gain expertise with Docker. I used PostgreSQL to uncover insights and Apache Superset to create dashboards that provide actionable insights into sales trends, customer behavior, and product performance.
The dataset consisted of transactions from an e-commerce store based in the UK, specializing in unique, all-occasion gifts. Many of the store's customers are wholesalers.
- PostgreSQL: Used to query and analyze data, uncovering key insights about sales and customer behavior.
- Apache Superset: Helped create interactive dashboards to visualize trends and make the data easier to understand.
- Python: Used for cleaning and preparing the data before diving into the analysis.
- Data Analysis: Cohort analysis, retention trends, revenue tracking, and customer segmentation.
- Data Storytelling: Turning raw data into actionable insights that drive decision-making.
- Visualization: Building intuitive dashboards to effectively communicate findings.
-
Seasonal Spikes:
- The November peak suggests a seasonal trend (e.g., holiday shopping) that drives both customer engagement and total sales volume.
-
Stable Months:
- May to July exhibit steady but lower levels of sales and customer counts, indicating periods of lower demand.
-
Heavy Customer Retention:
- Even when unique customer counts drop in certain months like February and December, total items sold remains high, suggesting repeat purchases by loyal customers.
-
Leverage Seasonal Trends:
- Plan marketing campaigns and promotions in advance of November to capitalize on seasonal spikes in customer engagement and sales volume.
-
Stimulate Demand in Lower Periods:
- Use promotions or discounts during May to July to drive sales during traditionally lower-demand months.
WITH rfm AS (
SELECT
"CustomerID",
MAX("InvoiceDate") AS most_recent_purchase,
COUNT("InvoiceNo") AS frequency,
ROUND(SUM("Quantity" * "UnitPrice")::NUMERIC, 2) AS monetary_value
FROM online_retail
GROUP BY "CustomerID"
)
SELECT
"CustomerID",
DATE_PART('day', CURRENT_DATE - most_recent_purchase) AS recency,
frequency,
monetary_value,
CASE
WHEN DATE_PART('day', CURRENT_DATE - most_recent_purchase) <= 30
AND frequency >= 5
AND monetary_value > 500 THEN 'Loyal Customers'
WHEN DATE_PART('day', CURRENT_DATE - most_recent_purchase) > 60
AND frequency <= 2 THEN 'At Risk'
ELSE 'Casual Buyers'
END AS rfm_category
FROM rfm
ORDER BY monetary_value DESC;
- A noticeable spike occurs in November, suggesting a seasonal sales boost, likely due to holiday shopping.
- Some products like JUMBO BAG RED RETROSPOT and POP ART HOLDER show a significant increase during this time.
- A specific product category StockCode 22457 dominates in terms of both revenue and sales volume.
- Top Sellers:
- JUMBO BAG RED RETROSPOT and ASSORTED COLOUR BIRD ORNAMENT are clear top sellers in terms of quantity.
- High-Revenue Products:
- Some products, such as REGENCY CAKESTAND 3 TIER, generate disproportionately high revenue compared to sales volume.
- Increase marketing and stock availability for top-performing products like JUMBO BAG RED RETROSPOT to maximize sales during peak seasons.
- Review pricing strategies for high-revenue, low-quantity products like REGENCY CAKESTAND 3 TIER to optimize profitability.
SELECT
DATE_TRUNC('month', "InvoiceDate") AS month,
"Description",
SUM("Quantity") AS total_quantity_sold
FROM online_retail
GROUP BY month, "Description"
ORDER BY month, total_quantity_sold DESC;
- The 2010-12 cohort retains customers well beyond 12 months, with retention stabilizing around 30–50% in later months.
- Dominance of Low-Value Customers:
- The majority of customers fall into the Low Value segment.
- Medium and High-Value Customers:
- A small number of customers exist in the High Value segment, but they contribute significantly to revenue.
- Focus retention and upselling efforts on Medium and High-Value customers to maximize profitability.
- Develop strategies to move Low Value customers into higher segments by:
- Offering personalized promotions.
- Implementing cross-selling and bundling strategies.
WITH first_purchases AS (
SELECT
"CustomerID",
TO_CHAR(MIN("InvoiceDate"), 'YYYY-MM') as cohort_month,
DATE_TRUNC('month', MIN("InvoiceDate")) as first_purchase_date
FROM online_retail
WHERE "CustomerID" IS NOT NULL
GROUP BY "CustomerID"
),
customer_purchases AS (
SELECT
f."CustomerID",
f.cohort_month,
TO_CHAR(o."InvoiceDate", 'YYYY-MM') as purchase_month,
EXTRACT(YEAR FROM age(DATE_TRUNC('month', o."InvoiceDate"), f.first_purchase_date)) * 12 +
EXTRACT(MONTH FROM age(DATE_TRUNC('month', o."InvoiceDate"), f.first_purchase_date)) as month_number
FROM online_retail o
JOIN first_purchases f ON o."CustomerID" = f."CustomerID"
),
cohort_size AS (
SELECT
cohort_month,
COUNT(DISTINCT "CustomerID") as num_customers
FROM first_purchases
GROUP BY cohort_month
)
SELECT
cp.cohort_month,
cp.month_number,
COUNT(DISTINCT cp."CustomerID") as num_customers,
cs.num_customers as original_cohort_size,
ROUND(100.0 * COUNT(DISTINCT cp."CustomerID") / cs.num_customers, 2) as retention_percentage
FROM customer_purchases cp
JOIN cohort_size cs ON cp.cohort_month = cs.cohort_month
WHERE cp.month_number <= 12
GROUP BY cp.cohort_month, cp.month_number, cs.num_customers
ORDER BY cp.cohort_month, cp.month_number;
- Observation:
- The average order value remains relatively stable throughout the year, hovering between 15 and 20 units.
- Slight increases occur in March and July, followed by a slight decline in November.
- Insight:
- Despite seasonal revenue spikes, the average order value does not fluctuate significantly, indicating consistent customer purchasing patterns.
- Recommendation: Target campaigns to increase order value through strategies like:
- Bundling: Offer discounts for purchasing related items together.
- Upselling: Encourage higher-value purchases during high-revenue months.
- Observation:
- Revenue steadily increases from February to October, peaking in November.
- A sharp drop in revenue occurs after November, likely due to the end of a seasonal sales period or holiday shopping.
- Insight:
- November is a significant revenue driver, likely due to seasonal demand (e.g., holidays or promotions).
- Recommendation: Plan inventory and marketing campaigns well in advance to capitalize on November’s peak revenue period.
SELECT
DATE_TRUNC('month', "InvoiceDate") AS month,
ROUND(AVG("Quantity" * "UnitPrice")::NUMERIC,2) AS average_order_value
FROM online_retail
GROUP BY month
ORDER BY month;