Project Description:
We need a web scraping and comparison tool to gather competitor pricing data for approximately 100,000 products that we sell. The goal is to collect and save competitor sell prices to a database for future analysis.
Competitor pricing will come from two main sources:
A shortlist of specific competitor websites (which we will provide).
Top results from Google Search for the exact product name or SKU.
Product Input:
We will provide input data for all products (e.g., product name, SKU, and our sell price).
Support bulk upload of product data in formats such as CSV and Excel.
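Below is a minimal sketch of the bulk-upload step. It assumes the CSV has a header row with the hypothetical column names product_name, sku and our_sell_price. It also assumes Excel files are exported to CSV first, because reading .xlsx directly would need a third-party library. Prices are parsed as Decimal to avoid floating-point rounding. Rows with no SKU or no usable price are skipped, so one bad row does not abort the whole upload.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import List, TextIO


@dataclass(frozen=True)
class Product:
    """One row of our own product data (field names are hypothetical)."""
    product_name: str
    sku: str
    our_sell_price: Decimal


def load_products(stream: TextIO) -> List[Product]:
    """Parse CSV rows into Product records, skipping invalid rows."""
    products: List[Product] = []
    for row in csv.DictReader(stream):
        sku = (row.get("sku") or "").strip()
        try:
            price = Decimal((row.get("our_sell_price") or "").strip())
        except InvalidOperation:
            continue  # empty or non-numeric price
        if not sku or not price.is_finite():
            continue  # SKU is needed for matching; reject NaN/Infinity
        products.append(
            Product(
                product_name=(row.get("product_name") or "").strip(),
                sku=sku,
                our_sell_price=price,
            )
        )
    return products


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = io.StringIO(
        "product_name,sku,our_sell_price\n"
        "Example Product A,SKU-0001,12.50\n"
        "Row without SKU,,9.99\n"
        "Row with bad price,SKU-0003,n/a\n"
    )
    for product in load_products(sample):
        print(product)
```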
Competitor Price Collection:
Scrape pricing information for each product:
From a shortlist of predefined competitor websites.
From Google search results that link to sites selling the same product.
Identify products using keywords, SKUs, or model numbers.
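Matching by SKU or model number usually needs some normalization, because sites write the same code differently (for example "271R", "271-R" or "271 R"). The sketch below assumes a simple, hypothetical rule:

- Strip every non-ASCII-alphanumeric character and uppercase the rest.
- Require the code to appear as a whole token, or as two adjacent tokens, so that "271R" does not match "1271R".

Matching by product name or free keywords would need fuzzier logic and is not covered here.

```python
import re
from typing import Iterable, Set

_NON_ALNUM = re.compile(r"[^0-9A-Za-z]")


def normalize_code(code: str) -> str:
    """Drop non-alphanumeric characters and uppercase: '271-r' -> '271R'."""
    return _NON_ALNUM.sub("", code).upper()


def candidate_codes(text: str) -> Set[str]:
    """Normalized single tokens plus adjacent pairs (so '271 R' -> '271R')."""
    tokens = [normalize_code(t) for t in text.split()]
    tokens = [t for t in tokens if t]
    pairs = {a + b for a, b in zip(tokens, tokens[1:])}
    return set(tokens) | pairs


def mentions_product(text: str, identifiers: Iterable[str]) -> bool:
    """True if any identifier (SKU, model number) appears as a whole token."""
    found = candidate_codes(text)
    codes = [normalize_code(i) for i in identifiers]
    return any(code and code in found for code in codes)


if __name__ == "__main__":
    print(mentions_product("Norton 271-R sanding sheets", ["271R"]))
    print(mentions_product("Part 1271R kit", ["271R"]))
```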
Data to collect:
Competitor name/URL
Competitor sell price
Date and time of data collection
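The following sketch turns one scraped price string into a record with the three fields listed above. It rests on a few assumptions:

- price_text is the text of the page's price element, not the whole page; otherwise a number such as "Only 3 left" could be picked up.
- Prices use US-style formatting ("," for thousands, "." for decimals). Other locales would need their own parsing.
- The CompetitorPrice and make_record names, and the fallback to the URL's host name when no competitor name is known, are illustrative.

Timestamps are taken in UTC so that runs from different servers compare correctly.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional
from urllib.parse import urlparse

# First number like "1,234.56", "1234.56" or "19" (US-style formatting assumed).
_PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the first price found in text as a Decimal, or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    fraction = match.group(2) or "0"
    return Decimal(f"{whole}.{fraction}")


@dataclass(frozen=True)
class CompetitorPrice:
    sku: str                # our SKU the price was matched to
    competitor_name: str
    competitor_price: Decimal
    source_url: str
    collected_at: datetime  # timezone-aware, UTC


def make_record(sku: str, source_url: str, price_text: str,
                competitor_name: Optional[str] = None) -> Optional[CompetitorPrice]:
    """Build a record, or return None if no price could be parsed."""
    price = parse_price(price_text)
    if price is None:
        return None
    # Hypothetical fallback: use the site's host name as the competitor name.
    name = competitor_name or urlparse(source_url).netloc
    return CompetitorPrice(sku, name, price, source_url,
                           datetime.now(timezone.utc))


if __name__ == "__main__":
    print(make_record("271R", "https://www.example.com/p/271r", "$1,234.56"))
```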
Database Storage:
Save all collected data to a database (preferably BigQuery or MySQL).
Store large datasets efficiently (100,000+ products, each with prices from multiple competitors).
Include fields:
Product name
SKU
Our sell price
Competitor name
Competitor sell price
Source URL
Timestamp
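One way to lay out these fields is a single append-only table that mirrors the list above. It is sketched here with Python's built-in sqlite3 only so it runs locally, and the table and column names are hypothetical. For MySQL, the price columns would more typically be DECIMAL(10,2) and the timestamp DATETIME. For BigQuery, they would be NUMERIC and TIMESTAMP, with the table partitioned by DATE(collected_at) so that queries over recent runs scan less data. The index on (sku, collected_at) is meant to support the common lookup of one product's price history. A separate products table would avoid repeating the product name and our price on every row, at the cost of a join.

```python
import sqlite3

# Hypothetical table and column names mirroring the field list above.
# SQLite types are used only so this sketch runs locally.
SCHEMA = """
CREATE TABLE IF NOT EXISTS competitor_prices (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name     TEXT    NOT NULL,
    sku              TEXT    NOT NULL,
    our_sell_price   NUMERIC NOT NULL,
    competitor_name  TEXT    NOT NULL,
    competitor_price NUMERIC NOT NULL,
    source_url       TEXT    NOT NULL,
    collected_at     TEXT    NOT NULL  -- ISO-8601 timestamp in UTC
);
-- Speeds up "price history for one SKU" queries.
CREATE INDEX IF NOT EXISTS idx_prices_sku_time
    ON competitor_prices (sku, collected_at);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and index if they do not exist yet."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for column in conn.execute("PRAGMA table_info(competitor_prices)"):
        print(column[1], column[2])
```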
Data Updates:
Automate scraping at regular intervals (daily or weekly).
Append new data to the database while keeping a history of older price data.
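The append-only approach can be sketched as follows. Each scheduled run (for example, triggered by cron or a cloud scheduler, which is an assumption about deployment) only INSERTs rows stamped with the run time. "Current" prices are then derived by a query rather than by overwriting old rows. The table below is a trimmed, hypothetical version of the schema fields, again using sqlite3 so it runs locally. Prices are stored as text to keep their exact decimal value. Timestamps are written as fixed-width UTC ISO-8601 strings so that MAX() on the text column returns the latest run.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable, List, Tuple

# Trimmed, hypothetical table: one row per (product, competitor, run).
DDL = """
CREATE TABLE IF NOT EXISTS competitor_prices (
    sku              TEXT NOT NULL,
    competitor_name  TEXT NOT NULL,
    competitor_price TEXT NOT NULL,  -- exact decimal kept as text
    source_url       TEXT NOT NULL,
    collected_at     TEXT NOT NULL   -- fixed-width UTC ISO-8601
)
"""

Row = Tuple[str, str, str, str]  # (sku, competitor_name, price, source_url)


def append_run(conn: sqlite3.Connection, rows: Iterable[Row],
               run_time: datetime) -> int:
    """Insert one scraping run; existing rows are never updated or deleted."""
    # run_time should be timezone-aware; seconds precision keeps strings fixed-width.
    stamp = run_time.astimezone(timezone.utc).isoformat(timespec="seconds")
    data = [(sku, name, price, url, stamp) for sku, name, price, url in rows]
    with conn:  # commit the whole run as one transaction
        conn.executemany(
            "INSERT INTO competitor_prices VALUES (?, ?, ?, ?, ?)", data)
    return len(data)


# Latest observed price per (sku, competitor), derived from the full history.
LATEST_SQL = """
SELECT sku, competitor_name, competitor_price, collected_at
FROM competitor_prices AS p
WHERE collected_at = (
    SELECT MAX(collected_at) FROM competitor_prices AS q
    WHERE q.sku = p.sku AND q.competitor_name = p.competitor_name
)
ORDER BY sku, competitor_name
"""


def latest_prices(conn: sqlite3.Connection) -> List[Tuple[str, str, str, str]]:
    return conn.execute(LATEST_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    url = "https://a.example/271r"  # made-up URL
    append_run(conn, [("271R", "shop-a", "10.00", url)],
               datetime(2024, 1, 1, tzinfo=timezone.utc))
    append_run(conn, [("271R", "shop-a", "9.50", url)],
               datetime(2024, 1, 8, tzinfo=timezone.utc))
    print(latest_prices(conn))
```

Both runs stay in the table as history; only the query decides which row counts as the latest.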
Technical Requirements:
Scalable and optimized to handle large product datasets.
Compliant with website terms of service.
The tool should support cloud deployment for continuous operation.
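Checking robots.txt covers only part of being compliant with website terms of service; each site's terms still need a manual review. The code below sketches the automated part. It uses Python's standard urllib.robotparser to decide whether a URL may be fetched and to read any Crawl-delay. It also adds a per-host throttle, so that scaling up to 100,000 products does not mean sending bursts of requests to any single site. The user-agent string and the 2-second default interval are hypothetical values. The clock and sleep functions are injectable so the throttle can be tested without waiting.

```python
import time
import urllib.robotparser
from typing import Callable, Dict
from urllib.parse import urlparse

USER_AGENT = "PriceMonitorBot"  # hypothetical crawler name


def build_robot_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse an already-downloaded robots.txt body (no network access here)."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_fetch(rules: urllib.robotparser.RobotFileParser, url: str) -> bool:
    """True if robots.txt allows our user agent to fetch url."""
    return rules.can_fetch(USER_AGENT, url)


def host_interval(rules: urllib.robotparser.RobotFileParser,
                  default: float = 2.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a default."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


class PoliteThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._clock = clock
        self._sleep = sleep
        self._next_ok: Dict[str, float] = {}  # host -> earliest next request time

    def wait(self, url: str, interval: float) -> float:
        """Sleep if needed before requesting url; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        pause = max(0.0, self._next_ok.get(host, now) - now)
        if pause > 0:
            self._sleep(pause)
        self._next_ok[host] = now + pause + interval
        return pause
```

In a real run, the rules would be built from each competitor's robots.txt once per run and cached per host. That way, the many product pages on one site share a single parsed copy.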
Additional Notes:
We will provide:
A list of predefined competitor sites.
Product data (e.g., Norton 271R).
Database credentials for testing (BigQuery/MySQL).
Deliverables:
Functional web scraping tool.
Database integration for storing collected competitor pricing data.
Documentation for setup, database schema, and usage.
For an example of the data the tool should generate, see the linked example.