Hello,
I am an experienced data engineer with expertise in building robust automated scraping pipelines using Python and its powerful libraries. For this project, I propose designing a comprehensive system that collects structured data from multiple sources, processes it for consistency and quality, and stores it efficiently in a PostgreSQL database.
The process begins with identifying reliable online sources for data collection. Using libraries like `BeautifulSoup`, `Selenium`, or `Scrapy`, I will implement web scraping to extract key information such as names, addresses, user reviews, ratings, and operating hours. Data will be processed using `pandas` for cleaning and standardization, ensuring it aligns with the provided Place Entity schema and PlaceTypes/PlaceSubTypes enums.
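As an illustration of the extraction and cleaning step, here is a minimal sketch using `BeautifulSoup` and `pandas`. The HTML snippet, CSS selectors, and field names are hypothetical stand-ins for a real source; a production scraper would fetch live pages and map fields onto the Place Entity schema:

```python
# Sketch: parse an inline HTML snippet (a stand-in for a fetched page)
# and normalize the extracted records with pandas. Selectors and field
# names are illustrative, not taken from a real target site.
from bs4 import BeautifulSoup
import pandas as pd

SAMPLE_HTML = """
<div class="place">
  <span class="name">  Cafe Aroma </span>
  <span class="rating">4.5</span>
  <span class="hours">9am-5pm</span>
</div>
<div class="place">
  <span class="name">cafe aroma</span>
  <span class="rating">4.5</span>
  <span class="hours">9am-5pm</span>
</div>
"""

def extract_places(html: str) -> pd.DataFrame:
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.place"):
        rows.append({
            "name": card.select_one("span.name").get_text(strip=True),
            "rating": float(card.select_one("span.rating").get_text(strip=True)),
            "hours": card.select_one("span.hours").get_text(strip=True),
        })
    df = pd.DataFrame(rows)
    # Standardize text fields so later deduplication can match records
    # that differ only in casing or whitespace.
    df["name"] = df["name"].str.strip().str.title()
    return df

df = extract_places(SAMPLE_HTML)
```

Note that the two sample records normalize to the same name, which is exactly what the deduplication stage described below relies on.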
To maintain data quality, I will implement deduplication techniques and anomaly detection using Python’s `numpy` and `scikit-learn`. Finally, the cleaned and verified data will be stored in PostgreSQL using the `SQLAlchemy` library for seamless integration.
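The quality and storage steps can be sketched as follows. This is an assumption-laden example, not the final pipeline: the table name and columns are illustrative, the outlier model is a simple one-feature `IsolationForest`, and an in-memory SQLite database stands in for PostgreSQL to keep the snippet self-contained:

```python
# Sketch: deduplicate records, flag rating outliers with scikit-learn's
# IsolationForest, and load the result through SQLAlchemy. Swap the
# engine URL for "postgresql+psycopg2://..." against a real database.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sqlalchemy import create_engine

records = pd.DataFrame({
    "name": ["Cafe Aroma", "Cafe Aroma", "Book Nook", "Gym One"],
    "rating": [4.5, 4.5, 4.2, 99.0],  # 99.0 is a deliberate anomaly
})

# 1. Deduplication: exact match on the fields that identify a place.
deduped = records.drop_duplicates(subset=["name", "rating"]).reset_index(drop=True)

# 2. Anomaly detection: IsolationForest scores each row; -1 marks outliers.
model = IsolationForest(contamination=0.34, random_state=0)
deduped["is_outlier"] = model.fit_predict(deduped[["rating"]]) == -1
clean = deduped[~deduped["is_outlier"]].drop(columns="is_outlier")

# 3. Storage: pandas writes the cleaned frame through a SQLAlchemy engine.
engine = create_engine("sqlite:///:memory:")
clean.to_sql("places", engine, index=False, if_exists="replace")
```

In the real system the dedup keys, contamination rate, and schema would be tuned to the agreed Place Entity definition and PlaceTypes/PlaceSubTypes enums.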
Hire me, and you will receive an automated system that delivers continuous, accurate data updates.
Best Regards,
Aneesa.