POC/Web Analytics.txt

This project performs analytics on machine-generated data (web server logs) and user-generated data (transactions). We used Apache Hadoop to process this data in a distributed environment and to identify popular URLs, users, and products, among other metrics.

The project collects, aggregates, processes, and reports on web server logs. These logs contain information about user clicks, user demographics, and products. We used this data to derive insights such as which products are doing well, which pages are popular, which users are active, the click count per page, and the time spent by each user (per day, week, month, and year). The objective of this POC is to use an open distributed computing platform, Apache Hadoop and its ecosystem, to ingest data, store it on commodity hardware, process it in parallel, and report the processed results.
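
As an illustration of the page click count, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python, run locally rather than on a Hadoop cluster. It is not the project's actual job. The log layout (Common Log Format), the sample lines, and the names LOG_PATTERN, map_clicks, reduce_clicks, run_job, and top_urls are all assumptions made for this example.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumption: logs follow the Common Log Format. The project's real
# log layout is not stated, so this pattern is illustrative only.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def map_clicks(line):
    """Map step: emit (url, 1) for every log line that parses."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group("url"), 1


def reduce_clicks(url, counts):
    """Reduce step: sum the counts emitted for a single URL."""
    return url, sum(counts)


def run_job(lines):
    """Simulate map, shuffle/sort, and reduce in one local process."""
    pairs = [pair for line in lines for pair in map_clicks(line)]
    pairs.sort(key=itemgetter(0))  # shuffle/sort: bring equal keys together
    return dict(
        reduce_clicks(url, (count for _, count in group))
        for url, group in groupby(pairs, key=itemgetter(0))
    )


def top_urls(counts, n):
    """Return the n most-clicked URLs, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up sample lines, not taken from the project's data.
SAMPLE_LOG = [
    '10.0.0.1 - alice [10/Oct/2023:13:55:36 +0000] "GET /products/42 HTTP/1.1" 200 2326',
    '10.0.0.2 - bob [10/Oct/2023:13:56:01 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - alice [10/Oct/2023:13:57:12 +0000] "GET /products/42 HTTP/1.1" 200 2326',
    'malformed line that the mapper skips',
]

if __name__ == "__main__":
    click_counts = run_job(SAMPLE_LOG)
    print(top_urls(click_counts, 1))
```

Emitting the user field instead of the URL in the map step would give a per-user click count in the same way, which is one possible way to rank active users.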