Yahoo Answers Bot to crawl and scrape data and insert into personal website
$100-250 USD
Cancelled
Posted about 15 years ago
Paid on delivery
I need a bot that automatically crawls the Yahoo Answers website and scrapes only the question text and the category and subcategories (if any) that the question belongs to, for example 'Electronics/Car'. So if the question is 'How do I connect the Nintendo Wii to the Internet?', only this line will be scraped, along with the categories it was posted under on Yahoo Answers.

Once this information is scraped, I require the bot to automatically create a web page on my website consisting of this information. The page will be generated dynamically from a template file, so each page is based on one question. The process will also insert a link to the page in the relevant category on my website. All of this has to be automated, with quality checks in place so that duplicate questions and content are not scraped.

I also require a search function built into my website to search the questions that have been inserted. As mentioned, all pages have to be generated on the fly, so a bot that runs on the domain's server seems the best course of action, unless a better solution can be suggested.

I also require a sitemap that is compliant with Google Webmaster Tools, which I can submit there to let the Google bots know which pages are available on my website.

I also require the home page of my website to be created, with the clickable question categories on the left-hand side. Clicking a category creates a breadcrumb trail similar to that of Yahoo Answers and displays the questions in each category as the user clicks further. The home page and all other pages will also display the search box, so the user can search the whole website from whichever page he/she is currently browsing.
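To make the duplicate check, breadcrumb trail, and sitemap requirements more concrete, here is a minimal Python sketch. It is only an illustration, not a required design: every function name (`normalize`, `fingerprint`, `add_question`, `breadcrumbs`, `build_sitemap`), the URL layout, and the choice of exact-match hashing for duplicates are assumptions. The scraping step itself is left out.

```python
import hashlib
import re
import xml.etree.ElementTree as ET


def normalize(question):
    """Lowercase, drop punctuation, and collapse whitespace so that
    trivially different copies of a question compare equal."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(cleaned.split())


def fingerprint(question):
    """Stable hash of the normalized question, used as a duplicate key."""
    return hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()


def add_question(seen, question):
    """Return True if the question is new, False if it is a duplicate.
    `seen` is a set of fingerprints (a database table in a real system)."""
    key = fingerprint(question)
    if key in seen:
        return False
    seen.add(key)
    return True


def breadcrumbs(category_path):
    """Turn a path such as 'Electronics/Car' into (label, url) pairs.
    The URL scheme here is hypothetical."""
    trail = [("Home", "/")]
    url = "/"
    for part in category_path.split("/"):
        label = part.strip()
        url += label.lower().replace(" ", "-") + "/"
        trail.append((label, url))
    return trail


def build_sitemap(urls):
    """Build a sitemap in the sitemaps.org 0.9 XML format."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for page_url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")
```

Hashing the normalized text only catches exact or near-exact repeats; questions that are reworded would need a fuzzier comparison, which this sketch does not attempt.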
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
The system will be created on a domain hosting account. I presume it will require a SQL database of some sort. I think it is best if the coder decides on the best solution, including whether to host it on Linux or Windows and how much hosting space will be required. I will then purchase the domain name and hosting based on the requirements set out once they are finalised.
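As one possible shape for the SQL database mentioned above, the following Python sketch uses SQLite. This is purely an assumption for illustration: the posting leaves the database choice to the coder, and the table names, column names, and helper functions (`open_db`, `insert_question`, `search`) are hypothetical. A `UNIQUE` constraint rejects exact duplicate questions, and a simple `LIKE` query stands in for the site search.

```python
import sqlite3

# Hypothetical schema: one row per category path, one row per question.
SCHEMA = """
CREATE TABLE IF NOT EXISTS categories (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE          -- e.g. 'Electronics/Car'
);
CREATE TABLE IF NOT EXISTS questions (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    text        TEXT NOT NULL UNIQUE   -- blocks exact duplicates
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_question(conn, category_path, text):
    """Insert a question; return True if stored, False if it already existed."""
    conn.execute(
        "INSERT OR IGNORE INTO categories (path) VALUES (?)", (category_path,)
    )
    (category_id,) = conn.execute(
        "SELECT id FROM categories WHERE path = ?", (category_path,)
    ).fetchone()
    cur = conn.execute(
        "INSERT OR IGNORE INTO questions (category_id, text) VALUES (?, ?)",
        (category_id, text),
    )
    conn.commit()
    return cur.rowcount == 1


def search(conn, term):
    """Substring search over stored questions (LIKE wildcards in `term`
    are not escaped in this sketch)."""
    rows = conn.execute(
        "SELECT text FROM questions WHERE text LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

For a larger site, a full-text index would likely serve the search box better than `LIKE`, but that depends on the hosting and database the coder recommends.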