--Hadoop example - PageRank for Wikipedia
This project is a sample of how to work with Hadoop. It contains three jobs that parse a Wikipedia dump, calculate the PageRank of each page, and order the pages by rank. The source accompanies the post on the Xebia blog: http://blog.xebia.com/2011/09/wiki-pagerank-with-hadoop/
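The calculate and order steps can be tried out without a cluster. The code below is a hypothetical, in-memory stand-in for those two jobs, not the project's Hadoop code. It assumes the parse step has already produced, for every page, the list of pages it links to. It uses the commonly cited damping factor d = 0.85 and the unnormalised form PR(p) = (1 - d) + d * sum over q linking to p of PR(q) / L(q), where L(q) is the number of outgoing links of q, with every page starting at rank 1.0. The class and method names (PageRankSketch, iterate, run, order) are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of the calculate and order steps; not the project's Hadoop jobs. */
public class PageRankSketch {

    // Commonly used damping factor (assumption: the project may use a different value).
    static final double DAMPING = 0.85;

    /** One PageRank iteration: every page splits its current rank evenly over its outgoing links. */
    public static Map<String, Double> iterate(Map<String, List<String>> links, Map<String, Double> ranks) {
        Map<String, Double> contributions = new HashMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            List<String> out = e.getValue();
            if (out.isEmpty()) continue; // pages without links pass nothing on in this simplified sketch
            double share = ranks.get(e.getKey()) / out.size();
            for (String target : out) {
                contributions.merge(target, share, Double::sum);
            }
        }
        // Only pages present in the link map get a new rank; shares sent to unknown pages are dropped.
        Map<String, Double> next = new HashMap<>();
        for (String page : links.keySet()) {
            next.put(page, (1 - DAMPING) + DAMPING * contributions.getOrDefault(page, 0.0));
        }
        return next;
    }

    /** Runs a fixed number of iterations, starting every page at rank 1.0. */
    public static Map<String, Double> run(Map<String, List<String>> links, int iterations) {
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) ranks.put(page, 1.0);
        for (int i = 0; i < iterations; i++) ranks = iterate(links, ranks);
        return ranks;
    }

    /** The order step: page titles sorted by descending rank, ties broken by title. */
    public static List<String> order(Map<String, Double> ranks) {
        List<String> pages = new ArrayList<>(ranks.keySet());
        pages.sort((a, b) -> {
            int byRank = Double.compare(ranks.get(b), ranks.get(a)); // higher rank first
            return byRank != 0 ? byRank : a.compareTo(b);
        });
        return pages;
    }

    public static void main(String[] args) {
        // Tiny made-up link graph: A links to B and C, B links to C, C links back to A.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", List.of("B", "C"));
        links.put("B", List.of("C"));
        links.put("C", List.of("A"));
        Map<String, Double> ranks = run(links, 50);
        for (String page : order(ranks)) {
            System.out.printf("%s\t%.4f%n", page, ranks.get(page));
        }
    }
}
```

On a cluster, one iteration corresponds naturally to one MapReduce pass: the map side emits each page's share to its link targets and the reduce side sums the shares per page. Repeating the pass a fixed number of times, as run does here, is the simplest stopping rule; checking how much the ranks still change between passes is a common alternative.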
Requirements:
- Maven
- Hadoop cluster with HDFS
- Input file, a Wikipedia dump (here the Dutch Wikipedia): http://dumps.wikimedia.org/nlwiki/latest/nlwiki-latest-pages-articles.xml.bz2
- Eclipse with the Hadoop plugin
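The dump is MediaWiki XML: each article is a page element holding a title and, inside revision, a text element with the article's wiki markup, where internal links are written as [[Target]] or [[Target|label]]. The parse job has to turn that markup into a list of link targets per page. The following is a hypothetical, simplified sketch of only that link-extraction part, not the project's parser. It assumes that the part after "#" (a section anchor) can be dropped and that any target containing ":" is a namespaced link (file, category and so on) to be skipped; it also ignores nested links, such as those inside image captions. The class name WikiLinkSketch is illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of link extraction from wiki markup; not the project's parse job. */
public class WikiLinkSketch {

    // Matches [[Target]] or [[Target|label]]; group 1 is the text before the first '|' or ']'.
    private static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    public static List<String> extractLinks(String wikiText) {
        Set<String> targets = new LinkedHashSet<>(); // removes duplicates, keeps first-seen order
        Matcher m = LINK.matcher(wikiText);
        while (m.find()) {
            String target = m.group(1).trim();
            int hash = target.indexOf('#');
            if (hash >= 0) {
                target = target.substring(0, hash).trim(); // drop the section anchor
            }
            // Skip same-page anchors and namespaced links (simplifying assumption).
            if (target.isEmpty() || target.contains(":")) continue;
            targets.add(target);
        }
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        String text = "The [[Amsterdam|capital]] lies in [[North Holland]]. "
                + "See also [[Amsterdam#History]] and [[Category:Cities]].";
        System.out.println(extractLinks(text)); // prints [Amsterdam, North Holland]
    }
}
```

A regular expression keeps the sketch short; markup such as templates or nested links would need a real wikitext parser.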