GitHub - dx88968/hadoop-wiki-pageranking: Calculate the PageRank of the pages in the wikipedia dump.

Hadoop example - Pageranking Wikipedia

This project is a sample of how to work with Hadoop. It contains 3 jobs that parse a Wikipedia dump, calculate the PageRank of its pages, and order the pages by rank. The source accompanies the Xebia blog post: http://blog.xebia.com/2011/09/wiki-pagerank-with-hadoop/
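To make the three stages concrete, here is a minimal in-memory sketch in plain Java, without Hadoop. It is not the code from this repository: the class name `PageRankSketch`, the link regex, the damping factor of 0.85, and the formula PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)) are all assumptions chosen for illustration. The actual jobs may parse and iterate differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageRankSketch {

    // Assumed damping factor; the README does not state one.
    static final double DAMPING = 0.85;

    // Simplified wiki link pattern: captures the target of [[Target]] or [[Target|label]].
    static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|#]+)");

    /** Stage 1 (parse): extract outgoing link targets from a page's wikitext. */
    static List<String> parseLinks(String wikiText) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(wikiText);
        while (m.find()) {
            targets.add(m.group(1).trim());
        }
        return targets;
    }

    /** Stage 2 (calculate): one iteration; each page splits its rank over its outgoing links. */
    static Map<String, Double> iterate(Map<String, List<String>> links, Map<String, Double> ranks) {
        Map<String, Double> received = new HashMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            List<String> targets = e.getValue();
            if (targets.isEmpty()) {
                continue; // a page without outgoing links passes on nothing
            }
            double share = ranks.get(e.getKey()) / targets.size();
            for (String target : targets) {
                if (links.containsKey(target)) { // ignore links to unknown pages
                    received.merge(target, share, Double::sum);
                }
            }
        }
        Map<String, Double> next = new HashMap<>();
        for (String page : links.keySet()) {
            next.put(page, (1 - DAMPING) + DAMPING * received.getOrDefault(page, 0.0));
        }
        return next;
    }

    /** Runs a fixed number of iterations, starting every page at rank 1.0. */
    static Map<String, Double> rank(Map<String, List<String>> links, int iterations) {
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0);
        }
        for (int i = 0; i < iterations; i++) {
            ranks = iterate(links, ranks);
        }
        return ranks;
    }

    /** Stage 3 (order): pages sorted by descending rank, ties broken by name. */
    static List<String> order(Map<String, Double> ranks) {
        List<String> pages = new ArrayList<>(ranks.keySet());
        pages.sort((a, b) -> {
            int byRank = Double.compare(ranks.get(b), ranks.get(a));
            return byRank != 0 ? byRank : a.compareTo(b);
        });
        return pages;
    }

    public static void main(String[] args) {
        // Tiny hypothetical link graph standing in for a parsed dump.
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("C"));
        links.put("C", Arrays.asList("A"));

        Map<String, Double> ranks = rank(links, 30);
        for (String page : order(ranks)) {
            System.out.println(page + "\t" + ranks.get(page));
        }
    }
}
```

In a MapReduce setting, `iterate` would roughly correspond to one job run: the map side emits each page's share to its link targets, and the reduce side sums the shares per page and applies the damping formula. The number of iterations is a tuning choice.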

Requires:

Maven
Hadoop cluster with HDFS.
Wiki dump input file: http://dumps.wikimedia.org/nlwiki/latest/nlwiki-latest-pages-articles.xml.bz2
Eclipse with Hadoop plugin

Repository contents (3 commits):

src/com/xebia/sandbox/hadoop
.gitignore
README.md
pom.xml
