karanjeets/PCF-Nutch-on-Wrangler

Crawl - Evaluation

A crawl evaluation of Apache Nutch v1.12. We run our crawls on TACC Wrangler, an NSF-funded supercomputer, in both Hadoop and local mode, pushing the crawler to its limits to get the best throughput.

We are evaluating Nutch on all kinds of crazy stuff: broad crawling, focused crawling, intelligent crawling, domain discovery, and many more...

The project includes a sample crawling workspace for Wrangler that is both automated and portable. More details can be found in the respective README files.
