GitHub - flyabroad/Crawler: Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site as well as the link relationship (both incoming and outgoing) between each page.

forked from FCC/Crawler

FILES:
	includes
	README.txt
	browse.php
	config.php
	crawl.php
	create-tables.sql
	export.php
	query.php
	sitemap.php
	stats.php

TO USE:

1. Edit config.php with the appropriate database and domain information
2. (for now) In phpMyAdmin, insert the seed URL into the urls table.
	* URL should start with www.
	* URL should have a trailing slash
	* (for now) You may also want to set clicks to '0' to avoid problems
3. Open crawl.php
4. (optional) Open stats.php to watch progress
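
A minimal sketch of the seed insert from step 2, as it might be typed into the phpMyAdmin SQL tab. The urls table and the clicks column are named above; the url column name, the http:// scheme, and the example.com domain are assumptions for illustration only. Check create-tables.sql for the actual column names before running it.

```sql
-- Hypothetical seed row: the column name `url` is an assumption, see create-tables.sql.
-- The URL starts with www. and ends with a trailing slash, as step 2 requires.
INSERT INTO urls (url, clicks)
VALUES ('http://www.example.com/', 0);  -- clicks set to 0 to avoid problems
```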

TIPS:
	Changes to php.ini
		1. Increase the memory limit (1GB)
		2. Remove the execution time limit
	Changes to mysql.ini (usually named my.ini or my.cnf)
		* Increase the max query size (to avoid the "MySQL server has gone away" error)
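
The tips above could translate into settings along these lines. This is a sketch, not the author's exact configuration: memory_limit, max_execution_time, and max_allowed_packet are the standard PHP and MySQL directives for these limits, but the 64M packet size is an assumed example value, and the MySQL file is usually named my.ini or my.cnf depending on the platform.

```ini
; php.ini
memory_limit = 1G          ; raise the memory limit to 1GB
max_execution_time = 0     ; 0 removes the execution time limit

; MySQL configuration file (my.ini or my.cnf)
[mysqld]
max_allowed_packet = 64M   ; assumed example value; larger packets allow larger queries
```

Restart the web server and MySQL after editing so the new values take effect.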

Additional documentation (source code) in (/source)