Skip to content

A browser for master web scrapers. It saves cookies, has a decent user agent, sends referer urls, and let's you (the scraper) easily submit forms on the page. It *doesn't* run the page's javascript yet, however.

Notifications You must be signed in to change notification settings

crowdmob/hpricotscape

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hpricotscape

A browser for master web scrapers. It saves cookies, has a decent user agent, sends referer urls, and let’s you (the scraper) easily submit forms on the page. It doesn’t run the page’s javascript yet, however.

Installation

In your application’s Gemfile, put:

  gem 'hpricotscape', :git => 'git@github.com:crowdmob/hpricotscape.git'

Usage

This operates a “browser” that returns Hpricot object for each page it accesses. General usage is as follows:

  browser = Hpricotscape::Browser.new # You can set debug_mode to true (which opens up security vulnerabilities), and any pre-set cookies you want to have
  dom_elements = browser.load('http://www.example.com/cart') # Loads a URL, a shopping card in this example
  dom_elements = browser.submit('form#cart', 'checkout', { item_0_qty: 2 }) # Submits the page's form#cart form with a submit button matching 'checkout', with the item_0_qty value set to 2

About

A browser for master web scrapers. It saves cookies, has a decent user agent, sends referer urls, and let's you (the scraper) easily submit forms on the page. It *doesn't* run the page's javascript yet, however.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages