Create ruby.md
lorien committed Aug 16, 2015
1 parent 4973812 commit 485b95b
Showing 1 changed file with 142 additions and 0 deletions.
142 changes: 142 additions & 0 deletions ruby.md
@@ -0,0 +1,142 @@
# Ruby Web Scraping

This list contains Ruby libraries related to web scraping and data processing.

* [Ruby Web Scraping](#ruby-web-scraping)
* [Network](#network)
* [Web-Scraping Frameworks](#web-scraping-frameworks)
* [HTML/XML Parsing](#htmlxml-parsing)
* [Text Processing](#text-processing)
* [Specific Formats Processing](#specific-formats-processing)
* [Natural Language Processing](#natural-language-processing)
* [Downloader](#downloader)
* [Browser Automation and Emulation](#browser-automation-and-emulation)
* [Multiprocessing](#multiprocessing)
* [Queue](#queue)
* [Cloud Computing](#cloud-computing)
* [Email](#email)
* [URL Manipulation](#url-manipulation)
* [Web Content Extracting](#web-content-extracting)
* [Asynchronous](#asynchronous)
* [WebSocket](#websocket)
* [DNS Resolving](#dns-resolving)
* [Computer Vision](#computer-vision)
* [Geolocation](#geolocation)
* [Other Ruby Lists](#other-ruby-lists)

## Network

* [httparty](https://github.com/jnunemaker/httparty) - Makes HTTP fun again!
* [faraday](https://github.com/lostisland/faraday) - Simple but flexible HTTP client library with support for multiple backends.
* [http](https://github.com/tarcieri/http) - A simple Ruby DSL for making HTTP requests.
* [excon](https://github.com/excon/excon) - Usable, fast, simple HTTP(S) 1.1 for Ruby.
* [nestful](https://github.com/maccman/nestful) - Simple Ruby HTTP/REST client with a sane API.
* [EM-HTTP-Request](https://github.com/igrigorik/em-http-request) - EventMachine based asynchronous HTTP client

## Web-Scraping Frameworks

* TODO

## HTML/XML Parsing

* [nokogiri](https://github.com/sparklemotion/nokogiri) - HTML, XML, SAX, and Reader parser with XPath and CSS selector support
* [loofah](https://github.com/flavorjones/loofah) - HTML/XML manipulation and sanitization based on Nokogiri

## Text Processing

*Libraries for parsing and manipulating plain text.*

* General
  * TODO

## Specific Formats Processing

*Libraries for parsing and manipulating specific text formats.*

* Office
  * [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf).
  * [spreadsheet](https://github.com/zdavatz/spreadsheet) - A library designed to read and write spreadsheet documents.
  * [roo](https://github.com/Empact/roo) - Read access for all spreadsheet types and read/write access for Google spreadsheets.
  * [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - A library to read and write Google Spreadsheets.
  * [rubyXL](https://github.com/weshatheleopard/rubyXL) - A gem for parsing, creating, and manipulating Microsoft Excel (.xlsx/.xlsm) documents.
  * [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
  * [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native Ruby format.
  * [workbook](https://github.com/murb/workbook) - Models a workbook as tables containing rows containing cells; reads and writes Excel, ODS, CSV, and tab-separated files.
  * [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects.
  * [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - Wraps win32ole to make Excel operations easy to use from Ruby. See the README for details.

## Natural Language Processing

*Libraries for working with human languages.*

* [Treat](https://github.com/louismullie/treat) - Treat is a toolkit for natural language processing and computational linguistics in Ruby

## Downloader

*Libraries for downloading.*

* TODO

## Browser Automation and Emulation

* TODO

## Multiprocessing

* [Celluloid](https://github.com/celluloid/celluloid) - Actor-based concurrent object framework for Ruby
* [Parallel](https://github.com/grosser/parallel) - Ruby parallel processing made simple and fast
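
As a rough illustration of what these libraries automate, the sketch below maps a block over a list using a fixed pool of worker threads and only the Ruby standard library. `parallel_map` is a hypothetical helper name, not part of any gem listed here, and it uses threads rather than processes, so on MRI it mainly helps with I/O-bound work such as fetching pages.

```ruby
# Hypothetical helper (not from any listed gem): apply the given block to
# every item using a fixed number of worker threads, preserving input order.
def parallel_map(items, workers: 4)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # once closed and drained, Queue#pop returns nil

  results = Array.new(items.size)
  threads = Array.new(workers) do
    Thread.new do
      # Each worker pulls jobs until the queue is empty.
      while (job = jobs.pop)
        item, index = job
        results[index] = yield(item)
      end
    end
  end

  threads.each(&:join) # join re-raises any exception raised in a worker
  results
end

# Example usage with placeholder data.
lengths = parallel_map(%w[a bb ccc], workers: 2) { |word| word.length }
```

Each worker writes to a distinct slot of a pre-sized array, which keeps results in input order without extra locking in this simple case.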

## Asynchronous

*Libraries for asynchronous network programming.*

* [EventMachine](https://github.com/eventmachine/eventmachine) - event-driven I/O and lightweight concurrency library

## Queue

* [Resque](https://github.com/resque/resque) - A Redis-backed Ruby library for creating background jobs and placing them on multiple queues.
* [Delayed::Job](https://github.com/tobi/delayed_job) - Database-backed asynchronous priority queue.
* [Qu](https://github.com/bkeepers/qu) - A Ruby library for queuing and processing background jobs.
* [Sidekiq](https://github.com/mperham/sidekiq) - Simple, efficient background processing for Ruby.

## Cloud Computing

* TODO

## Email

*Libraries for parsing email.*

* [mail](https://github.com/mikel/mail) - A Really Ruby Mail Library

## URL Manipulation

*Libraries for parsing URLs.*

* TODO
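
Until gems are listed here, Ruby's standard `uri` library covers the common scraping cases. The following is a minimal sketch, assuming a made-up page URL and made-up relative links; it shows splitting a URL into parts, resolving relative links against the page they were found on, and building a query string.

```ruby
require "uri"

# Hypothetical page URL, used only for illustration.
page_url = "http://example.com/catalog/page1.html?sort=asc#top"

# Split the URL into its components.
page = URI.parse(page_url)
parts = {
  scheme: page.scheme,
  host: page.host,
  path: page.path,
  query: page.query,
  fragment: page.fragment
}

# Resolve relative links found on the page against the page URL.
logo_url = URI.join(page_url, "../images/logo.png")
next_page_url = URI.join(page_url, "page2.html")

# Build a query string from a hash, then decode it back into pairs.
query = URI.encode_www_form({ "q" => "web scraping", "page" => 2 })
pairs = URI.decode_www_form(query)
```

Resolving links with `URI.join` rather than string concatenation handles `..` segments and drops the base URL's query and fragment.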

## Web Content Extracting

*Libraries for extracting web content.*

* TODO


## WebSocket

*Libraries for working with WebSocket.*

* [em-websocket](https://github.com/igrigorik/em-websocket) - EventMachine based WebSocket server

## DNS Resolving

* TODO

## Computer Vision

* TODO

## Geolocation

* [geocoder](https://github.com/alexreisner/geocoder) - Complete Ruby geocoding solution.
* [Geokit](https://github.com/geokit/geokit) - Geokit gem provides geocoding and distance/heading calculations.

## Other Ruby Lists

* TODO
