From 485b95b5a723c9e050d8a7674cba112350a8db95 Mon Sep 17 00:00:00 2001
From: Gregory Petukhov
Date: Sun, 16 Aug 2015 21:16:59 +0500
Subject: [PATCH] Create ruby.md

---
 ruby.md | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 ruby.md

diff --git a/ruby.md b/ruby.md
new file mode 100644
index 0000000..df8cc4b
--- /dev/null
+++ b/ruby.md
@@ -0,0 +1,142 @@
+# Ruby Web Scraping
+
+This list contains Ruby libraries related to web scraping and data processing.
+
+* [Ruby Web Scraping](#ruby-web-scraping)
+  * [Network](#network)
+  * [Web-Scraping Frameworks](#web-scraping-frameworks)
+  * [HTML/XML Parsing](#htmlxml-parsing)
+  * [Text Processing](#text-processing)
+  * [Specific Formats Processing](#specific-formats-processing)
+  * [Natural Language Processing](#natural-language-processing)
+  * [Downloader](#downloader)
+  * [Browser automation and emulation](#browser-automation-and-emulation)
+  * [Multiprocessing](#multiprocessing)
+  * [Queue](#queue)
+  * [Cloud Computing](#cloud-computing)
+  * [Email](#email)
+  * [URL Manipulation](#url-manipulation)
+  * [Web Content Extracting](#web-content-extracting)
+  * [Asynchronous](#asynchronous)
+  * [WebSocket](#websocket)
+  * [DNS Resolving](#dns-resolving)
+  * [Computer Vision](#computer-vision)
+  * [Geolocation](#geolocation)
+  * [Other Ruby Lists](#other-ruby-lists)
+
+## Network
+
+* [httparty](https://github.com/jnunemaker/httparty) - Makes http fun again!
+* [faraday](https://github.com/lostisland/faraday) - Simple, but flexible HTTP client library, with support for multiple backends.
+* [http](https://github.com/tarcieri/http) - A simple Ruby DSL for making HTTP requests
+* [excon](https://github.com/excon/excon) - Usable, fast, simple HTTP(S) 1.1 for Ruby
+* [nestful](https://github.com/maccman/nestful) - Simple Ruby HTTP/REST client with a sane API
+* [EM-HTTP-Request](https://github.com/igrigorik/em-http-request) - EventMachine based asynchronous HTTP client
+
+## Web-Scraping Frameworks
+
+ * TODO
+
+## HTML/XML Parsing
+
+* [nokogiri](https://github.com/sparklemotion/nokogiri) - HTML, XML, SAX, and Reader parser with XPath and CSS selector support
+* [loofah](https://github.com/flavorjones/loofah) - HTML/XML manipulation and sanitization based on Nokogiri
+
+## Text Processing
+
+*Libraries for parsing and manipulating plain text.*
+
+* General
+  * TODO
+
+## Specific Formats Processing
+
+*Libraries for parsing and manipulating specific text formats.*
+
+* Office
+  * [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
+  * [spreadsheet](https://github.com/zdavatz/spreadsheet) - The Spreadsheet Library is designed to read and write Spreadsheet Documents.
+  * [roo](https://github.com/Empact/roo) - Roo implements read access for all spreadsheet types and read/write access for Google spreadsheets.
+  * [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - A library for reading and writing Google Spreadsheets.
+  * [rubyXL](https://github.com/weshatheleopard/rubyXL) - rubyXL is a gem which allows the parsing, creation, and manipulation of Microsoft Excel (.xlsx/.xlsm) Documents
+  * [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
+  * [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native Ruby format.
+  * [workbook](https://github.com/murb/workbook) - Models a workbook as tables containing rows and cells; reads and writes Excel, ODS, CSV, and tab-separated files.
+  * [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects
+  * [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - WrapExcel wraps win32ole to make Excel operations easy to use from Ruby. See the README for a detailed description.
+
+## Natural Language Processing
+
+*Libraries for working with human languages.*
+
+* [Treat](https://github.com/louismullie/treat) - Treat is a toolkit for natural language processing and computational linguistics in Ruby
+
+## Downloader
+
+*Libraries for downloading.*
+
+* TODO
+
+## Browser automation and emulation
+* TODO
+
+## Multiprocessing
+
+* [Celluloid](https://github.com/celluloid/celluloid) - Actor-based concurrent object framework for Ruby
+* [Parallel](https://github.com/grosser/parallel) - Ruby parallel processing made simple and fast
+
+## Asynchronous
+
+*Libraries for asynchronous network programming.*
+
+* [EventMachine](https://github.com/eventmachine/eventmachine) - Event-driven I/O and lightweight concurrency library
+
+## Queue
+
+ * [Resque](https://github.com/resque/resque) - A Redis-backed Ruby library for creating background jobs and placing them on multiple queues.
+ * [Delayed::Job](https://github.com/tobi/delayed_job) - Database-backed asynchronous priority queue.
+ * [Qu](https://github.com/bkeepers/qu) - A Ruby library for queuing and processing background jobs.
+ * [Sidekiq](https://github.com/mperham/sidekiq) - Simple, efficient background processing for Ruby
+
+## Cloud Computing
+* TODO
+
+## Email
+
+*Libraries for parsing email.*
+
+ * [mail](https://github.com/mikel/mail) - A Really Ruby Mail Library
+
+## URL Manipulation
+
+*Libraries for parsing URLs.*
+
+* TODO
+
+## Web Content Extracting
+
+*Libraries for extracting web content.*
+
+* TODO
+
+
+## WebSocket
+
+*Libraries for working with WebSocket.*
+
+* [em-websocket](https://github.com/igrigorik/em-websocket) - EventMachine based WebSocket server
+
+## DNS Resolving
+* TODO
+
+## Computer Vision
+* TODO
+
+## Geolocation
+
+ * [geocoder](https://github.com/alexreisner/geocoder) - Complete Ruby geocoding solution
+ * [Geokit](https://github.com/geokit/geokit) - Geokit gem provides geocoding and distance/heading calculations.
+
+## Other Ruby Lists
+
+* TODO