# web-crawling

Here are 293 public repositories matching this topic...

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Dec 27, 2024
TypeScript

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright

Updated Dec 27, 2024
Python

omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.

Updated Dec 7, 2024
Python

crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development

php crawler scraper web-crawler scraping crawling web-scraper web-scraping scraping-websites web-crawling hacktoberfest

Updated Dec 17, 2024
PHP

scrapehero-code / amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

web-scraping web-crawling page-scraper web-scraping-tutorials amazon-scraper scrape-products

Updated Jun 13, 2023
Python

jrbadiabo / Bet-on-Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

python machine-learning algorithms scikit-learn machine-learning-algorithms selenium web-scraping beautifulsoup machinelearning predictive-analysis python-2 web-crawling sports-stats sportsanalytics

Updated Feb 12, 2017
Jupyter Notebook

TurnerSoftware / InfinityCrawler

A simple but powerful web crawler library for .NET

crawler spider web-crawler robots-txt web-crawling

Updated Dec 15, 2023
C#

ayakashi-io / ayakashi

⚡ Ayakashi.io - The next generation web scraping framework

data-mining automation web-scraping web-crawling headless-chrome

Updated Jun 29, 2023
TypeScript

spyboy-productions / omnisci3nt

Unveiling the Hidden Layers of the Web – A Comprehensive Web Reconnaissance Tool

Updated Aug 11, 2024
Jupyter Notebook

godkingjay / selenium-twitter-scraper

This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.

scraper twitter selenium collaborate web-crawling hacktoberfest twitter-scraper selenium-scraper hacktoberfest-accepted

Updated Jun 26, 2024
Jupyter Notebook

scrapinghub / scrapy-training

Scrapy Training companion code

python training web-scraping scrapy web-crawling

Updated Jan 30, 2019
Python

serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

ruby open-source rubygem automation command-line email email-marketing data-extraction serp command-line-tool webscraping web-crawling data-extractor email-extractor email-scraper social-media-scraper email-extraction email-extract-with-proxy

Updated Mar 19, 2024
Ruby

brianmadden / krawler

A web crawling framework written in Kotlin

kotlin link-checker framework web-crawler webcrawler web-crawling crawler4j

Updated Jun 29, 2021
Kotlin

fintech-hub / bancocentralbrasil

💵 💰 🇧🇷 Information on official daily rates for inflation, Selic, savings (Poupança), the dollar, the PTAX dollar, the euro, and the PTAX euro, taken from the Banco Central do Brasil website

money brazil web-scraping brasil web-crawling banco-central

Updated Nov 30, 2021
Python

my8100 / scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

python heroku cluster web-scraping scrapy web-crawling scrapyd scrapydweb logparser

Updated Apr 4, 2020
Python

MaxValue / Terpene-Profile-Parser-for-Cannabis-Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Updated Apr 28, 2023
Python

maxmindlin / scout-lang

A web crawling programming language

programming-language scraper dsl scraping web-scraping scraping-websites web-crawling

Updated Aug 21, 2024
Rust

SoheilKhodayari / JAW

JAW: A Graph-based Security Analysis Framework for Client-side JavaScript

javascript neo4j static-analysis csrf client-side property-graph vulnerability-detection web-crawling

Updated Dec 12, 2024
JavaScript

jonasjacek / robots.txt

Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allow) legitimate user agents. Useful for all websites.

search-engine whitelist user-agent seo crawling twitterbot robots-txt googlebot crawlers web-crawling bingbot robots-exclusion-standard blocking-bots web-robots search-engine-optimization baiduspider

Updated Feb 18, 2024

alyakhtar / Katastrophe

Command Line Tool to download torrents

python screenshot torrent bittorrent command-line kickass-torrents deluge web-crawling

Updated Feb 3, 2017
Python
