Instagram Crawler

Overview

This project is an Instagram crawler that scrapes data from Instagram locations and posts. It uses Selenium with multithreading for web automation and BeautifulSoup for HTML parsing. The crawler can extract information such as post counts, authors, and content from specified Instagram locations.
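As a rough illustration of the multithreaded part, the sketch below fans a list of location URLs out to a thread pool. It is not taken from the project's code: `scrape_location` and `scrape_all` are hypothetical names, and the worker is a stand-in that returns a placeholder record so the example runs without a browser. In a real worker, each thread would typically create its own Selenium driver, since a single driver instance is not meant to be shared across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_location(location_url):
    """Hypothetical stand-in for the per-location work.

    A real worker would open the page with its own Selenium driver and parse
    the HTML with BeautifulSoup; here it only echoes the URL.
    """
    return {"url": location_url, "posts": []}


def scrape_all(location_urls, max_workers=4):
    """Scrape several locations concurrently and return one record per URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(scrape_location, location_urls))
```

Threads suit this workload because most of the time is spent waiting on the browser and the network rather than on Python computation.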

Features

Scrapes Instagram location data based on search queries.
Extracts post information including author, creation date, and content.
Saves the extracted data in JSON format.
Supports cookie management for login persistence.
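A minimal sketch of the JSON output step might look like the following. The function name `save_posts`, the output path, and the field names (`author`, `created_at`, `content`) are assumptions chosen for illustration; the project's actual schema may differ.

```python
import json
from pathlib import Path


def save_posts(posts, path):
    """Write a list of post dictionaries to a UTF-8 JSON file."""
    path = Path(path)
    # Create the output directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII captions readable in the file.
        json.dump(posts, f, ensure_ascii=False, indent=2)


# Hypothetical record; the field names are illustrative only.
example_posts = [
    {"author": "some_user", "created_at": "2024-01-01T12:00:00", "content": "Hello"},
]
```

Calling `save_posts(example_posts, "result/posts.json")` would write the records to that file, creating the `result` directory first if needed.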

Requirements

Python 3.x
Selenium
BeautifulSoup4
WebDriver Manager
Fake User Agent
Other dependencies specified in requirements.txt

Installation

Clone the repository:

git clone https://github.com/yourusername/instagram-crawler.git
cd instagram-crawler

Install the required packages:
```
pip install -r requirements.txt
```
The project uses WebDriver Manager to download and manage the browser driver automatically, so you do not need to install a driver manually.
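For readers unfamiliar with WebDriver Manager, the sketch below shows one common way to combine it with Selenium 4. It assumes Firefox is the target browser and is not copied from the project; `build_firefox_driver` is a hypothetical helper name.

```python
def build_firefox_driver(headless=True):
    """Create a Firefox driver whose geckodriver binary is fetched on demand."""
    # Imported lazily so this module loads even before the packages are installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service
    from webdriver_manager.firefox import GeckoDriverManager

    options = Options()
    if headless:
        # Run Firefox without opening a visible window.
        options.add_argument("-headless")
    # install() downloads geckodriver if it is not already cached
    # and returns the path to the executable.
    service = Service(GeckoDriverManager().install())
    return webdriver.Firefox(service=service, options=options)
```

Firefox itself still has to be installed on the machine; only the driver binary is handled automatically.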

Usage

Run the Script: You can run the script directly from the command line. Make sure to provide your Instagram username and password:
```
python main.py -u your_username -p your_password
```

Search Query: You can customize the search query by using the -q option:

python main.py -u your_username -p your_password -q "your_search_query_here"

Indexes to Extract: You can specify which indexes to extract using the -i option:
```
python main.py -u your_username -p your_password -i 0 1 2 3 4 5 6
```
Follow the prompts in the console to log in to Instagram if required.
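The options above could be parsed with `argparse` roughly as follows. Only the short flags (`-u`, `-p`, `-q`, `-i`) come from this README; the destination names, defaults, and the decision to make `-u` and `-p` required are assumptions made for this sketch.

```python
import argparse


def build_parser():
    """Build a parser for the -u, -p, -q, and -i options."""
    parser = argparse.ArgumentParser(description="Instagram crawler")
    parser.add_argument("-u", dest="username", required=True, help="Instagram username")
    parser.add_argument("-p", dest="password", required=True, help="Instagram password")
    parser.add_argument("-q", dest="query", default=None, help="search query")
    # nargs="+" collects one or more space-separated integers, e.g. -i 0 1 2
    parser.add_argument(
        "-i", dest="indexes", type=int, nargs="+", default=None,
        help="indexes to extract",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.username, args.query, args.indexes)
```

With `nargs="+"` and `type=int`, the value of `-i 0 1 2` arrives as a list of integers rather than a single string.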

Configuration

Cookie Management: The crawler saves cookies to maintain login sessions. You can specify your Instagram username and password in the manual_login_and_save method in cookie_manager.py.
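Cookie persistence of this kind can be sketched as below. This is an illustration rather than the contents of `cookie_manager.py`: the helper names `save_cookies` and `load_cookies` and the JSON file format are assumptions. With Selenium, the list passed to `save_cookies` would come from `driver.get_cookies()`.

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Persist a list of cookie dictionaries, e.g. from driver.get_cookies()."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(path):
    """Return saved cookies, or an empty list if none have been saved."""
    cookie_file = Path(path)
    if not cookie_file.exists():
        return []
    return json.loads(cookie_file.read_text(encoding="utf-8"))
```

To restore a session, the driver would first open a page on the Instagram domain and then call `driver.add_cookie(cookie)` for each entry returned by `load_cookies`, because Selenium only accepts cookies for the domain that is currently loaded.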

Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue or submit a pull request.

Acknowledgments

Selenium - For web automation.
BeautifulSoup - For HTML parsing.
WebDriver Manager - For managing browser drivers.
Fake User Agent - For generating random user agents.

YChaoWang/Instagram_crawler
