
Commit

Merge pull request #862 from ScrapeGraphAI/pre/beta
Pre/beta
VinciGit00 authored Jan 3, 2025
2 parents 96064f2 + a9569ac commit bf7326f
Showing 10 changed files with 580 additions and 91 deletions.
44 changes: 33 additions & 11 deletions .github/workflows/release.yml
@@ -14,40 +14,55 @@ jobs:
run: |
sudo apt update
sudo apt install -y git
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Install uv
uses: astral-sh/setup-uv@v3

- name: Install Node Env
uses: actions/setup-node@v4
with:
node-version: 20

- name: Checkout
uses: actions/checkout@v4.1.1
with:
fetch-depth: 0
persist-credentials: false
- name: Build app

- name: Build and validate package
run: |
uv venv
. .venv/bin/activate
uv pip install --upgrade setuptools wheel hatchling
uv sync --frozen
uv pip install -e .
uv build
id: build_cache
if: success()
uv pip install --upgrade pkginfo==1.12.0 twine==6.0.1 # Upgrade pkginfo and install twine
python -m twine check dist/*
- name: Debug Dist Directory
run: ls -al dist

- name: Cache build
uses: actions/cache@v2
uses: actions/cache@v3
with:
path: ./dist
key: ${{ runner.os }}-build-${{ hashFiles('dist/**') }}
if: steps.build_cache.outputs.id != ''
key: ${{ runner.os }}-build-${{ github.sha }}

release:
name: Release
runs-on: ubuntu-latest
needs: build
environment: development
if: |
github.event_name == 'push' && github.ref == 'refs/heads/main' ||
github.event_name == 'push' && github.ref == 'refs/heads/pre/beta' ||
github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged && github.event.pull_request.base.ref == 'main' ||
github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged && github.event.pull_request.base.ref == 'pre/beta'
if: >
github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/pre/beta') ||
(github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged &&
(github.event.pull_request.base.ref == 'main' || github.event.pull_request.base.ref == 'pre/beta'))
permissions:
contents: write
issues: write
@@ -59,6 +74,13 @@ jobs:
with:
fetch-depth: 0
persist-credentials: false

- name: Restore build artifacts
uses: actions/cache@v3
with:
path: ./dist
key: ${{ runner.os }}-build-${{ github.sha }}

- name: Semantic Release
uses: cycjimmy/semantic-release-action@v4.1.0
with:
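
The rewritten `if:` on the release job groups the branch checks with parentheses instead of spelling out four alternatives. The following is a minimal sketch, assuming the usual precedence where `&&` binds tighter than `||`, that brute-forces both forms over a small set of event values to check that they agree. The function names and the sample values (`develop`, `opened`) are illustrative and are not part of the workflow.

```python
from itertools import product

# "develop" is a made-up non-release branch used only for this check.
BRANCHES = ("main", "pre/beta", "develop")


def old_condition(event, ref, action, merged, base):
    """The four alternatives spelled out one by one (previous form)."""
    return (
        (event == "push" and ref == "refs/heads/main")
        or (event == "push" and ref == "refs/heads/pre/beta")
        or (event == "pull_request" and action == "closed" and merged and base == "main")
        or (event == "pull_request" and action == "closed" and merged and base == "pre/beta")
    )


def new_condition(event, ref, action, merged, base):
    """The grouped form: branch checks factored into parentheses."""
    return (
        event == "push" and (ref == "refs/heads/main" or ref == "refs/heads/pre/beta")
        or (
            event == "pull_request" and action == "closed" and merged
            and (base == "main" or base == "pre/beta")
        )
    )


def conditions_agree():
    """Return True if both forms give the same answer for every sampled case."""
    cases = product(
        ("push", "pull_request"),
        [f"refs/heads/{branch}" for branch in BRANCHES],
        ("closed", "opened"),
        (True, False),
        BRANCHES,
    )
    return all(bool(old_condition(*case)) == bool(new_condition(*case)) for case in cases)


if __name__ == "__main__":
    print("forms agree:", conditions_agree())
```

Python's `and`/`or` follow the same relative precedence as the assumption above, which is why the two functions can mirror the expressions almost token for token.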
130 changes: 129 additions & 1 deletion CHANGELOG.md
@@ -1,17 +1,144 @@
## [1.33.11](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.33.10...v1.33.11) (2025-01-02)
## [1.34.0-beta.14](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.13...v1.34.0-beta.14) (2025-01-03)


### Bug Fixes

* add model tokens ([9b16cb9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9b16cb987fd93132d814ebd933af1565eb166331))
* revert ([b312251](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b312251cc56ee4c82554ecf116b5e6edd1560726))
* revert ([bb5de58](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bb5de581c064a1d141f849081e52987500957d1c))
* validate URL only if the input type is a URL ([e2caee6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/e2caee695ecce2d13aa5a82306097b1a80ba0e18))


### Docs

* added api reference 🔗 ([67038e1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/67038e195224e1a721fe123ad1d5604b3592df20))
* added official cookbook reference ([98aa74f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/98aa74ff2d35041884130be14efdf47ca5e716df))
* fixed missing import ([96064f2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/96064f20ee8a849a2548f293419cf9028386c47b))
* updated documentation reference ([fe89ae2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fe89ae29e6dc5f4322c25c693e2c9f6ce958d6e2))


### CI

* **release:** 1.33.10 [skip ci] ([a44b74a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a44b74aa6f7be7cdb4bdbebebc3b51a6d54a51e6))
* **release:** 1.33.11 [skip ci] ([30f48b3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/30f48b394f6eb8c7c9a1fa113bffabd2ac1ac585))
* **release:** 1.33.9 [skip ci] ([9b6d6c0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9b6d6c0efb2fd1af5bf87cf61a0ba3d79876d21d))

## [1.34.0-beta.13](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.12...v1.34.0-beta.13) (2025-01-03)



### Bug Fixes

* bump hatchling version to 1.26.3 ([159ed32](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/159ed329d2e8fa86015df1e59a7e2ebb439c6ec0))

## [1.34.0-beta.12](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.11...v1.34.0-beta.12) (2025-01-02)


### Docs

### Bug Fixes

* removed license for license-files ([b5acfb4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b5acfb414321989c45f76fad82f0d720ec889274))

## [1.34.0-beta.11](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.10...v1.34.0-beta.11) (2025-01-02)


### Bug Fixes

* added license-files = [ ([9150e4c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9150e4c95fa468afe9ddda3f1278b5037a2d0f38))

## [1.34.0-beta.10](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.9...v1.34.0-beta.10) (2025-01-02)


### Bug Fixes

* upgrade twine ([020e211](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/020e21123889c6483459e9db1c3c796cbc116140))

## [1.34.0-beta.9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.8...v1.34.0-beta.9) (2025-01-02)


### Bug Fixes

* update pkginfo ([9203ab9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9203ab9a4ab4400105fd34433684f9ac2453f35c))

## [1.34.0-beta.8](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.7...v1.34.0-beta.8) (2025-01-02)


### Bug Fixes

* added twine ([df07da9](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/df07da9bcc59cbccf1c45d69e3a3e904eaed565b))
* twine ([eb36a2b](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/eb36a2b630d62363f3c57e243f2b90cf530c0a3b))
* uv virtual env ([fce9886](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/fce988687b3dc6fc36ce9244a8c2744f4a25d561))
* version ([95b8990](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/95b8990a3649646972e12d78b11c7e1b7e707bf6))
* workflow ([abe2945](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/abe29457f2380932d070bfd607c8ab5f749627c3))

## [1.34.0-beta.7](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.6...v1.34.0-beta.7) (2025-01-02)


### Bug Fixes

* revert to d1b2104 ([a0c0a7f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a0c0a7ff5c5dc9a107e7be8d5b5e1854886d411c))

## [1.34.0-beta.6](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.5...v1.34.0-beta.6) (2025-01-02)


### Bug Fixes

* release workflow ([a00f128](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a00f128992e9fef88c870295c46b983b4286a3eb))

## [1.34.0-beta.5](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.4...v1.34.0-beta.5) (2025-01-02)


### Bug Fixes

* release workflow ([cb6d140](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/cb6d140042685bd419444d75ae7cab706cbcee38))
* uv build ([1be6ffe](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/1be6ffe309124d55b8b3b66ded448f06dfd87b7e))
* uv install workflow ([bcac20a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bcac20a7a8e65e2aa5760fb14e17b8054b4f4cf4))

## [1.34.0-beta.4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.3...v1.34.0-beta.4) (2024-12-18)


### Bug Fixes

* build config ([b186a4f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/b186a4f1c73fe29fa706158cc3c61812d6b16343))
* build config ([46f5985](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/46f598546109067267d01ae7d8ea7609526ea4d4))
* build config ([d2fc53f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d2fc53fc8414475c9bee7590144fe4251d56faf4))
* last desperate attempt to restore automatic builds ([2538fe3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/2538fe3db339014ef54e2c78269bce9259e284ea))
* release config ([9cd0d31](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/9cd0d31882c22f347ebd9c58d8dd66b47d178c64))
* release config ([62ee294](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/62ee294a864993a9414644c1547bafb96a43df20))
* release config ([89863ee](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/89863ee166e09ee18287bfcc1b5475d894c9e8c6))
* release config ([38e477c](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/38e477c540a3a50fc7ff6120da255d51798bfadd))

## [1.34.0-beta.3](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.2...v1.34.0-beta.3) (2024-12-18)


### Bug Fixes

* pyproject ([35a4907](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/35a490747cf6b8dad747a4af7f02d6f5aeb0d338))

## [1.34.0-beta.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.34.0-beta.1...v1.34.0-beta.2) (2024-12-17)


### Bug Fixes

* context window ([ffdadae](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ffdadaed6fe3f17da535e6eddb73851fce2f4bf2))
* formatting ([d1b2104](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/d1b2104f28d84c5129edb29a5efdaf5bf7d22bfb))
* pyproject ([76ac0a2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/76ac0a2141d9d53af023a405e2c61849921e4f0e))
* pyproject ([3dcfcd4](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/3dcfcd492e71297031a7df1dba9dd135f1fae60e))
* pyproject ([bf6cb0a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bf6cb0a582004617724e11ed04ba617eb39abc0c))
* uv.lock ([0a7fc39](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/0a7fc392dea2b62122b977d62f4d85b117fc8351))


### CI

* **release:** 1.33.3 [skip ci] ([488093a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/488093a63fcc1dc01eabdab301d752416a025139))
* **release:** 1.33.4 [skip ci] ([a789179](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/a78917997060edbd61df5279546587e4ef123ea1))
* **release:** 1.33.5 [skip ci] ([7a6164f](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/7a6164f1dc6dbb8ff0b4f7fc653f3910445f0754))
* **release:** 1.33.6 [skip ci] ([ca96c3d](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ca96c3d4309bd2b92c87a2b0095578dda302ad92))
* **release:** 1.33.7 [skip ci] ([7a5764e](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/7a5764e3fdbfea12b04ea0686a28025a9d89cb2f))
* **release:** 1.33.8 [skip ci] ([bdd6a39](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/bdd6a392e2c18de8c3e4e47e2f91a4a366365ff2))


## [1.33.2](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.33.1...v1.33.2) (2024-12-06)


@@ -29,6 +156,7 @@
## [1.33.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.32.0...v1.33.0) (2024-12-05)



### Features

* add api integration ([8aa9103](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/8aa9103f02af92d9e1a780450daa7bb303afc150))
119 changes: 119 additions & 0 deletions examples/extras/chromium_selenium.py
@@ -0,0 +1,119 @@
import asyncio
import os
import json
from dotenv import load_dotenv
from scrapegraphai.docloaders.chromium import ChromiumLoader # Import your ChromiumLoader class
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from aiohttp import ClientError

# Load environment variables for API keys
load_dotenv()

# ************************************************
# Define function to analyze content with ScrapegraphAI
# ************************************************
async def analyze_content_with_scrapegraph(content: str):
    """
    Analyze scraped content using ScrapegraphAI.
    Args:
        content (str): The scraped HTML or text content.
    Returns:
        dict: The result from ScrapegraphAI analysis.
    """
    try:
        # Initialize ScrapegraphAI SmartScraperGraph
        smart_scraper = SmartScraperGraph(
            prompt="Summarize the main content of this webpage and extract any contact information.",
            source=content,  # Pass the content directly
            config={
                "llm": {
                    "api_key": os.getenv("OPENAI_API_KEY"),
                    "model": "openai/gpt-4o",
                },
                "verbose": True
            }
        )
        result = smart_scraper.run()
        return result
    except Exception as e:
        print(f"❌ ScrapegraphAI analysis failed: {e}")
        return {"error": str(e)}

# ************************************************
# Test scraper and ScrapegraphAI pipeline
# ************************************************
async def test_scraper_with_analysis(scraper: ChromiumLoader, urls: list):
    """
    Test scraper for the given backend and URLs, then analyze content with ScrapegraphAI.
    Args:
        scraper (ChromiumLoader): The ChromiumLoader instance.
        urls (list): A list of URLs to scrape.
    """
    for url in urls:
        try:
            print(f"\n🔎 Scraping: {url} using {scraper.backend}...")
            result = await scraper.scrape(url)

            if "Error" in result or not result.strip():
                print(f"❌ Failed to scrape {url}: {result}")
            else:
                print(f"✅ Successfully scraped {url}. Content (first 200 chars): {result[:200]}")

                # Pass scraped content to ScrapegraphAI for analysis
                print("🤖 Analyzing content with ScrapegraphAI...")
                analysis_result = await analyze_content_with_scrapegraph(result)
                print("📝 Analysis Result:")
                print(json.dumps(analysis_result, indent=4))

        except ClientError as ce:
            print(f"❌ Network error while scraping {url}: {ce}")
        except Exception as e:
            print(f"❌ Unexpected error while scraping {url}: {e}")

# ************************************************
# Main Execution
# ************************************************
async def main():
    urls_to_scrape = [
        "https://example.com",
        "https://www.python.org",
        "https://invalid-url.test"
    ]

    # Test with Playwright backend
    print("\n--- Testing Playwright Backend ---")
    try:
        scraper_playwright_chromium = ChromiumLoader(urls=urls_to_scrape, backend="playwright", headless=True, browser_name = "chromium")
        await test_scraper_with_analysis(scraper_playwright_chromium, urls_to_scrape)

        scraper_playwright_firefox = ChromiumLoader(urls=urls_to_scrape, backend="playwright", headless=True, browser_name = "firefox")
        await test_scraper_with_analysis(scraper_playwright_firefox, urls_to_scrape)
    except ImportError as ie:
        print(f"❌ Playwright ImportError: {ie}")
    except Exception as e:
        print(f"❌ Error initializing Playwright ChromiumLoader: {e}")

    # Test with Selenium backend
    print("\n--- Testing Selenium Backend ---")
    try:
        scraper_selenium_chromium = ChromiumLoader(urls=urls_to_scrape, backend="selenium", headless=True, browser_name = "chromium")
        await test_scraper_with_analysis(scraper_selenium_chromium, urls_to_scrape)

        scraper_selenium_firefox = ChromiumLoader(urls=urls_to_scrape, backend="selenium", headless=True, browser_name = "firefox")
        await test_scraper_with_analysis(scraper_selenium_firefox, urls_to_scrape)
    except ImportError as ie:
        print(f"❌ Selenium ImportError: {ie}")
    except Exception as e:
        print(f"❌ Error initializing Selenium ChromiumLoader: {e}")

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("❌ Program interrupted by user.")
    except Exception as e:
        print(f"❌ Program crashed: {e}")
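
The per-URL loop in `test_scraper_with_analysis` can be exercised without a browser or an API key. The following is a minimal sketch, assuming only that the loader exposes an async `scrape(url)` method returning a string, as the example file uses it. `FakeLoader`, `scrape_all`, `demo`, and the canned pages are hypothetical stand-ins and are not part of scrapegraphai.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in for ChromiumLoader that serves canned pages."""

    backend = "fake"

    def __init__(self, pages):
        self.pages = pages

    async def scrape(self, url):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        if url not in self.pages:
            raise ConnectionError(f"cannot reach {url}")
        return self.pages[url]


async def scrape_all(scraper, urls):
    """Scrape each URL in turn and map URL -> content, or None on failure."""
    results = {}
    for url in urls:
        try:
            content = await scraper.scrape(url)
        except Exception as exc:  # one bad URL must not stop the loop
            print(f"Failed to scrape {url}: {exc}")
            results[url] = None
            continue
        # Treat empty or whitespace-only content as a failure as well.
        results[url] = content if content.strip() else None
    return results


def demo():
    """Run the loop against two canned pages and one unreachable URL."""
    loader = FakeLoader({
        "https://example.com": "<h1>Example Domain</h1>",
        "https://blank.test": "   ",
    })
    urls = ["https://example.com", "https://blank.test", "https://invalid-url.test"]
    return asyncio.run(scrape_all(loader, urls))


if __name__ == "__main__":
    print(demo())
```

Returning `None` for failures, rather than an error string, lets the caller tell failed pages apart from pages whose text happens to contain the word "Error".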
12 changes: 3 additions & 9 deletions pyproject.toml
@@ -1,12 +1,6 @@
[project]
name = "scrapegraphai"



version = "1.33.11"



version = "1.34.0b14"

description = "A web scraping library based on LangChain which uses LLM and direct graph logic to create scraping pipelines."
authors = [
@@ -48,7 +42,6 @@ dependencies = [
"scrapegraph-py>=1.7.0"
]

license = "MIT"
readme = "README.md"
homepage = "https://scrapegraphai.com/"
repository = "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
@@ -115,7 +108,8 @@ screenshot_scraper = [
]

[build-system]
requires = ["hatchling"]
requires = ["hatchling==1.26.3"]

build-backend = "hatchling.build"

[dependency-groups]
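
The changelog tags this build as `1.34.0-beta.14`, while `pyproject.toml` records `1.34.0b14`. Both name the same pre-release: the first is the semantic-versioning spelling, the second is the PEP 440 normalized form used by Python packaging tools. The following is a rough sketch of that conversion for `alpha`/`beta`/`rc` pre-releases only. The function `semver_to_pep440` is hypothetical and covers far fewer cases than a real version parser.

```python
import re

# Short pre-release markers used by the PEP 440 normalized form.
_PRE_RELEASE = {"alpha": "a", "beta": "b", "rc": "rc"}

# MAJOR.MINOR.PATCH, optionally followed by -alpha.N, -beta.N or -rc.N.
_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$")


def semver_to_pep440(version):
    """Convert a simple semver string to its PEP 440 normalized spelling."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release, label, number = match.groups()
    if label is None:
        return release  # final releases are spelled the same in both schemes
    return f"{release}{_PRE_RELEASE[label]}{number}"


if __name__ == "__main__":
    print(semver_to_pep440("1.34.0-beta.14"))
```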
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -1,4 +1,5 @@
pytest==8.0.0
pytest-asyncio==0.25.0
pytest-mock==3.14.0
burr[start]==0.22.1
sphinx==6.0