use pyproject.toml package
yaroslaff committed Jun 27, 2024
1 parent 6b4f734 commit 1d890ee
Showing 10 changed files with 69 additions and 36 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ __pycache__/
*.pyc
.local/
poetry.lock
.venv
10 changes: 6 additions & 4 deletions README.md
@@ -51,8 +51,9 @@ INTERESTING https://telegra.ph/sasha-grey-XXXXX
~~~

## Refine search/filtering
Nudecrawler uses [evalidate](https://github.com/yaroslaff/evalidate) to filter results with a Python expression (`--expr`). With `-h`, help will list all available variables, like: `total_images nude_images nonnude_images new_nude_images new_nonnude_images new_total_images total_video`.
Default value: `(total_images>5 and new_nude_images>0) or total_video>0`.
Nudecrawler uses [evalidate](https://github.com/yaroslaff/evalidate) to filter results with a Python expression (`--expr`). With `-h`, help will list all available variables: `total_images`, `nude_images`, `nonnude_images`, `new_nude_images`, `new_nonnude_images`, `new_total_images`, `total_video`. The `new_` variables count only new images (not found in cache). E.g. `--expr 'total_images>20 and new_nude_images>5'` will print only pages with more than 20 images and more than 5 new nude images (not found in cache). This is a good way to skip pages with duplicated content.

Default value: `nude_images > 0`.

Use `-a`/`--all` to get some results ASAP (but later you may want to apply some filtering).

@@ -194,7 +195,6 @@ You can save almost all pages and then filter it with jq (get only interesting r
jq 'select(.nude_images>1 and .total_images>1) | {"url": .url, "total": .total_images}' < /tmp/n.json
~~~


### Working with different nudity detectors

NudeCrawler can work with different nudity detectors and is very easy to extend. Option `-a`/`--all` disables detection entirely, so every page is reported.
@@ -207,6 +207,8 @@ if you will use `/bin/true` as script, it will detect all images as nude, and `/

Scripts are usually installed to /usr/local/bin; if that directory is in $PATH, you do not need to specify the full path to a script, because nudecrawler will find it in $PATH.



#### detector: nsfw_api (recommended)

To use [nsfw_api](https://github.com/arnidan/nsfw-api):
@@ -243,7 +245,7 @@ Using NudeNet does not requires docker, but you need to install `pip3 install -U

The right workaround is simple: after installing NudeNet, download the model *manually* (no wget!) and place it in `~/.NudeNet/`

Or you can download it from my temporary site: `wget https://nudecrawler.netlify.app/classifier_model.onnx` (but I cannot promise it will be there forever) and put it into ~/.NudeNet.
Or you can download it from my temporary site: `wget -O ~/.NudeNet/classifier_model.onnx https://nudecrawler.netlify.app/classifier_model.onnx` (but I cannot promise it will be there forever), which saves it directly into ~/.NudeNet.


##### Using NudeNet with NudeCrawler
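For readers who prefer Python over jq, the `jq` filter in the README hunk above (keep pages with `nude_images>1` and `total_images>1`, output `url` and `total`) can be approximated as follows. This is a sketch, not part of nudecrawler: it assumes the saved file holds one JSON object per line, and the function name `interesting` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def interesting(lines: Iterable[str]) -> Iterator[dict]:
    """Approximate jq 'select(.nude_images>1 and .total_images>1)
    | {"url": .url, "total": .total_images}'.

    Hypothetical helper; assumes one JSON object per input line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # jq treats a missing field as null and null > 1 is false; "or 0" mimics that.
        nude = record.get("nude_images") or 0
        total = record.get("total_images") or 0
        if nude > 1 and total > 1:
            yield {"url": record.get("url"), "total": record.get("total_images")}
```

Usage would be along the lines of `with open("/tmp/n.json") as f:` followed by `for row in interesting(f): print(json.dumps(row))`.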
8 changes: 5 additions & 3 deletions nudecrawler/page.py
@@ -8,7 +8,7 @@
from .exceptions import *
from .cache import cache

from evalidate import evalidate, EvalException
from evalidate import Expr, EvalException

import requests
import hashlib
@@ -93,8 +93,9 @@ def __init__(self, url: str, all_found=False, detect_image=None, min_total_image
self.content_length = None

# can throw evalidate.EvalException here
node = evalidate(expr)
self._code = compile(node, '<user filter>', 'eval')
# node = Expr(expr).code
# self._code = compile(node, '<user filter>', 'eval')
self._code = Expr(expr).code

printv("Processing:", self.url)

@@ -297,6 +298,7 @@ def is_nude(self, url):
self.do_detect_image(url)
except Exception as e:
printv("Broken image:", url)
print(e)
return


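With this change page.py takes the compiled filter straight from evalidate's `Expr(expr).code`, and the README hunk above lists the variables an `--expr` filter may use. To make the idea concrete without depending on evalidate, here is a stdlib-only sketch that whitelists AST node types and variable names before compiling and evaluating the expression. It only illustrates the same general approach; it is not evalidate's implementation, and `compile_filter`, `page_matches` and the allowed-node list are hypothetical.

```python
import ast

# Syntax this sketch permits: boolean logic, comparisons, simple arithmetic,
# variable names and literals. Calls, attributes, subscripts etc. are rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Name, ast.Load, ast.Constant,
)

# Variable names taken from the README text.
VARIABLES = {
    "total_images", "nude_images", "nonnude_images", "new_nude_images",
    "new_nonnude_images", "new_total_images", "total_video",
}


def compile_filter(expr: str):
    """Validate a user filter expression and compile it for eval()."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"forbidden syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in VARIABLES:
            raise ValueError(f"unknown variable: {node.id}")
    return compile(tree, "<user filter>", "eval")


def page_matches(code, stats: dict) -> bool:
    """Evaluate a compiled filter against one page's counters, without builtins."""
    return bool(eval(code, {"__builtins__": {}}, stats))
```

Rejecting anything outside a small whitelist, rather than trying to blacklist dangerous constructs, keeps expressions like `__import__('os')` from ever reaching `eval`.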
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
11 changes: 6 additions & 5 deletions bin/nudecrawler → nudecrawler/scripts/nudecrawler.py
@@ -97,9 +97,9 @@
"false": ("builtin", ":false"),
"nudepy": ("builtin", ":nude"),
"nudenetb": ("builtin", ":nudenet"),
"aid": ("image", "detect-image-aid.py"),
"nsfwapi": ("image", "detect-image-nsfw-api.py"),
"nudenet": ("image", "detect-image-nudenet.py")
"aid": ("image", "detect-image-aid"),
"nsfwapi": ("image", "detect-image-nsfw-api"),
"nudenet": ("image", "detect-image-nudenet")
}

def get_args(argv=None):
@@ -114,7 +114,8 @@ def get_args(argv=None):

parser = argparse.ArgumentParser(description=f'Nudecrawler: Telegra.ph Spider {version}\nhttps://github.com/yaroslaff/nudecrawler', formatter_class=argparse.RawTextHelpFormatter)

def_expr = '(total_images>5 and new_nude_images>0) or total_video>0'
# def_expr = '(total_images>5 and new_nude_images>0) or total_video>0'
def_expr = 'nude_images > 0'
def_workdir = os.getenv('NUDE_DIR', '.')

def_total = int(os.getenv('NUDE_TOTAL', '1'))
@@ -425,7 +426,7 @@ def main():

# fix arguments
if not any([detect_image, detect_url, all_found]):
print("# No filter, using built-in :nude by default")
print("# No nudity detector (--detect, --detect-url, --detect-image) given, using built-in --detect-image :nude by default")
detect_image=':nude'

nudecrawler.verbose.verbose = verbose
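In the nudecrawler.py hunks above, image detectors are now referred to by bare command names (`detect-image-nudenet` instead of `detect-image-nudenet.py`), matching the `[project.scripts]` entries in pyproject.toml; the README notes that nudecrawler finds such scripts in $PATH. A minimal sketch of that lookup with `shutil.which` follows. The table name `DETECTORS`, the function `resolve_detector` and the error handling are assumptions for illustration: the diff shows only part of the real table and not how it is consumed.

```python
import shutil
from typing import Dict, Tuple

# Hypothetical name; entries copied from the visible part of the table in the diff.
DETECTORS: Dict[str, Tuple[str, str]] = {
    "false": ("builtin", ":false"),
    "nudepy": ("builtin", ":nude"),
    "nudenetb": ("builtin", ":nudenet"),
    "aid": ("image", "detect-image-aid"),
    "nsfwapi": ("image", "detect-image-nsfw-api"),
    "nudenet": ("image", "detect-image-nudenet"),
}


def resolve_detector(name: str,
                     table: Dict[str, Tuple[str, str]] = DETECTORS) -> Tuple[str, str]:
    """Return (kind, target), replacing an image-detector command with its full path."""
    kind, target = table[name]
    if kind == "builtin":
        return kind, target  # builtins such as ":nude" need no $PATH lookup
    path = shutil.which(target)  # search $PATH the way a shell would
    if path is None:
        raise FileNotFoundError(f"{target!r} not found in $PATH; is nudecrawler installed?")
    return kind, path
```

Dropping the `.py` suffix matters here: console scripts generated from `[project.scripts]` are installed under the exact command name, so a lookup for `detect-image-nudenet.py` would no longer find anything.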
2 changes: 1 addition & 1 deletion nudecrawler/version.py
@@ -1 +1 @@
version="0.3.19"
version="0.3.22"
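version.py becomes the single source of the package version: the pyproject.toml below marks `version` as dynamic and points `[tool.hatch.version]` at `nudecrawler/version.py`, replacing the hard-coded `version = "0.1.0"` of the old `[tool.poetry]` table. Hatchling extracts the value with a regular expression. The sketch below mimics that with a simplified, case-insensitive pattern of its own (hatch's actual default pattern may differ in details); `read_version` is a hypothetical helper.

```python
import re

# Simplified stand-in for hatch's regex version source: matches assignments such
# as version="0.3.22" or __version__ = '1.2' at the start of a line, ignoring case.
VERSION_RE = re.compile(
    r"""^(?:__version__|version)\s*=\s*(['"])v?(?P<version>[^'"]+)\1""",
    re.IGNORECASE | re.MULTILINE,
)


def read_version(source: str) -> str:
    """Return the version string assigned in a version.py-style source text."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group("version")
```

For example, `read_version(open("nudecrawler/version.py").read())` should give `0.3.22` after this commit.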
73 changes: 50 additions & 23 deletions pyproject.toml
@@ -1,29 +1,56 @@
[tool.poetry]
name = "nudecrawler"
version = "0.1.0"
description = "Crawl telegra.ph searching for nudes!"
authors = ["Yaroslav Polyakov <yaroslaff@gmail.com>"]
[build-system]
requires = ["hatchling", "wheel"]
build-backend = "hatchling.build"

[tool.hatch.version]
path = 'nudecrawler/version.py'

[project]
name="nudecrawler"
authors = [
{name = "Yaroslav Polyakov", email = "yaroslaff@gmail.com"},
]
description = "Crawl telegra.ph searching for nudes!"
readme = "README.md"
homepage = "https://github.com/yaroslaff/nudecrawler"
repository = "https://github.com/yaroslaff/nudecrawler"
license = "MIT"
license = {text = "MIT License"}
keywords = ["nsfw", "tits", "nudity-detection", "nsfw-recognition", "nudes", "onlyfans", "nude", "telegra-ph", "search", "crawler", "scraper", "spider", "scraping", "find", "web-scraping", "crawl", "scrape", "webscraping"]

dynamic = [ "version" ]
requires-python = ">= 3.9"
dependencies = [
'beautifulsoup4>=4.12.0',
'transliterate>=1.10.2',
'pillow>=9.4.0',
'requests>=2.28.2',
'nudepy>=0.5.2',
'numpy<2.0.0',
'evalidate>=2.0.3',
'pytest>=7.2.2',
'nudenet>=2.0.9,<3.0.0',
'python-daemon>=3.0.1',
'python-dotenv>=1.0.0',
'flask'
]
classifiers=[
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'Intended Audience :: End Users/Desktop',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 3.9',
]

[tool.poetry.dependencies]
python = "^3.9"
beautifulsoup4 = "^4.12.0"
transliterate = "^1.10.2"
pillow = "^9.4.0"
requests = "^2.28.2"
nudepy = "^0.5.1"
evalidate = "^1.0.2"
pytest = "^7.2.2"
nudenet = "^2.0.9"
python-daemon = "^3.0.1"
python-dotenv = "^1.0.0"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[project.urls]
Homepage = "https://github.com/yaroslaff/nudecrawler"
Repository = "https://github.com/yaroslaff/nudecrawler.git"
"Bugz Tracker" = "https://github.com/yaroslaff/nudecrawler/issues"
Changelog = "https://github.com/yaroslaff/nudecrawler/blob/master/CHANGELOG.md"

[project.scripts]
nudecrawler = "nudecrawler.scripts.nudecrawler:main"
detect-image-aid = "nudecrawler.scripts.detect_image_aid:main"
detect-image-nsfw-api = "nudecrawler.scripts.detect_image_nsfw_api:main"
detect-image-nudenet = "nudecrawler.scripts.detect_image_nudenet:main"
detect-server-nudenet = "nudecrawler.scripts.detect_server_nudenet:main"

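The new `[project.scripts]` table turns the scripts into installed commands (the main script itself moved from `bin/nudecrawler` to `nudecrawler/scripts/nudecrawler.py`). On installation, each entry becomes a small launcher that imports the module before the colon and calls the function after it. The sketch below shows roughly what such a launcher does, using the standard library's `importlib.metadata.EntryPoint` to load a `module:function` target. The helper names are hypothetical, and since nudecrawler may not be importable where this runs, the check loads a stdlib target (`json:dumps`) instead.

```python
from importlib.metadata import EntryPoint
from typing import Callable, Dict

# The [project.scripts] table from this commit: command name -> "module:function".
PROJECT_SCRIPTS: Dict[str, str] = {
    "nudecrawler": "nudecrawler.scripts.nudecrawler:main",
    "detect-image-aid": "nudecrawler.scripts.detect_image_aid:main",
    "detect-image-nsfw-api": "nudecrawler.scripts.detect_image_nsfw_api:main",
    "detect-image-nudenet": "nudecrawler.scripts.detect_image_nudenet:main",
    "detect-server-nudenet": "nudecrawler.scripts.detect_server_nudenet:main",
}


def launcher_body(target: str) -> str:
    """Approximate the Python body of a generated console-script launcher."""
    module, _, func = target.partition(":")
    return f"import sys\nfrom {module} import {func}\nsys.exit({func}())"


def load_target(name: str, target: str) -> Callable:
    """Import the module and return the callable, as the launcher would."""
    return EntryPoint(name=name, value=target, group="console_scripts").load()
```

With nudecrawler installed, `load_target("nudecrawler", PROJECT_SCRIPTS["nudecrawler"])` should return the `main` function of `nudecrawler.scripts.nudecrawler`.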