
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

z0m31en7/Uscrapper


Uscrapper 2.0




Introducing Uscrapper 2.0, a powerful OSINT web scraper that lets users extract various personal information from a website. It leverages web scraping techniques and regular expressions to extract email addresses, social media links, author names, geolocations, phone numbers, and usernames from both hyperlinked and non-hyperlinked sources on the webpage, and it supports multithreading to make this process faster. Uscrapper 2.0 is equipped with advanced anti-web-scraping bypass modules and supports web crawling to scrape from various sublinks within the same domain. The tool also provides an option to generate a report containing the extracted details.








💡 Extracted Details:


Uscrapper extracts the following details from the provided website:

  • Email Addresses: Displays email addresses found on the website.
  • Social Media Links: Displays links to various social media platforms found on the website.
  • Author Names: Displays the names of authors associated with the website.
  • Geolocations: Displays geolocation information associated with the website.
  • Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames.
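
As a rough illustration of how regular expressions can pull such details out of page text, here is a minimal sketch. The patterns and the `extract_details` helper are assumptions made for this example; they are not Uscrapper's actual expressions and will miss or over-match some real-world formats.

```python
import re

# Illustrative patterns only (assumptions, not Uscrapper's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:twitter|facebook|instagram|linkedin|github)\.com/[^\s\"'<>]+"
)


def extract_details(text: str) -> dict:
    """Return de-duplicated matches per category, in order of first appearance."""

    def unique(matches):
        # dict.fromkeys keeps the first occurrence of each match and drops repeats.
        return list(dict.fromkeys(matches))

    return {
        "emails": unique(EMAIL_RE.findall(text)),
        "phones": unique(PHONE_RE.findall(text)),
        "social_links": unique(SOCIAL_RE.findall(text)),
    }


if __name__ == "__main__":
    sample = (
        "Contact: admin@example.com or call +1 (555) 010-9999. "
        '<a href="https://twitter.com/example">Twitter</a>'
    )
    print(extract_details(sample))
```

Because the patterns run over raw text, they pick up both hyperlinked values (inside `href` attributes) and plain, non-hyperlinked ones.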


📽 Preview:


project-ss


project-ss2


🤩 What's New?:


Uscrapper 2.0:

  • Introduced multiple modules to bypass anti-web-scraping techniques.
  • Introduced Crawl and Scrape: an advanced module that crawls a website and scrapes its sublinks within the same domain.
  • Implemented multithreading to make these processes faster.
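
The crawl-and-scrape idea combined with multithreading can be sketched roughly as follows. This is a simplified illustration using only the Python standard library, not Uscrapper's implementation: the names `same_domain_links`, `fetch`, and `crawl_and_scrape` are hypothetical, and the default fetcher does nothing to bypass anti-scraping measures.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(base_url, html, limit):
    """Return up to `limit` unique absolute links on the same host as base_url."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        if len(links) >= limit:
            break
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        if (
            urlparse(absolute).netloc == host
            and absolute != base_url
            and absolute not in links
        ):
            links.append(absolute)
    return links


def fetch(url):
    """Default fetcher: a plain HTTP GET with no anti-bot handling."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_and_scrape(start_url, max_links, threads, scrape, fetch_page=fetch):
    """Scrape start_url plus up to max_links same-domain sublinks in parallel."""
    start_html = fetch_page(start_url)
    urls = [start_url] + same_domain_links(start_url, start_html, max_links)

    def work(url):
        # Reuse the already-downloaded start page instead of fetching it twice.
        html = start_html if url == start_url else fetch_page(url)
        return url, scrape(html)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(work, urls))
```

Here `scrape` is any function that takes page HTML and returns extracted details, and `fetch_page` can be swapped for a browser-driven fetcher when a plain request is blocked.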

🛠️ Installation Steps:


git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/ 
chmod +x ./install.sh && ./install.sh      #For Unix/Linux systems


🔮 Usage:

To run Uscrapper, use the following command-line syntax:

python Uscrapper.py [-h] [-u URL] [-c INT] [-t INT] [-O] [-ns]


Arguments:

  • -h, --help: Show the help message and exit.
  • -u URL, --url URL: Specify the URL of the website to extract details from.
  • -c INT, --crawl INT: Specify the number of links to crawl.
  • -t INT, --threads INT: Specify the number of threads to use while crawling and scraping.
  • -O, --generate-report: Generate a report file containing the extracted details.
  • -ns, --nonstrict: Display non-strict usernames during extraction.
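
For readers who want to see how the flags listed above fit together, here is a minimal `argparse` sketch that mirrors them. It is a reconstruction from the argument list, not the tool's actual source; the `build_parser` name and the absence of default values are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are not documented, so none are set)."""
    parser = argparse.ArgumentParser(prog="Uscrapper.py")
    parser.add_argument("-u", "--url", metavar="URL",
                        help="URL of the website to extract details from")
    parser.add_argument("-c", "--crawl", type=int, metavar="INT",
                        help="number of links to crawl")
    parser.add_argument("-t", "--threads", type=int, metavar="INT",
                        help="number of threads to use while crawling and scraping")
    parser.add_argument("-O", "--generate-report", action="store_true",
                        help="generate a report file containing the extracted details")
    parser.add_argument("-ns", "--nonstrict", action="store_true",
                        help="display non-strict usernames during extraction")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With the documented flags, an invocation such as `python Uscrapper.py -u https://example.com -c 10 -t 5 -O` would crawl up to 10 links with 5 threads and write a report; the URL here is a placeholder.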


📜 Note:

  • Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.

  • The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.

  • To bypass some anti-web-scraping methods, we use Selenium, which can make the overall process slower.


💌 Contribution:


Want a new feature to be added?

  • Make a pull request with all the necessary details and it will be merged after a review.
  • You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.

🛡️ License:


This project is licensed under the MIT License.
