Enhance WebContentLoader to Support Recursive Link Parsing and Custom Headers #190

**Is your feature request related to a problem? Please describe.**
The WebContentLoader cannot recursively parse content from the internal links of a given URL; it only loads the page it is pointed at. It also provides no way to customize request headers, so requests sent with the default headers of Python HTTP clients can be blocked by services like Cloudflare or by web application firewalls.

**Describe the solution you'd like**
I would like the WebContentLoader to have:
1. A recursive parsing feature that, when enabled via a parameter, navigates all internal links from the main URL and parses the content of all these pages.
2. The ability to override default request headers, including user-agent and authentication headers, through optional parameters.
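
To make the two requested behaviours concrete, here is a minimal sketch using only the Python standard library. It is not the WebContentLoader implementation, whose API is not shown in this issue: the names `load_pages`, `internal_links`, `fetch`, `DEFAULT_HEADERS`, and the parameters `recursive`, `headers`, `max_pages`, and `fetcher` are all hypothetical and only illustrate one possible shape for the feature.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

# Hypothetical defaults; callers can override any of them via `headers`.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; loader-sketch)"}


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url: str, headers: Dict[str, str]) -> str:
    """Fetch one page, sending the given request headers."""
    with urlopen(Request(url, headers=headers), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def internal_links(base_url: str, html: str) -> List[str]:
    """Return absolute, fragment-free http(s) links on the same host as base_url."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found: List[str] = []
    for href in parser.links:
        absolute = urldefrag(urljoin(base_url, href)).url
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


def load_pages(
    url: str,
    recursive: bool = False,
    headers: Optional[Dict[str, str]] = None,
    max_pages: int = 50,
    fetcher: Callable[[str, Dict[str, str]], str] = fetch,
) -> Dict[str, str]:
    """Load `url` and, if `recursive`, every internal page reachable from it.

    Returns a mapping of URL to raw HTML, in breadth-first visiting order.
    `max_pages` caps the crawl so a large site cannot run unbounded.
    """
    merged = {**DEFAULT_HEADERS, **(headers or {})}  # caller values win
    pages: Dict[str, str] = {}
    queue = deque([urldefrag(url).url])
    while queue and len(pages) < max_pages:
        current = queue.popleft()
        if current in pages:
            continue  # already visited
        pages[current] = fetcher(current, merged)
        if recursive:
            for link in internal_links(current, pages[current]):
                if link not in pages:
                    queue.append(link)
    return pages
```

A call such as `load_pages("https://example.com/", recursive=True, headers={"User-Agent": "my-agent", "Authorization": "Bearer <token>"})` would cover both points. The sketch leaves out error handling, robots.txt checks, and rate limiting, which a real implementation would likely need.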

**Describe alternatives you've considered**
An alternative would be to create separate utilities for recursive link parsing and custom headers. Integrating both features directly into WebContentLoader would be simpler for users, since crawling and header configuration would sit behind a single interface.

**Additional context**
This enhancement would let the WebContentLoader handle multi-page sites and reduce the chance of requests being blocked by services that reject default client headers.

