when encountering error during fetch return "" in web_base.py #8753

Merged: 8 commits, Aug 7, 2023


oegedijk (Contributor) commented Aug 4, 2023

When downloading a sitemap that contains a malformed URL (e.g. "ttp://example.com/index.html", with the leading h omitted), this change ensures that the sitemap download does not crash and only emits a warning. (This should perhaps be optional, e.g. via a skip_faulty_urls: bool = True parameter, but this was the most straightforward fix.)

@rlancemartin, @eyurtsev
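
For illustration, the behaviour described above (warn and return "" instead of crashing) might look roughly like the sketch below. It is not the code from this PR: `fetch_or_empty` is a hypothetical helper, and the standard library's `urllib` stands in for whatever HTTP client the loader actually uses.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def fetch_or_empty(url: str, timeout: float = 10.0) -> str:
    """Return the page body, or "" if the fetch fails (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, ValueError) as exc:
        # urllib raises URLError for an unknown scheme such as "ttp" and
        # ValueError for a string with no scheme at all; neither case
        # needs a network round trip to fail.
        logger.warning("Error fetching %s, skipping: %s", url, exc)
        return ""


# The malformed URL from the description yields "" plus a logged warning.
result = fetch_or_empty("ttp://example.com/index.html")
```

Returning an empty string keeps the list of results aligned with the list of URLs, at the cost of the caller having to tolerate empty pages.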

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Aug 4, 2023
baskaryan (Collaborator) commented

Yeah, I think it would be nice to make this optional with a silent_errors or continue_on_failure flag.

oegedijk (Contributor, Author) commented Aug 5, 2023

Went for the continue_on_failure flag and defaulted it to False to maintain the current behaviour.

When an error is encountered, the log message now suggests setting the flag to True.

Added the flag to the Sitemap, Gitbook and Blackboard loaders as well: they all derive from WebBaseLoader but override __init__.
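
A minimal sketch of the pattern described in this comment, under stated assumptions: `BaseLoaderSketch`, `SitemapLoaderSketch`, `scrape_all` and `fake_fetch` are hypothetical stand-ins rather than the real LangChain classes, and the exact log wording is invented. It shows the flag defaulting to False (errors still propagate) and why a subclass that overrides `__init__` has to accept and forward the flag itself.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


class BaseLoaderSketch:
    """Hypothetical stand-in for a web loader; not the real WebBaseLoader."""

    def __init__(
        self,
        urls: List[str],
        fetch: Callable[[str], str],
        continue_on_failure: bool = False,  # False keeps the old behaviour
    ) -> None:
        self.urls = urls
        self.fetch = fetch
        self.continue_on_failure = continue_on_failure

    def scrape_all(self) -> List[str]:
        results = []
        for url in self.urls:
            try:
                results.append(self.fetch(url))
            except Exception as exc:
                if not self.continue_on_failure:
                    # Point the user at the flag before re-raising.
                    logger.error(
                        "Error fetching %s: %s. To skip failing URLs, "
                        "set continue_on_failure=True.", url, exc
                    )
                    raise
                logger.warning("Error fetching %s, skipping: %s", url, exc)
                results.append("")
        return results


class SitemapLoaderSketch(BaseLoaderSketch):
    """A subclass with its own __init__ must pass the flag through."""

    def __init__(
        self,
        urls: List[str],
        fetch: Callable[[str], str],
        continue_on_failure: bool = False,
    ) -> None:
        super().__init__(urls, fetch, continue_on_failure=continue_on_failure)


def fake_fetch(url: str) -> str:
    """Offline fetcher for the example: rejects non-HTTP(S) schemes."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported URL scheme: {url}")
    return f"<html>{url}</html>"
```

With `continue_on_failure=True`, a list containing "ttp://example.com/index.html" yields an empty string for that entry. With the default, the `ValueError` propagates as before.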

hwchase17 (Contributor) left a comment

nice thanks!

@hwchase17 hwchase17 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Aug 6, 2023
oegedijk (Contributor, Author) commented Aug 6, 2023

Formatted the changed files with black.

@baskaryan baskaryan merged commit cff5263 into langchain-ai:master Aug 7, 2023