[net]: Proxy Request Redundancy #3491
base: master
Conversation
Anecdotally, using SearX over unreliable proxies, like tor, seems to be quite error prone. SearX puts quite an effort into measuring the performance and reliability of engines, most likely owing to those aspects being of significant concern. The patch here proposes to mitigate the related problems by issuing concurrent redundant requests through the specified proxies at once, returning the first response that is not an error. The functionality is enabled using the `proxy_request_redundancy` parameter within the outgoing network settings or the engine settings. Example:

```yaml
outgoing:
  request_timeout: 8.0
  proxies:
    "all://":
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
  proxy_request_redundancy: 4
```

In this example, each network request will be sent 4 times, once through every proxy. The first (non-error) response wins.

In my testing environment using several tor proxy end-points, this approach almost entirely removes engine errors related to timeouts and denied requests. The latency of the network system is also improved.

The implementation uses an `AsyncParallelTransport(httpx.AsyncBaseTransport)` wrapper to wrap multiple sub-transports, and `asyncio.wait` to wait on the first completed request. The existing implementation of the network proxy cycling has also been moved into the `AsyncParallelTransport` class, which should improve network client memoization and performance.

TESTED:
- unit tests for the new functions and classes.
- tested on desktop PC with 10+ upstream proxies and comparable request redundancy.
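The first-response-wins idea can be sketched independently of the patch. Below is a minimal, hypothetical helper (the name `first_success` and all numbers are illustrative, not taken from the PR) built on `asyncio.wait` with `FIRST_COMPLETED`:

```python
import asyncio

async def first_success(coros, timeout=8.0):
    """Hypothetical sketch: run all coroutines concurrently and return the
    result of the first one that finishes without raising."""
    pending = {asyncio.ensure_future(c) for c in coros}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            if not done:  # overall timeout expired with nothing finished
                raise TimeoutError("no redundant request answered in time")
            for task in done:
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("all redundant requests failed")
    finally:
        for task in pending:  # drop the slower duplicates
            task.cancel()

async def demo():
    async def slow_ok():
        await asyncio.sleep(0.05)
        return "ok"
    async def fast_fail():
        raise ConnectionError("denied by exit node")
    return await first_success([slow_ok(), fast_fail()])

print(asyncio.run(demo()))  # → ok (the failing attempt is ignored)
```

A failing request resolves its task early but is skipped, so the slower healthy response still wins; only when every task errors does the caller see a failure.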
That's a great way to get all the proxies banned at the same time by the engine, instead of having one banned and then using the other ones for the request. For me, I don't think searxng should be the tool for checking whether multiple proxies are working. That's the job of an external tool.
Those are good points and I agree. When using TOR, there is no way to verify whether the current exit node is banned by a specific engine. I have tried several strategies and tools to check if proxies are working and to choose good proxies at runtime.
to verify why this fails remotely
Fixed race condition with the 404 test.
I think what you are looking for is the parameter If there is an error, this parameter will retry with another proxy. You can specify the number of retries with the parameter
Thank you so much for this advice. Correct me if I am wrong: The The reason why the After adding the On the other hand, the solution presented here delivers the full result in less than 5 seconds. Tested it with:
```yaml
# ...
outgoing:
  request_timeout: 10.0
  proxies:
    "all://":
      # - socks5h://192.168.0.50:9050
      # - socks5h://192.168.0.51:9050
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
      - socks5h://tor4:9050
      - socks5h://tor5:9050
      - socks5h://tor6:9050
      - socks5h://tor7:9050
      - socks5h://tor8:9050
      - socks5h://tor9:9050
  retries: 2
  proxy_request_redundancy: 1 # Or higher for parallel execution
# ...
engines:
  # `retry_on_http_error` set on all the other engines too.
  - name: google
    engine: google
    shortcut: go
    retry_on_http_error: True
  # ...
```

Or to speak in pictures...
You are correct about the description of the parameters, but you need to extend "request_timeout" per engine section. In the outgoing section, "request_timeout" is the default timeout unless the engine overrides it. It's actually written here: Line 163 in ec41b53
And about "retry_on_http_error": you may also use it globally, in the outgoing section. If you go to /preferences you can see, in the engines section, the timeout configured per engine ("Max time").
Thank you. As per my previous comment, the I've also changed the I very much appreciate the ongoing support and all the advice and questions around the configuration options. Please let me know how I can better highlight the issues with searxng over TOR and especially
Hello again. If this approach is not considered technically valid in a wider context, BTW: I was really impressed by the project's prime directives and the proposition:
We are very thankful for your contribution, that's not the point.
Yeah, that's the point .. we are currently working on: and to be honest, I'm not as deep into the subject as @dalf and @unixfox ..
Context
Anecdotally, using SearXNG over unreliable proxies, like tor, seems to be quite error prone. SearXNG puts quite an effort into measuring the performance and reliability of engines, most likely owing to those aspects being of significant concern.
What does this PR do?
The patch here proposes to mitigate the related problems by issuing concurrent redundant requests through the specified proxies all at once, returning the first response that is not an error.
Why is this change important?
Enables use of SearXNG through tor proxies with the least latency possible, while greatly enhancing user privacy.
How to test this PR locally?
The functionality is enabled using the `proxy_request_redundancy` parameter within the outgoing network settings or the engine settings. Example:

```yaml
outgoing:
  request_timeout: 8.0
  proxies:
    "all://":
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
  proxy_request_redundancy: 4
```
In this example, each network request will be sent 4 times, once through every proxy. The first (non-error) response wins.
Results
In my testing environment using several tor proxy end-points, this approach almost entirely removes engine errors related to timeouts and denied requests. The latency of the network system is also improved.
Implementation
The implementation uses an `AsyncParallelTransport(httpx.AsyncBaseTransport)` wrapper to wrap multiple sub-transports, and `asyncio.wait` to wait on the first completed request. The existing implementation of the network proxy cycling has also been moved into the `AsyncParallelTransport` class, which should improve network client memoization and performance.
Testing
- unit tests for the new functions and classes.
- tested on desktop PC with 10+ upstream proxies and comparable request redundancy.