
[net]: Proxy Request Redundancy #3491

Open · wants to merge 9 commits into master
## Conversation

@czaky commented May 17, 2024

### Context

Anecdotally, using SearXNG over unreliable proxies, like Tor, seems to be quite error prone. SearXNG puts considerable effort into measuring the performance and reliability of engines, most likely owing to those aspects being of significant concern.

### What does this PR do?

The patch here proposes to mitigate the related problems by issuing concurrent, redundant requests through all of the specified proxies at once and returning the first response that is not an error.

### Why is this change important?

It enables use of SearXNG through Tor proxies with the least latency possible, while greatly enhancing user privacy.

### How to test this PR locally?

The functionality is enabled using the `proxy_request_redundancy` parameter within the outgoing network settings or the engine settings.

Example:

```yaml
outgoing:
    request_timeout: 8.0
    proxies:
        "all://":
            - socks5h://tor:9050
            - socks5h://tor1:9050
            - socks5h://tor2:9050
            - socks5h://tor3:9050
    proxy_request_redundancy: 4
```

In this example, each network request will be sent 4 times, once through every proxy. The first (non-error) response wins.
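The first-response-wins behavior described above can be sketched with plain `asyncio`. This is an illustrative stand-in, not the PR's actual code: `fetch_via`, the proxy latencies, and the status codes are all made up to simulate a fast healthy exit node racing a slow one and a banned one.

```python
import asyncio

async def fetch_via(proxy: str) -> tuple[str, int]:
    # Simulate one proxied request; a real version would use an HTTP client.
    delay, status = {
        "socks5h://tor:9050": (0.3, 504),   # slow exit node, gateway timeout
        "socks5h://tor1:9050": (0.1, 200),  # fast and healthy
        "socks5h://tor2:9050": (0.2, 403),  # exit node banned by the engine
    }[proxy]
    await asyncio.sleep(delay)
    return proxy, status

async def first_good_response(proxies):
    # Fire the same request through every proxy at once.
    pending = {asyncio.create_task(fetch_via(p)) for p in proxies}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                proxy, status = task.result()
                if status < 400:  # first non-error response wins
                    return proxy, status
        return None  # every proxy returned an error
    finally:
        for task in pending:  # abandon the slower in-flight requests
            task.cancel()

result = asyncio.run(first_good_response(
    ["socks5h://tor:9050", "socks5h://tor1:9050", "socks5h://tor2:9050"]
))
print(result)  # ('socks5h://tor1:9050', 200)
```

The overall latency becomes that of the fastest healthy proxy, rather than the timeout of the slowest one.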

### Results

In my testing environment using several Tor proxy endpoints, this approach almost entirely removes engine errors related to timeouts and denied requests. The latency of the network system also improves.

### Implementation

The implementation uses an `AsyncParallelTransport(httpx.AsyncBaseTransport)` wrapper around multiple sub-transports, and `asyncio.wait` to wait for the first completed request.

The existing implementation of network proxy cycling has also been moved into the `AsyncParallelTransport` class, which should improve network client memoization and performance.
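The cycling-plus-redundancy idea can be sketched as follows. `ParallelTransport` and `pick` are hypothetical names for illustration only; the real class wraps httpx transports rather than bare proxy URLs.

```python
import itertools

class ParallelTransport:
    """Toy model: draw `redundancy` proxies per request from a rotating cycle."""

    def __init__(self, proxies, redundancy=1):
        assert 1 <= redundancy <= len(proxies)
        self._cycle = itertools.cycle(proxies)
        self._redundancy = redundancy

    def pick(self):
        # Each request consumes `redundancy` entries from the cycle, so
        # redundancy=1 degenerates to plain round-robin proxy cycling.
        return [next(self._cycle) for _ in range(self._redundancy)]

t = ParallelTransport(["p0", "p1", "p2", "p3"], redundancy=2)
print(t.pick())  # ['p0', 'p1']
print(t.pick())  # ['p2', 'p3']
```

With `redundancy=1` this reproduces the previous round-robin behavior, which is why the cycling logic can live inside the same class.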

### Testing

  • Unit tests for the new functions and classes.
  • Tested on a desktop PC with 10+ upstream proxies and comparable request redundancy.

@czaky czaky changed the title Proxy Request Redundancy [net]: Proxy Request Redundancy May 17, 2024
@return42 return42 requested a review from dalf May 17, 2024 06:01
@unixfox (Member) commented May 17, 2024

That's a great way to get all the proxies banned by the engine at the same time, instead of having one banned and then using the others for the request.

For me, I don't think SearXNG should be the tool for checking whether multiple proxies are working. That's the job of an external tool. When you configure proxies, you should take care to configure good ones.

@czaky (Author) commented May 17, 2024

> That's a great way to get all the proxies banned at the same time by the engine, instead of having one being banned then using other ones for the request.
>
> For me, I don't think searxng should be the tool for checking if multiple proxies are working. It's the job of an external tool. When you configure proxies you should be aware to configure good proxies.

Those are good points, and I agree. If you are able to configure good proxies, the optional request redundancy proposed here should stay disabled.

When using Tor, there is no way to verify whether the current exit node is banned by a specific engine. The exit nodes cycle every 10 minutes, and the client has no way to influence the choice.

I have tried several strategies and tools to check whether proxies are working and to choose good proxies at runtime. All of those strategies are inferior to this approach, where the decision is made at the application level, i.e. using the HTTP status codes from each response.

czaky added 2 commits May 17, 2024:
- to verify why this fails remotely
- Fixed race condition with the 404 test.
@unixfox (Member) commented May 17, 2024

I think what you are looking for is the `retry_on_http_error` parameter: https://docs.searxng.org/admin/settings/settings_engine.html

If there is an error, this parameter will retry with another proxy. You can specify the number of retries with the `retries` parameter: https://docs.searxng.org/admin/settings/settings_outgoing.html

@czaky (Author) commented May 17, 2024

> I think what you are looking for is the parameter retry_on_http_error: https://docs.searxng.org/admin/settings/settings_engine.html
>
> If there is an error, this parameter will retry with another proxy. You can specify the number of retries with the parameter retries: https://docs.searxng.org/admin/settings/settings_outgoing.html

Thank you so much for this advice.

Correct me if I am wrong:

The `retries` param allows retrying a request on any error, including timeouts and random network-stack exceptions. The `retry_on_http_error` engine param instructs the network code to treat 400-599 HTTP errors like the other exceptions.
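That understanding of the two parameters can be sketched as a simple sequential retry loop. This is an illustration of the semantics described above, not SearXNG's actual code; `send_with_retries` and `fake_send` are made-up names.

```python
def send_with_retries(send, retries=2, retry_on_http_error=True):
    """Sequentially attempt `send` up to retries + 1 times."""
    last = None
    for attempt in range(retries + 1):
        try:
            status = send(attempt)
        except TimeoutError:
            last = "timeout"
            continue  # network-level errors are always retried
        if retry_on_http_error and 400 <= status <= 599:
            last = status
            continue  # HTTP error treated like the other exceptions
        return status
    return last  # all attempts exhausted

# First attempt times out, second returns 502, third succeeds:
responses = iter([TimeoutError(), 502, 200])
def fake_send(_attempt):
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r

print(send_with_retries(fake_send))  # 200
```

The key point for the latency argument below: because each attempt runs only after the previous one fails, the worst case is roughly `(retries + 1) * request_timeout`, whereas parallel redundant requests pay only the latency of the fastest healthy proxy.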

The reason the `retries` solution suggested above is not satisfactory is that Tor network requests take a long time, with tail latencies in the tens of seconds.

After adding the `retries` option, we are looking at 10 to 30 second response times just to learn that more than half of the engines timed out or had some other issue. On top of that, most of the engines get suspended quickly, degrading the experience.

On the other hand, the solution presented here delivers the full result in less than 5 seconds, with no engines being suspended.

Tested it with:

  • standard Tor proxies;

  • multiple Tor proxies behind a HAProxy round-robin, with an app-level connectivity check;

  • multiple Tor proxies compiled to minimum circuit length;

  • the SearXNG (vanilla) Docker version; and

  • a version built from branch/HEAD (patched; A/B tested via the `proxy_request_redundancy` param).

```yaml
# ...
outgoing:
  request_timeout: 10.0
  proxies:
    "all://":
      # - socks5h://192.168.0.50:9050
      # - socks5h://192.168.0.51:9050
      - socks5h://tor:9050
      - socks5h://tor1:9050
      - socks5h://tor2:9050
      - socks5h://tor3:9050
      - socks5h://tor4:9050
      - socks5h://tor5:9050
      - socks5h://tor6:9050
      - socks5h://tor7:9050
      - socks5h://tor8:9050
      - socks5h://tor9:9050
  retries: 2
  proxy_request_redundancy: 1  # Or higher for parallel execution

# ...
engines:
  # `retry_on_http_error` set on all the other engines too.
  - name: google
    engine: google
    shortcut: go
    retry_on_http_error: True

# ...
```

Or to speak in pictures...

the difference from (on a good run):
[screenshot]

to (every time):
[screenshot]

(on a bad run using the standard code it is just):
[screenshot]

@unixfox (Member) commented May 18, 2024

You are correct about the description of the parameters, but you need to extend `request_timeout` per engine section.

In the outgoing section, `request_timeout` is the default timeout unless the engine overrides it. It's actually written here:

request_timeout: 3.0

And about `retry_on_http_error`: you may also use it globally, in the outgoing section.

If you go to /preferences, you will see the timeout configured per engine ("Max time") in the engines section.
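To illustrate the per-engine override described above (a sketch with made-up values; SearXNG's engine settings accept a `timeout` key that overrides the global default):

```yaml
outgoing:
  request_timeout: 3.0   # default for all engines

engines:
  - name: google
    engine: google
    shortcut: go
    timeout: 10.0        # per-engine override, shown as "Max time" in /preferences
```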

@czaky (Author) commented May 18, 2024

> You are correct about the description of the parameters but
>
> You need to extend "request_timeout" per engine section.
>
> In outgoing section, "request_timeout" is the default timeout unless the engine overrides it. It's actually written here:
>
> request_timeout: 3.0
>
> And about "retry_on_http_error" you may also use it globally, in the outgoing section.
>
> If you go in the /preferences you see in the engines section the timeout configured per engine. "Max time"

Thank you.

As per my previous comment, `request_timeout` was already extended to 10 s (or 20, or 30). In the parallel approach, I only need to extend it to 5 s (except for the "wikidata" engine init step).

I have also changed `max_request_timeout`, which acts as the upper bound on the request across all engines, and other params, and also changed the `extra_proxy_timeout` param, which seems to only affect engines with an .onion endpoint.

I very much appreciate the ongoing support and all the advice and questions around the configuration options.

Please let me know how I can better highlight the issues with SearXNG over Tor, and especially the nature of the Tor proxy network. I would like to help shape our discussion here to be more effective and more to the point.

@czaky (Author) commented Jun 6, 2024

Hello again.

If this approach is not considered technically valid in a wider context, would you consider starting a design discussion?

BTW: I was really impressed by the project's prime directives and the proposition:
"Don't hesitate, just clone SearXNG's sources and start hacking right now ... we are happy to receive your pull request."

@return42 (Member) commented Jun 7, 2024

> BTW: I was really impressed about the project prime directives and the proposition:
> "Don't hesitate, just clone SearXNG's sources and start hacking right now ... we are happy to receive your pull request."

We are very thankful for your contribution, that's not the point.

> would you consider to start a design discussion?

Yeah, that's the point ... we are currently working on:

and to be honest, I'm not as deep into the subject as @dalf and @unixfox ..

@czaky mentioned this pull request Jun 7, 2024