implement FlareSolverr to solve *some* CF captchas #1619
Comments
I have been rejected by Cloudflare and CloudFront a lot in the last week or two. It seems their browser fingerprinting has caught up with Playwright / Chrome Docker containers. I would appreciate some solution, as the number of sites I can actually access has dropped by about 20%.
It's important to understand that Cloudflare fingerprinting for API endpoints is usually limited to non-interactive methods, for example TLS fingerprinting techniques. So if you manage to find some API-like endpoint in the Chrome network tab, you should try that. Another way to mitigate this is to try higher-quality residential proxies.
Interestingly, I just came across https://github.com/lwthiker/curl-impersonate
I wonder: if you update the hosts file on the machine that's running changedetection.io, will it go directly to the IP of the host server and bypass Cloudflare?
Interesting idea. However, that would depend on you knowing the web server's direct IP address, and half the reason people hide behind Cloudflare or CloudFront is to obscure that address and to take advantage of the DDoS protection those platforms provide.
For what it's worth, I managed to use FlareSolverr to fetch a page protected by Cloudflare. First of all, I'm on a Synology NAS and have both FlareSolverr and changedetection.io installed as Docker containers. You have to set up FlareSolverr and make sure you can access it from changedetection.io. In changedetection.io, here is the configuration for the page you want to fetch:
Fetch Method > Basic fast Plaintext/HTTP Client (doesn't seem to work with Playwright, not sure why)
Request header >
CSS/JSONPath/JQ/XPath Filters > FlareSolverr returns JSON with the HTML in the 'solution.response' attribute; I then added the 'capture' filter, which in this case lists the items I want to fetch from the page based on a regex.
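For anyone who wants to try the same thing outside changedetection.io, here is a minimal Python sketch of the request described above. It assumes FlareSolverr is listening on its default port on the same host (`http://localhost:8191/v1`) and uses its `request.get` command; the helper names (`build_payload`, `extract_html`, `fetch_via_flaresolverr`) are made up for illustration and are not part of either project.

```python
import json
import urllib.request

# Assumption: FlareSolverr runs locally on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_payload(target_url, max_timeout_ms=60000):
    """Build the JSON body for FlareSolverr's `request.get` command."""
    return {"cmd": "request.get", "url": target_url, "maxTimeout": max_timeout_ms}


def extract_html(result):
    """Return the page HTML, which FlareSolverr puts in `solution.response`."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {result.get('message')}")
    return result["solution"]["response"]


def fetch_via_flaresolverr(target_url, api_url=FLARESOLVERR_URL):
    """POST the command to FlareSolverr and return the solved page's HTML."""
    body = json.dumps(build_payload(target_url)).encode("utf-8")
    req = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": "application/json"}
    )
    # The solver can take a while, so allow more time than maxTimeout.
    with urllib.request.urlopen(req, timeout=90) as resp:
        return extract_html(json.load(resp))
```

Calling `fetch_via_flaresolverr("https://example.com")` should return the HTML string that the 'solution.response' filter above picks out, provided the FlareSolverr container is reachable.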
@wpigoury thanks for the info! I was able to get some responses using FlareSolverr, super interesting project, looks like undetected-chromedriver is actually binary patched! Trying to think of a workflow here
Should the site get 403 again, then I think it can just repeat the above steps... |
I have to add: on the sites where I hit the Cloudflare block, simply moving to a better residential IP pretty much solved it, and I didn't need FlareSolverr.
I'm selfhosting changedetection at home via docker/selfhosted browserless chrome instance, and I'm running into the cloudflare captcha for sites like |
OK yeah, i'll add this to my next list of tasks :) lots of requests coming in |
Hi guys! First of all, thanks @dgtlmoon for this great project, you rock! I've just made this adapter: https://github.com/mimnix/FlareProxy. You can deploy it alongside FlareSolverr and add it as a proxy in the changedetection.io proxy settings, and every time you need to demonstrate you're just a human watching a web page protected by Cloudflare, the transparent proxy is right there for you 😉
Is your feature request related to a problem? Please describe.
One of the sites I am scraping suddenly moved behind Cloudflare, and I keep getting presented with a CAPTCHA challenge on changedetection.io, but not on e.g. Jackett, which does have FlareSolverr implemented.
Describe the solution you'd like
Please bake in support for FlareSolverr and add the ability to specify the FlareSolverr API URL in the
[protocol:-http]://[fqdn or ip:-localhost]:[port:-8191]
format. Thank you!
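To make the requested format concrete, here is a small Python sketch of how the defaults could resolve. The function name `flaresolverr_url` and its parameters are hypothetical, not an existing changedetection.io setting; the default values (http, localhost, 8191) are the ones given in the format above.

```python
def flaresolverr_url(protocol=None, host=None, port=None):
    """Build the FlareSolverr base URL, falling back to the requested defaults."""
    protocol = protocol or "http"   # [protocol:-http]
    host = host or "localhost"      # [fqdn or ip:-localhost]
    port = port or 8191             # [port:-8191]
    return f"{protocol}://{host}:{port}"
```

With nothing set this gives `http://localhost:8191`, and each part can be overridden on its own, e.g. `flaresolverr_url(host="192.168.1.10")`.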