implement FlareSolverr to solve some CF captchas #1619

hoopyfrood · 2023-06-07T15:50:15Z

Is your feature request related to a problem? Please describe.
One of the sites I am scraping was suddenly put behind Cloudflare, and I keep getting presented with a CAPTCHA challenge on changedetection, but not on e.g. Jackett, which does have FlareSolverr implemented.

Describe the solution you'd like
Please bake in support for FlareSolverr and add the ability to specify the FlareSolverr API URL in the [protocol:-http]://[fqdn or ip:-localhost]:[port:-8191] format.
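
One possible reading of that format, as a minimal Python sketch: each part falls back to the default named after `:-` (http, localhost, 8191). The function name `flaresolverr_url` is hypothetical and not part of changedetection.io or FlareSolverr; IPv6 literals are not handled.

```python
from urllib.parse import urlsplit


def flaresolverr_url(value: str = "") -> str:
    """Fill in missing parts with the defaults from the requested format.

    Hypothetical helper: defaults are http, localhost and 8191.
    """
    value = value.strip()
    if "://" not in value:
        # No protocol given, so assume the default one before parsing.
        value = "http://" + value
    parts = urlsplit(value)
    scheme = parts.scheme or "http"
    host = parts.hostname or "localhost"
    port = parts.port or 8191
    return f"{scheme}://{host}:{port}"


print(flaresolverr_url())                     # all defaults
print(flaresolverr_url("flaresolverr:8090"))  # host and port given
```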

Thank you!


plutocrat · 2023-06-20T07:14:41Z

I have been rejected by Cloudflare and Cloudfront a lot in the last week or two. It seems like their browser fingerprinting has caught up with Playwright / Chrome Docker containers. I would appreciate some solution, as the number of sites I can actually access has dropped by about 20%.

restyler · 2023-07-14T12:36:01Z

It's important to understand that Cloudflare fingerprinting for API endpoints is usually limited to non-interactive methods, for example TLS fingerprinting. So if you manage to find an API-like endpoint in the Chrome network tab, you should try fetching that instead. Another mitigation is to try higher-quality residential proxies.

plutocrat · 2023-08-15T03:02:20Z

Interestingly, just came across https://github.com/lwthiker/curl-impersonate
This seems to bypass Cloudfront blocking, if used with a correct User-Agent

natecovington · 2023-08-27T12:34:18Z

I wonder if you update the HOSTS file on the machine that's running Change Detection, will it go directly to the IP of the host server and bypass CloudFlare...?

plutocrat · 2023-08-28T01:56:53Z

Interesting idea. However, that would depend on you knowing the web server's direct IP address, and half the reason people hide behind Cloudflare or Cloudfront is to obscure that address and to take advantage of the DDoS protection afforded by those platforms.

wpigoury · 2023-12-21T10:04:01Z

For what it's worth, I managed to use FlareSolverr to fetch a page protected by Cloudflare.
It's not very straightforward and involves using the regex filtering feature from jq, which can make extracting data from HTML a bit tricky.
I don't know if this will work for every case, but it should help in most simple ones.

First of all, I'm on a Synology NAS and have both FlareSolverr and changedetection.io installed as docker containers.

You have to set up FlareSolverr and make sure you can access it from changedetection.io.
To ease the configuration, my FlareSolverr container has a 'hostname: flaresolverr' setting and runs on port 8090.
If both containers are on the same network and set up in the same Docker instance, that should work.

In changedetection.io, here is the configuration for the page you want to fetch:

In 'Request' tab

Fetch Method > Basic fast Plaintext/HTTP Client (doesn't seem to work with Playwright, not sure why)
Proxy > No proxy (might work with a proxy, I couldn't test as I don't use one)
Click on Show advanced options:
Request method > POST
Request body >

{
  "cmd": "request.get",
  "url":"URL TO BE FETCHED",
  "maxTimeout": 60000
}

Request header >
Content-Type: application/json

In 'Filters & Triggers' tab

CSS/JSONPath/JQ/XPath Filters >
jq:.solution.response | capture("REGEX TO EXTRACT CONTENT FROM HTML">(?<name>[^<]*)</b>"; "gm")

FlareSolverr returns JSON with the HTML in the 'solution.response' attribute; I then added the 'capture' filter, which in this case lists the items I want to fetch from the page based on a regex.
Here it really depends on your page and on what you want to extract.
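
For anyone who wants to check the same flow outside changedetection.io, here is a rough Python sketch of the two halves: the POST body from the 'Request' tab and the extraction from 'solution.response'. The function names are hypothetical, and the endpoint `http://flaresolverr:8090/v1` is an assumption built from the hostname and port above plus FlareSolverr's `/v1` API path. Note that Python writes named groups as `(?P<name>...)`, not `(?<name>...)` as jq does.

```python
import json
import re
import urllib.request


def build_request_body(target_url, max_timeout_ms=60000):
    """Same JSON body as in the 'Request' tab, encoded for an HTTP POST."""
    payload = {"cmd": "request.get", "url": target_url, "maxTimeout": max_timeout_ms}
    return json.dumps(payload).encode("utf-8")


def extract_names(flaresolverr_json, pattern):
    """Pull every 'name' group out of the HTML held in solution.response."""
    html = json.loads(flaresolverr_json)["solution"]["response"]
    return [m.group("name") for m in re.finditer(pattern, html)]


def fetch_via_flaresolverr(target_url, endpoint="http://flaresolverr:8090/v1"):
    """POST the request to FlareSolverr (endpoint is an assumption, adjust it)."""
    req = urllib.request.Request(
        endpoint,
        data=build_request_body(target_url),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Allow a little more than maxTimeout so FlareSolverr can answer first.
    with urllib.request.urlopen(req, timeout=70) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like `extract_names(fetch_via_flaresolverr("https://example.com"), r"<b>(?P<name>[^<]*)</b>")`, with the pattern adapted to your page.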

dgtlmoon · 2024-01-31T22:25:11Z

@wpigoury thanks for the info! I was able to get some responses using FlareSolverr, super interesting project, looks like undetected-chromedriver is actually binary patched!

Trying to think of a workflow here

Site gets 403, goes into flare-solverr mode
on next request, it asks flare-solverr for the headers (cookies etc)
those cookies are stored in some kind of database so any other watch for the same domain name could re-use those credentials (would have to add some extra API perhaps to flare-solverr with a mini-db, store in-memory with python dict or something)
add those cookies/headers to the next request of the watch

Should the site get 403 again, then I think it can just repeat the above steps...
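
Roughly, the steps above could look like the Python sketch below. Everything in it is hypothetical: `CookieCache`, `solve` and `do_request` are illustrative names, the store is a plain in-memory dict keyed by domain, and unlike the steps above it retries immediately rather than waiting for the watch's next scheduled request.

```python
from urllib.parse import urlsplit


class CookieCache:
    """In-memory store of solved cookies/headers, shared per domain."""

    def __init__(self, solve):
        # solve(url) -> dict of cookies/headers; in practice this would
        # ask FlareSolverr, here it is just an injected callable.
        self._solve = solve
        self._by_domain = {}

    def fetch(self, url, do_request):
        """do_request(url, creds) -> (status, body); re-solve on a 403."""
        domain = urlsplit(url).hostname
        creds = self._by_domain.get(domain, {})
        status, body = do_request(url, creds)
        if status == 403:
            # Blocked: get fresh credentials and keep them for any other
            # watch on the same domain, then try once more.
            creds = self._solve(url)
            self._by_domain[domain] = creds
            status, body = do_request(url, creds)
        return status, body
```

If the retry also returns 403, the caller simply sees that status, and the next call to `fetch` repeats the same steps.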

dgtlmoon · 2024-01-31T22:27:24Z

I have to add: on the sites where I hit the Cloudflare block, simply moving to a better residential IP pretty much solved it and I didn't need FlareSolverr...

weikinhuang · 2024-07-19T20:05:10Z

I'm self-hosting changedetection at home via Docker with a self-hosted browserless Chrome instance, and I'm running into the Cloudflare CAPTCHA for sites like www.bhphotovideo.com when trying to monitor for restocks. There are a few other sites that consistently do the same thing as well.

dgtlmoon · 2024-07-19T20:11:59Z

OK yeah, I'll add this to my next list of tasks :) lots of requests coming in

mimnix · 2024-12-21T14:02:26Z

Hi guys! First of all, thanks @dgtlmoon for this great project, you rock! I've just made this adapter: https://github.com/mimnix/FlareProxy. You can deploy it alongside FlareSolverr and add it as a proxy in the Changedetection proxy settings, and every time you need to demonstrate you're just a human watching a web page protected by Cloudflare, the transparent proxy is right there for you 😉

hoopyfrood added the enhancement New feature or request label Jun 7, 2023
