Crumb issue: Add support for (EU) Dataprotection consent Page #247

Closed
jgriessler opened this issue Dec 12, 2023 · 8 comments · Fixed by #248

Comments

@jgriessler

jgriessler commented Dec 12, 2023

Is your feature request related to a problem? Please describe.
CRUMB failures occur when running yahooquery from Europe. Testing shows this is because, for queries from Europe, Yahoo redirects finance.yahoo.com to a page asking for consent to the use of data:
https://consent.yahoo.com/v2/collectConsent?sessionId=3_cc-session_6b0b0161-b473-4d30-bc6f-5cdd007600aa

Without acknowledging that page, the subsequent call to get the crumb via https://query2.finance.yahoo.com/v1/test/getcrumb fails.

Describe the solution you'd like
Implement a check to see if Yahoo redirects to the consent page. If it does, send an 'Agree' to that page to get the necessary cookies, etc.
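A check like this can be sketched with the standard library alone. This is a minimal sketch, assuming the consent page is always served from the `consent.yahoo.com` host seen in the redirect above; `is_consent_redirect` is a hypothetical helper name. Comparing the parsed hostname also avoids a common pitfall: `str.find()` returns `-1`, which is truthy, when there is no match.

```python
from urllib.parse import urlparse

# Assumption: the consent page lives on this host (taken from the redirect URL above)
CONSENT_HOST = "consent.yahoo.com"


def is_consent_redirect(final_url: str) -> bool:
    """Return True if a request ended up on Yahoo's consent page."""
    # Compare the parsed hostname instead of searching the raw string;
    # url.find("consent") returns -1 (truthy) when there is no match.
    return urlparse(final_url).hostname == CONSENT_HOST


if __name__ == "__main__":
    redirected = "https://consent.yahoo.com/v2/collectConsent?sessionId=3_cc-session_abc"
    print(is_consent_redirect(redirected))
    print(is_consent_redirect("https://finance.yahoo.com/"))
    print("https://finance.yahoo.com/".find("consent"))
```

In a `requests` session, `response.url` holds the final URL after redirects, so it would be the natural argument to pass in.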

Sample code that works (but likely needs some tweaking):
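```python
import logging
import random

import requests
from bs4 import BeautifulSoup
from requests.exceptions import SSLError
from requests_futures.sessions import FuturesSession

logger = logging.getLogger(__name__)

# HEADERS is the list of header dicts already defined in yahooquery/utils


def setup_session(session: requests.Session):
    url = "https://finance.yahoo.com"
    try:
        response = session.get(url, allow_redirects=True)
    except SSLError:
        # Retry up to 5 times with random headers and no certificate check
        counter = 0
        while counter < 5:
            try:
                session.headers = random.choice(HEADERS)
                response = session.get(url, verify=False)
                break
            except SSLError:
                counter += 1

    if not isinstance(session, FuturesSession):

        # Check for and handle the consent page. str.find() returns -1
        # (truthy) when there is no match, so use a membership test instead.
        if 'consent' in response.url:
            logger.debug(f'Redirected to consent page: "{response.url}"')

            soup = BeautifulSoup(response.content, 'html.parser')

            # Pull the hidden form fields that the consent form posts back
            params = {}
            for param in ['csrfToken', 'sessionId']:
                try:
                    params[param] = soup.find('input', attrs={'name': param})['value']
                except Exception as exc:
                    logger.critical(f'Failed to find or extract "{param}" from response. Exception={exc}')
                    # Return the session unchanged so callers still get a usable object
                    return session

            logger.debug(f'params: {params}')

            response = session.post(
                'https://consent.yahoo.com/v2/collectConsent',
                data={
                    'agree': ['agree', 'agree'],
                    'consentUUID': 'default',
                    'sessionId': params['sessionId'],
                    'csrfToken': params['csrfToken'],
                    'originalDoneUrl': url,
                    'namespace': 'yahoo'
                })
            # Just assume things are fine and the session is set up now

        return session

    # FuturesSession.get() returns a Future; wait for it to complete
    _ = response.result()
    return session
```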
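If pulling in BeautifulSoup only for this step is undesirable, the two hidden fields could also be read with the standard library's `html.parser`. This is a minimal sketch, assuming the consent form exposes `csrfToken` and `sessionId` as `<input name=... value=...>` elements, as the code above expects; `HiddenInputParser` and `extract_consent_params` are hypothetical names, and the sample HTML is made up for illustration, not copied from Yahoo's page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the value attribute of <input> elements with the wanted names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input .../> tags are routed here by the default
        # handle_startendtag implementation as well
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name in self.wanted and name not in self.found:
            self.found[name] = attributes.get("value", "")


def extract_consent_params(html: str) -> dict:
    """Return csrfToken and sessionId, raising KeyError if either is missing."""
    parser = HiddenInputParser(["csrfToken", "sessionId"])
    parser.feed(html)
    parser.close()
    missing = parser.wanted - set(parser.found)
    if missing:
        raise KeyError(f"missing form fields: {sorted(missing)}")
    return parser.found


if __name__ == "__main__":
    sample = (
        '<form method="post">'
        '<input type="hidden" name="csrfToken" value="tok123">'
        '<input type="hidden" name="sessionId" value="3_cc-session_abc">'
        '<button name="agree" value="agree">Accept all</button>'
        "</form>"
    )
    print(extract_consent_params(sample))
```

Raising an exception on a missing field, rather than returning a partial dict, keeps the caller from posting an incomplete consent form.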

Describe alternatives you've considered
I'm not aware of any other solution to work around this for queries from Europe.

Additional context

@jirisarri10

When I connect from Spain, Yahoo forces me to accept cookies, and that is the problem. I think it is necessary to press the "OK" button with Selenium. The problem I have afterwards is that I get many connections when I go to https://query2.finance.yahoo.com/v1/test/getcrumb

@fredrik-corneliusson

fredrik-corneliusson commented Dec 14, 2023

@jgriessler This is absolutely fantastic, thank you.
Just tested your solution locally and I can now access the problematic APIs from Sweden, and I suspect from the rest of the EU (GDPR-regulated) countries as well.
Are you familiar with forking and making PRs on GitHub? I think that would be a nicer way for others to review and test the solution, instead of manually pasting the code.
In any case, this is great news: if the cause is regulation and not a Yahoo-specific issue, it will probably keep working and not be much of a game of whac-a-mole to keep running.
Thanks.

@RudyNL

RudyNL commented Dec 15, 2023

It's working fine for me in the Netherlands without a VPN. Thanks @jgriessler for the patch. The instructions are a bit hard to follow, so here they are step by step:
1) Open the file ...../lib/python3.10/site-packages/yahooquery/utils/__init__.py
2) In the header of the file, after # third party, add:
from bs4 import BeautifulSoup
3) Replace the method
def setup_session(session: requests.Session):
with:

```python
def setup_session(session: requests.Session):
    url = "https://finance.yahoo.com"
    try:
        response = session.get(url, allow_redirects=True)
    except SSLError:
        counter = 0
        while counter < 5:
            try:
                session.headers = random.choice(HEADERS)
                response = session.get(url, verify=False)
                break
            except SSLError:
                counter += 1

    if not isinstance(session, FuturesSession):

        # Check for and handle the consent page. str.find() returns -1
        # (truthy) when there is no match, so use a membership test instead.
        if 'consent' in response.url:
            logger.debug(f'Redirected to consent page: "{response.url}"')

            soup = BeautifulSoup(response.content, 'html.parser')

            params = {}
            for param in ['csrfToken', 'sessionId']:
                try:
                    params[param] = soup.find('input', attrs={'name': param})['value']
                except Exception as exc:
                    logger.critical(f'Failed to find or extract "{param}" from response. Exception={exc}')
                    # Return the session unchanged so callers still get a usable object
                    return session

            logger.debug(f'params: {params}')

            response = session.post(
                'https://consent.yahoo.com/v2/collectConsent',
                data={
                    'agree': ['agree', 'agree'],
                    'consentUUID': 'default',
                    'sessionId': params['sessionId'],
                    'csrfToken': params['csrfToken'],
                    'originalDoneUrl': url,
                    'namespace': 'yahoo'
                })
            # Just assume things are fine and the session is set up now

        return session

    # FuturesSession.get() returns a Future; wait for it to complete
    _ = response.result()
    return session
```
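The `'agree': ['agree', 'agree']` entry works because `requests` sends a list value in form data as repeated keys. The standard library's `urlencode(..., doseq=True)` builds the same `application/x-www-form-urlencoded` body, which is a convenient way to inspect what is actually posted. This is only an illustration; the `sessionId` and `csrfToken` values below are placeholders, not real Yahoo values.

```python
from urllib.parse import parse_qs, urlencode

# Same fields as the consent POST above; token values are placeholders
form = {
    "agree": ["agree", "agree"],
    "consentUUID": "default",
    "sessionId": "3_cc-session_abc",
    "csrfToken": "tok123",
    "originalDoneUrl": "https://finance.yahoo.com",
    "namespace": "yahoo",
}

# doseq=True expands list values into repeated key=value pairs
body = urlencode(form, doseq=True)
print(body)

# Parsing the body back shows that "agree" was sent twice
print(parse_qs(body)["agree"])
```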

@jirisarri10

Thanks Griessler, Rudy!!!

@dpguthrie
Owner

@jgriessler Really appreciate the solution here! I'll work on putting this in and get it in the next release.

@dpguthrie dpguthrie linked a pull request Dec 16, 2023 that will close this issue
@ibart

ibart commented Dec 17, 2023

https://consent.yahoo.com/v2/collectConsent is dead now.


@dpguthrie
Owner

@ibart This is most likely because your browser is making a GET request; the URL you're using, which is the same one used internally, accepts the POST method with a defined body.
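The difference can be seen outside a browser: the standard library's `urllib.request.Request` switches from GET to POST as soon as a body is attached. This minimal sketch only builds the two requests and sends nothing, so no consent is actually submitted; the form values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

CONSENT_URL = "https://consent.yahoo.com/v2/collectConsent"

# What a browser does when the URL is pasted into the address bar
browser_style = Request(CONSENT_URL)

# What the library does: a form-encoded body, which turns it into a POST
body = urlencode(
    {"agree": ["agree", "agree"], "sessionId": "placeholder", "csrfToken": "placeholder"},
    doseq=True,
).encode("ascii")
library_style = Request(
    CONSENT_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(browser_style.get_method())  # GET
print(library_style.get_method())  # POST
```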

@jgriessler
Author

Thanks everyone for moving this forward (and of course Doug for getting the functionality in) while I was distracted with personal stuff. I haven't played with GitHub yet, so I would only mess things up trying to fork and work on a PR.

One other comment: I noticed that things are a little slower now when querying data. I assume it's because finance.yahoo.com is huge, so loading the main site takes time. Going through the consent for every query is also quite some overhead if you run a series of history update queries. So I switched to reusing the yq.Ticker() instance, just modifying ticker.symbols. I still randomly create a fresh instance every 30-50 queries to start fresh.
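The reuse pattern described above could be wrapped in a small helper that hands out the same object and rebuilds it after a random number of uses. This is a minimal sketch: `Recycler` and its `factory` callable are hypothetical names, not part of yahooquery. In practice `factory` would create the `yq.Ticker()` instance and the caller would update `ticker.symbols` on the returned object; the 30-50 default range mirrors the numbers in the comment above.

```python
import random
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Recycler(Generic[T]):
    """Reuse one expensive object, rebuilding it after a random number of uses."""

    def __init__(self, factory: Callable[[], T], low: int = 30, high: int = 50,
                 rng: Optional[random.Random] = None):
        self.factory = factory
        self.low = low
        self.high = high
        self.rng = rng or random.Random()
        self._obj: Optional[T] = None
        self._remaining = 0

    def get(self) -> T:
        # Build a fresh object on first use or once the current budget is spent
        if self._obj is None or self._remaining <= 0:
            self._obj = self.factory()
            self._remaining = self.rng.randint(self.low, self.high)
        self._remaining -= 1
        return self._obj


if __name__ == "__main__":
    built = []

    def make_session():
        # Stand-in for an expensive setup such as consent handling
        built.append(object())
        return built[-1]

    pool = Recycler(make_session, low=3, high=3)
    used = [pool.get() for _ in range(7)]
    print(len(built))  # 3 objects for 7 uses with a budget of 3
```

Randomizing the budget, rather than using a fixed count, matches the "randomly every 30-50 queries" approach described in the comment.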

6 participants