Set cookies and crumb #205
dpguthrie commented Jul 17, 2023
- Initialize the session with the required cookies
- Create a larger list of rotating user-agents with related headers
- Add webdriver-manager for selenium functionality
- Change from flit to poetry
- Add additional package files that will help with testing and development
- Add a function in the base client class to retrieve a crumb and add it as a query parameter
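As a rough illustration of two of the items above (rotating header sets, and attaching the crumb as a query parameter), here is a minimal sketch. The header values, the helper names `pick_headers` and `with_crumb`, and the parameter name `crumb` are assumptions made for illustration, not the code in this PR.

```python
import random

# Hypothetical header sets; the list in the PR is larger and lives in the package.
HEADERS = [
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept-Language": "en-US,en;q=0.5",
    },
    {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    },
]


def pick_headers(extra=None, rng=random):
    """Return one randomly chosen header set, merged with any extra headers."""
    return {**rng.choice(HEADERS), **(extra or {})}


def with_crumb(params, crumb):
    """Return a copy of params with the crumb added; leave it out if there is none."""
    merged = dict(params)
    if crumb:
        merged["crumb"] = crumb  # assumed parameter name
    return merged
```

Keeping the User-Agent together with its related headers in one dict means a random pick never mixes, say, a Firefox User-Agent with Chrome-style values.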
I quickly tried this but get an exception:
selenium=3.141.0 webdriver_manager=3.8.6
@ValueRaider Can you try upgrading selenium?
Selenium 4 just generates a different exception:
I prefer v3 because calls with v4 hang for 2 minutes; I've no idea why.
Hmm, this looks like it’s no longer a problem with selenium but a problem with getting the crumb from that endpoint after you’ve logged in.
Do you get the same error when not passing in a username and password?
Without a username and password, I get the same "max retries" error, now with either version of Selenium. Note: this does not happen with yahooquery 2.3.2 (latest official).
@ValueRaider I wasn't able to reproduce your error. I did change the code slightly, though, so it may be working for you now. I also deployed it temporarily to the streamlit app, where it seems to be working again.
Same behaviour, "Max retries" errors. What are other people experiencing?
That’s weird. I have it working locally and I have this code running in the streamlit app as well - https://yahooquery.streamlit.app/
@ValueRaider I’m wondering if it has something to do with where you’re based. I’m naively navigating to finance.yahoo.com to retrieve cookies, but I believe you’re based in the UK and would actually navigate to uk.finance.yahoo.com. I'm curious whether it would start working for you if you pulled this code down and changed just that.
I don't quite understand where I would make the change. But to check whether it's worth investigating, I logged in with the Firefox debugger active and don't see UK-specific URLs, just the same ones as you, e.g. EDIT: let me check the headers for anything UK-specific ...
yahooquery/utils/__init__.py
def setup_session_with_cookies_and_crumb(session: Session):
    headers = {**random.choice(HEADERS), **addl_headers}
    session.headers = headers
    response = session.get('https://finance.yahoo.com', hooks={'response': get_crumb})
@ValueRaider This is the place you would modify, changing the URL to (I'm guessing) https://uk.finance.yahoo.com
I'd also be curious what headers you're seeing on your side when that request is being made.
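One way to see which headers a session would send, without making a network call, is to prepare the request and inspect it. This is a minimal sketch using the `requests` library; the helper name `headers_that_would_be_sent` and the example header value are assumptions for illustration.

```python
import requests


def headers_that_would_be_sent(session, url):
    """Build the request without sending it and return its merged headers."""
    request = requests.Request("GET", url)
    prepared = session.prepare_request(request)  # merges session-level headers
    return dict(prepared.headers)


session = requests.Session()
session.headers.update({"Accept-Language": "en-GB,en;q=0.5"})
for name, value in headers_that_would_be_sent(session, "https://finance.yahoo.com").items():
    print(f"{name}: {value}")
```

The `Host` header does not appear in this output unless it was set explicitly, because the HTTP layer adds it when the request is actually sent.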
Adding 'uk' made no difference.
But adding the key headers['Host'] = 'query1.finance.yahoo.com' did address the exception. The crumb is an empty string, maybe because I'm not logging in? I can't log in because I updated Chrome yesterday and webdriver is complaining it's too new, but that's my problem.
Also, this code doesn’t use selenium. Selenium is only used to log in to YF (and I'm not even sure that’s working right now, as it looks like they’ve added a captcha).
The getcrumb response.text contains the string "HTTP Status 403 - Forbidden". Is that typical of Yahoo expecting a cookie?
Yes, that makes sense if the session making that request isn't set up with the appropriate cookies. Those cookies are supposed to be set when making the initial request to the YF home page.
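The two-step flow described here (visit the home page so cookies are stored on the session, then request the crumb) can be sketched as below. The crumb URL, the function name `fetch_crumb`, and the checks for a bad response are assumptions for illustration and may not match what the PR does.

```python
HOME_URL = "https://finance.yahoo.com"
# Assumed crumb endpoint; the thread only names the host and "getcrumb".
CRUMB_URL = "https://query1.finance.yahoo.com/v1/test/getcrumb"


def fetch_crumb(session, home_url=HOME_URL, crumb_url=CRUMB_URL):
    """Visit the home page first, then ask for a crumb with the same session.

    Returns the crumb text, or None when the response does not look like a crumb.
    """
    # Cookies set by this response are kept on the session for the next call.
    session.get(home_url)
    response = session.get(crumb_url)
    text = response.text.strip()
    if response.status_code != 200 or not text or "Forbidden" in text:
        return None
    return text
```

With the `requests` library this would be called as `fetch_crumb(requests.Session())`. Returning `None` for a 403 body or an empty string keeps an invalid crumb from being attached to later requests.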
Sorry, wrong part of the code: that 403 happens in setup_session_with_cookies_and_crumb() on https://finance.yahoo.com
That's really surprising! I'm not sure what would make that a forbidden request just to the YF home page.
Hey y'all, interested in this one as I currently use
@cmjordan42 Where are you based? My somewhat uneducated theory right now is that this would work for anyone based in the U.S., which is why it works for me and inside the streamlit app. If you are U.S.-based, you could take this branch for a spin and see if you're able to access the
Another somewhat uneducated guess: I think the solution is to have country/region-specific headers that map to what they would be if that person navigated to the YF home page in a browser. I think what's happening right now is that instead of going directly to the home page, it first navigates to
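The region-specific idea could look something like the sketch below: a small table keyed by region that supplies the home-page URL and a matching `Accept-Language` value. The table `REGION_PROFILES`, the helper `profile_for`, and the fallback to the U.S. profile are hypothetical; the UK values are taken from this thread, and nothing here has been shown to fix the reported 403.

```python
# Hypothetical region table; only meant to illustrate the shape of the idea.
REGION_PROFILES = {
    "us": {
        "home_url": "https://finance.yahoo.com",
        "Accept-Language": "en-US,en;q=0.5",
    },
    "uk": {
        "home_url": "https://uk.finance.yahoo.com",
        "Accept-Language": "en-GB,en;q=0.5",
    },
}


def profile_for(region, base_headers):
    """Return (home_url, headers) for a region, falling back to the U.S. profile."""
    profile = REGION_PROFILES.get(region.lower(), REGION_PROFILES["us"])
    headers = {**base_headers, "Accept-Language": profile["Accept-Language"]}
    return profile["home_url"], headers
```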
I'm in US - EST. I'd be happy to test drive if it would be helpful.
Just tested with this branch; it appears to work fine. Definitely an improvement, even if some regions have more complicated issues. May as well publish it to get
However, it occasionally fails with a response along the lines of: "For input string: "-91000.0000000002""
And that appears to be transient; if you request the same security again, it may work. Not sure if that transience is on the
@ValueRaider, @cmjordan42 Would you mind giving this another go? I made some changes last night that hopefully both handle errors and provide a fallback option with selenium (if it's installed) to retrieve cookies/crumb.
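A minimal sketch of the "handle errors, then fall back" pattern mentioned here, assuming the primary path is a plain HTTP call and the fallback is a browser-based one. The function names and the retry count are hypothetical and are not taken from the PR.

```python
import importlib.util


def selenium_available():
    """Return True if the selenium package can be imported in this environment."""
    return importlib.util.find_spec("selenium") is not None


def get_crumb_with_fallback(primary, fallback=None, attempts=2):
    """Call primary() up to `attempts` times, then fallback() if one was given.

    An empty result counts as a failure, since an empty crumb is not usable.
    """
    last_error = None
    for _ in range(attempts):
        try:
            crumb = primary()
            if crumb:
                return crumb
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
    if fallback is not None:
        return fallback()
    if last_error is not None:
        raise last_error
    return None
```

A caller might pass `fallback=browser_crumb if selenium_available() else None`, where `browser_crumb` is a hypothetical function that retrieves the crumb through a browser session. Retrying the primary call first also covers transient failures of the kind reported above.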
No change, and I tried changing the URL and header parameters. These are my browser headers, btw:
{
    "Host": "query1.finance.yahoo.com",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "*/*",
    "Accept-Language": "en-GB,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "text/plain",
    "Origin": "https://finance.yahoo.com",
    "Referer": "https://finance.yahoo.com/"
}
@ValueRaider Is this where the error is happening (the explicit request to retrieve the crumb)?
Does this work for you in USA? https://stackoverflow.com/questions/76065035/yahoo-finance-v7-api-now-requiring-cookies-python