Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid running final query when fetched_links == total_results #47

Closed
wants to merge 4 commits into from

Conversation

ceholden
Copy link
Collaborator

@ceholden ceholden commented Nov 22, 2024

What I am changing

This PR is a followup to #46 and seeks to fully fix #45.

PR #46 fixed fetching all of the links from ESA after they changed how pagination works, but didn't fully fix one piece that is essential to the Lambda function executing successfully. Specifically without this PR our link fetcher will still run a final query even after fetching all of the links that the API said are available.

To walk through an example ~> assume the API has a query offset limit of 100, and we page through in batches of 50,

Query 1:
  fetched_links = 0
  total_results = 100
  index = 1  # "success
Query 2:
  fetched_links = 50
  total_results = 100
  index = 51  # success
Query 3:
  fetched_links = 100
  total_results = 100
  index = 101  # failure! we don't need to run this query!

To avoid this issue, this PR changes the link fetching behavior to exit once we've fetched all of the totalResults.

How I did it

  • Track "fetched_links" and added a conditional to exit early when fetched_links == total_results
  • Updated the "mock search result for one day" to be more realistic with respect to totalResults (n=10)
  • Updated 2 test assertions to match the total results returned by our mocked search result (from 6786 to 10)
  • Added a new test to ensure we don't fetch links once we've hit the total search results

How you can test it

I added a unit test for this behavior that checks we do NOT run an additional query that has an index greater than the total result count.

$ cd lambdas/link_fetcher
$ pipenv run pytest tests/test_link_fetcher_handler.py -k doesnt_query

I also updated the link fetcher Lambda currently running and used this to get successful executions of our StepFunction ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Link fetcher needs to handle change to OpenSearch limit/offset pagination
1 participant