-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🏗 Build Spider: Cincinnati Board of Education #12
base: main
Are you sure you want to change the base?
Conversation
Warning Rate limit exceeded@vsspnkkvr has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 4 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughA new spider for scraping Cincinnati Board of Education meeting data has been developed. The spider is designed to retrieve meeting information from an API endpoint, parse JSON responses, and extract relevant meeting details. It focuses on meetings within the last 12 months, creating structured Changes
Sequence DiagramsequenceDiagram
participant Spider as CinohBoardOfEdSpider
participant API as Meeting Data API
participant Parser as JSON Parser
participant Meeting as Meeting Object
Spider->>API: Request meeting data
API-->>Spider: Return JSON response
Spider->>Parser: Parse JSON data
Parser-->>Spider: Extract meeting details
Spider->>Meeting: Create Meeting objects
Meeting-->>Spider: Return parsed meetings
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
city_scrapers/spiders/cinoh_Board_of_Ed.py (2)
73-81
: Consider distinguishing multiple document links.If future requirements expand (e.g., separate Zoom link, minutes, attachments), consider returning multiple link entries with more specific titles.
19-23
: Potential future enhancement to parse second API endpoint.You mentioned there's a second endpoint that’s currently unused. If it provides additional meeting details or ensures completeness, consider adding a second request/parse flow in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
city_scrapers/spiders/cinoh_Board_of_Ed.py
(1 hunks)tests/files/cinoh_Board_of_Ed.json
(1 hunks)tests/test_cinoh_Board_of_Ed.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/files/cinoh_Board_of_Ed.json
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_cinoh_Board_of_Ed.py
5-5: city_scrapers_core.constants.NOT_CLASSIFIED
imported but unused
Remove unused import
(F401)
5-5: city_scrapers_core.constants.BOARD
imported but unused
Remove unused import
(F401)
🔇 Additional comments (3)
city_scrapers/spiders/cinoh_Board_of_Ed.py (3)
23-26
: Good use of FormRequest
to handle form data.
This approach ensures correct server interaction and clean request handling.
28-45
: Validate date filtering and iteration logic.
The logic to skip meetings older than 12 months helps keep the dataset relevant. Confirm this aligns with project requirements (some boards may post older meeting data retroactively).
65-71
: Classification logic looks appropriate.
The classification covers committee, board, or default. This improves clarity if more categories are introduced in the future.
tests/test_cinoh_Board_of_Ed.py
Outdated
from os.path import dirname, join | ||
|
||
import pytest | ||
from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports.
NOT_CLASSIFIED
and BOARD
aren't used anywhere, so please remove them to avoid clutter and improve maintainability.
-from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD
+from city_scrapers_core.constants import COMMITTEE
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from city_scrapers_core.constants import COMMITTEE, NOT_CLASSIFIED, BOARD | |
from city_scrapers_core.constants import COMMITTEE |
🧰 Tools
🪛 Ruff (0.8.2)
5-5: city_scrapers_core.constants.NOT_CLASSIFIED
imported but unused
Remove unused import
(F401)
5-5: city_scrapers_core.constants.BOARD
imported but unused
Remove unused import
(F401)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a small update of the meeting source URL, this spider should be good to merge. Great job.
return COMMITTEE | ||
elif "Board" in item["name"]: | ||
return BOARD | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please verify our classification types? We noticed Records Commission Meeting
in our API crawl results, indicating that Commission
should be added to our list. Below is that data:
{
"unique":"D7WP9C638211",
"name":"Records Commission Meeting",
"current":"0",
"preliveoak":"",
"numberdate":"20240805",
"unid":"4BF1D938AECA113685258B7200638211"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please help verify if we can utilize this endpoint?
https://go.boarddocs.com/oh/cps/Board.nsf/BD-GETMeetingsListForSEO?open&0.8255721819497781
We noticed it contains a description
field that could be valuable for extracting additional information such as meeting time, location, and other relevant details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the CI checks are still failing. Before pushing your commits, please run these code quality checks locally:
pipenv run isort . --check-only # Check import sorting
pipenv run black . --check # Check code formatting
pipenv run flake8 . # Check code style
Here's the polished message: Hi @vsspnkkvr, I'd like to discuss the endpoint comparison for the BoardDocs API. I've analyzed two endpoints:
The first endpoint, which I found on the agency's main page (https://go.boarddocs.com/oh/cps/Board.nsf/Public), appears to provide more comprehensive data. Here's a comparison of the response data: Endpoint 1 (SEO) returns: {
"Name": "Budget, Finance and Growth Committee Meeting",
"Description": "Mary A. Ronan Education Center2651 Burnet Avenue, 45219LaunchED Room 111https://zoom.us/j/93500727903?pwd=UlBsWEZ0RlZYRy9IY1RwclpYMHZKQT09Zoom Webinar (Passcode: 627163)10:00 AM",
"LiveVideoURL": "https://zoom.us/j/93500727903?pwd=UlBsWEZ0RlZYRy9IY1RwclpYMHZKQT09",
"Unique": "DCWK5U503F2B",
"Date": "2025-01-17T00:00:00Z"
} Endpoint 2 returns: {
"unique": "DCWK5U503F2B",
"name": "Budget, Finance and Growth Committee Meeting",
"current": "1",
"preliveoak": "",
"numberdate": "20250117",
"unid": "D55E1908FC409DBB85258C1200503F2B"
} The SEO endpoint not only includes all the essential information from the second endpoint but also provides additional valuable data such as:
I recommend we implement this endpoint in our spider for better data quality. However, I'd appreciate your evaluation of this endpoint, particularly regarding any potential drawbacks or limitations you may have encountered. Have you had the chance to test it? Please share your thoughts about it. |
What's this PR do?
This PR creates a new scraper for the Cincinnati Board of Education website at URL: https://go.boarddocs.com/oh/cps/Board.nsf/Public#tab-welcome
Why are we doing this?
Scraper requested from this spreadsheet.
Steps to manually test
Monitor the output and ensure no errors are raised.
Inspect
test_output.csv
to ensure the data looks valid.Are there any smells or added technical debt to note?
It scrapes one of two API endpoints found after clicking on a meeting within the table.
Summary by CodeRabbit
New Features
Tests