-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scraper and Tests Build: Cincinnati City Council #9
base: main
Are you sure you want to change the base?
Conversation
WalkthroughA new spider class named Changes
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (3)
tests/test_cinoh_city_council.py (1)
23-75
: Add docstrings to test functionsAdding docstrings to test functions would improve test clarity and documentation.
Example for one test:
def test_count(): + """Verify the correct number of meetings are parsed from the test data.""" assert len(parsed_items) == 21
city_scrapers/spiders/cinoh_city_council.py (2)
11-12
: Remove unusedlink_types
attributeThe
link_types
attribute is defined but is empty and doesn't appear to be used in the spider. If it's not required, consider removing it to clean up the code.Apply this change to remove the unused attribute:
# Add the titles of any links not included in the scraped results - link_types = []
15-20
: Update docstring to reflect actual methods usedThe docstring refers to
_parse_title
and_parse_start
methods, which are not defined in this spider. Update the docstring to accurately reflect the implemented methods.Apply this change to correct the docstring:
""" - `parse_legistar` should always `yield` Meeting items. - - Change the `_parse_title`, `_parse_start`, etc methods to fit your scraping - needs. + Parses Legistar events and yields Meeting items. + + Uses helper methods `_parse_classification`, `_parse_location`, and `_parse_links` + to extract meeting details. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- city_scrapers/spiders/cinoh_city_council.py (1 hunks)
- tests/files/cinoh_city_council.json (1 hunks)
- tests/test_cinoh_city_council.py (1 hunks)
🧰 Additional context used
🪛 Ruff
tests/test_cinoh_city_council.py
6-6:
city_scrapers_core.constants.COMMITTEE
imported but unusedRemove unused import:
city_scrapers_core.constants.COMMITTEE
(F401)
🔇 Additional comments (3)
tests/test_cinoh_city_council.py (1)
63-66
: 🛠️ Refactor suggestionEnhance link testing coverage
Given the PR's concern about handling multiple links, consider adding test cases for meetings with multiple links to ensure proper handling.
Add test cases for multiple links:
def test_links_multiple(): # Test case for an item with multiple links multi_link_item = [item for item in parsed_items if len(item["links"]) > 1] if multi_link_item: assert isinstance(multi_link_item[0]["links"], list) assert all( {"title", "href"}.issubset(link.keys()) for link in multi_link_item[0]["links"] )Let's verify if there are any items with multiple links:
tests/files/cinoh_city_council.json (2)
1-1
: Comprehensive test fixture with good coverage.The test data includes a good variety of scenarios including:
- Regular meetings
- Special sessions
- Cancelled meetings
- Various committees
- Different availability states for agendas/minutes
1-1
: Verify the date format parsing in the scraper.The test data uses MM/DD/YYYY format for dates. Ensure that the scraper correctly handles this format and converts it to a consistent internal representation.
✅ Verification successful
The date format parsing appears to be handled correctly
Based on the test file (tests/test_cinoh_city_council.py), the scraper correctly parses dates from MM/DD/YYYY format into proper datetime objects. The test case verifies that "10/30/2024" is correctly parsed into
datetime(2024, 10, 30, 14, 0)
. Since this is inherited from the LegistarSpider base class (a core component), and the tests are passing, the date parsing implementation is working as expected.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for date parsing logic in the scraper ast-grep --pattern 'def parse_$_($$$) { $$$ date$$$ $$$ }'Length of output: 66
Script:
#!/bin/bash # Search for Python files that might contain date parsing logic fd -e py | xargs rg -l "datetime|strptime|date" # Then search for specific date parsing patterns rg -A 5 -B 5 "strptime|datetime.strptime|parse.*date"Length of output: 723
Script:
#!/bin/bash # Let's find the actual scraper file that processes these dates fd -e py | xargs rg -l "cincinnati|cinoh" # Then examine its content for date handling rg -A 10 -B 10 "class Cin.*Spider"Length of output: 1451
Script:
#!/bin/bash # Let's examine the LegistarSpider parent class since it's handling the date parsing rg -l "class LegistarSpider" # And then look at its implementation rg -A 20 -B 5 "class LegistarSpider"Length of output: 69
@@ -0,0 +1 @@ | |||
[{"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/30/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229949&GUID=40029B38-4ED1-4770-8B4F-E76E6D0FE583"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/29/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230062&GUID=DEBE1F8D-8F23-470B-8BBF-9C0B3F21E2C7"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/29/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229955&GUID=3D92E418-E420-456D-972A-85F33C138E1F"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/28/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229952&GUID=B1C0FDFB-15D4-411F-A817-A25E7B07ADFA"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/23/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229959&GUID=47066CAE-AF74-45BF-914A-830667219A0C"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Equitable Growth & Housing", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47053&GUID=2F284759-56BA-4D9D-B640-E02600D83497"}, "Meeting Date": "10/22/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230014&GUID=D3A25E8D-7EBB-4BD2-9930-29B6CCB3D68D"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Climate, Environment & Infrastructure", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47054&GUID=05A8515F-E080-4889-AF46-16001666DA90"}, "Meeting Date": "10/22/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230015&GUID=739A5464-04D7-41C4-B470-CE3E10B47462"}, "Meeting Time": "10:00 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/21/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229951&GUID=7B19DEC4-C430-4CD3-B462-BD6CBE4ABD49"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/17/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300 SPECIAL SESSION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/16/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/15/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/15/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/14/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/9/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Equitable Growth & Housing", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47053&GUID=2F284759-56BA-4D9D-B640-E02600D83497"}, "Meeting Date": "10/8/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Climate, Environment & Infrastructure", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47054&GUID=05A8515F-E080-4889-AF46-16001666DA90"}, "Meeting Date": "10/8/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Meeting Time": "10:00 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/7/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/2/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Meeting Time": "5:30 PM", "Meeting Location": "Sayler Park Recreation Center, 6720 Home City Avenue, Cincinnati, Ohio 45233 SPECIAL MEETING", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Agenda Packet": "Not\u00a0available", "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Standardize meeting location format.
The meeting location field currently mixes the base location with status information (e.g., "NOTICE OF CANCELLATION", "SPECIAL SESSION"). Consider separating these into distinct fields for better parsing and consistency.
- "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION",
+ "Meeting Location": "Council Chambers, Room 300",
+ "Meeting Status": "CANCELLED",
Committable suggestion was skipped due to low confidence.
🛠️ Refactor suggestion
Consider standardizing "Not available" values.
Currently, unavailable items are represented as string "Not\u00a0available". Consider using null
values instead, which is more idiomatic in JSON and allows for easier programmatic handling.
- "Agenda": "Not\u00a0available",
+ "Agenda": null,
Committable suggestion was skipped due to low confidence.
class CinohCityCouncilSpider(LegistarSpider): | ||
name = "cinoh_city_council" | ||
agency = "Cincinnati City Council" | ||
timezone = "America/Chicago" | ||
start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mismatch between PR objectives and code – City mismatch
The code is implementing a spider for "Cincinnati City Council" as indicated by the agency
, start_urls
, name
, and class name. However, the PR objectives mention updating the scraper for Cleveland City Council. Please verify and update the code to align with the PR objectives.
Apply the following changes to update the code:
-class CinohCityCouncilSpider(LegistarSpider):
+class ClevelandCityCouncilSpider(LegistarSpider):
name = "cinoh_city_council"
- agency = "Cincinnati City Council"
+ name = "cle_city_council"
+ agency = "Cleveland City Council"
- timezone = "America/Chicago"
+ timezone = "America/New_York"
- start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"]
+ start_urls = ["https://clevelandoh.legistar.com/Calendar.aspx"]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class CinohCityCouncilSpider(LegistarSpider): | |
name = "cinoh_city_council" | |
agency = "Cincinnati City Council" | |
timezone = "America/Chicago" | |
start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"] | |
class ClevelandCityCouncilSpider(LegistarSpider): | |
name = "cle_city_council" | |
agency = "Cleveland City Council" | |
timezone = "America/New_York" | |
start_urls = ["https://clevelandoh.legistar.com/Calendar.aspx"] |
if obj.get("Name"): | ||
links.append({"title": "meeting page", "href": obj["Name"]["url"]}) | ||
elif obj.get("iCalendar"): | ||
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | ||
elif obj.get("Meeting Details"): | ||
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}) | ||
elif obj.get("Agenda"): | ||
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) | ||
elif obj.get("Agenda Packet"): | ||
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}) | ||
elif obj.get("Minutes"): | ||
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) | ||
elif obj.get("Video"): | ||
links.append({"title": "Video", "href": obj["Video"]["url"]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify link parsing to include all available links
Currently, the _parse_links
method uses elif
statements, so only the first available link is added. To include all available links, change elif
to if
.
Apply this change to include all relevant links:
links = []
if obj.get("Name") and obj["Name"].get("url"):
links.append({"title": "Meeting Page", "href": obj["Name"]["url"]})
- elif obj.get("iCalendar") and obj["iCalendar"].get("url"):
+ if obj.get("iCalendar") and obj["iCalendar"].get("url"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
- elif obj.get("Meeting Details") and obj["Meeting Details"].get("url"):
+ if obj.get("Meeting Details") and obj["Meeting Details"].get("url"):
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
- elif obj.get("Agenda") and obj["Agenda"].get("url"):
+ if obj.get("Agenda") and obj["Agenda"].get("url"):
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
- elif obj.get("Agenda Packet") and obj["Agenda Packet"].get("url"):
+ if obj.get("Agenda Packet") and obj["Agenda Packet"].get("url"):
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
- elif obj.get("Minutes") and obj["Minutes"].get("url"):
+ if obj.get("Minutes") and obj["Minutes"].get("url"):
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
- elif obj.get("Video") and obj["Video"].get("url"):
+ if obj.get("Video") and obj["Video"].get("url"):
links.append({"title": "Video", "href": obj["Video"]["url"]})
return links
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if obj.get("Name"): | |
links.append({"title": "meeting page", "href": obj["Name"]["url"]}) | |
elif obj.get("iCalendar"): | |
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | |
elif obj.get("Meeting Details"): | |
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}) | |
elif obj.get("Agenda"): | |
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) | |
elif obj.get("Agenda Packet"): | |
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}) | |
elif obj.get("Minutes"): | |
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) | |
elif obj.get("Video"): | |
links.append({"title": "Video", "href": obj["Video"]["url"]}) | |
if obj.get("Name"): | |
links.append({"title": "meeting page", "href": obj["Name"]["url"]}) | |
if obj.get("iCalendar"): | |
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | |
if obj.get("Meeting Details"): | |
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}) | |
if obj.get("Agenda"): | |
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) | |
if obj.get("Agenda Packet"): | |
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}) | |
if obj.get("Minutes"): | |
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) | |
if obj.get("Video"): | |
links.append({"title": "Video", "href": obj["Video"]["url"]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work here!
I would improve the _parse_links
method so it accurately captures all available links. Also, I would add another test case for the COMMITTEE
classification. The docstrings could also use some updating instead of just being the generated string. And finally, the PR description talks about Cleveland City Council but this PR is supposed to be for Cincinnati City Council I believe. Double check to make sure everything is correct there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are issues with "timezone" and "status". Please double check!
class CinohCityCouncilSpider(LegistarSpider): | ||
name = "cinoh_city_council" | ||
agency = "Cincinnati City Council" | ||
timezone = "America/Chicago" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double check "what is the correct timezone for Cincinnati city"?
source=self.legistar_source(obj), | ||
) | ||
|
||
meeting["status"] = self._get_status(meeting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the meeting location contains the text "NOTICE OF CANCELLATION", for example the last meeting of the year 2024 Equitable Growth & Housing on 12/31/2024, meaning that this meeting is cancelled. So you need to override the method _parse_status(meeting)
instead of using the inherited method self._get_status(meeting)
to return status "CANCELLED"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
city_scrapers/spiders/cinoh_city_council.py (3)
20-32
:⚠️ Potential issueAdd error handling for missing keys in meeting data.
Direct dictionary access without safety checks could raise KeyError exceptions.
Apply these changes to safely handle missing data:
meeting = Meeting( - title=obj["Name"]["label"], + title=obj.get("Name", {}).get("label", ""), description="", classification=self._parse_classification(obj), start=self.legistar_start(obj), end=None, all_day=False, time_notes="", status=self._parse_status(obj), location=self._parse_location(obj), links=self._parse_links(obj), - source=self.legistar_source(obj), + source=self.legistar_source(obj) or "", )
50-55
:⚠️ Potential issueAdd error handling for location parsing.
Direct access to
obj["Meeting Location"]
could raise KeyError.def _parse_location(self, obj): - room = obj["Meeting Location"] + room = obj.get("Meeting Location", "Location not provided") return { "address": "801 Plum St. Cincinnati, OH 45202", - "name": f"{room}" + "name": room }
57-74
:⚠️ Potential issueFix link parsing to include all available links.
The current implementation using separate
if
statements withreturn
only captures the first matching link. This means potentially important links are being missed.def _parse_links(self, obj): - - if obj.get("Name"): - return [{"title": "meeting page", "href": obj["Name"]["url"]}] - if obj.get("iCalendar"): - return [{"title": "iCalendar", "href": obj["iCalendar"]["url"]}] - if obj.get("Meeting Details"): - return [{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}] - if obj.get("Agenda"): - return [{"title": "Agenda", "href": obj["Agenda"]["url"]}] - if obj.get("Agenda Packet"): - return [{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}] - if obj.get("Minutes"): - return [{"title": "Minutes", "href": obj["Minutes"]["url"]}] - if obj.get("Video"): - return [{"title": "Video", "href": obj["Video"]["url"]}] - - + links = [] + if obj.get("Name", {}).get("url"): + links.append({"title": "Meeting Page", "href": obj["Name"]["url"]}) + if obj.get("iCalendar", {}).get("url"): + links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) + if obj.get("Meeting Details", {}).get("url"): + links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}) + if obj.get("Agenda", {}).get("url"): + links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) + if obj.get("Agenda Packet", {}).get("url"): + links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}) + if obj.get("Minutes", {}).get("url"): + links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) + if obj.get("Video", {}).get("url"): + links.append({"title": "Video", "href": obj["Video"]["url"]}) + return links
🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
13-18
: Improve docstring clarity and fix typos.The docstring could be more informative and contains typos:
- "calander" should be "calendar"
- Add information about the return type and parameters
Update the docstring to:
- """ - Parse upcoming and past meetings from the Cincinnati City Council meetings table. - - Oftentimes, the columns: meeting details, agenda, minutes, and video are left blank on the calander - but when they are, they are in the form of links. - """ + """Parse upcoming and past meetings from the Cincinnati City Council meetings table. + + Args: + response: JSON response from Legistar API containing meeting data + + Yields: + Meeting: Scraped meeting items with details including title, time, location, and links. + + Note: + The columns (meeting details, agenda, minutes, and video) are sometimes empty on the calendar, + but when present, they contain links to the respective resources. + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
city_scrapers/spiders/cinoh_city_council.py
(1 hunks)tests/test_cinoh_city_council.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_cinoh_city_council.py
🔇 Additional comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
6-10
: LGTM! Configuration is correct.The spider is properly configured with the correct timezone for Cincinnati and the appropriate Legistar calendar URL.
def _parse_status(self, obj): | ||
if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION": | ||
return "cancelled" | ||
else: | ||
return "tentative" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance cancellation detection logic.
The current implementation only checks for one specific cancellation notice format. Consider:
- Case-insensitive matching
- Different cancellation notice formats
- Default to using the inherited
self._get_status()
for other cases
def _parse_status(self, obj):
- if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION":
+ location = obj.get("Meeting Location", "").lower()
+ if "notice of cancellation" in location or "canceled" in location:
return "cancelled"
- else:
- return "tentative"
+ return self._get_status(obj)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _parse_status(self, obj): | |
if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION": | |
return "cancelled" | |
else: | |
return "tentative" | |
def _parse_status(self, obj): | |
location = obj.get("Meeting Location", "").lower() | |
if "notice of cancellation" in location or "canceled" in location: | |
return "cancelled" | |
return self._get_status(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have some problems with the links parsing. Please fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else: | ||
return COMMITTEE | ||
|
||
def _parse_status(self, obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the meeting date is earlier than the current date, return the status PASSED.
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | ||
|
||
if obj.get("Meeting Details") == "Meeting\u00a0details": | ||
links.append({"title": "Meeting Details", "href": "Not Available"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a document is unavailable, exclude it from the list of links. We can refactor this block by creating an array of possible document names and iterating through them, like this:
for document_name in ["Meeting Details", "Agenda", ...]:
document = obj.get(document_name)
if document.get("url"):
links.append({"title": document_name, "href": document["url"]})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments have been fixed but we still have failing lint issues. Runpipenv run black .
and other lint commands to fix the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
22-34
:⚠️ Potential issueAdd error handling for missing keys in meeting data.
Direct dictionary access without checking for key existence could raise KeyError exceptions.
Apply these changes to safely handle missing keys:
meeting = Meeting( - title=obj["Name"]["label"], + title=obj.get("Name", {}).get("label", "No title provided"), description="", classification=self._parse_classification(obj), start=self.legistar_start(obj), end=None, all_day=False, time_notes="", status=self._parse_status(obj), location=self._parse_location(obj), links=self._parse_links(obj), - source=self.legistar_source(obj), + source=self.legistar_source(obj) if obj else "", )
🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
14-20
: Enhance method documentation.The docstring could be more descriptive about the method's parameters, return value, and specific behavior.
Apply this change:
def parse_legistar(self, response): """ - Parse upcoming and past meetings from the Cincinnati City Council meetings table. - - Oftentimes, the columns: meeting details, agenda, minutes, and video are left blank on the calander - but when they are, they are in the form of links. + Parse meeting information from the Legistar API response. + + Args: + response: JSON response from Legistar API containing meeting data + + Yields: + Meeting: Scraped meeting items with details including title, date, location, + status, and associated document links + + Note: + Meeting documents (agenda, minutes, video) may be unavailable or present as links + in the calendar view. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
city_scrapers/spiders/cinoh_city_council.py
(1 hunks)tests/test_cinoh_city_council.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_cinoh_city_council.py
🧰 Additional context used
🪛 Ruff (0.8.2)
city_scrapers/spiders/cinoh_city_council.py
75-75: Use obj.get("Meeting Details") != "Meeting\xa0details"
instead of not obj.get("Meeting Details") == "Meeting\xa0details"
Replace with !=
operator
(SIM201)
80-80: Use obj.get("Agenda") != "Not\xa0available"
instead of not obj.get("Agenda") == "Not\xa0available"
Replace with !=
operator
(SIM201)
83-83: Use obj.get("Agenda Packet") != "Not\xa0available"
instead of not obj.get("Agenda Packet") == "Not\xa0available"
Replace with !=
operator
(SIM201)
88-88: Use obj.get("Minutes") != "Not\xa0available"
instead of not obj.get("Minutes") == "Not\xa0available"
Replace with !=
operator
(SIM201)
91-91: Use obj.get("Video") != "Not\xa0available"
instead of not obj.get("Video") == "Not\xa0available"
Replace with !=
operator
(SIM201)
🔇 Additional comments (2)
city_scrapers/spiders/cinoh_city_council.py (2)
8-12
: LGTM! Configuration is accurate.The timezone is correctly set to "America/New_York" for Cincinnati, and other configuration parameters are properly defined.
46-60
: 🛠️ Refactor suggestionEnhance meeting status detection.
The status detection needs improvement to handle:
- Case-insensitive cancellation notice matching
- Different cancellation notice formats
- Past meetings (already implemented)
Apply this change:
def _parse_status(self, obj): date = obj["Meeting Date"] parsed_date = parse(date, fuzzy=True) - if ( - obj["Meeting Location"] - == "Council Chambers, Room 300 NOTICE OF CANCELLATION" - ): + location = obj.get("Meeting Location", "").lower() + if "notice of cancellation" in location or "canceled" in location: return "cancelled" elif parsed_date < datetime.today(): return "passed" else: return "tentative"Likely invalid or redundant comment.
def _parse_links(self, obj): | ||
|
||
links = [] | ||
|
||
if obj.get("Name"): | ||
links.append({"title": "meeting page", "href": obj["Name"]["url"]}) | ||
|
||
if obj.get("iCalendar"): | ||
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | ||
|
||
if not obj.get("Meeting Details") == "Meeting\u00a0details": | ||
links.append( | ||
{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]} | ||
) | ||
|
||
if not obj.get("Agenda") == "Not\u00a0available": | ||
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) | ||
|
||
if not obj.get("Agenda Packet") == "Not\u00a0available": | ||
links.append( | ||
{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]} | ||
) | ||
|
||
if not obj.get("Minutes") == "Not\u00a0available": | ||
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) | ||
|
||
if not obj.get("Video") == "Not\u00a0available": | ||
links.append({"title": "Video", "href": obj["Video"]["url"]}) | ||
|
||
return links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor link parsing for better maintainability and reliability.
The current implementation has several issues:
- Inefficient comparison operators (using
not ==
instead of!=
) - Repetitive code structure
- Unsafe dictionary access
Apply this refactoring to improve the code:
def _parse_links(self, obj):
links = []
-
- if obj.get("Name"):
- links.append({"title": "meeting page", "href": obj["Name"]["url"]})
-
- if obj.get("iCalendar"):
- links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
-
- if not obj.get("Meeting Details") == "Meeting\u00a0details":
- links.append(
- {"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}
- )
-
- if not obj.get("Agenda") == "Not\u00a0available":
- links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
-
- if not obj.get("Agenda Packet") == "Not\u00a0available":
- links.append(
- {"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}
- )
-
- if not obj.get("Minutes") == "Not\u00a0available":
- links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
-
- if not obj.get("Video") == "Not\u00a0available":
- links.append({"title": "Video", "href": obj["Video"]["url"]})
+ documents = {
+ "Name": "meeting page",
+ "iCalendar": "iCalendar",
+ "Meeting Details": "Meeting Details",
+ "Agenda": "Agenda",
+ "Agenda Packet": "Agenda Packet",
+ "Minutes": "Minutes",
+ "Video": "Video"
+ }
+
+ for key, title in documents.items():
+ doc = obj.get(key, {})
+ if isinstance(doc, dict) and doc.get("url"):
+ if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details":
+ continue
+ if doc.get("label") != "Not\u00a0available":
+ links.append({"title": title, "href": doc["url"]})
return links
This refactoring:
- Uses a dictionary to define document types
- Handles missing keys safely
- Properly checks for unavailable documents
- Reduces code duplication
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _parse_links(self, obj): | |
links = [] | |
if obj.get("Name"): | |
links.append({"title": "meeting page", "href": obj["Name"]["url"]}) | |
if obj.get("iCalendar"): | |
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) | |
if not obj.get("Meeting Details") == "Meeting\u00a0details": | |
links.append( | |
{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]} | |
) | |
if not obj.get("Agenda") == "Not\u00a0available": | |
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) | |
if not obj.get("Agenda Packet") == "Not\u00a0available": | |
links.append( | |
{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]} | |
) | |
if not obj.get("Minutes") == "Not\u00a0available": | |
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) | |
if not obj.get("Video") == "Not\u00a0available": | |
links.append({"title": "Video", "href": obj["Video"]["url"]}) | |
return links | |
def _parse_links(self, obj): | |
links = [] | |
documents = { | |
"Name": "meeting page", | |
"iCalendar": "iCalendar", | |
"Meeting Details": "Meeting Details", | |
"Agenda": "Agenda", | |
"Agenda Packet": "Agenda Packet", | |
"Minutes": "Minutes", | |
"Video": "Video" | |
} | |
for key, title in documents.items(): | |
doc = obj.get(key, {}) | |
if isinstance(doc, dict) and doc.get("url"): | |
if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details": | |
continue | |
if doc.get("label") != "Not\u00a0available": | |
links.append({"title": title, "href": doc["url"]}) | |
return links |
🧰 Tools
🪛 Ruff (0.8.2)
75-75: Use obj.get("Meeting Details") != "Meeting\xa0details"
instead of not obj.get("Meeting Details") == "Meeting\xa0details"
Replace with !=
operator
(SIM201)
80-80: Use obj.get("Agenda") != "Not\xa0available"
instead of not obj.get("Agenda") == "Not\xa0available"
Replace with !=
operator
(SIM201)
83-83: Use obj.get("Agenda Packet") != "Not\xa0available"
instead of not obj.get("Agenda Packet") == "Not\xa0available"
Replace with !=
operator
(SIM201)
88-88: Use obj.get("Minutes") != "Not\xa0available"
instead of not obj.get("Minutes") == "Not\xa0available"
Replace with !=
operator
(SIM201)
91-91: Use obj.get("Video") != "Not\xa0available"
instead of not obj.get("Video") == "Not\xa0available"
Replace with !=
operator
(SIM201)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the CI checks are still failing. Before pushing your commits, please run these code quality checks locally:
pipenv run isort . --check-only # Check import sorting
pipenv run black . --check # Check code formatting
pipenv run flake8 . # Check code style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
city_scrapers/spiders/cinoh_city_council.py (4)
25-37
:⚠️ Potential issueAdd error handling for missing keys.
The current implementation could raise KeyError exceptions if the response data is malformed.
Apply this change to handle missing keys safely:
meeting = Meeting( - title=obj["Name"]["label"], + title=obj.get("Name", {}).get("label", "No title provided"), description="", classification=self._parse_classification(obj), start=self.legistar_start(obj), end=None, all_day=False, time_notes="", status=self._parse_status(obj), location=self._parse_location(obj), links=self._parse_links(obj), source=self.legistar_source(obj), )
49-63
: 🛠️ Refactor suggestionEnhance cancellation detection logic.
The current implementation:
- Has unsafe key access
- Uses a very specific cancellation notice format
- Could miss other cancellation indicators
def _parse_status(self, obj): date = obj["Meeting Date"] parsed_date = parse(date, fuzzy=True) - if ( - obj["Meeting Location"] - == "Council Chambers, Room 300 NOTICE OF CANCELLATION" - ): + location = obj.get("Meeting Location", "").lower() + if "notice of cancellation" in location or "canceled" in location: return "cancelled" elif parsed_date < datetime.today(): return "passed" else: return "tentative"
64-66
:⚠️ Potential issueAdd error handling for missing location.
The method could raise KeyError if the "Meeting Location" key is missing.
def _parse_location(self, obj): - room = obj["Meeting Location"] + room = obj.get("Meeting Location", "No room provided") return {"address": "801 Plum St. Cincinnati, OH 45202", "name": f"{room}"}
68-97
: 🛠️ Refactor suggestionRefactor link parsing for better maintainability and reliability.
Current implementation has several issues:
- Unsafe dictionary access
- Inefficient comparison operators
- Repetitive code structure
def _parse_links(self, obj): links = [] - - if obj.get("Name"): - links.append({"title": "meeting page", "href": obj["Name"]["url"]}) - - if obj.get("iCalendar"): - links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]}) - - if not obj.get("Meeting Details") == "Meeting\u00a0details": - links.append( - {"title": "Meeting Details", "href": obj["Meeting Details"]["url"]} - ) - - if not obj.get("Agenda") == "Not\u00a0available": - links.append({"title": "Agenda", "href": obj["Agenda"]["url"]}) - - if not obj.get("Agenda Packet") == "Not\u00a0available": - links.append( - {"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]} - ) - - if not obj.get("Minutes") == "Not\u00a0available": - links.append({"title": "Minutes", "href": obj["Minutes"]["url"]}) - - if not obj.get("Video") == "Not\u00a0available": - links.append({"title": "Video", "href": obj["Video"]["url"]}) + documents = { + "Name": "meeting page", + "iCalendar": "iCalendar", + "Meeting Details": "Meeting Details", + "Agenda": "Agenda", + "Agenda Packet": "Agenda Packet", + "Minutes": "Minutes", + "Video": "Video" + } + + for key, title in documents.items(): + doc = obj.get(key, {}) + if isinstance(doc, dict) and doc.get("url"): + if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details": + continue + if doc.get("label") != "Not\u00a0available": + links.append({"title": title, "href": doc["url"]}) return linksThis refactoring:
- Uses a dictionary to define document types
- Handles missing keys safely
- Properly checks for unavailable documents
- Reduces code duplication
🧰 Tools
🪛 Ruff (0.8.2)
78-78: Use
obj.get("Meeting Details") != "Meeting\xa0details"
instead ofnot obj.get("Meeting Details") == "Meeting\xa0details"
Replace with
!=
operator(SIM201)
83-83: Use
obj.get("Agenda") != "Not\xa0available"
instead ofnot obj.get("Agenda") == "Not\xa0available"
Replace with
!=
operator(SIM201)
86-86: Use
obj.get("Agenda Packet") != "Not\xa0available"
instead ofnot obj.get("Agenda Packet") == "Not\xa0available"
Replace with
!=
operator(SIM201)
91-91: Use
obj.get("Minutes") != "Not\xa0available"
instead ofnot obj.get("Minutes") == "Not\xa0available"
Replace with
!=
operator(SIM201)
94-94: Use
obj.get("Video") != "Not\xa0available"
instead ofnot obj.get("Video") == "Not\xa0available"
Replace with
!=
operator(SIM201)
🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
16-23
: Enhance docstring with more details.Add information about:
- The method's parameters
- The return value (yielded Meeting objects)
- The structure of the response object
def parse_legistar(self, response): """ - Parse upcoming and past meetings from the - Cincinnati City Council meetings table. + Parse upcoming and past meetings from the Cincinnati City Council meetings table. + + Args: + response: JSON response from Legistar API containing meeting data + + Yields: + Meeting: Meeting object containing parsed meeting information Oftentimes, the columns: meeting details, agenda, minutes, and video are left blank on the calander but when they are, they are in the form of links. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
city_scrapers/spiders/cinoh_city_council.py
(1 hunks)tests/test_cinoh_city_council.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_cinoh_city_council.py
🧰 Additional context used
🪛 Ruff (0.8.2)
city_scrapers/spiders/cinoh_city_council.py
78-78: Use obj.get("Meeting Details") != "Meeting\xa0details"
instead of not obj.get("Meeting Details") == "Meeting\xa0details"
Replace with !=
operator
(SIM201)
83-83: Use obj.get("Agenda") != "Not\xa0available"
instead of not obj.get("Agenda") == "Not\xa0available"
Replace with !=
operator
(SIM201)
86-86: Use obj.get("Agenda Packet") != "Not\xa0available"
instead of not obj.get("Agenda Packet") == "Not\xa0available"
Replace with !=
operator
(SIM201)
91-91: Use obj.get("Minutes") != "Not\xa0available"
instead of not obj.get("Minutes") == "Not\xa0available"
Replace with !=
operator
(SIM201)
94-94: Use obj.get("Video") != "Not\xa0available"
instead of not obj.get("Video") == "Not\xa0available"
Replace with !=
operator
(SIM201)
🔇 Additional comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)
9-13
: LGTM! Configuration is accurate.The timezone "America/New_York" is correct for Cincinnati, and the Legistar calendar URL is properly configured.
def _parse_classification(self, obj): | ||
if obj["Name"]["label"] == "Cincinnati City Council": | ||
return CITY_COUNCIL | ||
else: | ||
return COMMITTEE | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing keys in classification.
The method could raise KeyError if the "Name" or "label" keys are missing.
def _parse_classification(self, obj):
- if obj["Name"]["label"] == "Cincinnati City Council":
+ if obj.get("Name", {}).get("label") == "Cincinnati City Council":
return CITY_COUNCIL
else:
return COMMITTEE
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _parse_classification(self, obj): | |
if obj["Name"]["label"] == "Cincinnati City Council": | |
return CITY_COUNCIL | |
else: | |
return COMMITTEE | |
def _parse_classification(self, obj): | |
if obj.get("Name", {}).get("label") == "Cincinnati City Council": | |
return CITY_COUNCIL | |
else: | |
return COMMITTEE |
What's this PR do?
This PR updates the scraper for Cincinnati City Council because of changes to how they display their meeting schedule.
Why are we doing this?
Scraper requested from this spreadsheet.
Steps to manually test
Monitor the output and ensure no errors are raised.
Inspect
test_output.csv
to ensure the data looks valid.Are there any smells or added technical debt to note?
The one thing complex about this new parsing routine is the amount of links that need to be parsed. Right now they are all just put into an array like how the guide directs, but this may not be optimal or helpful. Some sort of link sorting method could help make that data more useful.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Tests