Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scraper and Tests Build: Cincinnati City Council #9

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

vsspnkkvr
Copy link

@vsspnkkvr vsspnkkvr commented Oct 28, 2024

What's this PR do?

This PR updates the scraper for Cincinnati City Council because of changes to how they display their meeting schedule.

Why are we doing this?

Scraper requested from this spreadsheet.

Steps to manually test

  1. Ensure the project is installed:
pipenv sync --dev
  1. Activate the virtual env and enter the pipenv shell:
pipenv shell
  1. Run the spider:
scrapy crawl cinoh_city_council -O test_output.csv
  1. Monitor the output and ensure no errors are raised.

  2. Inspect test_output.csv to ensure the data looks valid.

Are there any smells or added technical debt to note?

The one thing complex about this new parsing routine is the amount of links that need to be parsed. Right now they are all just put into an array like how the guide directs, but this may not be optimal or helpful. Some sort of link sorting method could help make that data more useful.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new spider for scraping meeting information from the Cincinnati City Council's Legistar calendar.
    • Added a JSON file containing comprehensive meeting details, including dates, times, locations, and relevant links.
  • Tests

    • Implemented a suite of unit tests for the new spider, validating various attributes of the parsed meeting data, including counts, titles, and statuses.

Copy link
Contributor

coderabbitai bot commented Oct 28, 2024

Walkthrough

A new spider class named CinohCityCouncilSpider has been added to scrape meeting information from the Cincinnati City Council's Legistar calendar. This class includes methods for parsing meeting details, including classification, location, and relevant links. Additionally, a JSON file containing sample meeting data has been introduced, along with a test suite for the spider, validating various aspects of the parsed meeting items using the pytest framework.

Changes

File Change Summary
city_scrapers/spiders/cinoh_city_council.py Added CinohCityCouncilSpider class with methods for parsing meeting data from Legistar.
tests/files/cinoh_city_council.json Added a JSON file containing meeting data for Cincinnati City Council.
tests/test_cinoh_city_council.py Introduced unit tests for CinohCityCouncilSpider, verifying various attributes of parsed meetings.

Poem

In the city where council meets,
A spider weaves its web of feats.
With data fresh, it spins and twirls,
Gathering news for boys and girls.
Meetings in JSON, neat and bright,
A hop and a skip, all feels just right! 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vsspnkkvr vsspnkkvr changed the title Scraper and Tests Build: Cinoh City Council Scraper and Tests Build: Cincinnati City Council Oct 28, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (3)
tests/test_cinoh_city_council.py (1)

23-75: Add docstrings to test functions

Adding docstrings to test functions would improve test clarity and documentation.

Example for one test:

 def test_count():
+    """Verify the correct number of meetings are parsed from the test data."""
     assert len(parsed_items) == 21
city_scrapers/spiders/cinoh_city_council.py (2)

11-12: Remove unused link_types attribute

The link_types attribute is defined but is empty and doesn't appear to be used in the spider. If it's not required, consider removing it to clean up the code.

Apply this change to remove the unused attribute:

     # Add the titles of any links not included in the scraped results
-    link_types = []

15-20: Update docstring to reflect actual methods used

The docstring refers to _parse_title and _parse_start methods, which are not defined in this spider. Update the docstring to accurately reflect the implemented methods.

Apply this change to correct the docstring:

         """
-        `parse_legistar` should always `yield` Meeting items.
-
-        Change the `_parse_title`, `_parse_start`, etc methods to fit your scraping
-        needs.
+        Parses Legistar events and yields Meeting items.
+
+        Uses helper methods `_parse_classification`, `_parse_location`, and `_parse_links`
+        to extract meeting details.
         """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 975f98e and cd611e5.

📒 Files selected for processing (3)
  • city_scrapers/spiders/cinoh_city_council.py (1 hunks)
  • tests/files/cinoh_city_council.json (1 hunks)
  • tests/test_cinoh_city_council.py (1 hunks)
🧰 Additional context used
🪛 Ruff
tests/test_cinoh_city_council.py

6-6: city_scrapers_core.constants.COMMITTEE imported but unused

Remove unused import: city_scrapers_core.constants.COMMITTEE

(F401)

🔇 Additional comments (3)
tests/test_cinoh_city_council.py (1)

63-66: 🛠️ Refactor suggestion

Enhance link testing coverage

Given the PR's concern about handling multiple links, consider adding test cases for meetings with multiple links to ensure proper handling.

Add test cases for multiple links:

def test_links_multiple():
    # Test case for an item with multiple links
    multi_link_item = [item for item in parsed_items if len(item["links"]) > 1]
    if multi_link_item:
        assert isinstance(multi_link_item[0]["links"], list)
        assert all(
            {"title", "href"}.issubset(link.keys())
            for link in multi_link_item[0]["links"]
        )

Let's verify if there are any items with multiple links:

tests/files/cinoh_city_council.json (2)

1-1: Comprehensive test fixture with good coverage.

The test data includes a good variety of scenarios including:

  • Regular meetings
  • Special sessions
  • Cancelled meetings
  • Various committees
  • Different availability states for agendas/minutes

1-1: Verify the date format parsing in the scraper.

The test data uses MM/DD/YYYY format for dates. Ensure that the scraper correctly handles this format and converts it to a consistent internal representation.

✅ Verification successful

The date format parsing appears to be handled correctly

Based on the test file (tests/test_cinoh_city_council.py), the scraper correctly parses dates from MM/DD/YYYY format into proper datetime objects. The test case verifies that "10/30/2024" is correctly parsed into datetime(2024, 10, 30, 14, 0). Since this is inherited from the LegistarSpider base class (a core component), and the tests are passing, the date parsing implementation is working as expected.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for date parsing logic in the scraper
ast-grep --pattern 'def parse_$_($$$) {
  $$$
  date$$$
  $$$
}'

Length of output: 66


Script:

#!/bin/bash
# Search for Python files that might contain date parsing logic
fd -e py | xargs rg -l "datetime|strptime|date"

# Then search for specific date parsing patterns
rg -A 5 -B 5 "strptime|datetime.strptime|parse.*date"

Length of output: 723


Script:

#!/bin/bash
# Let's find the actual scraper file that processes these dates
fd -e py | xargs rg -l "cincinnati|cinoh"

# Then examine its content for date handling
rg -A 10 -B 10 "class Cin.*Spider"

Length of output: 1451


Script:

#!/bin/bash
# Let's examine the LegistarSpider parent class since it's handling the date parsing
rg -l "class LegistarSpider"

# And then look at its implementation
rg -A 20 -B 5 "class LegistarSpider"

Length of output: 69

tests/test_cinoh_city_council.py Show resolved Hide resolved
tests/test_cinoh_city_council.py Show resolved Hide resolved
tests/test_cinoh_city_council.py Show resolved Hide resolved
tests/test_cinoh_city_council.py Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
[{"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/30/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229949&GUID=40029B38-4ED1-4770-8B4F-E76E6D0FE583"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/29/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230062&GUID=DEBE1F8D-8F23-470B-8BBF-9C0B3F21E2C7"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/29/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229955&GUID=3D92E418-E420-456D-972A-85F33C138E1F"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/28/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229952&GUID=B1C0FDFB-15D4-411F-A817-A25E7B07ADFA"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/23/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229959&GUID=47066CAE-AF74-45BF-914A-830667219A0C"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Equitable Growth & Housing", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47053&GUID=2F284759-56BA-4D9D-B640-E02600D83497"}, "Meeting Date": "10/22/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230014&GUID=D3A25E8D-7EBB-4BD2-9930-29B6CCB3D68D"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Climate, Environment & Infrastructure", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47054&GUID=05A8515F-E080-4889-AF46-16001666DA90"}, "Meeting Date": "10/22/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230015&GUID=739A5464-04D7-41C4-B470-CE3E10B47462"}, "Meeting Time": "10:00 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/21/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229951&GUID=7B19DEC4-C430-4CD3-B462-BD6CBE4ABD49"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": "Meeting\u00a0details", "Agenda": "Not\u00a0available", "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/17/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300 SPECIAL SESSION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1235477&GUID=0CC17DD2-8A13-4EC6-A533-F86D41F010D3"}, "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/16/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229958&GUID=94FF187B-82AF-490B-87EE-C9ECF62EBF84"}, "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/15/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1232596&GUID=1CAC132B-2224-4A15-9F79-058CF4F3E913"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/15/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229954&GUID=F227C97F-CA60-4F4B-8473-B93864250B47"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/14/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229950&GUID=C9733291-6F67-4456-B507-4FD3D037B229"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/9/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229957&GUID=0BEEEA8D-EDFB-4D14-810B-C8303A53436C"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Equitable Growth & Housing", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47053&GUID=2F284759-56BA-4D9D-B640-E02600D83497"}, "Meeting Date": "10/8/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1230013&GUID=4C858055-B215-47C4-8792-B26EEF66EFE5"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Climate, Environment & Infrastructure", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47054&GUID=05A8515F-E080-4889-AF46-16001666DA90"}, "Meeting Date": "10/8/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Meeting Time": "10:00 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1230012&GUID=6D6EACB3-C83A-4E42-890B-D4A9071298F9"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Budget and Finance Committee", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38077&GUID=207ADF11-05C1-4163-AF58-E1C0119342BC"}, "Meeting Date": "10/7/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Meeting Time": "1:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229948&GUID=CD6166C2-61D6-4722-AA2F-9B5264EA6199"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Cincinnati City Council", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=38076&GUID=1CA48415-BFFD-4857-8A93-48AA89BD31C6"}, "Meeting Date": "10/2/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Meeting Time": "2:00 PM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Agenda Packet": {"label": "Agenda\u00a0Packet", "url": "https://cincinnatioh.legistar.com/View.ashx?M=PA&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1229956&GUID=C4B25E37-661F-44A4-8E06-0554A1864CA4"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Meeting Time": "5:30 PM", "Meeting Location": "Sayler Park Recreation Center, 6720 Home City Avenue, Cincinnati, Ohio 45233 SPECIAL MEETING", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Agenda Packet": "Not\u00a0available", "Minutes": {"label": "Minutes", "url": "https://cincinnatioh.legistar.com/View.ashx?M=M&ID=1231940&GUID=D11B7136-1C39-474F-AC7A-ADF12060DA94"}, "Video": "Not\u00a0available"}, {"Name": {"label": "Healthy Neighborhoods", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47057&GUID=94D4079D-9CEF-48C0-9FC8-6B42FE58BC0C"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6"}, "Meeting Time": "12:30 PM", "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229960&GUID=991B9783-6CA2-4AE3-BC94-64020D6263E6"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}, {"Name": {"label": "Public Safety & Governance", "url": "https://cincinnatioh.legistar.com/DepartmentDetail.aspx?ID=47056&GUID=ADCB91E2-9BDF-482F-AAF6-6D5EF2A040FB"}, "Meeting Date": "10/1/2024", "iCalendar": {"url": "https://cincinnatioh.legistar.com/View.ashx?M=IC&ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3"}, "Meeting Time": "9:30 AM", "Meeting Location": "Council Chambers, Room 300", "Meeting Details": {"label": "Meeting\u00a0details", "url": "https://cincinnatioh.legistar.com/MeetingDetail.aspx?ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3&Options=info|&Search="}, "Agenda": {"label": "Agenda", "url": "https://cincinnatioh.legistar.com/View.ashx?M=A&ID=1229953&GUID=88A4DE50-7828-449F-8DAE-6B452906BBF3"}, "Agenda Packet": "Not\u00a0available", "Minutes": "Not\u00a0available", "Video": "Not\u00a0available"}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Standardize meeting location format.

The meeting location field currently mixes the base location with status information (e.g., "NOTICE OF CANCELLATION", "SPECIAL SESSION"). Consider separating these into distinct fields for better parsing and consistency.

- "Meeting Location": "Council Chambers, Room 300 NOTICE OF CANCELLATION",
+ "Meeting Location": "Council Chambers, Room 300",
+ "Meeting Status": "CANCELLED",

Committable suggestion was skipped due to low confidence.


🛠️ Refactor suggestion

Consider standardizing "Not available" values.

Currently, unavailable items are represented as string "Not\u00a0available". Consider using null values instead, which is more idiomatic in JSON and allows for easier programmatic handling.

- "Agenda": "Not\u00a0available",
+ "Agenda": null,

Committable suggestion was skipped due to low confidence.

city_scrapers/spiders/cinoh_city_council.py Show resolved Hide resolved
Comment on lines 6 to 10
class CinohCityCouncilSpider(LegistarSpider):
name = "cinoh_city_council"
agency = "Cincinnati City Council"
timezone = "America/Chicago"
start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Mismatch between PR objectives and code – City mismatch

The code is implementing a spider for "Cincinnati City Council" as indicated by the agency, start_urls, name, and class name. However, the PR objectives mention updating the scraper for Cleveland City Council. Please verify and update the code to align with the PR objectives.

Apply the following changes to update the code:

-class CinohCityCouncilSpider(LegistarSpider):
+class ClevelandCityCouncilSpider(LegistarSpider):

     name = "cinoh_city_council"
-    agency = "Cincinnati City Council"
+    name = "cle_city_council"
+    agency = "Cleveland City Council"
-    timezone = "America/Chicago"
+    timezone = "America/New_York"
-    start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"]
+    start_urls = ["https://clevelandoh.legistar.com/Calendar.aspx"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class CinohCityCouncilSpider(LegistarSpider):
name = "cinoh_city_council"
agency = "Cincinnati City Council"
timezone = "America/Chicago"
start_urls = ["https://cincinnatioh.legistar.com/Calendar.aspx"]
class ClevelandCityCouncilSpider(LegistarSpider):
name = "cle_city_council"
agency = "Cleveland City Council"
timezone = "America/New_York"
start_urls = ["https://clevelandoh.legistar.com/Calendar.aspx"]

Comment on lines 55 to 68
if obj.get("Name"):
links.append({"title": "meeting page", "href": obj["Name"]["url"]})
elif obj.get("iCalendar"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
elif obj.get("Meeting Details"):
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
elif obj.get("Agenda"):
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
elif obj.get("Agenda Packet"):
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
elif obj.get("Minutes"):
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
elif obj.get("Video"):
links.append({"title": "Video", "href": obj["Video"]["url"]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Modify link parsing to include all available links

Currently, the _parse_links method uses elif statements, so only the first available link is added. To include all available links, change elif to if.

Apply this change to include all relevant links:

         links = []
         if obj.get("Name") and obj["Name"].get("url"):
             links.append({"title": "Meeting Page", "href": obj["Name"]["url"]})
-        elif obj.get("iCalendar") and obj["iCalendar"].get("url"):
+        if obj.get("iCalendar") and obj["iCalendar"].get("url"):
             links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
-        elif obj.get("Meeting Details") and obj["Meeting Details"].get("url"):
+        if obj.get("Meeting Details") and obj["Meeting Details"].get("url"):
             links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
-        elif obj.get("Agenda") and obj["Agenda"].get("url"):
+        if obj.get("Agenda") and obj["Agenda"].get("url"):
             links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
-        elif obj.get("Agenda Packet") and obj["Agenda Packet"].get("url"):
+        if obj.get("Agenda Packet") and obj["Agenda Packet"].get("url"):
             links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
-        elif obj.get("Minutes") and obj["Minutes"].get("url"):
+        if obj.get("Minutes") and obj["Minutes"].get("url"):
             links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
-        elif obj.get("Video") and obj["Video"].get("url"):
+        if obj.get("Video") and obj["Video"].get("url"):
             links.append({"title": "Video", "href": obj["Video"]["url"]})
         return links
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if obj.get("Name"):
links.append({"title": "meeting page", "href": obj["Name"]["url"]})
elif obj.get("iCalendar"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
elif obj.get("Meeting Details"):
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
elif obj.get("Agenda"):
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
elif obj.get("Agenda Packet"):
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
elif obj.get("Minutes"):
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
elif obj.get("Video"):
links.append({"title": "Video", "href": obj["Video"]["url"]})
if obj.get("Name"):
links.append({"title": "meeting page", "href": obj["Name"]["url"]})
if obj.get("iCalendar"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
if obj.get("Meeting Details"):
links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
if obj.get("Agenda"):
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
if obj.get("Agenda Packet"):
links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
if obj.get("Minutes"):
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
if obj.get("Video"):
links.append({"title": "Video", "href": obj["Video"]["url"]})

city_scrapers/spiders/cinoh_city_council.py Outdated Show resolved Hide resolved
Copy link
Contributor

@cruznunez cruznunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work here!

I would improve the _parse_links method so it accurately captures all available links. Also, I would add another test case for the COMMITTEE classification. The docstrings could also use some updating instead of just being the generated string. And finally, the PR description talks about Cleveland City Council but this PR is supposed to be for Cincinnati City Council I believe. Double check to make sure everything is correct there.

city_scrapers/spiders/cinoh_city_council.py Outdated Show resolved Hide resolved
city_scrapers/spiders/cinoh_city_council.py Outdated Show resolved Hide resolved
tests/test_cinoh_city_council.py Show resolved Hide resolved
Copy link

@LienDang LienDang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are issues with "timezone" and "status". Please double check!

class CinohCityCouncilSpider(LegistarSpider):
name = "cinoh_city_council"
agency = "Cincinnati City Council"
timezone = "America/Chicago"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check "what is the correct timezone for Cincinnati city"?

source=self.legistar_source(obj),
)

meeting["status"] = self._get_status(meeting)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the meeting location contains the text "NOTICE OF CANCELLATION", for example the last meeting of the year 2024 Equitable Growth & Housing on 12/31/2024, meaning that this meeting is cancelled. So you need to override the method _parse_status(meeting) instead of using the inherited method self._get_status(meeting) to return status "CANCELLED"

@cruznunez cruznunez requested a review from lamle-ea January 7, 2025 18:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
city_scrapers/spiders/cinoh_city_council.py (3)

20-32: ⚠️ Potential issue

Add error handling for missing keys in meeting data.

Direct dictionary access without safety checks could raise KeyError exceptions.

Apply these changes to safely handle missing data:

             meeting = Meeting(
-                title=obj["Name"]["label"],
+                title=obj.get("Name", {}).get("label", ""),
                 description="",
                 classification=self._parse_classification(obj),
                 start=self.legistar_start(obj),
                 end=None,
                 all_day=False,
                 time_notes="",
                 status=self._parse_status(obj),
                 location=self._parse_location(obj),
                 links=self._parse_links(obj),
-                source=self.legistar_source(obj),
+                source=self.legistar_source(obj) or "",
             )

50-55: ⚠️ Potential issue

Add error handling for location parsing.

Direct access to obj["Meeting Location"] could raise KeyError.

     def _parse_location(self, obj):
-        room = obj["Meeting Location"]
+        room = obj.get("Meeting Location", "Location not provided")
         return {
             "address": "801 Plum St. Cincinnati, OH 45202",
-            "name": f"{room}"
+            "name": room
         }

57-74: ⚠️ Potential issue

Fix link parsing to include all available links.

The current implementation using separate if statements with return only captures the first matching link. This means potentially important links are being missed.

     def _parse_links(self, obj):
-        
-        if obj.get("Name"):
-            return [{"title": "meeting page", "href": obj["Name"]["url"]}]
-        if obj.get("iCalendar"):
-            return [{"title": "iCalendar", "href": obj["iCalendar"]["url"]}]
-        if obj.get("Meeting Details"):
-            return [{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}]
-        if obj.get("Agenda"):
-            return [{"title": "Agenda", "href": obj["Agenda"]["url"]}]
-        if obj.get("Agenda Packet"):
-            return [{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}]
-        if obj.get("Minutes"):
-            return [{"title": "Minutes", "href": obj["Minutes"]["url"]}]
-        if obj.get("Video"):
-            return [{"title": "Video", "href": obj["Video"]["url"]}]
-        
-
+        links = []
+        if obj.get("Name", {}).get("url"):
+            links.append({"title": "Meeting Page", "href": obj["Name"]["url"]})
+        if obj.get("iCalendar", {}).get("url"):
+            links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
+        if obj.get("Meeting Details", {}).get("url"):
+            links.append({"title": "Meeting Details", "href": obj["Meeting Details"]["url"]})
+        if obj.get("Agenda", {}).get("url"):
+            links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
+        if obj.get("Agenda Packet", {}).get("url"):
+            links.append({"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]})
+        if obj.get("Minutes", {}).get("url"):
+            links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
+        if obj.get("Video", {}).get("url"):
+            links.append({"title": "Video", "href": obj["Video"]["url"]})
+        return links
🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

13-18: Improve docstring clarity and fix typos.

The docstring could be more informative and contains typos:

  • "calander" should be "calendar"
  • Add information about the return type and parameters

Update the docstring to:

-        """
-        Parse upcoming and past meetings from the Cincinnati City Council meetings table.
-
-        Oftentimes, the columns: meeting details, agenda, minutes, and video are left blank on the calander
-        but when they are, they are in the form of links. 
-        """
+        """Parse upcoming and past meetings from the Cincinnati City Council meetings table.
+
+        Args:
+            response: JSON response from Legistar API containing meeting data
+
+        Yields:
+            Meeting: Scraped meeting items with details including title, time, location, and links.
+            
+        Note:
+            The columns (meeting details, agenda, minutes, and video) are sometimes empty on the calendar,
+            but when present, they contain links to the respective resources.
+        """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd611e5 and d508fb9.

📒 Files selected for processing (2)
  • city_scrapers/spiders/cinoh_city_council.py (1 hunks)
  • tests/test_cinoh_city_council.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cinoh_city_council.py
🔇 Additional comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

6-10: LGTM! Configuration is correct.

The spider is properly configured with the correct timezone for Cincinnati and the appropriate Legistar calendar URL.

Comment on lines 44 to 48
def _parse_status(self, obj):
if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION":
return "cancelled"
else:
return "tentative"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance cancellation detection logic.

The current implementation only checks for one specific cancellation notice format. Consider:

  1. Case-insensitive matching
  2. Different cancellation notice formats
  3. Default to using the inherited self._get_status() for other cases
     def _parse_status(self, obj):
-        if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION":
+        location = obj.get("Meeting Location", "").lower()
+        if "notice of cancellation" in location or "canceled" in location:
             return "cancelled"
-        else:
-            return "tentative"
+        return self._get_status(obj)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _parse_status(self, obj):
if obj["Meeting Location"] == "Council Chambers, Room 300 NOTICE OF CANCELLATION":
return "cancelled"
else:
return "tentative"
def _parse_status(self, obj):
location = obj.get("Meeting Location", "").lower()
if "notice of cancellation" in location or "canceled" in location:
return "cancelled"
return self._get_status(obj)

Copy link
Contributor

@cruznunez cruznunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have some problems with the links parsing. Please fix.

city_scrapers/spiders/cinoh_city_council.py Outdated Show resolved Hide resolved
Copy link
Contributor

@lamle-ea lamle-ea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format the code using the black code formatter to fix the styling issues.

Screenshot 2025-01-16 at 16 07 29

else:
return COMMITTEE

def _parse_status(self, obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the meeting date is earlier than the current date, return the status PASSED.

links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})

if obj.get("Meeting Details") == "Meeting\u00a0details":
links.append({"title": "Meeting Details", "href": "Not Available"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a document is unavailable, exclude it from the list of links. We can refactor this block by creating an array of possible document names and iterating through them, like this:

for document_name in ["Meeting Details", "Agenda", ...]:
   document = obj.get(document_name)
   if document.get("url"):
       links.append({"title": document_name, "href": document["url"]})

Copy link
Contributor

@cruznunez cruznunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments have been fixed but we still have failing lint issues. Runpipenv run black . and other lint commands to fix the issues.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

22-34: ⚠️ Potential issue

Add error handling for missing keys in meeting data.

Direct dictionary access without checking for key existence could raise KeyError exceptions.

Apply these changes to safely handle missing keys:

             meeting = Meeting(
-                title=obj["Name"]["label"],
+                title=obj.get("Name", {}).get("label", "No title provided"),
                 description="",
                 classification=self._parse_classification(obj),
                 start=self.legistar_start(obj),
                 end=None,
                 all_day=False,
                 time_notes="",
                 status=self._parse_status(obj),
                 location=self._parse_location(obj),
                 links=self._parse_links(obj),
-                source=self.legistar_source(obj),
+                source=self.legistar_source(obj) if obj else "",
             )
🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

14-20: Enhance method documentation.

The docstring could be more descriptive about the method's parameters, return value, and specific behavior.

Apply this change:

     def parse_legistar(self, response):
         """
-        Parse upcoming and past meetings from the Cincinnati City Council meetings table.
-
-        Oftentimes, the columns: meeting details, agenda, minutes, and video are left blank on the calander
-        but when they are, they are in the form of links.
+        Parse meeting information from the Legistar API response.
+
+        Args:
+            response: JSON response from Legistar API containing meeting data
+
+        Yields:
+            Meeting: Scraped meeting items with details including title, date, location,
+                    status, and associated document links
+
+        Note:
+            Meeting documents (agenda, minutes, video) may be unavailable or present as links
+            in the calendar view.
         """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d508fb9 and 3cb02fb.

📒 Files selected for processing (2)
  • city_scrapers/spiders/cinoh_city_council.py (1 hunks)
  • tests/test_cinoh_city_council.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cinoh_city_council.py
🧰 Additional context used
🪛 Ruff (0.8.2)
city_scrapers/spiders/cinoh_city_council.py

75-75: Use obj.get("Meeting Details") != "Meeting\xa0details" instead of not obj.get("Meeting Details") == "Meeting\xa0details"

Replace with != operator

(SIM201)


80-80: Use obj.get("Agenda") != "Not\xa0available" instead of not obj.get("Agenda") == "Not\xa0available"

Replace with != operator

(SIM201)


83-83: Use obj.get("Agenda Packet") != "Not\xa0available" instead of not obj.get("Agenda Packet") == "Not\xa0available"

Replace with != operator

(SIM201)


88-88: Use obj.get("Minutes") != "Not\xa0available" instead of not obj.get("Minutes") == "Not\xa0available"

Replace with != operator

(SIM201)


91-91: Use obj.get("Video") != "Not\xa0available" instead of not obj.get("Video") == "Not\xa0available"

Replace with != operator

(SIM201)

🔇 Additional comments (2)
city_scrapers/spiders/cinoh_city_council.py (2)

8-12: LGTM! Configuration is accurate.

The timezone is correctly set to "America/New_York" for Cincinnati, and other configuration parameters are properly defined.


46-60: 🛠️ Refactor suggestion

Enhance meeting status detection.

The status detection needs improvement to handle:

  1. Case-insensitive cancellation notice matching
  2. Different cancellation notice formats
  3. Past meetings (already implemented)

Apply this change:

     def _parse_status(self, obj):
         date = obj["Meeting Date"]
         parsed_date = parse(date, fuzzy=True)
 
-        if (
-            obj["Meeting Location"]
-            == "Council Chambers, Room 300 NOTICE OF CANCELLATION"
-        ):
+        location = obj.get("Meeting Location", "").lower()
+        if "notice of cancellation" in location or "canceled" in location:
             return "cancelled"
         elif parsed_date < datetime.today():
             return "passed"
         else:
             return "tentative"

Likely invalid or redundant comment.

Comment on lines +65 to +94
def _parse_links(self, obj):

links = []

if obj.get("Name"):
links.append({"title": "meeting page", "href": obj["Name"]["url"]})

if obj.get("iCalendar"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})

if not obj.get("Meeting Details") == "Meeting\u00a0details":
links.append(
{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}
)

if not obj.get("Agenda") == "Not\u00a0available":
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})

if not obj.get("Agenda Packet") == "Not\u00a0available":
links.append(
{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}
)

if not obj.get("Minutes") == "Not\u00a0available":
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})

if not obj.get("Video") == "Not\u00a0available":
links.append({"title": "Video", "href": obj["Video"]["url"]})

return links
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor link parsing for better maintainability and reliability.

The current implementation has several issues:

  1. Inefficient comparison operators (using not == instead of !=)
  2. Repetitive code structure
  3. Unsafe dictionary access

Apply this refactoring to improve the code:

     def _parse_links(self, obj):
         links = []
-
-        if obj.get("Name"):
-            links.append({"title": "meeting page", "href": obj["Name"]["url"]})
-
-        if obj.get("iCalendar"):
-            links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
-
-        if not obj.get("Meeting Details") == "Meeting\u00a0details":
-            links.append(
-                {"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}
-            )
-
-        if not obj.get("Agenda") == "Not\u00a0available":
-            links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
-
-        if not obj.get("Agenda Packet") == "Not\u00a0available":
-            links.append(
-                {"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}
-            )
-
-        if not obj.get("Minutes") == "Not\u00a0available":
-            links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
-
-        if not obj.get("Video") == "Not\u00a0available":
-            links.append({"title": "Video", "href": obj["Video"]["url"]})
+        documents = {
+            "Name": "meeting page",
+            "iCalendar": "iCalendar",
+            "Meeting Details": "Meeting Details",
+            "Agenda": "Agenda",
+            "Agenda Packet": "Agenda Packet",
+            "Minutes": "Minutes",
+            "Video": "Video"
+        }
+
+        for key, title in documents.items():
+            doc = obj.get(key, {})
+            if isinstance(doc, dict) and doc.get("url"):
+                if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details":
+                    continue
+                if doc.get("label") != "Not\u00a0available":
+                    links.append({"title": title, "href": doc["url"]})
 
         return links

This refactoring:

  1. Uses a dictionary to define document types
  2. Handles missing keys safely
  3. Properly checks for unavailable documents
  4. Reduces code duplication
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _parse_links(self, obj):
links = []
if obj.get("Name"):
links.append({"title": "meeting page", "href": obj["Name"]["url"]})
if obj.get("iCalendar"):
links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
if not obj.get("Meeting Details") == "Meeting\u00a0details":
links.append(
{"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}
)
if not obj.get("Agenda") == "Not\u00a0available":
links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
if not obj.get("Agenda Packet") == "Not\u00a0available":
links.append(
{"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}
)
if not obj.get("Minutes") == "Not\u00a0available":
links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
if not obj.get("Video") == "Not\u00a0available":
links.append({"title": "Video", "href": obj["Video"]["url"]})
return links
def _parse_links(self, obj):
links = []
documents = {
"Name": "meeting page",
"iCalendar": "iCalendar",
"Meeting Details": "Meeting Details",
"Agenda": "Agenda",
"Agenda Packet": "Agenda Packet",
"Minutes": "Minutes",
"Video": "Video"
}
for key, title in documents.items():
doc = obj.get(key, {})
if isinstance(doc, dict) and doc.get("url"):
if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details":
continue
if doc.get("label") != "Not\u00a0available":
links.append({"title": title, "href": doc["url"]})
return links
🧰 Tools
🪛 Ruff (0.8.2)

75-75: Use obj.get("Meeting Details") != "Meeting\xa0details" instead of not obj.get("Meeting Details") == "Meeting\xa0details"

Replace with != operator

(SIM201)


80-80: Use obj.get("Agenda") != "Not\xa0available" instead of not obj.get("Agenda") == "Not\xa0available"

Replace with != operator

(SIM201)


83-83: Use obj.get("Agenda Packet") != "Not\xa0available" instead of not obj.get("Agenda Packet") == "Not\xa0available"

Replace with != operator

(SIM201)


88-88: Use obj.get("Minutes") != "Not\xa0available" instead of not obj.get("Minutes") == "Not\xa0available"

Replace with != operator

(SIM201)


91-91: Use obj.get("Video") != "Not\xa0available" instead of not obj.get("Video") == "Not\xa0available"

Replace with != operator

(SIM201)

Copy link
Contributor

@lamle-ea lamle-ea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the CI checks are still failing. Before pushing your commits, please run these code quality checks locally:

pipenv run isort . --check-only  # Check import sorting
pipenv run black . --check       # Check code formatting
pipenv run flake8 .             # Check code style

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
city_scrapers/spiders/cinoh_city_council.py (4)

25-37: ⚠️ Potential issue

Add error handling for missing keys.

The current implementation could raise KeyError exceptions if the response data is malformed.

Apply this change to handle missing keys safely:

             meeting = Meeting(
-                title=obj["Name"]["label"],
+                title=obj.get("Name", {}).get("label", "No title provided"),
                 description="",
                 classification=self._parse_classification(obj),
                 start=self.legistar_start(obj),
                 end=None,
                 all_day=False,
                 time_notes="",
                 status=self._parse_status(obj),
                 location=self._parse_location(obj),
                 links=self._parse_links(obj),
                 source=self.legistar_source(obj),
             )

49-63: 🛠️ Refactor suggestion

Enhance cancellation detection logic.

The current implementation:

  1. Has unsafe key access
  2. Uses a very specific cancellation notice format
  3. Could miss other cancellation indicators
     def _parse_status(self, obj):
         date = obj["Meeting Date"]
         parsed_date = parse(date, fuzzy=True)
 
-        if (
-            obj["Meeting Location"]
-            == "Council Chambers, Room 300 NOTICE OF CANCELLATION"
-        ):
+        location = obj.get("Meeting Location", "").lower()
+        if "notice of cancellation" in location or "canceled" in location:
             return "cancelled"
         elif parsed_date < datetime.today():
             return "passed"
         else:
             return "tentative"

64-66: ⚠️ Potential issue

Add error handling for missing location.

The method could raise KeyError if the "Meeting Location" key is missing.

     def _parse_location(self, obj):
-        room = obj["Meeting Location"]
+        room = obj.get("Meeting Location", "No room provided")
         return {"address": "801 Plum St. Cincinnati, OH 45202", "name": f"{room}"}

68-97: 🛠️ Refactor suggestion

Refactor link parsing for better maintainability and reliability.

Current implementation has several issues:

  1. Unsafe dictionary access
  2. Inefficient comparison operators
  3. Repetitive code structure
     def _parse_links(self, obj):
         links = []
-
-        if obj.get("Name"):
-            links.append({"title": "meeting page", "href": obj["Name"]["url"]})
-
-        if obj.get("iCalendar"):
-            links.append({"title": "iCalendar", "href": obj["iCalendar"]["url"]})
-
-        if not obj.get("Meeting Details") == "Meeting\u00a0details":
-            links.append(
-                {"title": "Meeting Details", "href": obj["Meeting Details"]["url"]}
-            )
-
-        if not obj.get("Agenda") == "Not\u00a0available":
-            links.append({"title": "Agenda", "href": obj["Agenda"]["url"]})
-
-        if not obj.get("Agenda Packet") == "Not\u00a0available":
-            links.append(
-                {"title": "Agenda Packet", "href": obj["Agenda Packet"]["url"]}
-            )
-
-        if not obj.get("Minutes") == "Not\u00a0available":
-            links.append({"title": "Minutes", "href": obj["Minutes"]["url"]})
-
-        if not obj.get("Video") == "Not\u00a0available":
-            links.append({"title": "Video", "href": obj["Video"]["url"]})
+        documents = {
+            "Name": "meeting page",
+            "iCalendar": "iCalendar",
+            "Meeting Details": "Meeting Details",
+            "Agenda": "Agenda",
+            "Agenda Packet": "Agenda Packet",
+            "Minutes": "Minutes",
+            "Video": "Video"
+        }
+
+        for key, title in documents.items():
+            doc = obj.get(key, {})
+            if isinstance(doc, dict) and doc.get("url"):
+                if key in ["Meeting Details"] and doc.get("label") == "Meeting\u00a0details":
+                    continue
+                if doc.get("label") != "Not\u00a0available":
+                    links.append({"title": title, "href": doc["url"]})
 
         return links

This refactoring:

  1. Uses a dictionary to define document types
  2. Handles missing keys safely
  3. Properly checks for unavailable documents
  4. Reduces code duplication
🧰 Tools
🪛 Ruff (0.8.2)

78-78: Use obj.get("Meeting Details") != "Meeting\xa0details" instead of not obj.get("Meeting Details") == "Meeting\xa0details"

Replace with != operator

(SIM201)


83-83: Use obj.get("Agenda") != "Not\xa0available" instead of not obj.get("Agenda") == "Not\xa0available"

Replace with != operator

(SIM201)


86-86: Use obj.get("Agenda Packet") != "Not\xa0available" instead of not obj.get("Agenda Packet") == "Not\xa0available"

Replace with != operator

(SIM201)


91-91: Use obj.get("Minutes") != "Not\xa0available" instead of not obj.get("Minutes") == "Not\xa0available"

Replace with != operator

(SIM201)


94-94: Use obj.get("Video") != "Not\xa0available" instead of not obj.get("Video") == "Not\xa0available"

Replace with != operator

(SIM201)

🧹 Nitpick comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

16-23: Enhance docstring with more details.

Add information about:

  • The method's parameters
  • The return value (yielded Meeting objects)
  • The structure of the response object
     def parse_legistar(self, response):
         """
-        Parse upcoming and past meetings from the
-        Cincinnati City Council meetings table.
+        Parse upcoming and past meetings from the Cincinnati City Council meetings table.
+
+        Args:
+            response: JSON response from Legistar API containing meeting data
+
+        Yields:
+            Meeting: Meeting object containing parsed meeting information
 
         Oftentimes, the columns: meeting details, agenda,
         minutes, and video are left blank on the calander
         but when they are, they are in the form of links.
         """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cb02fb and 221c31f.

📒 Files selected for processing (2)
  • city_scrapers/spiders/cinoh_city_council.py (1 hunks)
  • tests/test_cinoh_city_council.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cinoh_city_council.py
🧰 Additional context used
🪛 Ruff (0.8.2)
city_scrapers/spiders/cinoh_city_council.py

78-78: Use obj.get("Meeting Details") != "Meeting\xa0details" instead of not obj.get("Meeting Details") == "Meeting\xa0details"

Replace with != operator

(SIM201)


83-83: Use obj.get("Agenda") != "Not\xa0available" instead of not obj.get("Agenda") == "Not\xa0available"

Replace with != operator

(SIM201)


86-86: Use obj.get("Agenda Packet") != "Not\xa0available" instead of not obj.get("Agenda Packet") == "Not\xa0available"

Replace with != operator

(SIM201)


91-91: Use obj.get("Minutes") != "Not\xa0available" instead of not obj.get("Minutes") == "Not\xa0available"

Replace with != operator

(SIM201)


94-94: Use obj.get("Video") != "Not\xa0available" instead of not obj.get("Video") == "Not\xa0available"

Replace with != operator

(SIM201)

🔇 Additional comments (1)
city_scrapers/spiders/cinoh_city_council.py (1)

9-13: LGTM! Configuration is accurate.

The timezone "America/New_York" is correct for Cincinnati, and the Legistar calendar URL is properly configured.

Comment on lines +43 to +48
def _parse_classification(self, obj):
if obj["Name"]["label"] == "Cincinnati City Council":
return CITY_COUNCIL
else:
return COMMITTEE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for missing keys in classification.

The method could raise KeyError if the "Name" or "label" keys are missing.

     def _parse_classification(self, obj):
-        if obj["Name"]["label"] == "Cincinnati City Council":
+        if obj.get("Name", {}).get("label") == "Cincinnati City Council":
             return CITY_COUNCIL
         else:
             return COMMITTEE
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _parse_classification(self, obj):
if obj["Name"]["label"] == "Cincinnati City Council":
return CITY_COUNCIL
else:
return COMMITTEE
def _parse_classification(self, obj):
if obj.get("Name", {}).get("label") == "Cincinnati City Council":
return CITY_COUNCIL
else:
return COMMITTEE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants