Added UK Publisher The Sun #445

Merged
MaxDall merged 15 commits into flairNLP:master
May 6, 2024

Conversation

BorisKalika
Contributor

No description provided.

@BorisKalika
Contributor Author

black, isort, mypy and pytest all passed on my local machine.

Collaborator

@addie9800 addie9800 left a comment

Thanks for adding this :) I found one thing that still needs to be added. If you run python -m scripts.generate_parser_test_files -p TheSun -oj after adding the subheadline selector, your test case will stay the same and only the extracted values will be updated.

class TheSunParser(ParserProxy):
    class V1(BaseParser):
        _summary_selector = CSSSelector("div[data-gu-name='standfirst'] p")
        _paragraph_selector = CSSSelector("div.article__content > p")
Collaborator

Some articles also contain subheadlines, such as this one: https://www.thesun.co.uk/betting/21748039/best-monopoly-live-casinos/. It would be great if you could also add a subheadline selector.

Contributor Author

@BorisKalika BorisKalika Apr 26, 2024

As requested, I added a subheadline selector and successfully executed python -m scripts.generate_parser_test_files -p TheSun -o

Contributor Author

@BorisKalika BorisKalika Apr 26, 2024

I also ran black, isort, mypy and pytest. All of them passed on my local machine without any errors :)

Collaborator

Perfect, thanks a lot :)

Contributor Author

@BorisKalika BorisKalika Apr 27, 2024

Did I change everything that was requested or did I miss something? :)

I think I might've misunderstood what the subheadline of this article is. Could you maybe point it out for me please? :)

Contributor Author

I think I picked the wrong subheadline. I changed the subheadline selector, re-generated the test files, and ran pytest, black, isort and mypy. PyCharm tells me there are no file changes, so I can't commit or push the test.

Collaborator

I just reviewed it and, judging by the subheadline selector you chose, I think you got the correct idea. A subheadline in Fundus is a line of text separating paragraphs into logical entities. For example, in https://www.thesun.co.uk/news/27470413/ukraine-torpedo-submarine-black-sea-battle/ "CAN IT BE REAL?" would be considered a subheadline. In this case I would suggest something like this as the subheadline selector: div.article__content > h2.wp-block-heading


    class V1(BaseParser):
        _summary_selector = CSSSelector("div[data-gu-name='standfirst'] p")
        _paragraph_selector = CSSSelector("div.article__content > p")
        _sub_headline_selector = CSSSelector("div.toplist_container__jpTyX thesun_container__fty3s > h2")
Collaborator

Suggested change
- _sub_headline_selector = CSSSelector("div.toplist_container__jpTyX thesun_container__fty3s > h2")
+ _sub_headline_selector = CSSSelector("div.article__content > h2.wp-block-heading")
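
To illustrate the definition given in the review (a subheadline separates paragraphs into logical entities), here is a minimal sketch of how matched subheadlines and paragraphs could be grouped into sections. It is not Fundus code: `Section` and `group_into_sections` are hypothetical names, and the input is assumed to be a list of already extracted `("h2" | "p", text)` pairs in document order.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Section:
    """Hypothetical stand-in for an article section (not the Fundus class)."""

    headline: Tuple[str, ...]
    paragraphs: Tuple[str, ...]


def group_into_sections(nodes: Iterable[Tuple[str, str]]) -> List[Section]:
    """Group ("h2" | "p", text) pairs, given in document order, into sections.

    Each subheadline opens a new section. Paragraphs that appear before the
    first subheadline end up in a section with an empty headline.
    """
    sections: List[Section] = []
    headline: Tuple[str, ...] = ()
    paragraphs: List[str] = []
    for tag, text in nodes:
        if tag == "h2":
            # Close the section collected so far, if there is one.
            if headline or paragraphs:
                sections.append(Section(headline, tuple(paragraphs)))
            headline, paragraphs = (text,), []
        else:
            paragraphs.append(text)
    # Flush the last open section.
    if headline or paragraphs:
        sections.append(Section(headline, tuple(paragraphs)))
    return sections


# Assumed, already extracted nodes; the texts are taken from the test article.
nodes = [
    ("p", "REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden."),
    ("h2", "Who is Rebecca Cooke?"),
    ("p", "Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden."),
    ("h2", "How long has she been dating Phil Foden?"),
    ("p", "The exact time at which they started dating is unknown, but they have been together since being teenagers."),
]
sections = group_into_sections(nodes)
```

If the subheadline selector matches nothing, every paragraph falls into a single section with an empty headline; once it matches, the same paragraphs are split across several sections.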

Contributor Author

@BorisKalika BorisKalika Apr 28, 2024

Your suggested subheadline selector fails one pytest test case :(

________________________________________________________________ TestParser.test_parsing[TheSun] _________________________________________________________________

self = <tests.test_parser.TestParser object at 0x1091b9970>, publisher = <UK.TheSun: 5>

    def test_parsing(self, publisher: PublisherEnum) -> None:
        comparative_data = load_test_case_data(publisher)
        html_mapping = load_html_test_file_mapping(publisher)
    
        for versioned_parser in publisher.parser:
            # validate json
            version_name = versioned_parser.__name__
            assert (
                version_data := comparative_data.get(version_name)
            ), f"Missing test data for parser version '{version_name}'"
    
            for key, value in version_data.items():
                if not value:
                    raise ValueError(
                        f"There is no value set for key '{key}' in the test JSON. "
                        f"Only complete articles should be used as test cases"
                    )
    
            # test coverage
            supported_attrs = set(versioned_parser.attributes().names)
            missing_attrs = attributes_required_to_cover & supported_attrs - set(version_data.keys())
            assert (
                not missing_attrs
            ), f"Test JSON for {version_name} does not cover the following attribute(s): {missing_attrs}"
    
            assert list(version_data.keys()) == sorted(
                attributes_required_to_cover & supported_attrs
            ), f"Test JSON for {version_name} is not in alphabetical order"
    
            assert (html := html_mapping.get(versioned_parser)), f"Missing test HTML for parser version {version_name}"
            # compare data
            extraction = versioned_parser().parse(html.content, "raise")
            for key, value in version_data.items():
>               assert value == extraction[key]
E               assert ArticleBody(s...s older."'))]) == ArticleBody(s...s older."'))])
E                 
E                 Omitting 1 identical items, use -vv to show
E                 Differing attributes:
E                 ['sections']
E                 
E                 Drill down into differing attribute sections:
E                   sections: [ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Inst...
E                 
E                 ...Full output truncated (5 lines hidden), use '-vv' to show

self = <tests.test_parser.TestParser object at 0x10527e5e0>, publisher = <UK.TheSun: 5>

    def test_parsing(self, publisher: PublisherEnum) -> None:
        comparative_data = load_test_case_data(publisher)
        html_mapping = load_html_test_file_mapping(publisher)
    
        for versioned_parser in publisher.parser:
            # validate json
            version_name = versioned_parser.__name__
            assert (
                version_data := comparative_data.get(version_name)
            ), f"Missing test data for parser version '{version_name}'"
    
            for key, value in version_data.items():
                if not value:
                    raise ValueError(
                        f"There is no value set for key '{key}' in the test JSON. "
                        f"Only complete articles should be used as test cases"
                    )
    
            # test coverage
            supported_attrs = set(versioned_parser.attributes().names)
            missing_attrs = attributes_required_to_cover & supported_attrs - set(version_data.keys())
            assert (
                not missing_attrs
            ), f"Test JSON for {version_name} does not cover the following attribute(s): {missing_attrs}"
    
            assert list(version_data.keys()) == sorted(
                attributes_required_to_cover & supported_attrs
            ), f"Test JSON for {version_name} is not in alphabetical order"
    
            assert (html := html_mapping.get(versioned_parser)), f"Missing test HTML for parser version {version_name}"
            # compare data
            extraction = versioned_parser().parse(html.content, "raise")
            for key, value in version_data.items():
>               assert value == extraction[key]
E               assert ArticleBody(summary=(), sections=[ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.', 'The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.', 'Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. 
I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))]) == ArticleBody(summary=(), sections=[ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.')), ArticleSection(headline=('Who is Rebecca Cooke?',), paragraphs=('Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.')), ArticleSection(headline=('How long has she been dating Phil Foden?',), paragraphs=('The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.')), ArticleSection(headline=('How many children do couple have?',), paragraphs=('Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the 
birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))])
E                 
E                 Matching attributes:
E                 ['summary']
E                 Differing attributes:
E                 ['sections']
E                 
E                 Drill down into differing attribute sections:
E                   sections: [ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.', 'The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.', 'Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. 
I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))] != [ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.')), ArticleSection(headline=('Who is Rebecca Cooke?',), paragraphs=('Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.')), ArticleSection(headline=('How long has she been dating Phil Foden?',), paragraphs=('The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.')), ArticleSection(headline=('How many children do couple have?',), paragraphs=('Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. 
I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))]
E                   At index 0 diff: ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.', 'The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.', 'Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. 
I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."')) != ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.'))
E                   Right contains 3 more items, first extra item: ArticleSection(headline=('Who is Rebecca Cooke?',), paragraphs=('Rebecca Cooke is the long-term partner of Manchester ...has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.'))
E                   Full diff:
E                     [
E                   +  ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.', 'The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.', 'Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."')),
E                   -  ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.')),
E                   -  ArticleSection(headline=('Who is Rebecca Cooke?',), paragraphs=('Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.')),
E                   -  ArticleSection(headline=('How long has she been dating Phil Foden?',), paragraphs=('The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.')),
E                   -  ArticleSection(headline=('How many children do couple have?',), paragraphs=('Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."')),
E                     ]
E                 Full diff:
E                 - ArticleBody(summary=(), sections=[ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.')), ArticleSection(headline=('Who is Rebecca Cooke?',), paragraphs=('Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.')), ArticleSection(headline=('How long has she been dating Phil Foden?',), paragraphs=('The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.')), ArticleSection(headline=('How many children do couple have?',), paragraphs=('Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. 
I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))])
E                 ?                                                                                                                                                                                                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             ^^^^^^^^^^^^^^^^^^         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                                                        ^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 + ArticleBody(summary=(), sections=[ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden.', 'And the couple have announced some great news with the pair expecting a new addition to the family.', 'Rebecca Cooke is the long-term partner of Manchester City midfielder Phil Foden.', 'Rebecca is thought to be 22 years old and the mother of two children with Phil.', 'She tends to keep out the spotlight and has her Instagram account currently set private, though it does seem to suggest that she goes by the nickname Becca.', 'The exact time at which they started dating is unknown, but they have been together since being teenagers.', 'At the age of 18 she became a mother to their son, Ronnie.', 'A fan account of the couple (@beccafodenx) on Instagram shows the two together, along with a closer look at the blonde bombshell.', 'Phil and Rebecca have a son called Ronnie, 4, and a daughter named True, 1.', 'In April 2024, the couple announced they are expecting a third child.', 'Speaking to Manchester City at the time of the birth of his son, Phil said: "I was there for the birth. I walked out of the room, gave it a little tear and then went back in like nothing happened.', '"I’m not one for crying in front of people. I like to be on my own, but I was there in the room, watched it happen and it was a special moment.', '"Your life changes."', 'He continued, speaking of the things he misses Ronnie doing due to football training: "There are things you miss when you’re not there because you’ve got an away game.', '"I was there when he started crawling, but I think I was in London when he started to walk.', '"Now he’s getting about and walking everywhere, so you have to have eyes in the back of your head or he starts running off.', '"It’s unfortunate to miss things like that but it’s a sacrifice that he’ll appreciate when he’s older."'))])
E                 ?                                                                                                                                                                                                                                                               ^^^             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         ^^^^^^^^^^^^^^^^                                                                                                                                                                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^                                                                                                                                                                                   ^^^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^  ^^^^^^

tests/test_parser.py:182: AssertionError
==================================================================== short test summary info =====================================================================
FAILED tests/test_parser.py::TestParser::test_parsing[TheSun] - assert ArticleBody(summary=(), sections=[ArticleSection(headline=(), paragraphs=('REBECCA Cooke is the childhood sweetheart of England footballer Phil Foden....

Collaborator

Ah yes, but this is the expected behavior since you updated the selectors. If you run pytest now, the extracted content will differ from the test files you generated earlier. If you run python -m scripts.generate_parser_test_files -p TheSun -oj (make sure it's the -oj flag though) and run pytest again, everything should work just fine :)
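
The failure and its fix can be sketched as a generic snapshot test. This is only an illustration under stated assumptions, not the implementation of scripts.generate_parser_test_files or tests/test_parser.py: `snapshot_matches`, `regenerate`, the file name, and the shortened extraction dictionaries are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def snapshot_matches(snapshot: Path, extraction: Dict[str, Any]) -> bool:
    """Compare the stored expected values with a fresh extraction."""
    return json.loads(snapshot.read_text(encoding="utf-8")) == extraction


def regenerate(snapshot: Path, extraction: Dict[str, Any]) -> None:
    """Overwrite the stored expected values with the current extraction."""
    snapshot.write_text(json.dumps(extraction, indent=2, sort_keys=True), encoding="utf-8")


# Hypothetical output before the selector change: one section, no headline.
old_extraction = {"sections": [{"headline": [], "paragraphs": ["intro", "body"]}]}
# Hypothetical output after the change: the subheadline splits the paragraphs.
new_extraction = {
    "sections": [
        {"headline": [], "paragraphs": ["intro"]},
        {"headline": ["Who is Rebecca Cooke?"], "paragraphs": ["body"]},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "TheSun.json"  # hypothetical snapshot file
    regenerate(snapshot, old_extraction)  # test file generated earlier
    before = snapshot_matches(snapshot, new_extraction)  # stale snapshot
    regenerate(snapshot, new_extraction)  # re-run the generator
    after = snapshot_matches(snapshot, new_extraction)  # snapshot up to date
```

The stored file only reflects the parser as it was when the file was generated, so any selector change has to be followed by regenerating the expected values before the comparison can pass again.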

Contributor Author

You are right, I am sorry for the inconvenience.

I committed and pushed the feature change && newly generated test :)

Collaborator

No worries :)

addie9800
addie9800 previously approved these changes Apr 29, 2024
Collaborator

@addie9800 addie9800 left a comment

👍

@addie9800
Collaborator

While re-reviewing, I just noticed two minor details, so I updated them.

addie9800
addie9800 previously approved these changes May 2, 2024
MaxDall added 2 commits May 6, 2024 12:49
# Conflicts:
#	src/fundus/publishers/uk/__init__.py
@MaxDall MaxDall merged commit 43d5b46 into flairNLP:master May 6, 2024
5 checks passed
3 participants