-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added UK Publisher The Sun #445
Changes from 1 commit
655c05a
cff75bc
a48c8da
a4de929
f206460
2168d6e
49bce7a
79a60ad
7c70d0f
c057b99
e2287fe
c7794bd
bb3c19e
b4cfb6f
34400fc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -16,7 +16,7 @@ class TheSunParser(ParserProxy): | |||||
class V1(BaseParser): | ||||||
_summary_selector = CSSSelector("div[data-gu-name='standfirst'] p") | ||||||
_paragraph_selector = CSSSelector("div.article__content > p") | ||||||
_sub_headline_selector = CSSSelector("div.article__content > h1") | ||||||
_sub_headline_selector = CSSSelector("div.toplist_container__jpTyX thesun_container__fty3s > h2") | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah yes, but this is the expected behavior since you updated the selectors. If you run pytest now, the extracted content will differ from your test files you generated earlier. If you run There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are right, I am sorry for the inconvenience. I committed and pushed the feature change && newly generated test :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No worries :) |
||||||
|
||||||
@attribute | ||||||
def body(self) -> ArticleBody: | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some articles that also contain subheadlines, such as this one: https://www.thesun.co.uk/betting/21748039/best-monopoly-live-casinos/. It would be great, if you could also add a subheadline selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As requested, I added a subheadline selector and successfully executed
python -m scripts.generate_parser_test_files -p TheSun -o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also tested black, isort, mypy and pytest. All of them passed on my local machine without any erros :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks a lot :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I change everything that was requested or did I miss something :)?'
I think I might've misunderstood what the subheadline of this article is. Could you maybe point it out for me please? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I picked the wrong subheadline. I changed the subheadline selector and re-generated test files, executed pytest, black, isort and mypy. Pycharm tells me there are no file changes thus I can't push or commit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reviewed it and judging by the subheadline selector you chose, I think you got the correct idea. A subheadline in Fundus a line of text separating paragraphs into logical entities. For example in https://www.thesun.co.uk/news/27470413/ukraine-torpedo-submarine-black-sea-battle/
CAN IT BE REAL?
would be considered a subheadline. In this case I would suggest something like this as the subheadline selector:div.article__content > h2.wp-block-heading