-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #8625: Split linters to run for a specific file extension #8656
Conversation
Hi, @Hudda. This pull request does not have a "CHANGELOG: ..." label as mentioned in the PR checkbox list. Please add this label. PRs without this label will not be merged. If you are unsure of which label to add, please ask the reviewers for guidance. Thanks! |
Assigning @kevinlee12 for the first-pass review of this pull request. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR! Even though this might alleviate the issue, it's better to use the semaphore method, see https://github.com/oppia/oppia/blob/develop/scripts/run_backend_tests.py#L255 (and generally how the backend tests are structured). Semaphores essentially use an OS level way to doing resource control that is most efficient to the OS. The work here will still be needed since each linting operation needs to be divided into distinct chunks. Also, it's worth keeping in mind that pylint and eslint most likely has some parallelization underneath the hood, so it's best to have those lint by themselves. let me know if you have any questions, I'm happy to scope them out in more detail if you'd like.
Just wanted to say that I agree with following the way we approach parallelization in run_backend_tests.py. One thing you might want to consider is to split the common "task running" logic that's currently in run_backend_tests.py out into a separate module. Then, both run_backend_tests.py and pre_commit_linter.py can use the code in that module to manage tasks, and you won't need to reimplement that logic from scratch in pre_commit_linter.py. |
Hi @Hudda, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 4 days, it will be automatically closed so that others can take up the issue. |
@kevinlee12 PTAL! Also, pylint is running twice, I was not be able to find a reason for that. Can you help me with it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Hudda, this is moving in the right direction, but it needs more isolation built in. see my comments for details. thanks!
scripts/pre_commit_linter.py
Outdated
def _join_linting_process(linting_processes, result_queues, result_stdouts): | ||
"""Join process spawn off by _lint_all_files and capture the outputs.""" | ||
"""Join process spawn off by lint_all_files and capture the outputs.""" | ||
for process in linting_processes: | ||
process.join() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not being used any more, remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/semaphore_method.py
Outdated
@@ -0,0 +1,128 @@ | |||
# coding: utf-8 | |||
# | |||
# Copyright 2018 The Oppia Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Year :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/pre_commit_linter.py
Outdated
@@ -611,6 +626,8 @@ | |||
_STDOUT_LIST = multiprocessing.Manager().list() | |||
_FILES = multiprocessing.Manager().dict() | |||
|
|||
MAX_CONCURRENT_RUNS = 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a global number, I'd recommend putting this in the semaphore_method
file. The max is whatever is in the original backend test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/pre_commit_linter.py
Outdated
all_messages += _join_linting_process( | ||
linting_processes, result_queues, stdout_queues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to kick off all of the linting processes at once, which is what you don't want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
scripts/pre_commit_linter.py
Outdated
"""This function is used to check if node-eslint dependencies are | ||
installed and pass ESLint binary path and lint all the files(JS, Python, | ||
HTML, CSS) with their respective third party linters. | ||
class LintingTaskSpec(python_utils.OBJECT): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't exactly what we're looking for, since the code here is essentially repackaging the old method. Instead, you want to is to break out each linter (Python, JS, etc) into a separate check like PythonLintChecksManager (but for Pylint). That way, you'll have a list of tasks for each linter operation that you'll pass in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made some changes. PTAL! If they are alright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's heading in the right direction, but I'd recommend pulling out the third party linters into it's own Manager class, that way there's even more isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, because these third party libraries have their own ways to lint at their fastest (ie. might parallelize on their own), I'd recommend actually having 2 queues. One queue for the third party linters like Pylint and ESLint where the Semaphore number is 1 and another for the custom linting functions where the semaphore number is the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like, you can probably pull out the linters into separate files, and keep this module for just the logic that is common to all linters. (You might not need a manager class in that case.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinlee12 so you want me to bring _lint_py_files under a Class ThirdPartyPythonLIntChecks and same for all other third party linters?
They should be in separate classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanlip Maybe because this file contains BAD_PATTERNS
and MANDATORY_PATTERNS
definitions which is used by almost all the linters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can have the bad patterns and mandatory patterns check in this file, and separate all the other filetype-specific lint checks out. Then, what would happen is: the general lint checks in this file run always, and then the specific ones run for the given input filetype. Would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanlip we also have EXCLUDED_PATHS
variable and redirect_stdout
method and many other global variables like FILES
, FILE_CACHE
, etc. which are used by every lint check. If we separate all checks then we have to define these in every file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a good exercise to try and figure out what the dependencies actually are and refactor -- having everything in the same place makes it more likely that you get a "big blob of mud".
Stuff like EXCLUDED_PATHS falls in the "just do this check before passing files to the sub-linters" category. I'm not sure what FILES and FILE_CACHE are used for but it may be a good idea to figure that out and define an interface for what the seam between the main linter and the sub-linters is. Having the sub-linters be in separate files will help with this as it will require you to understand the dependencies properly and find the correct "break" points.
scripts/semaphore_method.py
Outdated
@@ -0,0 +1,128 @@ | |||
# coding: utf-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to semaphore_utils
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Done. @seanlip ptal |
Thanks, no further comments from my end! Please get reviews from @vojtechjelinek and @nithusha21. |
Thanks @seanlip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the codeowner file (core/tests)
Just one note, some wiki pages would need to be updated with the new script command. |
Thanks, @nithusha21 I will update the wiki once this PR gets merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from code owner's perspective. Thanks!
Please do not merge, I want to take a pass over this (given this is a part of the linter project). |
@bansalnitish please do expedite the review since there is large interest in the community to make new linter checks. thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ppia#8656) * Splited linter by file extension types * Run linter using semaphore method * Used Semaphore method for linting * Fix yarn.lock * Added more isolation * Fix yarn.lock * Added manager for third party lint checks * Fix typo * Splitted the linter into sub linters * Moved under linter directory * Addressed review changes * Changed pre_commit_linter path * Temporarily disabled BlankLineBelowFileOverviewChecker * Fixed lint error * Added test for redirect_stdout * Removed html directive check * Addressed review comments * Used list.extend * Fixed lint errors * Addressed review comments * Increased concurrency limit to 2 * Addressed review comments * Fixed lint errors * Semaphore fixed * Addressed review comments * Addressed review comments. * Addressed review comments * Addressed review comments * Addressed review comments
Just following up.. @Hudda did you update the wiki pages? If so, can you link them here? |
@nithusha21 Here are the links to wiki pages: FAQ, Coding style guide |
Thanks! |
all_file_extensions_type: list(str). The list of all file extensions | ||
to be linted and checked. | ||
""" | ||
all_file_extensions_type = ['js', 'py', 'html', 'css', 'other'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hudda, Why ts
is not added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DubeySandeep we do not have separate linter for js and ts, if both options used together then Js/Ts linter will run twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how the general-purpose check would run for ts files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DubeySandeep Thanks for pointing this out. I will open a PR to fix this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please resolve it asap! (We have an open issue for the same at #8993 and I've assigned you to the same.)
(Note: This is a critical issue, a PR can be merged with fit
and this can cause CI to run only one test.)
'are used together, then the JS/TS linter will be run twice.') | ||
python_utils.PRINT('Exiting...') | ||
sys.exit(1) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hudda Do we need to check whether all the passes params in file_extensions_to_lint
are valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DubeySandeep validation is added here. This option in argparse only allows a limited subset of choices.
Explanation
Fixes Splitting lint checks into Python, Js, Ts lint checks. #8625
Split the linter in pre_commit_linter to run to run linter on a specific file extension type
Example:
python -m scripts.pre_commit_linter --only-check-file-extensions py
to run linter for files with '.py' extensions.Checklist
python -m scripts.pre_commit_linter
andpython -m scripts.run_frontend_tests
.