Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #8625: Split linters to run for a specific file extension #8656

Merged
merged 41 commits into from
Mar 23, 2020

Conversation

Hudda
Copy link
Contributor

@Hudda Hudda commented Feb 16, 2020

Explanation

Checklist

  • The PR title starts with "Fix #bugnum: ", followed by a short, clear summary of the changes. (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
  • The PR explanation includes the words "Fixes #bugnum: ..." (or "Fixes part of #bugnum" if the PR only partially fixes an issue).
  • The linter/Karma presubmit checks have passed.
    • These should run automatically, but if not, you can manually trigger them locally using python -m scripts.pre_commit_linter and python -m scripts.run_frontend_tests.
  • The PR is made from a branch that's not called "develop".
  • The PR has an appropriate "PR CHANGELOG: ..." label (If you are unsure of which label to add, ask the reviewers for guidance).
  • The PR follows the style guide.
  • The PR addresses the points mentioned in the codeowner checks for the files/folders changed. (See the codeowner's wiki page.)
  • The PR is assigned to an appropriate reviewer.
    • If you're a new contributor, please ask on Gitter for someone to assign a reviewer and don't tick this checkbox.
    • If you're not sure who the appropriate reviewer is, please assign to the issue's "owner" -- see the "talk-to" label on the issue. Do not only request the review but also add the reviewer as an assignee.

@Hudda Hudda requested a review from kevinlee12 as a code owner February 16, 2020 17:21
@oppiabot
Copy link

oppiabot bot commented Feb 16, 2020

Hi, @Hudda. This pull request does not have a "CHANGELOG: ..." label as mentioned in the PR checkbox list. Please add this label. PRs without this label will not be merged. If you are unsure of which label to add, please ask the reviewers for guidance. Thanks!

@oppiabot
Copy link

oppiabot bot commented Feb 16, 2020

Assigning @kevinlee12 for the first-pass review of this pull request. Thanks!

Copy link
Contributor

@kevinlee12 kevinlee12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR! Even though this might alleviate the issue, it's better to use the semaphore method, see https://github.com/oppia/oppia/blob/develop/scripts/run_backend_tests.py#L255 (and generally how the backend tests are structured). Semaphores essentially use an OS level way to doing resource control that is most efficient to the OS. The work here will still be needed since each linting operation needs to be divided into distinct chunks. Also, it's worth keeping in mind that pylint and eslint most likely has some parallelization underneath the hood, so it's best to have those lint by themselves. let me know if you have any questions, I'm happy to scope them out in more detail if you'd like.

@kevinlee12 kevinlee12 assigned Hudda and unassigned kevinlee12 and seanlip Feb 17, 2020
@seanlip
Copy link
Member

seanlip commented Feb 17, 2020

Just wanted to say that I agree with following the way we approach parallelization in run_backend_tests.py. One thing you might want to consider is to split the common "task running" logic that's currently in run_backend_tests.py out into a separate module. Then, both run_backend_tests.py and pre_commit_linter.py can use the code in that module to manage tasks, and you won't need to reimplement that logic from scratch in pre_commit_linter.py.

@oppiabot
Copy link

oppiabot bot commented Feb 24, 2020

Hi @Hudda, I'm going to mark this PR as stale because it hasn't had any updates for 7 days. If no further activity occurs within 4 days, it will be automatically closed so that others can take up the issue.
If you are still working on this PR, please make a follow-up commit within 4 days (and submit it for review, if applicable). Please also let us know if you are stuck so we can help you!

@oppiabot oppiabot bot added stale and removed stale labels Feb 24, 2020
@Hudda
Copy link
Contributor Author

Hudda commented Feb 29, 2020

@kevinlee12 PTAL! Also, pylint is running twice, I was not be able to find a reason for that. Can you help me with it?

@Hudda Hudda assigned kevinlee12 and unassigned Hudda Feb 29, 2020
Copy link
Contributor

@kevinlee12 kevinlee12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Hudda, this is moving in the right direction, but it needs more isolation built in. see my comments for details. thanks!

Comment on lines 3274 to 3277
def _join_linting_process(linting_processes, result_queues, result_stdouts):
"""Join process spawn off by _lint_all_files and capture the outputs."""
"""Join process spawn off by lint_all_files and capture the outputs."""
for process in linting_processes:
process.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not being used any more, remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,128 @@
# coding: utf-8
#
# Copyright 2018 The Oppia Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Year :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -611,6 +626,8 @@
_STDOUT_LIST = multiprocessing.Manager().list()
_FILES = multiprocessing.Manager().dict()

MAX_CONCURRENT_RUNS = 6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a global number, I'd recommend putting this in the semaphore_method file. The max is whatever is in the original backend test file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 845 to 846
all_messages += _join_linting_process(
linting_processes, result_queues, stdout_queues)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to kick off all of the linting processes at once, which is what you don't want to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

"""This function is used to check if node-eslint dependencies are
installed and pass ESLint binary path and lint all the files(JS, Python,
HTML, CSS) with their respective third party linters.
class LintingTaskSpec(python_utils.OBJECT):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't exactly what we're looking for, since the code here is essentially repackaging the old method. Instead, you want to is to break out each linter (Python, JS, etc) into a separate check like PythonLintChecksManager (but for Pylint). That way, you'll have a list of tasks for each linter operation that you'll pass in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made some changes. PTAL! If they are alright.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's heading in the right direction, but I'd recommend pulling out the third party linters into it's own Manager class, that way there's even more isolation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, because these third party libraries have their own ways to lint at their fastest (ie. might parallelize on their own), I'd recommend actually having 2 queues. One queue for the third party linters like Pylint and ESLint where the Semaphore number is 1 and another for the custom linting functions where the semaphore number is the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like, you can probably pull out the linters into separate files, and keep this module for just the logic that is common to all linters. (You might not need a manager class in that case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinlee12 so you want me to bring _lint_py_files under a Class ThirdPartyPythonLIntChecks and same for all other third party linters?

They should be in separate classes.

Copy link
Contributor Author

@Hudda Hudda Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlip Maybe because this file contains BAD_PATTERNS and MANDATORY_PATTERNS definitions which is used by almost all the linters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have the bad patterns and mandatory patterns check in this file, and separate all the other filetype-specific lint checks out. Then, what would happen is: the general lint checks in this file run always, and then the specific ones run for the given input filetype. Would that work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlip we also have EXCLUDED_PATHS variable and redirect_stdout method and many other global variables like FILES, FILE_CACHE, etc. which are used by every lint check. If we separate all checks then we have to define these in every file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a good exercise to try and figure out what the dependencies actually are and refactor -- having everything in the same place makes it more likely that you get a "big blob of mud".

Stuff like EXCLUDED_PATHS falls in the "just do this check before passing files to the sub-linters" category. I'm not sure what FILES and FILE_CACHE are used for but it may be a good idea to figure that out and define an interface for what the seam between the main linter and the sub-linters is. Having the sub-linters be in separate files will help with this as it will require you to understand the dependencies properly and find the correct "break" points.

@@ -0,0 +1,128 @@
# coding: utf-8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to semaphore_utils instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kevinlee12 kevinlee12 assigned Hudda and unassigned kevinlee12 Feb 29, 2020
@Hudda Hudda removed their assignment Mar 1, 2020
@Hudda
Copy link
Contributor Author

Hudda commented Mar 22, 2020

OK -- to be clearer, this is how the whole Returns part should look (here and elsewhere):

tuple(None, ThirdPartyCSSLintChecksManager). A 2-tuple of custom and 
third_party linter objects.

Note: no "Returns", and no need to indent the second line.

Done. @seanlip ptal

@Hudda Hudda assigned seanlip and unassigned Hudda Mar 22, 2020
@seanlip seanlip removed their assignment Mar 22, 2020
@seanlip
Copy link
Member

seanlip commented Mar 22, 2020

Thanks, no further comments from my end!

Please get reviews from @vojtechjelinek and @nithusha21.

@Hudda
Copy link
Contributor Author

Hudda commented Mar 22, 2020

Thanks, no further comments from my end!

Please get reviews from @vojtechjelinek and @nithusha21.

Thanks @seanlip

Copy link
Contributor

@nithusha21 nithusha21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the codeowner file (core/tests)

@nithusha21
Copy link
Contributor

Just one note, some wiki pages would need to be updated with the new script command.

@nithusha21 nithusha21 assigned Hudda and unassigned nithusha21 Mar 22, 2020
@Hudda
Copy link
Contributor Author

Hudda commented Mar 22, 2020

Just one note, some wiki pages would need to be updated with the new script command.

Thanks, @nithusha21 I will update the wiki once this PR gets merged.

@Hudda Hudda removed their assignment Mar 22, 2020
Copy link
Contributor

@vojtechjelinek vojtechjelinek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from code owner's perspective. Thanks!

@bansalnitish
Copy link
Contributor

Please do not merge, I want to take a pass over this (given this is a part of the linter project).

@kevinlee12
Copy link
Contributor

@bansalnitish please do expedite the review since there is large interest in the community to make new linter checks. thanks!

Copy link
Contributor

@bansalnitish bansalnitish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bansalnitish bansalnitish merged commit 8c37b1e into oppia:develop Mar 23, 2020
@Hudda Hudda deleted the split-linter branch March 23, 2020 08:48
FlorinBalint pushed a commit to FlorinBalint/oppia that referenced this pull request Mar 27, 2020
…ppia#8656)

* Splited linter by file extension types

* Run linter using semaphore method

* Used Semaphore method for linting

* Fix yarn.lock

* Added more isolation

* Fix yarn.lock

* Added manager for third party lint checks

* Fix typo

* Splitted the linter into sub linters

* Moved under linter directory

* Addressed review changes

* Changed pre_commit_linter path

* Temporarily disabled BlankLineBelowFileOverviewChecker

* Fixed lint error

* Added test for redirect_stdout

* Removed html directive check

* Addressed review comments

* Used list.extend

* Fixed lint errors

* Addressed review comments

* Increased concurrency limit to 2

* Addressed review comments

* Fixed lint errors

* Semaphore fixed

* Addressed review comments

* Addressed review comments.

* Addressed review comments

* Addressed review comments

* Addressed review comments
@nithusha21
Copy link
Contributor

Just following up.. @Hudda did you update the wiki pages? If so, can you link them here?

@Hudda
Copy link
Contributor Author

Hudda commented Apr 5, 2020

@nithusha21 Here are the links to wiki pages: FAQ, Coding style guide

@nithusha21
Copy link
Contributor

Thanks!

all_file_extensions_type: list(str). The list of all file extensions
to be linted and checked.
"""
all_file_extensions_type = ['js', 'py', 'html', 'css', 'other']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hudda, Why ts is not added here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DubeySandeep we do not have separate linter for js and ts, if both options used together then Js/Ts linter will run twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how the general-purpose check would run for ts files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DubeySandeep Thanks for pointing this out. I will open a PR to fix this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please resolve it asap! (We have an open issue for the same at #8993 and I've assigned you to the same.)

(Note: This is a critical issue, a PR can be merged with fit and this can cause CI to run only one test.)

'are used together, then the JS/TS linter will be run twice.')
python_utils.PRINT('Exiting...')
sys.exit(1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hudda Do we need to check whether all the passes params in file_extensions_to_lint are valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DubeySandeep validation is added here. This option in argparse only allows a limited subset of choices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Splitting lint checks into Python, Js, Ts lint checks.
7 participants