-
Notifications
You must be signed in to change notification settings - Fork 36.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: add new python linter to check file names and permissions #21740
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK: desired properties should where possible be checked automatically.
Linters are to "repo invariants" what constructors are to class invariants :)
Nice to see the linting being done in Python instead of bash: much better in terms of maintainability, portability and readability.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
d0016b1
to
886315f
Compare
f0b0c7e
to
7126ab5
Compare
Can you update the commit message to use the following format: prefix: title
commit body i.e: test: add new python linter to check file names and permissions
explain what is being done in this commit
explain some more etc Also, for this to me merged. Obviously all linters / CI need to be passing. So if changes to other files need to be made, they will have to be done either in preceding commits, or as part of the same change. |
7126ab5
to
52c3fb3
Compare
Updates permissions on files to comply with the new test added in the following commit
Thanks @fanquake - I've updated the commit messages to the standard format
Done - added a new commit before the main commit which updates the permissions on files which are failing the new test. Would appreciate any additional review and feedback :) |
@windsok This PR removes The new file names are confusing to me as they do something different than they hint. Would it make sense to you to rename them from "lint-filenames-permissions.{py|sh}" to "lint-files.{py|sh}" or something better? Edit:
|
failed_tests += check_shebang_file_permissions() | ||
|
||
if failed_tests: | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Given that failed_tests
number is computed, one can probably print the number of failed tests too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, I've updated to keep track of the number of failures, and print a message with the total number of failures, if any :)
'ci/retry/retry' -> None | ||
'contrib/devtools/split-debug.sh.in' -> 'sh.in' | ||
""" | ||
filename_parts = filename.split(os.extsep, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think people mostly do something like this https://source.dot.net/#System.Private.CoreLib/Path.cs,467ebc0e33e0820c to find a file extension. That is to look for the first os.extsep
but not from the beginning of the string (filename) but from the end of the filename.
A counter-example can be 'contrib/devtools/split-debug.my.nice.helper.sh.in'
where you would return my.nice.helper.sh.in
.
Your code may be perfectly fine for what you want to achieve in this PR, yet I would be hesitant to use get_filename_extension
name.
I have no clear suggestion here, letting it up to you to decide. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I originally was using os.path.splitext(self.file_path)[1].strip(".")
which works similar to the linked implementation (only return the final extension, not everything from the first .
)
I had changed this as I needed a way to detect .sh.in
files to exclude them from one of the tests, but you bring up a very good point - with the change I had made it's possible certain files with weird extensions like split-debug.my.nice.helper.sh
would not actually be detected as a sh
file, and would not have certain tests enforced.
Ive updated the code to add a new FileMeta
class, and have changed the logic to use the appropriate version of the extension in each test. Let me know if this makes sense or not :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks fine to me. Thanks for updating it. :)
c1906a8
to
36573c8
Compare
I'm happy to name this test as
I am also unsure on this one, but happy to take advice on which direction it should go |
Thanks @practicalswift :) looking forward to getting more involved. Sorry for the delayed response on the last round of feedback. |
82018d3
to
46b025e
Compare
Replaces the existing tests in the test/lint/lint-filenames.sh and test/lint/lint-shebang.sh linter tests, as well as adding some new and increased testing. Summary of tests: - Checks every file in the repository against an allowed regexp to make sure only lowercase or uppercase alphanumerics (a-zA-Z0-9), underscores (_), hyphens (-), at (@) and dots (.) are used in repository filenames. - Checks only source files (*.cpp, *.h, *.py, *.sh) against a stricter allowed regexp to make sure only lowercase alphanumerics (a-z0-9), underscores (_), hyphens (-) and dots (.) are used in source code filenames. Additionally there is an exception regexp for directories or files which are excepted from matching this regexp (This should replicate the existing test/lint/lint-filenames.sh test) - Checks all files in the repository match an allowed executable or non-executable file permission octal. Additionally checks that for executable files, the file contains a shebang line. - Checks that for executable .py and .sh files, the shebang line used matches an allowable list of shebangs (This should replicate the existing test/lint/lint-shebang.sh test) - Checks every file that contains a shebang line to ensure it has an executable permission Fixes bitcoin#21729
cr re-ACK 46b025e: patch still looks correct |
""" | ||
# We avoid using rstrip() to ensure we catch filenames which accidentally include trailing whitespace | ||
filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").split("\n") | ||
filenames = [filename for filename in filenames if filename != ""] # removes the trailing empty list element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The comment does not correspond with the code actually.
One could probably do: filenames = filenames[:-1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I chose to remove the trailing element by checking for an empty string as I was worried about some case in the future where the output of the git ls-files
command changes somehow and the last element is not actually empty. I'm probably overthinking it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a review of this, I'm fairly certain I can just safely rewrite this code as this one liner, which still catches the original edge case of a file with whitespace at the end of the files list.
filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").rstrip('\n').split("\n")
Happy to update this if people don't mind re-reviewing. Let me know.
Edit: though that said, apparently it is actually possible for a filename to have a newline in it so I guess that one-liner would not catch this case, though it does seem fairly unlikely to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I chose to remove the trailing element by checking for an empty string as I was worried about some case in the future where the output of the
git ls-files
command changes somehow and the last element is not actually empty. I'm probably overthinking it though.
One would need to ask on a Git mailing list whether the text output of the command is stable or not. But your solution works well, so it's good as it is I think. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a filename had a newline in its name it would also be split by the split("\n")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll think about a way to handle this case. I think currently if this happens the test overall will still fail, as the permissions test would not be able to locate the file and should error out - we can definitely handle this better, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think .rstrip('\n').split("\n")
as you suggested is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #21873
""" | ||
Returns the octal file permission of the file | ||
""" | ||
return int(oct(os.stat(self.file_path).st_mode)[-3:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: I'm not sure whether os.stat
should behave have follow_symlinks equal to True
(default) or False
. Possibly relevant is https://stackoverflow.com/questions/954560/how-does-git-handle-symbolic-links/18791647#18791647.
I don't want to hold up this PR. Leaving this for comment here for the sake of completeness. Maybe it's stupid. IDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting. It seems that there are currently no symbolic links in the repository itself, at least none that I could find with:
ls -lR ~/code/bitcoin | grep ^l
I wonder if that is intentional, or it's just never come up before?
In any case, if there were symbolic links my thinking is that we probably would want to follow them, as it seems that symlinks themselves can't actually have their permissions changed (At least on linux), so either we would want to exclude symlinks entirely from the permissions test, or just do the default here and follow the link to check the linked file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take on this is: I think that adding a symbolic link should produce a warning so that others can notice the fact. The fact that there are no symbolic links at the moment hints that they are probably not necessary.
However, this is really out of scope of this PR, I guess. If somebody finds it important it's always possible to address this in a follow up PR. :)
code review ACK 46b025e if Great contribution! |
Code review ACK 46b025e
Agree. Shell script tends to become virtually unreviewable quickly when the logic gets more complex than simply executing some commands in a batch. |
|
||
set -e | ||
cd "$(dirname $0)/../.." | ||
test/lint/lint-files.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question (not a blocker): why does this need a bash wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the main reason is the the lint-all.sh
script that CI runs is currently setup to automatically run all .sh
tests in the test/lint
directory, but won't automatically run .py
files.
In addition it allows lint tests to opt in or out of locale dependence
If we start to move more towards using python based linters, it probably makes sense to make some updates here, so we can get rid of the requirement for the .sh
wrapper.
@practicalswift may have more context here.
""" | ||
# We avoid using rstrip() to ensure we catch filenames which accidentally include trailing whitespace | ||
filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").split("\n") | ||
filenames = [filename for filename in filenames if filename != ""] # removes the trailing empty list element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a filename had a newline in its name it would also be split by the split("\n")
for filename in filenames: | ||
file_meta = FileMeta(filename) | ||
if file_meta.permissions == ALLOWED_PERMISSION_EXECUTABLES: | ||
shebang = open(filename, "rb").readline().rstrip(b"\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't open
be used in a context manager, so that it is closed again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I will address in a follow-up
shebang = open(filename, "rb").readline().rstrip(b"\n") | ||
|
||
# For any file with executable permissions the first line must contain a shebang | ||
if shebang[:2] != b"#!": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Doesn't matter here, but it could use startswith
for type safety.
d48565d fix permissions on 00_setup_env_native_fuzz_with_msan (glozow) Pull request description: I think there was a silent merge conflict between #21852 and #21740? I have a [linter failure](https://cirrus-ci.com/task/5436849834426368): ``` File "ci/test/00_setup_env_native_fuzz_with_msan.sh" contains a shebang line, but has the file permission 644 instead of the expected executable permission 755. Do "chmod 755 ci/test/00_setup_env_native_fuzz_with_msan.sh" (or remove the shebang line). ERROR: There were 1 failed tests in the lint-files.py lint test. Please resolve the above errors. ^---- failure generated from test/lint/lint-files.sh ``` ACKs for top commit: MarcoFalke: ACK d48565d Tree-SHA512: 445bdd738faf007451f40bbcf360dd1fb4675e17a4c96546e6818c12e33dd336dadd95cf8d4b5f8df1d6ccfbc4bf5496864bb5528e416cea894857b6b732140c
…and per… …missions
…_msan.sh d48565d fix permissions on 00_setup_env_native_fuzz_with_msan (glozow) Pull request description: I think there was a silent merge conflict between bitcoin#21852 and bitcoin#21740? I have a [linter failure](https://cirrus-ci.com/task/5436849834426368): ``` File "ci/test/00_setup_env_native_fuzz_with_msan.sh" contains a shebang line, but has the file permission 644 instead of the expected executable permission 755. Do "chmod 755 ci/test/00_setup_env_native_fuzz_with_msan.sh" (or remove the shebang line). ERROR: There were 1 failed tests in the lint-files.py lint test. Please resolve the above errors. ^---- failure generated from test/lint/lint-files.sh ``` ACKs for top commit: MarcoFalke: ACK d48565d Tree-SHA512: 445bdd738faf007451f40bbcf360dd1fb4675e17a4c96546e6818c12e33dd336dadd95cf8d4b5f8df1d6ccfbc4bf5496864bb5528e416cea894857b6b732140c
…r test 2227fc4 test: minor fixes & improvements for files linter test (windsok) Pull request description: Couple of minor fixes & improvements for files linter test added in bitcoin#21740 - Use a context manager when opening files, so that files are closed are we are done with them - Use the `-z` flag when shelling out to `git ls-files` so that we can catch newlines and other weird control characters in filenames. From the `git ls-files` manpage: ``` -z \0 line termination on output and do not quote filenames. See OUTPUT below for more information. Without the -z option, pathnames with "unusual" characters are quoted as explained for the configuration variable core.quotePath (see git-config(1)). Using -z the filename is output verbatim and the line is terminated by a NUL byte. ``` ACKs for top commit: MarcoFalke: cr ACK 2227fc4 practicalswift: cr ACK 2227fc4: patch looks correct Tree-SHA512: af059a805f4a7614162de85dea856052a45ab531895cb0431087e7fc9e037513fa7501bb5eb2fe43238adf5f09e77712ebdbb15b1486983359ad3661a3da0c60
ALLOWED_PERMISSION_NON_EXECUTABLES = 644 | ||
ALLOWED_PERMISSION_EXECUTABLES = 755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR, when I run the linter I see a flood of errors and have to disable it. It seems inappropriate to check user, group, read, and write permissions when the only permission git tracks is whether the file is marked executable. Other permissions are local to the checkout and don't affect the project or other developers. Errors I see are:
[...]
File "test/lint/lint-whitespace.sh" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/lint/lint-whitespace.sh" (or remove the shebang line).
File "test/util/bitcoin-util-test.py" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/util/bitcoin-util-test.py" (or remove the shebang line).
File "test/util/rpcauth-test.py" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/util/rpcauth-test.py" (or remove the shebang line).
ERROR: There were 2464 failed tests in the lint-files.py lint test. Please resolve the above errors.
^---- failure generated from test/lint/lint-files.sh
Adds a new python linter test which tests for correct filenames and file permissions in the repository.
Replaces the existing tests in the
test/lint/lint-filenames.sh
andtest/lint/lint-shebang.sh
linter tests, as well as adding some new and increased testing. This increased coverage is intended to catch issues such as in #21728 and https://github.com/bitcoin/bitcoin/pull/16807/files#r345547050Summary of tests:
Checks every file in the repository against an allowed regexp to make sure only lowercase or uppercase alphanumerics (a-zA-Z0-9), underscores (_), hyphens (-), at (@) and dots (.) are used in repository filenames.
Checks only source files (*.cpp, *.h, *.py, *.sh) against a stricter allowed regexp to make sure only lowercase alphanumerics (a-z0-9), underscores (_), hyphens (-) and dots (.) are used in source code filenames. Additionally there is an exception regexp for directories or files which are excepted from matching this regexp (This should replicate the existing
test/lint/lint-filenames.sh
test)Checks all files in the repository match an allowed executable or non-executable file permission octal. Additionally checks that for executable files, the file contains a shebang line.
Checks that for executable
.py
and.sh
files, the shebang line used matches an allowable list of shebangs (This should replicate the existingtest/lint/lint-shebang.sh
test)Checks every file that contains a shebang line to ensure it has an executable permission
Additionally updates the permissions on various files to comply with the new tests.
Fixes #21729