Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add new python linter to check file names and permissions #21740

Merged
merged 2 commits into from
May 5, 2021

Conversation

windsok
Copy link
Contributor

@windsok windsok commented Apr 21, 2021

Adds a new python linter test which tests for correct filenames and file permissions in the repository.

Replaces the existing tests in the test/lint/lint-filenames.sh and test/lint/lint-shebang.sh linter tests, as well as adding some new and increased testing. This increased coverage is intended to catch issues such as in #21728 and https://github.com/bitcoin/bitcoin/pull/16807/files#r345547050

Summary of tests:

  • Checks every file in the repository against an allowed regexp to make sure only lowercase or uppercase alphanumerics (a-zA-Z0-9), underscores (_), hyphens (-), at (@) and dots (.) are used in repository filenames.

  • Checks only source files (*.cpp, *.h, *.py, *.sh) against a stricter allowed regexp to make sure only lowercase alphanumerics (a-z0-9), underscores (_), hyphens (-) and dots (.) are used in source code filenames. Additionally there is an exception regexp for directories or files which are excepted from matching this regexp (This should replicate the existing test/lint/lint-filenames.sh test)

  • Checks all files in the repository match an allowed executable or non-executable file permission octal. Additionally checks that for executable files, the file contains a shebang line.

  • Checks that for executable .py and .sh files, the shebang line used matches an allowable list of shebangs (This should replicate the existing test/lint/lint-shebang.sh test)

  • Checks every file that contains a shebang line to ensure it has an executable permission

Additionally updates the permissions on various files to comply with the new tests.

Fixes #21729

Copy link
Contributor

@practicalswift practicalswift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK: desired properties should where possible be checked automatically.

Linters are to "repo invariants" what constructors are to class invariants :)

Nice to see the linting being done in Python instead of bash: much better in terms of maintainability, portability and readability.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 21, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@windsok windsok force-pushed the file-permissions-linter branch 2 times, most recently from f0b0c7e to 7126ab5 Compare April 22, 2021 23:47
@fanquake fanquake changed the title test: add new python linter test for checking filenames and file permissions test: add new python linter to check file names and permissions Apr 23, 2021
@fanquake
Copy link
Member

fanquake commented Apr 23, 2021

Can you update the commit message to use the following format:

prefix: title

commit body

i.e:

test: add new python linter to check file names and permissions

explain what is being done in this commit
explain some more etc

Also, for this to me merged. Obviously all linters / CI need to be passing. So if changes to other files need to be made, they will have to be done either in preceding commits, or as part of the same change.

@windsok windsok force-pushed the file-permissions-linter branch from 7126ab5 to 52c3fb3 Compare April 24, 2021 00:13
Updates permissions on files to comply with the new test added in the following commit
@windsok
Copy link
Contributor Author

windsok commented Apr 24, 2021

Can you update the commit message to use the following format:

Thanks @fanquake - I've updated the commit messages to the standard format

Also, for this to me merged. Obviously all linters / CI need to be passing. So if changes to other files need to be made, they will have to be done either in preceding commits, or as part of the same change.

Done - added a new commit before the main commit which updates the permissions on files which are failing the new test.

Would appreciate any additional review and feedback :)

@windsok windsok marked this pull request as ready for review April 24, 2021 00:19
@practicalswift
Copy link
Contributor

cr ACK 52c3fb3: patch looks correct!

Nice first-time contribution @windsok. I hope to see more contributions from you in the future. Warm welcome as a contributor!

@kiminuo
Copy link
Contributor

kiminuo commented Apr 24, 2021

@windsok This PR removes test/lint/lint-shebang.sh and test/lint/lint-filenames.sh and adds test/lint/lint-filenames-permissions.sh and adds test/lint/lint-filenames-permissions.py.

The new file names are confusing to me as they do something different than they hint. Would it make sense to you to rename them from "lint-filenames-permissions.{py|sh}" to "lint-files.{py|sh}" or something better?

Edit:

  • This PR modifies https://github.com/bitcoin/bitcoin/blob/master/contrib/gitian-descriptors/assign_DISTNAME permissions but it may be actually correct to add .sh extension - I'm not sure. Just mentioning it here so that others double check correctness of the change.

failed_tests += check_shebang_file_permissions()

if failed_tests:
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Given that failed_tests number is computed, one can probably print the number of failed tests too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, I've updated to keep track of the number of failures, and print a message with the total number of failures, if any :)

'ci/retry/retry' -> None
'contrib/devtools/split-debug.sh.in' -> 'sh.in'
"""
filename_parts = filename.split(os.extsep, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think people mostly do something like this https://source.dot.net/#System.Private.CoreLib/Path.cs,467ebc0e33e0820c to find a file extension. That is to look for the first os.extsep but not from the beginning of the string (filename) but from the end of the filename.

A counter-example can be 'contrib/devtools/split-debug.my.nice.helper.sh.in' where you would return my.nice.helper.sh.in.

Your code may be perfectly fine for what you want to achieve in this PR, yet I would be hesitant to use get_filename_extension name.

I have no clear suggestion here, letting it up to you to decide. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I originally was using os.path.splitext(self.file_path)[1].strip(".") which works similar to the linked implementation (only return the final extension, not everything from the first .)

I had changed this as I needed a way to detect .sh.in files to exclude them from one of the tests, but you bring up a very good point - with the change I had made it's possible certain files with weird extensions like split-debug.my.nice.helper.sh would not actually be detected as a sh file, and would not have certain tests enforced.

Ive updated the code to add a new FileMeta class, and have changed the logic to use the appropriate version of the extension in each test. Let me know if this makes sense or not :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine to me. Thanks for updating it. :)

@windsok windsok force-pushed the file-permissions-linter branch 2 times, most recently from c1906a8 to 36573c8 Compare April 29, 2021 23:39
@windsok
Copy link
Contributor Author

windsok commented Apr 29, 2021

@windsok This PR removes test/lint/lint-shebang.sh and test/lint/lint-filenames.sh and adds test/lint/lint-filenames-permissions.sh and adds test/lint/lint-filenames-permissions.py.

The new file names are confusing to me as they do something different than they hint. Would it make sense to you to rename them from "lint-filenames-permissions.{py|sh}" to "lint-files.{py|sh}" or something better?

I'm happy to name this test as lint-files.{py|sh} - it does seem clearer and less confusing. Thanks for the suggestion :) I've updated this in the latest version commit.

  • This PR modifies https://github.com/bitcoin/bitcoin/blob/master/contrib/gitian-descriptors/assign_DISTNAME permissions but it may be actually correct to add .sh extension - I'm not sure. Just mentioning it here so that others double check correctness of the change.

I am also unsure on this one, but happy to take advice on which direction it should go

@windsok
Copy link
Contributor Author

windsok commented Apr 29, 2021

Nice first-time contribution @windsok. I hope to see more contributions from you in the future. Warm welcome as a contributor!

Thanks @practicalswift :) looking forward to getting more involved. Sorry for the delayed response on the last round of feedback.

@windsok windsok force-pushed the file-permissions-linter branch 2 times, most recently from 82018d3 to 46b025e Compare April 29, 2021 23:59
Replaces the existing tests in the test/lint/lint-filenames.sh and test/lint/lint-shebang.sh linter tests, as well as adding some new and increased testing.
Summary of tests:
- Checks every file in the repository against an allowed regexp to make sure only lowercase or uppercase alphanumerics (a-zA-Z0-9), underscores (_), hyphens (-), at (@) and dots (.) are used in repository filenames.
- Checks only source files (*.cpp, *.h, *.py, *.sh) against a stricter allowed regexp to make sure only lowercase alphanumerics (a-z0-9), underscores (_), hyphens (-) and dots (.) are used in source code filenames. Additionally there is an exception regexp for directories or files which are excepted from matching this regexp (This should replicate the existing test/lint/lint-filenames.sh test)
- Checks all files in the repository match an allowed executable or non-executable file permission octal. Additionally checks that for executable files, the file contains a shebang line.
- Checks that for executable .py and .sh files, the shebang line used matches an allowable list of shebangs (This should replicate the existing test/lint/lint-shebang.sh test)
- Checks every file that contains a shebang line to ensure it has an executable permission

Fixes bitcoin#21729
@practicalswift
Copy link
Contributor

cr re-ACK 46b025e: patch still looks correct

@bitcoin bitcoin deleted a comment from DrahtBot Apr 30, 2021
"""
# We avoid using rstrip() to ensure we catch filenames which accidentally include trailing whitespace
filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").split("\n")
filenames = [filename for filename in filenames if filename != ""] # removes the trailing empty list element
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The comment does not correspond with the code actually.

One could probably do: filenames = filenames[:-1]

Copy link
Contributor Author

@windsok windsok May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I chose to remove the trailing element by checking for an empty string as I was worried about some case in the future where the output of the git ls-files command changes somehow and the last element is not actually empty. I'm probably overthinking it though.

Copy link
Contributor Author

@windsok windsok May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a review of this, I'm fairly certain I can just safely rewrite this code as this one liner, which still catches the original edge case of a file with whitespace at the end of the files list.

filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").rstrip('\n').split("\n")

Happy to update this if people don't mind re-reviewing. Let me know.

Edit: though that said, apparently it is actually possible for a filename to have a newline in it so I guess that one-liner would not catch this case, though it does seem fairly unlikely to happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I chose to remove the trailing element by checking for an empty string as I was worried about some case in the future where the output of the git ls-files command changes somehow and the last element is not actually empty. I'm probably overthinking it though.

One would need to ask on a Git mailing list whether the text output of the command is stable or not. But your solution works well, so it's good as it is I think. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a filename had a newline in its name it would also be split by the split("\n")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about a way to handle this case. I think currently if this happens the test overall will still fail, as the permissions test would not be able to locate the file and should error out - we can definitely handle this better, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think .rstrip('\n').split("\n") as you suggested is fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #21873

"""
Returns the octal file permission of the file
"""
return int(oct(os.stat(self.file_path).st_mode)[-3:])
Copy link
Contributor

@kiminuo kiminuo May 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: I'm not sure whether os.stat should behave have follow_symlinks equal to True (default) or False. Possibly relevant is https://stackoverflow.com/questions/954560/how-does-git-handle-symbolic-links/18791647#18791647.

I don't want to hold up this PR. Leaving this for comment here for the sake of completeness. Maybe it's stupid. IDK.

Copy link
Contributor Author

@windsok windsok May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting. It seems that there are currently no symbolic links in the repository itself, at least none that I could find with:

ls -lR ~/code/bitcoin | grep ^l

I wonder if that is intentional, or it's just never come up before?

In any case, if there were symbolic links my thinking is that we probably would want to follow them, as it seems that symlinks themselves can't actually have their permissions changed (At least on linux), so either we would want to exclude symlinks entirely from the permissions test, or just do the default here and follow the link to check the linked file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take on this is: I think that adding a symbolic link should produce a warning so that others can notice the fact. The fact that there are no symbolic links at the moment hints that they are probably not necessary.

However, this is really out of scope of this PR, I guess. If somebody finds it important it's always possible to address this in a follow up PR. :)

@kiminuo
Copy link
Contributor

kiminuo commented May 1, 2021

code review ACK 46b025e if contrib/gitian-descriptors/assign_DISTNAME permission change is deemed OK.

Great contribution!

@DrahtBot
Copy link
Contributor

DrahtBot commented May 3, 2021

🕵️ @achow101 @sipa have been requested to review this pull request as specified in the REVIEWERS file.

@laanwj
Copy link
Member

laanwj commented May 5, 2021

Code review ACK 46b025e
Also ACK on the permission changes in 46b025e.
Thanks for working on this.

Nice to see the linting being done in Python instead of bash: much better in terms of maintainability, portability and readability.

Agree. Shell script tends to become virtually unreviewable quickly when the logic gets more complex than simply executing some commands in a batch.


set -e
cd "$(dirname $0)/../.."
test/lint/lint-files.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question (not a blocker): why does this need a bash wrapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the main reason is the the lint-all.sh script that CI runs is currently setup to automatically run all .sh tests in the test/lint directory, but won't automatically run .py files.

In addition it allows lint tests to opt in or out of locale dependence

If we start to move more towards using python based linters, it probably makes sense to make some updates here, so we can get rid of the requirement for the .sh wrapper.

@practicalswift may have more context here.

@laanwj laanwj merged commit 1b9a523 into bitcoin:master May 5, 2021
"""
# We avoid using rstrip() to ensure we catch filenames which accidentally include trailing whitespace
filenames = check_output(CMD_ALL_FILES, shell=True).decode("utf8").split("\n")
filenames = [filename for filename in filenames if filename != ""] # removes the trailing empty list element
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a filename had a newline in its name it would also be split by the split("\n")

for filename in filenames:
file_meta = FileMeta(filename)
if file_meta.permissions == ALLOWED_PERMISSION_EXECUTABLES:
shebang = open(filename, "rb").readline().rstrip(b"\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't open be used in a context manager, so that it is closed again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I will address in a follow-up

shebang = open(filename, "rb").readline().rstrip(b"\n")

# For any file with executable permissions the first line must contain a shebang
if shebang[:2] != b"#!":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Doesn't matter here, but it could use startswith for type safety.

maflcko pushed a commit that referenced this pull request May 5, 2021
d48565d fix permissions on 00_setup_env_native_fuzz_with_msan (glozow)

Pull request description:

  I think there was a silent merge conflict between #21852 and #21740?

  I have a [linter failure](https://cirrus-ci.com/task/5436849834426368):

  ```
  File "ci/test/00_setup_env_native_fuzz_with_msan.sh" contains a shebang line, but has the file permission 644 instead of the expected executable permission 755. Do "chmod 755 ci/test/00_setup_env_native_fuzz_with_msan.sh" (or remove the shebang line).
  ERROR: There were 1 failed tests in the lint-files.py lint test. Please resolve the above errors.
  ^---- failure generated from test/lint/lint-files.sh
  ```

ACKs for top commit:
  MarcoFalke:
    ACK d48565d

Tree-SHA512: 445bdd738faf007451f40bbcf360dd1fb4675e17a4c96546e6818c12e33dd336dadd95cf8d4b5f8df1d6ccfbc4bf5496864bb5528e416cea894857b6b732140c
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 5, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 5, 2021
…_msan.sh

d48565d fix permissions on 00_setup_env_native_fuzz_with_msan (glozow)

Pull request description:

  I think there was a silent merge conflict between bitcoin#21852 and bitcoin#21740?

  I have a [linter failure](https://cirrus-ci.com/task/5436849834426368):

  ```
  File "ci/test/00_setup_env_native_fuzz_with_msan.sh" contains a shebang line, but has the file permission 644 instead of the expected executable permission 755. Do "chmod 755 ci/test/00_setup_env_native_fuzz_with_msan.sh" (or remove the shebang line).
  ERROR: There were 1 failed tests in the lint-files.py lint test. Please resolve the above errors.
  ^---- failure generated from test/lint/lint-files.sh
  ```

ACKs for top commit:
  MarcoFalke:
    ACK d48565d

Tree-SHA512: 445bdd738faf007451f40bbcf360dd1fb4675e17a4c96546e6818c12e33dd336dadd95cf8d4b5f8df1d6ccfbc4bf5496864bb5528e416cea894857b6b732140c
@DrahtBot
Copy link
Contributor

DrahtBot commented May 6, 2021

Gitian builds

File commit 3f8f238
(master)
commit 4f1c6e5
(master and this pull)
*-aarch64-linux-gnu-debug.tar.gz 1d782713b323d8b8... 00706d683dc7fdb3...
*-aarch64-linux-gnu.tar.gz d6ea38d31834b245... 7fc438afb1329705...
*-arm-linux-gnueabihf-debug.tar.gz 4dc746c0a4e127f5... 1723f3a260829ade...
*-arm-linux-gnueabihf.tar.gz ee7559f5ce241e5d... 4294b3e4f15990e7...
*-osx-unsigned.dmg eef8d656b3f0229d... 87a5702ab3f06d26...
*-osx64.tar.gz 83f3690cd9006ef3... b3c92c583789b2c7...
*-powerpc64-linux-gnu-debug.tar.gz 4e6bb99605730f87... 4345d0a6eee207f6...
*-powerpc64-linux-gnu.tar.gz f7f89deb2cb0819a... 0b73302264fe0eae...
*-powerpc64le-linux-gnu-debug.tar.gz e1e9e084d4088dec... fa3fbbf1bf64614e...
*-powerpc64le-linux-gnu.tar.gz bc95d5e7ba71cd31... e7633a594fe25426...
*-riscv64-linux-gnu-debug.tar.gz 95f182eb1d00e5f5... 0701b7b84a8683ee...
*-riscv64-linux-gnu.tar.gz 6e1ee8834e0f17ec... 94c9e06c8cbc1956...
*-win64-debug.zip 1a5115bafee424f2... 9363c1aa27d3ce46...
*-win64-setup-unsigned.exe 03d79fc50eceea8f... f20c0017ecd90652...
*-win64.zip 4333c2e76bf73c2c... 1ba0d6d65b89a14a...
*-x86_64-linux-gnu-debug.tar.gz 18ce6c08c5bc691e... 8645294ec08c6a64...
*-x86_64-linux-gnu.tar.gz a58dbdea99493df5... cc88f00d07b4784c...
*.tar.gz 7cdbf224e1151aa1... 4419dfaa17218c19...
bitcoin-core-linux-22-res.yml 97ceb4f415d8c4a2... 85ce1f40436f8cb5...
bitcoin-core-osx-22-res.yml 80d164889106ca0a... 96f161a171eddcab...
bitcoin-core-win-22-res.yml dd694a3ccfd5e934... 12fbc619bafb0987...
linux-build.log c2df17b298bea4de... 521a16a9ead3839a...
osx-build.log d185ccafdeb25ee6... 0d80d2d9f91e6111...
win-build.log 0d0ac751380fb5e4... 6a72628fc4afefaf...
bitcoin-core-linux-22-res.yml.diff dcb55fb490600b5b...
bitcoin-core-osx-22-res.yml.diff e7803ad2ab6f4aa8...
bitcoin-core-win-22-res.yml.diff 04c8390c0da31c97...
linux-build.log.diff 4ed426a1862a5aa1...
osx-build.log.diff 1c2436e98e4c7fbe...
win-build.log.diff f6d48e161363f2cf...

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request May 7, 2021
…r test

2227fc4 test: minor fixes & improvements for files linter test (windsok)

Pull request description:

  Couple of minor fixes & improvements for files linter test added in bitcoin#21740

  - Use a context manager when opening files, so that files are closed are we are done with them

  - Use the `-z` flag when shelling out to `git ls-files` so that we can catch newlines and other weird control characters in filenames.

  From the `git ls-files` manpage:
  ```
  -z \0 line termination on output and do not quote filenames. See OUTPUT below for more information.

  Without the -z option, pathnames with "unusual" characters are quoted as explained for the configuration variable
  core.quotePath (see git-config(1)). Using -z the filename is output verbatim and the line is terminated by a NUL byte.
  ```

ACKs for top commit:
  MarcoFalke:
    cr ACK 2227fc4
  practicalswift:
    cr ACK 2227fc4: patch looks correct

Tree-SHA512: af059a805f4a7614162de85dea856052a45ab531895cb0431087e7fc9e037513fa7501bb5eb2fe43238adf5f09e77712ebdbb15b1486983359ad3661a3da0c60
Comment on lines +24 to +25
ALLOWED_PERMISSION_NON_EXECUTABLES = 644
ALLOWED_PERMISSION_EXECUTABLES = 755
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR, when I run the linter I see a flood of errors and have to disable it. It seems inappropriate to check user, group, read, and write permissions when the only permission git tracks is whether the file is marked executable. Other permissions are local to the checkout and don't affect the project or other developers. Errors I see are:

[...]
File "test/lint/lint-whitespace.sh" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/lint/lint-whitespace.sh" (or remove the shebang line).
File "test/util/bitcoin-util-test.py" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/util/bitcoin-util-test.py" (or remove the shebang line).
File "test/util/rpcauth-test.py" contains a shebang line, but has the file permission 700 instead of the expected executable permission 755. Do "chmod 755 test/util/rpcauth-test.py" (or remove the shebang line).
ERROR: There were 2464 failed tests in the lint-files.py lint test. Please resolve the above errors.
^---- failure generated from test/lint/lint-files.sh

@laanwj laanwj mentioned this pull request Apr 6, 2022
25 tasks
gwillen pushed a commit to ElementsProject/elements that referenced this pull request Jun 1, 2022
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write linter to check file permissions
8 participants