Generic/LowercasedFilename: sniff doesn't handle non-ASCII characters properly #682

Open
4 tasks done
rodrigoprimo opened this issue Nov 13, 2024 · 1 comment

Comments

Contributor

rodrigoprimo commented Nov 13, 2024

Describe the bug

While working on improving code coverage for the Generic.Files.LowercasedFilename sniff (#681), I noticed that it fails to handle file names that contain uppercase non-ASCII characters. The sniff uses strtolower() to check whether the filename is all lowercase, and strtolower() leaves non-ASCII characters unchanged:

$lowercaseFilename = strtolower($filename);
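A minimal sketch of the difference, assuming the file name is UTF-8 encoded and that the mbstring extension may or may not be loaded (neither is confirmed in this report). The `\u{C9}` escape stands for `É`, so the snippet does not depend on its own file encoding. `mb_strtolower()` is shown only as one possible alternative, not as the agreed fix.

```php
<?php
// "tÉst.php" as UTF-8 bytes; \u{C9} is É and \u{E9} is é.
$filename = "t\u{C9}st.php";

// Byte-oriented: on PHP 8.2+ only ASCII A-Z is converted, so É is left alone.
$byteLowered = strtolower($filename);

// Multibyte-aware alternative (assumption: ext-mbstring loaded, UTF-8 name).
$mbLowered = function_exists('mb_strtolower')
    ? mb_strtolower($filename, 'UTF-8')
    : null;

// The lowercased copy is identical to the input, so a comparison finds nothing.
var_dump($byteLowered === $filename);

// True only when mbstring is available and converted É to é.
var_dump($mbLowered === "t\u{E9}st.php");
```

Because the result of strtolower() equals the original name, a check based on comparing the two cannot flag the file.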

Code sample

<?php

To reproduce

Steps to reproduce the behavior:

  1. Create a file called tÉst.php with the code sample above.
  2. Run phpcs tÉst.php --standard=Generic --sniffs=Generic.Files.LowercasedFilename
  3. No error message is displayed.

Expected behavior

PHPCS should display the following error message:

----------------------------------------------------------------------------------
FOUND 1 ERROR AFFECTING 1 LINE
----------------------------------------------------------------------------------
 1 | ERROR | Filename "tÉst.php" doesn't match the expected filename "tést.php"
----------------------------------------------------------------------------------
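For illustration, a hedged sketch of a check that would produce the message above. `lowercasedFilenameError()` is a hypothetical helper, not the sniff's actual code; it assumes a UTF-8 file name and falls back to strtolower() when the mbstring extension is missing.

```php
<?php
/**
 * Hypothetical helper: returns the error text for a file name that is not
 * all lowercase, or null when the name already matches its lowercase form.
 */
function lowercasedFilenameError(string $filename): ?string
{
    // Assumption: the name is UTF-8. Without ext-mbstring only ASCII is handled.
    $expected = function_exists('mb_strtolower')
        ? mb_strtolower($filename, 'UTF-8')
        : strtolower($filename);

    if ($filename === $expected) {
        return null;
    }

    return sprintf(
        'Filename "%s" doesn\'t match the expected filename "%s"',
        $filename,
        $expected
    );
}

// Prints the error text for the file from the reproduction steps, or "OK".
echo lowercasedFilenameError("t\u{C9}st.php") ?? 'OK', PHP_EOL;
```

The fallback keeps the sketch runnable without mbstring, at the cost of reproducing the reported behaviour in that case.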

Versions (please complete the following information)

Operating System: Ubuntu 24.04
PHP version: 8.3
PHP_CodeSniffer version: master
Standard: Generic
Install type: git clone

Please confirm

  • I have searched the issue list and am not opening a duplicate issue.
  • I have read the Contribution Guidelines and this is not a support question.
  • I confirm that this bug is a bug in PHP_CodeSniffer and not in one of the external standards.
  • I have verified the issue still exists in the master branch of PHP_CodeSniffer.
Member

jrfnl commented Nov 24, 2024

@rodrigoprimo Thanks for finding and reporting this issue.

While this is an interesting issue from a technical perspective, I consider it low priority unless and until end-users of PHPCS report that they are running into it.

I wonder how common it is to have non-ASCII characters in file names? I also have a gut feeling that files like that may not always be portable across operating systems, but this would need to be researched and confirmed or debunked first.
If my gut feeling turns out to be correct, I can imagine non-ASCII characters in file names might deserve their own sniff (to forbid them).

I also wonder how we could detect this reliably: while the file contents have an encoding, I don't know how we could figure out the encoding of the file name. I imagine the encoding might depend on the OS?
File name vs encoding is a curiosity which I've never dug into, so I'd be very interested to hear from someone who has and who can shed more light on this.
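As a starting point for that investigation, here is a rough sketch of two byte-level checks. Both function names are hypothetical, and the UTF-8 test is only a heuristic: a name that happens to be valid UTF-8 does not prove which encoding the file system or OS actually uses.

```php
<?php
// Hypothetical helpers for illustration only.

// True when the name contains any byte outside the 7-bit ASCII range.
function hasNonAsciiBytes(string $filename): bool
{
    return preg_match('/[^\x00-\x7F]/', $filename) === 1;
}

// True when the bytes form valid UTF-8. preg_match() with the "u" modifier
// returns false for subjects that are not valid UTF-8.
function isValidUtf8(string $filename): bool
{
    return preg_match('//u', $filename) === 1;
}

$utf8Name   = "t\u{C9}st.php"; // É encoded as two UTF-8 bytes
$latin1Name = "t\xC9st.php";   // É as a single ISO-8859-1 byte

var_dump(hasNonAsciiBytes($utf8Name), isValidUtf8($utf8Name));
var_dump(hasNonAsciiBytes($latin1Name), isValidUtf8($latin1Name));
```

A check like hasNonAsciiBytes() could back a sniff that forbids such names outright, which would sidestep the encoding question entirely.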
