Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detectindent: Don't consider subsyntax patterns #1250

Closed

Conversation

Guldoman
Copy link
Member

Fixes #1247.

We aren't checking when a certain pattern should be used anyways (thus using subsyntax patterns outside of the subsyntax code blocks), and recursing over subsyntaxes can cause problems, so we should just ignore them.

@jgmdev
Copy link
Member

jgmdev commented Dec 20, 2022

Tested the change and it breaks proper indent size auto-detection on PHP files, the following example is detected as indent size of 1 instead of 4:

<?php
/**
 * Copyright ###, Blah
 * This file is part of Blah and licensed under the GPL,
 * check the LICENSE.txt file for version and details or visit
 * https://opensource.org/licenses/GPL-3.0.
 *
 * The main execution entry point of Blah.
 */

//Check PHP version complies with the minimum required.
if(substr(PHP_VERSION, 0, 1) < 8)
{
    if(substr(PHP_VERSION, 0, 1) < 7 && substr(PHP_VERSION, 2, 1) < 4)
    {
        exit("Error: please use PHP 7.4.0 or later.");
    }
}

//Time when script started executing useful to measure execution time.
$time_start = microtime(true);

//Register autoloader
require 'src/Autoloader.php';
Org\Autoloader::register();

//Shorthand functions commonly used on legacy templates
require 'src/Aliases.php';

//Include backward compatible functions if include dir exists
if(file_exists("include/forms.php"))
{
    require 'src/DeprecatedFunctions.php';
}

@Guldoman
Copy link
Member Author

PHP might be the only language where this issue can present itself because the "main" part of the syntax is a subsyntax.

The correct way to manage this would be to use (again?) the tokenizer on the first few lines.

@jgmdev
Copy link
Member

jgmdev commented Dec 20, 2022

The correct way to manage this would be to use (again?) the tokenizer on the first few lines.

I would opt to limit the recursion amount as Jan suggested because going back to tokenizer will be slower performance.

@Guldoman
Copy link
Member Author

We can probably just defer setting the indent size without blocking to after the tokenizer completes something like 50 lines.

My issue with the current behavior is that subsyntaxes can interfere with the detection. For example the markdown syntax includes many different subsyntaxes that each define their own comment patterns, and those could create problems (for example Lua defines # as comment).

@jgmdev
Copy link
Member

jgmdev commented Dec 21, 2022

r example the markdown syntax includes many different subsyntaxes that each define their own comment patterns, and those could create problems (for example Lua defines # as comment).

At least in that case we wouldn't need to worry about # getting ignored since those most of the time (if not always) start without spaces, making them useless for indent detection purposes. So far autodetect on its current state has worked almost 100% of times with what I have thrown to it without issues. In the past we used the tokenizer and detection wasnt more reliable, it was actually Iess accurate (maybe the way it was setup). I would focus on fixing the infinite recursion issue which is the real problem for now without breaking current behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive subsyntax causes detectindent to hang
2 participants