-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite PHP lexer to support use statements, function declarations and type declarations #1489
Conversation
@julp I'm sorry to ask but could I ask you to have a look at this? You mentioned in 826b516 that the commit fixed #1353 and #1361 but I don't think that's the case. To better tokenise the identifiers in the |
@HypeMC Sorry it's taken so long to get to this. Here's what it looks like now: Let me know if you have any thoughts. |
This is the question I was about to ask: for PHP, should we highlight names based on conventions?
(with a potential namespace in front of all of these - Also note that I think we shouldn't bother to handle use /*Foo, */ Bar; <?php use Foo ?> use /*not*/some\name\{
function some_fn,
#const Foo\BAR,
SomeClass,
}; Also, what do you think about this approach? |
@julp I got really excited by your idea and used it to further simplify the lexer. I made a couple of changes to how your code worked to make the tokens more specific (especially regarding functions) but it's largely the same concept. I did have to fiddle with one of the tests. We previously had it so that the What do you think now? |
@julp I've probably become a bit carried away. I ended up replacing Eager to hear your thoughts. |
Awesome, good job, it's way better! Could we not lookup for a builtin if we are lexing one of those statements? See the log method ( I just saw that I've forgotten to add Some more points I've missed:
|
@julp I kind of feel like using a stack is the wrong way to go about it but I couldn't think of a better way so I brought the concept into this PR. Rather than refer to it as a 'stack', I called it a 'magic token' and changed the API a little bit. I also kept the name-guessing heuristic (and kept the broader identifier pattern that I had introduced last time to make that possible). What do you think? One difference is that your code has this rule about closing parentheses. I didn't see the point of it so have left it out but let me know if there's something that I'm missing. |
A state machine would be a better fit (independant from lexer states to handle only relevant tokens). The more you want to handle, the more complex it becomes. At start, it seemed easy but you add one case then another then... (it's PHP, not Ruby or Erlang/Elixir sadly) Based only on conventions, the result is not so bad, it seems pretty accurate to me and probably is the best compromise accuracy/simplicity.
For something like For To avoid a conflict on typehinting with nullable types ( class Foo extends AbstractFoo implements Bar {
private ?PDO $dbh = NULL;
}
$a = fn (int $a, int $b): int => FOO + $a <=> $b;
echo $a(1, 2);
echo self::class;
echo FOO;
echo $condition?FOO:BAR;
echo $condition??DEFAULT; Someone else tried the code and/or have any thoughts? |
@julp Thanks again for all the help so far. I'm still having a think about the best way to do things. I agree that a state machine is largely inevitable but some of my hesitancy comes from the fact that Rouge already provides a state machine with the states in the lexer. Having an additional state machine feels duplicative and inconsistent with the way the other lexers are implemented. |
Sure, I understand. What ever we could do, we will never get something 100% accurate. I think we have to live with that and keep things as easy and simple as possible in order to facilitate maintenance and future evolvements inherent to the language. Thanks for your time and support as well. |
d0b0b5f
to
d33ef44
Compare
@julp I know it's been a few months since the last update but I hadn't completely forgotten about this PR (although you could have been forgiven for thinking otherwise!). I've made some rather large changes to the structure but I think the result is easier to understand and easier to extend. The main idea was to move from using a I also revised the PHP visual sample; ripping out the copypasta and trying to group examples together so that they are more logically arranged. II don't know if you've got time at the moment but if you have, could you have a look and let me know what you think? |
Hi, you did a great job however PHP is not a mature language with some kind of final/established syntax like many others. PHP 8.0.0 is already in alpha and some major changes are coming: union-types, attributes (a replacement for annotations as comments), pattern matching (an alternative to switch/case - seems not yet present in alpha releases), a mixed type, ... (sorry, I feel your pain) I'm a little worried, with such a technique, on how difficult it might become to maintain the lexer (it begins to do what a parser should in my opinion). For now, you don't have so many intermediate states so it seems manageable but with PHP, expect regular changes (= always more cases to handle). |
@julp Well, we also have a JavaScript lexer so this isn't a completely unknown problem :P I did see some of these proposals and structured this lexer partly with extensibility in mind. The reason that I mixed in the If the lexer looks like it's highlighting PHP correctly (for now), I think we can merge it in and then handle changes to the syntax in due course. |
Hi, I haven't tested it on invalid scripts but the result seems really good. I found one issue: the leading
On the above example DatePeriod is also higlighted as a function (instead of class) but I think this is due to this leading \ issue because with this error token we consequently fallback in recover/reset mode (the I also noted that on the Anyone else gave it a shot? |
@julp At this point, I think it's easiest to generate feedback by putting it out into the world and seeing the response. I'm including it in the next version of Rouge I'm prepping right now. Thanks for all your help! I couldn't have done what I did without your advice and support :) |
…d type declarations (rouge-ruby#1489) This commit involves a large-scale rewrite of the PHP lexer with the aim of improving the level of semantic detail. Partly this is achieved by the use of various heuristics that, while not always correct, do hold for the majority of cases. In particular, the lexer has improved support for `use` statements, function declarations and type declarations. Co-authored-by: julp <julp@users.noreply.github.com>
PHP's
use
keyword can be used to (1) import namespaces into the current scope and (2) include traits in a class. While the PHP lexer currently supportsuse
in some contexts, it doesn't allow for the importing of multiple namespaces at once, nor does it tokenise a given input correctly whenuse
is used inside a class.This PR improves the support for
use
. It uses a simple conditional to determine whether identifiers that are captured by the rules in the:uselist
state are keywords, namespaces or function names.It fixes #1353 and it fixes #1361.