Partial rewrite of PHP lexer #368

Open
wants to merge 5 commits into
base: master

julp
Contributor

@julp julp commented Dec 29, 2015

  • translation of PHP's lexer into rouge
  • add a lexer option, short_open_tag (enabled by default); when it is disabled,
    code is highlighted as if PHP's short_open_tag setting were off
  • add some missing features:
    • binary numbers (0b...) (introduced by PHP 5.4.0)
    • string interpolation in backquoted strings (`mysqldump -u $user -p$pwd $db > $file`)
    • Unicode codepoints escape syntax (\u{...}) (introduced by PHP 7.0.0)
  • stricter syntax:
    • <?php is case insensitive
    • a "blank" (whitespace character) has to follow <?php (but not <?= or <?)
    • for doc comments, /** must be followed by at least one whitespace character
    • keywords are case insensitive
    • function/method names are also case insensitive
  • add some missing keywords:
    • type declarations (PHP 7, including void and nullable types from PHP 7.1)
    • 7.0.0: class (anonymous classes), yield from
    • 5.5.0: finally
    • 5.4.0: callable, insteadof, trait, __TRAIT__
    • 5.3.0: goto, __NAMESPACE__, __DIR__
    • function and const added to keywords (they can no longer be handled by a rule,
      because it conflicts with use, e.g. use const \A\B\C; as introduced by PHP 5.6.0)
    • others: casts, instanceof, __CLASS__, __FUNCTION__, __METHOD__, __halt_compiler
    • self even if it's not really a reserved word
  • cleanup in keywords, removal of:
    • predefined constants (eg: E_ERROR or PHP_OS) - which, besides, are case sensitive
    • predefined classes (such as stdClass)
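
A few of the rules listed above can be sketched as plain Ruby regular expressions. This is a minimal illustration, not the lexer from this pull request: the constant names, the open_tag helper, and the short keyword list are hypothetical, and it assumes that <?= stays recognized when short_open_tag is off.

```ruby
# Hypothetical names throughout; none of this is copied from the PR's lexer.

# "<?php" in any letter case, followed by exactly one blank character.
OPEN_TAG_LONG  = /\A<\?php[ \t\r\n]/i
# "<?=" and the short "<?" need no trailing blank.
OPEN_TAG_ECHO  = /\A<\?=/
OPEN_TAG_SHORT = /\A<\?/

# Binary integer literal (PHP 5.4.0+), e.g. 0b1010.
BINARY_NUMBER = /\A0b[01]+\z/i

# Unicode codepoint escape (PHP 7.0.0+), e.g. \u{1F600}.
UNICODE_ESCAPE = /\\u\{\h+\}/

# Doc comment opener: "/**" followed by at least one whitespace character.
DOC_COMMENT_START = %r{\A/\*\*\s}

# A small sample of keywords, stored in lower case.
KEYWORDS = %w[finally callable insteadof trait goto].freeze

# Returns the opening tag at the start of +text+, or nil if there is none.
# Assumption: only the bare "<?" form depends on short_open_tag.
def open_tag(text, short_open_tag: true)
  patterns = [OPEN_TAG_LONG, OPEN_TAG_ECHO]
  patterns << OPEN_TAG_SHORT if short_open_tag
  patterns.each do |pattern|
    match = pattern.match(text)
    return match[0] if match
  end
  nil
end

# Keywords are matched case-insensitively by normalizing to lower case.
def keyword?(word)
  KEYWORDS.include?(word.downcase)
end
```

With these definitions, open_tag("<?phpx") falls back to the short tag "<?" when short_open_tag is enabled and returns nil when it is disabled, which is the difference the option is meant to expose.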

Also fixes: #338 and #348

To give it a try, (temporarily) replace gem 'rouge' in your Gemfile with:

gem 'rouge', :github => 'julp/rouge', :branch => 'php_lexer_rewritten'

@julp julp force-pushed the php_lexer_rewritten branch from a1b21d5 to 8ac7fc7 on December 29, 2015 20:32
@julp julp force-pushed the php_lexer_rewritten branch from 8ac7fc7 to 1ac94d2 on January 10, 2016 16:24

DEFAULTS = Hash.new(Error).tap do |h|
# :in_scripting => Error,
# :var_offset => Error,
Member

Please delete the commented-out code.

Member

Actually, could you explain to me what the DEFAULTS hash is for?

Contributor Author

Some states are "inherited" by a few others, so to set the current token's type I use this hash to map the name of the current (topmost) state to the correct type. For example, a , inside "...,..." is highlighted as Str::Double, but in an expression like "$a[,]" it is highlighted as an error.
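
A minimal sketch of that lookup, to make the idea concrete. Symbols stand in for Rouge token types so the snippet has no dependencies, and the state names :double_quotes and :backquotes as well as the default_token helper are hypothetical; only Hash.new(Error) and :var_offset appear in the diff.

```ruby
# Stand-ins for token types such as Error and Str::Double.
ERROR        = :error
STR_DOUBLE   = :str_double
STR_BACKTICK = :str_backtick

# States without an explicit entry fall back to ERROR, as with Hash.new(Error).
DEFAULTS = Hash.new(ERROR).tap do |h|
  h[:double_quotes] = STR_DOUBLE   # hypothetical state name
  h[:backquotes]    = STR_BACKTICK # hypothetical state name
end

# A rule shared by several states asks for the token type that fits the
# state currently on top of the stack.
def default_token(state_stack)
  DEFAULTS[state_stack.last]
end

default_token([:root, :double_quotes])              # a "," inside "...,..."
default_token([:root, :double_quotes, :var_offset]) # a "," inside "$a[,]"
```

The first call yields the double-quoted string type, while the second falls through to the error type because :var_offset has no entry of its own.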

Do you have a better approach in mind?

@jneen
Member

jneen commented Jun 2, 2016

Hi @julp! Thanks for your work. I'm going to leave this for later, because it looks like this needs a good deal of review, which I'll get to when things have settled down a bit more.

@jneen jneen added the lexer-request, discussion-open, and needs-review labels on Jun 2, 2016
@julp julp force-pushed the php_lexer_rewritten branch from 1ac94d2 to 373ca5a on June 3, 2016 14:16
julp added 5 commits December 21, 2016 17:13
* translation of PHP's lexer into rouge
* add lexer option short_open_tag to permit highlighting as if
  short_open_tag were off when disabled (enabled by default)
* add some missing features:
  + binary numbers (introduced by PHP 5.4.0)
  + string interpolation in backquoted strings
  + Unicode codepoints escape syntax (\u{...}) (introduced by PHP 7.0.0)
* stricter syntax:
  + <?php is case insensitive
  + a "blank" (whitespace character) has to follow <?php (but not <?= or <?)
  + for doc comments, /** must be followed by at least one whitespace character
  + keywords are case insensitive
  + function/method names are also case insensitive
* add some missing keywords:
  + type declarations (PHP 7, including void planned for 7.1.0 - master)
  + 7.0.0: class (anonymous classes), yield from
  + 5.5.0: finally
  + 5.4.0: callable, insteadof, trait, __TRAIT__
  + 5.3.0: goto, __NAMESPACE__, __DIR__
  + function and const added to keywords (they can no longer be handled by a rule,
    because it conflicts with use, e.g. use const \A\B\C; as introduced by PHP 5.6.0)
  + others: casts, instanceof, __CLASS__, __FUNCTION__, __METHOD__, __halt_compiler
  + self even if it's not really a reserved word
* cleanup in keywords, removal of:
  + predefined constants (eg: E_ERROR or PHP_OS) - which, besides, are case sensitive
  + predefined classes (such as stdClass)
@julp julp force-pushed the php_lexer_rewritten branch from f6747e0 to b70a812 on December 21, 2016 16:14
Development

Successfully merging this pull request may close these issues.

php hilight, strings with objects in
2 participants