Boost in tokenizer performance with 3 liner #896

jgmdev · 2022-03-24T05:32:37Z

While working on improving the responsiveness of lite-xl during highlighting in #885, I noticed that the slowdowns in the tokenization process were occurring on lines with long runs of consecutive spaces. It seems the tokenizer was trying to apply each of its rules to each of the spaces it found, which slowed things down a lot.

Adding a rule of %s+ that maps to "normal" at the beginning of every syntax table hugely improves the performance of the tokenizer, basically for free! I haven't measured the gains yet, but tokenization is noticeably faster now.
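To illustrate why a leading whitespace rule helps, here is a minimal sketch of a first-match tokenizer written in Python rather than in lite-xl's Lua. It is not the lite-xl implementation: the `tokenize` function, the `RULES` list, and the sample line are all hypothetical, and Python's `\s+` stands in for the Lua pattern `%s+`. The sketch only counts how many rule attempts are made on a line containing a long run of spaces.

```python
import re

def tokenize(line, rules):
    """Toy first-match tokenizer: at each position, try the rules in order.

    Returns (tokens, attempts), where attempts counts every rule tried.
    """
    tokens, attempts, pos = [], 0, 0
    while pos < len(line):
        for pattern, kind in rules:
            attempts += 1
            m = pattern.match(line, pos)
            if m and m.end() > pos:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched: emit a single "normal" character and move on.
            tokens.append(("normal", line[pos]))
            pos += 1
    return tokens, attempts

# Hypothetical rule set standing in for a real syntax table.
RULES = [
    (re.compile(r"--.*"), "comment"),
    (re.compile(r'"[^"]*"'), "string"),
    (re.compile(r"\d+"), "number"),
    (re.compile(r"[A-Za-z_]\w*"), "symbol"),
]

# Same rules, with a whitespace rule placed first (the idea from this PR).
WHITESPACE_FIRST = [(re.compile(r"\s+"), "normal")] + RULES

line = "x" + " " * 1000 + "42"
slow_tokens, slow_attempts = tokenize(line, RULES)
fast_tokens, fast_attempts = tokenize(line, WHITESPACE_FIRST)
print("attempts without whitespace rule:", slow_attempts)
print("attempts with whitespace rule:   ", fast_attempts)
```

In this toy model, every space costs one failed attempt per rule when no whitespace rule exists, while the whitespace rule consumes the whole run in a single match. The real gain in lite-xl depends on the syntax table in use.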

adamharrison · 2022-03-24T15:41:56Z

Tested this; seems legit.

Of course, this means we can't consider whitespace to be part of token starts. I don't think any language currently does this, so this is probably fine. And given that you're right that this would improve performance by a significant degree, I think this is worth it.

Any objections? Otherwise, let's merge. We can always take it out later if there is some language that this conflicts with (but I can't think of any offhand).
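The trade-off described in this comment can be shown with a small Python sketch. Everything here is hypothetical (the `first_match_tokens` helper and the three rules are made up for illustration, and this is not lite-xl code): a rule whose match begins with whitespace stops matching once a whitespace rule is tried first, because the spaces are consumed before that rule gets a chance.

```python
import re

def first_match_tokens(line, rules):
    """Toy first-match tokenizer: the first rule that matches at pos wins."""
    tokens, pos = [], 0
    while pos < len(line):
        for pattern, kind in rules:
            m = pattern.match(line, pos)
            if m and m.end() > pos:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched: emit a single "normal" character.
            tokens.append(("normal", line[pos]))
            pos += 1
    return tokens

# Hypothetical rules; spaced_operator begins its match with whitespace.
spaced_operator = (re.compile(r"\s+=\s+"), "operator")
symbol = (re.compile(r"[A-Za-z_]\w*"), "symbol")
whitespace = (re.compile(r"\s+"), "normal")

before = first_match_tokens("a = b", [spaced_operator, symbol])
after = first_match_tokens("a = b", [whitespace, spaced_operator, symbol])

print(before)
print(after)
```

With the whitespace rule in front, the `operator` token disappears from the result, which is the kind of conflict a syntax with whitespace-led patterns would run into.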

jgmdev · 2022-03-24T16:43:45Z

Without this change, opening the plugins readme as shown in #885 looked like this:

highlight-no-lazy-mode.mp4

and with this change it is now like this:

with-match-spaces-rule.mp4

adamharrison · 2022-03-24T16:49:42Z

Oh wow, that is quite noticeable! We'll merge tomorrow if no one has any further comments.

syntax: fix conflicts introduced with #896

* mainly the language_md got affected which has some exotic rules
* some other languages are also using spaces at start of pattern and even if not affected this change tackles that

syntax: add pattern to boost tokenizer performance

6488c85

github-actions bot added the Category: Lua Core label Mar 24, 2022

adamharrison merged commit 951f091 into lite-xl:master Mar 25, 2022

jgmdev mentioned this pull request Mar 25, 2022

highlighter: added lazy mode for low end machines. #885

Open

jgmdev added a commit that referenced this pull request Mar 29, 2022

Merge pull request #904 from jgmdev/PR/fix-syntax-optimization

fac54d2

syntax: fix conflicts introduced with #896

adamharrison pushed a commit to adamharrison/lite-xl that referenced this pull request Apr 1, 2022

syntax: add pattern to boost tokenizer performance (lite-xl#896)

c8cbe69
