Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create julia lexer and tags parser #2584

Merged
merged 2 commits into from
Jul 20, 2021
Merged

Create julia lexer and tags parser #2584

merged 2 commits into from
Jul 20, 2021

Conversation

getzze
Copy link
Contributor

@getzze getzze commented Sep 11, 2020

resolves #434

[Edit2] Lexer

  • using splatting ... and % make the end of the line being recognized as comment
  • parse begin and end as numbers inside indexing brackets, so code folding works.
  • complex strings with interpolation are not correctly parsed.
  • macros are not colored
  • parse unicode operators and variables -> partially done
  • symbols are not recognized as special type (like strings).
  • recognize string literals (r"text") - the list of keywords that work as raw string literals is modifiable in filetypes.julia using rawliterals in the keywords section.
  • type annotations with ::, <: are not emphasized
  • correct code folding with list comprehension
  • do not highlight interpolation in raw and r (regex) string literals

[Edit] Link to official Julia parser: https://github.com/JuliaLang/julia/blob/master/src/julia-parser.scm
and the C library used in Julia for unicode parsing: https://github.com/JuliaStrings/utf8proc

@getzze getzze changed the title Create julia lexer, no ctags Create julia lexer and tags parser Sep 12, 2020
@eht16
Copy link
Member

eht16 commented Sep 12, 2020

I didn't review the code, just an early note:
the Scintilla lexer should be first submitted to and accepted by the Scintilla project (https://www.scintilla.org/).
Also the Ctags parser should be first submitted to and accepted by the Universal Ctags project (https://github.com/universal-ctags/ctags).
See also universal-ctags/ctags#1566 (comment) where someone also started a parser.

We want to stay inline with the upstream projects and so the code should be accepted there first.

@getzze
Copy link
Contributor Author

getzze commented Sep 14, 2020

Ok, thanks.
They will need to be tested by julia+geany users first anyway.

@elextr
Copy link
Member

elextr commented Sep 15, 2020

They will need to be tested by julia+geany users first anyway.

And a reminder to those who test it to comment, that lets @getzze know its working (or what needs fixing :) and it lets Geany project devs know there is a demand.

@getzze
Copy link
Contributor Author

getzze commented Oct 3, 2020

@AndiMD
Copy link

AndiMD commented Dec 24, 2020

Thanks for your efforts so far @getzze. I'd be happy to see this merged.
I wanted to test, but have trouble building the branch julia_lexxer in a standard Kubuntu 20.04 environment (gcc 9.3.0):

  CXX      lexers/LexMake.lo
lexers/LexJulia.cxx: In static member function ‘static Scintilla::ILexer* LexerJulia::LexerFactoryJulia()’:
lexers/LexJulia.cxx:123:25: error: invalid new-expression of abstract class type ‘LexerJulia’
  123 |   return new LexerJulia();
      |                         ^
lexers/LexJulia.cxx:90:7: note:   because the following virtual functions are pure within ‘LexerJulia’:
   90 | class LexerJulia : public DefaultLexer {
      |       ^~~~~~~~~~
In file included from lexers/LexJulia.cxx:25:
./include/ILexer.h:92:34: note:         ‘virtual const char* Scintilla::ILexerWithIdentity::PropertyGet(const char*)’
   92 |  virtual const char * SCI_METHOD PropertyGet(const char *key) = 0;
      |                                  ^~~~~~~~~~~
  CXX      lexers/LexMarkdown.lo

@getzze
Copy link
Contributor Author

getzze commented Dec 25, 2020

Thanks for testing it, I commited a correction, you should be able to compile it now. Happy to hear your feedback.

@AndiMD
Copy link

AndiMD commented Dec 27, 2020

Thanks, that fixed it. Looking pretty good already. I will continue to test this in the coming days and report back.

Two points I noticed instantly:

  1. Highlighting interpolated strings seems inconsistent: Depending on using $(var) vs $var and single quote vs triple quote strings, I see different results. Example:
    "($scanPosX,$scanPosY): (k±Δk)=$(kLaserApprox) $(begin x+5 end)"
    """($scanPosX,$scanPosY): (k±Δk)=$(kLaserApprox) $(begin x+5 end)"""

  2. Macros confuse the lexer (possibly we have to live with that, since macros can have arbitrary syntax)
    Example:

Base.@kwdef struct A
    a::Float64 = 1E-19 
    b::Float64 = 1E-22 
end

shows that the struct contains Symbols E

@AndiMD
Copy link

AndiMD commented Dec 30, 2020

@elextr @getzze I have not discovered any further issues. If more testers or feedback on the level of demand is required, I suggest announcing this on https://discourse.julialang.org/.

@elextr
Copy link
Member

elextr commented Dec 30, 2020

@AndiMD thats entirely up to @getzze, remember no Geany devs are Juliaists, so they have no knowledge or need to support this.

Also the failing CI needs fixing.

@getzze
Copy link
Contributor Author

getzze commented Dec 31, 2020

@AndiMD Thanks for reporting these problems! I think it would be great to announce it on Julia discourse.

1. Highlighting interpolated strings seems inconsistent: Depending on using `$(var)` vs `$var` and single quote vs triple quote strings, I see different results. Example:
   `"($scanPosX,$scanPosY): (k±Δk)=$(kLaserApprox) $(begin x+5 end)"`
   `"""($scanPosX,$scanPosY): (k±Δk)=$(kLaserApprox)  $(begin x+5 end)"""`

Interpolation highlighting is only done for single-quote strings, but I am planning to allow it in triple-quote and commands also.
For $(var) vs $var, I didn't find a way to highlight $( in the former so I ended up highlighting only what is inside the parenthesis. Because in the latter case there are no parenthesis, nothing is highlighted. I can try to find a way to highlight both, including the $.

2. Macros confuse the lexer (possibly we have to live with that, since macros can have arbitrary syntax)
   Example:

Base.@kwdef struct A
a::Float64 = 1E-19
b::Float64 = 1E-22
end


Thanks for reporting, that are actually two bugs:

  • defined values for the attributes are not parsed at all
  • Base.@kwdef is not recognized as a macro (@kwdef is well recognized):
@eval fun1(x) = $pi        # recognizes `fun1` short function definition
Base.@eval fun1(x) = $pi   # does not recognize `fun1` short function definition

@logankilpatrick
Copy link

Love this initiative! Thank you all.

@AndiMD
Copy link

AndiMD commented Mar 18, 2021

Found another edge case: (2E9) / (2e9)/ (2f9): the right bracket is highlighted as part of the number.

@getzze
Copy link
Contributor Author

getzze commented Mar 26, 2021

Thanks for the feedback.
I pushed a patch for it, along with some other improvements:

  • merge ctags from upstream
  • do not highlight interpolation for raw strings
  • correct folding
  • add REPL autocompletion using snippets (needs to change wordchars)

@oscardssmith
Copy link

What's the status on this? I love both Julia and Geany and this looks like a really good improvement.

@elextr
Copy link
Member

elextr commented May 19, 2021

@oscardssmith three problems:

  1. Geany regular contributors are very quiet ATM for various reasons, but then none of them are Juliaists anyway
  2. As noted in the first comment, the lexer and parser need to be upstreamed, even when 1. improves Geany doesn't have the effort to maintain its own versions of those
  3. Scintilla is going through a major evolution, and is still classed as unstable by its maintainer, so any lexer submitted there will be a while before its added to Geany (after Geany is updated to the eventually stable new Scintilla)
  4. it has merge conflicts which prevent CI building to see if its ok

oops thats 4, anyway at least you could build your own test if the merge was fixed.

@getzze
Copy link
Contributor Author

getzze commented May 20, 2021

I just pushed new commits to solve conflicts.
As @elextr said, julia ctag parser has been merged upstream but from Scintilla it will take much longer (they only reviewed aesthetic aspects of the code so far).
But you can build from source applying this PR.

@getzze
Copy link
Contributor Author

getzze commented Jun 2, 2021

LexJulia has been merged upstream by Scintilla (ScintillaOrg/lexilla@6ee6b08 and later commits for minor corrections).
I will push a clean rebase of this PR so it can be merged to geany.

@getzze
Copy link
Contributor Author

getzze commented Jun 2, 2021

The PR has been rebased.
The ctags parser file is identical to upstream and the scintilla lexer has minimal change (3 lines) because upstream is using lexilla v5 and geany is still at v4.
Geany-specific files need to be reviewed.

Copy link
Member

@elextr elextr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First glance at it only a few comments, also for the Julia lexer could you add a diff that applies the three lines (I think you said) to backport it from Lexilla, probably best in the same place as scintilla_changes.patch and add a prompt to update_scintilla.sh to use it so the changes are known and can be re-applied if another backport is needed.

[Edit: to be clear something to update_scintilla.sh like echo "Don't forget to manually copy the julia lexer from Lexilla and apply 'julia_lexer.patch' to it no point in trying to automate getting it from Lexilla, leave that for Scintilla 5 update]

data/filedefs/filetypes.common Outdated Show resolved Hide resolved
data/snippets.conf Outdated Show resolved Hide resolved
data/filedefs/filetypes.common Outdated Show resolved Hide resolved
data/filedefs/filetypes.julia Outdated Show resolved Hide resolved
@getzze
Copy link
Contributor Author

getzze commented Jun 21, 2021

There are still some open issues with Lexilla: ScintillaOrg/lexilla#13

@elextr
Copy link
Member

elextr commented Jun 22, 2021

I didn't look at the lexer code, but most lexers load the "keyword" lists from Geany. Not sure how you can approach the two word keywords (curse Jeff and his femtolisp parser :-) where each word is a valid variable in other contexts. Unless you simply build it into the lexer, have it know and find the two word keywords first and do not test any of the keyword list keywords in that case.

A similar (unsolved) issue happens with C++ where names like override which are keywords in one context and variable or type names in other contexts are currently keyword highlighted in all contexts since the lexer doesn't have sufficient capability to distinguish the context, it also happens in other IDEs where highlighting is separate from parsing (eg Eclipse).

So its not unprecedented if some edge cases are wrongly highlighted, just raise an issue after this is committed.

@elextr
Copy link
Member

elextr commented Jun 23, 2021

@getzze can you push some inconsequential change (or a big change if you have one :) to see if CI has been fixed?

@elextr
Copy link
Member

elextr commented Jun 24, 2021

Yay CI works again!!

One thing you might like to consider (can be later) is a test for the ctags parser, see tests/ctags so CI would notice if a ctags infrastructure update like #2830 broke your parser.

@elextr
Copy link
Member

elextr commented Jun 24, 2021

Have approved the changes (pending build-menu :) and LGTM, but havn't tested it yet, might have time in a few days then should be mergable.

@getzze
Copy link
Contributor Author

getzze commented Jun 24, 2021

Yay CI works again!!

One thing you might like to consider (can be later) is a test for the ctags parser, see tests/ctags so CI would notice if a ctags infrastructure update like #2830 broke your parser.

That's a good idea. What is the format of this file? There are unit test for julia in the ctag repository, these files can be used maybe.

@getzze getzze closed this Jun 24, 2021
@getzze getzze reopened this Jun 24, 2021
@elextr
Copy link
Member

elextr commented Jun 24, 2021

IIUC most of the tests are simply from ctags anyway, many of their names relate to ctags bug reports

IIUC the tests are just one (or more) source file(s) mentioned in Makefile.am and tags file(s) with the same name but extension .tags (generated by Geany, they are tagmanager format, which is binary, so not manually generated). I think mentioning the source files in Makefile.am sources list makes the existing test harness feed the source to Geany and generate and compare to the tags files.

@getzze
Copy link
Contributor Author

getzze commented Jun 24, 2021

I pushed some changed from Lexilla (like using a new keyword list for coloring builtin functions).

I added a patch file in the scintilla folder, so the lexer works with lexilla v4.
Also added a ctags test file (I didn't manage to generate the binary file).

@getzze
Copy link
Contributor Author

getzze commented Jun 28, 2021

The issue with Lexilla is solved. It's ready to merge.
If needed, I can rebase to one commit.

@getzze
Copy link
Contributor Author

getzze commented Jul 6, 2021

I just rebased to two commits. I left the ctags test file in a separate commit because I am not sure it is needed, julia parser is well tested in ctags upstream.

@elextr elextr merged commit 90c6096 into geany:master Jul 20, 2021
@elextr
Copy link
Member

elextr commented Jul 20, 2021

WFM, although testing limited, thanks @getzze

@oscardssmith
Copy link

Thanks so much for getting this pushed through!

@VarLad
Copy link

VarLad commented Aug 4, 2021

@elextr Any idea when's the next release?

@eht16
Copy link
Member

eht16 commented Aug 7, 2021

Hopefully it will happen in September, if not at least this year :).

@czylabsonasa
Copy link

czylabsonasa commented Aug 1, 2022

Hello,
in the following snippet,
the 4th line after the # char constant parsed as comment:

ok=true
for r=2:R+1,c=2:C+1
  if table[r,c]=='#'
    all(table[r+d[1],c+d[2]]!='#' for d=[(1,0),(-1,0),(0,1),(0,-1)])&&(ok=false;break)
  end
end

it can be resolved by putting an extra space after the operator.
geany version is 1.39.

image

@oscardssmith
Copy link

That's not a correct fix. The correct fix would be to check whether you're inside or outside quotation marks.

@elextr
Copy link
Member

elextr commented Aug 1, 2022

Its actually lexing the first ' as an operator, possibly a postfix transpose, then the # is indeed a comment.

Lexers come from the Lexilla project. Since Geany folks don't know Julia, please post the issue there so the Julia lexer maintainer can talk to you directly.

Post a link to the issue here.

@czylabsonasa
Copy link

czylabsonasa commented Aug 1, 2022

@oscardssmith: i did not mean it as a fix, just as a how to get rid of it. :-)
@elextr: thx, i'll file an issue.

@getzze
Copy link
Contributor Author

getzze commented Aug 3, 2022

Thanks for catching this bug!
I submitted a fix to lexilla: ScintillaOrg/lexilla#98

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Julia support
8 participants