-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Perl 6 Pod lexer #978
base: master
Are you sure you want to change the base?
Add Perl 6 Pod lexer #978
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your submission. I'm trying to help triage lexer submissions to see if we can work down the pull request backlog... Please take a look at the comments.
lib/rouge/lexers/perl6pod.rb
Outdated
state :item do | ||
rule(/\n/, Text::Whitespace, :pop!) | ||
|
||
rule(/B\</, Punctuation::Indicator, :formatting_b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The :formatting_... part seems excessive. You can do this with far fewer rules. E.g, something like:
formatting_tokens = { "B" => Generic::String, ... }
rule(/([BCEIKLN...]<)([^>]*)>/) do m
t = formatting_tokens[m[0][0]]
groups Punctuation::Indicator, t , Punctuation::Indicator
end
.. or similar ought to work. (Untested; let me know if you run into problems)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I use this, I lose the <
and >
in the output.
I can look at this for a bit. Do you need any help putting these changes together @Tyil? I'm also wondering if there's a way to call into this parser from a theoretical Perl 6 parser I might write. Or would it make more sense to just merge it in at a later date? POD6 files are valid Perl 6, right? |
On Sat, 06 Jul 2019 01:53:52 -0700 Brent Laabs ***@***.***> wrote:
I can look at this for a bit. Do you need any help putting these
changes together @Tyil?
I probably will, Ruby is not my first language after all. However, this
weekend I have plans already, but I should have some spare time during
the week and the following weekend to look at this again.
I'm also wondering if there's a way to call into this parser from a
theoretical Perl 6 parser I might write. Or would it make more sense
to just merge it in at a later date? POD6 files are valid Perl 6,
right?
POD6 is valid Perl 6, yes. If you were to write a Perl 6 parser, I
would be interested in looking at it, as the language is huge and I
would presume it to be very hard to parse in different languages. The
people in #perl6 on Freenode can certainly help you out if needed in
this aspect, though.
…--
With kind regards,
Patrick Spek
www: https://www.tyil.nl/
mail: p.spek@tyil.nl
pgp: 1660 F6A2 DFA7 5347 322A 4DC0 7A6A C285 E2D9 8827
|
@labster I've added some changes that hopefully make the PR on itself better, however, |
@Tyil I suspect the problem is that |
The white space is easily solved, and should be OK now. I'll take a look at the other points you've made. |
OK, I updated the |
@jneen Is creating tokens per character (as happens in a number of places here) worse from a performance perspective than matching as much of the text as possible? Is that an anti-pattern we should discourage? |
Pinging @jneen regarding the question in the previous comment. |
Yes - try to avoid matching character-at-a-time. |
If you're waiting to encounter a specific set of characters, use a negative character class, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's some comments :) Oh, and you're missing a spec file that will run tests. You can see some examples in spec/lexers/
:)
lib/rouge/lexers/perl6pod.rb
Outdated
rule(/\=output/, Keyword, :output) | ||
rule(/\=defn/, Keyword) | ||
|
||
rule(/(BCEIKLNTUZ)<([^>]*)>/) do |m| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is (BCEIKLNTUZ)
meant to be a character class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was, so I tried to fix that. However, this particular rule
seemed to drop the <
and >
characters, which are important to keep. If I can get pointers on how to do something like this that keeps the <
and >
characters, I can put a working variant back into the PR.
I was wondering if there's a way to forcefully insert a character into the resulting document, so I can simply push the <
and >
back in. If there is such a method, I could rewrite some of the =begin ... =end
logic to be a single block as well, I think. This would reduce the overall size of the codebase and duplicated efforts for all the special blocks that I have right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to read through Ruby's Regexp
documentation. It sounds like that might be helpful in consolidating a few things.
To answer your specific question, I think what you want to use are capture groups. This allows you to 'capture' the value matching the capture group's pattern. Using this, you could rewrite this:
rule(/\=begin code/, Keyword, :block_code)
rule(/\=begin input/, Keyword, :block_input)
rule(/\=begin output/, Keyword, :block_output)
as this:
rule(/(\=begin)( )(code|input|output)/i) do |m|
s = case |m[3].downcase| # use the result of the third capture group
when "code" then :block_code
when "input" then :block_input
when "output" then :block_output
end
groups Keyword, Text, Keyword # use the groups method to assign tokens to capture groups in order
push s # push the new state onto the stack
end
The rule
method can take a block (as in the example above) that accepts a single variable (the convention in Rouge is to call the variable m
for 'match'). Within the block you can access the capture groups with the []
operator. m[0]
is the complete pattern, m[1]
is the first capture group, m[2]
is the second capture group and so on.
The markup stuff has been removed, as it currently doesn't work (the < and > characters were being dropped, which is not desired). Additionally, applying some requests eventually broke the block attributes, so a new solution to get these highlighted correctly needs to be found. If I can find a solution to the former, I can most likely also resolve the latter, while reducing the codebase.
I will look into adding a spec once I get the rendering to look the way I intended. |
I've been working to add Perl 6 Pod as a Lexer, to improve the current support for Perl 6 Pod on GitLab.
I've tested my lexer visually, and I am reasonably content with the results thus far. However, before this can be merged, there's still some issues to iron out.
rake
fails, reporting a lot of<Token Error>
symbols.TODO
that should be resolved. It won't pose problems most of the times, but I'd rather not have it posing problems ever.If possible, I'd like to have some feedback to grasp the issue that makes
rake
fail right now, so that I can fix this. I'd also like to have some pointers on fixing theTODO
.