Add lexer development documentation #1111

Merged
merged 4 commits into from
May 31, 2019

Conversation

pyrmont
Contributor

@pyrmont pyrmont commented May 16, 2019

Rouge does not have a single document explaining how a lexer should be developed. The README contains a simplified example, and existing lexers can be consulted for reference, but these are inconsistent and do not clearly explain some of the more difficult elements (e.g. disambiguating conflicting filename globs).

This commit adds a docs directory with a basic guide. The guide is written in Markdown, so it is viewable directly on GitHub, but it is intended for use with YARD and uses YARD tags for cross-referencing. For this reason, the commit also updates the .yardopts file. The README is also updated to refer to this guide.

pyrmont added 2 commits May 16, 2019 14:14
@pyrmont pyrmont changed the title from "Add Lexer Development Documentation" to "Add lexer development documentation" May 19, 2019
@pyrmont
Contributor Author

pyrmont commented May 28, 2019

This needs to be updated with further information. I think it would also be helpful to submit a separate PR changing the YARD settings.

@pyrmont pyrmont added the "author-action" label (the PR has been reviewed but action by the author is needed) May 28, 2019
This adds further detail and adjusts the structure slightly.
@miparnisari
Contributor

I haven't finished reading this but I promise to do that during the weekend and add feedback :) This is very useful stuff

@pyrmont pyrmont merged commit df23679 into rouge-ruby:master May 31, 2019
@pyrmont
Contributor Author

pyrmont commented May 31, 2019

Thanks @miparnisari :) I hope it's useful. I've merged this into master but it's intended to be a draft so please feel free to contribute feedback (either via GitHub or as a PR).

docs/LexerDevelopment.md (outdated)
docs/LexerDevelopment.md (outdated)
lib/rouge/lexer.rb
docs/LexerDevelopment.md (outdated)
docs/LexerDevelopment.md
docs/LexerDevelopment.md (outdated)
@jneen
Member

jneen commented May 31, 2019

I don't know how github works anymore so I forgot to hit the "Submit Review" button ><.

@pyrmont
Contributor Author

pyrmont commented Jun 1, 2019

@jneen That's cool! Thanks for the feedback. I'll get it updated as soon as I can (probably later tonight) :)

@miparnisari
Contributor

miparnisari commented Jun 1, 2019

Well I read this, very very well written and I actually learned a few things.

  • Can we add information on how to debug while developing? What I do is run rackup >> debug.txt and then load localhost:9292/json?debug, which prints something like:
lexer: json
stack: [:root]
stream: "\"firstName\": \"John\","
  trying #<Rule /\s+/m>
  trying #<Rule /"/>
    got "\""
    yielding Literal.String.Double, "\""
    pushing :string
lexer: json
stack: [:root, :string]
stream: "firstName\": \"John\",\n"
  trying #<Rule /[^\\"]+/>
    got "firstName"
    yielding Literal.String.Double, "firstName"
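
For readers unfamiliar with what this trace is describing, here is a minimal sketch of a stack-based regex lexer that prints a similar log. It is not Rouge's implementation: the `RULES` table, the `trace_lex` helper, the error fallback, and the exact log wording are all hypothetical and only modelled on the output above. It uses nothing beyond StringScanner from the standard library.

```ruby
require 'strscan'

# Hypothetical rule table: state => [regex, token name, stack action].
# The action is nil (stay), :pop (leave the state), or a state name to push.
RULES = {
  root: [
    [/\s+/m, 'Text',                  nil],
    [/"/,    'Literal.String.Double', :string]
  ],
  string: [
    [/[^\\"]+/, 'Literal.String.Double', nil],
    [/\\./,     'Literal.String.Escape', nil],
    [/"/,       'Literal.String.Double', :pop]
  ]
}.freeze

# Lexes +source+, writing a trace to +log+ and returning [token, text] pairs.
def trace_lex(source, log: $stderr)
  stream = StringScanner.new(source)
  stack  = [:root]
  tokens = []

  until stream.eos?
    log.puts "stack: #{stack.inspect}"
    log.puts "stream: #{stream.rest.inspect}"

    # Try the rules of the state on top of the stack, in order.
    matched = RULES.fetch(stack.last).any? do |(regex, token, action)|
      log.puts "  trying #{regex.inspect}"
      next false unless stream.skip(regex)

      log.puts "    got #{stream.matched.inspect}"
      log.puts "    yielding #{token}, #{stream.matched.inspect}"
      tokens << [token, stream.matched]

      if action == :pop
        log.puts '    popping'
        stack.pop
      elsif action
        log.puts "    pushing #{action.inspect}"
        stack.push(action)
      end
      true
    end

    # No rule matched: consume one character as an error so we always advance.
    tokens << ['Error', stream.getch] unless matched
  end

  tokens
end
```

Calling `trace_lex('"firstName": "John",')` writes the trace to standard error and returns the token list; pass `log: StringIO.new` (after `require 'stringio'`) to capture the trace instead.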

@pyrmont
Contributor Author

pyrmont commented Jun 1, 2019

I'll reply to the other points a bit later tonight but regarding the way the source is consumed, @miparnisari wrote:

A question which I don't have an answer for but I sure wondered while developing. How many characters are evaluated at any given time? It seems like the number is 24 characters but I can't find this constant in the code.

Rouge uses StringScanner from the Ruby standard library. The actual method called to advance through the string is #skip:

if (size = stream.skip(rule.re))

I'm not sure how #skip is implemented but I'd assume it's doing one character (or possibly one Unicode code point) at a time.

@ashmaroli
Contributor

ashmaroli commented Jun 1, 2019

I'm not sure how #scan is implemented

From the docs for StringScanner#scan, it looks like the pointer advances till the end of match. For example:

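require 'strscan'

s = StringScanner.new('test string')
s.pos           # => 0
s.scan(/\w+/)   # => "test"
s.pos           # => 4   (the pointer is now just past "test")
s.scan(/\w+/)   # => nil (the next character is a space, so the pointer stays at 4)
s.scan(/\s+/)   # => " "
s.pos           # => 5
s.scan(/\w+/)   # => "string"
s.eos?          # => true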

@pyrmont
Contributor Author

pyrmont commented Jun 1, 2019

Gah, I said #scan but meant #skip. Sorry for the confusion!

@ashmaroli
Contributor

but meant #skip

😃 From the same docs :

#skip is similar to #scan, but without returning the matched string.
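
To make the difference concrete, here is a small sketch using only StringScanner from the standard library. The `skip_walkthrough` helper is hypothetical and exists only to collect the results; it records what each call returns and where the scan pointer ends up. The input is plain ASCII, so character and byte positions coincide.

```ruby
require 'strscan'

# Hypothetical helper: returns [label, return value, pointer position] rows.
def skip_walkthrough(source = 'test string')
  scanner = StringScanner.new(source)
  steps = []

  # #skip returns the length of the match and moves past all of it in one call.
  steps << [:skip_word, scanner.skip(/\w+/), scanner.pos]
  # The matched text is still available afterwards via #matched.
  steps << [:matched, scanner.matched, scanner.pos]
  # A failed match returns nil and leaves the pointer where it was.
  steps << [:skip_word, scanner.skip(/\w+/), scanner.pos]
  steps << [:skip_space, scanner.skip(/\s+/), scanner.pos]
  # #scan advances the same way but returns the matched string instead.
  steps << [:scan_word, scanner.scan(/\w+/), scanner.pos]

  steps
end
```

Running `skip_walkthrough.each { |step| p step }` prints one row per call, which shows the pointer moving by the length of each match rather than by a fixed number of characters.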

@pyrmont
Contributor Author

pyrmont commented Jun 1, 2019

@miparnisari The other answers to your questions:

  1. (Debug Documentation) I've created #1144 ("Add debug documentation") to cover this.

  2. (Magic Constants) I've now looked at some of the C source for StringScanner and I don't see any reason you should be seeing this behaviour. It looks to me like it should be evaluating a character at a time.

  3. (Docker Documentation) I'd like to include documentation about setting up a Ruby environment for development. We have some information on the wiki but I'd like to move that to docs/ so that, like the lexer development guide, it will show up on RubyDoc.info.

  4. (Option Documentation) Speaking of RubyDoc.info, code comments that are in the appropriate form for YARD automatically generate documentation on RubyDoc and should provide the documentation requested in "Document options" (#76). However, as we haven't released an update to the gem yet, the current docs don't pick up the recent changes that include documentation attached to protected methods. You can, however, see that in the docs generated from master. To the extent that method parameters, options, etc. aren't documented, we really should be updating the comments in the source code and letting them generate the documentation on RubyDoc.

  5. (DSL Documentation) My understanding is that YARD can also be taught how to read DSLs and generate documentation from them. That might not produce the list that "document lexer tags, aliases, and file associations" (#50) is about, but it might be 80% of what's required. I'm kind of loath to have too much documentation that has to be updated manually. Things like the lexer development guide are one thing, but a master list of all the settings for each lexer is prone to fall out of sync.
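
As an illustration of point 4, this is roughly what YARD-formatted comments for options look like. The `ExampleLexer` class and its `:debug` option are hypothetical and are not taken from the Rouge source; only the tag syntax (`@param`, `@option`, `@return`) is standard YARD.

```ruby
# Hypothetical class, used only to show the comment format.
class ExampleLexer
  # Creates a new lexer.
  #
  # @param opts [Hash] lexer options
  # @option opts [Boolean] :debug (false) whether to print a trace while lexing
  def initialize(opts = {})
    @debug = opts.fetch(:debug, false)
  end

  # @return [Boolean] true if debug tracing is enabled
  def debug?
    @debug
  end

  protected

  # Splits the source into lines. Protected methods are only included in the
  # generated docs when yardoc is run with the --protected flag.
  #
  # @param text [String] the source text
  # @return [Array<String>] the lines, including their line endings
  def split_lines(text)
    text.lines
  end
end
```

With comments in this form, the option list lives next to the code that reads it, so it is less likely to drift out of date than a separately maintained table.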

pyrmont added a commit to pyrmont/rouge that referenced this pull request Jun 1, 2019
As per the suggestions from @jneen in rouge-ruby#1111, this updates the lexer
development guide.
@pyrmont pyrmont removed the "author-action" label (the PR has been reviewed but action by the author is needed) Jun 1, 2019
@pyrmont pyrmont deleted the fix.lexer-development-docs branch January 8, 2020 20:05
4 participants