Skip to content

Allow comments in grammars #37

Open
@shnewto

Description

We need a way to allow incorporating comments into a grammar. It seems somewhat standard to
use the ; to indicate the start of a comment and a \n to close it. It'd be a good add I think to make that work.

A couple initial questions:

  • Do we look for the ; all along the way?
  • Do we make a "first pass" that strips all comments before parsing?

Activity

Carlyle-Foster

Carlyle-Foster commented on Jan 1, 2025

@Carlyle-Foster
Contributor

i don't think we'd have to look for ; all the time, unless there's some de facto standard for for closing comments within a line comments can never appear as interjections so we'd only have to check here <a> ::= "b" here <c> here | <d> and here

Carlyle-Foster

Carlyle-Foster commented on Jan 1, 2025

@Carlyle-Foster
Contributor

now that i think of it, we should be able to confine comment parsing entirely within whitespace parsing, can you think of any exceptions?

Carlyle-Foster

Carlyle-Foster commented on Jan 2, 2025

@Carlyle-Foster
Contributor

i don't think we'd have to look for ; all the time, unless there's some de facto standard for for closing comments within a line comments can never appear as interjections so we'd only have to check here <a> ::= "b" here <c> here | <d> and here

wait a second that's wrong, it's actually just here <a> ::= "b" <c> | <d> and here that have to be checked

Carlyle-Foster

Carlyle-Foster commented on Jan 5, 2025

@Carlyle-Foster
Contributor

i'm not sure if this should be closed yet, i found this out in the wild, it's VERY old but it's usage of comments really make sense, if newlines are allowable WS then any whitespace could have comments inside

Carlyle-Foster

Carlyle-Foster commented on Jan 5, 2025

@Carlyle-Foster
Contributor

i might just have to bite the bullet on this one and replace all the simple WS parsing with a custom function that parses WS while transparently skipping comments

shnewto

shnewto commented on Jan 5, 2025

@shnewto
OwnerAuthor

one note here is that I don't think comments are actually allowed anywhere except at the end of a rule so rather than
here <a> ::= "b" <c> | <d> and here

it's just
<a> ::= "b" <c> | <d> here

From section 2.8 of the spec linked above:

A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

shnewto

shnewto commented on Jan 5, 2025

@shnewto
OwnerAuthor

which effectively means that anything that follows a ; until a newline can be treated as a comment and we can expect that there can be lines that are only whitespace + comments

shnewto

shnewto commented on Jan 5, 2025

@shnewto
OwnerAuthor

also 🤔 I think we already handle ; to terminate lines so it might just be that we want to eat anything that follows a ; until the newline rather than allowing ; as a delimiter which I think it's behaving as currently.

Carlyle-Foster

Carlyle-Foster commented on Jan 5, 2025

@Carlyle-Foster
Contributor

; isn't the main delimiter, we do normally just consume until we hit a newline, ; is only a delimiter because for some reason i though having two comments on the same line should be rejected, i think because i interpreted that as trying to close the comment b4 the newline
in retrospect, that doesn't make much sense

shnewto

shnewto commented on Jan 5, 2025

@shnewto
OwnerAuthor

I think we're talking about the same thing but just in case, here's the current behavior for bnf grammars

<dna> ::= <base> | <base> <dna> 
<base> ::= 'A' | 'C' | 'G' | 'T'"

is equivalent to

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

i.e. ; is (incorrectly) just acting as an alternative to \n as a delimiter and they both result in the same object after parsing.

but if we want to handle comments correctly, I believe we want

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

to be equivalent to

<dna> ::= <base> | <base> <dna>
Carlyle-Foster

Carlyle-Foster commented on Jan 5, 2025

@Carlyle-Foster
Contributor

one note here is that I don't think comments are actually allowed anywhere except at the end of a rule so rather than here <a> ::= "b" <c> | <d> and here

it's just <a> ::= "b" <c> | <d> here

From section 2.8 of the spec linked above:

A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

the problem is currently all WS could include newlines so all WS has to handle comments, take this example from the RFC u quoted foreinstance, it idiomatically puts a NL in the whitespace before a /

destination =  "To"          ":" 1#address  ; Primary
                 /  "Resent-To"   ":" 1#address
                 /  "cc"          ":" 1#address  ; Secondary
                 /  "Resent-cc"   ":" 1#address
                 /  "bcc"         ":"  #address  ; Blind carbon
                 /  "Resent-bcc"  ":"  #address
Carlyle-Foster

Carlyle-Foster commented on Jan 5, 2025

@Carlyle-Foster
Contributor

I think we're talking about the same thing but just in case, here's the current behavior for bnf grammars

<dna> ::= <base> | <base> <dna> 
<base> ::= 'A' | 'C' | 'G' | 'T'"

is equivalent to

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

i.e. ; is (incorrectly) just acting as an alternative to \n as a delimiter and they both result in the same object after parsing.

but if we want to handle comments correctly, I believe we want

<dna> ::= <base> | <base> <dna> ; <base> ::= 'A' | 'C' | 'G' | 'T'

to be equivalent to

<dna> ::= <base> | <base> <dna>

oh, that's what ur talking about, yeah i thought that was a little weird, i've never seen that used out in the wild for that matter, not sure why anyone would want 2 production rules on 1 line

2 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Allow comments in grammars · Issue #37 · shnewto/bnf