Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing validation fails with bareword containing # in block #10327

Closed
sholderbach opened this issue Sep 11, 2023 · 6 comments
Closed

Parsing validation fails with bareword containing # in block #10327

sholderbach opened this issue Sep 11, 2023 · 6 comments
Labels
🐛 bug Something isn't working parser Issues related to parsing
Milestone

Comments

@sholderbach
Copy link
Member

Describe the bug

On the REPL insert, and try to submit via enter.

1..10 | each {echo nixpkgs#unzip }

This will continue to insert a new line as if the block was not closed.

The basic behavior that a string containing a # is not interpreted as a comment (commonly found in the nix context) still works outside the block:

> echo nixpkgs#unzip
nixpkgs#unzip

The original problem was described in: (#6957, #6335, #8137)
The parsing/lexing change to make it possible came with #8151

How to reproduce

  1. In repl
  2. Insert
1..10 | each {echo nixpkgs#unzip }
  1. cursor at the end
  2. Press enter

Expected behavior

Should be a valid expression to submit

Screenshots

No response

Configuration

key value
version 0.84.0
branch
commit_hash
build_os linux-x86_64
build_target x86_64-unknown-linux-gnu
rust_version rustc 1.72.0 (5680fa18f 2023-08-23)
rust_channel stable-x86_64-unknown-linux-gnu
cargo_version cargo 1.72.0 (103a7ff2e 2023-08-15)
build_time 2023-09-11 22:00:27 +02:00
build_rust_channel release
allocator standard
features default, sqlite, trash, which, zip
installed_plugins

Additional context

No response

@sholderbach sholderbach added parser Issues related to parsing needs-triage An issue that hasn't had any proper look 🐛 bug Something isn't working and removed needs-triage An issue that hasn't had any proper look labels Sep 11, 2023
@Eisfunke
Copy link

Can confirm. Also, still doesn't work if I try running via nu -c (so that the REPL not allowing me to submit the command doesn't play into it):

$> nu -c "nix run nixpkgs#hello"
Hello, world!

$> nu -c "echo (nix run nixpkgs#hello)"
Error: nu::parser::unexpected_eof

  × Unexpected end of code.
   ╭─[source:1:1]
 1 │ echo (nix run nixpkgs#hello)
   ╰────

(same with your example each {echo nixpkgs#unzip })

So it seems this is not just a problem with the REPL not allowing you to submit the command, the # just seems to still be parsed incorrectly if inside a block.

@amtoine
Copy link
Member

amtoine commented Dec 14, 2023

ooh i stumbled upon the same exact one when writing this, where i tried nix run nixpkgs#opam inside a do block

@RobbingDaHood
Copy link
Contributor

RobbingDaHood commented Dec 3, 2024

Could I be assigned to this issue?

I also have a suggested solution, please review that.

Patch of suggested solution
Index: crates/nu-parser/src/lex.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/nu-parser/src/lex.rs b/crates/nu-parser/src/lex.rs
--- a/crates/nu-parser/src/lex.rs	(revision Staged)
+++ b/crates/nu-parser/src/lex.rs	(date 1733255567176)
@@ -51,7 +51,7 @@
 }
 
 // A baseline token is terminated if it's not nested inside of a paired
-// delimiter and the next character is one of: `|`, `;`, `#` or any
+// delimiter and the next character is one of: `|`, `;` or any
 // whitespace.
 fn is_item_terminator(
     block_level: &[BlockKind],
@@ -115,6 +115,7 @@
     //   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
     //   token is done.
     // - Otherwise, accumulate the character into the current baseline token.
+    let mut previous_char = None;
     while let Some(c) = input.get(*curr_offset) {
         let c = *c;
 
@@ -150,8 +151,11 @@
         } else if c == b'#' {
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                 break;
-            }
-            in_comment = true;
+            } else if previous_char.map(|c: char| c.is_alphabetic()).unwrap_or(false) {
+                in_comment = false;
+            } else {
+                in_comment = true;
+            }
         } else if c == b'\n' || c == b'\r' {
             in_comment = false;
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
@@ -254,6 +258,7 @@
         }
 
         *curr_offset += 1;
+        previous_char = Some(c as char);
     }
 
     let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

I did test it manually: All current tests passes but I have not run any other checks.

I also need to test if the condition also catches all letters or just the ASCII ones.

Slightly better version

Patch of suggested solution
Index: crates/nu-parser/src/lex.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/nu-parser/src/lex.rs b/crates/nu-parser/src/lex.rs
--- a/crates/nu-parser/src/lex.rs	(revision Staged)
+++ b/crates/nu-parser/src/lex.rs	(date 1733256555600)
@@ -51,7 +51,7 @@
 }
 
 // A baseline token is terminated if it's not nested inside of a paired
-// delimiter and the next character is one of: `|`, `;`, `#` or any
+// delimiter and the next character is one of: `|`, `;` or any
 // whitespace.
 fn is_item_terminator(
     block_level: &[BlockKind],
@@ -115,6 +115,7 @@
     //   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
     //   token is done.
     // - Otherwise, accumulate the character into the current baseline token.
+    let mut previous_char = 0;
     while let Some(c) = input.get(*curr_offset) {
         let c = *c;
 
@@ -151,7 +152,7 @@
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                 break;
             }
-            in_comment = true;
+            in_comment = previous_char == b' ';
         } else if c == b'\n' || c == b'\r' {
             in_comment = false;
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
@@ -254,6 +255,7 @@
         }
 
         *curr_offset += 1;
+        previous_char = c;
     }
 
     let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

@sholderbach
Copy link
Member Author

In the case if you have some working idea please open a draft PR directly, so we can discuss the necessary new tests or get folks to review the idea. Patch comments on GitHub are harder to work with than a quick gh pr checkout <pr number> or expanding the full diff view of a PR for context.

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 5, 2024
@RobbingDaHood
Copy link
Contributor

In the case if you have some working idea please open a draft PR directly, so we can discuss the necessary new tests or get folks to review the idea. Patch comments on GitHub are harder to work with than a quick gh pr checkout <pr number> or expanding the full diff view of a PR for context.

Here is a draft of the suggestion: #14523

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 7, 2024
…e the first character of the token or prefixed with ` ` (space). (nushell#10327)
RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 9, 2024
…cter of the token or prefixed with ` ` (space). (nushell#10327)

Draft solution for better comment handling. nushell#10327
WindSoilder added a commit that referenced this issue Dec 11, 2024
#14548)

This PR should close
1. #10327 
1. #13667 
1. #13810 
1. #14129 

# Description
For `#` to start a comment, then it either need to be the first
character of the token or prefixed with ` ` (space).

So now you can do this:
``` 
~/Projects/nushell> 1..10 | each {echo test#testing }                                                                                                                     12/05/2024 05:37:19 PM
╭───┬──────────────╮
│ 0 │ test#testing │
│ 1 │ test#testing │
│ 2 │ test#testing │
│ 3 │ test#testing │
│ 4 │ test#testing │
│ 5 │ test#testing │
│ 6 │ test#testing │
│ 7 │ test#testing │
│ 8 │ test#testing │
│ 9 │ test#testing │
╰───┴──────────────╯
```  

# User-Facing Changes
It is a breaking change if anyone expected comments to start in the
middle of a string without any prefixing ` ` (space).

# Tests + Formatting
Did all: 
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

# After Submitting
I cant see that I need to update anything in [the
documentation](https://github.com/nushell/nushell.github.io) but please
point me in the direction if there is anything.

---------

Co-authored-by: Wind <WindSoilder@outlook.com>
RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 11, 2024
RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 11, 2024
fdncred pushed a commit that referenced this issue Dec 13, 2024
#14562)

This PR should close
1. #10327 
1. #13667 
1. #13810 
1. #14129 

# Description
For `#` to start a comment, then it either need to be the first
character of the token or prefixed with ` ` (space).

So now you can do this:
``` 
~/Projects/nushell> 1..10 | each {echo test#testing }                                                                                                                     12/05/2024 05:37:19 PM
╭───┬──────────────╮
│ 0 │ test#testing │
│ 1 │ test#testing │
│ 2 │ test#testing │
│ 3 │ test#testing │
│ 4 │ test#testing │
│ 5 │ test#testing │
│ 6 │ test#testing │
│ 7 │ test#testing │
│ 8 │ test#testing │
│ 9 │ test#testing │
╰───┴──────────────╯
```  

# User-Facing Changes
It is a breaking change if anyone expected comments to start in the
middle of a string without any prefixing ` ` (space).

# Tests + Formatting
Did all: 
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

# After Submitting
I cant see that I need to update anything in [the
documentation](https://github.com/nushell/nushell.github.io) but please
point me in the direction if there is anything.
RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 18, 2024
WindSoilder pushed a commit that referenced this issue Dec 25, 2024
…of token (#14616)

This PR should close
1. #10327 
1. #13667 
1. #13810 
1. #14129 

# Description
This got reverted #14606 because
the previous changes only considered space a whitespace and forgot about
tabs. I now added a check for any whitespace, even if it is only those
two that would be relevant.

The added test failed before the changes. 

For `#` to start a comment, then it either need to be the first
character of the token or prefixed with ` ` (space).

So now you can do this:
``` 
~/Projects/nushell> 1..10 | each {echo test#testing }                                                                                                                     12/05/2024 05:37:19 PM
╭───┬──────────────╮
│ 0 │ test#testing │
│ 1 │ test#testing │
│ 2 │ test#testing │
│ 3 │ test#testing │
│ 4 │ test#testing │
│ 5 │ test#testing │
│ 6 │ test#testing │
│ 7 │ test#testing │
│ 8 │ test#testing │
│ 9 │ test#testing │
╰───┴──────────────╯
```  

# User-Facing Changes
It is a breaking change if anyone expected comments to start in the
middle of a string without any prefixing ` ` (space).

# Tests + Formatting
Did all: 
- `cargo fmt --all -- --check` to check standard code formatting (`cargo
fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to
check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass (on Windows make
sure to [enable developer
mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging))
- `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the
tests for the standard library

# After Submitting
I cant see that I need to update anything in [the
documentation](https://github.com/nushell/nushell.github.io) but please
point me in the direction if there is anything.
@hustcer
Copy link
Contributor

hustcer commented Dec 27, 2024

Closed by #14616

@hustcer hustcer closed this as completed Dec 27, 2024
@hustcer hustcer added this to the v0.102.0 milestone Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working parser Issues related to parsing
Projects
None yet
Development

No branches or pull requests

5 participants