Parsing validation fails with bareword containing `#` in block #10327

sholderbach · 2023-09-11T20:24:48Z

Describe the bug

On the REPL insert, and try to submit via enter.

1..10 | each {echo nixpkgs#unzip }

This will continue to insert a new line as if the block was not closed.

The basic behavior that a string containing a # is not interpreted as a comment (commonly found in the nix context) still works outside the block:

> echo nixpkgs#unzip
nixpkgs#unzip

The original problem was described in: (#6957, #6335, #8137)
The parsing/lexing change to make it possible came with #8151

How to reproduce

In repl
Insert

1..10 | each {echo nixpkgs#unzip }

cursor at the end
Press enter

Expected behavior

Should be a valid expression to submit

Screenshots

No response

Configuration

key	value
version	0.84.0
branch
commit_hash
build_os	linux-x86_64
build_target	x86_64-unknown-linux-gnu
rust_version	rustc 1.72.0 (5680fa18f 2023-08-23)
rust_channel	stable-x86_64-unknown-linux-gnu
cargo_version	cargo 1.72.0 (103a7ff2e 2023-08-15)
build_time	2023-09-11 22:00:27 +02:00
build_rust_channel	release
allocator	standard
features	default, sqlite, trash, which, zip
installed_plugins

Additional context

No response

The text was updated successfully, but these errors were encountered:

Eisfunke · 2023-12-14T09:38:04Z

Can confirm. Also, still doesn't work if I try running via nu -c (so that the REPL not allowing me to submit the command doesn't play into it):

$> nu -c "nix run nixpkgs#hello"
Hello, world!

$> nu -c "echo (nix run nixpkgs#hello)"
Error: nu::parser::unexpected_eof

  × Unexpected end of code.
   ╭─[source:1:1]
 1 │ echo (nix run nixpkgs#hello)
   ╰────

(same with your example each {echo nixpkgs#unzip })

So it seems this is not just a problem with the REPL not allowing you to submit the command, the # just seems to still be parsed incorrectly if inside a block.

amtoine · 2023-12-14T17:22:51Z

ooh i stumbled upon the same exact one when writing this, where i tried nix run nixpkgs#opam inside a do block

RobbingDaHood · 2024-12-03T19:56:35Z

Could I be assigned to this issue?

I also have a suggested solution, please review that.

Patch of suggested solution

Index: crates/nu-parser/src/lex.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/nu-parser/src/lex.rs b/crates/nu-parser/src/lex.rs
--- a/crates/nu-parser/src/lex.rs	(revision Staged)
+++ b/crates/nu-parser/src/lex.rs	(date 1733255567176)
@@ -51,7 +51,7 @@
 }
 
 // A baseline token is terminated if it's not nested inside of a paired
-// delimiter and the next character is one of: `|`, `;`, `#` or any
+// delimiter and the next character is one of: `|`, `;` or any
 // whitespace.
 fn is_item_terminator(
     block_level: &[BlockKind],
@@ -115,6 +115,7 @@
     //   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
     //   token is done.
     // - Otherwise, accumulate the character into the current baseline token.
+    let mut previous_char = None;
     while let Some(c) = input.get(*curr_offset) {
         let c = *c;
 
@@ -150,8 +151,11 @@
         } else if c == b'#' {
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                 break;
-            }
-            in_comment = true;
+            } else if previous_char.map(|c: char| c.is_alphabetic()).unwrap_or(false) {
+                in_comment = false;
+            } else {
+                in_comment = true;
+            }
         } else if c == b'\n' || c == b'\r' {
             in_comment = false;
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
@@ -254,6 +258,7 @@
         }
 
         *curr_offset += 1;
+        previous_char = Some(c as char);
     }
 
     let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

I did test it manually: All current tests passes but I have not run any other checks.

I also need to test if the condition also catches all letters or just the ASCII ones.

Slightly better version

Patch of suggested solution

Index: crates/nu-parser/src/lex.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/nu-parser/src/lex.rs b/crates/nu-parser/src/lex.rs
--- a/crates/nu-parser/src/lex.rs	(revision Staged)
+++ b/crates/nu-parser/src/lex.rs	(date 1733256555600)
@@ -51,7 +51,7 @@
 }
 
 // A baseline token is terminated if it's not nested inside of a paired
-// delimiter and the next character is one of: `|`, `;`, `#` or any
+// delimiter and the next character is one of: `|`, `;` or any
 // whitespace.
 fn is_item_terminator(
     block_level: &[BlockKind],
@@ -115,6 +115,7 @@
     //   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
     //   token is done.
     // - Otherwise, accumulate the character into the current baseline token.
+    let mut previous_char = 0;
     while let Some(c) = input.get(*curr_offset) {
         let c = *c;
 
@@ -151,7 +152,7 @@
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
                 break;
             }
-            in_comment = true;
+            in_comment = previous_char == b' ';
         } else if c == b'\n' || c == b'\r' {
             in_comment = false;
             if is_item_terminator(&block_level, c, additional_whitespace, special_tokens) {
@@ -254,6 +255,7 @@
         }
 
         *curr_offset += 1;
+        previous_char = c;
     }
 
     let span = Span::new(span_offset + token_start, span_offset + *curr_offset);

sholderbach · 2024-12-04T20:51:57Z

In the case if you have some working idea please open a draft PR directly, so we can discuss the necessary new tests or get folks to review the idea. Patch comments on GitHub are harder to work with than a quick gh pr checkout <pr number> or expanding the full diff view of a PR for context.

RobbingDaHood · 2024-12-05T16:41:05Z

In the case if you have some working idea please open a draft PR directly, so we can discuss the necessary new tests or get folks to review the idea. Patch comments on GitHub are harder to work with than a quick gh pr checkout <pr number> or expanding the full diff view of a PR for context.

Here is a draft of the suggestion: #14523

…e the first character of the token or prefixed with ` ` (space). (nushell#10327)

…cter of the token or prefixed with ` ` (space). (nushell#10327) Draft solution for better comment handling. nushell#10327

#14548) This PR should close 1. #10327 1. #13667 1. #13810 1. #14129 # Description For `#` to start a comment, then it either need to be the first character of the token or prefixed with ` ` (space). So now you can do this: ``` ~/Projects/nushell> 1..10 | each {echo test#testing } 12/05/2024 05:37:19 PM ╭───┬──────────────╮ │ 0 │ test#testing │ │ 1 │ test#testing │ │ 2 │ test#testing │ │ 3 │ test#testing │ │ 4 │ test#testing │ │ 5 │ test#testing │ │ 6 │ test#testing │ │ 7 │ test#testing │ │ 8 │ test#testing │ │ 9 │ test#testing │ ╰───┴──────────────╯ ``` # User-Facing Changes It is a breaking change if anyone expected comments to start in the middle of a string without any prefixing ` ` (space). # Tests + Formatting Did all: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library # After Submitting I cant see that I need to update anything in [the documentation](https://github.com/nushell/nushell.github.io) but please point me in the direction if there is anything. --------- Co-authored-by: Wind <WindSoilder@outlook.com>

…cter of the token or prefixed with ` ` (space). (nushell#10327)

#14562) This PR should close 1. #10327 1. #13667 1. #13810 1. #14129 # Description For `#` to start a comment, then it either need to be the first character of the token or prefixed with ` ` (space). So now you can do this: ``` ~/Projects/nushell> 1..10 | each {echo test#testing } 12/05/2024 05:37:19 PM ╭───┬──────────────╮ │ 0 │ test#testing │ │ 1 │ test#testing │ │ 2 │ test#testing │ │ 3 │ test#testing │ │ 4 │ test#testing │ │ 5 │ test#testing │ │ 6 │ test#testing │ │ 7 │ test#testing │ │ 8 │ test#testing │ │ 9 │ test#testing │ ╰───┴──────────────╯ ``` # User-Facing Changes It is a breaking change if anyone expected comments to start in the middle of a string without any prefixing ` ` (space). # Tests + Formatting Did all: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library # After Submitting I cant see that I need to update anything in [the documentation](https://github.com/nushell/nushell.github.io) but please point me in the direction if there is anything.

… prefixed with any whitespace.

…of token (#14616) This PR should close 1. #10327 1. #13667 1. #13810 1. #14129 # Description This got reverted #14606 because the previous changes only considered space a whitespace and forgot about tabs. I now added a check for any whitespace, even if it is only those two that would be relevant. The added test failed before the changes. For `#` to start a comment, then it either need to be the first character of the token or prefixed with ` ` (space). So now you can do this: ``` ~/Projects/nushell> 1..10 | each {echo test#testing } 12/05/2024 05:37:19 PM ╭───┬──────────────╮ │ 0 │ test#testing │ │ 1 │ test#testing │ │ 2 │ test#testing │ │ 3 │ test#testing │ │ 4 │ test#testing │ │ 5 │ test#testing │ │ 6 │ test#testing │ │ 7 │ test#testing │ │ 8 │ test#testing │ │ 9 │ test#testing │ ╰───┴──────────────╯ ``` # User-Facing Changes It is a breaking change if anyone expected comments to start in the middle of a string without any prefixing ` ` (space). # Tests + Formatting Did all: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass (on Windows make sure to [enable developer mode](https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging)) - `cargo run -- -c "use toolkit.nu; toolkit test stdlib"` to run the tests for the standard library # After Submitting I cant see that I need to update anything in [the documentation](https://github.com/nushell/nushell.github.io) but please point me in the direction if there is anything.

hustcer · 2024-12-27T01:43:37Z

Closed by #14616

sholderbach added parser Issues related to parsing needs-triage An issue that hasn't had any proper look 🐛 bug Something isn't working and removed needs-triage An issue that hasn't had any proper look labels Sep 11, 2023

sholderbach mentioned this issue Aug 22, 2024

Unbalanced delimiters on parens when stripping " from output of 2nd/last $"...(... | str trim -c '"')" before string ends #13667

Closed

sholderbach mentioned this issue Sep 8, 2024

Cannot use command output as argument when the command includes # #13810

Closed

sholderbach mentioned this issue Oct 19, 2024

potential parser bug when a bare string contains the "#" character #14129

Closed

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 5, 2024

Draft solution for better comment handling. nushell#10327

b757646

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 7, 2024

Added tests for: For # to start a comment, then it either need to b…

bde96f3

…e the first character of the token or prefixed with ` ` (space). (nushell#10327)

RobbingDaHood mentioned this issue Dec 7, 2024

Draft solution for better comment handling. #10327 #14523

Closed

RobbingDaHood mentioned this issue Dec 9, 2024

For # to start a comment, then it either need to be the first chara… #14548

Merged

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 11, 2024

For # to start a comment, then it either need to be the first chara…

dcf1d24

…cter of the token or prefixed with ` ` (space). (nushell#10327)

RobbingDaHood mentioned this issue Dec 11, 2024

For # to start a comment, then it either need to be the first chara… #14562

Merged

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 11, 2024

nushell#10327 Removed some debug code that I accidentally pushed.

6d51ec9

RobbingDaHood added a commit to RobbingDaHood/nushell that referenced this issue Dec 18, 2024

nushell#10327 Now hashtags get parsed as comment starters if they are…

0dae22d

… prefixed with any whitespace.

RobbingDaHood mentioned this issue Dec 18, 2024

14523 all comments should be prefixed with space tab or be beginning of token #14616

Merged

hustcer closed this as completed Dec 27, 2024

hustcer added this to the v0.102.0 milestone Dec 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing validation fails with bareword containing `#` in block #10327

Parsing validation fails with bareword containing `#` in block #10327

sholderbach commented Sep 11, 2023

Eisfunke commented Dec 14, 2023

amtoine commented Dec 14, 2023

RobbingDaHood commented Dec 3, 2024 •

edited

Loading

sholderbach commented Dec 4, 2024

RobbingDaHood commented Dec 5, 2024

hustcer commented Dec 27, 2024

Parsing validation fails with bareword containing # in block #10327

Parsing validation fails with bareword containing # in block #10327

Comments

sholderbach commented Sep 11, 2023

Describe the bug

How to reproduce

Expected behavior

Screenshots

Configuration

Additional context

Eisfunke commented Dec 14, 2023

amtoine commented Dec 14, 2023

RobbingDaHood commented Dec 3, 2024 • edited Loading

Slightly better version

sholderbach commented Dec 4, 2024

RobbingDaHood commented Dec 5, 2024

hustcer commented Dec 27, 2024

Parsing validation fails with bareword containing `#` in block #10327

Parsing validation fails with bareword containing `#` in block #10327

RobbingDaHood commented Dec 3, 2024 •

edited

Loading