Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how and and or operations are compiled to IR to support custom values #14653

Merged

Conversation

devyn
Copy link
Contributor

@devyn devyn commented Dec 21, 2024

Description

Because and and or are short-circuiting operations in Nushell, they must be compiled to a sequence that avoids evaluating the RHS if the LHS is already sufficient to determine the output - i.e., false for and and true for or. I initially implemented this with branch-if instructions, simply returning the RHS if it needed to be evaluated, and returning the short-circuited boolean value if it did not.

Example for $a and $b:

   0: load-variable          %0, var 999 "$a"
   1: branch-if              %0, 3
   2: jump                   5
   3: load-variable          %0, var 1000 "$b" # label(0), from(1:)
   4: jump                   6
   5: load-literal           %0, bool(false) # label(1), from(2:)
   6: span                   %0          # label(2), from(4:)
   7: return                 %0

Unfortunately, this broke polars, because using and/or on custom values is perfectly valid and they're allowed to define that behavior differently, and the polars plugin uses this for boolean masks. But without using the binary-op instruction, that custom behavior is never invoked. Additionally, branch-if requires a boolean, and custom values are not booleans. This changes the IR to the following, using the match instruction to check for the specific short-circuit value instead, and still invoking binary-op otherwise:

   0: load-variable          %0, var 125 "$a"
   1: match                  (false), %0, 4
   2: load-variable          %1, var 124 "$b"
   3: binary-op              %0, Boolean(And), %1
   4: span                   %0          # label(0), from(1:)
   5: return                 %0

I've also renamed Pattern::Value to Pattern::Expression and added a proper Pattern::Value variant that actually contains a Value instead. I'm still hoping to remove Pattern::Expression eventually, because it's kind of a hack - we don't actually evaluate the expression, we just match it against a few cases specifically for pattern matching, and it's one of the cases where AST leaks into IR and I want to remove all of those cases, because AST should not leak into IR.

Fixes #14518

User-Facing Changes

  • and and or now support custom values again.
  • the IR is actually a little bit cleaner, though it may be a bit slower; match is more complex.

Tests + Formatting

The existing tests pass, but I didn't add anything new. Unfortunately I don't think there's anything built-in to trigger this, but maybe some testcases could be added to polars to test it.

…stom values

Because `and` and `or` are short-circuiting operations in Nushell, they must be compiled to a sequence that avoids evaluating the RHS if the LHS is already sufficient to determine the output - i.e., `false` for `and` and `true` for `or`. I initially implemented this with `branch-if` instructions, simply returning the RHS if it needed to be evaluated, and returning the short-circuited boolean value if it did not.

Example for `$a and $b`:

```
   0: load-variable          %0, var 999 "$a"
   1: branch-if              %0, 3
   2: jump                   5
   3: load-variable          %0, var 1000 "$b" # label(0), from(1:)
   4: jump                   6
   5: load-literal           %0, bool(false) # label(1), from(2:)
   6: span                   %0          # label(2), from(4:)
   7: return                 %0
```

Unfortunately, this broke polars, because using `and`/`or` on custom values is perfectly valid and they're allowed to define that behavior differently, and the polars plugin uses this for boolean masks. But without using the `binary-op` instruction, that custom behavior is never invoked. Additionally, `branch-if` requires a boolean, and custom values are not booleans. This changes the IR to the following, using the `match` instruction to check for the specific short-circuit value instead, and still invoking `binary-op` otherwise:

```
   0: load-variable          %0, var 125 "$a"
   1: match                  (false), %0, 4
   2: load-variable          %1, var 124 "$b"
   3: binary-op              %0, Boolean(And), %1
   4: span                   %0          # label(0), from(1:)
   5: return                 %0
```

I've also renamed `Pattern::Value` to `Pattern::Expression` and added a proper `Pattern::Value` variant that actually contains a `Value` instead. I'm still hoping to remove `Pattern::Expression` eventually, because it's kind of a hack - we don't actually evaluate the expression, we just match it against a few cases specifically for pattern matching, and it's one of the cases where AST leaks into IR and I want to remove all of those cases, because AST should not leak into IR.

Fixes nushell#14518
@devyn
Copy link
Contributor Author

devyn commented Dec 21, 2024

Working with polars:

> ([true, false] | polars into-df) and ([true, true] | polars into-df)
╭───┬─────────╮
 # │ and_0_0 │
├───┼─────────┤
 0  true    
 1  false   
╰───┴─────────╯
> view ir { ([true, false] | polars into-df) and ([true, true] | polars into-df) }
# 3 registers, 18 instructions, 0 bytes of data
   0: load-literal           %0, list(capacity = 2)
   1: load-literal           %1, bool(true)
   2: list-push              %0, %1
   3: load-literal           %1, bool(false)
   4: list-push              %0, %1
   5: redirect-out           value
   6: call                   decl 491 "polars into-df", %0
   7: match                  (false), %0, 16
   8: load-literal           %1, list(capacity = 2)
   9: load-literal           %2, bool(true)
  10: list-push              %1, %2
  11: load-literal           %2, bool(true)
  12: list-push              %1, %2
  13: redirect-out           value
  14: call                   decl 491 "polars into-df", %1
  15: binary-op              %0, Boolean(And), %1
  16: span                   %0          # label(0), from(7:)
  17: return                 %0

@fdncred fdncred merged commit 35d2750 into nushell:main Dec 25, 2024
14 checks passed
@fdncred
Copy link
Collaborator

fdncred commented Dec 25, 2024

Thanks

@github-actions github-actions bot added this to the v0.102.0 milestone Dec 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dataframe combining logical masks
2 participants