Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: give more helpful error when trying to get a column that is missing from only some rows #14811

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

goth-turtle
Copy link

Description

LIke the above issue describes, trying to get a column that some rows lack values for gives an unhelpful cannot find column error. This PR introduces a new RowLacksValue error, that gives a clearer error:

: [{a: 1 b: 2} {a: 3, b: 5} {a: 3}] | get b
Error: nu::shell::row_lacks_value

  × Not all rows contain a value for 'b'
   ╭─[entry #2:1:41]
 1  [{a: 1 b: 2} {a: 3, b: 5} {a: 3}] | get b
   ·                                         
   ·                                         ╰── try using 'b?'
   ╰────

User-Facing Changes

  • New error reported when a column does exist, but not every row provides a value for it.

Tests + Formatting

new tests:

  • commands::get::errors_column_missing_in_some_rows

changed tests:

  • commands::select::select_failed1

After Submitting

@goth-turtle goth-turtle marked this pull request as draft January 12, 2025 18:22
@fdncred
Copy link
Collaborator

fdncred commented Jan 12, 2025

Nice, that looks like a good add. Thanks!

@goth-turtle
Copy link
Author

So far, this only works for get, Im still having a look into how to implement the error in select. It seems it acts on individual records, so Value::follow_cell_path can't return this new error like it does with get

@goth-turtle goth-turtle force-pushed the new_error_row_lacks_value branch from c9b190e to 4afd9ae Compare January 12, 2025 20:46
@goth-turtle
Copy link
Author

goth-turtle commented Jan 12, 2025

select and get both work now, though a bunch of other tests are still broken - Ill get around to that in the next few days if I can find time for it

Copy link
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this change makes the error more precise there are new performance costs to follow_cell_path introduced both in the theoretical best case and additionally through how it is currently implemented in this PR.

The four different cases to worry about.

  1. All rows contain the cell path. All n need to be visited (For Value::follow_cell_path we sadly have to allocate ).
  2. A missing value appears after a stretch of valid values -> We can already abort here and emit the new RowLacksValue error. Should not change the run time over the existing implementation using Result as FromIterator
  3. At the beginning of the List the CellPath is not found but then the value is found. Compared to the previous Version we now have to iterate until this hit instead of failing early to report RowLacksColumn.
  4. The column doesn't exist at all. Now instead of failing early we have to walk the whole List to then report CantFindColumn

A good implementation should make 2.) cheap but to have a nicer message for 3.) we will always pay for 4.) That is a trade-off we should keep in mind, don't have a strong opinion yet on what should be the default here.

For the implementation we should at least make 2. and 3. reasonable

@@ -1252,9 +1252,21 @@ impl Value {
}),
}
})
.collect::<Result<_, _>>()?;
.collect::<Vec<Result<_, _>>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't collect all the ShellErrors as only one is ever reported and the Vec would grow signifiicantly in size:

  • 120 bytes per Unit of ShellError
  • 48 bytes previously for each Value

Instead of a raw non-short-circuiting collect, we should replace the collect with a custom collection mechanism that manages the Result::Err cases depending on the situation.

  • have we only observed the Err case so far
    • Some(Err) proceed consuming (we don't need to keep allocations around anymore)
    • Some(Ok(Value)) return RowLacksColumn
    • None exhausted return CantFindColumn
  • We only saw a Ok(Value)
    • Some(Err) return RowLacksColumn
    • Some(Ok(Value)), push Value into the size-hinted Vec allocation.
    • None exhausted return the Vec/Value::List

///
/// ## Resolution
///
/// Use the Optional Operator, --ignore-errors with get or select, or fill in the missing values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only appears in the Rust crate documentation. If you want to show it you can use the miette #[help] attribute.

As follow_cell_path and friends are also used in a bunch other commands , you should check that the error message like that makes universal sense, including its span annotations.

Comment on lines +607 to +608
#[label = "try using '{col_name}?'"]
span: Option<Span>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points to the cell path component (if that is set correctly in all circumstances)

Question if we should have another label pointing to the source Value guaranteed to have a span (even if sometimes a bit weirdly located)

@goth-turtle
Copy link
Author

Would it make sense to instead just tweak the column_not_found error message a bit? Something like Column not found, or not all values contain a value for it. Try using 'b?' perhaps?

That way we avoid the performance issue, but the user still gets a hint for resolving the situation. wdyt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

getting a column that is missing a row gives unhelpful error
3 participants