-
-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aggr-] allow ranking rows by key column #2417
base: develop
Are you sure you want to change the base?
Conversation
The standard way is to convert the column into a known type, and then anything that can't be converted (errors and nulls) become |
Yes, that seems like it should work. Should the rank aggregator pick the known type, and if so, which one? Or is it the user who should convert the column? |
Since it's not obvious which type to pick, the user can convert the column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love what this is adding, and I think with a few tweaks it would be even more powerful!
There are two kinds of ranking operations people may want.
What is a good name for these two aggregators? |
8078fb6
to
c6c608e
Compare
Okay, I implemented a command that adds a column and applies an aggregator to rows after grouping them by key columns. It's To get this to work with list aggregators, I made a new class I also tried making a I'm a bit unsure about the new behavior of the
|
There is one detail about the grouping in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move most of this into features/addcol_sheetrank.py
? It's a fair amount of code and I'd like to make it self-contained as much as possible.
It is needed now that addcol-aggregate can apply stdev to groups, which may include lists of size 1.
Okay, I moved the I had to move the One notable consequence of this move is that the new visidata/visidata/features/rank.py Line 41 in ec28446
|
for aggr in aggrs: | ||
rows = aggregate_groups(sheet, col, sheet.rows, aggr) | ||
if isinstance(aggr, ListAggregator): | ||
t = aggr.listtype or col.type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a separate listtype
? Seems like we could just use the same aggr.type
in both cases, and remove this isinstance
(which is usually a code smell for me).
Okay, this one is quite old and quite big at this point! I gave it another pass through, and my remaining questions have to do with |
@midichef I think your PR successfully meets your goals. This is such thoughtfully considered work. Once you have your fnal changes in, I am going to add an update to the guide in this PR, with documentation. I have one question for my own clarity: what is rank's intended behaviour when there are multiple key columns? Say keycol keycol_2 Item |
This PR adds a
rank
aggregator that returns a list, and a commandaddcol-rank
, which adds a new column with the rank of each row. Ranks are calculated by comparing key columns.It also fixes a bug in
memo-aggregate
where long output takes an extremely long time to show up in the statusbar.For example:
seq 1222333 |vd -
, thenz+
list
. After the list is calculated, visidata will get stuck for many seconds showingprocessing…
, because it's very slow to runformat()
on a long sequence.I think it's worth having an aggregator for rank, and the need for a simpler solution than the current method has come up before. On the other hand, I know part of Visidata philosophy is that it's not a spreadsheet. How do people feel about having a rank aggregator?
Also, in its current form, the rank aggregator will give errors when comparing key columns with different types across 2 rows:
What's the standard way to handle sorting mixed types for Visidata?