Transpose(dt) allows returning a list without promoting elements to maxtype #5805
Hi Ben, thanks for sharing, looks interesting. I have never used transpose, so I was wondering: what is the typical use case? With lists? The man page says
I have used tstrsplit, so I guess I understand the use case with character vectors, but could you add an illustrative example showing how transpose is different / more flexible / useful with lists?
The main use cases I have in mind are when you receive data in a row-wise format or want to …

```r
library(data.table)

dt = data.table(name = c("Anna", "Bob"), age = c(30, 20))
fun = function(x) sprintf("Hi %d year old %s", x[[2]], x[[1]])

# list.cols = TRUE: each transposed column is a list, values keep their types
transpose(dt, list.cols = TRUE)[, lapply(.SD, fun)]
#                     V1                 V2
#                 <char>             <char>
# 1: Hi 30 year old Anna Hi 20 year old Bob

# list.cols = FALSE (the default): values are promoted to a common type
transpose(dt, list.cols = FALSE)[, lapply(.SD, fun)]
# Error in sprintf("Hi %d year old %s", x[[2]], x[[1]]) :
#   invalid format '%d'; use format %s for character objects
```
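The error in the second call comes from type promotion: without list.cols, every cell is coerced to the highest type across the input columns (character here), so x[[2]] is the string "30" and %d rejects it. A minimal sketch that inspects both results, assuming a data.table version that includes this PR's list.cols= argument (`promoted` and `kept` are illustrative names):

```r
library(data.table)

dt = data.table(name = c("Anna", "Bob"), age = c(30, 20))

# Default: every value is coerced to the common "max" type of the input
# columns, which is character here, so age 30 becomes the string "30".
promoted = transpose(dt)
sapply(promoted, class)  # both V1 and V2 are "character"
promoted$V1              # c("Anna", "30")

# list.cols = TRUE: each output column is a list column, so every element
# keeps the type of the column it came from.
kept = transpose(dt, list.cols = TRUE)
class(kept$V1[[1]])      # "character"
class(kept$V1[[2]])      # "numeric"
```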
Codecov Report: All modified and coverable lines are covered by tests ✅

```diff
@@           Coverage Diff           @@
##           master    #5805   +/-   ##
=======================================
  Coverage   97.50%   97.50%
=======================================
  Files          80       80
  Lines       14876    14884    +8
=======================================
+ Hits        14505    14513    +8
  Misses        371      371
```
transpose() is the workhorse behind tstrsplit; tstrsplit is "just" strsplit() |> transpose(). But there are many other times we wind up with a list() of rows that we want as columns instead. Compared with the equivalent base R code, transpose() is not only much cleaner but also much faster, and it handles type conversion and fills ragged data.
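The three points above (rows to columns, type conversion, filling ragged data) can be illustrated with a small sketch. The example data are hypothetical, and the Map() pairing is just one base R way to do the rectangular case, not necessarily the base R code the comment had in mind:

```r
library(data.table)

# Hypothetical "rows": a list of row vectors of unequal length.
rows = list(1:3, 4:5)

# Rows become columns; the short row is padded with fill = NA (the default).
cols = transpose(rows)
# cols is list(c(1L, 4L), c(2L, 5L), c(3L, NA))

# Mixed types are promoted to a common type, here character.
mixed = transpose(list(c(1L, 2L), c("a", "b")))
# mixed is list(c("1", "a"), c("2", "b"))

# A base R equivalent for equal-length rows: Map(c, ...) pairs up the
# i-th element of every row. It does not pad ragged input with NA.
square = list(1:3, 4:6)
base_cols = do.call(Map, c(f = c, square))
```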
Could an example of this usage (and the new usage) please be added to transpose.Rd in this PR?
FWIW, here are benchmarks of this version vs. a base R solution:

```r
library(data.table)

l = list(sample(1e5), sample(letters, 1e5, TRUE))
bm = bench::mark(
  base = lapply(seq_along(l[[1]]), function(x) lapply(l, `[[`, x)),
  dt = transpose(l, list.cols = TRUE),
  iterations = 100
)
setDT(bm)
bm[, .(expression, min, median, "max" = lapply(time, max), `itr/sec`, n_itr, n_gc), .I]
#        I   expression          min       median    max   itr/sec n_itr  n_gc
#    <int> <bench_expr> <bench_time> <bench_time> <list>     <num> <int> <num>
# 1:     1         base      270.3ms      333.9ms  769ms  2.773482   100   241
# 2:     2           dt       15.5ms       23.3ms  315ms 29.969829   100    29
```
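bench::mark() checks by default (check = TRUE, via all.equal()) that all expressions return the same result, so the timings above compare like with like. A standalone sketch of that same check, assuming a data.table version with list.cols=; the set.seed() call and the variable names are additions for illustration:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only for reproducibility
l = list(sample(1e5), sample(letters, 1e5, TRUE))

# Base R solution from the benchmark: build each row as a list.
base_rows = lapply(seq_along(l[[1]]), function(i) lapply(l, `[[`, i))

# This PR's version: one list per row, with no type promotion.
dt_rows = transpose(l, list.cols = TRUE)

# Same comparison bench::mark() performs with its default check = TRUE.
stopifnot(isTRUE(all.equal(base_rows, dt_rows)))
```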
@ben-schwen, base was run only two times while the PR version was run 16 times, so I would look at the "max" statistic (which is missing) rather than min or median (as presented in your post). You may find this post relevant: jangorecki/rollbench#1 (comment)
If you want to see how time/memory vary with N, please try using atime (tdhock/atime#15). The first figure above shows that both base R and dt are linear in time and memory, differing by constant factors. The second figure above shows the data size N that each can handle, given a 1 second time limit or a 1000 kb memory limit (dt is faster by constant factors but uses a constant factor more memory).
As I mentioned in the updated NEWS, transposing a table to get it in row-major form is indeed quite common and useful, especially for I/O with non-column-major data structures (as is common for streaming applications, which receive one row at a time). As noted in the linked issue, the workarounds were quite tedious before; after this PR, we can quite easily just transpose() |> lapply(unlist) |> setDT() to get a nice data.table out of something we receive, and transpose(, list.cols=TRUE) to convert an R table into something to pass along. In short, this feature is quite great and something I "didn't know I needed" all along!
It would be great to add vignette documentation for this use case: "As noted in the linked issue, the workarounds were quite tedious before; after this PR, we can quite easily just transpose() |> lapply(unlist) |> setDT() to get a nice data.table out of something we receive, and transpose(, list.cols=TRUE) to convert an R table into something to pass along"
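The round trip quoted above could be sketched as follows. This is a minimal sketch, assuming a data.table version that includes this PR and that transpose() accepts rows given as lists; `rows` is hypothetical input standing in for records received one at a time, and list.cols = TRUE is passed explicitly on the way in as well so that no value is coerced before unlist():

```r
library(data.table)

# Hypothetical row-wise input, e.g. records received one at a time.
rows = list(
  list("Anna", 30),
  list("Bob", 20)
)

# Rows -> data.table: transpose to one list per column, flatten each
# column with unlist(), then turn the list into a data.table in place.
dt = setDT(lapply(transpose(rows, list.cols = TRUE), unlist))
setnames(dt, c("name", "age"))

# data.table -> rows: each output column holds one row as a list, and
# every value keeps its column's type (character name, numeric age).
back = transpose(dt, list.cols = TRUE)
back$V1  # the first row: "Anna" and 30
```

Going through unlist() means each column must hold values of a single type; a column mixing types would still be promoted at that step.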
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
Do you have a specific vignette in mind? I don't see any mention of |
Closes #5639
Transpose now also supports transposing list columns.