Description
I ran into this when converting a nested tibble
(i.e. with a list column of tibble
s) to a data.table
. Normally, list
columns print with something like <tibble[3x1]>
in data.table
; however, these columns printed as a string displaying the entirety of the nested tibble
s contents (in my case, ~20k rows of data per tibble
). This appears to be due to a check for the existence of a format()
method for the column type in data.table:::format_col()
. In this case, the list columns were vctrs_list_of
class, which implements its own format.vctrs_list_of()
method. See below reprex for an example.
Fixing this is easy enough (see below reprex for proposed solution), but it would change the default printing behavior of list-cols. Any list subclass would then have to implement a format_list_item()
method to get special treatment. On the other hand, any list subclass could then implement that method and get special treatment.
To me, defaulting to the standard list print behavior is an improvement, given that format()
methods for list-like classes are generally not going to product output suitable for a column in a data.table
(which is why format_list_item()
is needed in the first place).
Reprex is below, along with proposed solution. I'm happy to file a PR but wanted to make sure the change in defaults is acceptable in principle first. Thanks!
# Setup -----------------------------------------------------------------------
# Use development version of {data.table}
data.table::update_dev_pkg()
#> R data.table package is up-to-date at 8f8ef9343dcabefa9e4cb0af4251cbb74dae9f55 (1.15.99)
# Attach internals
ns <- asNamespace("data.table")
name <- format(ns)
attach(ns, name = name, warn.conflicts = FALSE)
# Create example data
dt <- data.table(
list_col = list(data.table(a = 1:3), data.table(a = 4:6)),
list_of_col = vctrs::list_of(data.table(a = 1:3), data.table(a = 4:6))
)
# Problem ---------------------------------------------------------------------
# `format_col()` does not format `list_of` columns from {vctrs} as lists
# Print `data.table`
print(dt)
#> list_col list_of_col
#> <list> <vctrs_list_of>
#> 1: <data.table[3x1]> 1, 2, 3
#> 2: <data.table[3x1]> 4, 5, 6
# Format individually
format_col(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
format_col(dt$list_of_col)
#> [1] "1, 2, 3" "4, 5, 6"
# Solution --------------------------------------------------------------------
# Current function definition:
# format_col.default = function(x, ...) {
# if (!is.null(dim(x)))
# "<multi-column>"
# else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
# formatted
# else if (is.list(x))
# vapply_1c(x, format_list_item, ...)
# else
# format(char.trunc(x), ...)
# }
# Swapping the order of the `else if` statements in `format_col()` will fix the issue
# `format_col_updated()` formats `list_of` columns from {vctrs} as lists
format_col_updated = function(x, ...) {
if (!is.null(dim(x)))
"<multi-column>"
else if (is.list(x)) # Now comes before `has_format_method()` check
vapply_1c(x, format_list_item, ...)
else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
formatted
else
format(char.trunc(x), ...)
}
# Replace as default method
registerS3method("format_col", "default", format_col_updated)
# Print `data.table`
print(dt)
#> list_col list_of_col
#> <list> <vctrs_list_of>
#> 1: <data.table[3x1]> <data.table[3x1]>
#> 2: <data.table[3x1]> <data.table[3x1]>
# Print individually
# Still formatted as list
format_col_updated(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Now also formatted as list
format_col_updated(dt$list_of_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Clean up --------------------------------------------------------------------
detach(name, character.only = TRUE)
rm(list = ls())
Created on 2024-02-21 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.2 (2023-10-31 ucrt)
#> os Windows 11 x64 (build 22631)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/Chicago
#> date 2024-02-21
#> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
#> data.table 1.15.99 2024-02-21 [1] local
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.2)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.2)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.1)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.1)
#> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.2)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.2)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.2)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
#> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.2)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
#> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.2)
#>
#> [1] D:/ProgramFiles/R/R-4.3.2/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Activity
tdhock commentedon Feb 21, 2024
Hi Jesse, thanks for writing. Is there a way to fix your issue in a way that maintains backward compatibility with the existing print output? One of the guiding principles of data.table is stability/back-compatibility https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package so it would be much easier to accept a code contribution that does not change what is printed.
jesse-smith commentedon Feb 22, 2024
Hey Toby, thanks for getting back. The current behavior is actually not backwards compatible - list sub-classes print identically to bare lists through v1.14.10, and it's only in v1.15.0 and later that this "print everything" behavior occurs. The proposal is essentially to revert to the previous behavior, because there are cases where the current behavior is pathological, and I'm not sure the current behavior was actually intended (I can't find anything in NEWS that mentions it).
r2evans commentedon Oct 28, 2024
This formats differently because
getS3method("format", class="vctrs_list_of")
is defined whereasgetS3method("format", class="list")
is not. My #6593 does not resolve this.This may be a frustrating issue. I like the notion that
format.data.table
honorsformat.<col-class>
if found, but the fact thatvctrs::format.vctrs_list_of
is simply a wrapper aroundformat.default
seems less useful (to me).If you go down the road of over-riding for this class, then pandora's box of "override this class and that class and that other class" may not be desired, plus what happens if/when
vctrs
changes itsformat
S3 methods.While frustrating, I wonder if the better path to resolve this aesthetic request is to request that
vctrs::format.vctrs_list_of
does something different, in which case it will self-resolve within thedata.table
ecosystem.MichaelChirico commentedon Oct 28, 2024
cc @DavisVaughan
DavisVaughan commentedon Oct 28, 2024
I don't think so, no. At the end of the day, we are simply following
format.list()
, and it seem reasonable thatformat(<list>)
andformat(<list-of>)
do exactly the same thing.On the tibble side, both
pillar:::pillar_shaft.list()
andpillar:::pillar_shaft.vctrs_list_of()
exist for controlling how they are printed.I do not think it is out of the question for you to provide
format_col.vctrs_list_of()
that gets conditionally registered using an.onLoad
hook for the vctrs package. Kind of like this, where in vctrs we specify that "when dplyr is loaded, go ahead and register avec_restore()
method forgrouped_df
"https://github.com/r-lib/vctrs/blob/78d9f2b0b24131b5ce2230eb3c2c9f93620b10d9/R/zzz.R#L33-L36
r2evans commentedon Oct 28, 2024
I'm good with including that in my current PR (if that doesn't get too complicated). If yes, I'm assuming
dt
"should" look like this:I added the header class, should it be
vctrs_list_of
,vctrs_vctr
, or justlist
?DavisVaughan commentedon Oct 28, 2024
For data.table users I imagine it might make the most sense if you have the full class, since this probably isn't a super common class to see in the data.table world, and in that case the full explicit name is easier to understand than hiding some class name details.
We have some tooling for showing
![Screenshot 2024-10-28 at 3 57 34 PM](https://private-user-images.githubusercontent.com/19150088/380873564-c4839b1e-416e-405d-a480-e108dfd834e5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NjI5OTYsIm5iZiI6MTczOTQ2MjY5NiwicGF0aCI6Ii8xOTE1MDA4OC8zODA4NzM1NjQtYzQ4MzliMWUtNDE2ZS00MDVkLWE0ODAtZTEwOGRmZDgzNGU1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDE2MDQ1NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5ZTc3YTI4ZGFhY2I0ZWRjZWVjMzRmMDgzMTQ4YTE1NzBlNmViZThkMTdlYmZlYTAwZGZiZTU1YjVlMTIyNjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.ZP1bgEdD-RRFJpjePxAQv5Y6tESqPSnAqUJo-kMDFuc)
list<type>
wheretype
is an abbreviated name of the class the container is forr2evans commentedon Oct 28, 2024
Okay. This is starting to become a little more than "add some rider code to a PR", so let me know (@MichaelChirico ) if this is out of scope to be appended (to an already-approved PR).
I think we can do this:
and
Which would result in
r2evans commentedon Nov 9, 2024
@MichaelChirico thoughts on that?
MichaelChirico commentedon Dec 6, 2024
That will induce a
Suggests
dependency on {vctrs} so it'll be a "no".Turns out there is a trivial fix for the bug at hand: #6637.