Skip to content

list sub-class with format() method prints full contents #5948

Open
@jesse-smith

Description

I ran into this when converting a nested tibble (i.e. with a list column of tibbles) to a data.table. Normally, list columns print with something like <tibble[3x1]> in data.table; however, these columns printed as a string displaying the entirety of the nested tibbles contents (in my case, ~20k rows of data per tibble). This appears to be due to a check for the existence of a format() method for the column type in data.table:::format_col(). In this case, the list columns were vctrs_list_of class, which implements its own format.vctrs_list_of() method. See below reprex for an example.

Fixing this is easy enough (see below reprex for proposed solution), but it would change the default printing behavior of list-cols. Any list subclass would then have to implement a format_list_item() method to get special treatment. On the other hand, any list subclass could then implement that method and get special treatment.

To me, defaulting to the standard list print behavior is an improvement, given that format() methods for list-like classes are generally not going to product output suitable for a column in a data.table (which is why format_list_item() is needed in the first place).

Reprex is below, along with proposed solution. I'm happy to file a PR but wanted to make sure the change in defaults is acceptable in principle first. Thanks!

# Setup -----------------------------------------------------------------------

# Use development version of {data.table}
data.table::update_dev_pkg()
#> R data.table package is up-to-date at 8f8ef9343dcabefa9e4cb0af4251cbb74dae9f55 (1.15.99)
# Attach internals
ns <- asNamespace("data.table")
name <- format(ns)
attach(ns, name = name, warn.conflicts = FALSE)

# Create example data
dt <- data.table(
  list_col = list(data.table(a = 1:3), data.table(a = 4:6)),
  list_of_col = vctrs::list_of(data.table(a = 1:3), data.table(a = 4:6))
)


# Problem ---------------------------------------------------------------------
# `format_col()` does not format `list_of` columns from {vctrs} as lists

# Print `data.table`
print(dt)
#>             list_col     list_of_col
#>               <list> <vctrs_list_of>
#> 1: <data.table[3x1]>         1, 2, 3
#> 2: <data.table[3x1]>         4, 5, 6

# Format individually
format_col(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
format_col(dt$list_of_col)
#> [1] "1, 2, 3" "4, 5, 6"


# Solution --------------------------------------------------------------------

# Current function definition:
# format_col.default = function(x, ...) {
#   if (!is.null(dim(x)))
#     "<multi-column>"
#   else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
#     formatted
#   else if (is.list(x))
#     vapply_1c(x, format_list_item, ...)
#   else
#     format(char.trunc(x), ...)
# }

# Swapping the order of the `else if` statements in `format_col()` will fix the issue
# `format_col_updated()` formats `list_of` columns from {vctrs} as lists
format_col_updated = function(x, ...) {
  if (!is.null(dim(x)))
    "<multi-column>"
  else if (is.list(x)) # Now comes before `has_format_method()` check
    vapply_1c(x, format_list_item, ...)
  else if (has_format_method(x) && length(formatted<-format(x, ...))==length(x))
    formatted
  else
    format(char.trunc(x), ...)
}

# Replace as default method
registerS3method("format_col", "default", format_col_updated)

# Print `data.table`
print(dt)
#>             list_col       list_of_col
#>               <list>   <vctrs_list_of>
#> 1: <data.table[3x1]> <data.table[3x1]>
#> 2: <data.table[3x1]> <data.table[3x1]>

# Print individually
# Still formatted as list
format_col_updated(dt$list_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"
# Now also formatted as list
format_col_updated(dt$list_of_col)
#> [1] "<data.table[3x1]>" "<data.table[3x1]>"

# Clean up --------------------------------------------------------------------
detach(name, character.only = TRUE)
rm(list = ls())

Created on 2024-02-21 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/Chicago
#>  date     2024-02-21
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
#>  data.table    1.15.99 2024-02-21 [1] local
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.2)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.2)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.2)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.2)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.1)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.3.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.2)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.2)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.2)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
#>  xfun          0.41    2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.2)
#> 
#>  [1] D:/ProgramFiles/R/R-4.3.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Activity

tdhock

tdhock commented on Feb 21, 2024

@tdhock
Member

Hi Jesse, thanks for writing. Is there a way to fix your issue in a way that maintains backward compatibility with the existing print output? One of the guiding principles of data.table is stability/back-compatibility https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package so it would be much easier to accept a code contribution that does not change what is printed.

jesse-smith

jesse-smith commented on Feb 22, 2024

@jesse-smith
Author

Hey Toby, thanks for getting back. The current behavior is actually not backwards compatible - list sub-classes print identically to bare lists through v1.14.10, and it's only in v1.15.0 and later that this "print everything" behavior occurs. The proposal is essentially to revert to the previous behavior, because there are cases where the current behavior is pathological, and I'm not sure the current behavior was actually intended (I can't find anything in NEWS that mentions it).

r2evans

r2evans commented on Oct 28, 2024

@r2evans
Contributor

This formats differently because getS3method("format", class="vctrs_list_of") is defined whereas getS3method("format", class="list") is not. My #6593 does not resolve this.

This may be a frustrating issue. I like the notion that format.data.table honors format.<col-class> if found, but the fact that vctrs::format.vctrs_list_of is simply a wrapper around format.default seems less useful (to me).

If you go down the road of over-riding for this class, then pandora's box of "override this class and that class and that other class" may not be desired, plus what happens if/when vctrs changes its format S3 methods.

While frustrating, I wonder if the better path to resolve this aesthetic request is to request that vctrs::format.vctrs_list_of does something different, in which case it will self-resolve within the data.table ecosystem.

MichaelChirico

MichaelChirico commented on Oct 28, 2024

@MichaelChirico
Member
DavisVaughan

DavisVaughan commented on Oct 28, 2024

@DavisVaughan
Contributor

I wonder if the better path to resolve this aesthetic request is to request that vctrs::format.vctrs_list_of does something different

I don't think so, no. At the end of the day, we are simply following format.list(), and it seem reasonable that format(<list>) and format(<list-of>) do exactly the same thing.

On the tibble side, both pillar:::pillar_shaft.list() and pillar:::pillar_shaft.vctrs_list_of() exist for controlling how they are printed.

I do not think it is out of the question for you to provide format_col.vctrs_list_of() that gets conditionally registered using an .onLoad hook for the vctrs package. Kind of like this, where in vctrs we specify that "when dplyr is loaded, go ahead and register a vec_restore() method for grouped_df"
https://github.com/r-lib/vctrs/blob/78d9f2b0b24131b5ce2230eb3c2c9f93620b10d9/R/zzz.R#L33-L36

r2evans

r2evans commented on Oct 28, 2024

@r2evans
Contributor

I'm good with including that in my current PR (if that doesn't get too complicated). If yes, I'm assuming dt "should" look like this:

dt
#             list_col       list_of_col
#               <list>      <vctrs_list>
# 1: <data.table[3x1]> <data.table[3x1]>
# 2: <data.table[3x1]> <data.table[3x1]>

I added the header class, should it be vctrs_list_of, vctrs_vctr, or just list?

DavisVaughan

DavisVaughan commented on Oct 28, 2024

@DavisVaughan
Contributor

For data.table users I imagine it might make the most sense if you have the full class, since this probably isn't a super common class to see in the data.table world, and in that case the full explicit name is easier to understand than hiding some class name details.

We have some tooling for showing list<type> where type is an abbreviated name of the class the container is for
Screenshot 2024-10-28 at 3 57 34 PM

r2evans

r2evans commented on Oct 28, 2024

@r2evans
Contributor

Okay. This is starting to become a little more than "add some rider code to a PR", so let me know (@MichaelChirico ) if this is out of scope to be appended (to an already-approved PR).

I think we can do this:

@@ -105,6 +105,14 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
       expression = "<expr>", ordered = "<ord>")
     classes = classes1(x)
     abbs = unname(class_abb[classes])
+    abb_na = is.na(abbs)
+    if (any(abb_na) && any(classes[abb_na] %in% c("vctrs_list_of", "vctrs_vctr"))) {
+      if (!requireNamespace("vctrs", quietly=TRUE)) {
+        warningf("Some columns are class 'vctrs_list_of' but package vctrs is not available. Those columns will print as regular objects. Simplify install.packages('vctrs') to obtain the vctrs_list_of format method and print the data again.")
+      } else {
+        abbs[abb_na] <- paste0("<", vapply_1c(x, vctrs::vec_ptype_abbr)[abb_na], ">")
+      }
+    }
     if ( length(idx <- which(is.na(abbs))) ) abbs[idx] = paste0("<", classes[idx], ">")
     toprint = rbind(abbs, toprint)
     rownames(toprint)[1L] = ""
@@ -207,6 +215,10 @@ format_col.default = function(x, ...) {
     format(char.trunc(x), ...) # relevant to #37
 }
 
+format_col.vctrs_list_of = function(x, ...) {
+  vapply_1c(x, format_list_item, ...)
+}
+
 # #2842 -- different columns can have different tzone, so force usage in output
 format_col.POSIXct = function(x, ..., timezone=FALSE) {
   if (timezone) {

and

@@ -199,6 +199,7 @@ export(format_col)
 S3method(format_col, default)
 S3method(format_col, POSIXct)
 S3method(format_col, expression)
+S3method(format_col, vctrs_list_of)
 export(format_list_item)
 S3method(format_list_item, default)
 S3method(format_list_item, data.frame)

Which would result in

dt
#             list_col       list_of_col
#               <list>    <list<dt[,1]>>
# 1: <data.table[3x1]> <data.table[3x1]>
# 2: <data.table[3x1]> <data.table[3x1]>
r2evans

r2evans commented on Nov 9, 2024

@r2evans
Contributor

@MichaelChirico thoughts on that?

MichaelChirico

MichaelChirico commented on Dec 6, 2024

@MichaelChirico
Member

That will induce a Suggests dependency on {vctrs} so it'll be a "no".

Turns out there is a trivial fix for the bug at hand: #6637.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      `list` sub-class with `format()` method prints full contents · Issue #5948 · Rdatatable/data.table