Printing the first and last n observations for xts and/or zoo? #321

Closed
markushhh opened this issue Jan 11, 2020 · 31 comments
Labels
feature request New features

@markushhh

markushhh commented Jan 11, 2020

No description provided.

@joshuaulrich joshuaulrich added the feature request New features label Aug 2, 2020
@Eluvias

Eluvias commented Aug 3, 2020

If it helps, here is one approach. Of course it needs testing, but it works for me so far.

library(xts)

xts_print <- function(x, n = 5) {

    if (is.null(colnames(x))) {
      nm <- paste0("X.", 1:ncol(x))
    } else {
      nm <- colnames(x)
    }

    df <- format(fortify.zoo(x), justify = "right")
    colnames(df) <- c("Index", nm)
    row.names(df) <- paste(format(rownames(df), justify = "right"),
                           ":", sep = "")

    nr <- nrow(df)

    if (nr <= n && nr <= 5) {

      print(df)

    } else {

      if (nr < n * 2) {
        n <- floor(nr / 2)
      }

      cat("\n")
      print(utils::head(df, n))

      ndigits <- nchar(nrow(df))

      if (ndigits >= 3) {
        cat(rep(" ", ndigits - 3), "---")
      } else {
        cat("---")
      }

      nm2 <- vector(mode = "numeric", ncol(x))
      for (i in 1:ncol(x)) {
        nm2[i] <- formatC(" ", width = nchar(nm[i]))
      }

      attr(df, "names") <- c("", nm2)
      print(utils::tail(df, n), right = TRUE, justify = "right")
    }
  }


data(sample_matrix)

samplexts <- as.xts(sample_matrix)


xts_print(samplexts)
#> 
#>           Index     Open     High      Low    Close
#>   1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>   2: 2007-01-03 50.23050 50.42188 50.23050 50.39767
#>   3: 2007-01-04 50.42096 50.42096 50.26414 50.33236
#>   4: 2007-01-05 50.37347 50.37347 50.22103 50.33459
#>   5: 2007-01-06 50.24433 50.24433 50.11121 50.18112
#>  ---                                                   
#> 176: 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 177: 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 178: 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 179: 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 180: 2007-06-30 47.67468 47.94127 47.67468 47.76719

xts_print(samplexts, n = 1)
#> 
#>           Index     Open     High      Low    Close
#>   1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>  ---                                                   
#> 180: 2007-06-30 47.67468 47.94127 47.67468 47.76719

xts_print(head(samplexts,10), n = 8)
#> 
#>          Index     Open     High      Low    Close
#>  1: 2007-01-02 50.03978 50.11778 49.95041 50.11778
#>  2: 2007-01-03 50.23050 50.42188 50.23050 50.39767
#>  3: 2007-01-04 50.42096 50.42096 50.26414 50.33236
#>  4: 2007-01-05 50.37347 50.37347 50.22103 50.33459
#>  5: 2007-01-06 50.24433 50.24433 50.11121 50.18112
#> ---                                                  
#>  6: 2007-01-07 50.13211 50.21561 49.99185 49.99185
#>  7: 2007-01-08 50.03555 50.10363 49.96971 49.98806
#>  8: 2007-01-09 49.99489 49.99489 49.80454 49.91333
#>  9: 2007-01-10 49.91228 50.13053 49.91228 49.97246
#> 10: 2007-01-11 49.88529 50.23910 49.88529 50.23910

# 2nd sample data
xm <- xts(cumsum(rnorm(100, 0, 0.2)), Sys.time() - 100:1)

xts_print(xm)
#> 
#>                    Index         X.1
#>   1: 2020-08-03 09:28:00  0.14533549
#>   2: 2020-08-03 09:28:01  0.26327216
#>   3: 2020-08-03 09:28:02  0.21394361
#>   4: 2020-08-03 09:28:03  0.20015489
#>   5: 2020-08-03 09:28:04  0.18350584
#>  ---                                    
#>  96: 2020-08-03 09:29:35 -1.74172313
#>  97: 2020-08-03 09:29:36 -1.66798390
#>  98: 2020-08-03 09:29:37 -1.47796503
#>  99: 2020-08-03 09:29:38 -1.16800551
#> 100: 2020-08-03 09:29:39 -1.18936443

@markushhh
Author

markushhh commented Aug 27, 2020

I really liked your approach. Just now, I was improving your solution for the third time, and IMO the best solution is the following:

library("xts")
library("data.table")

data(sample_matrix)
samplexts <- xts::as.xts(sample_matrix)

print.xts <- function(x, ...) {
    print(data.table::as.data.table(x))
}

print(samplexts)

I couldn't write better code than the authors of data.table, and data.table's printing function is incredibly fast and reliable. Hence, depending on data.table is "the best" one can do. It is unfortunate that your time and mine were spent on this... but I appreciate your work, @Eluvias!
It doesn't really work with tibbles, since the index gets dropped, and tsibble has not (yet) implemented a converter method from xts, but that's another story...

@joshuaulrich
Owner

joshuaulrich commented Aug 28, 2020

The main issue I see with both of these solutions is that they make it appear like xts objects have an 'index' column, which is not true. That's likely to cause a lot of confusion.

This would also make xts inconsistent with zoo, and consistency with zoo is an objective because xts extends zoo. We need to consider differences in xts compared to zoo. I could talk with the zoo team about adding an xts.max.print option that we could allow to be set to a one- or two-element vector. The two-element version would allow you to specify how many head/tail observations to print. And it would allow users to set options(xts.max.print = getOption("max.print")) to restore the prior behavior.
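
A minimal sketch of how such an option value could be interpreted. The helper name `parse_max_print` is hypothetical and not part of xts or zoo. It also assumes that a single value means "head only" (as base R does), which is one possible reading of the proposal rather than a settled design.

```r
# Hypothetical helper: turn a one- or two-element "max.print"-style value
# into separate head and tail row counts.
# Assumption: a single value limits the head only, like base R's max.print.
parse_max_print <- function(max = getOption("xts.max.print", getOption("max.print"))) {
  stopifnot(is.numeric(max), length(max) %in% 1:2, all(max >= 0))
  if (length(max) == 1L) {
    c(head = max, tail = 0)
  } else {
    c(head = max[1L], tail = max[2L])
  }
}

parse_max_print(5)          # head = 5, tail = 0
parse_max_print(c(10, 3))   # head = 10, tail = 3
```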

Also, with no disrespect to the data.table team, I'm not going to add a dependency on another package for a print method.

@jangorecki

jangorecki commented Sep 16, 2020

print(data.table::as.data.table(x))

wouldn't make much sense because it has to copy the whole object when converting the xts (matrix) to a data.table. It is much easier to simply concatenate the print output of the head and tail of the xts object.
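
A rough sketch of that idea, using a plain matrix with date row names so it runs without xts. The function name `print_head_tail` is made up for illustration, and dropping the first line of the tail block assumes the column header fits on one line.

```r
# Sketch: print the head and tail separately and join the captured output,
# instead of converting the whole object first.
print_head_tail <- function(x, n = 5L) {
  if (NROW(x) <= 2L * n) {
    out <- utils::capture.output(print(x))
  } else {
    top <- utils::capture.output(print(utils::head(x, n)))
    bottom <- utils::capture.output(print(utils::tail(x, n)))
    # drop the repeated column header from the tail block
    out <- c(top, "...", bottom[-1L])
  }
  writeLines(out)
  invisible(out)
}

m <- matrix(1:40, ncol = 2,
            dimnames = list(format(as.Date("2020-01-01") + 0:19), c("a", "b")))
print_head_tail(m, n = 3)
```

Because head and tail are formatted independently, their column widths can differ (here column `a` has one digit at the top and two at the bottom), so the two blocks may not line up.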

@ghost

ghost commented Sep 16, 2020 via email

@markushhh
Author

markushhh commented Oct 22, 2020

The following code provides a solution for xts (print.xts) and zoo (print.zoo) objects. The methods do not change the general behaviour of the existing print methods. They just trim the output. The methods add the argument max with getOption("xts.max.print") and getOption("zoo.max.print"). What's your opinion on it?

library("xts")

check.TZ <- xts:::check.TZ
tformat <- xts:::tformat
coredata <- zoo::coredata


print.xts <- function(x,
                      fmt,
                      max = getOption("xts.max.print"),
                      ...) {
  check.TZ(x)
  if (missing(fmt)) {
    fmt <- tformat(x)
  }
  if (is.null(fmt)) {
    fmt <- TRUE
  }
  
  if (NROW(x) > max*2+1) {
    index <- as.character(index(x))
    index <- c(index[c(1:max)], "...", index[(NROW(x)-max+1):NROW(x)])
    y <- rbind(
      format(as.matrix(x[1:max, ])),
      format(matrix(rep("", NCOL(x)), nrow = 1)),
      format(as.matrix(x[(NROW(x)-max+1):NROW(x), ]))
    )
    rownames(y) <- format(index, justify = "right")
    colnames(y) <- colnames(x)
  } else {
    y <- coredata(x, fmt)
  }

  if (length(y) == 0) {
    if (!is.null(dim(x))) {
      p <- structure(vector(storage.mode(y)), dim = dim(x),
                     dimnames = list(format(index(x)), colnames(x)))
      print(p)
    } else {
      cat('Data:\n')
      print(vector(storage.mode(y)))
      cat('\n')
      cat('Index:\n')
      index <- index(x)
      if (length(index) == 0) {
        print(index)
      } else {
        print(str(index(x)))
      }
    }
  } else {
    print(y, quote = FALSE, right = TRUE, ...)
  }
}

print.zoo <- function (x,
                       style = ifelse(length(dim(x)) == 0, "horizontal", "vertical"), 
                       quote = FALSE,
                       max = getOption("zoo.max.print"),
                       ...) {
  
  style <- match.arg(style, c("horizontal", "vertical", "plain"))
  if (is.null(dim(x)) && length(x) == 0) {
    style <- "plain"
  }
  if (length(dim(x)) > 0 && style == "horizontal") {
    style <- "plain"
  }
  if (style == "vertical") {
    if (NROW(x) > max*2+1) {
      index <- index2char(index(x), frequency = attr(x, "frequency"))
      index <- c(index[c(1:max)], "...", index[(NROW(x)-max+1):NROW(x)])
      y <- rbind(
        format(as.matrix(x[1:max, ])),
        format(matrix(rep("", NCOL(x)), nrow = 1)),
        format(as.matrix(x[(NROW(x)-max+1):NROW(x), ]))
      )
      rownames(y) <- format(index, justify = "right")
      colnames(y) <- colnames(x)
    } else {
      y <- as.matrix(coredata(x))
      if (length(colnames(y)) < 1) {
        colnames(y) <- rep("", NCOL(y))
      }
      if (NROW(y) > 0) {
        rownames(y) <- index2char(index(x), frequency = attr(x, "frequency"))
      }
    }
    print(y, quote = quote, ...)
  } else if (style == "horizontal") {
    y <- as.vector(x)
    names(y) <- index2char(index(x), frequency = attr(x, "frequency"))
    print(y, quote = quote, ...)
  } else {
    cat("Data:\n")
    print(coredata(x), ...)
    cat("\nIndex:\n")
    print(index(x), ...)
  }
  invisible(x)
}

data("sample_matrix", package = "xts")
samplexts <- xts::as.xts(sample_matrix)
samplezoo <- zoo::as.zoo(sample_matrix)

options("xts.max.print" = 5)
options("zoo.max.print" = 5)

print.xts(samplexts)

#>                Open     High      Low    Close
#> 2007-01-02 50.03978 50.11778 49.95041 50.11778
#> 2007-01-03 50.23050 50.42188 50.23050 50.39767
#> 2007-01-04 50.42096 50.42096 50.26414 50.33236
#> 2007-01-05 50.37347 50.37347 50.22103 50.33459
#> 2007-01-06 50.24433 50.24433 50.11121 50.18112
#>        ...                                    
#> 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 2007-06-30 47.67468 47.94127 47.67468 47.76719

print.zoo(samplexts)

#>            Open     High     Low      Close   
#> 2007-01-02 50.03978 50.11778 49.95041 50.11778
#> 2007-01-03 50.23050 50.42188 50.23050 50.39767
#> 2007-01-04 50.42096 50.42096 50.26414 50.33236
#> 2007-01-05 50.37347 50.37347 50.22103 50.33459
#> 2007-01-06 50.24433 50.24433 50.11121 50.18112
#> ...                                    
#> 2007-06-26 47.44300 47.61611 47.44300 47.61611
#> 2007-06-27 47.62323 47.71673 47.60015 47.62769
#> 2007-06-28 47.67604 47.70460 47.57241 47.60716
#> 2007-06-29 47.63629 47.77563 47.61733 47.66471
#> 2007-06-30 47.67468 47.94127 47.67468 47.76719

print.zoo(samplezoo)

#>     Open     High     Low      Close   
#>   1 50.03978 50.11778 49.95041 50.11778
#>   2 50.23050 50.42188 50.23050 50.39767
#>   3 50.42096 50.42096 50.26414 50.33236
#>   4 50.37347 50.37347 50.22103 50.33459
#>   5 50.24433 50.24433 50.11121 50.18112
#> ...                                    
#> 176 47.44300 47.61611 47.44300 47.61611
#> 177 47.62323 47.71673 47.60015 47.62769
#> 178 47.67604 47.70460 47.57241 47.60716
#> 179 47.63629 47.77563 47.61733 47.66471
#> 180 47.67468 47.94127 47.67468 47.76719

library("microbenchmark")

x <- microbenchmark(
  zoo_old = invisible(capture.output(zoo:::print.zoo(samplexts))),
  xts_old = invisible(capture.output(xts:::print.xts(samplexts))),
  zoo_new = invisible(capture.output(print.zoo(samplexts))),
  xts_new = invisible(capture.output(print.xts(samplexts))),
  times = 1000
)
summary(x)

#>      expr    min      lq     mean  median      uq     max neval
#> 1 zoo_old 2.3590 2.46380 2.921920 2.59965 2.89375 12.7040  1000
#> 2 xts_old 2.3931 2.50755 2.972585 2.62770 2.92450  8.7730  1000
#> 3 zoo_new 1.7792 1.84510 2.236352 1.92520 2.16320  9.9530  1000
#> 4 xts_new 1.8103 1.88250 2.300003 1.96860 2.23665  9.1413  1000

@jangorecki

jangorecki commented Oct 23, 2020

Looks neat

do not break the existing code

Do you mean that you ran checks of reverse dependencies (ideally including Suggests revdeps)? This is what CRAN will expect from the maintainers of zoo and xts. If it does break any package, then it is probably better to have this as an opt-in feature for at least one release before making it the default.

@markushhh
Author

that was misleading. I did not.

Setting options("xts.max.print" = Inf) for a transition should be enough.

@joshuaulrich
Owner

@markushhh, this looks really good! Thanks for all the effort you put into it!

I've been talking with the zoo team about the potential for making this change in xts, and maybe in zoo too. No one is outright opposed, but we want to carefully consider the change. Here are a few things that came up:

  1. The intent behind zoo is to be compatible with ts objects. And xts has the same aim for zoo objects.
  2. What do we do for 1-dimensional zoo objects (i.e. vectors)?
  3. What is the threshold for when the truncation kicks in? I wouldn't want a 15-row object truncated when printing.
  4. There's a potential that this change could break tests that depend on the full output being printed. Reverse dependency checks would find these though, and we could send the authors a patch.
  5. We would need an option to disable the truncation. This would also help people migrate, and we could advise people to set the option to disable the truncation now, before the change is exposed a few releases from now.
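
Points 3 and 5 could be combined along these lines. This is only a sketch: the function `rows_to_show`, the option name `demo.trunc.rows`, and the default of 40 are all made up for illustration.

```r
# Sketch: decide which rows to print.
# Short objects are never truncated, and setting the (hypothetical) option
# to Inf switches truncation off entirely.
rows_to_show <- function(nr, topn = 5L,
                         trunc_after = getOption("demo.trunc.rows", 40L)) {
  if (!is.finite(trunc_after) || nr <= trunc_after || nr <= 2L * topn) {
    return(seq_len(nr))   # print everything
  }
  c(seq_len(topn), seq.int(nr - topn + 1L, nr))
}

rows_to_show(15)                       # all 15 rows
rows_to_show(180)                      # rows 1-5 and 176-180
op <- options(demo.trunc.rows = Inf)   # opt out of truncation
length(rows_to_show(180))              # 180
options(op)
```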

@zeileis
Collaborator

zeileis commented Oct 24, 2020

Thanks for the proposed code @markushhh. Thanks for the summary @joshuaulrich.

To expand on 2: I think it would be useful to avoid long printed chunks in the 1-d case as well. However, it is not clear to me what is a good general layout for this. A simple idea would be to print the head, a separate line with the ..., and then the tail:

z <- zoo(sin(1:100), as.Date("2000-01-01") + 0:99)
print1d <- function(x, ...) {
  x <- structure(as.vector(x), .Names = index2char(index(x), frequency = attr(x, "frequency")))
  print(head(x, 5))
  cat("...\n")
  print(tail(x, 5))
}
print1d(z)
## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
## ...
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656 

My feeling is, though, that this does not necessarily convey one vector of things and might be confused with the matrix layout.

Another idea would be to print it as one vector of c(head, empty, tail) where the empty element would have a ... index:

print1d <- function(x, ...) {
  x <- structure(format(as.vector(x)), .Names = index2char(index(x), frequency = attr(x, "frequency")))
  print(c(head(x, 5), structure("", .Names = "..."), tail(x, 5)), quote = FALSE)
}
print1d(z)
##   2000-01-01   2000-01-02   2000-01-03   2000-01-04   2000-01-05          ... 
##  0.841470985  0.909297427  0.141120008 -0.756802495 -0.958924275              
##   2000-04-05   2000-04-06   2000-04-07   2000-04-08   2000-04-09 
##  0.983587745  0.379607739 -0.573381872 -0.999206834 -0.506365641 

There it's really easy to miss the ... It's a bit better if it's not the end of the line but I'm also not thrilled about it.

options(digits = 4)
print1d(z)
## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05        ... 2000-04-05 
##   0.841471   0.909297   0.141120  -0.756802  -0.958924              0.983588 
## 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##   0.379608  -0.573382  -0.999207  -0.506366 

Better ideas?

@ggrothendieck

print.zoo has a style= argument. This could be an additional style.

> args(zoo:::print.zoo)
function (x, style = ifelse(length(dim(x)) == 0, "horizontal", 
    "vertical"), quote = FALSE, ...) 

@markushhh markushhh changed the title what about printing only the first and last 5 observations like a data.table? Printing the first and last n observations for xts and/or zoo? Oct 25, 2020
@markushhh
Author

markushhh commented Oct 26, 2020

@joshuaulrich Thanks for talking to them!

@zeileis Thanks for joining in!

  1. Is there any existing code that tests for the compatibility between the classes?

  2. Truncation of vectors is a very good question.

    • Another possibility would be to print ... at the beginning of the last line so that it is not overlooked. But this might introduce asymmetry between the head and tail.
 1970-01-02    1970-01-03    1970-01-04    1970-01-05    1970-01-06 
 0.0137348254  0.8844110406 -1.5889070092 -1.3828891715  1.2165048537 
 1970-01-07    1970-01-08    1970-01-09    1970-01-10    1970-01-11 
-1.6170753365  0.4848673419 -0.1725599031  0.3682548469  0.3236398913 
 1970-01-12    1970-01-13    1970-01-14    1970-01-15    1970-01-16 
-0.9045243951 -1.2520928653 -0.0966016999  0.2222901724 -0.5781466642 
 ...           1970-01-28    1970-01-29    1970-01-30    1970-01-31 
 ...           0.9102255425  2.3607751726  1.0997868566  0.8708621780
  • I think printing ... at the beginning and at the end is too much.
 1970-01-02    1970-01-03    1970-01-04    1970-01-05    1970-01-06 
 0.0137348254  0.8844110406 -1.5889070092 -1.3828891715  1.2165048537 
 1970-01-07    1970-01-08    1970-01-09    1970-01-10    ... 
-1.6170753365  0.4848673419 -0.1725599031  0.3682548469  ...
 ...           1970-01-28    1970-01-29    1970-01-30    1970-01-31 
 ...           0.9102255425  2.3607751726  1.0997868566  0.8708621780
  • I'd probably go for @zeileis's first case, where ... is printed between the head and tail. The danger of confusion with matrices only occurs if you don't respect the index.
  3. Threshold

Truncation in other Languages and classes:

| Language | Class | Truncation after n-th row |
| --- | --- | --- |
| R | matrix | 1000, getOption("max.print") |
| R | data.frame | 1000, getOption("max.print") |
| R | vector | 1000, getOption("max.print") |
| R | data.table | > 100, getOption("datatable.print.nrows"); prints the column names below the columns if 20 < nrow < 101 |
| R | tibble / tsibble | > 20, getOption("tibble.print_max") |
| Julia | DataFrame | > 24 |
| Julia | Array | n x 1 Array: > 26; 1 x n Array: > 20 |
| Python | pandas.DataFrame | no truncation? |
  • base R truncates the output of vectors based on the number of observations. When max.print is reached it truncates the output and displays additional information
 [ reached getOption("max.print") -- omitted 99000 entries ]
  • Maybe the settings are arbitrary or a matter of preference. I don't really mind when it kicks in (as long as it's reasonably long, i.e. <= 100). To be consistent with base R, a vector should be printed horizontally, even though it's column major. Limiting the output to one or two lines is neither useful nor appropriate for vectors. IMO the default for matrices could be 50 (somewhat arbitrary!), and for vectors it depends on the final decision on how they are truncated, but in the end it must be dynamic because the width is not static and depends on the user. Keyword: getOption("width").
  4. I'm going to run reverse dependency checks tonight with the package revdepcheck for xts (fewer dependencies than zoo) with the new printing method to get an overview of how many package tests depend on the output (and how). Is it enough to check for "Depends" and "Imports" or should I check for "Suggests" and "LinkingTo" as well? Bioconductor?

  5. Truncation can be disabled by setting options("zoo.max.print" = Inf) or options("xts.max.print" = Inf), which should be the default for (at least) the initial release. I added an argument topn (inspired by data.table) for "head" and "tail".

  6. What about limiting the columns as well? The output for e.g. 10000 columns seems to be completely useless (IMO), in both the old and the new truncated behavior.

    6.1. There's a bug in the code which reduces topn if max.print gets too large, but I'll have a look at that.

    I'm currently testing some possible behaviors, e.g.

                 [,1]       [,2]       [,3]           [,6]       [,7]       [,8]
1970-01-02  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-03  0.6700347  1.2181748  1.4143326 ... -0.8143729  0.3040398  0.4106147
1970-01-04  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-05  0.6700347  1.2181748  1.4143326 ... -0.8143729  0.3040398  0.4106147
1970-01-06  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
       ...        ...        ...        ... ...        ...        ...        ...
1970-02-16 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147
1970-02-17  0.5964707 -0.8898568 -0.9436546 ...  0.5964707 -0.8898568 -0.9436546
1970-02-18 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147
1970-02-19  0.5964707 -0.8898568 -0.9436546 ...  0.5964707 -0.8898568 -0.9436546
1970-02-20 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147

or

                 [,1]       [,2]       [,3]           [,6]       [,7]       [,8]
1970-01-02  1.9587855  0.4649187 -1.5189918 ...  0.5964707 -0.8898568 -0.9436546
1970-01-03  0.6700347  1.2181748  1.4143326     -0.8143729  0.3040398  0.4106147
1970-01-04  1.9587855  0.4649187 -1.5189918      0.5964707 -0.8898568 -0.9436546
1970-01-05  0.6700347  1.2181748  1.4143326     -0.8143729  0.3040398  0.4106147
1970-01-06  1.9587855  0.4649187 -1.5189918      0.5964707 -0.8898568 -0.9436546
...                                                                          ...
1970-02-16 -0.8143729  0.3040398  0.4106147     -0.8143729  0.3040398  0.4106147
1970-02-17  0.5964707 -0.8898568 -0.9436546      0.5964707 -0.8898568 -0.9436546
1970-02-18 -0.8143729  0.3040398  0.4106147     -0.8143729  0.3040398  0.4106147
1970-02-19  0.5964707 -0.8898568 -0.9436546      0.5964707 -0.8898568 -0.9436546
1970-02-20 -0.8143729  0.3040398  0.4106147 ... -0.8143729  0.3040398  0.4106147

any idea/advice?
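
For reference, a small sketch of how the column truncation could be done on an already formatted (character) matrix. The helper name `trunc_cols` is hypothetical, and unlike the mock-ups above it labels the gap column "..." in the header as well.

```r
# Sketch: keep the first and last `k` columns of a character matrix
# and put a "..." column between them.
trunc_cols <- function(m, k = 3L) {
  if (ncol(m) <= 2L * k) return(m)
  left <- m[, seq_len(k), drop = FALSE]
  right <- m[, seq.int(ncol(m) - k + 1L, ncol(m)), drop = FALSE]
  gap <- matrix("...", nrow = nrow(m), ncol = 1L, dimnames = list(NULL, "..."))
  cbind(left, gap, right)
}

set.seed(1)
m <- matrix(format(rnorm(40)), nrow = 5,
            dimnames = list(format(as.Date("1970-01-02") + 0:4), paste0("V", 1:8)))
print(trunc_cols(m, k = 3), quote = FALSE, right = TRUE)
```

The matrix is formatted before truncation so that cbind() with the "..." column does not change how the numbers are displayed.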

@markushhh
Author

@ggrothendieck for xts a vector display is useless since there are no vectors in xts. Plain display would be possible though, I don't need it. If it's desired I can implement it. What is the use case of plain? In case the index or coredata is malformed?

@markushhh
Author

markushhh commented Oct 26, 2020

I think the following style is a good example of where vectors could be mixed up with matrices:

## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
##     ...        ...        ...        ...        ...        
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656 

@ggrothendieck

print.zoo is pretty short so if you need clarification see its source. https://github.com/rforge/zoo/blob/master/pkg/zoo/R/zoo.R

@markushhh
Author

@ggrothendieck Thanks. When do you need the plain style?

@markushhh
Author

In Julia they don't care about ... being in the middle.

julia> [collect(1000000:10000000)]
1-element Array{Array{Int64,1},1}:
 [1000000, 1000001, 1000002, 1000003, 1000004, 1000005, 1000006, 1000007, 1000008, 1000009  …  9999991, 9999992, 9999993, 9999994, 9999995, 9999996, 9999997, 9999998, 9999999, 10000000]

@zeileis
Collaborator

zeileis commented Oct 26, 2020

Thanks @markushhh for collecting all this information, very useful! Just a couple of comments:

  • The plain style is mostly used for zero-length series:
    zoo()
    ## Data:
    ## numeric(0)
    ## 
    ## Index:
    ## integer(0)
    
  • What is the general preference across the different systems regarding showing head and tail vs. head only? Both base R and tibble show only the head (albeit the head is allowed to be rather long in base R).
  • Showing only the head would also facilitate the issue of where to print the ... for 1-d series.
  • What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.
  • Limiting the columns as well is a good idea. I like the display with fewer ... better.

@braverock
Contributor

  • What is the general preference across the different systems regarding showing head and tail vs. head only? Both base R and tibble show only the head (albeit the head is allowed to be rather long in base R).
  • Showing only the head would also facilitate the issue of where to print the ... for 1-d series.

Many time series are "ragged", and several columns will start with NA's. So head and tail has the advantage of showing the most recent data where one will often have a more complete sample.
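
A tiny illustration with made-up data (a plain matrix with date row names, not an actual xts object), where the second column starts 60 observations later than the first:

```r
# Illustration with made-up data: column "b" starts later than column "a".
m <- cbind(a = seq_len(100), b = c(rep(NA, 60), seq_len(40)))
rownames(m) <- format(as.Date("2020-01-01") + 0:99)

head(m, 3)   # column b is all NA here
tail(m, 3)   # both columns are populated
```

Showing only the head would hide every observation of column `b`.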

  • What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.

I agree this is a good idea for a more informative print method.

  • Limiting the columns as well is a good idea. I like the display with fewer ... better.

Agreed.

@markushhh
Author

@zeileis for zero-length series, plain style is already implemented in xts. No need for the extra argument. It's open to discussion whether there's a need for it in zoo. I guess that depends on zoo's dependencies, right?

What about adding the information how many elements are omitted and/or how many elements there are overall. Base R only shows the former, tibble shows both.

I'm down! (printing both)

@zeileis
Collaborator

zeileis commented Oct 26, 2020

Printing dimension: I agree. I also like printing both the overall dimension and the number of elements omitted.

Plain style: zoo always had this argument, not sure who actually uses it (not me). It could be debated whether we should have introduced it or not. But given that we have it, I think we ought to stick to it.

Head only vs. head and tail: Convincing argument by Brian that in time series the tail is typically the most recent information and should be included.

@joshuaulrich joshuaulrich added this to the 0.12.3 milestone Oct 12, 2022
@joshuaulrich
Owner

I've started working on this because I want it. :) I started with @markushhh's implementation (thanks again!). Here's what we still need:

  1. Truncate the number of columns if the result would be wider than getOption("width"), and add an argument and option to set it.
  2. Determine how many rows to print before we truncate. I prefer 50 because that works for my screen. But I wouldn't be opposed to 100, like data.table. I think we should use the max argument for this.
  3. Handle the zoo 1-d case.
  4. I'd also like to add a blank line between rows when columns would wrap (when columns > screen width). data.table uses trunc.cols (TRUE/FALSE) for this. I'd like to support specifying a number of columns as well.
  5. Printing dimensions. Not sure how I feel about this. That's something the str() function does.

Did I miss anything? Any other thoughts?
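
For item 1, a back-of-the-envelope sketch of how the number of columns that fit could be estimated. The function `cols_that_fit` is hypothetical, it assumes the matrix has both row and column names, and it only approximates how print() lays out a matrix (row label, then each column padded to its widest entry plus one space).

```r
# Sketch: count how many leading columns of a matrix fit on one line.
cols_that_fit <- function(m, width = getOption("width")) {
  m <- format(m)
  row_w <- max(nchar(rownames(m)), 0L)
  # each printed column is as wide as its widest cell or its header,
  # plus one separating space
  col_w <- pmax(apply(nchar(m), 2L, max), nchar(colnames(m))) + 1L
  sum(row_w + cumsum(col_w) <= width)
}

m <- matrix(100:149, nrow = 5,
            dimnames = list(letters[1:5], paste0("c", 1:10)))
cols_that_fit(m, width = 20)   # 4
cols_that_fit(m, width = 80)   # 10
```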

@joshuaulrich
Owner

I also started working on something similar for str.xts(): #378

I'd appreciate everyone's thoughts on that too!

@ethanbsmith
Contributor

+1 for leaving index and dim output in str()

joshuaulrich added a commit that referenced this issue Oct 24, 2022
Refactor print.xts() to only show the first and last 'max' rows if the
number of rows is > 'trunc.rows'. Also truncate the number of columns
if they would wrap to a new line.

See #321.
@joshuaulrich
Owner

I'm starting to come around to the idea of including them in the print() output too. Still on the fence though... but I just had an idea about how to include them: it could go with the ellipses in the middle. For example:

# zoo 1-d vector

## 2000-01-01 2000-01-02 2000-01-03 2000-01-04 2000-01-05 
##  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 
## ... (zoo vector with `n` elements omitted)
## 2000-04-05 2000-04-06 2000-04-07 2000-04-08 2000-04-09 
##  0.9835877  0.3796077 -0.5733819 -0.9992068 -0.5063656

# zoo matrix

##            Open     High     Low      Close   
## 2007-01-02 50.03978 50.11778 49.95041 50.11778
## 2007-01-03 50.23050 50.42188 50.23050 50.39767
## 2007-01-04 50.42096 50.42096 50.26414 50.33236
## 2007-01-05 50.37347 50.37347 50.22103 50.33459
## 2007-01-06 50.24433 50.24433 50.11121 50.18112
## ... (zoo matrix with `n` rows omitted)
## 2007-06-26 47.44300 47.61611 47.44300 47.61611
## 2007-06-27 47.62323 47.71673 47.60015 47.62769
## 2007-06-28 47.67604 47.70460 47.57241 47.60716
## 2007-06-29 47.63629 47.77563 47.61733 47.66471
## 2007-06-30 47.67468 47.94127 47.67468 47.76719

@joshuaulrich
Owner

Here's a first draft of printing zoo vectors.

diff --git a/pkg/zoo/R/zoo.R b/pkg/zoo/R/zoo.R
index 39c554b..2ae8224 100644
--- a/pkg/zoo/R/zoo.R
+++ b/pkg/zoo/R/zoo.R
@@ -71,7 +71,39 @@ print.zoo <- function (x, style = ifelse(length(dim(x)) == 0,
     else if (style == "horizontal") {
         y <- as.vector(x)
         names(y) <- index2char(index(x), frequency = attr(x, "frequency"))
-        print(y, quote = quote, ...)
+
+        beg <- NULL
+        end <- NULL
+        n_beg <- 1
+        n_end <- 1
+        while (length(beg) < 3 || length(end) < 3) {
+          if (length(beg) < 3) {
+            beg <- utils::capture.output(print.default(head(y, n_beg)))
+            n_beg <- n_beg + 1
+          }
+          if (length(end) < 3) {
+            end <- utils::capture.output(print.default(tail(y, n_end)))
+            n_end <- n_end + 1
+          }
+        }
+        beg <- utils::capture.output(print.default(head(y, n_beg-2)))
+        end <- utils::capture.output(print.default(tail(y, n_end-2)))
+
+        n_obs <- 1
+        for (i in seq_along(y)) {
+          o <- utils::capture.output(print.default(y[seq_len(i)]))
+          if (length(o) > 2) {
+            # output has wrapped to a new line
+            n_obs <- i - 1
+            break
+          }
+        }
+        o <- utils::capture.output(print.default(head(y, n_obs), quote = quote, ...))
+        p <- utils::capture.output(print.default(tail(y, n_obs), quote = quote, ...))
+        more_rows <- paste0("... zoo vector with ", length(y) - 2*n_obs,
+                            " more observations")
+        z <- matrix(c(o, more_rows, p), ncol = 1)
+        writeLines(z)
     }
     else {
         cat("Data:\n")

And the output is:

R$ z <- zoo(1:100, .Date(1:100))
R$ print(z)
1970-01-02 1970-01-03 1970-01-04 1970-01-05 1970-01-06 1970-01-07 1970-01-08 1970-01-09 1970-01-10 1970-01-11 
         1          2          3          4          5          6          7          8          9         10 
... zoo vector with 80 more observations
1970-04-02 1970-04-03 1970-04-04 1970-04-05 1970-04-06 1970-04-07 1970-04-08 1970-04-09 1970-04-10 1970-04-11 
        91         92         93         94         95         96         97         98         99        100 
 

@zeileis
Collaborator

zeileis commented Nov 22, 2022

Thanks for having a go at this Josh @joshuaulrich ! Comments:

  • index2char() names:
    Unfortunately, for this application, index2char() internally relies on as.character() rather than format(). My guess is that I didn't know better at the time of writing. But possibly it was also a design decision because index2char() is not only used for printing but also in merge(). In any case, we cannot rely on the names of y having the same number of characters. For Date this is the case, presumably also POSIXt, but not plain numeric. Consider printing: zoo(rep_len(0:9, 1000), 1:1000). The head() uses just 2 lines but the tail 4. I see 3 ways to go: (a) Determine n_obs based on the head rather than the tail. (b) Determine the lengths of head and tail separately. (c) Ensure that the names(y) all have the same number of characters, e.g., via

    names(y) <- format(index2char(index(x), frequency = attr(x, "frequency")), justify = "right")
    
  • Difference between n_beg/n_end and n_obs:
    If option (c) is used above, then it is probably enough to use only n_obs and omit the code determining separate n_beg and n_end. In any case, only one of the two approaches seems to be necessary. Question: Is there a particular reason why you use head() and tail() in most places but [seq_len(...)] when determining n_obs?

  • Inserted line for more rows:
    My personal impression is that "with ... observations omitted" would be clearer than "with ... more observations". In the latter case I found myself wondering whether the "more observations" include those shown at the end, because I was reading top-down. I would also add "..." at the end of the line.

  • Condition for omitting observations:
    There should probably be a check whether we need to omit any observations at all. This should be consistent with the matrix printing, e.g., allowing up to a certain number of lines of output. You mention above that you would be ok with up to 50 or even 100 lines. Personally, I would probably prefer fewer, maybe 20 or 30. But I'm open to discussion here.
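The width problem behind option (c) can be reproduced in base R without zoo. The snippet below is only a minimal sketch: the plain character vector idx is a stand-in for what index2char() would return for a numeric index (an assumption made to keep the example self-contained).

```r
## Stand-in for index2char() output on a numeric index (assumption:
## as.character() is applied to each index value, as described above).
idx <- as.character(c(1, 10, 100, 1000))

## The names differ in width, so a head and a tail that are printed
## separately can wrap after a different number of observations.
nchar(idx)

## format() pads character strings to a common width; justify = "right"
## right-aligns them, as suggested in option (c).
idx_padded <- format(idx, justify = "right")
nchar(idx_padded)
```

With equal-width names, the number of observations that fit on one line is the same for the head and the tail, which is why a single n_obs is then sufficient.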

Combined with a few further tweaks (naming objects, breaking from the loop, always passing quote = quote, ..., etc.), my implementation would be:

        y <- as.vector(x)
        names(y) <- format(index2char(index(x), frequency = attr(x, "frequency")), justify = "right")
        n_tot <- length(y)
        n_obs <- 1L
        if(n_tot > 10L) { ## only consider omitting observations if n_tot > 10 (see below)
          y_head <- utils::capture.output(print.default(y[1L], quote = quote, ...))
          for (i in 2L:n_tot) {
            y_next <- utils::capture.output(print.default(y[1L:i], quote = quote, ...))
            if (length(y_next) > 2L) { ## output has wrapped to a new line
              break
            } else {
              y_head <- y_next
              n_obs <- n_obs + 1L
            }
          }
        }
        if(n_tot > 10L * n_obs) { ## more than 20 lines when fully printed
          y_tail <- utils::capture.output(print.default(y[n_tot - n_obs:1L + 1L], quote = quote, ...))
          y_more <- sprintf("... zoo vector with %s observations omitted ...", n_tot - 2L * n_obs)
          writeLines(c(y_head, y_more, y_tail))
        } else {
          print(y, quote = quote, ...)
        }
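To try the logic outside of zoo, the fragment above can be wrapped in a small function. This is a sketch under assumptions, not the code that ended up in zoo: the function name print_truncated_sketch and its idx argument are hypothetical, and as.character(idx) replaces index2char() so that only base R is needed.

```r
## Hypothetical helper (not part of zoo): applies the truncation logic
## from the comment above to a plain vector `y` with index `idx`.
print_truncated_sketch <- function(y, idx, quote = FALSE, ...) {
  y <- as.vector(y)
  ## equal-width names so head and tail wrap at the same position
  names(y) <- format(as.character(idx), justify = "right")
  n_tot <- length(y)
  n_obs <- 1L
  if (n_tot > 10L) {
    y_head <- utils::capture.output(print.default(y[1L], quote = quote, ...))
    for (i in 2L:n_tot) {
      y_next <- utils::capture.output(print.default(y[1L:i], quote = quote, ...))
      if (length(y_next) > 2L) break  # output has wrapped to a new line
      y_head <- y_next
      n_obs <- n_obs + 1L
    }
  }
  if (n_tot > 10L * n_obs) {
    ## `:` binds tighter than `-` and `+`, so this selects the last n_obs values
    y_tail <- utils::capture.output(
      print.default(y[n_tot - n_obs:1L + 1L], quote = quote, ...))
    y_more <- sprintf("... zoo vector with %s observations omitted ...",
                      n_tot - 2L * n_obs)
    out <- c(y_head, y_more, y_tail)
  } else {
    out <- utils::capture.output(print.default(y, quote = quote, ...))
  }
  writeLines(out)
  invisible(out)  # returned invisibly so the lines can be inspected
}

## The numeric-index example mentioned earlier in this comment
res <- print_truncated_sketch(rep_len(0:9, 1000), 1:1000)
```

How many observations are shown per line depends on getOption("width"), so the number reported as omitted varies with the console width.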

@joshuaulrich
Owner

Thanks for having a go at this Josh!

Happy to! I thought it was most efficient to use my knowledge of doing this with print.xts() to give you something to tweak using your knowledge of what zoo needed to do.

If option (c) is used above, then it is probably enough to use only n_obs and omit the code determining separate n_beg and n_end ... Question: Is there a particular reason why you use head() and tail() in most places but [seq_len(...)] when determining n_obs?

Agree about only using n_obs. y[seq_len(i)] most likely came from my copy/paste of the print.xts() code. I doubt there's a good reason to use it instead of head/tail. I use head/tail elsewhere because I prefer tail() to y[n:length(y)].

  • Inserted line for more rows:

Agree with all your comments here.

  • Condition for omitting observations:

Agreed with allowing a number of observations before truncating. I like 50 lines because that's roughly what fits vertically on my laptop screen. That would be 25 1-d zoo vector observations because there are 2 lines/observation.

I don't have strong feelings about this because changing it later shouldn't be an issue, especially if we provide a global option for users to set their personal preference.
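A global option for the threshold could look roughly like the sketch below. The option name "sketch.print.max.lines" and the function should_truncate_sketch() are hypothetical and only illustrate the idea; they are not names defined by zoo or xts.

```r
## Hypothetical option and helper, for illustration only.
## n_tot: total number of observations
## n_obs: observations that fit on one printed row
should_truncate_sketch <- function(n_tot, n_obs,
    max_lines = getOption("sketch.print.max.lines", default = 50L)) {
  ## a 1-d vector prints two lines per row: the index line and the value line
  lines_needed <- 2L * ceiling(n_tot / n_obs)
  lines_needed > max_lines
}

should_truncate_sketch(1000, 16)  # 63 rows, i.e. 126 lines
should_truncate_sketch(100, 16)   # 7 rows, i.e. 14 lines

## users could set their own preference, e.g. a limit of 20 lines
should_truncate_sketch(200, 16, max_lines = 20L)
```

Reading the limit through getOption() with a default keeps the current behavior for users who never set the option.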

@joshuaulrich
Owner

This is going into the 0.13.0 xts release.

@ethanbsmith
Contributor

Overall I like this feature and think it's a good idea. One thing I have found a bit frustrating is that head() and tail() no longer work as they used to. I sometimes want to look at a specific set of data, e.g., tail(x, 45). However, if the n is less than print's default, the output still gets compressed. There is probably a way to work around this, but I'm not sure the change in behavior is desirable in this scenario.

@joshuaulrich
Copy link
Owner

I encountered this too and it needs to be fixed before release. Can you create another issue with a reproducible example for this bug?

joshuaulrich added a commit that referenced this issue Mar 17, 2023
The top/bottom rows could have a different number of decimal places
and there are often multiple varying spaces between columns. For
example:

                          close      volume          ma         bsi
2022-01-03 09:31:00     476.470  803961.000          NA   54191.000
2022-01-03 09:32:00     476.700  179476.000          NA   53444.791
2022-01-03 09:33:00     476.540  197919.000          NA  -16334.994
                ...
2023-03-16 14:52:00    394.6000  46728.0000    392.8636  28319.4691
2023-03-16 14:53:00    394.6500  64648.0000    392.8755  15137.6857
2023-03-16 14:54:00    394.6500  69900.0000    392.8873  -1167.9368

There are 4 spaces between the index and the 'close' column, 2 between
'close' and 'volume', 4 between 'volume' and 'ma', and 2 between 'ma'
and 'bsi'. There should be a consistent number of spaces between the
columns. Most other classes of objects print with 1 space between the
columns.

The top rows have 3 decimals and the bottom rows have 4. These should
also be the same.

See #321.
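One possible way to get matching decimals, shown here only as a sketch and not necessarily what the commit does, is to format the top and bottom rows of a column in a single format() call and split the result afterwards. The values are taken from the 'close' column above; the variable names are illustrative.

```r
## 'close' values from the top and bottom rows of the example above
top    <- c(476.470, 476.700, 476.540)
bottom <- c(394.6000, 394.6500, 394.6500)

## format() applies one common format (width and decimals) to the whole vector
joint <- format(c(top, bottom))

## split the formatted strings back into the two printed blocks
top_fmt    <- joint[seq_along(top)]
bottom_fmt <- joint[length(top) + seq_along(bottom)]

print(top_fmt, quote = FALSE)
print(bottom_fmt, quote = FALSE)
```

Because both blocks come from the same format() call, they share one width, which also keeps the spacing between columns consistent.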
8 participants