% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/step-subset-expand.R
\name{expand.dtplyr_step}
\alias{expand.dtplyr_step}
\title{Expand data frame to include all possible combinations of values.}
\usage{
\method{expand}{dtplyr_step}(data, ..., .name_repair = "check_unique")
}
\arguments{
\item{data}{A \code{\link[=lazy_dt]{lazy_dt()}}.}
\item{...}{Specification of columns to expand. Columns can be atomic vectors
or lists.
\itemize{
\item To find all unique combinations of \code{x}, \code{y} and \code{z}, including those not
present in the data, supply each variable as a separate argument:
\code{expand(df, x, y, z)}.
\item To find only the combinations that occur in the
data, use \code{nesting}: \code{expand(df, nesting(x, y, z))}.
\item You can combine the two forms. For example,
\code{expand(df, nesting(school_id, student_id), date)} would produce
a row for each present school-student combination for all possible
dates.
}
Unlike the data.frame method, this method does not use the full set of
factor levels, only those that appear in the data.

When used with continuous variables, you may need to fill in values
that do not appear in the data: to do so, use expressions like
\code{year = 2010:2020} or \code{year = full_seq(year, 1)}.}
\item{.name_repair}{Treatment of problematic column names:
\itemize{
\item \code{"minimal"}: No name repair or checks, beyond basic existence.
\item \code{"unique"}: Make sure names are unique and not empty.
\item \code{"check_unique"} (the default): No name repair, but check that names are
\code{unique}.
\item \code{"universal"}: Make the names \code{unique} and syntactic.
\item A function: Apply custom name repair (e.g., \code{.name_repair = make.names}
for names in the style of base R).
\item A purrr-style anonymous function; see \code{\link[rlang:as_function]{rlang::as_function()}}.
}
This argument is passed on as \code{repair} to \code{\link[vctrs:vec_as_names]{vctrs::vec_as_names()}}.
See there for more details on these terms and the strategies used
to enforce them.}
}
\description{
This is a method for the tidyr \code{expand()} generic. It is translated to
\code{\link[data.table:J]{data.table::CJ()}}.
}
\examples{
library(tidyr)
fruits <- lazy_dt(tibble(
type = c("apple", "orange", "apple", "orange", "orange", "orange"),
year = c(2010, 2010, 2012, 2010, 2010, 2012),
size = factor(
c("XS", "S", "M", "S", "S", "M"),
levels = c("XS", "S", "M", "L")
),
weights = rnorm(6, as.numeric(size) + 2)
))
# All possible combinations ---------------------------------------
# Note that only present levels of the factor variable `size` are retained.
fruits \%>\% expand(type)
fruits \%>\% expand(type, size)
# This is different from the data frame behaviour:
fruits \%>\% dplyr::collect() \%>\% expand(type, size)
# Other uses -------------------------------------------------------
fruits \%>\% expand(type, size, 2010:2012)
# Use `anti_join()` to determine which observations are missing
all <- fruits \%>\% expand(type, size, year)
all
all \%>\% dplyr::anti_join(fruits)
# Use with `right_join()` to fill in missing rows
fruits \%>\% dplyr::right_join(all)
}