finished first draft of scoped dplyr slides
bradleyboehmke committed Dec 20, 2018
1 parent 560ef91 commit 57647e6
Showing 48 changed files with 435 additions and 31 deletions.
2 changes: 1 addition & 1 deletion README.Rmd
@@ -20,7 +20,7 @@ following is an outline of the material covered in this training:
| :----------------------------------------------------------------------------- | :-----------: |
| Breakfast / Social time | 8:00 - 9:00 |
| [Introductions](https://uc-r.github.io/Intermediate-R/day-1a-intro.html) | 9:00 - 9:30 |
| Scoped variable transformations | 9:30 - 10:45 |
| [Scoped variable transformations](https://uc-r.github.io/Intermediate-R/day-1b-scoped-dplyr.html) | 9:30 - 10:45 |
| Break | 10:45 - 11:00 |
| Control flow | 11:00 - 12:00 |
| Lunch | 12:00 - 1:00 |
24 changes: 12 additions & 12 deletions README.md
@@ -15,18 +15,18 @@ training:
**Day
1**

| Topic | Time |
| :----------------------------------------------------------------------- | :-----------: |
| Breakfast / Social time | 8:00 - 9:00 |
| [Introductions](https://uc-r.github.io/Intermediate-R/day-1a-intro.html) | 9:00 - 9:30 |
| Scoped variable transformations | 9:30 - 10:45 |
| Break | 10:45 - 11:00 |
| Control flow | 11:00 - 12:00 |
| Lunch | 12:00 - 1:00 |
| Work flow | 1:00 - 2:30 |
| Break | 2:30 - 2:45 |
| Case study | 2:45 - 4:00 |
| Q\&A | 4:00 - 4:30 |
| Topic | Time |
| :------------------------------------------------------------------------------------------------ | :-----------: |
| Breakfast / Social time | 8:00 - 9:00 |
| [Introductions](https://uc-r.github.io/Intermediate-R/day-1a-intro.html) | 9:00 - 9:30 |
| [Scoped variable transformations](https://uc-r.github.io/Intermediate-R/day-1b-scoped-dplyr.html) | 9:30 - 10:45 |
| Break | 10:45 - 11:00 |
| Control flow | 11:00 - 12:00 |
| Lunch | 12:00 - 1:00 |
| Work flow | 1:00 - 2:30 |
| Break | 2:30 - 2:45 |
| Case study | 2:45 - 4:00 |
| Q\&A | 4:00 - 4:30 |

**Day 2**

170 changes: 161 additions & 9 deletions docs/day-1b-scoped-dplyr.Rmd
@@ -600,7 +600,46 @@ flights %>%
class: yourturn
# Your Turn!

.pull-left[

### Challenge

Using the `flights` data:

1. convert month and day variables to type character
2. group by month and day
3. select all variables whose names contain "time" or "delay"
4. compute the mean of all "time" and "delay" variables

```r
# hint
flights %>%
  mutate(____) %>%
  group_by(____) %>%
  select(____) %>%
  summarize_all(____)
```

]

--

.pull-right[

### Solution

```{r}
flights %>%
  mutate(
    month = as.character(month),
    day = as.character(day)
  ) %>%
  group_by(month, day) %>%
  select(contains("time"), contains("delay")) %>%
  summarize_all(mean, na.rm = TRUE)
```

]

---

@@ -661,7 +700,7 @@ Perform some operation on all variables that meet a condition

# Transform .red[some] variables with .red[`*_if()`]

Back to our problem of _wanting to standardize only .red[only numeric variables]_:
Back to our problem of _wanting to standardize .red[only numeric variables]_:

.font150.center[`df %>% mutate_if(.predicate, .funs, ...)`]
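
As a minimal sketch of this signature, assuming a small made-up tibble and a hypothetical `standardize()` helper (neither is part of the original slides), `mutate_if()` applies the function only to columns for which the predicate returns `TRUE`:

```r
library(dplyr)

# Hypothetical toy data: one character column and two numeric columns
df <- tibble(
  id     = c("a", "b", "c"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Hypothetical helper: center and scale a numeric vector
standardize <- function(x) (x - mean(x)) / sd(x)

# is.numeric is the predicate; standardize is applied to every column that passes it
standardized <- df %>%
  mutate_if(is.numeric, standardize)

standardized
```

The `id` column fails the `is.numeric` predicate, so it is passed through unchanged.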

@@ -892,12 +931,6 @@ flights %>%

]

---
class: yourturn

# Your Turn!


---

# `filter_*()` .red[and its helpers]
@@ -918,24 +951,143 @@ class: yourturn

---

# Filter .red[all] rows
# Filtering rows that meet certain conditions

.pull-left[

* The .red[`all_vars()`] function can be used to filter rows where .red[all variables meet the same logical condition]. <br><br>

```{r filter-at-with-all-vars}
# This will return rows where all variables containing "delay" are NA
flights %>%
  filter_at(vars(contains("delay")), all_vars(is.na(.)))
```

]

--

.pull-right[

* The .red[`any_vars()`] function can be used to filter rows where .red[at least one variable meets the logical condition].

```{r filter-at-with-any-vars}
# This will return rows where any variable containing "delay" is NA
flights %>%
  filter_at(vars(contains("delay")), any_vars(is.na(.)))
```

]
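
To make the difference between the two helpers concrete, here is a small sketch on a made-up tibble (the `delays` data and its values are assumptions for illustration, not taken from `flights`):

```r
library(dplyr)

# Hypothetical toy data with missing values in the delay columns
delays <- tibble(
  flight    = c("A", "B", "C"),
  dep_delay = c(NA, 5, 10),
  arr_delay = c(NA, NA, 12)
)

# all_vars(): keep rows where every selected column is NA
all_missing <- delays %>%
  filter_at(vars(contains("delay")), all_vars(is.na(.)))

# any_vars(): keep rows where at least one selected column is NA
any_missing <- delays %>%
  filter_at(vars(contains("delay")), any_vars(is.na(.)))
```

Here `all_missing` keeps only flight A, while `any_missing` keeps flights A and B.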

---

# `group_by_*()` .red[and its helpers]

Say we want to compute the median delay values for each carrier by month and, in doing so, treat both `carrier` and `month` as factors.

.pull-left[

.center.font120.bold[Option A]

```{r}
flights %>%
  mutate(
    carrier = as.factor(carrier),
    month = as.factor(month)
  ) %>%
  group_by(carrier, month) %>%
  summarize_at(vars(contains("delay")), median, na.rm = TRUE)
```

]

--

.pull-right[

.center.font120.bold[Option B]

```{r}
flights %>%
  group_by_at(vars(carrier, month), as.factor) %>%
  summarize_at(vars(contains("delay")), median, na.rm = TRUE)
```

]
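
One way to check that the two options agree is to run both on a small made-up tibble and compare the results. This is only a sketch; the `toy` data below is an assumption for illustration:

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(
  carrier   = c("AA", "AA", "UA", "UA"),
  month     = c(1L, 1L, 1L, 2L),
  dep_delay = c(10, 20, NA, 5),
  arr_delay = c(0, 4, 7, NA)
)

# Option A: convert with mutate(), then group
option_a <- toy %>%
  mutate(
    carrier = as.factor(carrier),
    month = as.factor(month)
  ) %>%
  group_by(carrier, month) %>%
  summarize_at(vars(contains("delay")), median, na.rm = TRUE)

# Option B: convert and group in one step
option_b <- toy %>%
  group_by_at(vars(carrier, month), as.factor) %>%
  summarize_at(vars(contains("delay")), median, na.rm = TRUE)

# Compare the two results as plain data frames
all.equal(as.data.frame(ungroup(option_a)), as.data.frame(ungroup(option_b)))
```

Option B works because `group_by_at()` applies the supplied function to the selected columns before grouping on them.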

---
class: yourturn

# `group_by_*()` .red[and its helpers]
# Last Challenge!

### Challenge

Fill in the blanks and select the right .red[`filter_*()`] to filter for those flights where .red[either] departure .red[delay] (`dep_delay`) .red[or] arrival .red[delay] (`arr_delay`) exceeded the 99th percentile (hint: `quantile(x, .99)` provides the 99th percentile for variable `x`).

```{r, eval=FALSE}
flights %>%
  filter_xxx(vars(contains("_____")), any_vars(___ > quantile(___, .99, na.rm = TRUE)))
```

---
class: yourturn

# Last Challenge!

### Solution

Fill in the blanks and select the right .red[`filter_*()`] to filter for those flights where .red[either] departure .red[delay] (`dep_delay`) .red[or] arrival .red[delay] (`arr_delay`) exceeded the 99th percentile (hint: `quantile(x, .99)` provides the 99th percentile for variable `x`).

```{r}
flights %>%
  filter_at(vars(contains("delay")), any_vars(. > quantile(., .99, na.rm = TRUE)))
```
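
Note that `.` stands for each selected column in turn, so the percentile threshold is computed separately per column. A small sketch on made-up data (the `toy` tibble and the 75th percentile cutoff are assumptions chosen so the effect is visible on four rows):

```r
library(dplyr)

# Hypothetical toy data: each column has its own extreme value
toy <- tibble(
  dep_delay = c(1, 2, 3, 100),
  arr_delay = c(50, 1, 2, 3)
)

# Each column is compared against its own 75th percentile
extreme <- toy %>%
  filter_at(vars(contains("delay")), any_vars(. > quantile(., .75, na.rm = TRUE)))

extreme
```

The first row is kept because of its `arr_delay` and the last row because of its `dep_delay`, each judged against that column's own threshold.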

---

# Key things to remember

.pull-left-60[

* dplyr scoped variants:
    - .bold[`*_all()`]: execute function(s) on all variables or...
    - .bold[`*_if()`]: on variables that meet a certain condition or...
    - .bold[`*_at()`]: for pre-specified variables

* argument functions within scoped variants:
    - .bold[`vars()`]: specify the variables to be executed on
    - .bold[`funs()`]: specify the functions to be executed

* helper functions for `filter_*()`:
    - .bold[`all_vars()`]: filter for rows where all variables meet the specified condition
    - .bold[`any_vars()`]: filter for rows where at least one variable meets the specified condition

]

.pull-right-40[

<br><br>
```{r, echo=FALSE}
knitr::include_graphics("images/information-overload.jpg")
```

]
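
The three scoped variants can be compared side by side in a short sketch. The `toy` tibble below is made up for illustration and is not part of the original slides:

```r
library(dplyr)

# Hypothetical toy data: one character column and three numeric columns
toy <- tibble(
  carrier   = c("AA", "UA", "UA"),
  dep_delay = c(10, 20, 30),
  arr_delay = c(1, 2, 3),
  distance  = c(100, 200, 300)
)

# *_all(): every remaining column (carrier is dropped first so mean() applies)
means_all <- toy %>%
  select(-carrier) %>%
  summarize_all(mean)

# *_if(): only columns that meet a condition
means_if <- toy %>%
  summarize_if(is.numeric, mean)

# *_at(): only the pre-specified columns
means_at <- toy %>%
  summarize_at(vars(dep_delay, arr_delay), mean)
```

On this data `means_all` and `means_if` cover the same three numeric columns, while `means_at` is limited to the two delay columns.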

---

# Key things to remember

```{r, echo=FALSE, out.width="55%"}
knitr::include_graphics("images/cheatsheet-dplyr.png")
```

.center[.content-box-gray[.bold[`Help >> Cheatsheets >> Data Transformation with dplyr`]]]

---

# Questions?

<br>

```{r questions-dplyr, echo=FALSE, out.height="450", out.width="450"}
knitr::include_graphics("images/questions.png")
```