# Testing and Error Handling {#testerror}
```{r setup, include=FALSE}
source("etc/common.R")
```
Novices write code and pray that it works.
Experienced programmers know that prayer alone is not enough,
and take steps to protect what little sanity they have left.
This chapter looks at the tools R gives us for doing this.
## Learning Objectives
- Name and describe the three levels of error handling in R.
- Handle an otherwise-fatal error in a function call in R.
- Create unit tests in R.
- Create unit tests for an R package.
## How does R handle errors?
Python programs handle errors
by [raising](glossary.html#raise-exception) and [catching](glossary.html#catch-exception) [exceptions](glossary.html#exception):
```{python py-exception}
values = [-1, 0, 1]
for i in range(4):
    try:
        reciprocal = 1/values[i]
        print("index {} value {} reciprocal {}".format(i, values[i], reciprocal))
    except ZeroDivisionError:
        print("index {} value {} ZeroDivisionError".format(i, values[i]))
    except Exception as e:
        print("index {} some other Exception: {}".format(i, e))
```
R draws on a different tradition.
We say that the operation [signals](glossary.html#signal-condition) a [condition](glossary.html#condition)
that some other piece of code then [handles](glossary.html#handle-condition).
These things are all simpler to do using the rlang library,
so we begin by loading that:
```{r load-rlang, include=FALSE}
library(rlang)
```
In order of increasing severity,
the three built-in kinds of conditions are [messages](glossary.html#message),
[warnings](glossary.html#warning),
and [errors](glossary.html#error).
(There are also interrupts,
which are generated by the user pressing Ctrl-C to stop an operation,
but we will ignore those for the sake of brevity.)
We can signal conditions of these kinds using the functions `message`, `warning`, and `stop`,
each of which takes an error message as a parameter:
```{r message-warning-error, error=TRUE}
message("This is a message.")
warning("This is a warning.\n")
stop("This is an error.")
```
Note that we have to supply our own line ending for warnings
but not for the other two cases.
Note also that there are very few situations in which a warning is appropriate:
if something has truly gone wrong then we should stop,
but otherwise we should not distract users from more pressing concerns.
The bluntest of instruments for handling errors is to ignore them.
If a statement is wrapped in the function `try`
then errors that occur in it are still reported,
but execution continues.
Compare this:
```{r attempt-without-try, error=TRUE}
attemptWithoutTry <- function(left, right){
  temp <- left + right
  "result" # returned
}
result <- attemptWithoutTry(1, "two")
cat("result is", result)
```
with this:
```{r attempt-using-try}
attemptUsingTry <- function(left, right){
  temp <- try(left + right)
  "value returned" # returned
}
result <- attemptUsingTry(1, "two")
cat("result is", result)
```
We can suppress error messages from `try` by setting `silent` to `TRUE`:
```{r attempt-quietly}
attemptUsingTryQuietly <- function(left, right){
  temp <- try(left + right, silent = TRUE)
  "result" # returned
}
result <- attemptUsingTryQuietly(1, "two")
cat("result is", result)
```
Do not do this,
lest you one day find yourself lost in a silent hellscape.
Should you more sensibly wish to handle conditions rather than ignore them,
you may invoke `tryCatch`.
We begin by raising an error explicitly:
```{r r-try-catch}
tryCatch(
  stop("our message"),
  error = function(cnd) print(glue("error object is {cnd}"))
)
```
We can now run a function that would otherwise blow up:
```{r r-try-catch-triggered}
tryCatch(
  attemptWithoutTry(1, "two"),
  error = function(cnd) print(glue("error object is {cnd}"))
)
```
We can also handle non-fatal errors using `withCallingHandlers`,
and define new types of conditions,
but this is done less often in day-to-day R code than in Python:
see *[Advanced R][advanced-r]* or [this tutorial][said-handling-r-errors] for details.
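For a taste of what that looks like, here is a minimal sketch (our own illustration, not taken from those references) of using `withCallingHandlers` to handle a warning and then carry on:
```{r calling-handlers-sketch}
# A minimal sketch: intercept a warning, report it as a message,
# then muffle it so execution continues and the block's value is returned.
withCallingHandlers(
  {
    warning("something non-fatal happened\n")
    "finished anyway"
  },
  warning = function(cnd) {
    message("handled: ", conditionMessage(cnd))
    invokeRestart("muffleWarning")
  }
)
```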
## What should I know about testing in general?
In keeping with common programming practice,
we have left testing until the last possible moment.
The standard testing library for R is [testthat][testthat],
which shares many features with Python's [unittest][unittest]
and other [unit testing](glossary.html#unit-test) libraries:
1. Each test consists of a single function that tests a single property or behavior of the system.
2. Tests are collected into files with prescribed names that can be found by a [test runner](glossary.html#test-runner).
3. Shared [setup](glossary.html#testing-setup) and [teardown](glossary.html#testing-teardown) steps are put in functions of their own.
Let's load it and write our first test:
```{r introduce-testthat}
library(testthat)
test_that("Zero equals itself", {expect_equal(0, 0)})
```
As is conventional with unit testing libraries,
no news is good news:
if a test passes,
it doesn't produce output because it doesn't need our attention.
Let's try something that ought to fail:
```{r force-error, error=TRUE}
test_that("Zero equals one", {expect_equal(0, 1)})
```
Good:
we can draw some comfort from the fact that Those Beyond have not yet changed the fundamental rules of arithmetic.
But what are the curly braces around `expect_equal` for?
The answer is that they create a [code block](glossary.html#code-block) for `test_that` to run.
We can run `expect_equal` on its own:
```{r expect-equal-alone, error=TRUE}
expect_equal(0, 1)
```
but that doesn't produce a summary of how many tests passed or failed.
Passing a block of code to `test_that` also allows us to check several things in one test:
```{r pass-code-block, error=TRUE}
test_that("Testing two things", {
expect_equal(0, 0)
expect_equal(0, 1)
})
```
A block of code is *not* the same thing as an [anonymous function](glossary.html#anonymous-function),
which is why running this block of code does nothing—the "test" defines a function
but doesn't actually call it:
```{r anonymous-function}
test_that("Using an anonymous function", function() {
print("In our anonymous function")
expect_equal(0, 1)
})
```
## How should I organize my tests?
Running blocks of tests by hand is a bad practice.
Instead,
we should put related tests in files
and then put those files in a directory called `tests/testthat`.
We can then run some or all of those tests with a single command.
To start,
let's create `tests/testthat/test_example.R`:
```{r test-example, code=readLines("tests/testthat/test_example.R"), eval=FALSE}
```
The first line loads the testthat package,
which gives us our tools.
The call to `context` on the second line gives this set of tests a name for reporting purposes.
After that,
we add as many calls to `test_that` as we want,
each with a name and a block of code.
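A hypothetical file of that shape (made up for illustration; it is *not* our actual test file) might look like this:
```{r test-example-sketch, eval=FALSE}
# Hypothetical sketch of a test file laid out as described above;
# the real tests/testthat/test_example.R differs in content.
library(testthat)
context("Example tests")

test_that("Arithmetic on scalars works", {
  expect_equal(1 + 1, 2)
  expect_equal(2 * 3, 6)
})

test_that("Vectors have the expected lengths", {
  expect_length(integer(0), 0)
  expect_length(1:10, 10)
})

test_that("Logical reductions behave", {
  expect_true(all(c(TRUE, TRUE)))
  expect_false(any(c(FALSE, FALSE)))
})
```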
We can now run this file from within RStudio:
```{r run-test-dir}
test_dir("tests/testthat")
```
Care is needed when interpreting these results.
There are four `test_that` calls but eight actual checks,
and successes and failures are tallied per check, not per call.
What then is the purpose of `test_that`?
Why not just use `expect_equal` and its kin,
such as `expect_true`, `expect_false`, `expect_length`, and so on?
The answer is that it allows us to do one operation and then check several things afterward.
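For example (a made-up illustration, not one of this project's tests), a single test can perform one computation and then verify several of its properties:
```{r one-setup-many-checks}
# Do one piece of work, then check several properties of the result.
test_that("seq builds the vector we expect", {
  result <- seq(1, 10, by = 3)
  expect_length(result, 4)
  expect_equal(result[1], 1)
  expect_equal(result[4], 10)
})
```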
Let's create another file called `tests/testthat/test_tibble.R`:
```{r test-tibble, code=readLines("tests/testthat/test_tibble.R")}
```
(We don't actually have to call our test files `test_something.R`,
but `test_dir` and the rest of R's testing infrastructure expect us to.
Similarly,
we don't have to put them in a `tests` directory,
but gibbering incoherence will ensue if we do not.)
Now let's run all of our tests:
```{r run-more-tests}
test_dir("tests/testthat")
```
That's rather a lot of output.
Happily,
we can provide a `filter` argument to `test_dir`:
```{r test-with-filter-wrong, error=TRUE}
test_dir("tests/testthat", filter = "test_tibble.R")
```
Ah.
It turns out that `filter` is applied to filenames *after* the leading `test_` and the trailing `.R` have been removed.
Let's try again:
```{r test-with-filter}
test_dir("tests/testthat", filter = "tibble")
```
That's better,
and it illustrates our earlier point about the importance of following conventions.
## How can I write a few simple tests?
To give ourselves something to test,
let's create a file called `scripts/find_empty_01.R`
containing a single function `find_empty_rows` that identifies all the empty rows in a CSV file.
Our first implementation is:
```{r find-empty-01, code=readLines("scripts/find_empty_01.R")}
```
This is complex enough to merit line-by-line exegesis:
1. Define the function with one argument `source`, whence we shall read.
2. Read tabular data from that source and assign the resulting tibble to `data`.
3. Begin a pipeline that will assign something to the variable `empty`.
1. Use `pmap` to map a function across each row of the tibble.
Since we don't know how many columns are in each row,
we use `...` to take any number of arguments.
2. Convert the variable number of arguments to a list.
3. Check to see if all of those arguments are either `NA` or the empty string.
4. Close the mapped function's definition.
4. Start another pipeline.
Its result isn't assigned to a variable,
so whatever it produces will be the value returned by `find_empty_rows`.
1. Construct a tibble that contains only the row numbers of the original table in a column called `id`.
2. Filter those row numbers to keep only those corresponding to rows that were entirely empty.
The `as.logical` call inside `filter` is needed because the value returned by `pmap`
(which we stored in `empty`)
is a list, not a logical vector.
3. Use `pull` to get the one column we want from the filtered tibble as a vector.
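Putting those steps together, the whole function presumably looks something like the sketch below (reconstructed from the description above, so the actual `scripts/find_empty_01.R` may differ in detail):
```{r find-empty-01-sketch, eval=FALSE}
# Sketch reconstructed from the line-by-line description; not guaranteed
# to match scripts/find_empty_01.R exactly.
find_empty_rows <- function(source) {
  data <- read_csv(source)
  empty <- data %>%
    pmap(function(...) {
      args <- list(...)
      all(is.na(args) | (args == ""))
    })
  data %>%
    transmute(id = row_number()) %>%
    filter(as.logical(empty)) %>%
    pull(id)
}
```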
There is a lot going on here,
particularly if you are new to R (as I am at the time of writing)
and need help figuring out that `pmap` is the function this problem wants.
But now that we have it,
we can do this:
```{r show-how-source-works, eval=FALSE}
source("scripts/find_empty_01.R")
find_empty_rows("a,b\n1,2\n,\n5,6")
```
The `source` function reads R code from the given source.
Using this inside an R Markdown file is usually a bad idea,
since the generated HTML or PDF won't show readers what code we loaded and ran.
On the other hand,
if we are creating command-line tools for use on clusters or in other batch processing modes,
and are careful to display the code in a nearby block,
the stain on our soul is excusable.
The more interesting part of this example is the call to `find_empty_rows`.
Instead of giving it the name of a file,
we have given it the text of the CSV we want parsed.
This string is passed to `read_csv`,
which (according to documentation that only took us 15 minutes to realize we had already seen)
interprets its first argument as a filename *or*
as the actual text to be parsed if it contains a newline character.
This allows us to put the [test fixture](glossary.html#test-fixture)
right there in the code as a literal string,
which experience shows is easier to understand and maintain
than having test data in separate files.
Our function seems to work,
but we can make it more pipelinesque:
```{r find-empty-02, code=readLines("scripts/find_empty_02.R")}
```
Going line by line once again:
1. Define a function with one argument called `source`, from which we shall once again read.
2. Read from that source to fill the pipeline.
3. Map our test for emptiness across each row, returning a logical vector as a result.
(`pmap_lgl` is a derivative of `pmap` that always casts its result to logical.
Similar functions like `pmap_dbl` return vectors of other types,
and many other tidyverse functions also have strongly-typed variants.)
4. Turn that logical vector into a single-column tibble,
giving that column the name "empty".
We explain the use of `.` below.
5. Add a second column with row numbers.
6. Discard rows that aren't empty.
7. Return a vector of the remaining row IDs.
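Reconstructing from that description in the same way (again, the actual `scripts/find_empty_02.R` may differ slightly), the pipelined version looks roughly like:
```{r find-empty-02-sketch, eval=FALSE}
# Sketch of the single-pipeline version described above.
find_empty_rows <- function(source) {
  read_csv(source) %>%
    pmap_lgl(function(...) {
      args <- list(...)
      all(is.na(args) | (args == ""))
    }) %>%
    tibble(empty = .) %>%
    mutate(id = row_number()) %>%
    filter(empty) %>%
    pull(id)
}
```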
> **Wat?**
>
> Buried in the middle of the pipe shown above is the expression:
>
> `tibble(empty = .)`
>
> Quoting from *[Advanced R][advanced-r]*,
> "The function arguments look a little quirky
> but allow you to refer to `.` for one argument functions,
> `.x` and `.y` for two argument functions,
> and `..1`, `..2`, `..3`, etc, for functions with an arbitrary number of arguments."
> In other words, `.` in tidyverse functions usually means "whatever is on the left side of the `%>%` operator",
> i.e., whatever would normally be passed as the function's first argument.
> Without this,
> we have no easy way to give the sole column of our newly-constructed tibble a name.
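A tiny standalone example of `.` naming the incoming value (our own illustration):
```{r dot-pronoun-example}
# The value on the left of %>% is substituted wherever `.` appears,
# so here the vector becomes the column named `value`.
c(1, 5, 10) %>% tibble(value = .)
```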
Here's our first batch of tests:
```{r show-test-find-empty-a, code=readLines("tests/testthat/test_find_empty_a.R"), eval=FALSE}
```
And here's what happens when we run this file with `test_dir`:
```{r test-find-empty-a}
test_dir("tests/testthat", "find_empty_a")
```
This is perplexing:
we expected that if there were no empty rows,
our function would return `NULL`.
Let's look more closely:
```{r load-find-empty-rows, echo=FALSE}
source("scripts/find_empty_02.R")
```
```{r call-find-empty-rows-broken}
find_empty_rows("a\n1")
```
Ah:
our function is returning an integer vector of zero length rather than `NULL`.
Let's have a closer look at the properties of this strange beast:
```{r properties-of-empty-vector}
print(glue("integer(0) equal to NULL? {is.null(integer(0))}"))
print(glue("any(logical(0))? {any(logical(0))}"))
print(glue("all(logical(0))? {all(logical(0))}"))
```
All right.
`integer(0)` is an actual vector that just happens to have no elements, not `NULL`,
so the fact that `is.null(integer(0))` is `FALSE` (and that our comparison with `NULL` failed) isn't surprising.
The fact that `any` of an empty logical vector is `FALSE` isn't really surprising either—none of the elements are `TRUE`,
so it would be hard to say that any of them are.
`all` of an empty vector being `TRUE` is unexpected, though.
The reasoning is apparently that none of the (nonexistent) elements are `FALSE`,
but honestly,
at this point we are veering dangerously close to [JavaScript Logic][javascript-wat],
so we will accept this result for what it is and move on.
So what *should* our function return when there aren't any empty rows: `NULL` or `integer(0)`?
After a bit of thought,
we decide on the latter,
which means it's the tests that we need to rewrite,
not the code:
```{r show-test-find-empty-b, code=readLines("tests/testthat/test_find_empty_b.R"), eval=FALSE}
```
And here's what happens when we run this file with `test_dir`:
```{r run-modified-tests}
test_dir("tests/testthat", "find_empty_b")
```
## How can I check data transformation?
People normally write unit tests for the code in packages,
not to check the steps taken to clean up particular datasets,
but the latter are just as useful as the former.
To illustrate,
we have been given several more CSV files to clean up.
The first,
`at_health_facilities.csv`,
shows the percentage of births at health facilities by country, year, and mother's age.
It comes from the same UNICEF website as our previous data,
but has a different set of problems.
Here are its first few lines:
```
,,GLOBAL DATABASES,,,,,,,,,,,,,
,,[data.unicef.org],,,,,,,,,,,,,
,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Indicator:,Delivered in health facilities,,,,,,,,,,,,,,
Unit:,Percentage,,,,,,,,,,,,,,
,,,,Mother's age,,,,,,,,,,,
iso3,Country/areas,year,Total ,age 15-17,age 18-19,age less than 20,age more than 20,age 20-34,age 35-49,Source,Source year,,,,
AFG,Afghanistan,2010, 33 , 25 , 29 , 28 , 31 , 31 , 31 ,MICS,2010,,,,
ALB,Albania,2005, 98 , 100 , 96 , 97 , 98 , 99 , 92 ,MICS,2005,,,,
ALB,Albania,2008, 98 , 94 , 98 , 97 , 98 , 98 , 99 ,DHS,2008,,,,
...
```
and its last:
```
ZWE,Zimbabwe,2005, 66 , 64 , 64 , 64 , 67 , 69 , 53 ,DHS,2005,,,,
ZWE,Zimbabwe,2009, 58 , 49 , 59 , 55 , 59 , 60 , 52 ,MICS,2009,,,,
ZWE,Zimbabwe,2010, 64 , 56 , 66 , 62 , 64 , 65 , 60 ,DHS,2010,,,,
ZWE,Zimbabwe,2014, 80 , 82 , 82 , 82 , 79 , 80 , 77 ,MICS,2014,,,,
,,,,,,,,,,,,,,,
Definition:,Percentage of births delivered in a health facility.,,,,,,,,,,,,,,
,"The indicator refers to women who had a live birth in a recent time period, generally two years for MICS and five years for DHS.",,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Note:,"Database include reanalyzed data from DHS and MICS, using a reference period of two years before the survey.",,,,,,,,,,,,,,
,Includes surveys which microdata were available as of April 2016. ,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Source:,"UNICEF global databases 2016 based on DHS, MICS .",,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Contact us:,data@unicef.org,,,,,,,,,,,,,,
```
There are two other files in this collection called `c_sections.csv` and `skilled_attendant_at_birth.csv`,
which are the number of Caesarean sections
and the number of births where a midwife or other trained practitioner was present.
All three datasets have been exported from the same Excel spreadsheet;
rather than writing a separate script for each,
we should create a tool that will handle them all.
At first glance,
the problems we need to solve to do this are:
1. Each file may have a different number of header rows
(by inspection, two of the files have 7 and one has 8),
so we should infer this number from the file.
2. Each file may contain a different number of records,
so our tool should select rows by content rather than by absolute row number.
3. The files appear to have the same column names
(for which we give thanks),
but we should check this in case someone tries to use our function
with a dataset that doesn't.
These three requirements will make our program significantly more complicated,
so we should tackle each with its own testable function.
### How can I reorganize code to make it more testable?
The data we care about comes after the row with `iso3`, `Country/areas`, and other column headers,
so the simplest way to figure out how many rows to skip is to read the data,
look for this row,
and discard everything above it.
The simplest way to do *that* is to read the file once to find the number of header rows,
then read it again,
discarding that number of rows.
It's inefficient,
but for a dataset this size,
simplicity beats performance.
Here's our first try:
```{r reading-health-data}
read_csv("data/at_health_facilities.csv") %>%
select(check = 1) %>%
mutate(id = row_number()) %>%
filter(check == "iso3") %>%
select(id) %>%
first()
```
Ignoring the messages about missing column names,
this tells us that `iso3` appears in row 7 of our data,
which is *almost* true:
it's actually in row 8,
because `read_csv` has interpreted the first row of the raw CSV data as a header.
On the bright side,
that means we can immediately use this value as the `skip` parameter to the next `read_csv` call.
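For instance, a hypothetical snippet with the discovered row number hard-wired rather than computed would be:
```{r skip-param-sketch, eval=FALSE}
# Hypothetical: re-read the file, skipping the junk above the real header row.
num_skip <- 7  # the row number found by the pipeline above
real_data <- read_csv("data/at_health_facilities.csv", skip = num_skip)
```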
How do we test this code?
Easy:
we turn it into a function,
tell that function to stop if it can't find `iso3` in the data,
and write some unit tests.
The function is:
```{r determine-skip-rows, code=readLines("scripts/determine_skip_rows_a.R")}
```
We can then call `usethis::use_testthat()` to set up some testing infrastructure,
including the directory `tests/testthat`
and a script called `tests/testthat.R`
that will run all our tests when we want to check the integrity of our project.
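That setup is a single call (sketched here rather than run, since it only needs to happen once per project):
```{r use-testthat-sketch, eval=FALSE}
# One-time project setup: creates tests/testthat/ and tests/testthat.R.
usethis::use_testthat()
```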
Once we have done that
we can put these five tests in `tests/testthat/test_determine_skip_rows.R`:
```{r show-test-determine-skip-rows-a, code=readLines("tests/testthat/test_determine_skip_rows_a.R"), eval=FALSE}
```
and run it:
```{r run-skip-row-tests-a, error=TRUE}
test_dir("tests/testthat", "determine_skip_rows_a")
```
That's right: all five fail.
The first problem is that we have written `is03` (with a digit `0` instead of a letter `o`) in the first two tests.
If we fix that and re-run the tests, they pass;
what about the other three?
1. When there are no rows to skip, our function is returning `integer(0)` instead of 0
because the row with `iso3` is being used as headers.
2. When `iso3` isn't found at all, the function is returning `integer(0)` rather than stopping.
Here is a more robust version of the function:
```{r show-determine-skip-rows-b, code=readLines("scripts/determine_skip_rows_b.R"), eval=FALSE}
```
And here are the results:
```{r run-skip-row-tests-b}
test_dir("tests/testthat", "determine_skip_rows_b")
```
Our tests still aren't checking anything statistical,
but without trustworthy data,
our statistics will be meaningless.
Tests like these allow our future selves to focus on making new mistakes instead of repeating old ones.
## Key Points
```{r keypoints, child="keypoints/testerror.md"}
```
```{r links, child="etc/links.md"}
```