# Testing and Error Handling {#testerror}
```{r setup, include=FALSE}
source("etc/common.R")
```
Novices write code and pray that it works.
Experienced programmers know that prayer alone is not enough,
and take steps to protect what little sanity they have left.
This chapter looks at the tools R gives us for doing this.
## Learning Objectives
- Name and describe the three levels of error handling in R.
- Handle an otherwise-fatal error in a function call in R.
- Create unit tests in R.
- Create unit tests for an R package.
## How does R handle errors?
Python programs handle errors
by [raising](glossary.html#raise-exception) and [catching](glossary.html#catch-exception) [exceptions](glossary.html#exception):
```{python py-exception}
values = [-1, 0, 1]
for i in range(4):
    try:
        reciprocal = 1/values[i]
        print("index {} value {} reciprocal {}".format(i, values[i], reciprocal))
    except ZeroDivisionError:
        print("index {} value {} ZeroDivisionError".format(i, values[i]))
    except Exception as e:
        print("index {} some other Exception: {}".format(i, e))
```
R draws on a different tradition.
We say that the operation [signals](glossary.html#signal-condition) a [condition](glossary.html#condition)
that some other piece of code then [handles](glossary.html#handle-condition).
These things are all simpler to do using the rlang library,
so we begin by loading that:
```{r load-rlang, include=FALSE}
library(rlang)
```
In order of increasing severity,
the three built-in kinds of conditions are [messages](glossary.html#message),
[warnings](glossary.html#warning),
and [errors](glossary.html#error).
(There are also interrupts,
which are generated by the user pressing Ctrl-C to stop an operation,
but we will ignore those for the sake of brevity.)
We can signal conditions of these kinds using the functions `message`, `warning`, and `stop`,
each of which takes an error message as a parameter:
```{r message-warning-error, error=TRUE}
message("This is a message.")
warning("This is a warning.\n")
stop("This is an error.")
```
Note that we have to supply our own line ending for warnings
but not for the other two cases.
Note also that there are very few situations in which a warning is appropriate:
if something has truly gone wrong then we should stop,
but otherwise we should not distract users from more pressing concerns.
The bluntest of instruments for handling errors is to ignore them.
If a statement is wrapped in the function `try`
then errors that occur in it are still reported,
but execution continues.
Compare this:
```{r attempt-without-try, error=TRUE}
attemptWithoutTry <- function(left, right){
  temp <- left + right
  "result" # returned
}
result <- attemptWithoutTry(1, "two")
cat("result is", result)
```
with this:
```{r attempt-using-try}
attemptUsingTry <- function(left, right){
  temp <- try(left + right)
  "value returned" # returned
}
result <- attemptUsingTry(1, "two")
cat("result is", result)
```
We can suppress error messages from `try` by setting `silent` to `TRUE`:
```{r attempt-quietly}
attemptUsingTryQuietly <- function(left, right){
  temp <- try(left + right, silent = TRUE)
  "result" # returned
}
result <- attemptUsingTryQuietly(1, "two")
cat("result is", result)
```
Do not do this,
lest you one day find yourself lost in a silent hellscape.
Should you more sensibly wish to handle conditions rather than ignore them,
you may invoke `tryCatch`.
We begin by raising an error explicitly:
```{r r-try-catch}
tryCatch(
  stop("our message"),
  error = function(cnd) print(glue("error object is {cnd}"))
)
```
We can now run a function that would otherwise blow up:
```{r r-try-catch-triggered}
tryCatch(
  attemptWithoutTry(1, "two"),
  error = function(cnd) print(glue("error object is {cnd}"))
)
```
We can also handle non-fatal errors using `withCallingHandlers`,
and define new types of conditions,
but this is done less often in day-to-day R code than in Python:
see *[Advanced R][advanced-r]* or [this tutorial][said-handling-r-errors] for details.
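For a taste of what that looks like, here is a minimal sketch (our own illustration, not taken from those references) of using `withCallingHandlers` to handle a warning and then carry on:
```{r calling-handlers-sketch}
# A minimal sketch: intercept a warning, report it as a message,
# then muffle it so execution continues and the block's value is returned.
withCallingHandlers(
  {
    warning("something non-fatal happened\n")
    "finished anyway"
  },
  warning = function(cnd) {
    message("handled: ", conditionMessage(cnd))
    invokeRestart("muffleWarning")
  }
)
```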
## What should I know about testing in general?
In keeping with common programming practice,
we have left testing until the last possible moment.
The standard testing library for R is [testthat][testthat],
which shares many features with Python's [unittest][unittest]
and other [unit testing](glossary.html#unit-test) libraries:
1. Each test consists of a single function that tests a single property or behavior of the system.
2. Tests are collected into files with prescribed names that can be found by a [test runner](glossary.html#test-runner).
3. Shared [setup](glossary.html#testing-setup) and [teardown](glossary.html#testing-teardown) steps are put in functions of their own.
Let's load it and write our first test:
```{r introduce-testthat}
library(testthat)
test_that("Zero equals itself", {expect_equal(0, 0)})
```
As is conventional with unit testing libraries,
no news is good news:
if a test passes,
it doesn't produce output because it doesn't need our attention.
Let's try something that ought to fail:
```{r force-error, error=TRUE}
test_that("Zero equals one", {expect_equal(0, 1)})
```
Good:
we can draw some comfort from the fact that Those Beyond have not yet changed the fundamental rules of arithmetic.
But what are the curly braces around `expect_equal` for?
The answer is that they create a [code block](glossary.html#code-block) for `test_that` to run.
We can run `expect_equal` on its own:
```{r expect-equal-alone, error=TRUE}
expect_equal(0, 1)
```
but that doesn't produce a summary of how many tests passed or failed.
Passing a block of code to `test_that` also allows us to check several things in one test:
```{r pass-code-block, error=TRUE}
test_that("Testing two things", {
expect_equal(0, 0)
expect_equal(0, 1)
})
```
A block of code is *not* the same thing as an [anonymous function](glossary.html#anonymous-function),
which is why running this block of code does nothing—the "test" defines a function
but doesn't actually call it:
```{r anonymous-function}
test_that("Using an anonymous function", function() {
print("In our anonymous function")
expect_equal(0, 1)
})
```
## How should I organize my tests?
Running blocks of tests by hand is a bad practice.
Instead,
we should put related tests in files
and then put those files in a directory called `tests/testthat`.
We can then run some or all of those tests with a single command.
To start,
let's create `tests/testthat/test_example.R`:
```{r test-example, code=readLines("tests/testthat/test_example.R"), eval=FALSE}
```
The first line loads the testthat package,
which gives us our tools.
The call to `context` on the second line gives this set of tests a name for reporting purposes.
After that,
we add as many calls to `test_that` as we want,
each with a name and a block of code.
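A hypothetical file of that shape (made up for illustration; it is *not* our actual test file) might look like this:
```{r test-example-sketch, eval=FALSE}
# Hypothetical sketch of a test file laid out as described above;
# the real tests/testthat/test_example.R differs in content.
library(testthat)
context("Example tests")

test_that("Arithmetic on scalars works", {
  expect_equal(1 + 1, 2)
  expect_equal(2 * 3, 6)
})

test_that("Vectors have the expected lengths", {
  expect_length(integer(0), 0)
  expect_length(1:10, 10)
})

test_that("Logical reductions behave", {
  expect_true(all(c(TRUE, TRUE)))
  expect_false(any(c(FALSE, FALSE)))
})
```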
We can now run this file from within RStudio:
```{r run-test-dir}
test_dir("tests/testthat")
```
Care is needed when interpreting these results.
There are four `test_that` calls but eight actual checks,
and successes and failures are tallied per check, not per call.
What then is the purpose of `test_that`?
Why not just use `expect_equal` and its kin,
such as `expect_true`, `expect_false`, `expect_length`, and so on?
The answer is that it allows us to do one operation and then check several things afterward.
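For example (a made-up illustration, not one of this project's tests), a single test can perform one computation and then verify several of its properties:
```{r one-setup-many-checks}
# Do one piece of work, then check several properties of the result.
test_that("seq builds the vector we expect", {
  result <- seq(1, 10, by = 3)
  expect_length(result, 4)
  expect_equal(result[1], 1)
  expect_equal(result[4], 10)
})
```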
Let's create another file called `tests/testthat/test_tibble.R`:
```{r test-tibble, code=readLines("tests/testthat/test_tibble.R")}
```
(We don't actually have to call our test files `test_something.R`,
but `test_dir` and the rest of R's testing infrastructure expect us to.
Similarly,
we don't have to put them in a `tests` directory,
but gibbering incoherence will ensue if we do not.)
Now let's run all of our tests:
```{r run-more-tests}
test_dir("tests/testthat")
```
That's rather a lot of output.
Happily,
we can provide a `filter` argument to `test_dir`:
```{r test-with-filter-wrong, error=TRUE}
test_dir("tests/testthat", filter = "test_tibble.R")
```
Ah.
It turns out that `filter` is applied to filenames *after* the leading `test_` and the trailing `.R` have been removed.
Let's try again:
```{r test-with-filter}
test_dir("tests/testthat", filter = "tibble")
```
That's better,
and it illustrates our earlier point about the importance of following conventions.
## How can I write a few simple tests?
To give ourselves something to test,
let's create a file called `scripts/find_empty_01.R`
containing a single function `find_empty_rows` that identifies all the empty rows in a CSV file.
Our first implementation is:
```{r find-empty-01, code=readLines("scripts/find_empty_01.R")}
```
This is complex enough to merit line-by-line exegesis:
1. Define the function with one argument `source`, whence we shall read.
2. Read tabular data from that source and assign the resulting tibble to `data`.
3. Begin a pipeline that will assign something to the variable `empty`.
1. Use `pmap` to map a function across each row of the tibble.
Since we don't know how many columns are in each row,
we use `...` to take any number of arguments.
2. Convert the variable number of arguments to a list.
3. Check to see if all of those arguments are either `NA` or the empty string.
4. Close the mapped function's definition.
4. Start another pipeline.
Its result isn't assigned to a variable,
so whatever it produces will be the value returned by `find_empty_rows`.
1. Construct a tibble that contains only the row numbers of the original table in a column called `id`.
2. Filter those row numbers to keep only those corresponding to rows that were entirely empty.
The `as.logical` call inside `filter` is needed because the value returned by `pmap`
(which we stored in `empty`)
is a list, not a logical vector.
3. Use `pull` to get the one column we want from the filtered tibble as a vector.
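Putting those steps together, the whole function presumably looks something like the sketch below (reconstructed from the description above, so the actual `scripts/find_empty_01.R` may differ in detail):
```{r find-empty-01-sketch, eval=FALSE}
# Sketch reconstructed from the line-by-line description; not guaranteed
# to match scripts/find_empty_01.R exactly.
find_empty_rows <- function(source) {
  data <- read_csv(source)
  empty <- data %>%
    pmap(function(...) {
      args <- list(...)
      all(is.na(args) | (args == ""))
    })
  data %>%
    transmute(id = row_number()) %>%
    filter(as.logical(empty)) %>%
    pull(id)
}
```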
There is a lot going on here,
particularly if you are new to R (as I am at the time of writing)
and need help figuring out that `pmap` is the function this problem wants.
But now that we have it,
we can do this:
```{r show-how-source-works, eval=FALSE}
source("scripts/find_empty_01.R")
find_empty_rows("a,b\n1,2\n,\n5,6")
```
The `source` function reads R code from the given source.
Using this inside an R Markdown file is usually a bad idea,
since the generated HTML or PDF won't show readers what code we loaded and ran.
On the other hand,
if we are creating command-line tools for use on clusters or in other batch processing modes,
and are careful to display the code in a nearby block,
the stain on our soul is excusable.
The more interesting part of this example is the call to `find_empty_rows`.
Instead of giving it the name of a file,
we have given it the text of the CSV we want parsed.
This string is passed to `read_csv`,
which (according to documentation that only took us 15 minutes to realize we had already seen)
interprets its first argument as a filename *or*
as the actual text to be parsed if it contains a newline character.
This allows us to put the [test fixture](glossary.html#test-fixture)
right there in the code as a literal string,
which experience shows is easier to understand and maintain
than having test data in separate files.
Our function seems to work,
but we can make it more pipelinesque:
```{r find-empty-02, code=readLines("scripts/find_empty_02.R")}
```
Going line by line once again:
1. Define a function with one argument called `source`, from which we shall once again read.
2. Read from that source to fill the pipeline.
3. Map our test for emptiness across each row, returning a logical vector as a result.
(`pmap_lgl` is a derivative of `pmap` that always casts its result to logical.
Similar functions like `pmap_dbl` return vectors of other types,
and many other tidyverse functions also have strongly-typed variants.)
4. Turn that logical vector into a single-column tibble,
giving that column the name "empty".
We explain the use of `.` below.
5. Add a second column with row numbers.
6. Discard rows that aren't empty.
7. Return a vector of the remaining row IDs.
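Reconstructing from that description in the same way (again, the actual `scripts/find_empty_02.R` may differ slightly), the pipelined version looks roughly like:
```{r find-empty-02-sketch, eval=FALSE}
# Sketch of the single-pipeline version described above.
find_empty_rows <- function(source) {
  read_csv(source) %>%
    pmap_lgl(function(...) {
      args <- list(...)
      all(is.na(args) | (args == ""))
    }) %>%
    tibble(empty = .) %>%
    mutate(id = row_number()) %>%
    filter(empty) %>%
    pull(id)
}
```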
> **Wat?**
>
> Buried in the middle of the pipe shown above is the expression:
>
> `tibble(empty = .)`
>
> Quoting from *[Advanced R][advanced-r]*,
> "The function arguments look a little quirky
> but allow you to refer to `.` for one argument functions,
> `.x` and `.y` for two argument functions,
> and `..1`, `..2`, `..3`, etc, for functions with an arbitrary number of arguments."
> In other words, `.` in tidyverse functions usually means "whatever is on the left side of the `%>%` operator",
> i.e., whatever would normally be passed as the function's first argument.
> Without this,
> we have no easy way to give the sole column of our newly-constructed tibble a name.
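A tiny standalone example of `.` naming the incoming value (our own illustration):
```{r dot-pronoun-example}
# The value on the left of %>% is substituted wherever `.` appears,
# so here the vector becomes the column named `value`.
c(1, 5, 10) %>% tibble(value = .)
```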
Here's our first batch of tests:
```{r show-test-find-empty-a, code=readLines("tests/testthat/test_find_empty_a.R"), eval=FALSE}
```
And here's what happens when we run this file with `test_dir`:
```{r test-find-empty-a}
test_dir("tests/testthat", "find_empty_a")
```
This is perplexing:
we expected that if there were no empty rows,
our function would return `NULL`.
Let's look more closely:
```{r load-find-empty-rows, echo=FALSE}
source("scripts/find_empty_02.R")
```
```{r call-find-empty-rows-broken}
find_empty_rows("a\n1")
```
Ah:
our function is returning an integer vector of zero length rather than `NULL`.
Let's have a closer look at the properties of this strange beast:
```{r properties-of-empty-vector}
print(glue("integer(0) equal to NULL? {is.null(integer(0))}"))
print(glue("any(logical(0))? {any(logical(0))}"))
print(glue("all(logical(0))? {all(logical(0))}"))
```
All right.
`integer(0)` is an actual vector that just happens to have no elements, not `NULL`,
so the fact that `is.null(integer(0))` is `FALSE` (and that our comparison with `NULL` failed) isn't surprising.
The fact that `any` of an empty logical vector is `FALSE` isn't really surprising either—none of the elements are `TRUE`,
so it would be hard to say that any of them are.
`all` of an empty vector being `TRUE` is unexpected, though.
The reasoning is apparently that none of the (nonexistent) elements are `FALSE`,
but honestly,
at this point we are veering dangerously close to [JavaScript Logic][javascript-wat],
so we will accept this result for what it is and move on.
So what *should* our function return when there aren't any empty rows: `NULL` or `integer(0)`?
After a bit of thought,
we decide on the latter,
which means it's the tests that we need to rewrite,
not the code:
```{r show-test-find-empty-b, code=readLines("tests/testthat/test_find_empty_b.R"), eval=FALSE}
```
And here's what happens when we run this file with `test_dir`:
```{r run-modified-tests}
test_dir("tests/testthat", "find_empty_b")
```
## How can I check data transformation?
People normally write unit tests for the code in packages,
not to check the steps taken to clean up particular datasets,
but the latter are just as useful as the former.
To illustrate,
we have been given several more CSV files to clean up.
The first,
`at_health_facilities.csv`,
shows the percentage of births at health facilities by country, year, and mother's age.
It comes from the same UNICEF website as our previous data,
but has a different set of problems.
Here are its first few lines:
```
,,GLOBAL DATABASES,,,,,,,,,,,,,
,,[data.unicef.org],,,,,,,,,,,,,
,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Indicator:,Delivered in health facilities,,,,,,,,,,,,,,
Unit:,Percentage,,,,,,,,,,,,,,
,,,,Mother's age,,,,,,,,,,,
iso3,Country/areas,year,Total ,age 15-17,age 18-19,age less than 20,age more than 20,age 20-34,age 35-49,Source,Source year,,,,
AFG,Afghanistan,2010, 33 , 25 , 29 , 28 , 31 , 31 , 31 ,MICS,2010,,,,
ALB,Albania,2005, 98 , 100 , 96 , 97 , 98 , 99 , 92 ,MICS,2005,,,,
ALB,Albania,2008, 98 , 94 , 98 , 97 , 98 , 98 , 99 ,DHS,2008,,,,
...
```
and its last:
```
ZWE,Zimbabwe,2005, 66 , 64 , 64 , 64 , 67 , 69 , 53 ,DHS,2005,,,,
ZWE,Zimbabwe,2009, 58 , 49 , 59 , 55 , 59 , 60 , 52 ,MICS,2009,,,,
ZWE,Zimbabwe,2010, 64 , 56 , 66 , 62 , 64 , 65 , 60 ,DHS,2010,,,,
ZWE,Zimbabwe,2014, 80 , 82 , 82 , 82 , 79 , 80 , 77 ,MICS,2014,,,,
,,,,,,,,,,,,,,,
Definition:,Percentage of births delivered in a health facility.,,,,,,,,,,,,,,
,"The indicator refers to women who had a live birth in a recent time period, generally two years for MICS and five years for DHS.",,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Note:,"Database include reanalyzed data from DHS and MICS, using a reference period of two years before the survey.",,,,,,,,,,,,,,
,Includes surveys which microdata were available as of April 2016. ,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Source:,"UNICEF global databases 2016 based on DHS, MICS .",,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
Contact us:,data@unicef.org,,,,,,,,,,,,,,
```
There are two other files in this collection called `c_sections.csv` and `skilled_attendant_at_birth.csv`,
which are the number of Caesarean sections
and the number of births where a midwife or other trained practitioner was present.
All three datasets have been exported from the same Excel spreadsheet;
rather than writing a separate script for each,
we should create a tool that will handle them all.
At first glance,
the problems we need to solve to do this are:
1. Each file may have a different number of header rows
(by inspection, two of the files have 7 and one has 8),
so we should infer this number from the file.
2. Each file may contain a different number of records,
so our tool should select rows by content rather than by absolute row number.
3. The files appear to have the same column names
(for which we give thanks),
but we should check this in case someone tries to use our function
with a dataset that doesn't.
These three requirements will make our program significantly more complicated,
so we should tackle each with its own testable function.
### How can I reorganize code to make it more testable?
The data we care about comes after the row with `iso3`, `Country/areas`, and other column headers,
so the simplest way to figure out how many rows to skip is to read the data,
look for this row,
and discard everything above it.
The simplest way to do *that* is to read the file once to find the number of header rows,
then read it again,
discarding that number of rows.
It's inefficient,
but for a dataset this size,
simplicity beats performance.
Here's our first try:
```{r reading-health-data}
read_csv("data/at_health_facilities.csv") %>%
select(check = 1) %>%
mutate(id = row_number()) %>%
filter(check == "iso3") %>%
select(id) %>%
first()
```
Ignoring the messages about missing column names,
this tells us that `iso3` appears in row 7 of our data,
which is *almost* true:
it's actually in row 8,
because `read_csv` has interpreted the first row of the raw CSV data as a header.
On the bright side,
that means we can immediately use this value as the `skip` parameter to the next `read_csv` call.
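For instance, a hypothetical snippet with the discovered row number hard-wired rather than computed would be:
```{r skip-param-sketch, eval=FALSE}
# Hypothetical: re-read the file, skipping the junk above the real header row.
num_skip <- 7  # the row number found by the pipeline above
real_data <- read_csv("data/at_health_facilities.csv", skip = num_skip)
```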
How do we test this code?
Easy:
we turn it into a function,
tell that function to stop if it can't find `iso3` in the data,
and write some unit tests.
The function is:
```{r determine-skip-rows, code=readLines("scripts/determine_skip_rows_a.R")}
```
We can then call `usethis::use_testthat()` to set up some testing infrastructure,
including the directory `tests/testthat`
and a script called `tests/testthat.R`
that will run all our tests when we want to check the integrity of our project.
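That setup is a single call (sketched here rather than run, since it only needs to happen once per project):
```{r use-testthat-sketch, eval=FALSE}
# One-time project setup: creates tests/testthat/ and tests/testthat.R.
usethis::use_testthat()
```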
Once we have done that
we can put these five tests in `tests/testthat/test_determine_skip_rows.R`:
```{r show-test-determine-skip-rows-a, code=readLines("tests/testthat/test_determine_skip_rows_a.R"), eval=FALSE}
```
and run it:
```{r run-skip-row-tests-a, error=TRUE}
test_dir("tests/testthat", "determine_skip_rows_a")
```
That's right: all five fail.
The first problem is that we have written `is03` (with a digit `0` instead of a letter `o`) in the first two tests.
If we fix that and re-run the tests, they pass;
what about the other three?
1. When there are no rows to skip, our function is returning `integer(0)` instead of 0
because the row with `iso3` is being used as headers.
2. When `iso3` isn't found at all, the function is returning `integer(0)` rather than stopping.
Here is a more robust version of the function:
```{r show-determine-skip-rows-b, code=readLines("scripts/determine_skip_rows_b.R"), eval=FALSE}
```
And here are the results:
```{r run-skip-row-tests-b}
test_dir("tests/testthat", "determine_skip_rows_b")
```
Our tests still aren't checking anything statistical,
but without trustworthy data,
our statistics will be meaningless.
Tests like these allow our future selves to focus on making new mistakes instead of repeating old ones.
## Key Points
```{r keypoints, child="keypoints/testerror.md"}
```
```{r links, child="etc/links.md"}
```