Technique example: data loader, R to JSON (#1417)
* loader example R to JSON
* update r-to-json readme
* format Plot code
* Apply suggestions from code review
  Co-authored-by: Mike Bostock <mbostock@gmail.com>
* updates package install line, and adds to examples README

---------

Co-authored-by: Allison Horst <allison@Allisons-MacBook-Air.local>
Co-authored-by: Mike Bostock <mbostock@gmail.com>
1 parent 1a1d550, commit ee78892

Showing 8 changed files with 193 additions and 38 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
.DS_Store | ||
/dist/ | ||
node_modules/ | ||
yarn-error.log | ||
.RData | ||
.Rhistory |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
[Framework examples →](../) | ||
|
||
# R data loader to generate JSON | ||
|
||
View live: <https://observablehq.observablehq.cloud/framework-example-loader-r-to-json/> | ||
|
||
This Observable Framework example demonstrates how to write a data loader in R that reads the text of Tolstoy’s _War and Peace_ from Project Gutenberg, does some basic text mining, then generates JSON with the top word counts by book and chapter. | ||
|
||
The data loader lives in [`src/data/tolstoy.json.R`](./src/data/tolstoy.json.R). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
export default { | ||
root: "src" | ||
}; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{ | ||
"type": "module", | ||
"private": true, | ||
"scripts": { | ||
"clean": "rimraf src/.observablehq/cache", | ||
"build": "rimraf dist && observable build", | ||
"dev": "observable preview", | ||
"deploy": "observable deploy", | ||
"observable": "observable" | ||
}, | ||
"dependencies": { | ||
"@observablehq/framework": "^1.7.0" | ||
}, | ||
"devDependencies": { | ||
"rimraf": "^5.0.5" | ||
}, | ||
"engines": { | ||
"node": ">=18" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
/.observablehq/cache/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Attach libraries (must be installed) | ||
library(tidytext) | ||
library(readr) | ||
library(dplyr) | ||
library(stringr) | ||
library(jsonlite) | ||
|
||
# Access and wrangle data | ||
tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |> | ||
rename(text = 1) | ||
booktext <- tolstoy[-(1:400), ] | ||
booktext <- booktext[-(51477:51770), ] | ||
|
||
tidy_tolstoy <- booktext |> | ||
mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |> | ||
mutate(book = case_when( | ||
book < 16 ~ paste("Book", book), | ||
book == 16 ~ "Epilogue 1", | ||
book == 17 ~ "Epilogue 2" | ||
)) |> | ||
group_by(book) |> | ||
mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |> | ||
ungroup() |> | ||
filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |> | ||
filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |> | ||
unnest_tokens(word, text) |> | ||
anti_join(stop_words) | ||
|
||
# Find top 10 words (by count) for each chapter | ||
tolstoy_word_counts <- tidy_tolstoy |> | ||
group_by(book, chapter) |> | ||
count(word) |> | ||
top_n(10, n) |> | ||
arrange(desc(n), .by_group = TRUE) | ||
|
||
# Create JSON and write to standard output | ||
cat(toJSON(tolstoy_word_counts, pretty = TRUE)) |
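
The counting step in the loader above (count words per book and chapter, keep the top few, write JSON to standard output) can be sketched in plain JavaScript for readers less familiar with dplyr. This is only an illustration under stated assumptions: the `tokens` sample, the `countWords` and `topWords` helpers, and the cutoff of 2 are all hypothetical, and unlike `top_n()` this sketch drops ties at the cutoff.

```javascript
// Hypothetical sample standing in for tokenized, stop-word-filtered text.
const tokens = [
  {book: "Book 1", chapter: 1, word: "prince"},
  {book: "Book 1", chapter: 1, word: "prince"},
  {book: "Book 1", chapter: 1, word: "anna"},
  {book: "Book 1", chapter: 1, word: "pierre"},
  {book: "Book 1", chapter: 1, word: "anna"},
  {book: "Book 1", chapter: 1, word: "prince"},
  {book: "Book 1", chapter: 2, word: "pierre"},
  {book: "Book 1", chapter: 2, word: "pierre"},
  {book: "Book 1", chapter: 2, word: "room"}
];

// Count occurrences of each word within each (book, chapter) group.
function countWords(rows) {
  const counts = new Map();
  for (const {book, chapter, word} of rows) {
    const key = JSON.stringify([book, chapter, word]);
    const entry = counts.get(key) ?? {book, chapter, word, n: 0};
    entry.n += 1;
    counts.set(key, entry);
  }
  return [...counts.values()];
}

// Keep the k highest-count words per (book, chapter), in descending order of n.
// Ties at the cutoff are dropped here (an intentional simplification).
function topWords(counted, k) {
  const groups = new Map();
  for (const row of counted) {
    const key = JSON.stringify([row.book, row.chapter]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.values()].flatMap((group) =>
    group.sort((a, b) => b.n - a.n).slice(0, k)
  );
}

// Write the result to standard output as pretty-printed JSON.
const result = topWords(countWords(tokens), 2);
process.stdout.write(JSON.stringify(result, null, 2) + "\n");
```

Each output record has the same fields the R loader emits: `book`, `chapter`, `word`, and `n`.
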
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# R data loader to generate JSON | ||
|
||
Here’s an R data loader that accesses Tolstoy’s _War and Peace_ from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600), finds the most common words by book and chapter, then outputs JSON. | ||
|
||
```r | ||
# Attach libraries (must be installed) | ||
library(tidytext) | ||
library(readr) | ||
library(dplyr) | ||
library(stringr) | ||
library(jsonlite) | ||
|
||
# Access and wrangle data | ||
tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |> | ||
rename(text = 1) | ||
booktext <- tolstoy[-(1:400), ] | ||
booktext <- booktext[-(51477:51770), ] | ||
|
||
tidy_tolstoy <- booktext |> | ||
mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |> | ||
mutate(book = case_when( | ||
book < 16 ~ paste("Book", book), | ||
book == 16 ~ "Epilogue 1", | ||
book == 17 ~ "Epilogue 2" | ||
)) |> | ||
group_by(book) |> | ||
mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |> | ||
ungroup() |> | ||
filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |> | ||
filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |> | ||
unnest_tokens(word, text) |> | ||
anti_join(stop_words) | ||
|
||
# Find top 10 words (by count) for each chapter | ||
tolstoy_word_counts <- tidy_tolstoy |> | ||
group_by(book, chapter) |> | ||
count(word) |> | ||
top_n(10, n) |> | ||
arrange(desc(n), .by_group = TRUE) | ||
|
||
# Create JSON and write to standard output | ||
cat(toJSON(tolstoy_word_counts, pretty = TRUE)) | ||
``` | ||
|
||
<div class="note"> | ||
|
||
To run this data loader, you’ll need R installed, along with all required packages, _e.g._ by running `install.packages(c("tidytext", "readr", "dplyr", "stringr", "jsonlite"))`. | ||
|
||
</div> | ||
|
||
The above data loader lives in `data/tolstoy.json.R`, so we can load the data as `data/tolstoy.json` using `FileAttachment`. | ||
|
||
```js echo | ||
const text = FileAttachment("data/tolstoy.json").json(); | ||
``` | ||
|
||
We can display this dataset with `Inputs.table`: | ||
|
||
```js echo | ||
Inputs.table(text) | ||
``` | ||
|
||
We can make a quick chart of top words in Book 1, with color mapped to book chapter, using [Observable Plot](https://observablehq.com/plot/): | ||
|
||
```js echo | ||
Plot.plot({ | ||
marks: [ | ||
Plot.barY(text, { | ||
filter: (d) => d.book === "Book 1", | ||
x: "word", | ||
y: "n", | ||
fill: "chapter", | ||
tip: true, | ||
sort: {x: "y", limit: 5, reverse: true} | ||
}) | ||
] | ||
}) | ||
``` |
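
To sanity-check which words a chart like the one above should emphasize, the Book 1 totals can be computed outside Plot. The sketch below is a plain-JavaScript approximation, not a description of how Plot evaluates its `filter` or `sort` options internally. The `rows` sample values and the `topWordTotals` helper are hypothetical; only the record shape (`book`, `chapter`, `word`, `n`) comes from the loader.

```javascript
// Hypothetical records in the shape the loader emits: {book, chapter, word, n}.
const rows = [
  {book: "Book 1", chapter: 1, word: "prince", n: 12},
  {book: "Book 1", chapter: 2, word: "prince", n: 7},
  {book: "Book 1", chapter: 1, word: "anna", n: 9},
  {book: "Book 1", chapter: 2, word: "pierre", n: 15},
  {book: "Book 2", chapter: 1, word: "rostov", n: 30}
];

// Sum counts per word across all chapters of one book,
// then return the `limit` largest totals in descending order.
function topWordTotals(data, book, limit) {
  const totals = new Map();
  for (const d of data) {
    if (d.book !== book) continue; // same condition as the chart's filter
    totals.set(d.word, (totals.get(d.word) ?? 0) + d.n);
  }
  return [...totals]
    .map(([word, n]) => ({word, n}))
    .sort((a, b) => b.n - a.n)
    .slice(0, limit);
}

const top = topWordTotals(rows, "Book 1", 5);
console.log(top);
```

With real data, `rows` would be the array loaded from `data/tolstoy.json`.
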