Technique example: data loader, R to JSON (#1417)

* loader example R to JSON
* update r-to-json readme
* format Plot code
* Apply suggestions from code review (Co-authored-by: Mike Bostock <mbostock@gmail.com>)
* updates package install line, and adds to examples README

Co-authored-by: Allison Horst <allison@Allisons-MacBook-Air.local>
Co-authored-by: Mike Bostock <mbostock@gmail.com>
observablehq · Jun 3, 2024 · ee78892
1 parent 1a1d550

Showing 8 changed files with 193 additions and 38 deletions.
diff --git a/examples/README.md b/examples/README.md
@@ -33,44 +33,45 @@
 
 ### Charts
 
-* [`geotiff`](https://observablehq.observablehq.cloud/framework-example-geotiff/) - Parsing GeoTIFF with geotiff.js, then visualizing with Observable Plot
-* [`netcdf`](https://observablehq.observablehq.cloud/framework-example-netcdf/) - Parsing NetCDF with `netcdfjs`, then visualizing with Observable Plot
-* [`vega-dark`](https://observablehq.observablehq.cloud/framework-example-vega-dark/) - Responsive dark mode in Vega-Lite
-* [`vega-responsive`](https://observablehq.observablehq.cloud/framework-example-vega-responsive/) - Responsive width in Vega-Lite using ResizeObserver
+- [`geotiff`](https://observablehq.observablehq.cloud/framework-example-geotiff/) - Parsing GeoTIFF with geotiff.js, then visualizing with Observable Plot
+- [`netcdf`](https://observablehq.observablehq.cloud/framework-example-netcdf/) - Parsing NetCDF with `netcdfjs`, then visualizing with Observable Plot
+- [`vega-dark`](https://observablehq.observablehq.cloud/framework-example-vega-dark/) - Responsive dark mode in Vega-Lite
+- [`vega-responsive`](https://observablehq.observablehq.cloud/framework-example-vega-responsive/) - Responsive width in Vega-Lite using ResizeObserver
 
 ### Data loaders
 
-* [`loader-arrow`](https://observablehq.observablehq.cloud/framework-example-loader-arrow/) - Generating Apache Arrow IPC files
-* [`loader-databricks`](https://observablehq.observablehq.cloud/framework-example-loader-databricks/) - Loading data from Databricks
-* [`loader-duckdb`](https://observablehq.observablehq.cloud/framework-example-loader-duckdb/) - Processing data with DuckDB
-* [`loader-github`](https://observablehq.observablehq.cloud/framework-example-loader-github/) - Loading data from GitHub
-* [`loader-google-analytics`](https://observablehq.observablehq.cloud/framework-example-loader-google-analytics/) - Loading data from Google Analytics
-* [`loader-parquet`](https://observablehq.observablehq.cloud/framework-example-loader-parquet/) - Generating Apache Parquet files
-* [`loader-postgres`](https://observablehq.observablehq.cloud/framework-example-loader-postgres/) - Loading data from PostgreSQL
-* [`loader-snowflake`](https://observablehq.observablehq.cloud/framework-example-loader-snowflake/) - Loading data from Snowflake
-* [`netcdf-contours`](https://observablehq.observablehq.cloud/framework-example-netcdf-contours/) - Converting NetCDF to GeoJSON with `netcdfjs` and `d3-geo-voronoi`
+- [`loader-arrow`](https://observablehq.observablehq.cloud/framework-example-loader-arrow/) - Generating Apache Arrow IPC files
+- [`loader-databricks`](https://observablehq.observablehq.cloud/framework-example-loader-databricks/) - Loading data from Databricks
+- [`loader-duckdb`](https://observablehq.observablehq.cloud/framework-example-loader-duckdb/) - Processing data with DuckDB
+- [`loader-github`](https://observablehq.observablehq.cloud/framework-example-loader-github/) - Loading data from GitHub
+- [`loader-google-analytics`](https://observablehq.observablehq.cloud/framework-example-loader-google-analytics/) - Loading data from Google Analytics
+- [`loader-parquet`](https://observablehq.observablehq.cloud/framework-example-loader-parquet/) - Generating Apache Parquet files
+- [`loader-postgres`](https://observablehq.observablehq.cloud/framework-example-loader-postgres/) - Loading data from PostgreSQL
+- [`loader-r-to-json`](https://observablehq.observablehq.cloud/framework-example-loader-r-to-json/) - Generating JSON from R
+- [`loader-snowflake`](https://observablehq.observablehq.cloud/framework-example-loader-snowflake/) - Loading data from Snowflake
+- [`netcdf-contours`](https://observablehq.observablehq.cloud/framework-example-netcdf-contours/) - Converting NetCDF to GeoJSON with `netcdfjs` and `d3-geo-voronoi`
 
 ### Inputs
 
-* [`codemirror`](https://observablehq.observablehq.cloud/framework-example-codemirror/) - A text input powered by CodeMirror
-* [`custom-input-2d`](https://observablehq.observablehq.cloud/framework-example-custom-input-2d/) - A custom 2D input with bidirectional binding
-* [`input-select-file`](https://observablehq.observablehq.cloud/framework-example-input-select-file/) - Selecting a file from a drop-down menu
+- [`codemirror`](https://observablehq.observablehq.cloud/framework-example-codemirror/) - A text input powered by CodeMirror
+- [`custom-input-2d`](https://observablehq.observablehq.cloud/framework-example-custom-input-2d/) - A custom 2D input with bidirectional binding
+- [`input-select-file`](https://observablehq.observablehq.cloud/framework-example-input-select-file/) - Selecting a file from a drop-down menu
 
 ### Markdown
 
-* [`markdown-it-container`](https://observablehq.observablehq.cloud/framework-example-markdown-it-container/) - The markdown-it-container plugin
-* [`markdown-it-footnote`](https://observablehq.observablehq.cloud/framework-example-markdown-it-footnote/) - The markdown-it-footnote plugin
-* [`markdown-it-wikilinks`](https://observablehq.observablehq.cloud/framework-example-markdown-it-wikilinks/) - The markdown-it-wikilinks plugin
+- [`markdown-it-container`](https://observablehq.observablehq.cloud/framework-example-markdown-it-container/) - The markdown-it-container plugin
+- [`markdown-it-footnote`](https://observablehq.observablehq.cloud/framework-example-markdown-it-footnote/) - The markdown-it-footnote plugin
+- [`markdown-it-wikilinks`](https://observablehq.observablehq.cloud/framework-example-markdown-it-wikilinks/) - The markdown-it-wikilinks plugin
 
 ### Other
 
-* [`chess`](https://observablehq.observablehq.cloud/framework-example-chess/) - Loading Zip data from FIDE; creating a bump chart with Observable Plot
-* [`custom-stylesheet`](https://observablehq.observablehq.cloud/framework-example-custom-stylesheet/) - Defining a custom stylesheet (custom theme)
-* [`google-analytics`](https://observablehq.observablehq.cloud/framework-example-google-analytics/) - A Google Analytics dashboard with numbers and charts
-* [`hello-world`](https://observablehq.observablehq.cloud/framework-example-hello-world/) - A minimal Framework project
-* [`intersection-observer`](https://observablehq.observablehq.cloud/framework-example-intersection-observer/) - Scrollytelling with IntersectionObserver
-* [`penguin-classification`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Logistic regression in Python; validating models with Observable Plot
-* [`responsive-iframe`](https://observablehq.observablehq.cloud/framework-example-responsive-iframe/) - Adjust the height of an embedded iframe to fit its content
+- [`chess`](https://observablehq.observablehq.cloud/framework-example-chess/) - Loading Zip data from FIDE; creating a bump chart with Observable Plot
+- [`custom-stylesheet`](https://observablehq.observablehq.cloud/framework-example-custom-stylesheet/) - Defining a custom stylesheet (custom theme)
+- [`google-analytics`](https://observablehq.observablehq.cloud/framework-example-google-analytics/) - A Google Analytics dashboard with numbers and charts
+- [`hello-world`](https://observablehq.observablehq.cloud/framework-example-hello-world/) - A minimal Framework project
+- [`intersection-observer`](https://observablehq.observablehq.cloud/framework-example-intersection-observer/) - Scrollytelling with IntersectionObserver
+- [`penguin-classification`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Logistic regression in Python; validating models with Observable Plot
+- [`responsive-iframe`](https://observablehq.observablehq.cloud/framework-example-responsive-iframe/) - Adjust the height of an embedded iframe to fit its content
 
 ## About these examples
 
@@ -104,15 +105,15 @@ If you have an example that you’d like to share with the community, please [op
 
 Here are some technique examples we’d like to see:
 
-* Visualization
-  * Big number with area chart
-  * Daily metric chart with moving average
-  * Punchcard chart (activity by day of week and hour of day)
-  * Bump chart/rank chart
-  * Brushing
-  * Zooming
-* Data loaders
-  * JSZip data loader
-  * npm data loader
-* Markdown
-  * Inline TeX `$…$`
+- Visualization
+  - Big number with area chart
+  - Daily metric chart with moving average
+  - Punchcard chart (activity by day of week and hour of day)
+  - Bump chart/rank chart
+  - Brushing
+  - Zooming
+- Data loaders
+  - JSZip data loader
+  - npm data loader
+- Markdown
+  - Inline TeX `$…$`
diff --git a/examples/loader-r-to-json/.gitignore b/examples/loader-r-to-json/.gitignore
@@ -0,0 +1,6 @@
+.DS_Store
+/dist/
+node_modules/
+yarn-error.log
+.RData
+.Rhistory
diff --git a/examples/loader-r-to-json/README.md b/examples/loader-r-to-json/README.md
@@ -0,0 +1,9 @@
+[Framework examples →](../)
+
+# R data loader to generate JSON
+
+View live: <https://observablehq.observablehq.cloud/framework-example-loader-r-to-json/>
+
+This Observable Framework example demonstrates how to write a data loader in R that accesses text from Tolstoy’s _War and Peace_ from Project Gutenberg, does some basic text mining, then generates JSON with top word counts by book and chapter.
+
+The data loader lives in [`src/data/tolstoy.json.R`](./src/data/tolstoy.json.R).
diff --git a/examples/loader-r-to-json/observablehq.config.js b/examples/loader-r-to-json/observablehq.config.js
@@ -0,0 +1,3 @@
+export default {
+  root: "src"
+};
diff --git a/examples/loader-r-to-json/package.json b/examples/loader-r-to-json/package.json
@@ -0,0 +1,20 @@
+{
+  "type": "module",
+  "private": true,
+  "scripts": {
+    "clean": "rimraf src/.observablehq/cache",
+    "build": "rimraf dist && observable build",
+    "dev": "observable preview",
+    "deploy": "observable deploy",
+    "observable": "observable"
+  },
+  "dependencies": {
+    "@observablehq/framework": "^1.7.0"
+  },
+  "devDependencies": {
+    "rimraf": "^5.0.5"
+  },
+  "engines": {
+    "node": ">=18"
+  }
+}
diff --git a/examples/loader-r-to-json/src/.gitignore b/examples/loader-r-to-json/src/.gitignore
@@ -0,0 +1 @@
+/.observablehq/cache/
diff --git a/examples/loader-r-to-json/src/data/tolstoy.json.R b/examples/loader-r-to-json/src/data/tolstoy.json.R
@@ -0,0 +1,37 @@
+# Attach libraries (must be installed)
+library(tidytext)
+library(readr)
+library(dplyr)
+library(stringr)
+library(jsonlite)
+
+# Access and wrangle data
+tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |>
+  rename(text = 1)
+booktext <- tolstoy[-(1:400), ]
+booktext <- booktext[-(51477:51770), ]
+
+tidy_tolstoy <- booktext |>
+  mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |>
+  mutate(book = case_when(
+    book < 16 ~ paste("Book", book),
+    book == 16 ~ "Epilogue 1",
+    book == 17 ~ "Epilogue 2"
+  )) |>
+  group_by(book) |>
+  mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |>
+  ungroup() |>
+  filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |>
+  filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |>
+  unnest_tokens(word, text) |>
+  anti_join(stop_words)
+
+# Find top 10 words (by count) for each chapter
+tolstoy_word_counts <- tidy_tolstoy |>
+  group_by(book, chapter) |>
+  count(word) |>
+  top_n(10, n) |>
+  arrange(desc(n), .by_group = TRUE)
+
+# Create JSON and write to standard output
+cat(toJSON(tolstoy_word_counts, pretty = TRUE))
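
Note on the loader above: the `cumsum(str_detect(...))` calls label every line with a running count of the headings seen so far, which is how each line gets its book and chapter number. A minimal JavaScript sketch of the same idea follows, for readers more familiar with the JavaScript side of Framework. The `labelSections` helper and the sample lines are hypothetical and are not part of this commit.

```js
// Hypothetical helper: give each line the number of heading lines seen so far,
// mirroring cumsum(str_detect(text, pattern)) in the R loader.
function labelSections(lines, headingPattern) {
  let section = 0;
  return lines.map((text) => {
    // Use a pattern without the "g" flag so test() keeps no state between calls.
    if (headingPattern.test(text)) section += 1;
    return {text, section};
  });
}

// Made-up sample lines in the style of the source text.
const sampleLines = [
  "BOOK ONE: 1805",
  "CHAPTER I",
  "Well, Prince",
  "BOOK TWO: 1805",
  "CHAPTER I",
  "FIRST EPILOGUE: 1813 - 20"
];

const labeled = labelSections(sampleLines, /BOOK | EPILOGUE/);
console.log(labeled.map((d) => d.section)); // [1, 1, 1, 2, 2, 3]
```

The heading line itself receives the new section number, which is why the R loader filters out the `BOOK` and `CHAPTER` lines only after the counts have been assigned.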
diff --git a/examples/loader-r-to-json/src/index.md b/examples/loader-r-to-json/src/index.md
@@ -0,0 +1,78 @@
+# R data loader to generate JSON
+
+Here’s an R data loader that accesses Tolstoy’s _War and Peace_ from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600), finds the most common words by book and chapter, then outputs JSON.
+
+```r
+# Attach libraries (must be installed)
+library(tidytext)
+library(readr)
+library(dplyr)
+library(stringr)
+library(jsonlite)
+
+# Access and wrangle data
+tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |>
+  rename(text = 1)
+booktext <- tolstoy[-(1:400), ]
+booktext <- booktext[-(51477:51770), ]
+
+tidy_tolstoy <- booktext |>
+  mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |>
+  mutate(book = case_when(
+    book < 16 ~ paste("Book", book),
+    book == 16 ~ "Epilogue 1",
+    book == 17 ~ "Epilogue 2"
+  )) |>
+  group_by(book) |>
+  mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |>
+  ungroup() |>
+  filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |>
+  filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |>
+  unnest_tokens(word, text) |>
+  anti_join(stop_words)
+
+# Find top 10 words (by count) for each chapter
+tolstoy_word_counts <- tidy_tolstoy |>
+  group_by(book, chapter) |>
+  count(word) |>
+  top_n(10, n) |>
+  arrange(desc(n), .by_group = TRUE)
+
+# Create JSON and write to standard output
+cat(toJSON(tolstoy_word_counts, pretty = TRUE))
+```
+
+<div class="note">
+
+To run this data loader, you’ll need R installed, along with all required packages, _e.g._ by running `install.packages(c("tidytext", "readr", "dplyr", "stringr", "jsonlite"))`.
+
+</div>
+
+The above data loader lives in `data/tolstoy.json.R`, so we can load the data as `data/tolstoy.json` using `FileAttachment`.
+
+```js echo
+const text = FileAttachment("data/tolstoy.json").json();
+```
+
+We can display this dataset with `Inputs.table`:
+
+```js echo
+Inputs.table(text)
+```
+
+We can make a quick chart of top words in Book 1, with color mapped to book chapter, using [Observable Plot](https://observablehq.com/plot/):
+
+```js echo
+Plot.plot({
+  marks: [
+    Plot.barY(text, {
+      filter: (d) => d.book === "Book 1",
+      x: "word",
+      y: "n",
+      fill: "chapter",
+      tip: true,
+      sort: {x: "y", limit: 5, reverse: true}
+    })
+  ]
+})
+```
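
Note on the chart above: `toJSON` serializes a data frame as an array of row objects, so each element of `text` carries `book`, `chapter`, `word`, and `n` fields. The sketch below shows, in plain JavaScript, roughly what the chart's `filter` and `sort` options select: the words in one book with the largest total counts. It approximates the result and is not how Plot implements it. The `topWordsForBook` helper and the sample rows (including their counts) are hypothetical and are not part of this commit.

```js
// Made-up rows in the shape the loader emits: one object per (book, chapter, word).
const sampleRows = [
  {book: "Book 1", chapter: 1, word: "prince", n: 30},
  {book: "Book 1", chapter: 1, word: "anna", n: 21},
  {book: "Book 1", chapter: 2, word: "prince", n: 12},
  {book: "Book 1", chapter: 2, word: "pierre", n: 25},
  {book: "Book 2", chapter: 1, word: "regiment", n: 18}
];

// Hypothetical helper: total each word's count across the chapters of one book,
// then keep the `limit` words with the largest totals.
function topWordsForBook(rows, book, limit = 5) {
  const totals = new Map();
  for (const d of rows) {
    if (d.book !== book) continue;
    totals.set(d.word, (totals.get(d.word) ?? 0) + d.n);
  }
  return Array.from(totals, ([word, n]) => ({word, n}))
    .sort((a, b) => b.n - a.n)
    .slice(0, limit);
}

console.log(topWordsForBook(sampleRows, "Book 1", 2));
```

With the sample rows, the two entries returned are `prince` (total 42) and `pierre` (total 25).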