Technique example: data loader, R to JSON (#1417)
* loader example R to JSON

* update r-to-json readme

* format Plot code

* Apply suggestions from code review

Co-authored-by: Mike Bostock <mbostock@gmail.com>

* updates package install line, and adds to examples README

---------

Co-authored-by: Allison Horst <allison@Allisons-MacBook-Air.local>
Co-authored-by: Mike Bostock <mbostock@gmail.com>
3 people authored Jun 3, 2024
1 parent 1a1d550 commit ee78892
Showing 8 changed files with 193 additions and 38 deletions.
77 changes: 39 additions & 38 deletions examples/README.md
@@ -33,44 +33,45 @@

### Charts

* [`geotiff`](https://observablehq.observablehq.cloud/framework-example-geotiff/) - Parsing GeoTIFF with geotiff.js, then visualizing with Observable Plot
* [`netcdf`](https://observablehq.observablehq.cloud/framework-example-netcdf/) - Parsing NetCDF with `netcdfjs`, then visualizing with Observable Plot
* [`vega-dark`](https://observablehq.observablehq.cloud/framework-example-vega-dark/) - Responsive dark mode in Vega-Lite
* [`vega-responsive`](https://observablehq.observablehq.cloud/framework-example-vega-responsive/) - Responsive width in Vega-Lite using ResizeObserver
- [`geotiff`](https://observablehq.observablehq.cloud/framework-example-geotiff/) - Parsing GeoTIFF with geotiff.js, then visualizing with Observable Plot
- [`netcdf`](https://observablehq.observablehq.cloud/framework-example-netcdf/) - Parsing NetCDF with `netcdfjs`, then visualizing with Observable Plot
- [`vega-dark`](https://observablehq.observablehq.cloud/framework-example-vega-dark/) - Responsive dark mode in Vega-Lite
- [`vega-responsive`](https://observablehq.observablehq.cloud/framework-example-vega-responsive/) - Responsive width in Vega-Lite using ResizeObserver

### Data loaders

* [`loader-arrow`](https://observablehq.observablehq.cloud/framework-example-loader-arrow/) - Generating Apache Arrow IPC files
* [`loader-databricks`](https://observablehq.observablehq.cloud/framework-example-loader-databricks/) - Loading data from Databricks
* [`loader-duckdb`](https://observablehq.observablehq.cloud/framework-example-loader-duckdb/) - Processing data with DuckDB
* [`loader-github`](https://observablehq.observablehq.cloud/framework-example-loader-github/) - Loading data from GitHub
* [`loader-google-analytics`](https://observablehq.observablehq.cloud/framework-example-loader-google-analytics/) - Loading data from Google Analytics
* [`loader-parquet`](https://observablehq.observablehq.cloud/framework-example-loader-parquet/) - Generating Apache Parquet files
* [`loader-postgres`](https://observablehq.observablehq.cloud/framework-example-loader-postgres/) - Loading data from PostgreSQL
* [`loader-snowflake`](https://observablehq.observablehq.cloud/framework-example-loader-snowflake/) - Loading data from Snowflake
* [`netcdf-contours`](https://observablehq.observablehq.cloud/framework-example-netcdf-contours/) - Converting NetCDF to GeoJSON with `netcdfjs` and `d3-geo-voronoi`
- [`loader-arrow`](https://observablehq.observablehq.cloud/framework-example-loader-arrow/) - Generating Apache Arrow IPC files
- [`loader-databricks`](https://observablehq.observablehq.cloud/framework-example-loader-databricks/) - Loading data from Databricks
- [`loader-duckdb`](https://observablehq.observablehq.cloud/framework-example-loader-duckdb/) - Processing data with DuckDB
- [`loader-github`](https://observablehq.observablehq.cloud/framework-example-loader-github/) - Loading data from GitHub
- [`loader-google-analytics`](https://observablehq.observablehq.cloud/framework-example-loader-google-analytics/) - Loading data from Google Analytics
- [`loader-parquet`](https://observablehq.observablehq.cloud/framework-example-loader-parquet/) - Generating Apache Parquet files
- [`loader-postgres`](https://observablehq.observablehq.cloud/framework-example-loader-postgres/) - Loading data from PostgreSQL
- [`loader-r-to-json`](https://observablehq.observablehq.cloud/framework-example-loader-r-to-json/) - Generating JSON from R
- [`loader-snowflake`](https://observablehq.observablehq.cloud/framework-example-loader-snowflake/) - Loading data from Snowflake
- [`netcdf-contours`](https://observablehq.observablehq.cloud/framework-example-netcdf-contours/) - Converting NetCDF to GeoJSON with `netcdfjs` and `d3-geo-voronoi`

### Inputs

* [`codemirror`](https://observablehq.observablehq.cloud/framework-example-codemirror/) - A text input powered by CodeMirror
* [`custom-input-2d`](https://observablehq.observablehq.cloud/framework-example-custom-input-2d/) - A custom 2D input with bidirectional binding
* [`input-select-file`](https://observablehq.observablehq.cloud/framework-example-input-select-file/) - Selecting a file from a drop-down menu
- [`codemirror`](https://observablehq.observablehq.cloud/framework-example-codemirror/) - A text input powered by CodeMirror
- [`custom-input-2d`](https://observablehq.observablehq.cloud/framework-example-custom-input-2d/) - A custom 2D input with bidirectional binding
- [`input-select-file`](https://observablehq.observablehq.cloud/framework-example-input-select-file/) - Selecting a file from a drop-down menu

### Markdown

* [`markdown-it-container`](https://observablehq.observablehq.cloud/framework-example-markdown-it-container/) - The markdown-it-container plugin
* [`markdown-it-footnote`](https://observablehq.observablehq.cloud/framework-example-markdown-it-footnote/) - The markdown-it-footnote plugin
* [`markdown-it-wikilinks`](https://observablehq.observablehq.cloud/framework-example-markdown-it-wikilinks/) - The markdown-it-wikilinks plugin
- [`markdown-it-container`](https://observablehq.observablehq.cloud/framework-example-markdown-it-container/) - The markdown-it-container plugin
- [`markdown-it-footnote`](https://observablehq.observablehq.cloud/framework-example-markdown-it-footnote/) - The markdown-it-footnote plugin
- [`markdown-it-wikilinks`](https://observablehq.observablehq.cloud/framework-example-markdown-it-wikilinks/) - The markdown-it-wikilinks plugin

### Other

* [`chess`](https://observablehq.observablehq.cloud/framework-example-chess/) - Loading Zip data from FIDE; creating a bump chart with Observable Plot
* [`custom-stylesheet`](https://observablehq.observablehq.cloud/framework-example-custom-stylesheet/) - Defining a custom stylesheet (custom theme)
* [`google-analytics`](https://observablehq.observablehq.cloud/framework-example-google-analytics/) - A Google Analytics dashboard with numbers and charts
* [`hello-world`](https://observablehq.observablehq.cloud/framework-example-hello-world/) - A minimal Framework project
* [`intersection-observer`](https://observablehq.observablehq.cloud/framework-example-intersection-observer/) - Scrollytelling with IntersectionObserver
* [`penguin-classification`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Logistic regression in Python; validating models with Observable Plot
* [`responsive-iframe`](https://observablehq.observablehq.cloud/framework-example-responsive-iframe/) - Adjust the height of an embedded iframe to fit its content
- [`chess`](https://observablehq.observablehq.cloud/framework-example-chess/) - Loading Zip data from FIDE; creating a bump chart with Observable Plot
- [`custom-stylesheet`](https://observablehq.observablehq.cloud/framework-example-custom-stylesheet/) - Defining a custom stylesheet (custom theme)
- [`google-analytics`](https://observablehq.observablehq.cloud/framework-example-google-analytics/) - A Google Analytics dashboard with numbers and charts
- [`hello-world`](https://observablehq.observablehq.cloud/framework-example-hello-world/) - A minimal Framework project
- [`intersection-observer`](https://observablehq.observablehq.cloud/framework-example-intersection-observer/) - Scrollytelling with IntersectionObserver
- [`penguin-classification`](https://observablehq.observablehq.cloud/framework-example-penguin-classification/) - Logistic regression in Python; validating models with Observable Plot
- [`responsive-iframe`](https://observablehq.observablehq.cloud/framework-example-responsive-iframe/) - Adjust the height of an embedded iframe to fit its content

## About these examples

@@ -104,15 +105,15 @@ If you have an example that you’d like to share with the community, please [op

Here are some technique examples we’d like to see:

* Visualization
* Big number with area chart
* Daily metric chart with moving average
* Punchcard chart (activity by day of week and hour of day)
* Bump chart/rank chart
* Brushing
* Zooming
* Data loaders
* JSZip data loader
* npm data loader
* Markdown
* Inline TeX `$…$`
- Visualization
- Big number with area chart
- Daily metric chart with moving average
- Punchcard chart (activity by day of week and hour of day)
- Bump chart/rank chart
- Brushing
- Zooming
- Data loaders
- JSZip data loader
- npm data loader
- Markdown
- Inline TeX `$…$`
6 changes: 6 additions & 0 deletions examples/loader-r-to-json/.gitignore
@@ -0,0 +1,6 @@
.DS_Store
/dist/
node_modules/
yarn-error.log
.RData
.Rhistory
9 changes: 9 additions & 0 deletions examples/loader-r-to-json/README.md
@@ -0,0 +1,9 @@
[Framework examples →](../)

# R data loader to generate JSON

View live: <https://observablehq.observablehq.cloud/framework-example-loader-r-to-json/>

This Observable Framework example demonstrates how to write a data loader in R that accesses text from Tolstoy’s _War and Peace_ from Project Gutenberg, does some basic text mining, then generates JSON with top word counts by book and chapter.

The data loader lives in [`src/data/tolstoy.json.R`](./src/data/tolstoy.json.R).
3 changes: 3 additions & 0 deletions examples/loader-r-to-json/observablehq.config.js
@@ -0,0 +1,3 @@
export default {
  root: "src"
};
20 changes: 20 additions & 0 deletions examples/loader-r-to-json/package.json
@@ -0,0 +1,20 @@
{
  "type": "module",
  "private": true,
  "scripts": {
    "clean": "rimraf src/.observablehq/cache",
    "build": "rimraf dist && observable build",
    "dev": "observable preview",
    "deploy": "observable deploy",
    "observable": "observable"
  },
  "dependencies": {
    "@observablehq/framework": "^1.7.0"
  },
  "devDependencies": {
    "rimraf": "^5.0.5"
  },
  "engines": {
    "node": ">=18"
  }
}
1 change: 1 addition & 0 deletions examples/loader-r-to-json/src/.gitignore
@@ -0,0 +1 @@
/.observablehq/cache/
37 changes: 37 additions & 0 deletions examples/loader-r-to-json/src/data/tolstoy.json.R
@@ -0,0 +1,37 @@
# Attach libraries (must be installed)
library(tidytext)
library(readr)
library(dplyr)
library(stringr)
library(jsonlite)

# Access and wrangle data
tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |>
  rename(text = 1)
booktext <- tolstoy[-(1:400), ]
booktext <- booktext[-(51477:51770), ]

tidy_tolstoy <- booktext |>
  mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |>
  mutate(book = case_when(
    book < 16 ~ paste("Book", book),
    book == 16 ~ "Epilogue 1",
    book == 17 ~ "Epilogue 2"
  )) |>
  group_by(book) |>
  mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |>
  ungroup() |>
  filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |>
  filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |>
  unnest_tokens(word, text) |>
  anti_join(stop_words)

# Find top 10 words (by count) for each chapter
tolstoy_word_counts <- tidy_tolstoy |>
  group_by(book, chapter) |>
  count(word) |>
  top_n(10, n) |>
  arrange(desc(n), .by_group = TRUE)

# Create JSON and write to standard output
cat(toJSON(tolstoy_word_counts, pretty = TRUE))
78 changes: 78 additions & 0 deletions examples/loader-r-to-json/src/index.md
@@ -0,0 +1,78 @@
# R data loader to generate JSON

Here’s an R data loader that accesses Tolstoy’s _War and Peace_ from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600), finds the most common words by book and chapter, then outputs JSON.

```r
# Attach libraries (must be installed)
library(tidytext)
library(readr)
library(dplyr)
library(stringr)
library(jsonlite)

# Access and wrangle data
tolstoy <- read_csv("https://www.gutenberg.org/cache/epub/2600/pg2600.txt") |>
  rename(text = 1)
booktext <- tolstoy[-(1:400), ]
booktext <- booktext[-(51477:51770), ]

tidy_tolstoy <- booktext |>
  mutate(book = cumsum(str_detect(text, "BOOK | EPILOGUE"))) |>
  mutate(book = case_when(
    book < 16 ~ paste("Book", book),
    book == 16 ~ "Epilogue 1",
    book == 17 ~ "Epilogue 2"
  )) |>
  group_by(book) |>
  mutate(chapter = cumsum(str_detect(text, regex("CHAPTER", ignore_case = FALSE)))) |>
  ungroup() |>
  filter(!str_detect(text, regex("BOOK", ignore_case = FALSE))) |>
  filter(!str_detect(text, regex("CHAPTER", ignore_case = FALSE))) |>
  unnest_tokens(word, text) |>
  anti_join(stop_words)

# Find top 10 words (by count) for each chapter
tolstoy_word_counts <- tidy_tolstoy |>
  group_by(book, chapter) |>
  count(word) |>
  top_n(10, n) |>
  arrange(desc(n), .by_group = TRUE)

# Create JSON and write to standard output
cat(toJSON(tolstoy_word_counts, pretty = TRUE))
```

<div class="note">

To run this data loader, you’ll need R installed, along with all required packages, _e.g._ by running `install.packages(c("tidytext", "readr", "dplyr", "stringr", "jsonlite"))`.

</div>

The above data loader lives in `data/tolstoy.json.R`, so we can load the data as `data/tolstoy.json` using `FileAttachment`.

```js echo
const text = FileAttachment("data/tolstoy.json").json();
```
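
The naming convention can be sketched in plain JavaScript: the loader’s final extension names the interpreter, and what remains is the path the page requests. The helper `attachmentNameFor` below is hypothetical and only illustrates the mapping; Framework resolves data loaders internally.

```js
// Hypothetical helper (not part of Framework): drop the final extension,
// which names the interpreter, to get the path passed to FileAttachment.
function attachmentNameFor(loaderPath) {
  const slash = loaderPath.lastIndexOf("/");
  const dot = loaderPath.lastIndexOf(".");
  if (dot <= slash + 1) return loaderPath; // no extension to strip
  return loaderPath.slice(0, dot);
}

attachmentNameFor("data/tolstoy.json.R"); // "data/tolstoy.json"
```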

We can display this dataset with `Inputs.table`:

```js echo
Inputs.table(text)
```
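
Since `jsonlite::toJSON` serializes a data frame as an array of row objects by default, each element of `text` should carry the four columns built in the loader: `book`, `chapter`, `word`, and `n`. A minimal sketch of a shape check follows; `isWordCountRow` and the sample values are hypothetical, for illustration only.

```js
// Hypothetical check that a row has the columns produced by the R loader.
function isWordCountRow(d) {
  return (
    typeof d.book === "string" &&
    Number.isInteger(d.chapter) &&
    typeof d.word === "string" &&
    Number.isInteger(d.n)
  );
}

// Made-up row in the expected shape (not taken from the real output).
isWordCountRow({book: "Book 1", chapter: 1, word: "prince", n: 12}); // true
```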

We can make a quick chart of top words in Book 1, with color mapped to book chapter, using [Observable Plot](https://observablehq.com/plot/):

```js echo
Plot.plot({
  marks: [
    Plot.barY(text, {
      filter: (d) => d.book === "Book 1",
      x: "word",
      y: "n",
      fill: "chapter",
      tip: true,
      sort: {x: "y", limit: 5, reverse: true}
    })
  ]
})
```
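
To cross-check the chart, the top words for one book can also be totaled directly from the rows. This is a minimal sketch with made-up sample rows; `topWords` is a hypothetical helper, not part of Plot or Framework, and it assumes that ranking by total count across chapters is the comparison of interest.

```js
// Made-up sample rows in the same shape as the loader's output.
const sampleRows = [
  {book: "Book 1", chapter: 1, word: "prince", n: 12},
  {book: "Book 1", chapter: 2, word: "prince", n: 7},
  {book: "Book 1", chapter: 1, word: "anna", n: 9},
  {book: "Book 2", chapter: 1, word: "rostov", n: 15}
];

// Hypothetical helper: total count per word within one book, largest first.
function topWords(rows, book, limit = 5) {
  const totals = new Map();
  for (const d of rows) {
    if (d.book !== book) continue; // keep only the requested book
    totals.set(d.word, (totals.get(d.word) ?? 0) + d.n);
  }
  return Array.from(totals, ([word, n]) => ({word, n}))
    .sort((a, b) => b.n - a.n)
    .slice(0, limit);
}

topWords(sampleRows, "Book 1");
```

With the real data, the same call would be `topWords(text, "Book 1")`.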
