duckplyr_demo

Companion repository for the duckplyr R package.

Getting Started

First, install the required packages, then download and unzip the Parquet data.

## Install package and dependencies
# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
pak::pak(c("duckdblabs/duckplyr", "curl", "zip", "tidyverse"))

## Download and unzip data (1.7 GB)
curl::curl_download("http://duckplyr-demo-taxi-data.s3-website-eu-west-1.amazonaws.com/taxi-data-2019-partitioned.zip", "taxi-data-2019-partitioned.zip", quiet = FALSE)
zip::unzip("taxi-data-2019-partitioned.zip")

Running the queries/scripts

To run all duckplyr queries at once, run:

Rscript duckplyr/run_all_queries.R

To run all dplyr queries at once, run:

Rscript dplyr/run_all_queries.R

To run a single duckplyr query, run its script (substitute the script's file name for the pattern):

Rscript duckplyr/q0*_**.R

To run a single dplyr query, run its script (substitute the script's file name for the pattern):

Rscript dplyr/q0*_**.R
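To find the concrete file name to substitute for the pattern, the scripts can be listed from R. This is a minimal sketch that assumes it is run from the repository root and that the per-query files start with `q0`, as the commands above suggest; the variable name `query_scripts` is only illustrative.

```r
# List the per-query scripts in the duckplyr directory.
# The directory name and the "q0" prefix are taken from the commands above.
query_scripts <- list.files("duckplyr", pattern = "^q0.*\\.R$", full.names = TRUE)
print(query_scripts)

# Run the first script that was found, if any.
if (length(query_scripts) > 0) {
  system2("Rscript", query_scripts[1])
}
```

Replacing `"duckplyr"` with `"dplyr"` lists the dplyr versions of the same queries.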

What do the queries show/highlight?

Highlights duckplyr's handling of many small groups.
- Gets median tips by day and hour.
- Produces 168 small groups (7 days × 24 hours).
- Uses perfect hash grouping.
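The shape of this query can be sketched with plain dplyr on synthetic data. The column names `pickup_datetime` and `tip_amount` are assumptions made for illustration and are not taken from the repository's scripts; the point is only to show where the 168 groups come from.

```r
library(dplyr)

# Synthetic stand-in for the taxi data: one trip per hour of 2019 (UTC).
# Column names are hypothetical.
trips <- tibble(
  pickup_datetime = seq(
    as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
    by = "hour", length.out = 365 * 24
  ),
  tip_amount = rep(c(0, 1.5, 2, 3.25), length.out = 365 * 24)
)

median_tips <- trips |>
  mutate(
    dow  = as.POSIXlt(pickup_datetime)$wday,  # 0 = Sunday ... 6 = Saturday
    hour = as.POSIXlt(pickup_datetime)$hour   # 0 ... 23
  ) |>
  group_by(dow, hour) |>
  summarise(median_tip = median(tip_amount), .groups = "drop")

nrow(median_tips)  # 7 days x 24 hours = 168 groups
```

Since duckplyr is designed as a drop-in replacement for dplyr, the same verbs are what the duckplyr version of the query would be expected to use.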
Highlights duckplyr projection pushdown.
- Gets the median tip by the number of passengers.
- The explain output shows that only total_amount, passenger_count, tip_amount, and month are read from the Parquet file.
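Projection pushdown can be observed on a small scale by asking DuckDB for a query plan. The sketch below uses DuckDB's SQL interface through the DBI and duckdb packages rather than duckplyr itself, and it writes a tiny made-up Parquet file instead of the taxi data; the column `unused_column` is hypothetical and exists only so that there is something to leave out of the scan.

```r
library(DBI)

con <- dbConnect(duckdb::duckdb())

# Write a small Parquet file with three columns (all values are synthetic).
path <- normalizePath(tempfile(fileext = ".parquet"),
                      winslash = "/", mustWork = FALSE)
dbExecute(con, paste0(
  "COPY (SELECT range % 6 AS passenger_count, range * 0.1 AS tip_amount, ",
  "range AS unused_column FROM range(1000)) TO '", path, "' (FORMAT PARQUET)"
))

# Ask for the plan of a query that touches only two of the three columns.
plan <- dbGetQuery(con, paste0(
  "EXPLAIN SELECT passenger_count, median(tip_amount) AS median_tip ",
  "FROM read_parquet('", path, "') GROUP BY passenger_count"
))
print(plan)

dbDisconnect(con)
```

If pushdown applies, the Parquet scan in the printed plan should list only the columns the query references, which is the behavior the bullet above describes for the taxi data.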
Highlights duckplyr filter pushdown.
- Gets popular (pickup, drop-off) combinations in Manhattan.
- DuckDB can push the filter (Borough = “Manhattan”) all the way into the Parquet scan of the dimension table.
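The logic of this query can be sketched in plain dplyr as a filtered dimension table joined to the trips twice, once for pickup and once for drop-off. The tables and the column names `LocationID`, `pickup_location_id`, and `dropoff_location_id` are assumptions for illustration, not the repository's actual schema.

```r
library(dplyr)

# Hypothetical zone dimension table and trip fact table.
zones <- tibble(
  LocationID = c(1L, 2L, 3L, 4L),
  Borough    = c("Manhattan", "Manhattan", "Queens", "Brooklyn")
)
trips <- tibble(
  pickup_location_id  = c(1L, 1L, 1L, 2L, 3L, 1L),
  dropoff_location_id = c(2L, 2L, 2L, 1L, 1L, 4L)
)

# The filter applies to the dimension table only.
manhattan <- zones |>
  filter(Borough == "Manhattan") |>
  select(LocationID)

# Keep trips that both start and end in Manhattan, then count each pair.
popular <- trips |>
  inner_join(manhattan, by = c("pickup_location_id" = "LocationID")) |>
  inner_join(manhattan, by = c("dropoff_location_id" = "LocationID")) |>
  count(pickup_location_id, dropoff_location_id, name = "trips", sort = TRUE)

print(popular)
```

In plain dplyr the filter runs on a table that is already in memory; the point made above is that DuckDB can apply the same filter while scanning the Parquet file.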
Highlights duckplyr lazy evaluation.
- Gets the percentage of trips that report no tip, grouped by (pickup borough, drop-off borough) and ranked by number of trips.
- Needs to join two intermediate results.
- duckplyr evaluates the pipeline lazily.
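The two intermediate results and their join can be sketched in plain dplyr as follows. The data and the column names `pickup_borough`, `dropoff_borough`, and `tip_amount` are made up for illustration.

```r
library(dplyr)

# Hypothetical trips, already labelled with boroughs.
trips <- tibble(
  pickup_borough  = c("Manhattan", "Manhattan", "Manhattan", "Queens", "Queens"),
  dropoff_borough = c("Manhattan", "Manhattan", "Manhattan", "Manhattan", "Manhattan"),
  tip_amount      = c(0, 2.5, 0, 0, 1)
)

# Intermediate result 1: all trips per borough pair.
trips_per_pair <- trips |>
  count(pickup_borough, dropoff_borough, name = "num_trips")

# Intermediate result 2: trips without a tip per borough pair.
no_tip_per_pair <- trips |>
  filter(tip_amount == 0) |>
  count(pickup_borough, dropoff_borough, name = "num_no_tip")

# Join the two and compute the percentage, ranked by number of trips.
result <- trips_per_pair |>
  left_join(no_tip_per_pair, by = c("pickup_borough", "dropoff_borough")) |>
  mutate(
    num_no_tip = coalesce(num_no_tip, 0L),
    pct_no_tip = 100 * num_no_tip / num_trips
  ) |>
  arrange(desc(num_trips))

print(result)
```

Plain dplyr computes each intermediate table as soon as its line runs. With lazy evaluation, as described above for duckplyr, the work can be deferred until the final result is actually needed.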
Highlights that duckplyr can easily read Hive-partitioned data over the network (dplyr cannot do this).
- Uses Hive partition filters.
- The month filter does not appear in the explain output (yet).
