RETL is an R package that provides tools for writing ETL (extract, transform, load)
jobs in R. It builds on R's wide range of APIs to various types of data sources.
It is intended to be used together with the
Rflow and
RETLflow packages as a universal API
to data stored in databases, files, and Excel sheets. RETL relies heavily on
the data.table
package for fast data transformations.
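For readers new to data.table, the sketch below shows an in-place transformation with its `:=` operator, which adds a column to a table by reference instead of copying it. It uses plain data.table only; the toy `orders` table and its values are made up for illustration.

```r
library(data.table)

# A made-up in-memory table standing in for an "orders" source
orders <- data.table(
  id = c(1L, 2L, 3L),
  order_date = as.IDate(c("2019-03-01", "2020-07-15", "2020-12-31"))
)

# := adds the new column by reference; `orders` is modified in place
orders[, order_year := year(order_date)]

print(orders)
```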
RETL can be installed from GitHub by running:
devtools::install_github("vh-d/RETL")
library(RETL)
library(magrittr)
# establish connections
my_db <- DBI::dbConnect(RSQLite::SQLite(), "path/to/my.db")
your_csv <- "path/to/your.csv"
your_db <- DBI::dbConnect(RMariaDB::MariaDB(), group = "your-db") # settings from the [your-db] group of a MariaDB option file
# simple extract and load
etl_read(from = your_csv) %>% etl_write(to = my_db, name = "customers")
# extract -> transform -> load
etl_read(from = my_db, name = "orders") %>%   # db query: EXTRACT from a database
  dtq(, order_year := year(order_date)) %>%   # data.table query: TRANSFORM (adding a new column)
  etl_write(to = your_db, name = "customers") # LOAD into another database
# index the loaded table on (id, order_year)
set_index(table = "customers", c("id", "order_year"), your_db)
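To make the three steps concrete, here is a rough approximation of the extract -> transform -> load pipeline written with plain DBI, RSQLite and data.table calls. This is not RETL's implementation: how `etl_read()`, `etl_write()` and `set_index()` map onto these calls is an assumption, and the in-memory databases, the seeded data and the index name `idx_customers` are hypothetical.

```r
library(DBI)
library(data.table)

# Two throwaway in-memory SQLite databases standing in for my_db and your_db
src <- dbConnect(RSQLite::SQLite(), ":memory:")
dst <- dbConnect(RSQLite::SQLite(), ":memory:")

# Seed the source with a made-up orders table (dates stored as ISO strings)
dbWriteTable(src, "orders", data.frame(
  id = 1:3,
  order_date = c("2019-03-01", "2020-07-15", "2020-12-31")
))

# EXTRACT: read the whole table into R as a data.table
orders <- as.data.table(dbReadTable(src, "orders"))

# TRANSFORM: add a column by reference
orders[, order_year := year(as.Date(order_date))]

# LOAD: write the result, then create a composite index on it
dbWriteTable(dst, "customers", orders, overwrite = TRUE)
dbExecute(dst, "CREATE INDEX idx_customers ON customers (id, order_year)")

# Inspect what was loaded before closing the connections
result  <- dbReadTable(dst, "customers")
indexes <- dbGetQuery(dst, "SELECT name FROM sqlite_master WHERE type = 'index'")$name
print(result)

dbDisconnect(src)
dbDisconnect(dst)
```

Wrapping these calls behind a single pipeable interface, regardless of whether the source is a database, a CSV file or a spreadsheet, is the convenience RETL aims to provide.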