sparklyr webinar

The sparklyr webinar describes how to use R and Apache Spark with the new sparklyr package from RStudio. You can access the presentation materials [here](sparklyr webinar 2016.pdf). There is also a nice set of examples on the main site.

R markdown notebooks

Three scripts are referenced in the webinar. If you install the sparklyr package, the first two can be reproduced in local mode. The third script only runs on a Spark cluster with preprocessed data loaded into Hive, so it is included here for instructional purposes only.

  1. Initialize a Spark connection and load data into it
  2. Run sparklyr using dplyr in local mode
  3. Analyze 1 billion records of NYC taxi data in a Spark cluster
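
For orientation, the first two steps can be sketched roughly as follows. This is a minimal illustration rather than the webinar scripts themselves: it assumes a local Spark installation is available (for example via `sparklyr::spark_install()`) and uses the built-in `mtcars` data set as a stand-in for the webinar data.

```r
library(sparklyr)
library(dplyr)

# Step 1: open a connection to Spark in local mode.
# Assumption: Spark is already installed locally (see sparklyr::spark_install()).
sc <- spark_connect(master = "local")

# Load data into Spark. mtcars is only a stand-in for the webinar data.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Step 2: dplyr verbs are translated to Spark SQL and executed inside Spark;
# collect() brings the (small) result back into R as a local data frame.
avg_mpg <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(avg_mpg)

# Close the connection when finished.
spark_disconnect(sc)
```

The aggregation runs in Spark and only the summarized rows are returned to R, which is the same pattern the cluster-scale script relies on.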

For a complete set of scripts, see the sparkdemos GitHub repository.

Questions and answers

We had a lot of great questions during the webinar and were not able to answer all of them. I have gone through and tried to answer each question below. If you have more questions, please post them to this Google group. If you have issues with the software, please submit them to the GitHub repos.