30-sparklyr-rmarkdown

sparklyr webinar

The sparklyr webinar describes how to use R with Apache Spark through the new sparklyr package from RStudio. You can access the presentation materials [here](sparklyr webinar 2016.pdf). More examples are available on the main sparklyr site.

R markdown notebooks

Three scripts are referenced in the webinar. If you install the sparklyr package, the first two can be reproduced in local mode. The third script runs only on a Spark cluster with preprocessed data loaded into Hive, so it is included for instructional purposes only.

Initialize a Spark connection and load data into it (01_initialize.Rmd)
Run sparklyr using dplyr in local mode (02_dplyr.Rmd)
Analyze 1 billion records of the NYC taxi data in a Spark cluster (03_taxiDemo.Rmd)
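
As a rough illustration of what the first two scripts cover, the sketch below opens a local Spark connection, copies a small data frame into Spark, and summarizes it with dplyr. It is not taken from the webinar scripts: the built-in `mtcars` data set, the table name `mtcars_spark`, and the variable names are assumptions chosen for illustration, and it assumes a local Spark installation is already available (for example via `spark_install()`).

```r
library(sparklyr)
library(dplyr)

# One-time setup if no local Spark installation exists yet:
# spark_install()

# Open a connection to a Spark instance running in local mode.
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark.
# "mtcars_spark" is a hypothetical table name used only in this sketch.
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# dplyr verbs on a Spark table are translated to Spark SQL and run in Spark.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()  # bring the small summary back into the R session

print(mpg_by_cyl)

# Close the connection when finished.
spark_disconnect(sc)
```

Only the final `collect()` moves data back into R, so the grouping and aggregation happen inside Spark. The same pattern should carry over to a cluster by changing the `master` argument, although the third script also depends on data already loaded into Hive.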

For a complete set of scripts, see the sparkdemos GitHub repository.

Questions and answers

We received a lot of great questions during the webinar and were not able to answer all of them. I have gone through and tried to answer each question below. If you have more questions, please submit them to this Google group. If you have issues with the software, please report them in the GitHub repos.

Questions and answers from the webinar

Files

titanic
01_initialize.Rmd
02_dplyr.Rmd
03_taxiDemo.Rmd
QA.Rmd
README.md
sparklyr webinar 2016.pdf
sparklyr.png
