From the course: Apache Spark Essential Training: Big Data Engineering
Data engineering with Spark - Apache Spark Tutorial
Data engineering with Spark
- [Instructor] Apache Spark is arguably the best processing technology available for data engineering today. It has evolved steadily over the last few years, adding new capabilities and improving in reliability. Spark can be used to implement both batch and real-time use cases, and it supports a number of capabilities that help in this regard. It has native support for several popular data sources, such as RDBMS, Kafka, and HDFS. It also has advanced parallel-processing capabilities to process large quantities of data in real time. Capabilities like MapReduce, windowing, state management, and joins enable powerful use cases. Finally, it also supports graph processing and machine learning, so these use cases can be integrated into the same big data pipelines. We will explore these capabilities in detail in the course. Let's dive right into understanding what these capabilities are and use a couple of…