From the course: Apache Spark Essential Training: Big Data Engineering

Data engineering with Spark

- [Instructor] Apache Spark is arguably the best processing technology available for data engineering today. It has been constantly evolving over the last few years, adding new capabilities and improving in reliability. Spark can be used to implement both batch and real-time use cases, and it supports a number of capabilities to help in this regard. It has native support for several popular data sources like RDBMS, Kafka, and HDFS. It also has advanced parallel-processing capabilities to process large quantities of data in real time. Capabilities like MapReduce, windowing, state management, and joins enable powerful use cases. Finally, it also has support for graph processing and machine learning, so these use cases can also be integrated into big data pipelines. We will explore these capabilities in detail in the course. Let's dive right into understanding what these capabilities are and use a couple of…
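
To make two of the capabilities named above more concrete, here is a minimal sketch of MapReduce combined with windowing. It is written in plain Python and deliberately does not use the Spark API. The event data, the 60-second window size, and the helper names (`map_event`, `reduce_by_key`, `windowed_totals`) are all assumptions made for illustration; Spark would run the same pattern in parallel across a cluster.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical window size: events are grouped into 60-second tumbling windows.
WINDOW_SECONDS = 60

# Hypothetical input: (timestamp_in_seconds, key, value) tuples.
events = [
    (5, "sensor-a", 2.0),
    (42, "sensor-b", 1.5),
    (59, "sensor-a", 3.0),
    (61, "sensor-a", 4.0),
    (130, "sensor-b", 0.5),
]


def map_event(event):
    """Map step: emit ((window_start, key), value) for one event."""
    timestamp, key, value = event
    window_start = (timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
    return (window_start, key), value


def reduce_by_key(pairs, combine):
    """Reduce step: group values by key, then fold each group with `combine`."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(combine, values) for key, values in grouped.items()}


def windowed_totals(events):
    """Sum values per (window, key) using the map and reduce steps above."""
    return reduce_by_key(map(map_event, events), lambda a, b: a + b)


totals = windowed_totals(events)
for (window_start, key), total in sorted(totals.items()):
    print(f"window starting at {window_start}s, {key}: {total}")
```

Each event is first mapped to a composite key of window start and source, and the values sharing a key are then reduced to a single total. State management and joins, also mentioned above, build on the same idea of grouping records by key.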
