From the course: Apache Spark Essential Training: Big Data Engineering

Data engineering with Spark

- [Instructor] Apache Spark is arguably the best processing technology available for data engineering today. It has been constantly evolving over the last few years, adding new capabilities and improving in reliability. Spark can be used to implement both batch and real-time use cases, and it supports a number of capabilities to help in this regard. It has native support for several popular data sources, such as RDBMS, Kafka, and HDFS. It also has advanced parallel-processing capabilities to process large quantities of data in real time. Capabilities like MapReduce, windowing, state management, and joins enable powerful use cases. Finally, it also has support for graph processing and machine learning, so these use cases can also be integrated into big data pipelines. We will explore these capabilities in detail in the course. Let's dive right into understanding what these capabilities are and use a couple of…
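
To make two of the capabilities named in the transcript more concrete, here is a minimal sketch of the map, group, and reduce pattern combined with tumbling-window aggregation. It is written in plain Python with the standard library only and does not use the Spark API; the event data, the 10-second window size, and the names `events`, `window_start`, and `windowed_totals` are all assumptions made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical events: (timestamp_seconds, user, amount). Invented for illustration.
events = [
    (1, "alice", 10),
    (4, "bob", 5),
    (7, "alice", 20),
    (12, "bob", 15),
    (14, "alice", 5),
]

WINDOW_SECONDS = 10  # assumed tumbling-window size


def window_start(timestamp):
    """Return the start of the tumbling window that contains the timestamp."""
    return (timestamp // WINDOW_SECONDS) * WINDOW_SECONDS


def windowed_totals(records):
    # Map step: key each record by (window start, user).
    mapped = [((window_start(ts), user), amount) for ts, user, amount in records]

    # Group step: collect the values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce step: sum the values collected for each key.
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in grouped.items()
    }


totals = windowed_totals(events)
for key in sorted(totals):
    print(key, totals[key])
```

In a real engine such as Spark, the group step would be distributed across a cluster and the windows would be computed over a stream of incoming records, but the shape of the computation is the same: key each record, group by key, then combine the values.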
