Regardless of your job title, it is likely that the amount of data with which you are working is growing quickly. Your original solutions may need to be scaled, and your old techniques for solving new problems may need to be updated.
We hope this course will help you leverage Apache Spark to tackle new problems easily and old problems efficiently.
In this course, we will learn about Big Data, its applications and challenges, and how Spark helps us deal with it. We will cover the architecture of Spark, how it works internally using RDDs, and optimization techniques. We will also learn how to use Spark APIs such as Spark SQL and Spark ML from Python.
The market for Big Data analytics is growing rapidly across the world, and this growth in demand is a great opportunity for IT professionals. Here are a few professional groups that benefit from moving into the Big Data domain:
Developers and Architects
BI/ETL/DW Professionals
Senior IT Professionals
Big Data Enthusiasts
Software Architects, Engineers, and Developers
Data Scientists and Analytics Professionals
- Good to have knowledge of any SQL platform like MySQL, PostgreSQL, Oracle, etc.
- Good to have knowledge of any programming language like Python, Java, Scala, etc.
- You should be familiar with Object-Oriented Programming concepts like classes, objects, inheritance, etc.
- You should be familiar with the concepts of lambda functions and higher-order functions.
- Good to have knowledge of machine learning concepts.
- Good to have knowledge of any cloud technology like AWS, Azure, etc.
Understanding Big Data challenges and applications
Understanding the architecture of Apache Spark
Familiarity with Spark’s basic abstractions like RDDs and DataFrames
Familiarity with Spark APIs like Spark SQL, Spark ML
Exploratory Data Analysis of any data set using PySpark
Building Machine Learning pipelines in PySpark
- Course Overview
- Instructor Introduction
- Course Handouts
- What is Big Data?
- Challenges with Big Data
- Applications of Big Data
- Quiz: Big Data
- Distributed Systems
- Quiz: Distributed Systems
- Introduction to Apache Hadoop
- Components of Apache Hadoop
- Hadoop Ecosystem
- Quiz: Introduction to Hadoop
- What is Spark?
- Spark Ecosystem
- Quiz: Introduction to Apache Spark
- Spark Architecture
- Quiz: Spark Architecture
- Spark Cluster Managers
- Running Spark Applications on YARN
- Spark Context and Spark Session
- Quiz: Spark Cluster Managers
- Itversity Credentials
- Introduction to Itversity
- Uploading data to Itversity
- HDFS common commands
- What Are RDDs?
- How to create RDDs?
- Implementation: How to create RDDs?
- RDD Operations
- Implementation: RDD Operations (Part 1)
- Implementation: RDD Operations (Part 2)
- Quiz: RDD
- Pair RDDs
- Pair RDD Operations
- Implementation: Pair RDD Operations
- Implementation: GroupByKey vs ReduceByKey
- Quiz: Pair RDD
- Caching and Persistence in Spark
- Implementation: Persistence
- Storage Levels in Spark
- Implementation: Storage Levels
- Quiz: Caching & Persistence
- Assignment: RDD Operations
- What are Spark DataFrames?
- Implementation: Creating Spark DataFrames
- Implementation: Basic Operations on DataFrames
- Implementation: Creating Columns in DataFrames
- Implementation: Manipulating Records in DataFrames
- RDDs vs DataFrames: When to Use Which?
- Quiz: DataFrames in Spark
- Assignment: Spark DataFrames
- Jobs, Stages and Tasks
- Implementation: Jobs, Stages and Tasks
- Implementation: Lineage
- Implementation: DAG
- Quiz: Spark Execution
- Shared Variables
- Implementation: Shared Variables
- Coalesce vs Repartition
- Implementation: Coalesce vs Repartition
- Quiz: Advanced Programming in Spark
- What is Spark SQL?
- Catalyst Optimizer
- Spark SQL Queries
- Implementation: Spark SQL Queries
- Why do we need Spark SQL?
- Quiz: Spark SQL
- Assignment: Spark SQL
- Scope of ML in this Course
- Introduction to Machine Learning
- Types of Machine Learning Problems
- Machine Learning in Spark
- Life Cycle of an ML Project
- Quiz: Machine Learning in Spark
- Understanding the Problem Statement
- Implementation: Introduction to the Data
- Implementation: Univariate Analysis
- Implementation: Bivariate Analysis
- Quiz: Analysis using Spark
- Encoding Categorical Variables
- Implementation: Preprocessing Data
- Quiz: Preprocessing Data Using Spark ML
- Vector Assembler
- Implementation: Model Building
- Quiz: Model Building
- Model Improvement
- Implementation: Fine-Tuning ML Models
- Quiz: Fine-Tuning ML Models
- Understanding ML Pipelines in Spark
- Implementation: Sample Pipelines in Spark
- Implementation: ML Pipeline for Click Prediction
- Quiz: ML Pipelines
- Assignment: Spark ML