About the course

Where do I begin? Data Engineering is such a huge field - where do you even start learning about Data Engineering?

These are career-defining questions often asked by aspiring data engineers. There are a million resources out there to refer to, but the learning journey can be quite exhausting if you don’t know where to start.

Don’t worry, we are here to help you take your first steps into the world of data engineering! Here’s the learning path for anyone who wants to become a data engineer in 2022. We have compiled the best resources and arranged them in a structured manner so that you have a single, unified resource for becoming a successful data engineer.

Moreover, we have added the most in-demand skills for data engineers in 2022, including storytelling, model deployment, and much more, along with exercises and assignments.

Key takeaways of this course

The course is ideal for beginners in the field of Data Engineering. Several features make it exciting:

  • Beginner-friendly course: The course assumes no prerequisites and is meant for beginners.

  • Curated list of resources to follow: All the necessary topics are covered in the course in an orderly manner, with links to relevant resources.

  • Updated skillset for 2022: Knowledge of Data Engineering is important, but that alone won’t set you apart. We have included some of the top skills you’ll need to become a data engineer in 2022.

Prerequisites

This is a beginner-friendly course and has no prerequisites.

Course curriculum

  • 1
    Overview of Learning Path 2022
    • Overview of Learning Path
    • Month-on-Month Plan
    • AI&ML Blackbelt Plus Program (Sponsored)
  • 2
    January 2022: Learn Programming
    • Overview of the Course
    • A brief introduction to Python
    • Introduction to Python Test
    • Installing Python
    • Become a BlackBelt in Data Science
    • Theory of Operators
    • Exercise
    • Understanding Operators in Python
    • Operators Test
    • Understanding variables and data types
    • Variable Test
    • Variables and Data Types in Python
    • Exercise
    • Understanding Conditional Statements
    • Exercise
    • Implementing Conditional Statements in Python
    • Conditional Statements test
    • Understanding Looping Constructs
    • Exercise
    • Implementing Looping Constructs in Python
    • Looping Constructs test
    • Understanding Functions
    • Implementing Functions in Python
    • Functions test
    • A brief introduction to data structure
    • Data Structure test
    • Understanding the concept of Lists
    • Lists test
    • Implementing Lists in Python
    • Exercise
    • Understanding the concept of Dictionaries
    • Exercise
    • Implementing Dictionaries in Python
    • Dictionaries test
    • Understanding the concept of Standard Libraries
    • Libraries test
    • Reading a CSV File in Python - Introduction to Pandas
    • Reading a CSV file in Python: Implementation
    • Reading a CSV file in Python test
    • Understanding dataframes and basic operations
    • DataFrames and basic operations test
    • Reading dataframes and conducting basic operations in Python
    • Reading dataframes and conducting basic operations in Python Test
    • Indexing a Dataframe
    • Indexing DataFrames test
    • Exercise
    • Sorting Dataframes
    • Merging Dataframes
    • Quiz: Sorting and Merging dataframes
    • Apply function
    • Aggregating data
    • Quiz: Apply function and Aggregating data
    • Basics of Matplotlib
    • Data Visualization using Matplotlib
    • Quiz: Matplotlib
    • Basics of Seaborn
    • Data Visualization using Seaborn
    • Quiz: Seaborn
    • Regular Expressions
    • Understanding Regular Expressions
    • Quiz: Regular Expressions
    • Regular Expressions in Python
    • Quiz: Regular Expressions in Python
    • Cheatsheet for Python
    • Instructions
    • Quiz
    • Python Coding Challenge
    • Test your Skills: Python
    • Poll
    • Where to go from here?
  • 3
    February 2022: Learn Relational Databases
    • Plan for February 2022
    • 1.1 Introduction
    • 1.2 Why do we need databases?
    • 1.3 What is a database?
    • 1.4 Some properties of a Good Database
    • 1.5 Types of Databases
    • 1.6 How data is Stored in Relational Databases
    • 1.7 How data is stored in NoSQL databases
    • 1.8 Companies using MySQL
    • Exercise 1
    • Course Handouts
    • 2.1 Introduction
    • 2.2 Architecture: Client and Server
    • 2.3 MySQL Distributions
    • 2.4 Local Installation on Mac
    • 2.5 Local Installation on Linux
    • 2.6 Local Installation on Windows
    • 2.7 Licensing
    • 2.8 Accessing a remote MySQL server
    • 2.9 Graphical user interfaces
    • Exercise 2
    • SQL - Installation Guide
    • 3.1 Introduction
    • 3.2 What exactly is SQL?
    • 3.3 History of SQL
    • 3.4 Connecting to MySQL
    • 3.5 Types of Commands - DDL (Creation/ Deletion/ Updating of Schema)
    • 3.6 Types of Commands - DML (Manipulating data in tables)
    • 3.7 Types of Commands - DCL (Managing Access control)
    • 3.8 Exploring databases
    • 3.9 Creating tables
    • 3.10 Inserting data in tables
    • 3.11 SELECT Statement - Introduction
    • 3.12 Datatypes in MySQL
    • 3.13 NULL vs NOT NULL
    • Exercise 3
    • 4.1 Introduction
    • 4.2 Update command – Concept
    • 4.3 Update command – Example
    • 4.4 Delete command – Concept
    • 4.5 Delete command – Example
    • 4.6 Describe command – Concept
    • 4.7 Describe command – Example
    • 4.8 Alter command – Concept and Example
    • Exercise 4
    • 5.1 Introduction
    • 5.2 Importing data from CSV to MySQL
    • 5.3 Exporting data from MySQL to CSV
    • 5.4 Backing up databases
    • 5.5 Restoring databases
    • Exercise 5
    • Importing and Exporting Datasets - Troubleshooting Guide
    • 6.1 Introduction
    • 6.2 Counting Rows and Items
    • 6.3 Aggregation Functions – SUM, AVG, STDDEV
    • 6.4 Extreme Values Identification – MIN, MAX
    • 6.5 Slicing data
    • 6.6 Limiting data
    • 6.7 Sorting data
    • 6.8 Filtering Patterns
    • 6.9 Groupings, Rolling up data and Filtering in Groups
    • Exercise 6
    • 7.1 Introduction
    • 7.2 Data Eyeballing
    • 7.3 Data Dictionary
    • 7.4 Questions we need answers to
    • 7.5 Analyzing data and creating table structure
    • 7.6 Loading data to our MySQL table
    • 7.7 Data Analysis – Simple Queries
    • 7.8 Data Analysis – Advanced Queries
    • FIFA19 Players dataset (cleaned) for this Project
    • 8.1 Introduction
    • 8.2 The need for joins
    • 8.3 Different types of joins
    • 8.4 The Left Join – Concept
    • 8.5 The Left Join – Practical Example
    • 8.6 The Inner Join
    • 8.7 The Cross Join
    • 8.8 The Right Join
    • 8.9 The Self Join
    • Assignment: Share your learning and build your profile
    • Exercise
    • 9.1 Introduction
    • 9.2 Introduction to Indexing
    • 9.3 How indexing works (basics)
    • 9.4 Relationships
    • 9.5 Types of Relationships
    • 9.6 Table Constraints – PRIMARY KEY, FOREIGN KEY, UNIQUENESS and AUTO INCREMENT
    • Exercise
    • 10.1 String functions - CONCAT
    • 10.2 String functions – Case Conversion
    • 10.3 String functions – Trimming Strings
    • 10.4 String functions – Extracting Substrings
    • 10.5 Date/ Time functions – Current date and time
    • 10.6 Date/ Time functions – Extracting date and time from field
    • 10.7 Date/ Time functions – Formatting date and time as Strings
    • 10.8 Numeric functions
    • SQL CheatSheet
    • Exercise
    • 11.1 Introduction
    • 11.2 Setting up a virtual environment
    • 11.3 Installing the required packages
    • 11.4 Connecting to MySQL
    • 11.5 Connecting to database table and pulling data
    • 11.6 Querying the database- INSERT
    • 11.7 Querying the database- DELETE
    • 11.8 Querying the database- SEARCH
    • 11.9 Querying the database- INDEXING
    • 11.10 Notes and Resources
    • Exercise
  • 4
    March 2022: Fundamentals of Linux and Cloud Computing
    • Basic Linux Commands
    • Introduction to Cloud Computing
    • Cloud Deployment Models
    • Service Models
    • Resources: Learn about AWS
  • 5
    April 2022: NoSQL Databases
    • Creating Databases and Collections
    • Inserting Documents
    • Reading Documents
    • The _id Field
    • Importing and Exporting Data
    • Backup and Restore MongoDB Databases
    • Updating Documents
    • Deleting Documents, Collections and Databases
    • CRUD Operations in MongoDB Atlas
    • Importing, Exporting and Working with MongoDB Atlas
  • 6
    May 2022: Hadoop Ecosystem
    • What is Big Data?
    • Challenges with Big Data
    • Applications of Big Data
    • Distributed Systems
  • 7
    June 2022: Data Warehousing
    • What is Hive?
    • Features of Hive
    • Working of Hive

Instructor

  • Analytics Vidhya

    Analytics Vidhya provides a community-based knowledge portal for Analytics and Data Science professionals. The aim of the platform is to become a complete portal serving all the knowledge and career needs of Data Science professionals.