About the Introduction to Data Science Course
Getting Started with Data Science
What is Data Science? Why has it become so popular recently? What are some of the popular data science applications? And more importantly, how can you get started with learning data science from scratch?
Are you looking for answers to these questions? Frustrated by the lack of a structured way to learn data science? You’ve come to the right place!
Data science is ubiquitous right now. Organizations are investing heavily to integrate data science solutions into their daily processes. It’s a great time to learn data science and get ready for your first industry role!
This course, curated by experienced data science instructors and experts at Analytics Vidhya, will cover the core concepts you need to know to crack data science interviews and become a data scientist!
Why pursue Data Science:
 Data science is ubiquitous! It is the hottest field in the industry right now.
 Data scientists are among the most in-demand professionals.
 There are so many data science algorithms to build predictive models, such as linear regression, logistic regression, decision trees and random forests. Keep learning, keep growing!
 The potential of data science is limitless, spanning industries, roles and functions.
What will you learn in the ‘Introduction to Data Science’ course?
 387 Lessons
 4 Real-Life Projects from the Data Science Industry
 60+ Hours of Comprehensive Content
 Live Q&A Session

Understand What Data Science Is

Applications of Data Science

Data Science Terminologies

Python for Data Science

Core Statistics for Data Science

Probability Concepts

Introduction to Machine Learning Algorithms

Hands-on examples and multiple real-world, industry-relevant data science projects
Course curriculum

1
Introduction to Data Science
 Getting Started
 Knowing Each Other
 Data Science Overview FREE PREVIEW
 Exercise 1
 Terminologies in Data Science FREE PREVIEW
 Exercise 2
 Applications of Data Science
 Exercise 3
 Instructor's Introduction
 Overview of the course

2
Setting Up the Systems
 Installation steps for Windows
 Installation steps for Linux
 Installation steps for Mac

3
Introduction to Python
 Introduction to Python
 Introduction to Jupyter Notebook

4
Variables and Data Types
 Introduction to Variables
 Implementing Variables in Python
 Quiz: Variables and Data Types

5
Operators
 Introduction to Operators
 Implementing Operators in Python
 Quiz: Operators

6
Conditional Statements
 Introduction to Conditional Statements
 Implementing Conditional Statements in Python
 Quiz: Conditional Statements

7
Looping Constructs
 Introduction to Looping Constructs
 Implementing Loops in Python
 Quiz: Loops in Python
 Break, Continue and Pass Statements
 Quiz: Break, Continue and Pass Statements

8
Data Structures
 Introduction to Data Structures
 List and Tuple
 Implementing Lists in Python
 Quiz: Lists
 List - Project in Python
 Implementing Tuple in Python
 Quiz: Tuple
 Introduction to Sets
 Implementing Sets in Python
 Quiz: Sets
 Introduction to Dictionary
 Implementing Dictionary in Python
 Quiz: Dictionary
 Assignment: Data Structures

9
String Manipulation
 Introduction to String Manipulation
 Quiz: String Manipulation

10
Functions
 Introduction to Functions
 Implementing Functions in Python
 Quiz: Functions
 Lambda Expression
 Quiz: Lambda Expressions
 Recursion
 Implementing Recursion in Python
 Quiz: Recursion

11
Modules, Packages and Standard Libraries
 Introduction to Modules
 Modules: Intuition
 Introduction to Packages
 Standard Libraries in Python
 User-Defined Libraries in Python
 Quiz: Modules, Packages and Standard Libraries

12
Handling Text Files in Python
 Handling Text Files in Python
 Quiz: Handling Text Files
 Assignment

13
Introduction to Python Libraries for Data Science
 Important Libraries for Data Science
 Quiz: Important Libraries for Data Science

14
Python Libraries for Data Science
 Basics of NumPy in Python
 Basics of SciPy in Python
 Basics of Pandas in Python
 Basics of Matplotlib in Python
 Basics of Scikit-Learn in Python
 Basics of Statsmodels in Python

15
Reading Data Files in Python
 Reading Data in Python
 Reading CSV files in Python
 Reading Big CSV Files in Python
 Quiz: Reading CSV files in Python
 Reading Excel & Spreadsheet files in Python
 Quiz: Reading Excel & Spreadsheet files in Python
 Reading JSON files in Python
 Quiz: Reading JSON files in Python
 Assignment: Reading Data Files in Python

16
Preprocessing, Subsetting and Modifying Pandas DataFrames
 Subsetting and Modifying Data in Python
 Overview of Subsetting in Pandas I
 Overview of Subsetting in Pandas II
 Subsetting based on Position
 Subsetting based on Label
 Subsetting based on Value
 Quiz: Subsetting DataFrames
 Modifying data in Pandas
 Quiz: Modifying DataFrames
 Assignment: Subsetting and Modifying Pandas DataFrames

17
Sorting and Aggregating Data in Pandas
 Preprocessing, Sorting and Aggregating Data
 Sorting the DataFrame
 Quiz: Sorting DataFrame
 Concatenating DataFrames in Pandas
 Concept of SQL-Like Joins in Pandas
 Implementing SQL-Like Joins in Pandas
 Quiz: Joins in Pandas
 Aggregating and Summarizing DataFrames
 Preprocessing Time-Series Data
 Quiz: Preprocessing Time-Series Data
 Assignment: Sorting and Aggregating Data in Pandas

18
Visualizing Patterns and Trends in Data
 Visualizing Trends & Patterns in Data
 Basics of Matplotlib
 Data Visualization with Matplotlib
 Quiz: Matplotlib
 Basics of Seaborn
 Data Visualization with Seaborn
 Quiz: Seaborn
 Assignment: Visualizing Patterns and Trends in Data

19
Machine Learning Lifecycle
 7 Steps of Machine Learning Lifecycle
 Introduction to Predictive Modeling

20
Problem statement and Hypothesis Generation
 Defining the Problem statement
 Introduction to Hypothesis Generation
 Performing Hypothesis generation
 Quiz: Performing Hypothesis Generation
 List of Hypotheses
 Data Collection/Extraction
 Quiz: Data Collection/Extraction

21
Importance of Stats and EDA
 Introduction to Exploratory Data Analysis & Data Insights
 Quiz: Introduction to Exploratory Data Analysis & Data Insights
 Role of Statistics in EDA
 Descriptive Statistics
 Inferential Statistics
 Quiz: Descriptive and Inferential Statistics

22
Understanding Data
 Introduction to dataset
 Quiz: Introduction to dataset
 Reading data files into Python
 Quiz: Reading data files into Python
 Different Variable Datatypes
 Variable Identification
 Quiz: Variable Identification

23
Basics of Probability
 Probability for Data Science
 Quiz: Probability for Data Science
 Basic Concepts of Probability
 Quiz: Basic Concepts of Probability
 Axioms of Probability
 Quiz: Axioms of Probability
 Conditional Probability
 Quiz: Conditional Probability

24
Exploring Continuous Variable
 Data range for continuous variables
 Central Tendencies for continuous variables
 Spread of the data
 Central Tendencies and Spread of the data: Implementation
 Quiz: Central Tendencies and Spread of data
 KDE plots for continuous variables
 KDE plots: Implementation
 Overview of Distributions for Continuous Variables
 Normal Distribution
 Normality Check
 Skewed Distribution
 Skewness and Kurtosis
 Distributions for continuous variable
 Quiz: Distribution of Continuous variables
 Approaching Univariate Analysis
 Approaching Univariate Analysis: Numerical Variables
 Quiz: Univariate analysis for Continuous variables

25
Exploring Categorical Variables
 Central Tendencies for categorical variables
 Understanding Discrete Distributions
 Discrete Distributions Demonstration
 Performing EDA on Categorical Variables
 Quiz: Univariate Analysis for Categorical Variables

26
Missing Values and Outliers
 Dealing with Missing values
 Understanding Outliers
 Identifying Outliers in data
 Identifying Outliers in data: Implementation
 Quiz: Identifying Outliers in datasets
 Quiz: Outlier treatment

27
Central Limit Theorem
 Important Terminologies
 Central Limit Theorem
 CLT: Implementation
 Quiz: Central Limit Theorem
 Confidence Interval and Margin of error

28
Exploring Continuous-Continuous Variables
 Introduction to Bivariate Analysis
 Covariance
 Pearson Correlation
 Spearman's Correlation & Kendall's Tau
 Correlation versus Causation
 Tabular and Graphical Methods
 Performing Bivariate Analysis on Continuous-Continuous variables
 Quiz: Continuous-Continuous Variables

29
Continuous-Categorical Variables
 Tabular and Graphical Methods
 Introduction to hypothesis Testing
 P-Value
 One-Sample Z-Test
 Two-Sample Z-Test
 Quiz: Hypothesis Testing and Z-Scores
 T-Test
 T-Test vs Z-Test
 Quiz: T-Tests
 Performing Bivariate Analysis on Categorical-Continuous variables

30
Categorical-Categorical Variables
 Tabular and Graphical Methods
 Chi-Squared Test
 Quiz: Chi-Squared Tests
 Bivariate Analysis for Categorical-Categorical Variables

31
Multivariate Analysis
 Multivariate Analysis
 Multivariate Analysis Implementation

32
Assignments
 Understanding the NYC Taxi Trip Duration Problem
 Assignment: EDA

33
Build your first Predictive Model
 Introduction and Overview FREE PREVIEW
 Quiz: Introduction and Overview FREE PREVIEW
 Creating the Dataset FREE PREVIEW
 Quiz: Creating the dataset FREE PREVIEW
 Problem Statement: Regression FREE PREVIEW
 Quiz: Problem Statement - Regression
 Benchmark Model: Regression Implementation
 Quiz: Benchmark Model - Regression Implementation
 Problem Statement: Classification
 Quiz: Problem Statement - Classification
 Benchmark Model: Classification Implementation
 Quiz: Benchmark - Classification Implementation

34
Evaluation Metrics
 Introduction to Evaluation Metrics
 Quiz: Introduction to Evaluation Metrics
 Confusion Matrix
 Quiz: Confusion Matrix
 Accuracy
 Quiz: Accuracy
 Alternatives to Accuracy
 Quiz: Alternatives to Accuracy
 Precision and Recall
 Quiz: Precision and Recall
 Thresholding
 Quiz: Thresholding
 AUC-ROC
 Quiz: AUC-ROC
 Log loss
 Quiz: Log loss
 Evaluation Metrics for Regression
 Quiz: Evaluation Metrics for Regression
 R2 and Adjusted R2
 Quiz: R2 and Adjusted R2

35
Data Preprocessing
 Dealing with Missing Values in the Data
 Replacing Missing Values
 Imputing Missing Values in data
 Working with Categorical Variables
 Working with Outliers
 Preprocessing Data for Model Building

36
Build your First ML model
 Introduction to k-Nearest Neighbours FREE PREVIEW
 Quiz: Introduction to k-Nearest Neighbours FREE PREVIEW
 Building a kNN model
 Quiz: Building a kNN model
 Selecting the right value of k
 Quiz: Selecting the right value of k
 How to calculate the distance?
 Quiz: How to calculate the distance
 Issue with distance-based algorithms
 Quiz: Issue with distance-based algorithms
 Introduction to sklearn
 Implementing the k-Nearest Neighbours algorithm
 Quiz: Implementing the k-Nearest Neighbours algorithm

37
Selecting the Right Model (Overfitting/Underfitting, Validation, Bias-Variance)
 Introduction to Overfitting and Underfitting Models
 Quiz: Introduction to Overfitting and Underfitting Models
 Visualizing overfitting and underfitting using kNN
 Quiz: Visualizing overfitting and underfitting using kNN
 Selecting the Right Model
 What is Validation?
 Quiz: What is Validation?
 Understanding Hold-Out Validation
 Quiz: Understanding Hold-Out Validation
 Implementing Hold-Out Validation
 Quiz: Implementing Hold-Out Validation
 Understanding k-Fold Cross Validation
 Quiz: Understanding k-Fold Cross Validation
 Implementing k-Fold Cross Validation
 Quiz: Implementing k-Fold Cross Validation
 Bias-Variance Tradeoff
 Quiz: Bias-Variance Tradeoff

38
Linear Models
 Introduction to Linear Models
 Understanding Cost function
 Quiz: Understanding Cost function
 Understanding Gradient descent (Intuition)
 Maths behind gradient descent
 Convexity of cost function
 Quiz: Gradient Descent
 Assumptions of Linear Regression
 Preparing Data for Model Building
 Implementing Linear Regression
 Generalized Linear Models
 Quiz: Generalized Linear Models
 Introduction to Logistic Regression
 Odds Ratio
 Implementing Logistic Regression
 Quiz: Logistic Regression
 Multiclass using Logistic Regression
 Quiz: Multi-Class Logistic Regression
 Challenges with Linear Regression
 Introduction to Regularisation
 Quiz: Introduction to Regularization
 Implementing Regularisation
 Coefficient estimate for ridge and lasso (Optional)

39
Project
 Problem Statement: Customer Churn Prediction
 Predicting whether a customer will churn or not

40
Assignment
 Assignment: NYC taxi trip duration prediction

41
Introduction to Dimensionality Reduction
 Introduction to Dimensionality Reduction
 Quiz: Introduction to Dimensionality Reduction
 Common Dimensionality Reduction Techniques
 Quiz: Common Dimensionality Reduction Techniques
 Missing Value Ratio
 Missing Value Ratio Implementation
 Quiz: Missing Value Ratio
 Low Variance Filter
 Low Variance Filter Implementation
 Quiz: Low Variance Filter
 High Correlation Filter
 High Correlation Filter Implementation
 Quiz: High Correlation Filter
 Backward Feature Elimination
 Backward Feature Elimination Implementation
 Quiz: Backward Feature Elimination
 Forward Feature Selection
 Forward Feature Selection Implementation
 Quiz: Forward Feature Selection

42
Decision Tree
 Introduction to Decision Trees
 Quiz: Introduction to Decision Trees
 Purity in Decision Trees
 Quiz: Purity in Decision Trees
 Terminologies Related to Decision Trees
 Quiz: Terminologies Related to Decision Trees
 How to Select the Best Split Point in Decision Trees
 Quiz: How to Select the Best Split Point in Decision Trees
 Chi-Square
 Quiz: Chi-Square
 Information Gain
 Quiz: Information Gain
 Reduction in Variance
 Quiz: Reduction in Variance
 Optimizing Performance of Decision Trees
 Quiz: Optimizing Performance of Decision Trees
 Decision Tree Implementation

43
Basics of Feature Engineering
 Introduction to Feature Engineering
 Exercise on Feature Engineering
 Overview of the module
 Feature Transformation
 Quiz: Feature Transformation
 Feature Scaling
 Quiz: Feature Scaling
 Feature Encoding
 Quiz: Feature Encoding
 Combining Sparse classes
 Quiz: Combining Sparse classes
 Feature Generation: Binning
 Feature Interaction
 Quiz: Feature Interaction
 Generating Features: Missing Values
 Frequency Encoding
 Quiz: Frequency Encoding
 Feature Engineering: Date Time Features
 Implementing Date-Time Features
 Quiz: Implementing Date-Time Features
 Introduction to Text Feature Engineering
 Quiz: Introduction to Text Feature Engineering
 Create Basic Text Features
 Quiz: Create Basic Text Features
 Automated Feature Engineering: Featuretools
 Implementing Featuretools

44
Project: NYC Taxi Trip Prediction
 Exploring the NYC dataset
 Predicting the NYC taxi trip duration (Decision tree)
 Download Notebook and Datasets

45
Basic Ensemble Models
 Introduction to Ensemble
 Quiz: Introduction to Ensemble
 Basic Ensemble Techniques
 Quiz: Basic Ensemble Techniques
 Implementing Basic Ensemble Techniques
 Why Do Ensemble Models Work Well?

46
Bagging (Random Forest)
 Bootstrap Sampling
 Quiz: Bootstrap Sampling
 Introduction to Random Forest
 Quiz: Introduction to Random Forest
 Hyperparameters of Random Forest
 Quiz: Hyperparameters of Random Forest
 Implementing Random Forest

47
Project: Ensemble Model on NYC
 Predicting the NYC Taxi Trip Duration

48
Unsupervised Machine Learning
 Introduction to Clustering
 Quiz: Introduction to Clustering
 Applications of Clustering
 Evaluation Metrics for Clustering
 Quiz: Evaluation Metrics for Clustering
 Understanding K-Means
 K-Means from Scratch Implementation
 Quiz: Understanding K-Means
 Challenges with K-Means
 How to Choose the Right k-Value
 K-Means Implementation
 Quiz: K-Means Implementation
 Hierarchical Clustering
 Implementing Hierarchical Clustering
 Quiz: Hierarchical Clustering
 How to Define Similarity between Clusters
Certificate of Completion
Common Questions Beginners in Data Science ask

I have no programming experience. Would I need to learn Python to learn data science?
Programming is an essential aspect of being a data scientist or a data science professional, and Python is the market leader in this space. Organizations globally are adopting Python as their go-to language, including big tech firms like Spotify, Netflix and Facebook.
Python consistently ranks at the top of global data science surveys, and its popularity is expected to keep growing in the coming years.
Over the years, with strong support from the data science community, Python has gained dedicated libraries for data analysis and predictive modelling.
And don’t worry! Python is an easy language to learn, and we cover it from scratch in the course. You don’t need any prior programming knowledge to master Python!
Do I need to know statistics before taking this course?
No! Statistics is the backbone of data science, which is why we have designed a comprehensive statistics module as part of the course.
We will cover both descriptive statistics and inferential statistics in detail, along with how to implement each concept in Python. And once you’ve learned and practiced statistics concepts, we will then jump to data science modelling. 
What kind of projects can I take up after this course?
You can take up a variety of data science projects! Since the course covers both regression and classification algorithms, such as linear regression, logistic regression and decision trees, you’ll be well equipped to apply your data science and Python skills to real-world projects.
We recommend you pick up the projects we’ve curated on the DataHack platform. These projects will hone your data science skills and enhance what you have learned in the Introduction to Data Science course. 
Can I add the projects covered in this course in my resume?
Of course! Projects are among the first things a hiring manager or recruiter looks for in a data science resume. The more projects you add, the stronger your chance of landing your dream role.
As mentioned above, you can head to the DataHack platform and pick up projects from there. Practice is key in data science! 
Will this course help me clear data science interviews?
This course will help you build a solid base for data science. You will learn a new programming language (Python), the backbone of data science (statistics), and core predictive modeling techniques.
As a next step, we recommend our course, Ace Data Science Interviews.
Instructor(s)

Founder & CEO
Kunal Jain
Kunal is the Founder of Analytics Vidhya, one of the largest data science communities in the world. Kunal is a data science evangelist and has a passion for teaching practical machine learning and data science. Before starting Analytics Vidhya, Kunal worked in analytics and data science for more than 12 years across various geographies and at companies like Capital One and Aviva Life Insurance. He has worked with several clients and helped them build their data science capabilities from scratch.
Neeraj Singh Sarwan
Neeraj works at Fractal Analytics. Prior to that, Neeraj was a data scientist with Analytics Vidhya. He has extensive experience in converting business problems into data problems. He has conducted several corporate trainings and is also an avid blogger. He is a graduate of IIT-BHU and will be your instructor for the Python and Modeling modules.
Here's what our students have to say about our Introduction to Data Science course

I would definitely recommend this!
Naren Bakshi
The course covers all the 3 aspects of Data science, i.e Programming, Statistics, and the ML part. It also has 2 final projects to let you practice the newly learned skills. It's a 10/10 from me 👍
Just the right course for beginners like me
Umang Verma
I had been trying to get into data science on my own for some time, but this course provided a very good structure and the hands on experience needed to start the journey in a simple manner. The lectures are easy to understand and the course covers basics of Python, Statistics and Predictive Modeling.
Great course for who is just getting start with python an...
Leonardo Silva
Easy going course with hands-on exercises.
Good instructor we have got
akhil darisi
The way he exlpaining the course so good..
Very well organized
Abhilash G
very organized easy to follow course
An Excellent Course
Anshuman Yadav
I really loved this course, everything is explained in detail by all the instructors. Multiple lessons in a module, in between MCQs, regular assignments, each implementation lesson is provided with the materials used in the video, jupyter files, and the datasets. Before this course, I didn't know a single thing about Data Science but now I have ample knowledge about how to work on real-life problems in Data Science. Thanks to every instructor. Looking forward to enroll in more courses by Analytics Vidhya.
very nice course
Amritesh Singh
Very nice course
FAQ

Who should take this course?
This course is designed for people looking to learn data science. We will start by understanding the basic concepts from scratch, and then go on to solve case studies using data science concepts.

When will the classes be held in this course?
This is a self-paced course, which you can take any time at your convenience over the 6 months after your purchase.

How many hours per week should I dedicate to complete the course?
If you can put in 6 to 8 hours a week, you should be able to finish the course in 4 to 6 weeks.

Do I need to install any software before starting the course?
You will get information about all installations as part of the course.

What is the refund policy?
The fee for this course is non-refundable.

Do I need to take the modules in a specific order?
We would highly recommend taking the course in the order in which it has been designed to gain the maximum knowledge from it.

Do I get a certificate upon completion of the course?
Yes, you will be given a certificate upon satisfactory completion of the course.

What is the fee for this course?
The fee for this course is INR 7,999.

How long can I access the course?
You will be able to access the course material for six months from the start of the course.
Customer Support for our Courses & Programs
We are here to support you whenever you need us!

Phone: 10 AM to 6 PM (IST) on weekdays (Mon to Fri) on +91-8368808185

Email: training_support@analyticsvidhya.com (response within 1 working day)

Live interactive chat sessions from Monday to Friday, between 7 PM and 8 PM IST.

Discussion Forum: answers within 1 working day