About Introduction to Data Science Course

Getting Started with Data Science

 What is Data Science? Why has it become so popular recently? What are some of the popular data science applications? And more importantly, how can you get started with learning data science from scratch?

 Are you looking for the answer to these questions? Frustrated by the lack of structured data science learning? You’ve come to the right place!

 Data science is ubiquitous right now. Organizations are splurging to integrate data science solutions in their daily processes. It’s a great time to learn data science and get ready for your first industry role!

 This course, curated by experienced data science instructors and experts at Analytics Vidhya, will cover the core concepts you need to know to crack data science interviews and become a data scientist!

Why pursue Data Science:

  • Data Science is ubiquitous! It is the hottest field in the industry right now
  • Data scientists are among the most in-demand professionals
  • There are so many data science algorithms to build predictive models, such as linear regression, logistic regression, decision trees and random forests. Keep learning, keep growing!
  • The potential of data science is limitless - spanning across industries, roles and functions.

What will you learn in the ‘Introduction to Data Science’ course?

  • 387 Lessons
  • 4 Real-Life Projects from the Data Science Industry
  • 60+ Hours of Comprehensive content
  • Live Q & A Session

  • Understand What Data Science is

  • Applications of Data Science

  • Data Science Terminologies

  • Python for Data Science

  • Core Statistics for Data Science

  • Probability Concepts

  • Introduction to Machine Learning Algorithms

  • Hands-on examples and multiple real-world industry-relevant data science projects

Projects for Introduction to Data Science

Customer Churn Prediction
Sales Prediction for large Super Markets
Predict survivors from Titanic (In-Class)
NYC Taxi Trip Duration Prediction


Course curriculum

  • 1
    Introduction to Data Science
  • 2
    Setting Up the Systems
    • Installation steps for Windows
    • Installation steps for Linux
    • Installation steps for Mac
  • 3
    Introduction to Python
    • Introduction to Python
    • Introduction to Jupyter Notebook
  • 4
    Variables and Data Types
    • Introduction to Variables
    • Implementing Variables in Python
    • Quiz: Variables and Data Types
  • 5
    Operators
    • Introduction to Operators
    • Implementing Operators in Python
    • Quiz: Operators
  • 6
    Conditional Statements
    • Introduction to Conditional Statements
    • Implementing Conditional Statements in Python
    • Quiz: Conditional Statements
  • 7
    Looping Constructs
    • Introduction to Looping Constructs
    • Implementing Loops in Python
    • Quiz: Loops in Python
    • Break, Continue and Pass Statements
    • Quiz: Break, Continue and Pass Statements
  • 8
    Data Structures
    • Introduction to Data Structures
    • List and Tuple
    • Implementing List in Python
    • Quiz: Lists
    • List - Project in Python
    • Implementing Tuple in Python
    • Quiz: Tuple
    • Introduction to Sets
    • Implementing Sets in Python
    • Quiz: Sets
    • Introduction to Dictionary
    • Implementing Dictionary in Python
    • Quiz: Dictionary
    • Assignment: Data Structures
  • 9
    String Manipulation
    • Introduction to String Manipulation
    • Quiz: String Manipulation
  • 10
    Functions
    • Introduction to Functions
    • Implementing Functions in Python
    • Quiz: Functions
    • Lambda Expression
    • Quiz: Lambda Expressions
    • Recursion
    • Implementing Recursion in Python
    • Quiz: Recursion
  • 11
    Modules, Packages and Standard Libraries
    • Introduction to Modules
    • Modules: Intuition
    • Introduction to Packages
    • Standard Libraries in Python
    • User Defined Libraries in Python
    • Quiz: Modules, Packages and Standard Libraries
  • 12
    Handling Text Files in Python
    • Handling Text Files in Python
    • Quiz: Handling Text Files
    • Assignment
  • 13
    Introduction to Python Libraries for Data Science
    • Important Libraries for Data Science
    • Quiz: Important Libraries for Data Science
  • 14
    Python Libraries for Data Science
    • Basics of Numpy in Python
    • Basics of Scipy in Python
    • Basics of Pandas in Python
    • Basics of Matplotlib in Python
    • Basics of Scikit-Learn in Python
    • Basics of Statsmodels in Python
  • 15
    Reading Data Files in Python
    • Reading Data in Python
    • Reading CSV files in Python
    • Reading Big CSV Files in Python
    • Quiz: Reading CSV files in Python
    • Reading Excel & Spreadsheet files in Python
    • Quiz: Reading Excel & Spreadsheet files in Python
    • Reading JSON files in Python
    • Quiz: Reading JSON files in Python
    • Assignment: Reading Data Files in Python
  • 16
    Preprocessing, Subsetting and Modifying Pandas DataFrames
    • Subsetting and Modifying Data in Python
    • Overview of Subsetting in Pandas I
    • Overview of Subsetting in Pandas II
    • Subsetting based on Position
    • Subsetting based on Label
    • Subsetting based on Value
    • Quiz: Subsetting DataFrames
    • Modifying data in Pandas
    • Quiz: Modifying DataFrames
    • Assignment: Subsetting and Modifying Pandas Dataframes
  • 17
    Sorting and Aggregating Data in Pandas
    • Preprocessing, Sorting and Aggregating Data
    • Sorting the DataFrame
    • Quiz: Sorting DataFrame
    • Concatenating DataFrames in Pandas
    • Concept of SQL-Like Joins in Pandas
    • Implementing SQL-Like Joins in Pandas
    • Quiz: Joins in Pandas
    • Aggregating and Summarizing DataFrames
    • Preprocessing TimeSeries Data
    • Quiz: Preprocessing TimeSeries Data
    • Assignment: Sorting and Aggregating Data in Pandas
  • 18
    Visualizing Patterns and Trends in Data
    • Visualizing Trends & Patterns in Data
    • Basics of Matplotlib
    • Data Visualization with Matplotlib
    • Quiz: Matplotlib
    • Basics of Seaborn
    • Data Visualization with Seaborn
    • Quiz: Seaborn
    • Assignment: Visualizing Patterns and Trends in Data
  • 19
    Machine Learning Lifecycle
    • 7 Steps of Machine Learning Lifecycle
    • Introduction to Predictive Modeling
  • 20
    Problem statement and Hypothesis Generation
    • Defining the Problem statement
    • Introduction to Hypothesis Generation
    • Performing Hypothesis generation
    • Quiz - Performing Hypothesis generation
    • List of Hypotheses
    • Data Collection/Extraction
    • Quiz - Data Collection/Extraction
  • 21
    Importance of Stats and EDA
    • Introduction to Exploratory Data Analysis & Data Insights
    • Quiz - Introduction to Exploratory Data Analysis & Data Insights
    • Role of Statistics in EDA
    • Descriptive Statistics
    • Inferential Statistics
    • Quiz - Descriptive and Inferential Statistics
  • 22
    Understanding Data
    • Introduction to dataset
    • Quiz - Introduction to dataset
    • Reading data files into python
    • Quiz - Reading data files into python
    • Different Variable Datatypes
    • Variable Identification
    • Quiz - Variable Identification
  • 23
    Basics of Probability
    • Probability for Data Science
    • Quiz - Probability for Data Science
    • Basic Concepts of Probability
    • Quiz - Basic Concepts of Probability
    • Axioms of Probability
    • Quiz - Axioms of Probability
    • Conditional Probability
    • Quiz - Conditional Probability
  • 24
    Exploring Continuous Variable
    • Data range for continuous variables
    • Central Tendencies for continuous variables
    • Spread of the data
    • Central Tendencies and Spread of the data: Implementation
    • Quiz: Central Tendencies and Spread of data
    • KDE plots for continuous variable
    • KDE plots: Implementation
    • Overview of Distributions for Continuous Variables
    • Normal Distribution
    • Normality Check
    • Skewed Distribution
    • Skewness and Kurtosis
    • Distributions for continuous variable
    • Quiz: Distribution of Continuous variables
    • Approaching Univariate Analysis
    • Approaching Univariate Analysis: Numerical Variables
    • Quiz: Univariate analysis for Continuous variables
  • 25
    Exploring Categorical Variables
    • Central Tendencies for categorical variables
    • Understanding Discrete Distributions
    • Discrete Distributions Demonstration
    • Performing EDA on Categorical Variables
    • Quiz: Univariate Analysis for Categorical Variables
  • 26
    Missing Values and Outliers
    • Dealing with Missing values
    • Understanding Outliers
    • Identifying Outliers in data
    • Identifying Outliers in data: Implementation
    • Quiz: Identifying Outliers in datasets
    • Quiz: Outlier treatment
  • 27
    Central Limit Theorem
    • Important Terminologies
    • Central Limit Theorem
    • CLT: Implementation
    • Quiz: Central Limit Theorem
    • Confidence Interval and Margin of error
  • 28
    Exploring Continuous - Continuous Variables
    • Introduction to Bivariate Analysis
    • Covariance
    • Pearson Correlation
    • Spearman's Correlation & Kendall's Tau
    • Correlation versus Causation
    • Tabular and Graphical Methods
    • Performing Bivariate Analysis on Continuous - Continuous variables
    • Quiz: Continuous-Continuous Variables
  • 29
    Exploring Continuous - Categorical Variables
    • Tabular and Graphical Methods
    • Introduction to hypothesis Testing
    • P-Value
    • One Sample z-test
    • Two Sample z-test
    • Quiz: Hypothesis Testing and Z scores
    • T-Test
    • T-Test vs Z-Test
    • Quiz: T tests
    • Performing Bivariate Analysis on Categorical - Continuous variables
  • 30
    Exploring Categorical - Categorical Variables
    • Tabular and Graphical Methods
    • Chi-Squared Test
    • Quiz: Chi squared tests
    • Bivariate Analysis for Categorical - Categorical Variables
  • 31
    Multivariate Analysis
    • Multivariate Analysis
    • Multivariate Analysis Implementation
  • 32
    • Understanding the NYC Taxi Trip Duration Problem
    • Assignment: EDA
  • 33
    Build your first Predictive Model
  • 34
    Evaluation Metrics
    • Introduction to Evaluation Metrics
    • Quiz: Introduction to Evaluation Metrics
    • Confusion Matrix
    • Quiz: Confusion Matrix
    • Accuracy
    • Quiz: Accuracy
    • Alternatives to Accuracy
    • Quiz: Alternatives to Accuracy
    • Precision and Recall
    • Quiz: Precision and Recall
    • Thresholding
    • Quiz: Thresholding
    • AUC-ROC
    • Quiz: AUC-ROC
    • Log loss
    • Quiz: Log loss
    • Evaluation Metrics for Regression
    • Quiz: Evaluation Metrics for Regression
    • R2 and Adjusted R2
    • Quiz: R2 and Adjusted R2
  • 35
    Data Preprocessing
    • Dealing with Missing Values in the Data
    • Replacing Missing Values
    • Imputing Missing Values in data
    • Working with Categorical Variables
    • Working with Outliers
    • Preprocessing Data for Model Building
  • 36
    Build your First ML model
  • 37
    Selecting the Right Model (Overfitting/Underfitting, Validation, Bias-Variance)
    • Introduction to Overfitting and Underfitting Models
    • Quiz: Introduction to Overfitting and Underfitting Models
    • Visualizing overfitting and underfitting using knn
    • Quiz: Visualizing overfitting and underfitting using knn
    • Selecting the Right Model
    • What is Validation?
    • Quiz: What is Validation?
    • Understanding Hold-Out Validation
    • Quiz: Understanding Hold-Out Validation
    • Implementing Hold-Out Validation
    • Quiz: Implementing Hold-Out Validation
    • Understanding k-fold Cross Validation
    • Quiz: Understanding k-fold Cross Validation
    • Implementing k-fold Cross Validation
    • Quiz: Implementing k-fold Cross Validation
    • Bias Variance Tradeoff
    • Quiz: Bias Variance Tradeoff
  • 38
    Linear Models
    • Introduction to Linear Models
    • Understanding Cost function
    • Quiz: Understanding Cost function
    • Understanding Gradient descent (Intuition)
    • Maths behind gradient descent
    • Convexity of cost function
    • Quiz: Gradient Descent
    • Assumptions of Linear Regression
    • Preparing Data for Model Building
    • Implementing Linear Regression
    • Generalized Linear Models
    • Quiz: Generalized Linear Models
    • Introduction to Logistic Regression
    • Odds Ratio
    • Implementing Logistic Regression
    • Quiz: Logistic Regression
    • Multiclass using Logistic Regression
    • Quiz: Multi-Class Logistic Regression
    • Challenges with Linear Regression
    • Introduction to Regularization
    • Quiz: Introduction to Regularization
    • Implementing Regularization
    • Coefficient estimate for ridge and lasso (Optional)
  • 39
    • Problem Statement - Customer Churn Prediction
    • Predicting whether a customer will churn or not
  • 40
    • Assignment: NYC taxi trip duration prediction
  • 41
    Introduction to Dimensionality Reduction
    • Introduction to Dimensionality Reduction
    • Quiz: Introduction to Dimensionality Reduction
    • Common Dimensionality Reduction Techniques
    • Quiz: Common Dimensionality Reduction Techniques
    • Missing Value Ratio
    • Missing Value Ratio Implementation
    • Quiz: Missing Value Ratio
    • Low Variance Filter
    • Low Variance Filter Implementation
    • Quiz: Low Variance Filter
    • High Correlation Filter
    • High Correlation Filter Implementation
    • Quiz: High Correlation Filter
    • Backward Feature Elimination
    • Backward Feature Elimination Implementation
    • Quiz: Backward Feature Elimination
    • Forward Feature Selection
    • Forward Feature Selection Implementation
    • Quiz: Forward Feature Selection
  • 42
    Decision Tree
    • Introduction to Decision Trees
    • Quiz: Introduction to Decision Trees
    • Purity in Decision Trees
    • Quiz: Purity in Decision Trees
    • Terminologies Related to Decision Trees
    • Quiz: Terminologies Related to Decision Trees
    • How to Select the Best Split Point in Decision Trees
    • Quiz: How to Select the Best Split Point in Decision Trees
    • Chi-Square
    • Quiz: Chi-Square
    • Information Gain
    • Quiz: Information Gain
    • Reduction in Variance
    • Quiz: Reduction in Variance
    • Optimizing Performance of Decision Trees
    • Quiz: Optimizing Performance of Decision Trees
    • Decision Tree Implementation
  • 43
    Basics of Feature Engineering
    • Introduction to Feature Engineering
    • Exercise on Feature Engineering
    • Overview of the module
    • Feature Transformation
    • Quiz: Feature Transformation
    • Feature Scaling
    • Quiz: Feature Scaling
    • Feature Encoding
    • Quiz: Feature Encoding
    • Combining Sparse classes
    • Quiz: Combining Sparse classes
    • Feature Generation: Binning
    • Feature Interaction
    • Quiz: Feature Interaction
    • Generating Features: Missing Values
    • Frequency Encoding
    • Quiz: Frequency Encoding
    • Feature Engineering: Date Time Features
    • Implementing DateTime Features
    • Quiz: Implementing DateTime Features
    • Introduction to Text Feature Engineering
    • Quiz: Introduction to Text Feature Engineering
    • Create Basic Text Features
    • Quiz: Create Basic Text Features
    • Automated Feature Engineering: Feature Tools
    • Implementing Feature tools
  • 44
    Project: NYC Taxi Trip Prediction
    • Exploring the NYC dataset
    • Predicting the NYC taxi trip duration (Decision tree)
    • Download Notebook and Datasets
  • 45
    Basic Ensemble Models
    • Introduction to Ensemble
    • Quiz: Introduction to Ensemble
    • Basic Ensemble Techniques
    • Quiz: Basic Ensemble Techniques
    • Implementing Basic Ensemble Techniques
    • Why Do Ensemble Models Work Well?
  • 46
    Bagging (Random Forest)
    • Bootstrap Sampling
    • Quiz: Bootstrap Sampling
    • Introduction to Random Forest
    • Quiz: Introduction to Random Forest
    • Hyper-parameters of Random Forest
    • Quiz: Hyper-parameters of Random Forest
    • Implementing Random Forest
  • 47
    Project - Ensemble Model on NYC
    • Predicting the NYC Taxi Trip Duration
  • 48
    Unsupervised Machine Learning
    • Introduction to Clustering
    • Quiz: Introduction to Clustering
    • Applications of Clustering
    • Evaluation Metrics for Clustering
    • Quiz: Evaluation Metrics for Clustering
    • Understanding K-Means
    • K-Means from Scratch Implementation
    • Quiz: Understanding K-Means
    • Challenges with K-Means
    • How to Choose the Right k-Value
    • K-Means Implementation
    • Quiz: K-Means Implementation
    • Hierarchical Clustering
    • Implementing Hierarchical Clustering
    • Quiz: Hierarchical Clustering
    • How to Define Similarity between Clusters

Certificate of Completion

Upon successful completion of the course, you will receive a blockchain-enabled certificate from Analytics Vidhya with lifetime validity.

Common Questions Beginners in Data Science ask

  • I have no programming experience. Would I need to learn Python to learn data science?

    Programming is an essential part of being a data scientist or a data science professional, and Python is the market leader in this space. Organizations globally are adopting Python as their go-to language, including big tech firms such as Spotify, Netflix and Facebook.

    Python consistently ranks at the top of global data science surveys, and its popularity is expected to keep growing in the coming years.

    Over the years, with strong support from the data science community, the language has gained dedicated libraries for data analysis and predictive modelling, such as Pandas and Scikit-Learn.

    And don’t worry! Python is a very easy language to learn and we cover it from scratch in the course. So you don’t need to have any prior programming knowledge to master Python!

  • Do I need to know statistics before taking this course?

    No! Statistics is the backbone of data science, so the course includes a comprehensive module that teaches it from the ground up.

    We will cover both descriptive statistics and inferential statistics in detail, along with how to implement each concept in Python. And once you’ve learned and practiced statistics concepts, we will then jump to data science modelling.

  • What kind of projects can I take up after this course?

    You can take up a variety of data science projects! Since this course covers both regression and classification algorithms, like linear regression, logistic regression and decision trees, you’ll be well equipped to apply your data science and Python skills to real-world projects.

    We recommend you pick up the projects we’ve curated on the DataHack platform. These projects will hone your data science skills and enhance what you have learned in the Introduction to Data Science course.

  • Can I add the projects covered in this course to my resume?

    Of course! Projects are among the first things a hiring manager or recruiter looks for in a data science resume. The more projects you add, the stronger your chance of landing your dream role.

    As mentioned above, you can head to the DataHack platform and pick up projects from there. Practice is key in data science!

  • Will this course help me clear data science interviews?

    This course will help you build a solid base for data science. You will learn a new programming language (Python), the backbone of data science (statistics), and core predictive modeling techniques.

    As a next step, you should go through our course - Ace Data Science Interviews.


  • Kunal Jain

    Founder & CEO

    Kunal is the Founder of Analytics Vidhya, one of the largest data science communities in the world. Kunal is a data science evangelist with a passion for teaching practical machine learning and data science. Before starting Analytics Vidhya, Kunal worked in analytics and data science for more than 12 years across various geographies and at companies like Capital One and Aviva Life Insurance. He has worked with several clients and helped them build their data science capabilities from scratch.
  • Neeraj Singh Sarwan

    Neeraj is working at Fractal Analytics. Prior to that Neeraj was a data scientist with Analytics Vidhya. He has extensive experience in converting business problems to data problems. He has previously conducted several corporate trainings and is also an avid blogger. He's a graduate of IIT-BHU and will be your instructor for the Python and Modeling modules.

Here's what our students have to say about our Introduction to Data Science course

  • I would definitely recommend this!

    Naren Bakshi

    The course covers all the 3 aspects of Data science, i.e Programming, Statistics, and the ML part. It also has 2 final projects to let you practice the newly learned skills. It's a 10/10 from me 👍
  • Just the right course for beginners like me

    Umang Verma

    I had been trying to get into data science on my own for some time, but this course provided a very good structure and the hands on experience needed to start the journey in a simple manner. The lectures are easy to understand and the course covers basics of Python, Statistics and Predictive Modeling.
  • Great course for who is just getting start with python an...

    Leonardo Silva

    Easy going course with hands-on exercises.
  • Good instructor we have got

    akhil darisi

    The way he exlpaining the course so good..
  • Very well organized

    Abhilash G

    very organized easy to follow course
  • An Excellent Course

    Anshuman Yadav

    I really loved this course, everything is explained in detail by all the instructors. Multiple lessons in a module, in between MCQs, regular assignments, each implementation lesson is provided with the materials used in the video, jupyter files, and the datasets. Before this course, I didn't know a single thing about Data Science but now I have ample knowledge about how to work on real-life problems in Data Science. Thanks to every instructor. Looking forward to enroll in more courses by Analytics Vidhya.
  • very nice course

    Amritesh Singh

    Very nice course


  • Who should take this course?

    This course is designed for people looking to learn data science. We will start by understanding the basic concepts from scratch, and then go on to solve case studies using data science concepts.

  • When will the classes be held in this course?

    This is a self-paced course, which you can take at your convenience at any time during the 6 months after your purchase.

  • How many hours per week should I dedicate to complete the course?

    If you can put in 6 to 8 hours a week, you should be able to finish the course in 4 to 6 weeks.

  • Do I need to install any software before starting the course?

    You will get information about all installations as part of the course.

  • What is the refund policy?

    The fee for this course is non-refundable.

  • Do I need to take the modules in a specific order?

    We would highly recommend taking the course in the order in which it has been designed to gain the maximum knowledge from it.

  • Do I get a certificate upon completion of the course?

    Yes, you will be given a certificate upon satisfactory completion of the course.

  • What is the fee for this course?

    The fee for this course is INR 7,999.

  • How long can I access the course?

    You will be able to access the course material for six months from the start of the course.

Customer Support for our Courses & Programs

We are here to support you whenever you need it!

  • Phone: +91-8368808185, 10 AM - 6 PM (IST) on weekdays (Mon - Fri)

  • Email: [email protected] (response within 1 working day)

  • Discussion Forum: answers within 1 working day