Why Statistics & Exploratory Data Analysis?

Statistics and Exploratory Data Analysis (EDA) are among the most important skills for Business Analysts.

This course covers Statistics in a hands-on, practical manner. It starts with Descriptive Statistics, including Measures of Central Tendency and Variance.
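As a minimal sketch of the descriptive statistics mentioned above, Python's standard-library `statistics` module computes the usual measures of central tendency and spread. The sales figures are invented for illustration and are not from the course dataset.

```python
import statistics

# Hypothetical daily sales figures, invented for illustration only
sales = [120, 135, 150, 150, 160, 175, 410]

# Measures of central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Measures of spread (sample variance and standard deviation use an n - 1 denominator)
variance = statistics.variance(sales)
std_dev = statistics.stdev(sales)
data_range = max(sales) - min(sales)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std={std_dev:.2f} range={data_range}")
```

The single large value (410) pulls the mean well above the median, which is why the median is often the preferred measure of central tendency for skewed data.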

Once you understand these concepts, you apply them to a business problem to see how they are used in real life.

Next, the course discusses Hypothesis Testing, Inferential Statistics and the Central Limit Theorem, and how these concepts help us gain business insights and solve business problems.

By the end of this course, you will:

  • Understand the importance of Statistics & exploratory data analysis.
  • Be able to perform Exploratory Data Analysis on any dataset.
  • Build and test hypotheses.
  • Understand relationships between multiple variables using techniques like correlation and scatter plots.
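To make the correlation outcome concrete, here is a minimal sketch that computes Pearson's correlation coefficient from its definition: the covariance of two variables divided by the product of their standard deviations. The function name `pearson_r` and the advertising/sales data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation, cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations (numerator of the covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (numerators of the variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The (n - 1) denominators cancel in the ratio, so they are omitted
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: advertising spend vs. sales
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 58, 79, 95]
print(round(pearson_r(ad_spend, sales), 4))
```

A value close to +1 indicates a strong positive linear relationship, which a scatter plot of the same data would show as points lying near a rising line. If you work with pandas, `DataFrame.corr()` computes the Pearson coefficient by default.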

Course curriculum

  • 2. Problem statement and Hypothesis Generation
  • 3. Importance of Stats and EDA
  • 4. Understanding Data
    • Introduction to dataset
    • Quiz - Introduction to dataset
    • Why Python?
    • Reading data files into Python
    • Quiz - Reading data files into Python
    • Different Variable Datatypes
    • Variable Identification
    • Quiz - Variable Identification
  • 5. Probability
    • Probability for Data Science
    • Quiz - Probability for Data Science
    • Basic Concepts of Probability
    • Quiz - Basic Concepts of Probability
    • Axioms of Probability
    • Quiz - Axioms of Probability
    • Conditional Probability
    • Quiz - Conditional Probability
  • 6. Exploring Continuous Variables
    • Data range for continuous variables
    • Central Tendencies for continuous variables
    • Spread of the data
    • Central Tendencies and Spread of the data: Implementation
    • Quiz: Central Tendencies and Spread of data
    • KDE plots for continuous variables
    • KDE plots: Implementation
    • Overview of Distributions for Continuous Variables
    • Normal Distribution
    • Normality Check
    • Skewed Distribution
    • Skewness and Kurtosis
    • Distributions for continuous variables
    • Quiz: Distribution of Continuous variables
    • Approaching Univariate Analysis
    • Approaching Univariate Analysis: Numerical Variables
    • Quiz: Univariate analysis for Continuous variables
  • 7. Exploring Categorical Variables
    • Central Tendencies for categorical variables
    • Understanding Discrete Distributions
    • Discrete Distributions Demonstration
    • Performing EDA on Categorical Variables
    • Quiz: Univariate Analysis for Categorical Variables
  • 8. Missing Values and Outliers
    • Dealing with Missing values
    • Understanding Outliers
    • Identifying Outliers in data
    • Identifying Outliers in data: Implementation
    • Quiz: Identifying Outliers in datasets
    • Quiz: Outlier treatment
  • 9. Central Limit Theorem
    • Important Terminologies
    • Central Limit Theorem
    • CLT: Implementation
    • Quiz: Central Limit Theorem
    • Confidence Interval and Margin of error
  • 10. Exploring Continuous-Continuous Variables
    • Introduction to Bivariate Analysis
    • Covariance
    • Pearson Correlation
    • Spearman's Correlation & Kendall's Tau
    • Correlation versus Causation
    • Tabular and Graphical Methods
    • Performing Bivariate Analysis on Continuous-Continuous Variables
    • Quiz: Continuous-Continuous Variables
  • 11. Continuous-Categorical Variables
    • Tabular and Graphical Methods
    • Introduction to Hypothesis Testing
    • P-Value
    • One Sample z-test
    • Two-Sample z-test
    • Quiz: Hypothesis Testing and Z-scores
    • T-Test
    • T-Test vs Z-Test
    • Quiz: T-tests
    • Performing Bivariate Analysis on Categorical-Continuous Variables
  • 12. Categorical-Categorical Variables
    • Tabular and Graphical Methods
    • Chi-Squared Test
    • Quiz: Chi-squared tests
    • Bivariate Analysis for Categorical-Categorical Variables
  • 13. Multivariate Analysis
    • Multivariate Analysis
    • Multivariate Analysis Implementation
  • 14. Assignment
    • Understanding the NYC Taxi Trip Duration Problem
    • Assignment: EDA
  • 15. Where to go from here?
    • What's next?

Instructor(s)

  • Ankit Choudhary

    Ankit is an IIT Bombay graduate with a master's and a bachelor's degree in Electrical Engineering. He is a corporate trainer and leads the hackathon category at Analytics Vidhya, where he works with companies to turn their data into data science competitions. He has conducted corporate trainings on Basic and Advanced Machine Learning for a BFSI client. He has finished in the top 5 of multiple data science competitions and conducted a workshop on how to win data science competitions at DataHack Summit 2019. He previously worked as a lead decision scientist for the Indian National Congress, deploying statistical models (Segmentation, K-Nearest Neighbours) to help the party leadership and team make data-driven decisions. His motivation lies in putting data at the heart of business for data-driven decision making.
  • Aishwarya Singh

    Aishwarya is currently working as a Data Scientist at Analytics Vidhya. She is one of the primary content curators and an instructor for Analytics Vidhya’s most popular course – Applied Machine Learning. She is also an avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. She has written over 70 articles in recent years on various machine learning and deep learning topics and applications.

Customer Support for our Courses & Programs

We are here to support you whenever you need us!

  • Phone - +91-8368808185, 10 AM to 6 PM (IST), Monday to Friday

  • Email - [email protected] (reply within 1 working day)

  • Discussion Forum - answers within 1 working day