Why Statistics & Exploratory Data Analysis?

Statistics and Exploratory Data Analysis (EDA) are among the most important skills for Business Analysts.

This course covers Statistics in a hands-on, practical manner. It starts with Descriptive Statistics, Measures of Central Tendency and Variance.

Once you understand these concepts, you apply them to a business problem to see how they are used in practice.
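As a rough illustration of these measures, the sketch below computes central tendency (mean, median, mode) and spread (range, variance, standard deviation) with Python's built-in `statistics` module. The balance figures are made up for illustration and are not taken from the course datasets.

```python
import statistics

# Hypothetical monthly account balances for a handful of customers (illustrative only)
balances = [1200, 950, 3100, 780, 1500, 950, 2200]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(balances),
    "median": statistics.median(balances),
    "mode": statistics.mode(balances),
    # Measures of spread
    "range": max(balances) - min(balances),
    "variance": statistics.variance(balances),  # sample variance (n - 1 denominator)
    "stdev": statistics.stdev(balances),        # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>8}: {value:,.2f}")
```

Comparing the mean with the median is a quick first check for skew: here a single large balance pulls the mean above the median.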

Next, the course covers Hypothesis Testing, Inferential Statistics and the Central Limit Theorem, and shows how these concepts produce insights that help solve business problems.
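To show how the Central Limit Theorem supports hypothesis testing, here is a minimal one-sample z-test sketch. It assumes the population standard deviation is known and that the sample is large enough for the sample mean to be approximately normal. The trip-duration numbers are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Two-sided one-sample z-test, assuming the population SD is known.

    By the Central Limit Theorem, the sample mean is approximately normal
    with standard error pop_sd / sqrt(n) for reasonably large n.
    """
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value: probability of a |Z| at least this large under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: H0 says the mean trip duration is 15 min (sigma = 5 min);
# a sample of 36 trips has a mean of 16.5 min.
z, p = one_sample_z_test(16.5, 15.0, 5.0, 36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With p above 0.05, this sample would not reject H0 at the 5% level. When the population SD is unknown, a t-test is the usual choice instead.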

By the end of this course, you will:

• Understand the importance of Statistics & Exploratory Data Analysis.
• Be able to perform Exploratory Data Analysis on any dataset.
• Build and test hypotheses.
• Understand relationships between multiple variables using techniques such as Correlation and Scatter plots.
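Pearson correlation measures linear association. Spearman's correlation is Pearson's correlation applied to ranks, so it captures any monotonic relationship. The sketch below implements both from scratch in plain Python for illustration; in practice, library routines such as pandas `DataFrame.corr(method=...)` are typically used. The distance and duration values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Sums of cross-products and squares; the 1/(n-1) factors cancel in the ratio
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: trip distance (km) vs. trip duration (min)
distance = [1.2, 2.5, 3.1, 4.8, 6.0, 7.5]
duration = [6, 11, 13, 20, 24, 35]
print(f"Pearson:  {pearson(distance, duration):.3f}")
print(f"Spearman: {spearman(distance, duration):.3f}")
```

Because duration rises with every increase in distance, Spearman's value is exactly 1. Pearson's value is slightly lower because the relationship is not perfectly linear. A scatter plot of the same two columns is the natural visual companion to these numbers.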

Course curriculum

1. Introduction to the Course
• What is Exploratory Data Analysis?
• Overview of the Course
• Course Handouts
2. Problem Statement and Hypothesis Generation
• Defining the Problem statement
• Introduction to Hypothesis Generation
• Performing Hypothesis generation
• List of hypotheses
• Data Collection/Extraction
• Quiz - Data Collection/Extraction
3. Importance of Stats and EDA
• Introduction to Exploratory Data Analysis & Data Insights
• Quiz - Introduction to Exploratory Data Analysis & Data Insights
• Role of Statistics in EDA
• Descriptive Statistics
• Inferential Statistics
• Quiz - Descriptive and Inferential Statistics
4. Understanding Data
• Introduction to dataset
• Quiz - Introduction to dataset
• Why Python?
• Reading data files into Python
• Quiz - Reading data files into Python
• Different Variable Datatypes
• Variable Identification
• Quiz - Variable Identification
5. Probability
• Probability for Data Science
• Quiz - Probability for Data Science
• Basic Concepts of Probability
• Quiz - Basic Concepts of Probability
• Axioms of Probability
• Quiz - Axioms of Probability
• Conditional Probability
• Quiz - Conditional Probability
6. Exploring Continuous Variables
• Data range for continuous variables
• Central Tendencies for continuous variables
• Spread of the data
• Central Tendencies and Spread of the data: Implementation
• Quiz: Central Tendencies and Spread of data
• KDE plots for continuous variables
• KDE plots: Implementation
• Overview of Distributions for Continuous Variables
• Normal Distribution
• Normality Check
• Distributions for continuous variables
• Quiz: Distribution of Continuous variables
• Skewed Distribution
• Skewness and Kurtosis
• Approaching Univariate Analysis
• Approaching Univariate Analysis: Numerical Variables
• Quiz: Univariate analysis for Continuous variables
7. Exploring Categorical Variables
• Central Tendencies for categorical variables
• Understanding Discrete Distributions
• Discrete Distributions Demonstration
• Performing EDA on Categorical Variables
• Quiz: Univariate Analysis for Categorical Variables
8. Missing Values and Outliers
• Dealing with Missing values
• Understanding Outliers
• Identifying Outliers in data
• Identifying Outliers in data: Implementation
• Quiz: Identifying Outliers in datasets
• Quiz: Outlier treatment
9. Central Limit Theorem
• Important Terminologies
• Central Limit Theorem
• CLT: Implementation
• Quiz: Central Limit Theorem
• Confidence Interval and Margin of error
10. Exploring Continuous - Continuous Variables
• Introduction to Bivariate Analysis
• Covariance
• Pearson Correlation
• Spearman's Correlation & Kendall's Tau
• Correlation versus Causation
• Tabular and Graphical Methods
• Performing Bivariate Analysis on Continuous - Continuous variables
• Quiz: Continuous-Continuous Variables
11. Continuous - Categorical Variables
• Tabular and Graphical Methods
• Introduction to Hypothesis Testing
• P-Value
• One-Sample z-test
• Two-Sample z-test
• Quiz: Hypothesis Testing and Z scores
• T-Test
• T-Test vs Z-Test
• Quiz: T-Tests
• Performing Bivariate Analysis on Categorical - Continuous variables
12. Categorical - Categorical Variables
• Tabular and Graphical Methods
• Chi-Squared Test
• Quiz: Chi-Squared Tests
• Bivariate Analysis for Categorical Variables
13. Multivariate Analysis
• Multivariate Analysis
• Multivariate Analysis Implementation
14. Assignment
• Understanding the NYC Taxi Trip Duration Problem
• Assignment: EDA
15. Where to go from here?
• What's next?

Project 1 - Customer Churn Dataset

Project Description - A bank wants to improve customer retention for its savings account product. The bank wants to identify customers whose balances are likely to fall below the minimum balance in the next quarter. The bank has customers' information such as age, gender and demographics, along with their transactions with the bank. The aim of this project is to predict the propensity to churn for each customer.

Project 2 - NYC Taxi Trip Duration Prediction

Problem description: Uber, Lyft, Ola and many other online ride-hailing services use their extensive data to build data products such as pricing engines and driver-allocation systems. To improve the efficiency of taxi dispatching for such services, it is important to predict how long a driver's taxi will be occupied, in other words, the trip duration. This project covers techniques to extract important features and predict the duration of taxi trips in New York using data from the New York City Taxi and Limousine Commission (TLC).

Certificate of Completion

Upon successful completion of the course, you will receive a blockchain-enabled certificate from Analytics Vidhya with lifetime validity.

Instructor(s)

• Ankit Choudhary

Ankit is an IIT Bombay graduate with a Master's and a Bachelor's degree in Electrical Engineering. He is a corporate trainer and leads the hackathon category at Analytics Vidhya, where he works with various companies to turn their data into data science competitions. He has conducted corporate training on Basic and Advanced Machine Learning for a BFSI client. He has finished in the top 5 of multiple data science competitions and conducted a workshop on how to win data science competitions at DataHack Summit 2019. He previously worked as a lead decision scientist for the Indian National Congress, deploying statistical models (Segmentation, K-Nearest Neighbours) to help the party leadership and team make data-driven decisions. His motivation lies in putting data at the heart of business decision making.
• Aishwarya Singh

Aishwarya is currently working as a Data Scientist at Analytics Vidhya. She is one of the primary content curators and an instructor for Analytics Vidhya’s most popular course – Applied Machine Learning. She is also an avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. She has written over 70 articles in recent years on various machine learning and deep learning topics and applications.

Customer Support for our Courses & Programs

We are here to support you whenever you need us!

• Phone - +91-8368808185, 10 AM - 6 PM (IST) on weekdays (Mon - Fri)

• Email - [email protected] (response within 1 working day)

• Discussion Forum - response within 1 working day