Theoretical Understanding:

  • It is based on Bayes' Theorem, which is given below:

P(Y | X) = P(X | Y) · P(Y) / P(X)

where P(Y | X) is the posterior, P(X | Y) the likelihood, P(Y) the prior, and P(X) the evidence.

  • For given independent variables X = (x1, x2, ..., xn) and target variable Y, Bayes' Theorem becomes the Naive Bayes classifier:

P(Y | x1, x2, ..., xn) ∝ P(Y) · P(x1 | Y) · P(x2 | Y) · ... · P(xn | Y)

and the predicted class is the Y that maximizes this product.

  • Let’s understand this algorithm with an example. We have a training data set of weather conditions and a corresponding target variable ‘Play’ (indicating whether a game was played). We need to classify whether players will play or not based on the weather condition. Let’s follow the steps below.
  • Steps in algorithm:
    • STEP 1: Convert the data set into a frequency table
    • STEP 2: Create a Likelihood table by computing the probabilities, for example Overcast probability = 0.29 and probability of playing = 0.64.

    Frequency and likelihood table (the classic 14-observation weather data set):

    | Weather  | No | Yes | P(Weather)  |
    |----------|----|-----|-------------|
    | Sunny    | 2  | 3   | 5/14 ≈ 0.36 |
    | Overcast | 0  | 4   | 4/14 ≈ 0.29 |
    | Rainy    | 3  | 2   | 5/14 ≈ 0.36 |
    | Total    | 5  | 9   |             |

    P(No) = 5/14 ≈ 0.36, P(Yes) = 9/14 ≈ 0.64

    • STEP 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
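The three steps can be sketched in Python. The 14 (weather, play) pairs below are the classic example data set assumed here; its counts reproduce the probabilities quoted in STEP 2 (P(Overcast) = 4/14 ≈ 0.29, P(Yes) = 9/14 ≈ 0.64):

```python
from collections import Counter

# Classic 14-observation weather/play data set (assumed here);
# its counts give P(Overcast) = 4/14 and P(Play = Yes) = 9/14.
data = ([("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2 +
        [("Overcast", "Yes")] * 4 +
        [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3)

n = len(data)
freq = Counter(data)                          # STEP 1: frequency table
class_counts = Counter(c for _, c in data)    # {'Yes': 9, 'No': 5}

# STEP 2: likelihood table -- class priors and per-class conditionals
prior = {c: class_counts[c] / n for c in class_counts}

def likelihood(w, c):
    """P(weather = w | play = c), read off the frequency table."""
    return freq[(w, c)] / class_counts[c]

# STEP 3: posterior (up to the common factor 1/P(w)) for each class;
# the class with the highest score is the prediction.
def predict(w):
    scores = {c: prior[c] * likelihood(w, c) for c in class_counts}
    return max(scores, key=scores.get)
```

For w = "Sunny" the scores are 9/14 · 3/9 ≈ 0.21 for Yes and 5/14 · 2/5 ≈ 0.14 for No, so the prediction is Yes (equivalently, P(Yes | Sunny) ≈ 0.21 / 0.36 ≈ 0.60).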

1. What Are the Basic Assumptions?

Features are independent of each other given the class (the "naive" conditional-independence assumption)

2. Advantages

  1. Works very well with a large number of features
  2. Works well with large training data sets
  3. Converges quickly during training
  4. Performs well with categorical features

3. Disadvantages

  1. Correlated features hurt performance, since the conditional-independence assumption is violated and their shared evidence is effectively counted more than once

4. Whether Feature Scaling is required?

No. Naive Bayes works with probabilities estimated from counts, not with distances between points, so feature scaling has no effect.

5. Impact of Missing Values?

Naive Bayes can handle missing data. Attributes are handled separately by the algorithm at both model construction time and prediction time. As such, if a data instance has a missing value for an attribute, that attribute can be ignored while preparing the model and ignored when a probability is calculated for a class value.
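A minimal sketch of this behavior, with hypothetical pre-computed tables (in practice the priors and likelihoods come from training counts, as in the weather example): at prediction time, an attribute whose value is missing (None) is simply skipped, so it contributes no factor to the product.

```python
import math

# Hypothetical pre-computed tables for illustration only.
prior = {"Yes": 9 / 14, "No": 5 / 14}
likelihood = {
    ("Weather", "Sunny"): {"Yes": 3 / 9, "No": 2 / 5},
    ("Humidity", "High"): {"Yes": 3 / 9, "No": 4 / 5},
}

def class_score(c, instance):
    """Log-score of class c for an instance given as {attribute: value}.
    Attributes with a missing (None) value are skipped entirely, so
    they add no term to the sum of log-probabilities."""
    score = math.log(prior[c])
    for attr, value in instance.items():
        if value is None:      # missing value -> ignore this attribute
            continue
        score += math.log(likelihood[(attr, value)][c])
    return score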

6. Impact of outliers?

It is usually robust to outliers, since a few extreme values have little effect on the estimated probabilities.

Different problem statements you can solve using Naive Bayes:

  1. Sentiment analysis
  2. Spam classification
  3. Twitter sentiment analysis
  4. Document categorization
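As an illustration of the spam-classification use case, here is a sketch using scikit-learn's MultinomialNB on a tiny made-up corpus (the messages and labels below are invented for the example):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: label 1 = spam, 0 = not spam
messages = [
    "win a free prize now",
    "limited offer, claim your free money",
    "meeting rescheduled to monday",
    "lunch tomorrow with the team",
]
labels = [1, 1, 0, 0]

# Bag-of-words counts feed the multinomial Naive Bayes model
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["claim your free prize"]))   # -> [1] (spam)
```

The same pipeline structure applies to the other problem statements above: only the documents and labels change (tweets with sentiment labels, documents with category labels, and so on).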