THEORETICAL UNDERSTANDING

  • Boosting Algorithm

    • It is a sequential learning process in which each new model depends on the output of the previous model (a minimal sketch follows these points).
    • Not all models contribute equally: each model's say is weighted by its performance. Records that a model predicts incorrectly receive a higher weight (importance), while correctly predicted records receive a lower weight, so the next model focuses on the mistakes.
    • Here the decision trees are called stumps (one root node and two leaf nodes).
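
To see the sequential idea in code before the step-by-step walkthrough, here is a minimal Python sketch. The dataset, the three rounds, and the simple "double the weight" update are placeholder assumptions of mine; the exact weight formulas appear in the steps below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
weights = np.full(len(y), 1 / len(y))             # every record starts equal

stumps = []
for _ in range(3):                                # each round depends on the last
    stump = DecisionTreeClassifier(max_depth=1)   # a stump: one root, two leaves
    stump.fit(X, y, sample_weight=weights)
    mistakes = stump.predict(X) != y
    weights[mistakes] *= 2.0                      # placeholder boost for mistakes
    weights /= weights.sum()                      # keep the weights summing to 1
    stumps.append(stump)
```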
  • How does the AdaBoost algorithm work?

    • STEP 1: Assign equal weights to all records by giving each a weight of 1/N, where N is the total number of records, so that the weights sum to 1.
    • STEP 2: A decision tree STUMP is created by the usual tree-building process, which I have described in my previous post on decision trees LINK.
    • STEP 3: If the STUMP created in the previous step predicts some records incorrectly, we give those records a higher weight so that they get corrected in the next iteration. This is done in the following steps:
      1. Calculate the Total Error (TE), which is the sum of the weights of the records that the STUMP misclassified.
      2. Calculate the Performance (p) of the STUMP using the following formula:
        • p = ½ × ln((1 − TE) / TE)
      3. Update the weights of the incorrectly classified records using the following formula, which gives them a higher weight:
        • new weight = old weight × e^p
      4. Update the weights of the correctly classified records using the following formula, which gives them a lower weight:
        • new weight = old weight × e^(−p)
    • STEP 4: Normalize the updated weights by dividing each one by the sum of all updated weights, so that the SUMMATION OF ALL WEIGHTS is again 1:
    • normalized weight = updated weight / (sum of all updated weights)

    • STEP 5: Create a new dataset by sampling from the old one according to the normalized weights: the weights form buckets, and because incorrectly classified records have larger weights (wider buckets), they are picked more often and populate the new dataset. The whole process then repeats from STEP 1 on this new dataset. A worked example of one full round follows this list.
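
To make the arithmetic concrete, here is a minimal Python sketch of one round of the steps above. The 5-record dataset and the single misclassified record are made-up assumptions for illustration; only the formulas come from the steps themselves.

```python
import numpy as np

N = 5
w = np.full(N, 1 / N)                       # STEP 1: equal weights, sum = 1
incorrect = np.array([False, False, False, True, False])  # assume record 4 is wrong

TE = w[incorrect].sum()                     # Total Error = 0.2
p = 0.5 * np.log((1 - TE) / TE)             # Performance ≈ 0.693

w[incorrect] *= np.exp(p)                   # wrong record:  0.2 -> 0.4
w[~incorrect] *= np.exp(-p)                 # right records: 0.2 -> 0.1

w /= w.sum()                                # STEP 4: normalize back to sum = 1
print(w)                                    # [0.125 0.125 0.125 0.5 0.125]

# STEP 5: cumulative buckets for resampling; wider buckets are hit more often
buckets = np.cumsum(w)
rng = np.random.default_rng(0)
new_dataset_rows = np.searchsorted(buckets, rng.random(N))
print(new_dataset_rows)                     # indices of records in the new dataset
```

Note how the misclassified record ends up with half of the total weight, so it is very likely to appear (possibly several times) in the next round's dataset.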

INTERVIEW TOPICS

  • 1. What are the basic assumptions?

    • There are no such assumptions.
  • Missing Values

    • AdaBoost can handle missing values.
  • 2. Advantages of AdaBoost

    • It is less prone to overfitting than many other algorithms.
    • It has only a few parameters to tune (see the sketch below).
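
As an illustration of that short tuning surface, the scikit-learn implementation exposes essentially two knobs; the dataset and values below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

model = AdaBoostClassifier(
    n_estimators=100,     # how many stumps to train sequentially
    learning_rate=1.0,    # how much say each stump gets
    random_state=0,
)
print(cross_val_score(model, X, y, cv=5).mean())
```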
  • 3. Whether feature scaling is required?

    • No. Since it is a rule-based (tree-based) model and no distance calculations are involved, feature scaling is not needed.
  • 4. Impact of outliers?

    • AdaBoost is sensitive to outliers, because a misclassified outlier keeps receiving an exponentially higher weight in every iteration.
  • Types of Problems it can solve (Supervised)

    • Classification
    • Regression
  • Performance Metrics

    • Classification

      • Confusion Matrix
      • Precision, Recall, F1 score
    • Regression

      • R2, Adjusted R2
      • MSE, RMSE, MAE
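
A quick sketch of computing the listed metrics with scikit-learn; the toy arrays are made up, and since Adjusted R2 has no built-in helper it is computed from the usual formula.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, r2_score, mean_squared_error,
                             mean_absolute_error)

# classification metrics on toy labels
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))

# regression metrics on toy values
yt = np.array([3.0, 5.0, 2.5, 7.0])
yp = np.array([2.8, 5.1, 3.0, 6.5])
mse = mean_squared_error(yt, yp)
print(r2_score(yt, yp), mse, np.sqrt(mse), mean_absolute_error(yt, yp))

# Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), with n samples and k features
n, k = len(yt), 1
print(1 - (1 - r2_score(yt, yp)) * (n - 1) / (n - k - 1))
```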