Gradient Boosting Regressor.
All about the Gradient Boosting Regressor and its associated interview questions.
- THEORETICAL UNDERSTANDING
- INTERVIEW TOPIC
- 1. What are the basic assumptions?
- Missing Values
- 2. Advantages of Gradient Boosting
- 3. Disadvantages of Gradient Boosting
- 4. Is feature scaling required?
- 5. Impact of outliers?
- 6. What elements are involved in the Gradient Boosting algorithm?
- 7. How can we improve the Gradient Boosting algorithm?
- Difference between the Gradient Boosting and Random Forest algorithms?
- Difference between the AdaBoost and Gradient Boosting algorithms?
THEORETICAL UNDERSTANDING
- STEP 1: Calculate the average of the target values and use it as the initial prediction for every record (store it in a new column).
- STEP 2: Calculate the residuals (the difference between the actual and predicted values) and store them in a separate column.
- STEP 3: Treat the residuals as the new target column and fit the next tree on it to predict the residuals.
- STEP 4: Update the prediction using the following formula: new prediction = previous prediction + learning rate × predicted residual.
- The model repeats steps 2–4 for as many trees as we choose to grow; a code sketch of this loop follows below.
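A minimal sketch of this loop, assuming scikit-learn's DecisionTreeRegressor as the weak learner and a small synthetic dataset; all names and hyperparameters below are illustrative.

```python
# Manual gradient boosting loop following the four steps above.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

learning_rate = 0.1   # shrinkage
n_trees = 100

# STEP 1: the initial prediction is the mean of the target.
prediction = np.full_like(y, y.mean())

trees = []
for _ in range(n_trees):
    residuals = y - prediction                 # STEP 2: residuals
    tree = DecisionTreeRegressor(max_depth=3)  # STEP 3: fit a tree on the residuals
    tree.fit(X, residuals)
    trees.append(tree)
    # STEP 4: update the prediction with the shrunken residual estimate.
    prediction = prediction + learning_rate * tree.predict(X)

print("training MSE:", np.mean((y - prediction) ** 2))
```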
INTERVIEW TOPIC
1. What are the basic assumptions?
- There are no specific assumptions; Gradient Boosting makes no assumptions about the distribution of the data.
Missing Values
- The classical Gradient Boosting algorithm cannot handle missing values directly, so they must be imputed before training (modern implementations such as XGBoost, LightGBM, and scikit-learn's HistGradientBoostingRegressor do handle them natively); see the sketch below.
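A minimal sketch of the usual workaround, assuming scikit-learn's SimpleImputer chained in front of GradientBoostingRegressor; the toy data and settings are illustrative.

```python
# Impute missing values before boosting, using a pipeline.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0], [7.0, 8.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = make_pipeline(
    SimpleImputer(strategy="median"),  # fill NaNs with column medians
    GradientBoostingRegressor(n_estimators=50, random_state=0),
)
model.fit(X, y)
print(model.predict(X))
```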
2. Advantages of Gradient Boosting
- It delivers strong predictive performance.
- It can model complex non-linear relationships (see the sketch below).
- It is versatile and performs well across a wide range of ML use cases.
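A small illustration of the non-linearity point, assuming scikit-learn; the sine-shaped toy data is made up for the example, and a plain linear model is included only for contrast.

```python
# Gradient boosting captures a non-linear relationship a linear model cannot.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) * 3 + rng.normal(scale=0.2, size=300)

linear = LinearRegression().fit(X, y)
gbr = GradientBoostingRegressor(random_state=0).fit(X, y)

print("linear R^2:", r2_score(y, linear.predict(X)))
print("gbr    R^2:", r2_score(y, gbr.predict(X)))
```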
3. Disadvantages of Gradient Boosting
- It requires a fair amount of hyperparameter tuning.
- It is prone to overfitting on noisy data.
- Training is sequential, so it can be slow on large datasets.
4. Is feature scaling required?
- It is a tree-based algorithm that splits on feature thresholds rather than distances, so feature scaling is not required (see the quick check below).
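A quick check of the scaling claim, assuming scikit-learn: fitting the same booster on raw and on standardized features should produce essentially identical predictions.

```python
# Tree splits are threshold-based, so scaling should not change the fit.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 1000, size=(200, 3))   # deliberately unscaled features
y = X[:, 0] * 0.01 + X[:, 1] * 0.5 + rng.normal(size=200)

raw = GradientBoostingRegressor(random_state=0).fit(X, y)

scaler = StandardScaler().fit(X)
scaled = GradientBoostingRegressor(random_state=0).fit(scaler.transform(X), y)

diff = np.abs(raw.predict(X) - scaled.predict(scaler.transform(X))).max()
print("max prediction difference:", diff)  # should be at or extremely near zero
```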
5. Impact of outliers?
- Decision trees themselves are fairly robust to outliers in the features: splits are defined by thresholds and by the proportion of samples falling in each split region, not by how far a point lies from the threshold. However, with a squared-error loss, successive trees chase the large residuals produced by outliers in the target, so Gradient Boosting can be sensitive to them; robust losses such as Huber or absolute error mitigate this (see the sketch below).
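A hedged sketch, assuming scikit-learn's GradientBoostingRegressor: switching to a robust loss such as "huber" limits how much a few extreme target values can pull the fit; the injected outliers are artificial.

```python
# Compare the default squared-error loss with the robust Huber loss.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.2, size=300)
y[:5] += 50.0   # inject a few extreme target outliers

squared = GradientBoostingRegressor(loss="squared_error", random_state=0).fit(X, y)
robust = GradientBoostingRegressor(loss="huber", random_state=0).fit(X, y)

x_test = np.array([[0.0]])
print("squared-error prediction at x=0:", squared.predict(x_test))
print("huber loss prediction at x=0:  ", robust.predict(x_test))
```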
6. What elements are involved in the Gradient Boosting algorithm?
- A loss function to be optimized.
- A weak learner (a decision tree) for making predictions.
- An additive model that adds weak learners one at a time to minimize the loss function (illustrated below with staged predictions).
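A minimal sketch of the additive element, assuming scikit-learn's GradientBoostingRegressor and its staged_predict method: the training error drops as weak learners are added one at a time.

```python
# Watch the additive model improve as more trees are added.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.3, size=200)

gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
gbr.fit(X, y)

# staged_predict yields the ensemble prediction after each additional tree.
for i, staged in enumerate(gbr.staged_predict(X), start=1):
    if i in (1, 10, 100):
        print(f"trees={i:3d}  training MSE={np.mean((y - staged) ** 2):.3f}")
```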
7. How can we improve the Gradient Boosting algorithm?
- Pick a lower learning rate (shrinkage), typically between 0.1 and 0.3.
- Apply tree constraints: the number of trees, tree depth, the minimum improvement in loss, and the number of observations per split.
- Lower the learning rate and proportionally increase the number of trees/estimators to obtain more robust models (see the tuning sketch after this list).
- Use penalized learning (regularizing the tree leaf weights).
- Use random sampling (stochastic gradient boosting: subsample rows and/or columns for each tree).
- Apply regularization.
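A hedged tuning sketch covering the knobs above, assuming scikit-learn's GridSearchCV; the grid values are illustrative, not recommendations.

```python
# Grid search over learning rate, number of trees, depth, and subsampling.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 3))
y = X[:, 0] ** 2 + X[:, 1] - X[:, 2] + rng.normal(scale=0.3, size=300)

param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],   # shrinkage
    "n_estimators": [100, 300],          # more trees with lower learning rate
    "max_depth": [2, 3],                 # tree constraint
    "subsample": [0.7, 1.0],             # random (stochastic) row sampling
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best params:", search.best_params_)
```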
Difference between the Gradient Boosting and Random Forest algorithms?
- GB is more prone to overfitting when the data is noisy; RF is comparatively resistant.
- GB takes longer to train because its trees are built sequentially, whereas RF trees are independent and can be built in parallel (see the sketch after this list).
- GB is harder to tune.
- GB combines weak learners sequentially into a strong prediction, while RF takes a majority vote (classification) or the average (regression) of its trees.
- RF mainly reduces variance and can retain more bias, whereas boosting explicitly reduces bias by correcting residual errors.
- RF does not use a sequential approach and does not handle unbalanced datasets as well as boosting.
- RF uses fully grown decision trees, while GB uses shallow trees as weak learners.
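A side-by-side sketch, assuming scikit-learn: RandomForestRegressor builds independent, fully grown trees (so n_jobs can parallelize them), while GradientBoostingRegressor builds shallow trees sequentially; the data is synthetic.

```python
# Bagging of deep trees (RF) versus sequential boosting of shallow trees (GB).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(scale=0.3, size=500)

# Independent fully grown trees, averaged; n_jobs=-1 fits them in parallel.
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0).fit(X, y)

# Shallow trees fitted one after another, each on the previous residuals.
gb = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0).fit(X, y)

print("RF training R^2:", rf.score(X, y))
print("GB training R^2:", gb.score(X, y))
```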
Difference between the AdaBoost and Gradient Boosting algorithms?
- GB trains each new learner to minimize the loss function (by fitting the residuals, i.e. the negative gradient, of the current ensemble), while AdaBoost trains by re-weighting observations and concentrating on those with the largest errors.
- Weak learners in AdaBoost are typically decision stumps (single-split trees), while in Gradient Boosting they are usually somewhat deeper trees (see the sketch below).
- In GB every learner contributes with the same weight (the learning rate), whereas in AdaBoost the final prediction weights each weak learner by its individual accuracy.
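A brief sketch, assuming scikit-learn: AdaBoostRegressor boosting decision stumps versus GradientBoostingRegressor with slightly deeper trees; the dataset is synthetic.

```python
# AdaBoost with stumps versus gradient boosting with depth-3 trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.3, size=300)

# AdaBoost re-weights observations; here the weak learner is a stump.
ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=1),
                        n_estimators=200, random_state=0).fit(X, y)

# Gradient boosting fits each (slightly deeper) tree to the current residuals.
gb = GradientBoostingRegressor(max_depth=3, n_estimators=200,
                               random_state=0).fit(X, y)

print("AdaBoost (stumps) R^2:", ada.score(X, y))
print("GradientBoosting  R^2:", gb.score(X, y))
```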