THEORETICAL UNDERSTANDING

  • How does the XGBoost Regressor work?

    • STEP 1: Calculate the average of the target values, use it as the initial prediction, and store it in a new column.
    • STEP 2: Calculate the residual (the difference between the actual and the predicted value) and store it in a separate column.
    • STEP 3: Build an XGBoost tree on the residuals and split it using the tree-splitting criterion.
    • STEP 4: Calculate the Similarity Score of each node, given by:
    • Similarity Score = (sum of residuals)² / (number of residuals + λ)
    • Here lambda (λ) is a regularization parameter that controls overfitting: it raises the bar for auto-pruning and reduces the effect of outliers.
    • STEP 5: Calculate the gain of each split, given by:
    • Gain = Similarity Score(left node) + Similarity Score(right node) − Similarity Score(root node)
    • STEP 6: Calculate the new prediction using the following formula:
    • New prediction = previous prediction + learning rate × output value of the leaf, where output value = (sum of residuals in the leaf) / (number of residuals + λ)
    • STEP 7: Calculate the new residual using the following formula:
    • New residual = actual value − new prediction
    • Iterate these steps again and again until the stopping criterion is reached.
    • We also set a gamma value that defines how aggressively the tree is pruned: a higher gamma means more aggressive pruning, a lower gamma means less aggressive pruning.
    • A branch is pruned whenever its gain is less than gamma (i.e., gain − gamma < 0). A toy numeric sketch of these steps is given below.
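
A toy numeric walkthrough of the regressor steps in plain Python (the data, lambda, gamma, and learning-rate values are illustrative assumptions, not values from any real dataset):

```python
# Toy walkthrough of the XGBoost regressor steps on a tiny dataset.
y = [10.0, 20.0, 30.0, 40.0]            # actual target values
lam, gamma, lr = 1.0, 0.5, 0.3          # illustrative lambda, gamma, learning rate

# STEP 1: initial prediction = mean of the target
pred = [sum(y) / len(y)] * len(y)       # 25.0 for every row

# STEP 2: residuals = actual - predicted
res = [a - p for a, p in zip(y, pred)]  # [-15, -5, 5, 15]

def similarity(residuals, lam):
    # STEP 4: (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def output_value(residuals, lam):
    # leaf output = (sum of residuals) / (number of residuals + lambda)
    return sum(residuals) / (len(residuals) + lam)

# STEP 3/5: split the residuals into a left and a right leaf and compute the gain
left, right, root = res[:2], res[2:], res
gain = similarity(left, lam) + similarity(right, lam) - similarity(root, lam)
keep_branch = gain - gamma > 0          # pruning: cut the branch when gain < gamma

# STEP 6: new prediction = previous prediction + learning rate * leaf output value
new_pred = [p + lr * output_value(left if i < 2 else right, lam)
            for i, p in enumerate(pred)]

# STEP 7: new residuals; repeat until the stopping criterion is reached
new_res = [a - p for a, p in zip(y, new_pred)]
print(gain, keep_branch, new_pred, new_res)
```
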
    • How does the XGBoost Binary Classifier work?

      • STEP 1: Set the initial predicted probability to 0.5, the average of the two labels ((0 + 1) / 2), since the target column contains only 0s and 1s.
      • STEP 2: Calculate the residual (actual − predicted, where the actual value is 0 or 1 and the predicted value is the probability from STEP 1) and store it in a separate column.
      • STEP 3: Build an XGBoost tree on the residuals, split it using the tree-splitting criterion, and write down the residuals that fall on each side of the split.
      • STEP 4: Calculate the Similarity Score of each node, given by:
      • Similarity Score = (sum of residuals)² / (Σ previous probability × (1 − previous probability) + λ)
      • STEP 5: Calculate the gain of each split, given by:
      • Gain = Similarity Score(left node) + Similarity Score(right node) − Similarity Score(root node)
      • STEP 6: Convert the base model probability into log(odds) using the following formula:
      • log(odds) = log(probability / (1 − probability)); for the initial probability of 0.5 this is 0.
      • STEP 7: Calculate the new probability using the following formula:
      • new log(odds) = previous log(odds) + learning rate × output value of the leaf; new probability = e^(log(odds)) / (1 + e^(log(odds)))
      • STEP 8: Calculate the new residuals and repeat the above steps until the residuals become very small.
      • STEP 9: For post-pruning to control overfitting, calculate the cover value of each leaf using the formula below; leaves whose cover falls below the minimum cover threshold are removed, which is called post-pruning.
      • Cover = Σ previous probability × (1 − previous probability) (for regression, the cover is simply the number of residuals in the leaf). A toy numeric sketch of the classifier steps is given below.
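
A similar plain-Python sketch of the binary-classifier steps (the labels, lambda, and learning rate below are illustrative assumptions):

```python
import math

# Toy walkthrough of the XGBoost binary-classifier steps.
y = [0, 1, 1, 0, 1]                     # actual labels
lam, lr = 1.0, 0.3                      # illustrative lambda and learning rate

# STEP 1: initial predicted probability = (0 + 1) / 2 = 0.5
prob = [0.5] * len(y)

# STEP 2: residuals = actual label - predicted probability
res = [a - p for a, p in zip(y, prob)]

def similarity(residuals, probs, lam):
    # STEP 4: (sum residuals)^2 / (sum p*(1-p) + lambda)
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in probs) + lam)

def leaf_output(residuals, probs, lam):
    # leaf output for classification trees
    return sum(residuals) / (sum(p * (1 - p) for p in probs) + lam)

def cover(probs):
    # STEP 9: cover = sum of previous_probability * (1 - previous_probability)
    return sum(p * (1 - p) for p in probs)

# STEP 3/5: one split (first two rows left, rest right) and its gain
gain = (similarity(res[:2], prob[:2], lam)
        + similarity(res[2:], prob[2:], lam)
        - similarity(res, prob, lam))

# STEP 6: base model probability expressed as log(odds); for p = 0.5 this is 0
log_odds = [math.log(p / (1 - p)) for p in prob]

# STEP 7: new log(odds) -> new probability via the sigmoid
new_log_odds = [lo + lr * leaf_output(res[:2] if i < 2 else res[2:],
                                      prob[:2] if i < 2 else prob[2:], lam)
                for i, lo in enumerate(log_odds)]
new_prob = [math.exp(lo) / (1 + math.exp(lo)) for lo in new_log_odds]

# STEP 8: new residuals; repeat until they become very small
new_res = [a - p for a, p in zip(y, new_prob)]
print(round(gain, 3), [round(p, 3) for p in new_prob], cover(prob))
```
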

INTERVIEW TOPICS

  • What are the basic assumptions?

    • There are no such assumptions.
  • Why is XGBoost fast?

    • It uses parallelization to speed up training: it uses all the processing power of your machine, and in a distributed setup it utilizes the maximum computational power available across the cluster.
    • Cache optimization is also performed in XGBoost.
    • Out-of-core computation is also supported, so data that does not fit in memory can still be processed (see the configuration sketch below).
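
A hedged configuration sketch: in the xgboost Python package, n_jobs and tree_method are the usual knobs for parallel and histogram-based training (the values shown are illustrative, not recommendations):

```python
from xgboost import XGBRegressor

# Use all CPU cores and the histogram-based tree method, which bins
# continuous features so split finding is much faster on large data.
model = XGBRegressor(
    n_jobs=-1,            # parallelize training across all available cores
    tree_method="hist",   # histogram binning for faster split finding
    n_estimators=200,
    learning_rate=0.1,
)
```
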
  • Features of XGBoost

    • It is a framework that supports multiple languages such as Python, R, Scala, Java, C++, etc.
    • It is portable, i.e., you can run it on Windows, macOS, and Linux.
    • It is integrable with every major cloud platform.
  • Why is the performance of XGBoost good?

    • Thanks to its regularization parameter (lambda) it handles overfitting very easily (see the parameter sketch after this list).
    • Auto-pruning happens automatically in XGBoost, controlled by gamma.
    • Missing values are handled automatically in XGBoost.
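
A minimal sketch of the regularization and pruning parameters in the xgboost scikit-learn API (the values are illustrative assumptions):

```python
from xgboost import XGBRegressor

# reg_lambda is the L2 regularization term (the lambda in the similarity score);
# gamma is the minimum gain a split must achieve to be kept (auto-pruning).
model = XGBRegressor(
    reg_lambda=1.0,   # lambda: shrinks similarity scores, reduces overfitting
    gamma=0.5,        # branches with gain below this value are pruned
    max_depth=4,
)
```
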
  • Missing Values

    • XGBoost has an in-built capability to handle missing values.
    • XGBoost supports missing values by default. In tree boosters, the branch direction for missing values is learned during training (the gblinear booster instead treats missing values as zeros). At training time XGBoost decides whether missing values should fall into the right or the left node, choosing the direction that minimizes the loss. If there are no missing values at training time, the tree sends any new missing values to the right node by default (a minimal usage sketch follows).
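
A minimal sketch showing that NaNs can be passed straight to a tree-based XGBoost model; the toy data below is made up for illustration:

```python
import numpy as np
from xgboost import XGBClassifier

# Toy data containing missing values; no imputation is applied.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [5.0, 6.0]])
y = np.array([0, 1, 0, 1])

# Tree boosters learn a default branch direction for missing values,
# so NaNs can be fed directly to fit() and predict().
model = XGBClassifier(n_estimators=10).fit(X, y)
print(model.predict(np.array([[np.nan, 2.5]])))
```
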
  • Advantages of XGBoost

    • It has great performance.
    • It can model complex non-linear functions.
    • It performs well on a wide variety of ML use cases.
    • XGB offers a large number of hyper-parameters that can be tuned, which is a primary advantage over plain gradient boosting machines.
    • XGBoost has an in-built capability to handle missing values.
    • It provides various intuitive features, such as parallelisation, distributed computing, cache optimisation, and more.
  • Disadvantages of XGBoost

    • It requires some amount of parameter tuning.
    • Like any other boosting method, XGB is sensitive to outliers.
    • Unlike LightGBM, in XGB one has to manually create dummy variables or label encodings for categorical features before feeding them into the model (see the encoding sketch below).
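
A sketch of the manual encoding step using pandas get_dummies (the column names and values are hypothetical):

```python
import pandas as pd
from xgboost import XGBClassifier

# Hypothetical dataframe with a categorical column that is one-hot encoded
# before being passed to XGBoost.
df = pd.DataFrame({
    "city": ["delhi", "mumbai", "delhi", "pune"],
    "income": [40, 55, 38, 62],
    "target": [0, 1, 0, 1],
})
X = pd.get_dummies(df.drop(columns="target"), columns=["city"])
y = df["target"]

model = XGBClassifier(n_estimators=10).fit(X, y)
```
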
  • Is feature scaling required?

    • It is a rule-based (tree-based) algorithm and does not rely on distance measures, so feature scaling is not required.
  • Differences between XGBoost and LightGBM.

    • XGBoost and LightGBM are packages belonging to the family of gradient boosting decision trees (GBDTs).

    • Traditionally, XGBoost is slower than LightGBM, but it achieves faster training through its histogram binning process.
    • LightGBM is a newer tool than XGBoost, so it has a smaller user base and less documentation.
  • What is the difference between AdaBoost and XGBoost?

    • XGBoost is flexible compared to AdaBoost as XGB is a generic algorithm to find approximate solutions to the additive modeling problem, while AdaBoost can be seen as a special case with a particular loss function.

    • Unlike XGB, AdaBoost can be implemented without reference to gradients, by reweighting the training samples based on the classifications of previous learners.

  • Impact of outliers?

    • XGBoost is relatively robust to outliers.
    • XGBoost can often handle outliers and still produce accurate predictions, because gradient boosting with a small learning rate and regularization builds the model in many small steps, which limits the influence of individual noisy points.
  • Performance Metrics

    • Classification

      • Confusion Matrix
      • Precision, Recall, F1 score
    • Regression

      • R², Adjusted R²
      • MSE, RMSE, MAE
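
A sketch computing the listed metrics with scikit-learn (the labels and predictions are placeholders):

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, r2_score, mean_squared_error,
                             mean_absolute_error)

# Classification metrics on placeholder labels and predictions
y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))

# Regression metrics on placeholder values (RMSE is the square root of MSE;
# adjusted R^2 can be derived from r2_score, the sample count and feature count)
yr_true, yr_pred = [3.0, 5.0, 7.0], [2.5, 5.5, 6.0]
mse = mean_squared_error(yr_true, yr_pred)
print(r2_score(yr_true, yr_pred), mse, np.sqrt(mse),
      mean_absolute_error(yr_true, yr_pred))
```
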