Photo by Willian Justen de Vasconcellos on Unsplash

Using XGBoost

Jeffrey Ng
2 min read · Sep 26, 2020


XGBoost stands for extreme gradient boosting. It is an extremely powerful model that has won so many data science competitions that it has become something of an inside joke. In this blog I explore the model in more depth, so that I can understand it well enough to employ it further in my machine learning arsenal.

Extreme gradient boosting, or XGBoost, is a library in and of itself. It is an ensemble technique that arrives at its final model through an iterative approach. Gradient boosting starts from an initial model, computes its residuals (errors), and then trains a new model on those residuals to correct them. This process repeats again and again until the residuals are minimized.
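To make the residual-fitting idea concrete, here is a minimal sketch of the boosting loop. It uses scikit-learn decision trees and made-up toy data purely for illustration; it is not how XGBoost is implemented internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data, just to have something to fit.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1            # plays the role of eta
prediction = np.zeros_like(y)  # start from a trivial (all-zero) model
trees = []

for _ in range(100):                      # boosting rounds
    residuals = y - prediction            # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)                # fit the next tree to the residuals
    prediction += learning_rate * tree.predict(X)  # add its shrunken correction
    trees.append(tree)
```

Each round the ensemble's errors shrink a little, which is exactly the behavior the hyperparameters below let you control.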

General Parameters

The first hyperparameter in XGBoost is called booster. Tree-based models are used unless you pass ‘gblinear’, which boosts linear functions instead.
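As a quick illustration (the objective and values here are just placeholders), the booster is chosen like any other key in the parameter dictionary:

```python
import xgboost as xgb

# 'gbtree' (the default) boosts an ensemble of decision trees;
# 'gblinear' boosts linear functions instead.
tree_params = {"booster": "gbtree", "objective": "reg:squarederror"}
linear_params = {"booster": "gblinear", "objective": "reg:squarederror"}
```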

Verbosity simply controls how many messages the model reports back to the user.

I will go over some important parameters. The following parameters are for tree-based models.

eta (aka learning rate): shrinks each new tree’s contribution; it controls the step size the model takes in its gradient descent toward minimizing the residuals.

gamma: the minimum loss reduction required for a decision node to be split. The larger the gamma, the more conservative the model.

max_depth: the maximum number of layers (splits) in a tree. The default is 6.

min_child_weight: the minimum sum of instance weight required in a child node; a split is not made if it would produce a child leaf that is too small.

sampling_method: either uniform (the default), where rows are sampled with equal probability, or gradient_based, where the sample the ensemble retrains on is weighted by the gradients (residuals) rather than drawn uniformly.

lambda: L2 regularization. Increasing its value makes the model more conservative.

alpha: L1 regularization. Increasing its value makes the model more conservative.

scale_pos_weight: very important to tune if there is a class imbalance in your data.

process_type: can be default or ‘update’. With update, each iteration starts from the trees of the existing model and updates them rather than building new ones.

num_parallel_tree: lets gradient boosting construct several trees in parallel per boosting round, giving a random forest effect on your model.

These are what I consider the most important hyperparameters to know about; a sketch tying them together follows below.
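Here is a hedged sketch of how these hyperparameters might be passed to xgboost.train. The data is fabricated and the values are illustrative starting points, not recommendations.

```python
import numpy as np
import xgboost as xgb

# Toy imbalanced binary-classification data, only to make the example runnable.
rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 10))
y = (rng.uniform(size=1000) < 0.1).astype(int)   # roughly 10% positive class

# scale_pos_weight is commonly set to (number of negatives) / (number of positives).
scale_pos_weight = (y == 0).sum() / max((y == 1).sum(), 1)

params = {
    "booster": "gbtree",
    "objective": "binary:logistic",
    "eta": 0.1,                  # learning rate: shrinkage applied to each tree
    "gamma": 1.0,                # minimum loss reduction required to split a node
    "max_depth": 6,              # maximum tree depth (default is 6)
    "min_child_weight": 5,       # minimum sum of instance weight in a child
    "lambda": 1.0,               # L2 regularization
    "alpha": 0.0,                # L1 regularization
    "scale_pos_weight": scale_pos_weight,  # compensate for class imbalance
    "num_parallel_tree": 1,      # >1 builds several trees per round (random forest effect)
    "verbosity": 1,              # how much the model reports back
}

dtrain = xgb.DMatrix(X, label=y)
model = xgb.train(params, dtrain, num_boost_round=200)
```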

I will certainly tune these parameters the next time I use XGBoost. Most of the information gathered for this blog can also be found in the XGBoost documentation: xgboost.readthedocs.io
