
What is boosting in machine learning?

July 16, 2022

By Mona Eslamijam

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

We train machine learning models to predict values such as the weather, stock prices, the class of an image, or the sentiment of a social media post. Often, however, these models fail to meet the performance levels that we expect of them.

There are several ways to improve the accuracy of machine learning models. One popular method is “boosting,” an ensemble learning technique that brings together several ML models that perform poorly on their own but become stronger when combined.

Weak learners and strong learners

Before we get into boosting, it is worth visiting the concept of “weak” and “strong” learners. Weak learners are ML models that perform poorly, sometimes only slightly better than random guessing. There can be several reasons for an ML model becoming a weak learner. For example, there might not be enough training data or the model may not be complex enough.

In contrast, a strong learner makes mostly correct predictions with high confidence (the desired accuracy and confidence may vary depending on the application). Our goal in machine learning is to create strong learners.

Boosting takes several weak learners and combines them to create a strong learning system. Here, we’ll discuss some of the popular boosting methods.

Boosting is closely related to “bagging,” another ensemble method. Bagging (short for “bootstrap aggregating”) trains several weak learners on different bootstrap samples drawn from the training data (bootstrap samples are random samples taken with replacement). Because each learner sees a different sample, the ML models learn different patterns. After training, when the bagging model is presented with a new input, it runs the input through all the weak learners and uses majority voting to make a final prediction. In a classification problem, the bagging model chooses the class that receives the most votes from the weak learners.
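The bagging procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the toy one-dimensional dataset, the decision-stump weak learner, and the helper names (bootstrap_sample, train_stump, bagging_predict) are all assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples at random WITH replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Fit a decision stump (one threshold, one polarity) on (x, label) pairs.

    Every observed x is tried as a threshold with both polarities, and the
    combination with the fewest training errors is kept.
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for polarity in (1, -1):
            errors = sum(
                1 for x, y in sample
                if (polarity if x >= threshold else -polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else -polarity


def bagging_predict(learners, x):
    """Return the class that receives the most votes from the learners."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Hypothetical toy dataset: the label is +1 when x >= 5, otherwise -1.
data = [(x, 1 if x >= 5 else -1) for x in range(10)]

# Each learner is trained independently on its own bootstrap sample.
learners = [train_stump(bootstrap_sample(data, rng)) for _ in range(11)]

predictions = [bagging_predict(learners, x) for x, _ in data]
accuracy = sum(p == y for p, (_, y) in zip(predictions, data)) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Note that the learners never see each other's mistakes. Each one is trained in isolation, which is the key difference from boosting.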

Boosting is like bagging, with the difference that it trains a sequence of weak learners, each of which tries to correct the mistakes of its predecessors.

Boosting

Like bagging, the original boosting algorithm trains a series of weak learners on samples drawn from the training dataset. However, unlike bagging, it draws its samples “without replacement.” This means that the same example can’t be drawn twice from the training dataset when gathering a sample.

The weak learners are trained sequentially. First, the boosting algorithm draws a subset of training examples from the training dataset and trains a weak learner on them. The ML model will correctly classify some examples and misclassify others.

The algorithm then draws a second set of samples (without replacement) to train the second ML model. But this time, it also adds 50 percent of the examples that were misclassified by the first weak learner.

The boosting algorithm selects the examples that the first and second learners disagree on to train the third learner.

When all three learners are trained, the boosting model makes predictions by using majority voting, like bagging models. Compared with bagging, which mainly reduces variance, boosting can reduce both bias and variance in ML models.
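The three-learner procedure above can be sketched as follows. This is one possible reading of the steps, under several assumptions: the weak learner is a decision stump, the “50 percent” of misclassified examples is picked at random, the disagreement set is computed over the full training dataset, and the third learner falls back to the second when the first two never disagree. The function names and the toy dataset are hypothetical.

```python
import random


def train_stump(sample):
    """Fit a decision stump (one threshold, one polarity) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for polarity in (1, -1):
            errors = sum(
                1 for x, y in sample
                if (polarity if x >= threshold else -polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else -polarity


def original_boosting(data, subset_size, rng):
    # Step 1: draw a subset WITHOUT replacement and train the first learner.
    d1 = rng.sample(data, subset_size)
    c1 = train_stump(d1)

    # Step 2: draw a second subset without replacement and add half of the
    # examples that the first learner misclassified (chosen at random here).
    missed = [(x, y) for x, y in data if c1(x) != y]
    d2 = rng.sample(data, subset_size) + rng.sample(missed, len(missed) // 2)
    c2 = train_stump(d2)

    # Step 3: train the third learner on examples where c1 and c2 disagree.
    d3 = [(x, y) for x, y in data if c1(x) != c2(x)]
    c3 = train_stump(d3) if d3 else c2  # assumption: reuse c2 if no disagreement

    # Final prediction: majority vote of the three learners (labels are +1/-1,
    # so the sum of three votes is never zero).
    def predict(x):
        return 1 if c1(x) + c2(x) + c3(x) > 0 else -1

    return predict


rng = random.Random(42)
# Hypothetical toy dataset that a single stump cannot fit: +1 only for 3 <= x < 8.
data = [(x, 1 if 3 <= x < 8 else -1) for x in range(12)]

model = original_boosting(data, subset_size=6, rng=rng)
predictions = [model(x) for x, _ in data]
```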

AdaBoost (adaptive boosting)

AdaBoost is a variation of the original boosting algorithm that assigns weights to the training examples. In AdaBoost, the first learner is trained on the entire training dataset and all examples are assigned equal weights.

Since the ML model is a weak learner, it will misclassify some of the examples after training. The weights of those examples are increased, and the weights of the correctly classified examples are decreased. The second learner is trained on the new weights, with more emphasis on the misclassified examples.

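The same self-correction process is repeated for the third (and later) models. Once all the models are trained, classification is done through weighted majority voting: each learner’s vote counts in proportion to how accurate that learner was on the weighted training data.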
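The reweighting loop can be made concrete with a small sketch in plain Python. It follows the common textbook formulation of AdaBoost for labels in {+1, -1}, where each learner receives a coefficient alpha = 0.5 * ln((1 - error) / error). The decision-stump learner, the function names, and the ten-point toy dataset are assumptions made for illustration.

```python
import math


def train_weighted_stump(data, weights):
    """Return (weighted_error, stump) for the stump with the lowest weighted error."""
    best = None
    for threshold in sorted({x for x, _ in data}):
        for polarity in (1, -1):
            error = sum(
                w for (x, y), w in zip(data, weights)
                if (polarity if x >= threshold else -polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    error, threshold, polarity = best
    return error, (lambda x: polarity if x >= threshold else -polarity)


def adaboost(data, rounds):
    n = len(data)
    weights = [1.0 / n] * n  # all examples start with equal weight
    ensemble = []
    for _ in range(rounds):
        error, stump = train_weighted_stump(data, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this learner's vote strength
        ensemble.append((alpha, stump))
        # Misclassified examples (y * stump(x) == -1) gain weight,
        # correctly classified ones lose weight.
        weights = [
            w * math.exp(-alpha * y * stump(x))
            for (x, y), w in zip(data, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to sum to 1

    def predict(x):
        # Weighted majority vote over all learners.
        score = sum(alpha * stump(x) for alpha, stump in ensemble)
        return 1 if score >= 0 else -1

    return predict


# Hypothetical toy dataset that no single stump can classify perfectly.
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
data = list(enumerate(labels))

one_round = adaboost(data, rounds=1)
three_rounds = adaboost(data, rounds=3)
correct_one = sum(one_round(x) == y for x, y in data)
correct_three = sum(three_rounds(x) == y for x, y in data)
print(correct_one, correct_three)
```

On this toy dataset, a single stump classifies 7 of the 10 training examples correctly, while the weighted vote of three stumps classifies all 10 correctly.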

Gradient boosting

Gradient boosting is another variant of the boosting algorithm that, like AdaBoost, trains the learners iteratively. Each new learner tries to overcome the shortcomings of the previous one.

However, instead of reweighting misclassified examples, gradient boosting uses prediction errors to guide each new learner. It defines a differentiable loss function that measures the distance between the ensemble’s predictions and the ground truth, and it calculates the gradient of that loss for each training example. Each new learner is then trained to predict the negative gradients (for squared-error loss, these are simply the residuals), and its output is added to the ensemble, usually scaled by a learning rate. Because it works with any differentiable loss function, gradient boosting is more flexible than AdaBoost and applies to regression as well as classification.
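A minimal sketch of this idea for regression with squared-error loss is shown below, where the negative gradient is simply the residual (target minus current prediction). The regression-stump learner, the function names, the learning rate of 0.5, and the toy dataset are assumptions chosen for illustration.

```python
def fit_regression_stump(xs, targets):
    """Least-squares stump: one split, predicting the mean target on each side."""
    best = None
    # Skip the smallest x so that the left side of the split is never empty.
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def gradient_boost(xs, ys, rounds, learning_rate):
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(rounds):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p is (y - p).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # Add the new learner's output, scaled by the learning rate.
        predictions = [
            p + learning_rate * stump(x) for x, p in zip(xs, predictions)
        ]

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy regression problem: y = x^2.
xs = list(range(10))
ys = [x * x for x in xs]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
model = gradient_boost(xs, ys, rounds=20, learning_rate=0.5)
boosted_mse = mse(model, xs, ys)
print(baseline_mse, boosted_mse)
```

Each round fits a stump to what the ensemble still gets wrong, so the training error shrinks as learners are added. A smaller learning rate slows that progress but is a common way to reduce overfitting.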

The downside of gradient boosting is that it can become inefficient and computationally expensive as your training dataset grows. It is also prone to overfitting if not configured properly. This is one reason why AdaBoost remains popular.

XGBoost (Extreme Gradient Boosting)

XGBoost is an implementation of gradient boosting with several performance enhancements. It takes advantage of parallel computing, distributed computing, cache optimization, and out-of-core processing to handle large datasets efficiently.

XGBoost also uses implementation tricks to make the processing of large datasets more efficient. For example, it divides the dataset into blocks that can be processed on separate machines at the same time.

XGBoost has become very popular in applied machine learning and Kaggle competitions because of its performance advantages.

For more information, you can refer to Machine Learning with PyTorch and Scikit-Learn, a good introductory book on machine learning that also has a chapter on boosting methods in machine learning.

About the author

Mona Eslamijam is a business analytics (MSc) graduate from the University of Texas at Dallas.
