Calibration for Actual Probabilities

Rupak (Bob) Roy - II
5 min read · Jan 5, 2022


Using isotonic regression, logistic regression, and CalibratedClassifierCV


Hi everyone, hope you are doing awesome.

Today we have something very surprising and interesting to know about.

It's called Calibration.

We know that most machine learning classification models output a score between 0 and 1, which can be interpreted as a probability by calling predict_proba().

But there is a catch. Let's say we use a random forest, which is an ensemble method made up of a number of decision trees. The random forest's prediction is obtained by averaging the individual decision trees' results, correct?

Now suppose the right answer for some sample is a probability of zero. Some of the individual trees in the ensemble will inevitably predict slightly higher values, so taking the average pushes the random forest's overall prediction away from zero. Makes sense?

That's why we need to calibrate our probability scores. The idea of probability calibration is to build a second model (called a calibrator) that corrects the raw scores into actual probabilities.

Calibration consists of a function that transforms a 1-dimensional vector of uncalibrated probabilities into another 1-dimensional vector of calibrated probabilities.

Let's take another example. If your model predicts heart failure in the next few months, doctors will surely start acting on the people whose risk is above a 0.5 probability score. If your model is not calibrated, its outputs may mislead anyone taking action directly on them. With a calibrated model, a score of 0.8 actually means the instance has an 80% chance of being True.

The easiest way to assess the calibration of your model is through a plot called the calibration curve (a.k.a. “reliability diagram”).

“The idea is to divide the observations into bins of probability.

Thus, observations that belong to the same bin share a similar probability.

At this point, for each bin, the calibration curve compares the predicted mean (i.e. mean of the predicted probability) with the theoretical mean (i.e. mean of the observed target variable).”

calibration curve

A calibration curve with an S-shaped pattern is common for many classification models; the consequence is usually over-forecasting of low probabilities and under-forecasting of high probabilities.
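
To see this in practice, here is a minimal sketch of a reliability diagram using scikit-learn's calibration_curve. The variables y_valid and proba_valid are assumed placeholders for held-out true labels and predicted positive-class probabilities, not variables from the article's own code.

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# y_valid / proba_valid: assumed held-out labels and predicted probabilities
prob_true, prob_pred = calibration_curve(y_valid, proba_valid, n_bins=10)

plt.plot([0, 1], [0, 1], linestyle="--", label="perfectly calibrated")
plt.plot(prob_pred, prob_true, marker="o", label="model")
plt.xlabel("Mean predicted probability (per bin)")
plt.ylabel("Fraction of positives (per bin)")
plt.legend()
plt.show()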

There are 3 methods used as calibrators (a quick sketch of the first two follows the list):

  1. Logistic Regression. Logistic regression is a rare beast that actually produces calibrated probabilities out of the box, because it optimizes for log-odds, so probabilities are actually present in the model’s cost function. Used as a calibrator, this is also known as Platt scaling.
  2. Isotonic Regression. A non-parametric model that fits a non-decreasing, piecewise-constant function to the outputs.
  3. CalibratedClassifierCV. scikit-learn’s wrapper that calibrates a probability classifier with either isotonic regression or logistic (sigmoid) regression.
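
Just to make the 1-dimensional-vector-in, 1-dimensional-vector-out idea concrete, here is a tiny sketch of the first two calibrators on made-up held-out scores; the arrays below are invented for illustration, and a fuller example on the random forest follows later.

import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Made-up held-out data: uncalibrated probabilities from some base model
# and the corresponding true 0/1 labels
proba_uncal = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1])

# 1. Platt scaling: a logistic regression fit on the scores
# (classical Platt scaling uses raw decision scores; probabilities are used here for simplicity)
platt = LogisticRegression().fit(proba_uncal.reshape(-1, 1), y_true)
proba_platt = platt.predict_proba(proba_uncal.reshape(-1, 1))[:, 1]

# 2. Isotonic regression: a non-parametric, non-decreasing piecewise mapping
iso = IsotonicRegression(out_of_bounds="clip").fit(proba_uncal, y_true)
proba_iso = iso.predict(proba_uncal)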

Note that calibrating a model does not guarantee an improvement in its classification performance; metrics like accuracy, precision, and recall still play an important role and should be checked separately.

The most common types of miscalibration are:

1. Systematic overestimation. Compared to the true distribution, the distribution of predicted probabilities is pushed towards the right. This is common when you train a model on an unbalanced dataset with very few positives.

2. Systematic underestimation. Compared to the true distribution, the distribution of predicted probabilities is pushed leftward.

3. Center of the distribution is too heavy. This happens when “algorithms such as support vector machines and boosted trees tend to push predicted probabilities away from 0 and 1” (quote from «Predicting good probabilities with supervised learning»).

4. Tails of the distribution are too heavy. For instance, “Other methods such as naive Bayes have the opposite bias and tend to push predictions closer to 0 and 1” (quote from «Predicting good probabilities with supervised learning»).

Other use cases:

Ensembling: when we want to combine many probability models, whether their predictions are calibrated makes a difference.

Calibration for Actual Probabilities: Sklearn

1. random forest
2. random forest + isotonic regression
3. random forest + logistic regression
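
The following is a hedged reconstruction of step 1, using synthetic data from make_classification as a stand-in for the dataset used in the article: the data is split into a training set for the random forest, a calibration set for fitting the calibrators, and a test set for comparing all three variants.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (a stand-in, not the article's dataset)
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. plain random forest (uncalibrated)
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
proba_calib = rf.predict_proba(X_calib)[:, 1]  # held out, used to fit the calibrators
proba_test = rf.predict_proba(X_test)[:, 1]    # raw, uncalibrated test predictions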

Calibration for Actual Probabilities: Isotonic regression
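
Continuing the sketch above, the two calibrators (steps 2 and 3) are fit on the calibration set's probabilities and then applied to the test predictions.

from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# 2. random forest + isotonic regression
iso = IsotonicRegression(out_of_bounds="clip").fit(proba_calib, y_calib)
proba_test_iso = iso.predict(proba_test)

# 3. random forest + logistic regression (Platt scaling)
platt = LogisticRegression().fit(proba_calib.reshape(-1, 1), y_calib)
proba_test_platt = platt.predict_proba(proba_test.reshape(-1, 1))[:, 1]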

We need a metric to assess which one is better calibrated. Expected Calibration Error (ECE) measures how far our predicted probabilities are from the true, observed probabilities.

Calibration for Actual Probabilities: Freedman-Diaconis rule
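
The exact code behind the numbers below isn't reproduced here, but a sketch of an ECE with the number of bins chosen by the Freedman-Diaconis rule could look like this (y_true and proba are assumed to be NumPy arrays of labels and predicted probabilities):

import numpy as np

def freedman_diaconis_bins(proba):
    # Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
    iqr = np.subtract(*np.percentile(proba, [75, 25]))
    width = 2 * iqr / len(proba) ** (1 / 3)
    if width == 0:
        return 10  # fallback when the IQR collapses
    return max(1, int(np.ceil((proba.max() - proba.min()) / width)))

def expected_calibration_error(y_true, proba):
    # Weighted average gap between mean predicted probability and
    # observed positive rate, computed per bin
    n_bins = freedman_diaconis_bins(proba)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        mask = (proba >= lo) & (proba <= hi) if i == n_bins - 1 else (proba >= lo) & (proba < hi)
        if mask.sum() == 0:
            continue
        gap = abs(proba[mask].mean() - y_true[mask].mean())
        ece += mask.sum() / len(proba) * gap
    return ece

# e.g. expected_calibration_error(y_test, proba_test) for each of the three variants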

Random Forest: 0.07475200000000007

Random Forest + Isotonic Regression: 0.138570621223946

Random Forest + logistic regression: 0.11915893242619713

Now let's try it with CalibratedClassifierCV.

Calibration for Actual Probabilities: CalibratedClassifierCV
calibrated_sigmoid = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated_isotonic = CalibratedClassifierCV(model, method='isotonic', cv=5)
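
For a fuller, runnable sketch, reusing X_train, y_train, and X_test from the synthetic example above (those are assumptions, not the article's data):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=0)

# With cv=5, the base model is trained on 4 folds and the calibrator
# is fit on the held-out fold, repeated across the folds
calibrated_sigmoid = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated_sigmoid.fit(X_train, y_train)

calibrated_isotonic = CalibratedClassifierCV(model, method='isotonic', cv=5)
calibrated_isotonic.fit(X_train, y_train)

proba_sigmoid = calibrated_sigmoid.predict_proba(X_test)[:, 1]
proba_isotonic = calibrated_isotonic.predict_proba(X_test)[:, 1]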

THE END — — — — — — —


But if you find this article useful, do browse my other articles on ensemble techniques like Bagging Classifier, Voting Classifier, Stacking, and more; I guarantee you will like them too. See you soon with another interesting topic.

Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy


Have a good day. Talk Soon.


Rupak (Bob) Roy - II

Things I write about frequently on Medium: Data Science, Machine Learning, Deep Learning, NLP, and many other random topics of interest. ~ Let’s stay connected!