Sklearn PolynomialFeatures ~ Feature Engineering

Rupak (Bob) Roy - II
3 min read · Mar 25, 2022


A great way to automate feature creation and improve our machine learning model accuracy

Hi there, how are things? I hope all is well. I came across an interesting topic: feature engineering using our favorite sklearn package.

Sklearn provides a powerful transformer that creates new features as combinations and interactions of the existing ones. The goal of such transformations is to enrich the set of input features so the model can capture relationships it would otherwise miss, which can improve accuracy.

If you can remember, back in the stone age of machine learning model building we used to manually create combinations of features, recursively, hoping for better model accuracy. It was a very tedious and painful experience. This is where sklearn's ‘PolynomialFeatures’ comes to the rescue.

Polynomial features are created by raising existing features to an exponent and by multiplying features together (interaction terms).

Let’s understand this with the help of an example:

Simple Polynomial Features

#The “degree” argument controls the maximum degree of the features created and defaults to 2.
#The “interaction_only” argument, when set to True, keeps only the raw values (degree 1) and the interaction terms (pairs of distinct features multiplied together); it defaults to False.
#The “include_bias” argument defaults to True and adds a bias column of ones.
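As a minimal sketch of these arguments in action (the tiny two-feature array below is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# a tiny made-up sample with two features, x1=2 and x2=3
X = np.array([[2, 3]])

# defaults: degree=2, interaction_only=False, include_bias=True
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)  # columns: 1, x1, x2, x1^2, x1*x2, x2^2 -> [[1. 2. 3. 4. 6. 9.]]

# interaction_only=True keeps only raw values and interaction terms
poly_int = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = poly_int.fit_transform(X)
print(X_int)  # columns: x1, x2, x1*x2 -> [[2. 3. 6.]]
```

Note how the bias column of ones appears first in the default output, followed by the raw values, squares, and the interaction term.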

Now let's try this with a dataset.

Without Polynomial Features

Accuracy: 0.797 (0.073)
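A baseline evaluation along these lines could look like the following sketch. The synthetic dataset (via make_classification) and the KNeighborsClassifier are assumptions standing in for the article's actual data and model, so the printed score will differ from the 0.797 above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in shaped like the article's data: 208 rows, 60 features
X, y = make_classification(n_samples=208, n_features=60, random_state=1)

# evaluate the raw features with repeated stratified 10-fold cross-validation
model = KNeighborsClassifier()
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %.3f (%.3f)' % (np.mean(scores), np.std(scores)))
```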

With Polynomial Features

(208, 39711). Clearly, the transform has expanded the data from 61 features to 39,711 features.
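A sketch of the transform that produces that shape, again on a synthetic 208 x 60 stand-in (the article's real dataset isn't reproduced here). With 60 inputs, a degree-3 transform with bias yields 39,711 output columns:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import PolynomialFeatures

# synthetic stand-in with the same shape as the article's data
X, y = make_classification(n_samples=208, n_features=60, random_state=1)

# a degree-3 polynomial feature transform
trans = PolynomialFeatures(degree=3)
X_poly = trans.fit_transform(X)
print(X_poly.shape)  # (208, 39711)
```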

Evaluate Model Performance

Accuracy: 0.800 (0.077). The accuracy improved, if only slightly.
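Chaining the transform and the model in a Pipeline keeps the cross-validation honest, since the transform is fit only on each training fold. As before, the synthetic dataset and the KNeighborsClassifier are assumptions, so the printed score will differ from the 0.800 above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# synthetic stand-in: 208 rows, 60 features
X, y = make_classification(n_samples=208, n_features=60, random_state=1)

# polynomial feature transform followed by the classifier
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('model', KNeighborsClassifier()),
])
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %.3f (%.3f)' % (np.mean(scores), np.std(scores)))
```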

We can also examine the effect of the polynomial degree on the number of features created.

Effect of Polynomial Degree

Degree: 1, Features: 61
Degree: 2, Features: 1891
Degree: 3, Features: 39711
Degree: 4, Features: 635376
Degree: 5, Features: 8259888
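The counts above can be reproduced by fitting the transform at each degree on any 60-feature input. Fitting alone exposes n_output_features_ without materializing the (very large) transformed matrix:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import PolynomialFeatures

# any dataset with 60 input features gives the same counts
X, y = make_classification(n_samples=208, n_features=60, random_state=1)

for degree in range(1, 6):
    # fit() computes the output size without building the full matrix
    trans = PolynomialFeatures(degree=degree).fit(X)
    print('Degree: %d, Features: %d' % (degree, trans.n_output_features_))
```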

The Degree effect

#Line Plot of the Degree vs. the Number of Input Features for the Polynomial Feature Transform
#More features may result in more overfitting, and in turn, worse results.
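A sketch of that line plot: with include_bias=True, the feature count for n inputs at degree d is C(n + d, d), which matches the counts listed above for n = 60 (the file name for the saved figure is made up):

```python
from math import comb

import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt

# number of polynomial features (with bias) for 60 inputs at each degree
degrees = list(range(1, 6))
counts = [comb(60 + d, d) for d in degrees]  # [61, 1891, 39711, 635376, 8259888]

plt.plot(degrees, counts)
plt.xlabel('Polynomial Degree')
plt.ylabel('Number of Input Features')
plt.savefig('degree_vs_features.png')
```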

Now for hyperparameter tuning: which degree of polynomial features gives the best accuracy?

Hyperparameter tuning

Degree 1 ~0.797 (0.073)
Degree 2 ~0.793 (0.085)
Degree 3 ~0.800 (0.077)
Degree 4 ~0.795 (0.079)
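A tuning loop along those lines evaluates one pipeline per degree. As in the earlier sketches, the synthetic stand-in data and KNeighborsClassifier are assumptions, so the printed scores will differ from those above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# synthetic stand-in: 208 rows, 60 features
X, y = make_classification(n_samples=208, n_features=60, random_state=1)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# evaluate one pipeline per polynomial degree
results = {}
for degree in range(1, 5):
    pipeline = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('model', KNeighborsClassifier()),
    ])
    scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv)
    results[degree] = np.mean(scores)
    print('Degree %d ~ %.3f (%.3f)' % (degree, np.mean(scores), np.std(scores)))
```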

Accuracy with the degree

We can see that degree 3 performs best at lifting the model accuracy.

That's it.

It's great to know all these advanced techniques. Thanks to Jason Brownlee for enlightening us about this hidden yet powerful feature engineering function.

Thanks again for your time. If you enjoyed this short article, there are tons of topics on advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk Soon.


Things I write about frequently on Medium: Data Science, Machine Learning, Deep Learning, NLP, and many other random topics of interest. ~ Let’s stay connected!