# Extra Trees Classifier / Regressor

A Powerful Ensemble Alternative to Random Forest

Hi everyone, today we will explore another powerful ensemble method called the Extra Trees Classifier / Regressor (Extremely Randomized Trees).

It is an ensemble learning technique that aggregates the results of many de-correlated decision trees, similar to the Random Forest Classifier.

Extra Trees can often achieve as good or better performance than a random forest. The key differences between Random Forest and the Extra Trees Classifier are:

• The Extra Trees Classifier does not perform bootstrap aggregation like the random forest does: by default, each tree is trained on the whole training set rather than a bootstrap sample drawn with replacement. In addition, nodes are split on random splits, not on best splits.
• So in the Extra Trees Classifier, randomness doesn’t come from bootstrap aggregating but from the random splits of the data.
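In scikit-learn these two differences show up directly in the estimators' defaults. A minimal sketch (assuming scikit-learn is installed) that makes the contrast explicit:

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Both ensembles share the same API, but their defaults differ:
# RandomForestClassifier bootstraps by default; ExtraTreesClassifier does not,
# and it draws split thresholds at random instead of searching for the best cut.
rf = RandomForestClassifier()
et = ExtraTreesClassifier()

print(rf.bootstrap)  # True  -> each tree sees a bootstrap sample
print(et.bootstrap)  # False -> each tree sees the full training set
```

Note that `bootstrap=True` can still be passed to `ExtraTreesClassifier` if you want bagging on top of the random splits.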

If you wish to know more about how Bagging / Bootstrap aggregating works, you can follow my previous article: https://bobrupakroy.medium.com/bagging-classifier-609a3bce7fb3

So, to make a long story short:

Decision Tree — prone to overfitting, thus giving high variance.

Random Forest — introduced to overcome the Decision Tree's overfitting problem. Thus gives medium variance.

Extra Trees — a good choice when accuracy is more important than a generalized model. Thus gives low variance.
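The three-way comparison above can be sketched with cross-validation on synthetic data (the dataset, sizes, and random seeds here are illustrative assumptions, not results from the wine data used later):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

for name, model in [
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),
    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=42)),
    ("Extra Trees", ExtraTreesRegressor(n_estimators=100, random_state=42)),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```

On most datasets you should see the single tree score lowest with the largest spread across folds, and the two ensembles close together, though the exact numbers depend on the data.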

And one more thing — it also gives feature importances.

Now Let’s see how can we perform it!

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the data
url = "Wine_data.csv"
dataframe = pd.read_csv(url)
X = dataframe.drop("quality", axis=1)
y = dataframe["quality"]
```

We will use this wine-quality dataset for the demonstration.

```python
from sklearn.ensemble import ExtraTreesRegressor

# Building the model
# (in newer scikit-learn, criterion 'mse' was renamed 'squared_error',
#  and max_features="auto" is deprecated for regressors)
extra_tree_model = ExtraTreesRegressor(n_estimators=100,
                                       criterion="squared_error")

# Training the model
extra_tree_model.fit(X, y)
```

Done…! Just like fitting any regular scikit-learn estimator.
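Once fitted, the ensemble predicts like any other scikit-learn estimator. A quick sketch on stand-in synthetic data (an assumption, since Wine_data.csv may not be available in your environment):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical stand-in data in place of the wine dataset
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# The fitted ensemble predicts like any scikit-learn estimator
preds = model.predict(X[:3])
print(preds.shape)        # one prediction per input row
print(model.score(X, y))  # R^2 on the training data
```

For an honest performance estimate you would of course score on a held-out test set rather than the training data.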

We can also obtain the feature importances from the extra_tree_model.feature_importances_ attribute.

```python
# Computing the importance of each feature
feature_importance = extra_tree_model.feature_importances_

# Plotting a bar graph to compare the features
plt.bar(X.columns, feature_importance)
plt.xticks(rotation=90)  # rotate the labels so the feature names don't overlap
plt.xlabel('Feature Labels')
plt.ylabel('Feature Importances')
plt.title('Comparison of different Feature Importances')
plt.show()
```

The labels are rotated so the feature names stay readable instead of overlapping each other.

Let’s do the same for Classification

```python
from sklearn.ensemble import ExtraTreesClassifier

dataframe1 = dataframe.copy()

# Convert the target into a Boolean (0/1) label
dataframe1["quality"] = np.where(dataframe1["quality"] >= 5, 1, 0)
X = dataframe1.drop("quality", axis=1)
y = dataframe1["quality"]
```

Now we do the same as before, but this time the criterion will be ‘entropy’:

```python
# Building the model
# (max_features="auto" is deprecated in newer scikit-learn; "sqrt" is the usual choice)
extra_tree_forest = ExtraTreesClassifier(n_estimators=100,
                                         criterion="entropy",
                                         max_features="sqrt")

# Training the model
extra_tree_forest.fit(X, y)
```
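As with the regressor, the fitted classifier can then score new data and return class probabilities. A minimal sketch on hypothetical stand-in data (mirroring the 0/1 quality target above, but generated synthetically as an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in binary classification data in place of the wine dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = ExtraTreesClassifier(n_estimators=100, criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

# Hard predictions and per-class probabilities on held-out data
print(accuracy_score(y_test, clf.predict(X_test)))
print(clf.predict_proba(X_test[:2]))  # one probability per class, per sample
```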

That’s it!