Multi-Label Multi-Class Algorithms - II
New Guide to Advanced Predictive analytics via OutputCodeClassifier, Power Transformations, Ensemble approaches
Hi there, this is a continuation of Part I, Chained and MultiLabel Algorithms: New Guide to Advanced Predictive analytics via Multi-label, Multi-output & Chained.
Today we will look at a few more approaches to multi-label, multi-class prediction tasks: OutputCodeClassifier, Binary Relevance, Adapted Algorithm, Ensemble Approaches and, of course, multi-label classification with deep learning.
So let’s start with OutputCodeClassifier:
1. OutputCodeClassifier
‘Error-Correcting Output Code-based strategies are fairly different from one-vs-the-rest and one-vs-one. With these strategies, each class is represented in a Euclidean space, where each dimension can only be 0 or 1. Another way to put it is that each class is represented by a binary code (an array of 0 and 1). The matrix which keeps track of the location/code of each class is called the code book. The code size is the dimensionality of the aforementioned space. Intuitively, each class should be represented by a code as unique as possible and a good code book should be designed to optimize classification accuracy. In this implementation, we simply use a randomly-generated code book as advocated in [3] although more elaborate methods may be added in the future.
At the fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen.
In OutputCodeClassifier, the code_size attribute allows the user to control the number of classifiers that will be used. It is a percentage of the total number of classes.
A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. In theory, log2(n_classes) / n_classes is sufficient to represent each class unambiguously. However, in practice, it may not lead to good accuracy since log2(n_classes) is much smaller than n_classes.
A number greater than 1 will require more classifiers than one-vs-the-rest. In this case, some classifiers will in theory correct for the mistakes made by other classifiers, hence the name “error-correcting”. In practice, however, this may not happen as classifier mistakes will typically be correlated. The error-correcting output codes have a similar effect to bagging.’ ~ Machine Learning Mastery
A simple template script for applying OutputCodeClassifier is available in the OutputCodeClassifier Git repo linked at the end of this article.
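To make the role of code_size concrete, here is a minimal sketch using scikit-learn's OutputCodeClassifier. It is not the author's script: the synthetic dataset, the LogisticRegression base estimator and code_size=2 are all assumptions chosen for illustration. Note that this estimator expects a single multi-class target column rather than a multi-label indicator matrix.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

# Synthetic 4-class dataset (illustrative only, not from the article)
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# code_size=2 with 4 classes -> int(4 * 2) = 8 binary classifiers,
# one per bit (column) of the randomly generated code book
ecoc = OutputCodeClassifier(
    estimator=LogisticRegression(max_iter=1000),
    code_size=2,
    random_state=1,
)
ecoc.fit(X_train, y_train)

n_binary_classifiers = len(ecoc.estimators_)
code_book_shape = ecoc.code_book_.shape  # (n_classes, n_binary_classifiers)
test_accuracy = ecoc.score(X_test, y_test)
print(n_binary_classifiers, code_book_shape, round(test_accuracy, 3))
```

Setting code_size below 1 (for example 0.5) would fit fewer binary classifiers than there are classes, which is the compressed regime described in the quote above.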
Next, we have
2. Problem Transformation
This approach can be carried out in three different ways:
2.1 Binary Relevance
2.2 Classifier Chains
2.3 Label Powerset
2.1 Binary Relevance
Binary Relevance treats each label as a separate binary classification problem: one classifier per label decides whether that label applies, independently of all the other labels.
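As a rough sketch of the idea, scikit-learn's MultiOutputClassifier fits one independent binary classifier per label column, which is what Binary Relevance does. This is a swapped-in stand-in rather than scikit-multilearn's own BinaryRelevance class, and the synthetic data and LogisticRegression base model are assumptions for illustration.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multi-label data with 3 labels (illustrative only)
X, y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=3, n_labels=2, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# One independent binary classifier is fitted per label column
br = MultiOutputClassifier(LogisticRegression(max_iter=1000))
br.fit(X_train, y_train)
br_predictions = br.predict(X_test)

n_label_models = len(br.estimators_)
print(n_label_models)
print(accuracy_score(y_test, br_predictions))  # exact-match (subset) accuracy
print(hamming_loss(y_test, br_predictions))    # fraction of wrongly predicted labels
```

Because every label is modelled on its own, this approach cannot pick up correlations between labels, which is the gap Classifier Chains try to close.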
2.2 Classifier Chain
We have already seen how it is applied in our previous article.
Here is the link https://bobrupakroy.medium.com/chained-and-multilabel-algorithms-6b378ec761d3
Just in case:
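from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# X_train, X_test, y_train, y_test are assumed to come from an earlier train/test split
# initialize classifier chains multi-label classifier
# with a gaussian naive bayes base classifier
classifier = ClassifierChain(GaussianNB())
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
# subset accuracy: a sample counts as correct only if all of its labels match
accuracy_score(y_test, predictions)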
2.3 Label Powerset
Here we transform the problem into a multi-class problem: a single multi-class classifier is trained on all unique label combinations found in the training data.
Label Powerset assigns a unique class to every label combination that is present in the training set, so a combination that never appears during training cannot be predicted.
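The transformation can be sketched by hand with NumPy and scikit-learn. This is a minimal illustration under assumptions (synthetic data, a RandomForestClassifier as the multi-class model), not scikit-multilearn's LabelPowerset implementation.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic multi-label data with 3 labels (illustrative only)
X, y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=3, n_labels=2, random_state=1
)

# Each distinct row of y (a label combination) becomes one class id
combinations, class_ids = np.unique(y, axis=0, return_inverse=True)
class_ids = class_ids.ravel()  # guard: some NumPy versions return a 2-D inverse here

# A single multi-class classifier is trained on the combination ids
lp_model = RandomForestClassifier(n_estimators=50, random_state=1)
lp_model.fit(X, class_ids)

# Predicted class ids are mapped back to their label vectors
lp_predictions = combinations[lp_model.predict(X[:5])]
print(len(combinations), lp_predictions.shape)
```

With 3 labels there are at most 2**3 = 8 combinations, but the number of classes grows quickly as labels are added, which is the main practical limit of this approach.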
Next, we have Ensemble Approaches
3. Ensemble Approaches
Official scikit-multilearn documentation for ensemble approaches: http://scikit.ml/api/classify.html#ensemble-approaches
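For a feel of what an ensemble looks like in the multi-label setting, here is a small sketch that averages several classifier chains with different random label orders. It uses scikit-learn's ClassifierChain rather than the scikit-multilearn ensemble classes documented at the link above, so treat it as one possible ensemble under assumed settings (5 chains, LogisticRegression, a 0.5 threshold), not as the library's method.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Synthetic multi-label data with 3 labels (illustrative only)
X, y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=3, n_labels=2, random_state=1
)

# Five chains, each with its own random label order
chains = [
    ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=seed)
    for seed in range(5)
]
for chain in chains:
    chain.fit(X, y)

# Average the per-label probabilities across chains, then threshold at 0.5
mean_proba = np.mean([chain.predict_proba(X[:5]) for chain in chains], axis=0)
ensemble_predictions = (mean_proba >= 0.5).astype(int)
print(ensemble_predictions)
```

Averaging over several label orders reduces the dependence on any single, arbitrarily chosen chain order.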
Last but not least is Multi-Label Classification with Deep Learning
4. Multi-Label Classification with Deep Learning
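Neural networks support multi-label classification natively: the output layer has one node per label with a sigmoid activation, and the model is trained with binary cross-entropy loss.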
from numpy import asarray
from sklearn.datasets import make_multilabel_classification
from keras.models import Sequential
from keras.layers import Dense

# get the dataset
def get_dataset():
    X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
    return X, y

# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model

# load dataset
X, y = get_dataset()
n_inputs, n_outputs = X.shape[1], y.shape[1]
# get model
model = get_model(n_inputs, n_outputs)
# fit the model on all data
model.fit(X, y, verbose=0, epochs=100)
# make a prediction for new data
row = [3, 3, 6, 7, 8, 2, 11, 11, 1, 3]
newX = asarray([row])
yhat = model.predict(newX)
print('Predicted: %s' % yhat[0])
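The network prints one sigmoid probability per label rather than a 0/1 label vector. A common follow-up step is to threshold those probabilities. The values and the 0.5 cut-off below are hypothetical, chosen only to show the step, and are not outputs of the model above.

```python
import numpy as np

# Hypothetical sigmoid outputs for one row and three labels
yhat_example = np.array([[0.91, 0.08, 0.67]])

threshold = 0.5  # assumed cut-off; it can be tuned, even per label
predicted_labels = (yhat_example >= threshold).astype(int)
print(predicted_labels[0])  # [1 0 1]
```

The same line works on the real yhat returned by model.predict.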
There we go, we now have a few more ways to perform multi-label classification.
If you enjoyed this article, I believe you will also enjoy the Part I article, Chained and MultiLabel Algorithms, which covers more ways and techniques to perform multi-label, multi-class, and chained classification.
Long story short, I tried to bring together the best from across sources and rephrase it into a more simplified version. I will try to bring as much new content as possible from across the data science realm, and I hope the package will be useful at some point in your work. I believe machine learning is not replacing us; it is replacing the same iterative work that consumes time and much effort, so people can come to work to create innovations rather than be occupied with the same repetitive, boring tasks.
Thanks again for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.
Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy
Let me know if you need anything. Talk Soon.
Kaggle Implementation:
https://www.kaggle.com/rupakroy/multilabel-multi-class-algorithms-ii
Git Repo:
OutputCodeClassifier: https://github.com/rupak-roy/MultiLabel-OutputCodeClassifier
MultiLabel-MultiClass-Power-Transformation-Approach: https://github.com/rupak-roy/MultiLabel-MultiClass-Power-Transformation-Approach