
Hi, you are correct: Isolation Forest is an unsupervised algorithm. If you look closely at the X and y here, we have already split the data into train X, train y, test X and test y. We apply Isolation Forest to train X only. Say it flags a few outliers and we remove them (10 rows out of 100), leaving train X with 90 rows. Train y still has 100 rows, so fitting a model would throw a dimension-mismatch error because the row counts no longer match. That is why we also remove from train y the rows that correspond to the outliers dropped from train X. Alternatively, you can try it your way in one shot, before splitting into X and y. I hope it helps. Regards ~
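
To make the row-matching step concrete, here is a minimal sketch of one way to do it with scikit-learn. The synthetic data, the variable names and the `contamination=0.1` setting are assumptions for illustration only, not the code from the article. The idea is to build one boolean mask from the Isolation Forest labels and apply that same mask to both train X and train y, so they always keep the same rows.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical training data: 100 rows, 3 features, binary target.
rng = np.random.RandomState(42)
X_train = rng.normal(size=(100, 3))
y_train = rng.randint(0, 2, size=100)

# Fit on the features only; the target is never shown to the model.
# contamination=0.1 is an assumed setting (flag roughly 10% of rows).
iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X_train)  # +1 = inlier, -1 = outlier

# One boolean mask, applied to both arrays, keeps the rows aligned.
mask = labels == 1
X_train_clean = X_train[mask]
y_train_clean = y_train[mask]

print(X_train_clean.shape, y_train_clean.shape)
```

If train X and train y are pandas objects instead of NumPy arrays, the same boolean mask should work with `X_train[mask]` and `y_train[mask]`, provided both share the same row order.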


Rupak (Bob) Roy - II

Written by Rupak (Bob) Roy - II

Things I write about frequently on Medium: Data Science, Machine Learning, Deep Learning, NLP, and many other random topics of interest. ~ Let’s stay connected!
