Class imbalance is an important problem in machine learning classification that has
received considerable attention in recent years. In binary classification, class imbalance occurs when there are
significantly fewer examples of one class than the other. A variety of strategies have been applied to the problem
with varying degrees of success. Typically, previous approaches have attacked the problem either algorithmically or by manipulating the data to mitigate the imbalance. We propose a hybrid approach which
combines Proportional Individualised Random Sampling (PIRS) with two different fitness functions designed to
improve performance on imbalanced classification problems in Genetic Programming (GP). We investigate the efficacy of the proposed methods together with that of five different algorithmic GP solutions, two of which are
taken from the recent literature. We conclude that the PIRS approach, combined with either average accuracy
or Matthews Correlation Coefficient, delivers superior results in terms of AUC score when applied to either
balanced or imbalanced datasets.
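
For readers unfamiliar with the two fitness measures named above, the following is a minimal sketch of how they are commonly computed from a binary confusion matrix. It assumes that "average accuracy" means the mean of the true positive rate and the true negative rate, which is the usual reading in the imbalanced GP literature but is not confirmed by this abstract. The function names are hypothetical, and PIRS itself is not shown because its sampling procedure is not described here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels in {0, 1}; 1 is the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def average_accuracy(tp, tn, fp, fn):
    """Assumed definition: mean of per-class accuracies (TPR and TNR)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (tpr + tnr)


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC; returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A classifier that always predicts the majority class on a 2:8 split.
y_true = [1, 1] + [0] * 8
y_pred = [0] * 10
counts = confusion_counts(y_true, y_pred)
plain_accuracy = (counts[0] + counts[1]) / len(y_true)
```

On this toy split, plain accuracy rewards the majority-class predictor, while both measures above score it no better than chance, which is why they are attractive as fitness functions on imbalanced data.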
Publication
International Conference on Soft Computing (MENDEL)