Automated Grammar-based feature selection in symbolic regression
With the growing popularity of machine learning (ML), regression problems in many domains are becoming increasingly high-dimensional. Identifying relevant features in a high-dimensional dataset remains a significant challenge for building highly accurate ML models. Evolutionary feature selection has been used for high-dimensional symbolic regression using Genetic Programming (GP). While grammar-based GP, especially Grammatical Evolution (GE), has been used extensively for symbolic regression, no systematic grammar-based feature selection approach exists. This work presents a grammar-based feature selection method, Production Ranking based Feature Selection (PRFS), and reports the results of its application to symbolic regression. The main contribution of our work is to demonstrate that the proposed method not only consistently selects the most relevant features, but also significantly improves the generalization performance of GE when compared with several state-of-the-art ML-based feature selection methods. Experimental results on benchmark symbolic regression problems show that the generalization performance of GE using PRFS was significantly better than that of a state-of-the-art Random Forest based feature selection method in three out of four problems, while in the fourth problem the performance was the same.
Publication
GECCO ’22 Proceedings of the Genetic and Evolutionary Computation Conference, pp. 902-910
Publisher
ACM
External identifier
Department or School
- Computer Science & Information Systems