Automatic production selection in grammatical evolution
By the very nature of its representation, symbolic regression through Grammatical Evolution (GE) stands a chance of being interpretable. However, while GE builds solutions using available building blocks (grammar productions), striving to achieve better approximation can sometimes compromise program size and hence interpretability. On the other hand, increased data dimensionality in regression problems poses a significant challenge when attempting to achieve better performance. This research addresses both issues and demonstrates that choosing the right set of grammar productions through production selection for a given problem not only lets GE perform dimensionality reduction but also reduces program sizes while maintaining the most important performance criterion, generalisation.
Grammar design, especially the choice of productions, has largely been a subject of expert judgement or trial and error. We hypothesise that evolution convergence carries information which can be exploited to distinguish between worthy and less useful productions. To test this hypothesis, we devise a production ranking scheme to rank grammar productions used in solution derivations based on structural analysis. The ranking profile of productions provides rich information for production selection, and further development affirmed the effectiveness of the ranking approach.
Grammar is not a static artefact in this research but rather adapts to a given problem. At different stages during evolution, productions which appear not to improve evolvability are pruned from the grammar. We develop two grammar pruning approaches: static pruning and dynamic pruning. While static pruning removes productions across sub-experiments, dynamic pruning prunes the grammar across generations. The developed approaches of production ranking and grammar pruning are shown to achieve significantly smaller solutions while maintaining accuracy on a variety of synthetic as well as real-world regression problems.
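The idea of ranking productions and then pruning the grammar can be illustrated with a minimal sketch. It does not reproduce the thesis's ranking scheme, which is based on structural analysis of derivations and is not specified in this abstract. As a stand-in, the sketch assumes that a production's score is simply how often it is used by the fittest individuals. The names `rank_productions`, `prune_grammar`, `top_fraction` and `n_remove` are hypothetical.

```python
from collections import Counter


def rank_productions(derivations, fitnesses, top_fraction=0.5):
    """Count production usage among the fittest derivations (assumed scoring rule).

    derivations: one list of (nonterminal, production) pairs per individual.
    fitnesses:   one error value per individual; lower is better.
    """
    order = sorted(range(len(derivations)), key=lambda i: fitnesses[i])
    keep = max(1, int(len(order) * top_fraction))
    counts = Counter()
    for i in order[:keep]:
        counts.update(derivations[i])
    return counts


def prune_grammar(grammar, counts, n_remove=1):
    """Drop up to n_remove lowest-scored productions without emptying any rule.

    grammar: dict mapping each nonterminal to its list of productions.
    Returns a pruned copy and the list of removed (nonterminal, production) pairs.
    """
    pruned = {nt: list(prods) for nt, prods in grammar.items()}
    # Lowest score first; ties are broken alphabetically for determinism.
    candidates = sorted(
        (counts.get((nt, p), 0), nt, p)
        for nt, prods in grammar.items()
        for p in prods
    )
    removed = []
    for _score, nt, p in candidates:
        if len(removed) == n_remove:
            break
        if len(pruned[nt]) > 1:  # always keep at least one production per rule
            pruned[nt].remove(p)
            removed.append((nt, p))
    return pruned, removed
```

Under this assumption, static pruning would correspond to calling `prune_grammar` once between sub-experiments, and dynamic pruning to calling it every few generations within a single run.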
The algorithms developed in this research, supported by extensive experimentation, analysis, and comparison, are integrated into an automated tool, AutoGE, which aids not only in primitive set selection but also in feature selection. Feature selection is a challenging task, especially in high-dimensional symbolic regression. Utilising linear scaling to build the ranking profile of features, it is demonstrated that feature selection with AutoGE helps improve generalisation performance in high-dimensional problems compared to state-of-the-art machine learning approaches.
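For readers unfamiliar with linear scaling, the following is a minimal sketch of the standard technique. Before measuring error, an evolved expression's outputs f are fitted to the targets y with the least-squares intercept a and slope b. How AutoGE uses linear scaling to build the feature ranking profile is not described in this abstract, so the sketch covers only the scaling step. The function names are hypothetical.

```python
def linear_scaling(outputs, targets):
    """Return (a, b) minimising sum((y - (a + b * f)) ** 2) over all samples."""
    n = len(outputs)
    mean_f = sum(outputs) / n
    mean_y = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in outputs)
    if var_f == 0:
        # A constant expression cannot be scaled; the best fit is the target mean.
        return mean_y, 0.0
    cov_fy = sum((f - mean_f) * (y - mean_y) for f, y in zip(outputs, targets))
    b = cov_fy / var_f
    a = mean_y - b * mean_f
    return a, b


def scaled_mse(outputs, targets):
    """Mean squared error after applying the optimal linear scaling."""
    a, b = linear_scaling(outputs, targets)
    return sum((y - (a + b * f)) ** 2 for f, y in zip(outputs, targets)) / len(outputs)
```

With scaling in place, an expression is rewarded for capturing the shape of the target rather than its exact offset and magnitude.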
Faculty
- Faculty of Science and Engineering
Degree
- Doctoral
First supervisor
Conor Ryan
Second supervisor
Meghana Kshirsagar
Also affiliated with
- LERO - The Irish Software Research Centre
Department or School
- Computer Science & Information Systems