Publication

Interpretable breast cancer diagnosis using grammatical evolution

Date
2025
Abstract
Interpretability is critical for diagnostic models, particularly in medical applications where understanding the reasoning behind predictions can directly impact patient outcomes. Ensuring models are both interpretable and accurate remains a key challenge in developing Machine Learning (ML) and Neural Network (NN) classifiers. This thesis hypothesises that Grammatical Evolution (GE), an evolutionary algorithm, can address this challenge by significantly enhancing model interpretability while maintaining competitive classification accuracy. Additionally, it proposes STEM, a novel data augmentation approach combining the Synthetic Minority Oversampling Technique (SMOTE), Edited Nearest Neighbour (ENN), and Mixup, which mitigates class imbalance, poor generalisation, and biased training. By tackling these critical challenges, GE's evolutionary framework and STEM's hybrid augmentation strategy provide a complementary and effective foundation for improving diagnostic models. This thesis investigates GE's potential across three key dimensions: instance classification, feature construction, and feature selection. It identifies challenges stemming from the limited interpretability of other ML models and demonstrates how GE offers a more interpretable alternative without compromising its ability to identify critical features or deliver accurate classifications. Additionally, the research examines the impact of different feature extraction techniques and the use of diverse image views on the performance of diagnostic models, particularly in the context of mammographic imaging. Empirical studies in this research demonstrate that GE can generate interpretable solutions that outperform traditional ML models in feature selection and classification while achieving competitive results in feature construction tasks.
Furthermore, the efficacy of STEM in handling highly imbalanced datasets is validated, with results showing that its hybrid approach improves model robustness and accuracy in mammographic image classification. The research also evaluates the influence of various feature extraction methods and imaging perspectives on diagnostic outcomes, providing valuable insights when paired with NN models. The findings of this research validate the central hypothesis. The proposed methodologies, including integrating GE with STEM, significantly enhance diagnostic interpretability and accuracy. By addressing challenges related to interpretability, data imbalance, feature selection, and feature construction, this thesis presents a cohesive framework for developing robust and interpretable models for breast cancer diagnosis.
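To illustrate the augmentation idea behind STEM, the sketch below implements the Mixup stage in plain NumPy: each augmented sample is a convex combination of two training samples and their (one-hot) labels, with mixing weights drawn from a Beta distribution. This is a minimal illustrative sketch, not the thesis's implementation; the function name `mixup` and the parameter `alpha` are assumptions, and the SMOTE and ENN stages would typically come from a library such as imbalanced-learn.

```python
import numpy as np

def mixup(X, y, alpha=0.2, rng=None):
    """Mixup augmentation: convex combinations of random sample pairs.

    X: (n, d) feature matrix.
    y: (n, k) one-hot (or soft) label matrix.
    alpha: Beta distribution parameter controlling interpolation strength.
    Returns mixed (X, y) of the same shapes.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # One mixing coefficient per sample, broadcast across features/labels.
    lam = rng.beta(alpha, alpha, size=(n, 1))
    idx = rng.permutation(n)          # random partner for each sample
    X_mix = lam * X + (1 - lam) * X[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    return X_mix, y_mix
```

Because labels are mixed with the same coefficients as features, each output label row still sums to one, yielding soft targets that discourage overconfident predictions on minority classes.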
Publisher
University of Limerick
License
Attribution-NonCommercial-ShareAlike 4.0 International