University of Limerick
Improving breast cancer diagnosis using grammatical evolution‑based feature selection

journal contribution
posted on 2025-06-16, 14:05 authored by Yumnah Hasan, Allan de Lima, Ehsan Namjoo, Darian Fernández de Bulnes, Juan F.H. Albarracín, Conor Ryan

Machine learning has significantly advanced breast cancer diagnosis, yet challenges such as high-dimensional data, severe class imbalance, and limited interpretability persist. To address these issues, we propose a Grammatical Evolution (GE)-based Feature Selection (FS) approach integrated with a class-balancing technique called STEM, which combines the Synthetic Minority Oversampling Technique, Edited Nearest Neighbour, and Mixup to handle both inter-class and intra-class imbalance. Our study evaluates the GE-based FS method against other FS models, including Logistic Regression (LR) and Extreme Gradient Boosting (XGBoost), in identifying critical features for breast cancer diagnosis. The analysis was conducted on the Digital Database for Screening Mammography and Wisconsin Breast Cancer datasets, which originally contained 52 and 30 features, respectively. The results demonstrate that the GE-based FS method effectively identifies critical features and achieves superior Area Under the Curve (AUC) scores with smaller feature subsets, producing its highest AUC with subsets of 10 and 15 features, whereas LR and XGBoost achieve their best results only with the full feature set.
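As a rough illustration of the Mixup component named in the abstract, the sketch below creates synthetic minority-class samples as convex combinations of random same-class pairs. It is a minimal sketch under stated assumptions, not the authors' STEM implementation: the function names `mixup_pair` and `mixup_oversample`, the restriction to same-class pairs, and the value `alpha=0.4` are all hypothetical choices made for this example, and the SMOTE and Edited Nearest Neighbour stages are not shown.

```python
import random


def mixup_pair(x_a, x_b, lam):
    """Return the convex combination lam * x_a + (1 - lam) * x_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def mixup_oversample(minority, n_new, alpha=0.4, seed=0):
    """Create n_new synthetic samples by mixing random pairs of minority samples.

    Assumption: both samples in a pair share the same class, so the label
    of each synthetic sample is simply the minority label.
    lam is drawn from Beta(alpha, alpha), as in the original Mixup
    formulation; alpha=0.4 is illustrative, not a value from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_a, x_b = rng.sample(minority, 2)   # two distinct minority samples
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        synthetic.append(mixup_pair(x_a, x_b, lam))
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
new_samples = mixup_oversample(minority, n_new=5)
```

Because each synthetic sample is a convex combination of two real ones, it always lies on the line segment between them, which is why Mixup-style interpolation fills in sparse regions inside a class rather than extrapolating beyond it.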

Funding

SFI Centre for Research Training in Artificial Intelligence

Science Foundation Ireland

Lero_Phase 2

Science Foundation Ireland

History

Publication

SN Computer Science, 2025, 6, article 306

Publisher

Springer

Also affiliated with

  • LERO - The Science Foundation Ireland Research Centre for Software

Department or School

  • Computer Science & Information Systems
