University of Limerick

STEM rebalance: a novel approach for tackling imbalanced datasets using SMOTE, edited nearest neighbour, and mixup

conference contribution
posted on 2024-04-24, 10:01 authored by Yumnah Hasan, Fatemeh Amerehi, Patrick Healy, Conor Ryan

Imbalanced datasets in medical imaging are characterized by skewed class proportions and a scarcity of abnormal cases. Models trained on such data tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of Mixup augmentation, which combines two training examples and their corresponding labels to generate new data points as a generic vicinal distribution. To this end, we propose STEM, which combines SMOTEENN and Mixup at the instance level. This integration enables us to leverage the entire distribution of the minority classes, thereby mitigating both between-class and within-class imbalances. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 on the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostics) datasets, respectively. Moreover, the method shows promising potential when applied with an ensemble of machine learning (ML) classifiers.
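As a rough illustration of the Mixup step mentioned in the abstract, the sketch below blends two feature vectors and their one-hot labels with a weight drawn from a Beta distribution, following the standard Mixup formulation. It is not the authors' implementation: the function name `mixup`, the value `alpha=0.4`, and the toy feature vectors are assumptions made for this example, and the SMOTEENN resampling stage of STEM is not shown.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.4, rng=random):
    """Blend two examples and their one-hot labels (standard Mixup).

    lam is drawn from Beta(alpha, alpha); alpha=0.4 is an assumed value,
    not one taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the features and of the labels with the same weight.
    x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_new = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_new, y_new, lam


# Toy usage: one "normal" and one "abnormal" example (hypothetical values).
rng = random.Random(0)
x_normal, y_normal = [1.0, 2.0], [1.0, 0.0]
x_abnormal, y_abnormal = [3.0, 4.0], [0.0, 1.0]
x_mix, y_mix, lam = mixup(x_normal, y_normal, x_abnormal, y_abnormal, rng=rng)
print(lam, x_mix, y_mix)
```

Because the same weight is applied to features and labels, the synthetic point lies on the line segment between the two originals and carries a soft label whose entries still sum to one.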

Funding

SFI Centre for Research Training in Artificial Intelligence

Science Foundation Ireland


Automatic Design of Digital Circuits (ADDC)

Science Foundation Ireland


History

Publication

2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, pp. 3-9

Publisher

Institute of Electrical and Electronics Engineers

Rights

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Sustainable development goals

  • (3) Good Health and Well-being

Department or School

  • Computer Science & Information Systems
