Date
2025
Abstract
Achieving reliable and safe deployment of machine learning systems requires evaluating performance on a broader range of metrics than standard test accuracy alone.
These metrics include robustness to out-of-distribution (OOD) data, resilience against adversarial attacks, and the ability to produce calibrated uncertainty estimates. However, improving performance in these areas often involves trade-offs, as enhancing one aspect can come at the cost of another. For instance, while adversarial training can increase robustness to adversarial attacks, it may also reduce accuracy on clean inputs. Similarly, data augmentation, which improves robustness to certain factors, can produce unintended spillover effects: color jitter, for example, enhances robustness to brightness changes but may weaken robustness to pose variations.
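The robustness/clean-accuracy trade-off arises because adversarial training optimizes the model on worst-case examples rather than on the clean data itself. As a concrete illustration only (this is the standard PGD formulation of Madry et al., not the thesis's specific setup, and the hyperparameters are illustrative defaults), a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Find an l_inf-bounded adversarial example around x (images in [0, 1])."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + step * grad.sign()        # ascend the classification loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                 # stay a valid image
    return x_adv.detach()

# Adversarial training step: the model is updated on the attacked batch only,
# which is what tends to trade away some clean accuracy.
def adv_train_step(model, optimizer, x, y):
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```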
This thesis aims to address these challenges by proposing techniques that enhance robustness while mitigating the trade-offs commonly observed in existing methods. A key contribution is the introduction of a label adjustment augmentation strategy, which offers distinct advantages over traditional methods. This approach is evaluated across diverse perturbation scenarios, demonstrating its ability to outperform advanced augmentation techniques in improving robustness to both non-adversarial and adversarial perturbations.
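The abstract does not spell out the exact adjustment rule, so the following is only a plausible sketch of the idea behind label adjustment under augmentation: the one-hot label is blended toward the uniform distribution in proportion to an assumed augmentation severity, so the model is not forced to be fully confident on heavily distorted inputs. The names `adjust_label`, `severity`, and `alpha` are hypothetical.

```python
import numpy as np

def adjust_label(one_hot: np.ndarray, severity: float, alpha: float = 0.5) -> np.ndarray:
    """Blend a one-hot label toward the uniform distribution.

    severity: assumed augmentation strength in [0, 1] (0 = clean input).
    alpha:    how strongly severity discounts label confidence.
    """
    num_classes = one_hot.shape[-1]
    uniform = np.full_like(one_hot, 1.0 / num_classes)
    weight = alpha * severity          # confidence removed from the true class
    return (1.0 - weight) * one_hot + weight * uniform

# Example: a class-3 label (10 classes) under a strong, severity-0.8 augmentation.
print(adjust_label(np.eye(10)[3], severity=0.8))
```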
Beyond robustness and average accuracy, fairness is another critical consideration for the safe deployment of machine learning systems. Fairness involves reducing class-wise discrepancies in performance, ensuring that a model is not overly biased toward or against certain categories. This work also explores the impact of label adjustment in the context of adversarial training, addressing common spillover effects such as class-specific imbalances, where a model may exhibit strong robustness for certain classes while remaining vulnerable for others.
Expanding upon the progress made on non-adversarial perturbations, the focus then shifts to adversarial attack scenarios. A key challenge in this domain is the limited generalization of traditional adversarial robustness methods to previously unseen threat models. In particular, adversarial training, widely considered one of the most effective defenses, is prone to overfitting to the specific threat model encountered during training (e.g., ℓ∞ adversarial examples), and consequently struggles to maintain robustness against unseen perturbations. To address this limitation, this thesis explores adversarial purification, which leverages a generative model, specifically a diffusion model, to remove perturbations prior to classification. This approach decouples the robustification process from knowledge of any specific attack or classifier, thereby offering greater flexibility in countering novel adversarial threats.
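A minimal sketch of diffusion-based purification in this spirit: the (possibly attacked) input is forward-diffused with Gaussian noise up to a timestep t*, denoised by the reverse process, and only then classified. Both `diffusion.alpha_bar(t)` (cumulative noise schedule) and `diffusion.reverse_denoise(x_t, t)` are assumed interfaces, not a specific library API or the thesis's exact method.

```python
import torch

def purify_then_classify(x, diffusion, classifier, t_star: int):
    """Purify a (possibly attacked) batch x, then classify it."""
    a_bar = torch.as_tensor(diffusion.alpha_bar(t_star), dtype=x.dtype)
    noise = torch.randn_like(x)
    # Forward-diffuse to t*: small enough to preserve image semantics,
    # large enough to drown out the adversarial perturbation.
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise
    x_pure = diffusion.reverse_denoise(x_t, t_star)  # denoise back to t = 0
    with torch.no_grad():
        return classifier(x_pure)
```

Because neither the attack nor the classifier enters the purification step, the same purifier can in principle be reused against threat models never seen during training.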
Publisher
University of Limerick
Type
Thesis
Rights
http://creativecommons.org/licenses/by-nc-sa/4.0/
