Using Adversarial Training to Improve Uncertainty Quantification

Kuan Huang, Meng Xu, Yingfeng Wang

Research output: Contribution to journal › Article › peer-review

Abstract

The success of adversarial attack methods suggests that a small input change can mislead a trained machine-learning model. For example, changing one pixel of an image may cause the trained model to misclassify the altered image. Uncertainty quantification is crucial for detecting misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack conditions. Our findings suggest that existing uncertainty quantification methods are unable to accurately identify misclassified predictions resulting from adversarial attacks due to training issues. We propose a simple adversarial training strategy for improving uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUROC and AUPR, across clean and adversarial samples.
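The abstract outlines a pipeline of crafting adversarial samples, training on them alongside clean data, and scoring uncertainty-based misclassification detection with AUROC and AUPR. The sketch below illustrates one possible realization of that pipeline; it is not the authors' implementation, and the FGSM attack, softmax-entropy uncertainty measure, equal clean/adversarial loss weighting, and epsilon value are illustrative assumptions.

    # Minimal sketch (assumptions noted above), not the paper's actual method.
    import torch
    import torch.nn.functional as F
    from sklearn.metrics import roc_auc_score, average_precision_score

    def fgsm_attack(model, x, y, epsilon=0.03):
        """Craft adversarial examples with FGSM (one common attack; assumed here)."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

    def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
        """One training step on an equal mix of clean and adversarial samples."""
        model.train()
        x_adv = fgsm_attack(model, x, y, epsilon)
        optimizer.zero_grad()
        loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()

    @torch.no_grad()
    def misclassification_detection_scores(model, x, y):
        """Score uncertainty-based misclassification detection with AUROC/AUPR.

        Predictive (softmax) entropy is used as the uncertainty estimate; a
        misclassified sample is the positive class, so higher uncertainty
        should receive a higher detection score.
        """
        model.eval()
        probs = F.softmax(model(x), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        errors = (probs.argmax(dim=1) != y).long().cpu().numpy()
        u = entropy.cpu().numpy()
        return {
            "AUROC": roc_auc_score(errors, u),
            "AUPR": average_precision_score(errors, u),
        }

Evaluating misclassification_detection_scores on both clean batches and batches perturbed by fgsm_attack corresponds to the clean-versus-adversarial comparison described in the abstract.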

Original language: English
Journal: IEEE Transactions on Artificial Intelligence
DOIs:
State: Accepted/In press - 2025

Keywords

  • adversarial attack
  • misclassification detection
  • uncertainty quantification
