TY - JOUR
T1 - Using Adversarial Training to Improve Uncertainty Quantification
AU - Huang, Kuan
AU - Xu, Meng
AU - Wang, Yingfeng
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The success of adversarial attack methods suggests that a small change to an input may mislead a trained machine-learning model. For example, changing a single pixel of an image may cause the trained model to misclassify the perturbed image. Uncertainty quantification is crucial for detecting such misclassifications; hence, precise uncertainty quantification, meaning uncertainty estimates that closely align with prediction correctness, is essential. We assume that misclassified samples should exhibit high uncertainty while correctly classified samples should exhibit low uncertainty. To evaluate the performance of uncertainty quantification, we investigate the task of uncertainty-based misclassification detection under adversarial attack. Our findings suggest that, owing to issues in standard training, existing uncertainty quantification methods cannot accurately identify misclassified predictions caused by adversarial attacks. We propose a simple adversarial training strategy to improve uncertainty quantification. Our results show that adversarial training improves the reliability of uncertainty quantification by better aligning uncertainty with prediction correctness. Specifically, we observe consistent improvements in misclassification detection performance, measured by AUROC and AUPR, on both clean and adversarial samples.
KW - adversarial attack
KW - misclassification detection
KW - uncertainty quantification
UR - https://www.scopus.com/pages/publications/105008024092
U2 - 10.1109/TAI.2025.3578004
DO - 10.1109/TAI.2025.3578004
M3 - Article
AN - SCOPUS:105008024092
SN - 2691-4581
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
ER -