TY - GEN
T1 - Detecting Faults vs. Revealing Failures
T2 - 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024
AU - Ayad, Amani
AU - Alblwi, Samia
AU - Mili, Ali
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - When we quantify the effectiveness of a test suite by its mutation coverage, we are in fact equating test suite effectiveness with fault detection: to the extent that mutations are faithful proxies of actual faults, it is sensible to consider that the effectiveness of a test suite to kill mutants reflects its ability to detect faults. But there is another way to measure the effectiveness of a test suite: by its ability to expose the failures of an incorrect program (or, equivalently, its ability to give us confidence in the correctness of a correct program). The relationship between failures and faults is tenuous at best: a fault is the adjudged or hypothesized cause of a failure. Whereas a failure is an observable, verifiable, certifiable effect, a fault is someone's hypothesis about the possible cause of the observed effect. The same failure may be attributed to more than one fault or combination of faults. In this paper we raise two questions: is the ability to detect faults the same as the ability to reveal failures? If not, which is the better measure of test suite effectiveness? We do not give definite answers to these questions, but we use empirical data to challenge some assumptions and show why these questions are worth answering.
KW - detecting faults
KW - exposing failures
KW - mutation coverage
KW - semantic coverage
KW - software testing
KW - test suite effectiveness
UR - http://www.scopus.com/inward/record.url?scp=85199504327&partnerID=8YFLogxK
U2 - 10.1109/QRS62785.2024.00021
DO - 10.1109/QRS62785.2024.00021
M3 - Conference contribution
AN - SCOPUS:85199504327
T3 - IEEE International Conference on Software Quality, Reliability and Security, QRS
SP - 115
EP - 126
BT - Proceedings - 2024 IEEE 24th International Conference on Software Quality, Reliability and Security, QRS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 July 2024 through 5 July 2024
ER -