TY - JOUR
T1 - Introduction of r_m^2(rank) metric incorporating rank-order predictions as an additional tool for validation of QSAR/QSPR models
AU - Roy, Kunal
AU - Mitra, Indrani
AU - Ojha, Probir Kumar
AU - Kar, Supratik
AU - Das, Rudra Narayan
AU - Kabir, Humayun
PY - 2012/8/15
Y1 - 2012/8/15
N2 - In silico techniques involving the development of quantitative regression models have been extensively used to predict the activity, properties and toxicity of new chemicals. The acceptability and subsequent applicability of such models for prediction are determined from several internal and external validation statistics. Among the different validation metrics, Q^2 and R^2_pred are the classical metrics for internal and external validation, respectively. Additionally, the r_m^2 metrics introduced by Roy and coworkers have been widely used by several groups of authors to ensure close agreement between the predicted and observed response data. However, none of the currently available and commonly used validation metrics provides any information about the rank-order predictions for the test set. Thus, to incorporate the concept of rank-order prediction into the common validation metrics, which are originally calculated with an algorithm based on Pearson's correlation coefficient, the new r_m^2(rank) metric has been introduced in this work as a new variant of the r_m^2 series of metrics. The ability of this new metric to reflect rank-order prediction was assessed by applying it to judge the quality of predictions of regression-based quantitative structure-activity/property relationship (QSAR/QSPR) models for four different data sets. The validation metrics calculated in each case were compared for their ability to reflect rank-order predictions based on their correlation with the conventional Spearman's rank correlation coefficient. A sum of ranking differences analysis performed with Spearman's rank correlation coefficient as the reference showed that the r_m^2(rank) metric exhibited the smallest difference in ranking from the reference metric. Thus, the close correlation of the r_m^2(rank) metric with Spearman's rank correlation coefficient indicated that the new metric can aptly capture rank-order prediction for the test set and can be used as an additional validation tool, alongside the conventional metrics, for assessing the acceptability and predictive ability of a QSAR/QSPR model.
KW - Pearson's correlation coefficient
KW - QSAR
KW - QSPR
KW - QSTR
KW - Spearman's rank correlation coefficient
KW - Validation
UR - http://www.scopus.com/inward/record.url?scp=84868210651&partnerID=8YFLogxK
DO - 10.1016/j.chemolab.2012.06.004
M3 - Article
AN - SCOPUS:84868210651
SN - 0169-7439
VL - 118
SP - 200
EP - 210
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -