TY - JOUR
T1 - Robust Testing of AI Language Model Resiliency with Novel Adversarial Prompts
AU - Hannon, Brendan
AU - Kumar, Yulia
AU - Gayle, Dejaun
AU - Li, J. Jenny
AU - Morreale, Patricia
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/3
Y1 - 2024/3
N2 - In the rapidly advancing field of Artificial Intelligence (AI), this study presents a critical evaluation of the resilience and cybersecurity efficacy of leading AI models, including ChatGPT-4, Bard, Claude, and Microsoft Copilot. Central to this research are innovative adversarial prompts designed to rigorously test the content moderation capabilities of these AI systems. This study introduces new adversarial tests and the Response Quality Score (RQS), a metric specifically developed to assess the nuances of AI responses. Additionally, the research spotlights FreedomGPT, an AI tool engineered to optimize the alignment between user intent and AI interpretation. The empirical results from this investigation are pivotal for assessing AI models’ current robustness and security. They highlight the necessity for ongoing development and meticulous testing to bolster AI defenses against various adversarial challenges. Notably, this study also delves into the ethical and societal implications of employing advanced “jailbreak” techniques in AI testing. The findings are significant for understanding AI vulnerabilities and formulating strategies to enhance AI technologies’ reliability and ethical soundness, paving the way for safer and more secure AI applications.
KW - AI model resilience
KW - adversarial testing
KW - content moderation in AI
KW - cybersecurity in AI systems
KW - ethical AI implications
UR - http://www.scopus.com/inward/record.url?scp=85187933448&partnerID=8YFLogxK
U2 - 10.3390/electronics13050842
DO - 10.3390/electronics13050842
M3 - Article
AN - SCOPUS:85187933448
SN - 2079-9292
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 5
M1 - 842
ER -