TY - GEN
T1 - From Vulnerabilities to Improvements - A Deep Dive into Adversarial Testing of AI Models
AU - Hannon, Brendan
AU - Kumar, Yulia
AU - Sorial, Peter
AU - Li, J. Jenny
AU - Morreale, Patricia
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The security vulnerabilities inherent in large language models (LLMs), such as OpenAI's ChatGPT-3.5 and ChatGPT-4, Bing Bot, and Google's Bard, are explored in this paper. The focus is on the susceptibility of these models to malicious prompting and the potential for generating unethical content. An investigation is conducted into the responses these models provide when tasked with completing a movie script involving a character disseminating information about murder, weapons, and drugs. The analysis reveals that, despite the presence of filters designed to prevent the generation of unethical or harmful content, these models can be manipulated through malicious prompts to produce inappropriate and even illegal responses. This discovery underscores the urgent need for a comprehensive understanding of these vulnerabilities, as well as the development of effective measures to enhance the security and reliability of LLMs. This paper offers valuable insights into the security vulnerabilities that arise when these models are prompted to generate malicious content.
AB - The security vulnerabilities inherent in large language models (LLMs), such as OpenAI's ChatGPT-3.5 and ChatGPT-4, Bing Bot, and Google's Bard, are explored in this paper. The focus is on the susceptibility of these models to malicious prompting and the potential for generating unethical content. An investigation is conducted into the responses these models provide when tasked with completing a movie script involving a character disseminating information about murder, weapons, and drugs. The analysis reveals that, despite the presence of filters designed to prevent the generation of unethical or harmful content, these models can be manipulated through malicious prompts to produce inappropriate and even illegal responses. This discovery underscores the urgent need for a comprehensive understanding of these vulnerabilities, as well as the development of effective measures to enhance the security and reliability of LLMs. This paper offers valuable insights into the security vulnerabilities that arise when these models are prompted to generate malicious content.
KW - adversarial attacks
KW - Bard
KW - chatbots
KW - ChatGPT
KW - prompt engineering
KW - risk mitigation
KW - security vulnerabilities in chatbots
UR - http://www.scopus.com/inward/record.url?scp=85187927453&partnerID=8YFLogxK
U2 - 10.1109/CSCE60160.2023.00422
DO - 10.1109/CSCE60160.2023.00422
M3 - Conference contribution
AN - SCOPUS:85187927453
T3 - Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
SP - 2645
EP - 2649
BT - Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
Y2 - 24 July 2023 through 27 July 2023
ER -