From Vulnerabilities to Improvements- A Deep Dive into Adversarial Testing of AI Models

Brendan Hannon, Yulia Kumar, Peter Sorial, J. Jenny Li, Patricia Morreale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

The security vulnerabilities inherent in large language models (LLMs), such as OpenAI's ChatGPT-3.5 and ChatGPT-4, Bing Bot, and Google's Bard, are explored in this paper. The focus is on the susceptibility of these models to malicious prompting and their potential to generate unethical content. An investigation is conducted into the responses these models provide when tasked with completing a movie script involving a character disseminating information about murder, weapons, and drugs. The analysis reveals that, despite the presence of filters designed to prevent the generation of unethical or harmful content, these models can be manipulated through malicious prompts into producing inappropriate and even illegal responses. This finding underscores the urgent need for a comprehensive understanding of these vulnerabilities, as well as the development of effective measures to enhance the security and reliability of LLMs. This paper offers valuable insights into the security vulnerabilities that arise when these models are prompted to generate malicious content.

Original language: English
Title of host publication: Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2645-2649
Number of pages: 5
ISBN (Electronic): 9798350327595
DOIs
State: Published - 2023
Event: 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 - Las Vegas, United States
Duration: 24 Jul 2023 - 27 Jul 2023

Publication series

Name: Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023

Conference

Conference: 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
Country/Territory: United States
City: Las Vegas
Period: 24/07/23 - 27/07/23

Keywords

  • adversarial attacks
  • Bard
  • chatbots
  • ChatGPT
  • prompt engineering
  • risk mitigation
  • security vulnerabilities in chatbots
