Abstract
The security vulnerabilities inherent in large language models (LLMs), such as OpenAI's ChatGPT-3.5 and ChatGPT -4, Bing Bot, and Google's Bard, are explored in this paper. The focus is on the susceptibility of these models to malicious prompting and the potential for generating unethical content. An investigation is conducted into the responses these models provide when tasked with completing a movie script involving a character disseminating information about murder, weapons, and drugs. The analysis reveals that, despite the presence of filters designed to prevent the generation of unethical or harmful content, these models can be manipulated through malicious prompts to produce inappropriate and even illegal responses. This discovery underscores the urgent need for a comprehensive understanding of these vulnerabilities, as well as the development of effective measures to enhance the security and reliability of LLMs. This paper offers valuable insights into the security vulnerabilities that arise when these models are prompted to generate malicious content.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 2645-2649 |
| Number of pages | 5 |
| ISBN (Electronic) | 9798350327595 |
| DOIs | |
| State | Published - 2023 |
| Event | 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 - Las Vegas, United States Duration: 24 Jul 2023 → 27 Jul 2023 |
Publication series
| Name | Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 |
|---|
Conference
| Conference | 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 |
|---|---|
| Country/Territory | United States |
| City | Las Vegas |
| Period | 24/07/23 → 27/07/23 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 16 Peace, Justice and Strong Institutions
Keywords
- adversarial attacks
- Bard
- chatbots
- ChatGPT
- prompt engineering
- risk mitigation
- security vulnerabilities in chatbots
Fingerprint
Dive into the research topics of 'From Vulnerabilities to Improvements- A Deep Dive into Adversarial Testing of AI Models'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver