TY - GEN
T1 - The Multilingual Eyes Multimodal Traveler’s App
AU - Villalobos, Wilbert
AU - Kumar, Yulia
AU - Li, J. Jenny
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - This paper presents an in-depth analysis of “The Multilingual Eyes Multimodal Traveler’s App” (MEMTA), a novel application in the realm of travel technology, leveraging advanced Artificial Intelligence (AI) capabilities. The core of MEMTA’s innovation lies in its integration of multimodal Large Language Models (LLMs), notably ChatGPT-4-Vision, to enhance navigational assistance and situational awareness for tourists and visually impaired individuals in diverse environments. The study rigorously evaluates how the incorporation of OpenAI’s Whisper and DALL-E 3 technologies augments the app’s proficiency in real-time, multilingual translation, pronunciation, and visual content generation, thereby significantly improving the user experience in various geographical settings. A key focus is placed on the development and impact of a custom GPT model, Susanin, designed specifically for the app, highlighting its advancements in Human-AI interaction and accessibility over standard LLMs. The paper thoroughly explores the practical applications of MEMTA, extending its utility beyond mere travel assistance to sectors such as robotics, virtual reality, and military operations, thus underscoring its multifaceted significance. Through this exploration, the study contributes novel insights into the fields of AI-enhanced travel, assistive technologies, and the broader scope of human-AI interaction.
KW - AI in travel
KW - Assistive navigation technologies
KW - Human-AI interaction in tourism
KW - Multimodal LLMs
KW - Real-time multilingual translation
UR - http://www.scopus.com/inward/record.url?scp=85201104509&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-3305-7_45
DO - 10.1007/978-981-97-3305-7_45
M3 - Conference contribution
AN - SCOPUS:85201104509
SN - 9789819733040
T3 - Lecture Notes in Networks and Systems
SP - 565
EP - 575
BT - Proceedings of 9th International Congress on Information and Communication Technology - ICICT 2024
A2 - Yang, Xin-She
A2 - Sherratt, Simon
A2 - Dey, Nilanjan
A2 - Joshi, Amit
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th International Congress on Information and Communication Technology, ICICT 2024
Y2 - 19 February 2024 through 22 February 2024
ER -