TY - JOUR
T1 - Applying Swin Architecture to Diverse Sign Language Datasets
AU - Kumar, Yulia
AU - Huang, Kuan
AU - Lin, Chin Chien
AU - Watson, Annaliese
AU - Li, J. Jenny
AU - Morreale, Patricia
AU - Delgado, Justin
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/4
Y1 - 2024/4
N2 - In an era where artificial intelligence (AI) bridges crucial communication gaps, this study extends AI’s utility to the American and Taiwan Sign Language (ASL and TSL) communities through advanced models such as the hierarchical vision transformer with shifted windows (Swin). This research evaluates Swin’s adaptability across sign languages, aiming for a universal platform for the unvoiced. Utilizing deep learning and transformer technologies, this research has developed prototypes for ASL-to-English translation, supported by an educational framework to facilitate learning and comprehension, with the intention to include more languages in the future. This study highlights the efficacy of the Swin model, along with other models such as the vision transformer with deformable attention (DAT), ResNet-50, and VGG-16, in ASL recognition. The Swin model’s accuracy across various datasets underscores its potential. Additionally, this research examines the challenges of balancing accuracy with the need for real-time, portable language recognition capabilities and introduces the use of cutting-edge transformer models such as Swin, DAT, and video Swin transformers for diverse datasets in sign language recognition. This study also explores the integration of multimodality and large language models (LLMs) to promote global inclusivity. Future efforts will focus on enhancing these models and expanding their linguistic reach, with an emphasis on real-time translation applications and educational frameworks. These achievements not only advance the technology of sign language recognition but also provide more effective communication tools for the deaf and hard-of-hearing community.
AB - In an era where artificial intelligence (AI) bridges crucial communication gaps, this study extends AI’s utility to the American and Taiwan Sign Language (ASL and TSL) communities through advanced models such as the hierarchical vision transformer with shifted windows (Swin). This research evaluates Swin’s adaptability across sign languages, aiming for a universal platform for the unvoiced. Utilizing deep learning and transformer technologies, this research has developed prototypes for ASL-to-English translation, supported by an educational framework to facilitate learning and comprehension, with the intention to include more languages in the future. This study highlights the efficacy of the Swin model, along with other models such as the vision transformer with deformable attention (DAT), ResNet-50, and VGG-16, in ASL recognition. The Swin model’s accuracy across various datasets underscores its potential. Additionally, this research examines the challenges of balancing accuracy with the need for real-time, portable language recognition capabilities and introduces the use of cutting-edge transformer models such as Swin, DAT, and video Swin transformers for diverse datasets in sign language recognition. This study also explores the integration of multimodality and large language models (LLMs) to promote global inclusivity. Future efforts will focus on enhancing these models and expanding their linguistic reach, with an emphasis on real-time translation applications and educational frameworks. These achievements not only advance the technology of sign language recognition but also provide more effective communication tools for the deaf and hard-of-hearing community.
KW - ASL detection
KW - Swin transformer
KW - Taiwan Sign Language
KW - deep learning
KW - multimodal large language models (MLLMs)
KW - the unvoiced
UR - http://www.scopus.com/inward/record.url?scp=85191316084&partnerID=8YFLogxK
U2 - 10.3390/electronics13081509
DO - 10.3390/electronics13081509
M3 - Article
AN - SCOPUS:85191316084
SN - 2079-9292
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 8
M1 - 1509
ER -