Text-to-Sign Language Video Generation Using GANs, BERT, and Sora

  • Yulia Kumar
  • Beining Niu
  • Mengtian Lin
  • Nidhi Mudholker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper explores the generation of American Sign Language (ASL) videos using Generative Adversarial Networks (GANs), BERT-based text embeddings, and a dataset comprising authentic and synthetic sign language clips. The original Kaggle dataset was enriched with a manually crafted collection, and OpenAI's Sora video generator was then employed to augment the dataset by producing synthetic videos from multimodal prompts. The authors implemented and compared several GAN architectures, including unimodal and Feature-wise Linear Modulation (FiLM) 3D convolutional generators, which integrate text embeddings for multimodal fusion. While preliminary training was conducted, quantitative evaluation revealed significant challenges in generating realistic and coherent ASL videos. Current results highlight the complexities of ASL video synthesis and underscore the need for more advanced ASL generative applications.
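The FiLM conditioning mentioned in the abstract scales and shifts each feature channel of the generator using parameters predicted from the text embedding. A minimal NumPy sketch of this mechanism follows; the shapes, weight matrices, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def film_modulate(features, text_emb, W_gamma, W_beta):
    """Feature-wise Linear Modulation (FiLM): per-channel scale (gamma)
    and shift (beta) predicted from a text embedding, applied to a
    video feature map of shape (N, C, T, H, W).
    Weight names here are hypothetical, for illustration only."""
    gamma = text_emb @ W_gamma  # (N, C) channel-wise scales
    beta = text_emb @ W_beta    # (N, C) channel-wise shifts
    # Broadcast over the time and spatial dimensions: (N, C, 1, 1, 1)
    gamma = gamma[:, :, None, None, None]
    beta = beta[:, :, None, None, None]
    return gamma * features + beta

# Toy example: 2 clips, 8 channels, 4 frames, 16x16 feature maps,
# conditioned on 768-dim BERT-style sentence embeddings.
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 8, 4, 16, 16))
emb = rng.standard_normal((2, 768))
W_g = rng.standard_normal((768, 8)) * 0.01
W_b = rng.standard_normal((768, 8)) * 0.01
out = film_modulate(feats, emb, W_g, W_b)
print(out.shape)  # (2, 8, 4, 16, 16)
```

In a full generator this modulation would typically be applied after each 3D convolution block, letting the text condition every stage of video synthesis.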

Original language: English
Title of host publication: 2025 15th IEEE Integrated STEM Education Conference, ISEC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331513436
DOIs
State: Published - 2025
Event: 15th IEEE Integrated STEM Education Conference, ISEC 2025 - Princeton, United States
Duration: 15 Mar 2025 → …

Publication series

Name: 2025 15th IEEE Integrated STEM Education Conference, ISEC 2025

Conference

Conference: 15th IEEE Integrated STEM Education Conference, ISEC 2025
Country/Territory: United States
City: Princeton
Period: 15/03/25 → …

Keywords

  • GANs for ASL
  • Sora
  • Synthetic ASL Dataset
