TY - GEN
T1 - Automatic Speech Recognition in Diverse English Accents
AU - Mohyuddin, Hashir
AU - Kwak, Daehan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Advancements in automatic speech recognition (ASR) systems have led to their widespread integration into daily life, significantly altering our interaction with technology. However, this interaction is not always seamless for all users. Specifically, speakers with accents frequently face difficulties using ASR technologies and often need to deliberately adjust their pronunciation for better recognition. This study compares leading ASR models' ability to transcribe speech from accented speakers of various nationalities against that of their counterparts who are native speakers of American English. We utilize two speech corpora, the L2-ARCTIC (L2A) and the Speech Accent Archive (SAA), which provide the original 'clean' audio samples. Two additional files are then created by adding background noise to each original sample. These files are processed through the respective APIs of each ASR model to obtain transcriptions, and the accuracy of each transcription is assessed by calculating the Word Error Rate (WER) for each speaker and model. The primary objective of this study is to highlight the challenges faced by speakers with diverse accents in using ASR technology and, in doing so, to encourage proactive steps toward their resolution. We believe this work emphasizes the importance of fostering a more equitable and inclusive user experience.
AB - Advancements in automatic speech recognition (ASR) systems have led to their widespread integration into daily life, significantly altering our interaction with technology. However, this interaction is not always seamless for all users. Specifically, speakers with accents frequently face difficulties using ASR technologies and often need to deliberately adjust their pronunciation for better recognition. This study compares leading ASR models' ability to transcribe speech from accented speakers of various nationalities against that of their counterparts who are native speakers of American English. We utilize two speech corpora, the L2-ARCTIC (L2A) and the Speech Accent Archive (SAA), which provide the original 'clean' audio samples. Two additional files are then created by adding background noise to each original sample. These files are processed through the respective APIs of each ASR model to obtain transcriptions, and the accuracy of each transcription is assessed by calculating the Word Error Rate (WER) for each speaker and model. The primary objective of this study is to highlight the challenges faced by speakers with diverse accents in using ASR technology and, in doing so, to encourage proactive steps toward their resolution. We believe this work emphasizes the importance of fostering a more equitable and inclusive user experience.
KW - Accent Recognition
KW - Accented Speech
KW - ASR Accuracy
KW - Automatic Speech Recognition
KW - Voice Assistants
UR - http://www.scopus.com/inward/record.url?scp=85199995477&partnerID=8YFLogxK
U2 - 10.1109/CSCI62032.2023.00122
DO - 10.1109/CSCI62032.2023.00122
M3 - Conference contribution
AN - SCOPUS:85199995477
T3 - Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023
SP - 714
EP - 718
BT - Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023
Y2 - 13 December 2023 through 15 December 2023
ER -