TY - GEN
T1 - Evaluating Automatic Speech Recognition Models
T2 - International conference on AI Revolution: Research, Ethics and Society, AIR-RES 2025
AU - Liu, Wei
AU - Xiong, Yukun
AU - Hu, Bin
AU - Kwak, Daehan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - The development of Automatic Speech Recognition (ASR) technology has progressed remarkably, becoming an integral component of virtual assistants, transcription services, and accessibility tools. Despite these advancements, ASR systems still struggle to accurately recognize speech from individuals with different accents and linguistic features. This work analyzes the performance of various ASR models, including cloud-based, local, and integrated speech recognition systems. For evaluation, we use several accented speech datasets and assess the ASR variants using Word Error Rate (WER) as the primary metric. The datasets include the Speech Accent Archive (SAA), L2-ARCTIC, and an Indian accent dataset. The results show that ASR accuracy varies depending on the speaker’s language and accent. OpenAI Whisper, Deepgram, and AssemblyAI perform significantly better than conventional models such as Mozilla DeepSpeech. The results indicate that many standalone ASR models are optimized for non-regional standard English, leading to higher error rates for non-native and regionally accented speech. Future developments should focus on augmenting multilingual datasets and refining algorithms to achieve more equitable speech recognition capabilities across diverse accents.
AB - The development of Automatic Speech Recognition (ASR) technology has progressed remarkably, becoming an integral component of virtual assistants, transcription services, and accessibility tools. Despite these advancements, ASR systems still struggle to accurately recognize speech from individuals with different accents and linguistic features. This work analyzes the performance of various ASR models, including cloud-based, local, and integrated speech recognition systems. For evaluation, we use several accented speech datasets and assess the ASR variants using Word Error Rate (WER) as the primary metric. The datasets include the Speech Accent Archive (SAA), L2-ARCTIC, and an Indian accent dataset. The results show that ASR accuracy varies depending on the speaker’s language and accent. OpenAI Whisper, Deepgram, and AssemblyAI perform significantly better than conventional models such as Mozilla DeepSpeech. The results indicate that many standalone ASR models are optimized for non-regional standard English, leading to higher error rates for non-native and regionally accented speech. Future developments should focus on augmenting multilingual datasets and refining algorithms to achieve more equitable speech recognition capabilities across diverse accents.
KW - Automatic Speech Recognition
KW - Linguistic Diversity
KW - Speech Recognition Models
UR - https://www.scopus.com/pages/publications/105028089529
U2 - 10.1007/978-3-032-12930-7_31
DO - 10.1007/978-3-032-12930-7_31
M3 - Conference contribution
AN - SCOPUS:105028089529
SN - 9783032129291
T3 - Communications in Computer and Information Science
SP - 447
EP - 458
BT - AI Revolution
A2 - Arabnia, Hamid R.
A2 - Deligiannidis, Leonidas
A2 - Amirian, Soheyla
A2 - Ghareh Mohammadi, Farid
A2 - Shenavarmasouleh, Farzan
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 14 April 2025 through 16 April 2025
ER -