Continued Pretraining for Domain Adaptation of Wav2vec 2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

Journal article

Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson
arXiv preprint arXiv:2405.13018, 2024

DOI: https://doi.org/10.48550/arXiv.2405.13018

Cite

APA
Attia, A. A., Demszky, D., Ogunremi, T., Liu, J., & Espy-Wilson, C. (2024). Continued Pretraining for Domain Adaptation of Wav2vec 2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings. arXiv preprint arXiv:2405.13018. https://doi.org/10.48550/arXiv.2405.13018

Chicago/Turabian
Attia, Ahmed Adel, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, and Carol Espy-Wilson. “Continued Pretraining for Domain Adaptation of Wav2vec 2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings.” arXiv preprint arXiv:2405.13018 (2024).

MLA
Attia, Ahmed Adel, et al. “Continued Pretraining for Domain Adaptation of Wav2vec 2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings.” arXiv preprint arXiv:2405.13018, 2024, https://doi.org/10.48550/arXiv.2405.13018.

BibTeX

@article{ahmed2024a,
  title = {Continued Pretraining for Domain Adaptation of Wav2vec 2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings},
  year = {2024},
  journal = {arXiv preprint arXiv:2405.13018},
  doi = {10.48550/arXiv.2405.13018},
  author = {Attia, Ahmed Adel and Demszky, Dorottya and Ogunremi, Tolulope and Liu, Jing and Espy-Wilson, Carol}
}

Abstract

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec 2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec 2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones, and classroom conditions, as well as classroom demographics. Our CPT models show improved ability to generalize to different demographics unseen in the labeled finetuning data.