Cognitive modeling of accented speech in Malayalam: exploring the impact of acoustic signal processing and deep learning techniques.

Publisher

Sullamussalam Science College, Areekod, Kondotty

Abstract

Accented Automatic Speech Recognition (AASR) is the ability of a system to recognize accented speech input. It poses a particular challenge for languages with limited available datasets. This research explores machine learning and deep learning methods, together with feature engineering techniques, to advance the understanding of accented speech recognition. The work was carried out in several phases of experimental study. It begins with an extensive literature review that identifies the principal gap in AASR for Malayalam: the lack of a benchmark dataset of accented Malayalam and the scarcity of previous studies, both of which hindered this research. To address the shortage of relevant data, eight distinct sets of accented speech data were carefully constructed, and a spectrogram dataset was additionally developed to support a comprehensive study. The research investigates various feature extraction techniques and model architectures, and examines how different feature combinations affect accented speech recognition. Each dataset is characterized by a range of properties essential for robust speech recognition systems. The datasets cover a wide spectrum of accents from different regions and demographic groups. Efforts were made to maintain balanced representation across genders, ages, and socio-economic backgrounds in order to reduce potential biases. Recordings for some of the datasets were made in natural settings to capture authentic variation in accent and pronunciation. The datasets are annotated with word-level or sentence-level transcriptions, depending on the type of audio signal, and with the district of the specific accent, providing insight into speaker details and recording conditions. To evaluate system robustness, recordings were obtained under various noise conditions, ranging from quiet environments to bustling public spaces.