Analysis and identification of Dravidian accented Malayalam speech using machine learning
Publisher
Govt. College Madapally, University of Calicut
Abstract
Automatic Speech Recognition (ASR) systems often face a steep decline in performance when dealing with accented speech, especially in languages where accent variation is shaped by diverse first languages (L1). In the case of Malayalam, a South Indian Dravidian language, regional speakers frequently carry over phonological influences from their native tongues such as Kannada, Tamil, or Telugu. Despite the significance of this phenomenon for ASR development and linguistic studies, there has been a lack of comprehensive, balanced, and annotated databases that systematically capture Malayalam spoken with different Dravidian accents. Furthermore, existing approaches relying on traditional speech features such as Mel-frequency cepstral coefficients (MFCCs) and standard segmentation strategies often fall short in handling the variability and complexity introduced by accent and environmental noise. Motivated by this research gap, this study aims to investigate the influence of the first language (L1) on Malayalam accent, identify the most effective speech units for accent classification, and explore the integration of nonlinear features with MFCCs and Chroma features to enhance robustness under noisy conditions. The study is grounded in both linguistic insight and data-driven methods, contributing significantly to accented speech processing in under-resourced language contexts.

The study begins by identifying the problem of accent-induced variability in Malayalam speech, shaped significantly by speakers' first languages such as Kannada, Tamil, and Telugu. Recognizing the lack of a standardized, balanced, and annotated speech database to support accent classification, a new multi-accent corpus, the Dravidian Accented Malayalam Speech Database (DAMSD), was developed. This resource captures Malayalam speech across diverse accent groups, with careful attention to linguistic coverage, gender balance, and noise conditions.
The database is organized into four categories: Real Speech Dataset, Clean Speech Dataset, Annotated Speech Dataset, and Simulated Noisy Speech Dataset. These categories reflect a range of environmental conditions, spanning from natural settings with background noise to ideal noise-free and artificially simulated noisy environments. Phoneme- and syllable-level annotations are available for a subset of the clean dataset to facilitate detailed analysis and support the development of segmentation algorithms.

Building on this resource, a comparative analysis was conducted to determine the most effective speech unit for accent identification: phoneme, syllable, or word. Using a feature set of MFCCs and their delta coefficients, with Support Vector Machine (SVM) and Random Forest (RF) classifiers, the study demonstrates that syllable-based models consistently outperform phoneme- and word-level approaches. Syllables offer an optimal balance between unit length and acoustic discriminability, enabling superior classification accuracy, particularly when enriched with spectral roll-off and spectral centroid features.

To support syllable-based processing, the research introduces a novel segmentation technique that estimates syllable boundaries using sonority envelopes derived from Ramped Autocorrelation Coefficients (RAC). This method is shown to be effective under both clean and noisy conditions, including White Gaussian, Pink, Red, and Babble noise at multiple Signal-to-Noise Ratio (SNR) levels. Quantitative evaluations using precision, recall, and F1-score confirm that RAC-based segmentation provides a reliable and noise-resilient approach to isolating syllable-like units, even under challenging conditions such as 0 dB SNR.

The analysis is further extended by incorporating nonlinear dynamic features to enhance classification robustness.
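The sonority-envelope segmentation described above can be illustrated with a minimal numpy sketch. The thesis's RAC formulation is not reproduced here; the lag-1 autocorrelation weighting, frame sizes, smoothing, and threshold below are illustrative assumptions only, chosen to show the general valley-picking idea on a synthetic signal.

```python
import numpy as np

def frames(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def sonority_envelope(x, frame_len=400, hop=160, smooth=5):
    """Crude sonority proxy: per-frame energy weighted by the normalized
    lag-1 autocorrelation (high for voiced, periodic frames)."""
    f = frames(x, frame_len, hop)
    energy = np.sum(f * f, axis=1)
    r1 = np.sum(f[:, :-1] * f[:, 1:], axis=1)
    env = energy * (r1 / (energy + 1e-12))
    env = np.convolve(env, np.ones(smooth) / smooth, mode="same")
    return np.clip(env, 0.0, None)

def syllable_boundaries(env, thresh_ratio=0.1):
    """Local minima of the envelope below a relative threshold are taken
    as candidate boundaries between syllable-like units."""
    th = thresh_ratio * env.max()
    return [i for i in range(1, len(env) - 1)
            if env[i] < th and env[i] <= env[i - 1] and env[i] < env[i + 1]]

# Synthetic example: three voiced "syllables" separated by silence.
fs = 16000
t = np.arange(int(0.2 * fs)) / fs
voiced = np.sin(2 * np.pi * 150 * t)
gap = np.zeros(int(0.1 * fs))
signal = np.concatenate([voiced, gap, voiced, gap, voiced])
bounds = syllable_boundaries(sonority_envelope(signal))  # one boundary per gap
```

On this toy signal the envelope collapses in the two silent gaps, so valley picking recovers one boundary per gap; real speech needs the noise-robust RAC envelope evaluated in the thesis.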
Features such as Fractal Dimension, Shannon Entropy, Spectral Entropy, and the Teager Energy Operator (TEO) are integrated with MFCCs and Chroma-based representations. Surrogate data analysis validates the nonlinear structure of several features, particularly the entropy- and energy-based metrics. Results show that combining these nonlinear descriptors with MFCCs and Chroma significantly improves accuracy across different SNR levels, demonstrating their effectiveness in capturing complex speech dynamics and articulatory cues that MFCCs alone may overlook.

The investigation then turns to phoneme-level analysis to identify which phonemes most reliably encode accent information. Phonemes are classified into four categories: vowel phonemes (V), phonemes common to all accents (C), phonemes unique to Malayalam (U), and phonemes differing in articulation or presence across accents (D). Classification results reveal that phonemes in the U and D categories exhibit the highest discriminability for accent identification, while common phonemes are frequently misclassified. This reinforces the influence of L1 phonological structures on second-language (L2) pronunciation, Malayalam in this study, and highlights the potential of selectively targeting accent-rich phonemes in classification tasks.

The thesis presents a comprehensive approach to Malayalam accent identification by combining linguistic theory with advanced signal processing and machine learning techniques. The creation of the DAMSD corpus fills a critical resource gap in Dravidian speech research. The findings emphasize that syllables are the most effective speech unit for accent classification, that nonlinear acoustic features significantly enhance noise robustness, and that accent-rich phonemes can be identified and utilised for more accurate classification and accented speech database creation.
Together, these contributions form a robust foundation for developing accent-aware ASR systems and phonetic tools tailored to under-resourced languages like Malayalam.
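Three of the nonlinear descriptors mentioned above admit compact textbook definitions. The numpy sketch below shows standard formulations of Shannon entropy, spectral entropy, and the discrete Teager Energy Operator; the histogram bin count and test signals are illustrative assumptions, not the thesis's exact feature settings, and the Fractal Dimension estimator is omitted.

```python
import numpy as np

def shannon_entropy(x, bins=32):
    """Shannon entropy (bits) of the signal's amplitude histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum: low for
    tonal/periodic signals, high for noise-like signals."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1],
    sensitive to both amplitude and frequency modulation."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# A pure tone concentrates spectral energy in one bin; white noise spreads it,
# so spectral entropy separates the two signal types.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(fs)
```

Per-frame values of such descriptors can be appended to MFCC and Chroma vectors before classification, which is the fusion strategy the study evaluates across SNR levels.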
