Offline handwritten character recognition of ancient Malayalam script Vattezhuthu using hybrid vision transformer-swin model

Publisher

Sullamussalam Science College, University of Calicut

Abstract

Ancient scripts provide primary sources for understanding historical events, societies, and cultures that may not have been recorded elsewhere. They help establish timelines and sequences of events, offering a clearer picture of historical developments. Among the numerous ancient scripts, Vattezhuthu stands out as one of the earliest in India, dating from the 8th to the 15th centuries. Texts in this script hold knowledge spanning history, culture, literature, law, science, mathematics, and medicine. However, time-induced degradation, stylistic variability, and the scarcity of digitized samples pose significant challenges to their preservation and study. This research addresses these obstacles by developing an automated framework for recognizing and digitizing Vattezhuthu script, integrating advanced image processing, innovative data augmentation, and a hybrid deep learning architecture. The study’s primary objective is to enhance recognition accuracy while mitigating the limitations of degraded historical artifacts and insufficient datasets. A comprehensive dataset was curated from stone inscriptions, copper plates, and palm leaf manuscripts sourced from repositories such as the Hill Palace Archaeological Museum, Tripunithura Palace, the State Archives Department, and the University of Calicut. To address image degradation, a multi-stage preprocessing pipeline was implemented, including grayscale conversion, super-resolution techniques for detail enhancement, and noise reduction using median filtering and Gaussian smoothing. An adaptive binarization method was proposed that outperformed traditional algorithms (Otsu, Niblack, Sauvola) in accuracy, ensuring robust feature extraction from low-contrast, degraded manuscripts. The framework’s efficacy was validated using metrics such as Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), and Structural Similarity Index Measure (SSIM).
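The image-quality metrics and the Otsu baseline named above have standard textbook definitions. The following is a minimal sketch of MSE, PSNR, and Otsu's global threshold, assuming 8-bit grayscale images held as NumPy arrays. The function names are illustrative and are not taken from the thesis, and the proposed adaptive binarization method itself is not reproduced here because the abstract does not describe it.

```python
import numpy as np


def mse(reference, processed):
    """Mean squared error between two images of identical shape."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(processed, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((ref - out) ** 2))


def psnr(reference, processed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, processed)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))


def otsu_threshold(gray):
    """Otsu's global threshold for a uint8 image.

    Returns the level t that maximizes the between-class variance when
    pixels <= t form one class and pixels > t form the other.
    """
    values = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(values, minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(between))
```

A binarized image can then be obtained with `gray > otsu_threshold(gray)`. For SSIM, an existing implementation such as `skimage.metrics.structural_similarity` in scikit-image is usually preferable to hand-written code.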
A novel stroke-based data augmentation technique was introduced to simulate natural handwriting variations, increasing dataset diversity and improving model generalizability. For classification, the Hybrid Vision Transformer-Swin (HybridViTSwin) model was developed, combining the global self-attention mechanisms of Vision Transformers (ViTs) with the localized hierarchical attention of Swin Transformers. This architecture effectively captures both broad contextual patterns and fine-grained structural details of Vattezhuthu glyphs. Experimental results demonstrate the model’s superiority: it achieved 100% accuracy, compared with 94.25% for standalone ViT and 95.78% for standalone Swin, confirming its robustness in handling stylistic variation and degradation. This work contributes theoretically through its hybrid attention mechanism and practically by releasing the publicly accessible Vattezhuthu dataset. Its implications extend beyond academia, offering museums and cultural institutions a scalable tool for digitizing endangered manuscripts. The framework’s adaptability to other ancient scripts, such as Brahmi or Grantha, underscores its broader relevance. By bridging technological innovation with cultural preservation, this research not only safeguards a critical aspect of South Indian heritage but also establishes a replicable methodology for global historical script analysis.
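The contrast between global and window-restricted self-attention can be illustrated with a small NumPy sketch. This is not the HybridViTSwin architecture: the abstract does not specify how the two branches are combined, so the concatenation in `fuse_features` is an assumption made purely for illustration. Learned projections are omitted, and Swin's two-dimensional, shifted, hierarchical windows are simplified to non-overlapping one-dimensional windows. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def global_attention(tokens):
    """ViT-style attention: every token attends to every other token.

    Query, key, and value projections are omitted (identity) for brevity.
    """
    dim = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(dim))
    return weights @ tokens


def windowed_attention(tokens, window):
    """Swin-style locality, simplified: attention only within
    non-overlapping windows of `window` consecutive tokens."""
    count, dim = tokens.shape
    if count % window != 0:
        raise ValueError("token count must be divisible by the window size")
    blocks = tokens.reshape(count // window, window, dim)
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(dim)
    return (softmax(scores) @ blocks).reshape(count, dim)


def fuse_features(tokens, window):
    """Assumed fusion: concatenate mean-pooled global and local features."""
    global_feat = global_attention(tokens).mean(axis=0)
    local_feat = windowed_attention(tokens, window).mean(axis=0)
    return np.concatenate([global_feat, local_feat])
```

For 16 patch tokens of dimension 8, `fuse_features(tokens, window=4)` yields a 16-dimensional vector that a classifier head could consume. When the window covers all tokens, the windowed branch reduces to global attention, which makes the difference between the two mechanisms explicit.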
