Offline handwritten character recognition of ancient Malayalam script Vattezhuthu using hybrid vision transformer-swin model
Publisher
Sullamussalam Science College, University of Calicut
Abstract
Ancient scripts provide primary sources for understanding historical events, societies, and
cultures that may not have been recorded elsewhere. They help establish timelines and
sequences of events, offering a clearer picture of historical developments.
Among the numerous ancient scripts, Vattezhuthu stands out as one of the earliest in India,
dating from the 8th to the 15th centuries. This script contains information and knowledge that spans
various fields, including history, culture, literature, law, science, mathematics, and medicine.
However, its time-induced degradation, stylistic variability, and the scarcity of digitized
samples pose significant challenges to preservation and study. This research addresses these
obstacles by developing an automated framework for recognizing and digitizing
Vattezhuthu script, integrating advanced image processing, innovative data augmentation,
and a hybrid deep learning architecture. The study’s primary objective is to enhance
recognition accuracy while mitigating the limitations of degraded historical artifacts and
insufficient datasets.
A comprehensive dataset was curated from stone inscriptions, copper plates, and palm leaf
manuscripts sourced from repositories such as the Hill Palace Archaeological Museum,
Tripunithura Palace, the State Archives Department and the University of Calicut. To address
image degradation, a multi-stage preprocessing pipeline was implemented, including
grayscale conversion, super-resolution techniques for detail enhancement, and noise
reduction using median filtering and Gaussian smoothing. An adaptive binarization method
was proposed; it outperformed traditional algorithms (Otsu, Niblack, Sauvola) and achieved
high accuracy, ensuring robust feature extraction from low-contrast, degraded manuscripts. The
framework’s efficacy was validated using metrics such as Peak Signal-to-Noise Ratio (PSNR),
Mean Square Error (MSE), and Structural Similarity Index Measure (SSIM). A novel
stroke-based data augmentation technique was introduced to simulate natural handwriting
variations, increasing dataset diversity and improving model generalizability.
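As a rough illustration of two pieces named above, the sketch below implements Otsu's global threshold (one of the traditional baselines the proposed method is compared against) and the MSE and PSNR quality metrics. It is not the adaptive binarization method proposed in this work, whose details are not given here. The function names, the flat-list image representation, and the toy pixel values are assumptions made for illustration only.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other (Otsu's method).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue  # no pixels in the lower class yet
        if w_high == 0:
            break  # every pixel is already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]


def mse(reference, processed):
    """Mean squared error between two equally sized images."""
    return sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)


def psnr(reference, processed, max_value=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(reference, processed)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical toy data: an 8-bit image flattened to a list of gray levels.
toy_pixels = [10, 12, 11, 200, 210, 205]
toy_threshold = otsu_threshold(toy_pixels)
toy_mask = binarize(toy_pixels, toy_threshold)

# Hypothetical comparison of a reference image with a processed version.
toy_processed = [12, 12, 9, 198, 210, 207]
toy_quality = psnr(toy_pixels, toy_processed)
```

A global threshold such as Otsu's uses a single cut-off for the whole image, which is one reason locally adaptive methods (Niblack, Sauvola) are usually preferred for unevenly lit or stained manuscripts. SSIM is omitted from the sketch because it requires windowed local statistics.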
For classification, the Hybrid Vision Transformer-Swin (HybridViTSwin) model was
developed, combining the global self-attention mechanisms of Vision Transformers (ViTs)
with the localized hierarchical attention of Swin Transformers. This architecture effectively
captures both broad contextual patterns and fine-grained structural details of Vattezhuthu
glyphs. Experimental results demonstrate the model’s superiority, achieving 100% accuracy
compared to standalone ViT (94.25%) and Swin (95.78%), confirming its robustness in
handling stylistic and degradation-related complexities.
This work contributes theoretically through its hybrid attention mechanism and practically
by releasing the publicly accessible Vattezhuthu dataset. Its implications extend beyond
academia, offering museums and cultural institutions a scalable tool for digitizing
endangered manuscripts. The framework’s adaptability to other ancient scripts, such as
Brahmi or Grantha, underscores its broader relevance. By bridging technological innovation
with cultural preservation, this research not only safeguards a critical aspect of South Indian
heritage but also establishes a replicable methodology for global historical script analysis.
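The hybrid attention idea referred to above, global self-attention in the style of a ViT combined with window-restricted attention in the style of a Swin Transformer, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: scalar patch features, a single attention step without learned projections, non-overlapping windows without shifting or hierarchical merging, and fusion by pairing the two outputs per patch. It is not the HybridViTSwin architecture itself, whose fusion strategy is not described here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, allowed):
    """Scalar dot-product attention; token i attends only to allowed[i]."""
    outputs = []
    for i, query in enumerate(features):
        indices = allowed[i]
        weights = softmax([query * features[j] for j in indices])
        outputs.append(sum(w * features[j] for w, j in zip(weights, indices)))
    return outputs


def global_neighbors(grid):
    """ViT-style: every patch attends to every patch on the grid."""
    count = grid * grid
    return [list(range(count)) for _ in range(count)]


def window_neighbors(grid, window):
    """Swin-style: a patch attends only within its own window x window block."""
    def block(idx):
        row, col = divmod(idx, grid)
        return (row // window, col // window)

    groups = {}
    for idx in range(grid * grid):
        groups.setdefault(block(idx), []).append(idx)
    return [groups[block(idx)] for idx in range(grid * grid)]


def hybrid_features(features, grid, window):
    """Pair each patch's global-context and local-context outputs.

    Pairing (concatenation) is an assumed fusion rule for this sketch.
    """
    global_out = attend(features, global_neighbors(grid))
    local_out = attend(features, window_neighbors(grid, window))
    return list(zip(global_out, local_out))


# Hypothetical 4x4 grid of patch features, flattened row by row.
toy_features = [float(i % 4) for i in range(16)]
toy_fused = hybrid_features(toy_features, grid=4, window=2)
```

The only difference between the two branches is the set of patches each token may attend to: the full grid captures broad context, while the window restricts attention to nearby patches and so emphasizes local stroke structure.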
