A personalised Malayalam travel recommendation model using deep clustering techniques.
Abstract
The research aims to develop a Personalized Travel Recommender
System tailored for the Malayalam-speaking audience in Kerala, India. Due
to the unavailability of a benchmark dataset in Malayalam, a two-pronged
data collection strategy was employed: extraction of 13,458 travelogues and
reviews from 'Sanchari', Facebook's largest travel group in Kerala, and from
independent travel blogs, and collection of 2,006 records via a Google Form.
The work has three aims. First, it seeks to develop an automated framework for
processing Malayalam text scraped from social media, addressing the
language's rich morphological complexity. Second, it intends to create an
intelligent system that utilizes opinion mining techniques to continuously
learn user preferences, thereby enabling highly personalized travel
suggestions. Finally, the work aims to design a recommender model that
leverages machine learning algorithms to identify tourist destinations
tailored to the preferences of travelers who exhibit similar tastes, employing
both collaborative and content-based filtering techniques. Data collection
was the biggest hurdle in the initial phase. The aim was to build a dataset
of lengthy Malayalam travelogues, so the focus settled on the Facebook group
Sanchari, the largest Malayalam-language travel group. Data collection faced
several challenges, including memory leaks, bot detection, performance
optimisation, and page rendering time. These issues were resolved by using
features available exclusively to group admins. All travelogues written in
English, Manglish, or any language other than Malayalam were removed from the
spreadsheet before preprocessing.
To transform this unstructured data into a usable dataset, a variety of
Natural Language Processing techniques, along with a Part of Travelogue
Tagger (POT Tagger) and Look-up Dictionary, were used for preprocessing.
Feature engineering included vectorization and one-hot encoding of
important variables to create 'Travel DNA' and 'Location DNA'. Travel DNA
is formed by aggregating key travel attributes, such as travel type, travel
mode, and user preferences, into a numerical vector through techniques like
one-hot encoding and vectorization, thereby providing a compact
representation of users' travel patterns. Location DNA is formed in the same
way from the essential characteristics of travel destinations, such as
location type, climate, and popularity, which are converted into a numerical
vector using the same encoding methods.
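As a minimal sketch of how a Travel DNA vector could be assembled by one-hot encoding, consider the Python example below. The category vocabularies (TRAVEL_TYPES, TRAVEL_MODES, PREFERENCES) and the helper names are assumptions made for illustration; the abstract does not list the actual attribute values or the implementation used in the thesis.

```python
# Hypothetical vocabularies; the thesis does not specify the actual categories.
TRAVEL_TYPES = ["family", "solo", "friends"]
TRAVEL_MODES = ["bike", "car", "bus", "train"]
PREFERENCES = ["beach", "hill station", "wildlife", "heritage"]


def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    return [1 if item == value else 0 for item in vocabulary]


def multi_hot(values, vocabulary):
    """Return a list marking every vocabulary item that appears in `values`."""
    chosen = set(values)
    return [1 if item in chosen else 0 for item in vocabulary]


def travel_dna(travel_type, travel_mode, preferences):
    """Concatenate the encoded attributes into one numerical vector."""
    return (
        one_hot(travel_type, TRAVEL_TYPES)
        + one_hot(travel_mode, TRAVEL_MODES)
        + multi_hot(preferences, PREFERENCES)
    )


# Illustrative user: a solo bike traveller who likes hill stations and wildlife.
dna = travel_dna("solo", "bike", ["hill station", "wildlife"])
print(dna)
```

A Location DNA vector could be built the same way from destination attributes such as location type and climate, so that user and destination vectors can be compared numerically.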
Four distinct recommender models were developed leveraging
various algorithms in Artificial Intelligence: Rule-based Cosine Similarity,
Collaborative Filtering based on K-Means Clustering, Content-based
Filtering through Hierarchical Agglomerative Clustering, and a model
utilizing Bidirectional Long Short-Term Memory (BiLSTM) networks.
Additionally, a comparative model was designed that combined
autoencoders with five different machine learning algorithms. These models
underwent rigorous individual testing to evaluate their performance.
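To illustrate the idea behind the K-Means-based collaborative filtering model named above, the following Python sketch groups users with similar preference vectors. It is a simplified version of the standard K-Means procedure with fixed initial centroids, not the thesis implementation; the user vectors, their dimensions, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(vectors, initial_centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_vector(members)
    return labels, centroids


# Hypothetical user vectors: [beach, hill station, wildlife, heritage]
users = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
labels, centroids = k_means(users, initial_centroids=[users[0], users[2]])
print(labels)
```

Users that share a cluster label can then be treated as having similar tastes, so destinations liked by one member become candidate suggestions for the others.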
The rule-based cosine similarity recommender model utilizes the
angle between user and item vectors in a multi-dimensional space to
measure similarity, thereby providing personalized travel suggestions based
on pre-defined rules and user preferences. The clustering techniques
employed in this research include K-Means for collaborative filtering to
group similar users, and Hierarchical Agglomerative Clustering for content-
based filtering to categorize travel destinations. The BiLSTM recommender
model leverages neural networks to capture both past and future context in
the data. The autoencoder-based travel recommender model employs neural
network architectures to compress and reconstruct the user-item interaction
data, effectively capturing latent features that are used for generating more
accurate and personalized travel suggestions. The recommender system (RS)
was designed with these various techniques to identify the best suggestions
for users. Of these models, collaborative filtering using K-Means Clustering
and the autoencoder-based model showed promising results.
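The cosine similarity measure described above can be sketched in Python as follows. The user vector, the destination vectors, and the recommend helper are hypothetical and serve only to show how the angle between a user vector and item vectors yields a ranking; the pre-defined rules used in the thesis are not given in the abstract and are not modelled here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_vector, locations, top_n=2):
    """Rank destinations by similarity to the user vector, highest first."""
    scored = [
        (name, cosine_similarity(user_vector, vector))
        for name, vector in locations.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Illustrative vectors only; dimensions: [beach, hill station, wildlife, heritage]
user = [0, 1, 1, 0]
locations = {
    "Munnar": [0, 1, 0, 0],
    "Kovalam": [1, 0, 0, 0],
    "Thekkady": [0, 1, 1, 0],
}
top = recommend(user, locations)
print([name for name, _ in top])
```

Because cosine similarity depends on direction rather than magnitude, a destination whose profile points the same way as the user's preferences scores close to 1 regardless of how many attributes are set.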