A study on robust regression methods
Publisher
St. Thomas College - Autonomous, University of Calicut
Abstract
A key goal of regression analysis is to estimate the unknown parameters of the
model, a process often referred to as fitting the model to the data. The most
commonly used method for this purpose is ordinary least squares (OLS), the
classical estimator that minimizes the sum of squared residuals. OLS is a
fundamental method in statistical estimation, and its estimators are known as the
best linear unbiased estimators (BLUE). When the classical assumptions of the
regression model are satisfied, OLS computed directly from the data yields efficient
results and generally outperforms nearly all other estimation methods. Despite
advantages such as simplicity and convenient mathematical properties, a major
drawback of OLS estimates is that they are sensitive to the presence of outliers.
In general, regression data can contain good leverage points, bad leverage points,
vertical outliers, and regression outliers, although a single dataset need not
contain all four types of extreme values. To overcome this drawback, it is
preferable to use robust regression methods. The thesis, titled “A study on robust
regression methods”, is arranged into five chapters. Chapter 1 introduces the basic
concepts of simple linear regression, multiple linear regression, multivariate
regression, outliers, leverage points, and robust estimators and their properties.
It also explains the importance of robust regression methods and gives an extensive
review of the literature on robust regression methods in multiple linear regression.
The disadvantages of existing methods and the objectives of the thesis are discussed
as well, followed by an outline of the thesis.
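The sensitivity of OLS to outliers described above can be illustrated with a small sketch. It fits a simple linear regression by OLS on toy data, then replaces one response with a gross vertical outlier and refits. For contrast it also fits a Theil-Sen estimator (median of pairwise slopes), which is a standard robust alternative chosen here only for illustration; it is not one of the estimators proposed in the thesis. The data and function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def theil_sen_fit(x, y):
    """Return (intercept, slope) from the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x (illustrative, not from the thesis).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [1.0 + 2.0 * xi for xi in x]

# Same data with the last response replaced by a gross vertical outlier.
y_outlier = y_clean[:-1] + [50.0]

ols_clean = ols_fit(x, y_clean)
ols_outlier = ols_fit(x, y_outlier)
robust_outlier = theil_sen_fit(x, y_outlier)

print("OLS, clean data:        ", ols_clean)
print("OLS, one outlier:       ", ols_outlier)
print("Theil-Sen, one outlier: ", robust_outlier)
```

In this toy setting a single contaminated response is enough to pull the OLS slope far from 2, while the median-based fit stays on the line followed by the remaining points.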
Chapter 2 discusses how the classical location and scatter matrix estimators perform
in the presence of outliers and explains the need for robust location and scatter
estimators. It introduces a novel robust shrinkage-based Sn covariance estimator
and a shrinkage-based L1 location estimator, and discusses the sensitivity curve of
the proposed estimator. To check the efficiency of the proposed estimators, a new
robust Mahalanobis distance is proposed, based on the robust shrinkage Sn covariance
estimator and the L1 location vector. The performance and properties of this
distance measure are evaluated extensively through a simulation study, and real-life
benchmark datasets are used to evaluate its efficacy. In addition to the benchmark
datasets, lung cancer data of patients were collected from the Malabar Cancer Center,
and the performance of the measure in identifying deviations in the data was
evaluated with respect to the variables Albumin, Absolute and survival days of the
patients.
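As a rough illustration of why a robust location estimate matters, the sketch below computes a spatial (L1) median with Weiszfeld's algorithm and compares it with the sample mean on toy data containing one gross outlier. This assumes that the "L1 location estimator" refers to the spatial median, which the abstract does not state explicitly. The shrinkage step, the Sn covariance estimator, and the resulting Mahalanobis distance are not reproduced here. The data and function names are hypothetical.

```python
import math


def spatial_median(points, max_iter=500, tol=1e-10):
    """Approximate the L1 (spatial) median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            dist = math.dist(p, current)
            if dist < 1e-12:
                # Simplification: skip a point that coincides with the iterate.
                continue
            weight_total += 1.0 / dist
            for k in range(dim):
                weighted_sum[k] += p[k] / dist
        if weight_total == 0.0:
            break
        updated = [s / weight_total for s in weighted_sum]
        converged = math.dist(updated, current) < tol
        current = updated
        if converged:
            break
    return current


# Four points on the unit square plus one gross outlier (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]

sample_mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
l1_median = spatial_median(points)

print("sample mean:", sample_mean)
print("L1 median:  ", l1_median)
```

The sample mean is dragged far outside the unit square by the single outlier, whereas the L1 median remains inside the bulk of the data.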
Chapter 3 introduces the robust shrinkage reweighted regression estimator in
multiple regression, utilizing the robust covariance matrix and location median
proposed in Chapter 2. The performance of the reweighted estimator is evaluated by
comparing it with OLS and several other existing robust techniques. An extensive
simulation study was conducted to assess the robustness and efficiency of the
proposed method. Additionally, the breakdown point, affine equivariance, and
sensitivity curve were empirically examined. The proposed estimator was also applied
and evaluated on benchmark datasets.
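The general idea of a reweighted regression estimator can be sketched as follows: start from a robust initial fit, give weight zero to observations that look outlying, and refit by weighted least squares. The sketch below is a generic one-step version that flags points through scaled residuals. It is not the thesis's estimator, which builds its weights from the shrinkage covariance and location estimates of Chapter 2. The Theil-Sen starting fit, the MAD scale, the cutoff of 2.5, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def initial_robust_fit(x, y):
    """Theil-Sen starting fit (an illustrative choice, not the thesis's)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimizing the weighted sum of squared residuals."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def reweighted_fit(x, y, cutoff=2.5):
    """One reweighting step: hard-reject points with large scaled residuals."""
    b0, b1 = initial_robust_fit(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    center = median(residuals)
    # MAD-based scale; assumes the residuals are not all identical.
    scale = 1.4826 * median(abs(r - center) for r in residuals)
    weights = [1.0 if abs(r - center) <= cutoff * scale else 0.0 for r in residuals]
    return weighted_least_squares(x, y, weights), weights


# Toy data near y = 1 + 2x with small fixed perturbations and one outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, -0.15, 0.1, -0.1]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[-1] = 60.0  # gross vertical outlier

ols_estimate = weighted_least_squares(x, y, [1.0] * len(x))
reweighted_estimate, weights = reweighted_fit(x, y)

print("OLS:        ", ols_estimate)
print("Reweighted: ", reweighted_estimate)
print("Weights:    ", weights)
```

Hard rejection keeps the final step a plain least-squares fit on the retained points, which is what lets a reweighted estimator recover much of the efficiency of OLS on clean data.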
Chapter 4 presents a robust shrinkage two-step reweighted regression estimator for
multivariate regression. The performance of the two-step reweighted estimator is
evaluated against the classical maximum likelihood based estimator and other
existing robust estimators. A Monte Carlo simulation study was conducted to assess
the robustness and efficiency of the method. The affine equivariance property of
the proposed estimator was also evaluated empirically. Finally, the proposed method
was evaluated on a real-life benchmark dataset.
Finally, Chapter 5 presents the conclusion of the thesis and discusses the proposed
estimators. It also outlines the scope of the research and suggests directions for
future work.
