In both univariate and multivariate analyses, outliers are difficult to identify when the underlying distribution is skewed or heavy-tailed. In this article, we propose simple univariate and multivariate outlier identification procedures that perform well with such distributions while keeping computational complexity low. We describe the commands gboxplot (univariate case) and sdasym (multivariate case), which implement these procedures in Stata.