We examine the extent to which the GICS sector categorization of equity securities can be systematically reconstructed from historical quarterly firm fundamental data using gradient boosted tree classification. The tradeoff between model complexity and performance is assessed, and relative feature importance is described. Potential extensions are outlined, including improving feature engineering, validating internal consistency, and integrating additional data sources to further improve classification accuracy.