Financial credit risk assessment via learning-based hashing

Abstract

With the growing volume of financial data, finding the k-nearest neighbors of a query point in high-dimensional space is an important step in assessing financial credit risk. Binary embeddings are efficient tools for indexing large datasets in financial credit risk analysis. The idea is to find a hash function such that data points that are similar in Euclidean space remain similar in Hamming space, which enables fast data retrieval. An out-of-sample extension to test data makes it possible to establish a fast retrieval model of companies' status, making the stakeholders' evaluation task more efficient. First, we use semi-supervised learning-based hashing, which takes pairwise information into account when constructing the weighted adjacency graph matrix needed for building the binarised Laplacian EigenMap. Second, we train a generalised regression neural network (GRNN) to learn the k-bit hash function. Third, the k-bit binary code for the test data is efficiently found in the recall phase. Experimental results on financial data demonstrate the applicability and advantages of learning-based hashing for credit risk assessment.
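The three steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the graph weighting, the binarisation rule, or the GRNN settings, so the heat-kernel kNN weights, the way labelled pairs are linked or cut, the median threshold, and the `sigma`, `spread`, and `n_neighbors` parameters are all assumptions made for illustration. The GRNN is written in its usual form as Gaussian kernel-weighted regression over the training set.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    # Squared Euclidean distances between rows of A and rows of B.
    d = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d, 0.0)

def build_graph(X, labels, n_neighbors=5, sigma=1.0):
    # Step 1a: symmetric kNN graph with heat-kernel weights (assumed scheme).
    # labels == -1 marks unlabelled points.
    n = X.shape[0]
    D2 = pairwise_sq_dists(X, X)
    idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]  # skip self
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-D2[rows, cols] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)
    # Pairwise supervision (assumed): link same-label pairs, cut the rest.
    known = labels >= 0
    both = known[:, None] & known[None, :]
    same = both & (labels[:, None] == labels[None, :])
    W[same] = 1.0
    W[both & ~same] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

def binarised_eigenmap(W, n_bits):
    # Step 1b: Laplacian eigenmap via the normalised Laplacian,
    # then threshold each coordinate at its median (assumed rule).
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)               # ascending eigenvalues
    Y = d_isqrt[:, None] * vecs[:, 1:n_bits + 1]  # drop the trivial vector
    return (Y > np.median(Y, axis=0)).astype(np.uint8)

def grnn_predict(X_train, Y_train, X_query, spread=1.0):
    # Step 2: GRNN output = Gaussian kernel-weighted mean of training targets.
    K = np.exp(-pairwise_sq_dists(X_query, X_train) / (2.0 * spread ** 2))
    return (K @ Y_train) / np.maximum(K.sum(axis=1, keepdims=True), 1e-12)

def hash_queries(X_train, B_train, X_query, spread=1.0):
    # Step 3: out-of-sample k-bit codes for unseen points.
    soft = grnn_predict(X_train, B_train.astype(float), X_query, spread)
    return (soft > 0.5).astype(np.uint8)

def hamming_knn(B_db, b_query, k):
    # Indices of the k database codes closest to b_query in Hamming distance.
    dist = (B_db != b_query).sum(axis=1)
    return np.argsort(dist, kind="stable")[:k]
```

In use, one would call `build_graph` and `binarised_eigenmap` on the training companies, then `hash_queries` on new ones and `hamming_knn` to retrieve their nearest neighbours in code space. The dense matrices keep the sketch short; a dataset of realistic size would need sparse graph and eigensolver routines.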

Keywords

Hashing method, financial credit risk, generalised regression neural network, binary embedding, k-bits code

