Abstract
With the increasing amount of financial data produced today, finding the k-nearest neighbors of a query point in high-dimensional space has become important for assessing financial credit risk. Binary embeddings are efficient tools for indexing large datasets in financial credit risk analysis. The idea is to find a good hash function such that data points that are similar in Euclidean space remain similar in Hamming space, which enables fast data retrieval. By applying an out-of-sample extension to test data, it is possible to establish a fast retrieval model of companies' status, making the stakeholders' evaluation task considerably more efficient. First, we use semi-supervised learning-based hashing to incorporate pairwise information when constructing the weighted adjacency graph matrix needed for building the binarised Laplacian Eigenmap. Second, we train a generalised regression neural network (GRNN) to learn the k-bit hash function. Third, the k-bit binary code for the test data is efficiently found in the recall phase. Experimental results on financial data demonstrate the applicability and advantages of learning-based hashing for credit risk assessment.
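The three-step pipeline described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the k-nearest-neighbour heat-kernel graph, the way must-link/cannot-link pairs overwrite edge weights, the median thresholding of the eigenvectors, the kernel widths `sigma`, and all function names (`heat_kernel_adjacency`, `laplacian_eigenmap_codes`, `grnn_predict`, `hamming_knn`) are hypothetical choices. The GRNN is written in its standard Nadaraya-Watson form (a Gaussian-kernel-weighted average of the training codes), and the data are synthetic.

```python
import numpy as np


def heat_kernel_adjacency(X, k=5, sigma=1.0, must_link=(), cannot_link=()):
    """Symmetric kNN heat-kernel graph, adjusted with pairwise labels (assumed scheme)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    masked = d2.copy()
    np.fill_diagonal(masked, np.inf)  # a point is not its own neighbour
    W = np.zeros_like(d2)
    for i in range(X.shape[0]):
        nbrs = np.argsort(masked[i])[:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)  # symmetrise the kNN relation
    for i, j in must_link:  # hypothetical: similar pairs get maximal weight
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:  # hypothetical: dissimilar pairs are disconnected
        W[i, j] = W[j, i] = 0.0
    return W


def laplacian_eigenmap_codes(W, n_bits):
    """Binarised Laplacian Eigenmap: solve L y = lambda D y, threshold each eigenvector."""
    deg = W.sum(axis=1) + 1e-12
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(deg)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, 1:n_bits + 1]  # drop the trivial eigenvector
    return (Y > np.median(Y, axis=0)).astype(np.uint8)  # median split per bit


def grnn_predict(X_train, B_train, X_test, sigma=1.0):
    """GRNN (Nadaraya-Watson) regression of the bits, then rounding to {0, 1}."""
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    d2 -= d2.min(axis=1, keepdims=True)  # shift for numerical stability; ratios unchanged
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    soft = K @ B_train / K.sum(axis=1, keepdims=True)
    return (soft >= 0.5).astype(np.uint8)


def hamming_knn(B_db, b_query, k):
    """Indices of the k database codes closest to b_query in Hamming distance."""
    dist = np.count_nonzero(B_db != b_query, axis=1)
    return np.argsort(dist, kind="stable")[:k], dist


# Synthetic example: two hypothetical groups of companies with 4 financial ratios each.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(3.0, 0.3, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)
W = heat_kernel_adjacency(X_train, k=5, sigma=1.0,
                          must_link=[(0, 1), (20, 21)], cannot_link=[(0, 20)])
B_train = laplacian_eigenmap_codes(W, n_bits=8)

# Out-of-sample extension: hash unseen companies with the GRNN, then retrieve.
X_test = np.array([[0.1, 0.0, -0.1, 0.05], [2.9, 3.1, 3.0, 2.95]])
B_test = grnn_predict(X_train, B_train, X_test, sigma=0.5)
idx, dist = hamming_knn(B_train, B_test[0], k=5)
print("retrieved:", idx, "labels:", labels[idx], "Hamming distances:", dist[idx])
```

Median thresholding is used here so that each bit splits the training set roughly in half, which keeps the bits informative; other thresholds are equally plausible. A very small `sigma` makes the GRNN essentially reproduce the training codes, while larger values smooth the learned hash function for unseen companies.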
