Abstract
This paper expounds the principles and characteristics of several typical decision forest algorithms, and then proposes a modified random forest method from the perspective of the decision forest construction process, with the decision tree serving as the foundation for node splitting. We combine the support vector machine (SVM) algorithm with the attribute selection of the decision tree, splitting on a hyperplane formed by a linear combination of feature variables, which shows stronger classification ability than splitting on a single attribute. This advantage is statistically significant (p < 0.05) on our experimental dataset of 1000 negotiation cases from multinational enterprises (2018–2023). Multiple experiments have shown that the test results on out-of-bag data are unbiased estimates, so there is no need to use a separate test set, or to perform cross-validation, to obtain an unbiased estimate of accuracy. Experimental analysis demonstrates that the improved algorithm achieves a 15% higher out-of-bag (OOB) accuracy than conventional random forests.
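The core idea above, replacing an axis-aligned single-attribute split with a hyperplane over a linear combination of features, can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses scikit-learn's `LinearSVC` on synthetic data to compare the class purity achieved by a single-feature threshold split against an SVM hyperplane split at one node.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Hypothetical node data (synthetic, for illustration only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Axis-aligned split: threshold on a single feature
feat, thr = 0, np.median(X[:, feat] if (feat := 0) == 0 else X[:, 0])
axis_left = X[:, feat] <= thr

# Oblique split: hyperplane w.x + b = 0 over a linear combination of features
svm = LinearSVC(random_state=0).fit(X, y)
svm_left = svm.decision_function(X) <= 0  # samples on one side of the hyperplane

def split_purity(mask, y):
    """Weighted majority-class fraction across the two child partitions."""
    total = 0.0
    for m in (mask, ~mask):
        if m.sum():
            total += (m.sum() / len(y)) * (np.bincount(y[m]).max() / m.sum())
    return total

print(f"single-attribute split purity: {split_purity(axis_left, y):.3f}")
print(f"SVM hyperplane split purity:   {split_purity(svm_left, y):.3f}")
```

On data where classes separate along a linear combination of features rather than a single axis, the hyperplane split typically yields purer child nodes, which is the motivation for embedding the SVM in attribute selection.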
The calculation of the out-of-bag (OOB) error estimate can be described as follows: in the established random forest, the out-of-bag data consists of 1000 samples, among which 850 are correctly classified, giving an OOB accuracy of 850/1000 = 85%.
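As a concrete sketch of the OOB estimation described above (using scikit-learn's standard random forest rather than the paper's hybrid algorithm, and synthetic data in place of the negotiation-case dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 1000-sample dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# oob_score=True scores each sample with the trees that did NOT see it
# during bootstrap sampling, yielding an unbiased accuracy estimate
# without a held-out test set or cross-validation
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

print(f"OOB accuracy: {rf.oob_score_:.2%}")
```

If, say, 850 of the 1000 out-of-bag predictions are correct, `rf.oob_score_` would be 850/1000 = 0.85, matching the worked example in the text.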
