
Abstract

Background

To evaluate whether machine learning (ML) methods (Elastic Net (EN), eXtreme Gradient Boosting (XGBoost), Feed Forward Neural Net (FNN)) can improve claims-based inpatient quality measurement compared with Logistic Regression.

Methods

This retrospective cohort study used German claims data from the years 2015–2021. The study population encompassed inpatient cases of acute myocardial infarction (n = 165,130) and proximal humerus fracture (n = 34,912), for which quality-related outcomes were assessed. The performance of risk adjustment models based on machine learning methods (EN, XGBoost, FNN) was compared with that of stepwise backwards Logistic Regression using the Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), Precision Recall-Area Under the Curve (PR-AUC) and Brier Score (BS). Institution-specific quality was measured by Standardised Mortality Ratios (SMR), which were used to visualise the impact of the tested methods on quality assessment.

Results

For most outcomes, the machine learning methods yielded no or only marginal gains. The FNN showed the largest gain in model performance over Logistic Regression, with an increase of 2.4% in ROC-AUC and 4.5% in PR-AUC and a slight improvement in the BS (a reduction of 0.007). XGBoost followed with a gain in ROC-AUC of 2.3%; however, this improvement was not reflected in a lower BS.

Conclusions

None of the machine learning methods tested is generally superior for creating quality indicators. Marginal gain in model performance should not be the main basis for choosing an adequate method; instead, interpretability should be emphasised, especially when dealing with new datasets with little knowledge of important risk factors.

Keywords

Machine learning, quality assessment, logistic regression, XGBoost, neural net, elastic net, standardised mortality ratio

Introduction

Claims-based quality assessment in healthcare is fundamental for evaluating and improving the effectiveness of medical treatments and services.^1–5 The longitudinal structure of claims data allows a retrospective analysis of treatment courses, risk factors and outcomes. Since the risk profiles of patients can differ profoundly between hospitals, appropriate risk adjustment is essential for comparative hospital quality assessment. Such risk adjustment takes into account the initial health status of patients at the start of treatment and thus enables a comparison of hospitals that treat patients with different degrees of illness. To date, the widely used gold standard for modelling and risk adjustment of quality indicators has been multivariable Logistic Regression using stepwise backwards selection because of its ease of use and the interpretability of its regression coefficients.^1–8

ML methods hold the potential to improve model performance and risk adjustment for comprehensive quality assessment and analysis in big data sets compared to the current standard of Logistic Regression, and might thereby make quality indicators more precise. However, a rigorous evaluation of the models' performance in the task of risk adjustment is necessary because of its significant impact on spending, resource allocation and performance measurement. Therefore, this study aimed to compare the model performance of the ML methods EN, XGBoost and FNN with that of Logistic Regression and to assess their impact on hospital rankings.

In this retrospective cohort study German claims data from the years 2015–2021 were used to develop and evaluate quality indicators for the two inpatient treatment causes of acute myocardial infarction (AMI), and endoprosthetic and osteosynthetic treatment of proximal humerus fractures (PHF).

Methods

Data pool

German statutory health insurance data (claims data, SHI) (2015–2021) were supplied by the AOK Research Institute (WIdO); the data were derived from 11 local health care funds (AOK – Die Gesundheitskasse) covering data of 27 million German patients in total. Diagnoses of the claims data follow the 10th International Classification of Diseases German Modification (ICD-10-GM), procedures are coded according to the German version of the International Classification of Procedures in Medicine, known as operation and procedure code (OPS) and pharmacotherapy is coded by the German modification of the Anatomical Therapeutic Chemical (ATC) classification system. These codes are compliant with reimbursement guidelines and were reviewed by the Medical Review Board of the Health Insurance Funds.

The retrospective claims data encompassed demographic, medical, cost, pharmacotherapy and physiotherapy details for both inpatient and outpatient treatments. The years 2015–2016 were considered as pre-observation period for the index hospitalisation (index stay) which was required to have occurred between 2017 and 2020. The year 2021 was defined as the follow-up period subsequent to the index stay.

An ethics vote from the Ethics Committee at the Technical University of Dresden (BO-EK-175042022) was obtained. The study complies with the Declaration of Helsinki and the recommendations of Good Practice Secondary Data Analysis (GPS) and Good Epidemiological Practice (GEP).^9–11

Inclusion criteria, risk factors and outcomes

Quality models for assessing quality outcomes, incorporating relevant timeframes and risk factors, were developed by medical expert panels based on clinical expertise for both inpatient treatment causes. Risk factors specific to each treatment cause were defined from medical expert knowledge rather than using all available variables as input for the ML methods. This procedure was chosen because, in quality assessment, it is crucial to differentiate between variables that describe the initial condition of the patients and conditions that could be influenced by the medical treatment or other care site-specific procedures. Only variables that describe the initial condition of the patient are valid risk factors for comparative quality assessment.

Patient risk profiles were evaluated by claims data including ICD-10-GM, OPS codes and ATC codes (only outpatient stays) from both inpatient and outpatient stays to identify risk factors during the pre-observation period and the index stay. The definition of outcomes was based on ICD-10-GM and OPS codes recorded during the index stay and any subsequent inpatient stays within the 365-day follow-up period after admission. The timeframe of each outcome was selected to ensure that the outcome can be linked to the index stay, making them suitable for quality measurement.

For acute myocardial infarction (AMI) treatment, inpatient stays of patients aged ≥20 years with a primary diagnosis of AMI (ICD-10-GM I21) and a procedure code for coronary angiography (CA) were included. These included cases of ST-segment elevation myocardial infarction (STEMI), non-ST-segment elevation myocardial infarction (NSTEMI), or other types of AMI (Table 1).

Table 1.

Inclusion and exclusion criteria for both treatment causes.

Treatment Cause	ICD	OPS	Age	Exclusion criteria	Transfers	Claims-based risk factors
AMI	I21	1-275.0, 1-275.1, 1-275.2, 1-275.3, 1-275.4, 1-275.5	≥20 years		Patients with several transfers within 1 day were excluded	Men (reference), women, STEMI (reference), NSTEMI, 3rd quintile (65-73 years), 4th quintile (74-80 years), 5th quintile (over 81 years), BMI 30-34, BMI 35-39, BMI plus 40, atherosclerotic heart disease: two-vessel disease, atherosclerotic heart disease: three-vessel disease, atherosclerotic heart disease: stenosis of the left main stem, cardiovascular arrest before admission to hospital, ventricular flutter and ventricular fibrillation, shock, NYHA > 1, chronic kidney disease (stage 1-2), chronic kidney disease (stage 3-5), acute renal failure (stage 1), acute renal failure (stage 2), acute renal failure (stage 3), post bypass surgery, 1 stent in an artery (reference: no stent), at least 2 stents in one artery (reference: no stent), at least 2 stents in several arteries (reference: no stent), malignant neoplasm, dementia/Alzheimer's disease, diabetes with insulin requirement, diabetes without insulin requirement, dialysis in the previous observation period, chronic liver disease, chronic lung disease, cerebral infarction, stroke, alcohol abuse, peripheral vascular disease, antithrombotics
PHF	S42.2	5-794.21, 5-794.k1, 5-824.21, 5-824.01, 5-824.20, 5-794.a1, 5-794.b1, 5-794.01, 5-794.11, 5-793.31, 5-793.k1, 5-793.a1, 5-793.b1, 5-793.11, 5-793.21, 5-790.01, 5-790.11, 5-790.91, 5-790.31, 5-790.41, 5-790.51, 5-790.k1, 5-790.n1	≥51 years	Missing or bilateral site localization; diagnosis of polytrauma, cancer, juvenile arthritis, bone cysts or bone fractures with neoplasms; ipsilateral surgery in the pre-monitoring period		Men (reference), women, 51-74 (1st-4th quantile) (reference), 75-79 years (4th-6th quantile), 80-84 years (6th-8th quantile), > 84 years (8th-10th quantile), osteosynthesis with plate (reference), inverse endoprosthesis, humeral head prosthesis, conventional (not inverse), osteosynthesis with intramedullary nail of a multiple fracture, other treatment of a multiple fracture, treatment with plate of a single fracture, open reduction with intramedullary nail/screw, closed reduction with plate/nail/other, intermediate risk HFRS (5-15), high risk HFRS (>15), nicotine abuse, rupture of the rotator cuff, analgesics (14 days), antibiotics (14 days), anticoagulants (90 days), bisphosphonates (90 days), denosumab (90 days), opioids (90 days), selective oestrogen receptor modulators (90 days), vitamin D/calcium (90 days), obesity, atherosclerosis, diabetes mellitus, type 1: With coma, diabetes mellitus, type 1: With renal complications, hypertension, congestive heart failure, coronary heart disease, seropositive chronic polyarthritis, other chronic polyarthritis, atrial fibrillation and atrial flutter

For AMI, the assessed risk factors included, among others, age, gender, body mass index (BMI), STEMI/NSTEMI classification, coronary diagnostics & interventional procedures, cardiovascular history, clinical presentation, extent of coronary artery disease, risk factors, comorbidities and antithrombotic use (Table 1).

For PHF, all inpatient cases with a primary diagnosis of PHF (ICD-10-GM S42.2) and a procedure code for arthroplasty or osteosynthetic treatment were included and categorised into eight different types of treatment. The assessed risk factors included, among others, age, gender, type of treatment, comorbidities in the pre-monitoring period (outpatient & inpatient), pharmacotherapy in the pre-monitoring period (90/14 days before index), Elixhauser comorbidities and the Hospital Frailty Risk Score (HFRS; German modification)¹² in the index stay (Table 1).

The following outcome definitions were based on ICD-10-GM codes and OPS codes recorded during the index stay and subsequent inpatient stays within the 365-day follow-up period after admission:

• AMI: Death within 30 days, major adverse cardiac and cerebrovascular events (MACCE) within 30 days, MACCE plus cardiac insufficiency within 30 days, MACCE within 365 days, MACCE plus cardiac insufficiency within 365 days

• PHF: Death during the index stay, death within 90 days, early surgical complications & revisions within 90 days, surgical complications & revisions within 365 days, general complications during the index stay, general complications within 90 days, other complications during the index stay, any secondary surgery within 365 days

Data analysis

Raw data were aggregated at patient level. Risk factors and outcomes were encoded as distinct dichotomous categorical variables using dummy coding (0/1). Patient age was divided into quantiles. During data preparation, variables with near-zero variance (variance < 0.005) were excluded, and the remaining variables were assessed for high variance inflation factors (> 3.5) and for correlations exceeding 0.7 between risk factors. Where a correlation exceeded 0.7, the more frequent variable was always retained.
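The screening steps described above can be illustrated with a minimal sketch. It is not the study's code: the function name `screen_risk_factors`, the synthetic columns and the choice to only report (rather than drop) variables by variance inflation factor are assumptions made for illustration. The sketch uses the identity that the variance inflation factor of variable j equals the j-th diagonal element of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd


def screen_risk_factors(X, var_min=0.005, corr_max=0.7):
    """Screen 0/1 dummy-coded risk factors; returns (kept columns, VIF per column)."""
    # 1) Exclude near-zero-variance variables.
    X = X.loc[:, X.var(ddof=0) >= var_min]

    # 2) For every pair correlated above corr_max, retain the more frequent variable.
    corr = X.corr().abs()
    freq = X.mean()  # for 0/1 dummies the mean is the relative frequency
    dropped = set()
    cols = list(X.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if corr.loc[a, b] > corr_max:
                dropped.add(a if freq[a] < freq[b] else b)
    X = X.drop(columns=sorted(dropped))

    # 3) Variance inflation factors from the inverse correlation matrix.
    vif = pd.Series(np.diag(np.linalg.inv(X.corr().to_numpy())), index=X.columns)
    return X, vif


# Synthetic example data (hypothetical variable names).
rng = np.random.default_rng(0)
n = 2000
diabetes = rng.binomial(1, 0.30, n)
demo = pd.DataFrame({
    "diabetes": diabetes,
    "insulin": diabetes * rng.binomial(1, 0.95, n),  # highly correlated, less frequent
    "rare_code": rng.binomial(1, 0.001, n),          # near-zero variance
    "dementia": rng.binomial(1, 0.10, n),
})
kept, vif = screen_risk_factors(demo)
flagged = vif[vif > 3.5]  # variables that would need a closer look
```

In this toy data set the rare code and the less frequent member of the correlated pair are removed, and no remaining variable exceeds the threshold of 3.5.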

Quality indicators were built by the gold standard of clustered multivariable Logistic Regression using stepwise backwards selection (p < .04). As the internal variance within a care site tends to be lower than between care sites, a cluster effect and robust sandwich estimators from Huber and White were incorporated.¹³

The model performance of the Logistic Regression was compared to that of an EN, XGBoost and a simple FNN. For these ML methods, the hyperparameters were tuned via grid search. All models were trained on a randomly selected training dataset of 80%, stratified by the respective outcome, and tested on the remaining 20%.
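As a hedged illustration of this training setup, the sketch below performs a stratified 80/20 split, tunes an elastic-net-penalised logistic regression by grid search and evaluates it on the held-out 20% using scikit-learn. The synthetic data, the grid values, the five-fold cross-validation, the `roc_auc` tuning criterion and the use of average precision as the PR-AUC estimate are all assumptions; the study does not report these details.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for dummy-coded risk factors and a binary outcome.
rng = np.random.default_rng(42)
X = rng.binomial(1, 0.3, size=(4000, 10)).astype(float)
logit = -2.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# 80/20 split, stratified by the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Elastic net as penalised logistic regression; grid values are illustrative only.
search = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the untouched test set.
p_test = search.predict_proba(X_test)[:, 1]
metrics = {
    "roc_auc": roc_auc_score(y_test, p_test),
    "pr_auc": average_precision_score(y_test, p_test),  # one common PR-AUC estimate
    "brier": brier_score_loss(y_test, p_test),
}
```

The same split and evaluation could be reused for each candidate method so that all models are compared on identical test cases.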

Model performances were compared using ROC-AUC (Receiver Operating Characteristic-Area Under the Curve), PR-AUC (Precision Recall-Area Under the Curve), and Brier Score (BS). To explore the impact of the model on quality assessment, standardised mortality/morbidity ratios (SMR) with 95% confidence intervals were calculated on care site level. The SMR value for a care site represents the ratio of observed events to expected events. Values between 0 and 1 indicate that fewer events occurred than expected, while values above 1 indicate a higher number of observed events than anticipated.¹⁴ The SMRs were compared using the Spearman rank correlation coefficient to establish care site-specific rankings. Care sites with fewer than 30 cases were excluded from the analyses.
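The SMR calculation can be sketched as follows, taking the expected events per care site as the sum of the model's predicted probabilities. The column names (`site`, `y`, `p`) and the function `smr_by_site` are hypothetical, and the exact Poisson (Garwood) interval is an assumption: the text does not state how the 95% confidence intervals were computed.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2


def smr_by_site(df, min_cases=30, alpha=0.05):
    """Observed/expected ratio per care site with an exact Poisson interval."""
    g = df.groupby("site").agg(
        n=("y", "size"), observed=("y", "sum"), expected=("p", "sum")
    )
    g = g[g["n"] >= min_cases].copy()  # sites with fewer than 30 cases are excluded
    o = g["observed"].to_numpy(dtype=float)
    e = g["expected"].to_numpy(dtype=float)
    g["smr"] = o / e
    # Garwood limits for the observed count, divided by the expected count.
    lower = np.where(o > 0, chi2.ppf(alpha / 2, 2 * np.maximum(o, 1)) / 2, 0.0)
    upper = chi2.ppf(1 - alpha / 2, 2 * (o + 1)) / 2
    g["ci_low"] = lower / e
    g["ci_high"] = upper / e
    return g


# Toy data: y is the observed outcome, p the risk-adjusted predicted probability.
demo = pd.DataFrame({
    "site": ["A"] * 40 + ["B"] * 40 + ["C"] * 10,
    "y": [1] * 8 + [0] * 32 + [1] * 2 + [0] * 38 + [0] * 10,
    "p": [0.10] * 90,
})
result = smr_by_site(demo)
```

In the toy data, site A has 8 observed against 4 expected events (SMR of 2.0), site B has 2 against 4 (SMR of 0.5), and site C is dropped because it has only 10 cases.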

All statistical analyses for AMI and PHF were performed using Python 3.10.4 alongside respective packages.

Results

Study population

For the treatment cause AMI, 78,422 cases were excluded due to treatment cause-specific exclusion criteria, leaving a study population of N = 165,130 index cases for data analyses. For PHF, 8,917 cases were removed from the initial 43,829 cases due to cause-specific exclusion criteria, leaving a study population of N = 34,912 index cases for data analyses (Table 2).

Table 2.

Study populations by treatment cause.

Index stays	AMI	PHF
Full inpatient stays (01.01.17-31.12.20) with relevant principal diagnosis and/or procedure in participating hospitals	243,552 (100%)	43,829 (100%)
Treatment cause-specific exclusions
Age <20 years, no coronary angiography	−78,422
Age <50 years, missing/bilateral localization or ipsilateral treatment (2 years prior to index stay), comorbidities in the index stay		−8,917
Study population	165,130 (67%)	34,912 (80%)

Model performance of the methods

For AMI (n = 165,130), the highest improvement in ROC-AUC was 2.4% between Logistic Regression and FNN for the outcome “MACCE 365 days” (ROC-AUC 0.762 and 0.786, respectively). This improvement was also reflected in a PR-AUC that was 4.5% higher and, slightly, in the BS (0.154 versus 0.147, respectively). For the outcomes “Death 30 days” and “MACCE plus cardiac insufficiency 365 days”, the FNN also showed a marginally higher model performance of plus 0.8% and 1.5%, respectively. For the outcomes “MACCE 30 days” and “MACCE plus cardiac insufficiency 30 days”, XGBoost resulted in the best model performance with an increase of 2.3% in ROC-AUC for both outcomes. However, when comparing the BS for both outcomes, the FNN showed the best result (Table 3).
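The reported gains read as absolute differences between metric values, that is, percentage points rather than relative changes. A short check using the rounded Table 3 values for “MACCE within 365 days” illustrates this reading; the variable names are illustrative only.

```python
# Rounded Table 3 values for "MACCE within 365 days" (AMI).
logreg = {"roc_auc": 0.762, "pr_auc": 0.551, "brier": 0.154}
fnn = {"roc_auc": 0.786, "pr_auc": 0.596, "brier": 0.147}

# Absolute differences; multiplied by 100 they are percentage points.
diff = {k: round(fnn[k] - logreg[k], 3) for k in logreg}

# Relative change in ROC-AUC, shown only for contrast (about 3.1%).
relative_roc_gain = (fnn["roc_auc"] - logreg["roc_auc"]) / logreg["roc_auc"]
```

The differences of 0.024 (ROC-AUC), 0.045 (PR-AUC) and -0.007 (BS) match the 2.4%, 4.5% and 0.007 quoted in the text.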

Table 3.

Comparison of model performance of Logistic Regression, Elastic Net, XGBoost and Neural Net for both treatment causes.

AMI (N = 165,130)	Metric	Log. Regression	Elastic net	XGBoost	Neural net
Death within 30 days after admission	ROC-AUC	0.889	0.889	0.895	0.896
	PR-AUC	0.546	0.540	0.564	0.576
	Brier score	0.049	0.050	0.069	0.047
MACCE within 30 days after admission	ROC-AUC	0.798	0.798	0.822	0.821
	PR-AUC	0.510	0.508	0.565	0.566
	Brier score	0.107	0.107	0.125	0.101
MACCE plus cardiac insufficiency within 30 days after admission	ROC-AUC	0.791	0.791	0.814	0.810
	PR-AUC	0.512	0.511	0.565	0.564
	Brier score	0.114	0.115	0.134	0.108
MACCE within 365 days after admission	ROC-AUC	0.762	0.761	0.778	0.786
	PR-AUC	0.551	0.549	0.586	0.596
	Brier score	0.154	0.154	0.187	0.147
MACCE plus cardiac insufficiency within 365 days after admission	ROC-AUC	0.770	0.770	0.785	0.785
	PR-AUC	0.595	0.594	0.625	0.627
	Brier score	0.167	0.168	0.203	0.161
PHF (N = 34,912)	Metric	Log. Regression	Elastic net	XGBoost	Neural net
Death index stay	ROC-AUC	0.890	0.890	0.887	0.868
	PR-AUC	0.127	0.122	0.118	0.128
	Brier score	0.012	0.012	0.013	0.012
Death within 90 days after admission	ROC-AUC	0.828	0.831	0.831	0.831
	PR-AUC	0.156	0.160	0.156	0.168
	Brier score	0.036	0.035	0.088	0.035
Early surgical complications & revisions within 90 days after admission	ROC-AUC	0.675	0.676	0.676	0.662
	PR-AUC	0.221	0.220	0.222	0.193
	Brier score	0.090	0.090	0.102	0.092
Surgical complications & revisions within 365 days after admission	ROC-AUC	0.632	0.629	0.633	0.627
	PR-AUC	0.275	0.274	0.276	0.275
	Brier score	0.142	0.142	0.170	0.143
General complications index stay	ROC-AUC	0.819	0.819	0.819	0.824
	PR-AUC	0.275	0.273	0.283	0.277
	Brier score	0.060	0.061	0.071	0.060
General complications within 90 days after admission	ROC-AUC	0.792	0.793	0.796	0.784
	PR-AUC	0.297	0.300	0.302	0.297
	Brier score	0.071	0.071	0.235	0.071
Other complications index stay	ROC-AUC	0.777	0.780	0.780	0.775
	PR-AUC	0.223	0.228	0.230	0.222
	Brier score	0.071	0.071	0.081	0.071
Any secondary surgery within 365 days after admission	ROC-AUC	0.616	0.616	0.622	0.626
	PR-AUC	0.203	0.200	0.202	0.206
	Brier score	0.117	0.117	0.116	0.116

MACCE: Major Adverse Cardiac and Cerebrovascular Events; best values in bold.

For PHF (n = 34,912), the model performances were even more homogeneous across all outcomes, with the highest improvement in ROC-AUC of 1.0% (FNN) for the outcome “any secondary surgery 365 days”. The FNN also performed best for the outcomes “general complications index stay” (plus 0.5%) and “death 90 days” (plus 0.3%). For “surgical complications & revisions 365 days”, “general complications 90 days” and “other complications index stay”, the best model performance was seen with XGBoost, with plus 0.1%, 0.4% and 0.3% in ROC-AUC, respectively. However, when comparing the BS for those outcomes, the superior model was the EN. Moreover, the EN resulted in the best model performance with a 0.1% higher ROC-AUC value than the Logistic Regression for “early surgical complications & revisions 90 days”. For “death index stay”, none of the ML methods showed a superior model performance compared to the Logistic Regression (ROC-AUC 0.890).

Method impact on SMR-based hospital ranking

For AMI, the ranked SMR correlated strongly between the Logistic Regression and the respective best ML method, with a Spearman ρ of 0.97 (p < .001) for “MACCE within 365 days” and 0.98 (p < .001) for all other outcomes (Figure 1).

Figure 1.

Correlation of Standardised Mortality Ratios (SMR) and SMR ranking for treatment of acute myocardial infarction derived from the Logistic Regression versus the best outcome-specific machine learning method (ML).

The same was observed in the PHF study population: the ranked SMR correlated strongly between the Logistic Regression and the respective best ML method, resulting in a ρ of 0.94 (p < .001) for “any secondary surgery up to 365 days”, 0.99 (p < .001) for “general complications during index stay” and 1.00 (p < .001) for all other outcomes (Figure 2).

Figure 2.

Correlation of Standardised Mortality Ratios (SMR) and SMR ranking for treatment of proximal humerus fractures derived from the Logistic Regression versus the best outcome-specific machine learning method (ML).
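The rank comparison reported in this section can be sketched with SciPy's `spearmanr`. The SMR values below are invented for eight hypothetical care sites and are not study data; they only show how a single swap of two neighbouring ranks affects ρ.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical SMRs for the same eight care sites under two risk-adjustment models.
smr_logreg = np.array([0.62, 0.81, 0.95, 1.00, 1.08, 1.21, 1.37, 1.90])
smr_ml = np.array([0.65, 0.79, 0.97, 1.05, 1.02, 1.25, 1.33, 1.84])

rho, p_value = spearmanr(smr_logreg, smr_ml)

# Rank 1 = lowest SMR; count the sites whose rank differs between the models.
rank_logreg = smr_logreg.argsort().argsort() + 1
rank_ml = smr_ml.argsort().argsort() + 1
sites_with_rank_change = int((rank_logreg != rank_ml).sum())
```

Here two sites swap adjacent ranks, so ρ stays close to 1 even though the ranking is not identical, which is why a high ρ alone does not rule out changes for individual care sites.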

Discussion

In our analyses, the linked claims data were used to analyse whether ML methods can improve model performance compared to the gold standard of the Logistic Regression method. Care site-specific quality was assessed by Standardised Mortality Ratios (SMR), which were used to visualise the impact of the tested methods on quality measurement. The results show a heterogeneous pattern between the treatment causes and outcomes with regard to the choice of method. None of the ML methods tested improved ROC-AUC by more than 3% for any AMI outcome or by more than 1% for any PHF outcome. Furthermore, the respective best method does not always show the best results in all compared metrics.

These results are in accordance with previous studies showing that the model with the best fit varies between datasets and selected outcomes.^15–22 Therefore, selecting a specific model a priori could lead to poor model fits. While ML offers considerable advantages, especially when processing complex and high-dimensional data, traditional statistical models such as Logistic Regression are characterised by their clarity, simplicity and good interpretability, especially with well-defined smaller datasets. In the field of quality measurement with claims data, the good interpretability of the Logistic Regression, its main advantage over ML methods, can provide deeper insights into the plausibility and effect size of risk factors. This advantage in interpretability, combined with the low requirements for computational power, leads to the conclusion that the Logistic Regression seems to be the most appropriate method for quality measurement with claims data.

However, the choice of an appropriate model may be critical depending on the dataset and outcome. To the best of our knowledge, this is the first study to compare the ML methods described here with Logistic Regression for claims-based inpatient quality measurement using linked routine data. We therefore encourage all researchers in this field to conduct comparative modelling analyses when developing quality indicators.

Strengths and limitations

A strength is that the claims data used came from 11 legally independent AOK health insurance funds, covering around one third of the German population. A limitation is that, although this analysis is based on nationwide claims data from the AOK, which represents more than a quarter of the population covered by SHI in Germany, the insured population may differ from the German population in age, gender, social status and morbidity. The indirect standardisation of the SMR helps to compensate for this effect.²³

Conclusion

None of the ML methods tested is generally superior to Logistic Regression for creating quality indicators on the given datasets. Marginal gains in model performance should not be the main basis for choosing a method; model interpretability should also be considered, especially for new datasets with little knowledge of important risk factors. In this respect, Logistic Regression still seems favourable for risk adjustment in health care.

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Jochen Schmitt reports institutional grants for investigator-initiated research from the German Federal Joint Committee (Gemeinsamer Bundesausschuss, GBA), Federal Ministry of Health (Bundesministerium für Gesundheit, BMG), Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF), European Union (EU), Federal State of Saxony (SN), Novartis, Sanofi, ALK, and Pfizer. He also participated in advisory board meetings as a paid consultant for Sanofi, Lilly, and ALK. JS is a member of the Expert Council on Health and Care at the Federal Ministry of Health and a member of the government commission for modern and needs-based hospital care of the current German Coalition. All other authors report no conflicts of interest regarding the submitted work.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by the Innovation Fund of the Federal Joint Committee (Gemeinsamer Bundesausschuss, 01VSF20013).

Ethical statement

ORCID iDs

Melissa Spoden

Thomas Datzmann

Patrik Dröge

Caroline Lang

Christoph Gumbinger

Jörg Nowotny

Pompiliu Piso

Olaf Schoffer

Ekkehard Schuler

Nils Sommer

Simone Wesselmann

Christian Günster

Data Availability Statement

The authors note that the data supporting the study results cannot be provided due to restrictions on the protection of personal data. External access to the data can be granted under the circumstances defined in German social law (SGB V § 287). For information and assistance in these cases, please contact wido@wido.bv.aok.de.

References

1. Agency for Healthcare Research and Quality. About AHRQ quality indicators (AHRQ QIs). https://qualityindicators.ahrq.gov/Downloads/Modules/V2022/AHRQ_QI_Full_Brochure.pdf (2024), accessed April 29, 2024.

2. Canadian Institute for Health Information. Indicator library: general methodology notes – clinical indicators. https://www.cihi.ca/sites/default/files/document/general-methodology-notes.pdf (2024), accessed April 29, 2024.

3. Australian Institute of Health and Welfare. Towards national indicators of safety and quality in health care. Cat. no. HSE. https://www.aihw.gov.au/getmedia/e352f4fc-3a24-43e9-8a92-adfada971227/hse-75-10792.pdf?v=20230605174304&inline=true (2009), accessed April 29, 2024.

4. Institut für Qualität und Transparenz im Gesundheitswesen. Qualitätsindikatoren. https://iqtig.org/veroeffentlichungen/qidb/ (2024), accessed April 29, 2024.

5. Jeschke, Günster, Klauber. Qualitätssicherung mit Routinedaten (QSR): Follow-up in der Qualitätsmessung – Eine Analyse fallübergreifender Behandlungsverläufe. Z Evid Fortbild Qual Gesundhwes 2015; 109(9): 673–681.

6. Iezzoni. Risk adjustment for measuring health care outcomes. 4th ed. Chicago, Illinois: Health Administration Press, 2013.

7. Jeschke, Günster, Klauber. Qualitätssicherung mit Routinedaten (QSR): Follow-up in der Qualitätsmessung - Eine Analyse fallübergreifender Behandlungsverläufe [Quality assurance with administrative data (QSR): follow-up in quality measurement - an analysis of patient records]. Z Evid Fortbild Qual Gesundhwes 2015; 109(9-10): 673–681.

8. Bottle, Aylin. Statistical methods for healthcare performance monitoring. Boca Raton: CRC Press, 2017.

9. World Medical Association. World medical association declaration of Helsinki. JAMA 2013; 310(20): 2191–2194.

10. Swart, Gothe, Geyer, et al. Gute Praxis Sekundärdatenanalyse (GPS): Leitlinien und Empfehlungen. Gesundheitswes 2015; 77(2): 120–126.

11. Hoffmann, Latza, Baumeister, et al. Guidelines and recommendations for ensuring good epidemiological Practice (GEP): a guideline developed by the German society for epidemiology. Eur J Epidemiol 2019; 34(3): 301–317.

12. Schofer, Jeschke, Kröger, et al. Risk-related short-term clinical outcomes after transcatheter aortic valve implantation and their impact on early mortality: an analysis of claims-based data from Germany. Clin Res Cardiol 2022; 111(8): 934–943.

13. Rogers. Regression standard errors in clustered samples. Stata Technical Bulletin 1993; 3(13): 19.

14. Ash, Shwartz, Pekoez, et al. Comparing outcomes across providers. In: Risk adjustment for measuring health care outcomes. 4th ed. Illinois: Health Administration Press, 2013, p. 342.

15. Bzdok. Classical statistics and statistical learning in imaging neuroscience. Front Neurosci 2017; 11: 543.

16. Bzdok, Altman, Krzywinski. Statistics versus machine learning. Nat Methods 2018; 15(4): 233–234.

17. Lange, Schwarzer, Datzmann, et al. Machine learning for identifying relevant publications in updates of systematic reviews of diagnostic test studies. Res Synth Methods 2021; 12: 506–515.

18. Leiner, Pellissier, König, et al. Machine learning-derived prediction of in-hospital mortality in patients with severe acute respiratory infection: analysis of claims data from the German-wide Helios hospital network. Respir Res 2022; 23(1): 264.

19. Raita, Goto, Faridi, et al. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care 2019; 23(1): 64.

20. Rajula HSR, Verlato, Manchia, et al. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina 2020; 56(9): 455.

21. Song, Mitnitski, Cox, et al. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Stud Health Technol Inform 2004; 107(Pt 1): 736–740.

22. van der Galiën, Hoekstra, Gürgöze, et al. Prediction of long-term hospitalisation and all-cause mortality in patients with chronic heart failure on Dutch claims data: a machine learning approach. BMC Med Inform Decis Mak 2021; 21(1): 303.

23. Hoffmann, Icks. [Structural differences between health insurance funds and their impact on health services research: results from the Bertelsmann Health-Care Monitor]. Gesundheitswesen 2012; 74(5): 291–297.

Comparison of machine learning methods and standard logistic regression to improve inpatient quality measurement in two clinical use cases
