Abstract
Objective
This study aims to address the limitations of current clinical methods in predicting delivery mode by constructing a multimodal neural network-based model. The model utilizes data from a digital twin-empowered labor monitoring system, including computerized cardiotocography (cCTG), ultrasound (US) examination data, and electronic health records (EHRs) of pregnant women.
Methods
The model integrates three modalities of data from 105 pregnant women (76 vaginal deliveries and 29 cesarean deliveries) at the Department of Obstetrics and Gynecology of The First Affiliated Hospital of Jinan University, Guangzhou, China. It employs a hybrid architecture of a convolutional neural network (CNN) and bi-directional long short-term memory (BiLSTM) to compress the data into a single feature vector for each patient.
Results
The designed model achieves a cross-validation accuracy of 93.33%, an F1-score of 86.26%, an area under the receiver operating characteristic curve of 97.10%, and a Brier Score of 6.67%. Importantly, while cCTG and EHRs are crucial for labor management, the integration of US imaging data substantially enhances prediction accuracy.
Conclusion
The findings of this study suggest that the developed multimodal model is a promising tool for predicting delivery mode and provides a comprehensive approach to intrapartum maternal and fetal health monitoring. The integration of multi-source data, including real-time information, holds potential for further improving the algorithm's predictive accuracy as the volume of analyzed data increases. This could be highly beneficial for dynamically fusing data from different sources throughout the maternal and fetal health lifecycle, from pregnancy to delivery.
Keywords
Introduction
Globally, an alarming number of maternal deaths (approximately 287,000), neonatal deaths (2.4 million), and stillbirths (1.9 million) are reported annually, most of them in low- and middle-income countries. 1 A significant proportion (up to 45%) of these adverse outcomes occur during labor, birth, and the immediate postnatal period. 2 Although cesarean delivery can be life-saving, its rising rate poses a substantial global health challenge. According to the Guangzhou Women and Children's Medical Center birth cohort study in China, the cesarean section rate in Guangzhou (32%) remains much higher than the ideal rate of 10–15% recommended by the World Health Organization. 3 By 2030, around 28.5% of all births globally are projected to be by cesarean section, which equates to approximately 38 million women annually. 4 Optimizing the use of cesarean section is a matter of global concern: underuse can increase maternal and perinatal mortality and morbidity, whereas overuse brings no further reduction in mortality and can cause complications in subsequent pregnancies.5,6
The process of childbirth is influenced by the “three Ps” model: the passageway (maternal bony pelvis and soft tissues), the passenger (fetus), and the power (uterine contractions and voluntary maternal efforts during delivery). 7 Effective decision-making necessitates the integration of a vast array of data collected during antenatal visits, upon admission to the delivery unit, and throughout labor. 8 This emphasizes the need for a dynamic, individualized assessment approach, which is at the core of precision medicine—tailoring medical treatments to the unique characteristics of each patient.9,10
Technological advances such as big data analytics, cloud computing, virtual reality, the Internet of Things (IoT), and artificial intelligence (AI) play a crucial role in advancing this personalized approach to medicine.11,12 These technologies facilitate the development of digital twins (DTs), sophisticated simulations that combine real-time data from diverse sources such as electronic health records (EHRs), medical devices, and genomic information. 13 DT technology is transforming healthcare by leveraging real-time data integration, advanced analytics, and virtual simulation to enhance patient care, enable predictive analytics, optimize clinical operations, and support training.14,15 Because they can analyze extensive patient data from various sources, DTs can offer personalized treatment plans and, through machine learning algorithms, facilitate predictive analytics and preventive interventions.16,17 The implementation of DT technology in healthcare has the potential to improve patient outcomes, enhance patient safety, and drive innovation in the healthcare industry. While DTs have been applied in healthcare domains such as asthma, 18 diabetes, 19 cancer, 20 and cardiovascular disease, 21 their potential in labor care remains largely unexplored. 22
In this brief communication, we describe the digital twin-empowered labor monitoring system (DTLMS), developed by Guangzhou Lianyin Medical Co., Ltd (Lian-Med, China). This system integrates IoT devices, virtual reality, and artificial intelligence to provide personalized labor care support (Figure 1). The DTLMS can display patient characteristics from EHRs, real-time electronic fetal monitoring signals, and three-dimensional (3D) virtual labor progress based on intrapartum ultrasound imaging.23–28 Here, we use the heterogeneous data obtained by the DTLMS to develop and assess an automated tool for predicting the delivery mode.

Figure 1. A digital twin framework for maternal and fetal health. This work focuses on its application in assisting decision-making.
Methods
Proposed digital twin framework
The DTLMS was developed by Guangzhou Lianyin Medical Co., Ltd (Lian-Med, China) for comprehensive maternal and fetal safety monitoring and also offers decision support to optimize the use of cesarean section. It does not, however, specifically address the challenges of women who start antenatal care (ANC) late or do not attend the recommended number of visits. The proposed framework consists of four parts: the physical space, the DT data module, the digital space, and the DT intelligent module. The DT data module encompasses dynamic multi-source recordings of maternal and fetal health, including EHRs, ultrasound (US) images, and physiological signals (Supplementary Figure S1). The digital space is a 3D visualization of maternal and fetal structures during labor, based on intrapartum ultrasound examinations (Supplementary Video S1). The DT intelligent module, central to this framework, is a predictive model designed to identify risk factors and predict the likelihood of adverse events or specific delivery modes, thereby supporting clinical decision-making. The device is designed to handle a variety of data sources and formats. In China, a mix of electronic systems and paper registers is in use, and the system can automatically recognize and extract information from unstructured paper-based medical reports. 29 This functionality allows valuable medical data to be used for monitoring and prediction even in areas with limited technological infrastructure.
Multimodal dataset collection
This retrospective study analyzed data collected from January 2012 to December 2020 at the Department of Obstetrics and Gynecology of The First Affiliated Hospital of Jinan University, Guangzhou, China. Exclusion criteria were multifetal gestation, planned cesarean delivery, antepartum fetal death, major fetal anomalies, and preterm delivery. Cases lacking comprehensive electronic data on the labor process were also excluded. The dataset included EHRs, US examination data, and raw 30-minute computerized cardiotocography (cCTG) records (Supplementary Figure S2). 30 Post-delivery data, such as infant sex and actual birth weight, were not used in the prediction analysis. The institutional ethics review board waived the requirement for informed consent (JNUKY-2022-018).
Multimodal deep learning model
We employed a hybrid model to predict the delivery mode, integrating discrete data (EHR and US) with continuous signal data from cCTG, which provides a continuous graphic record of fetal heart rate (FHR) and uterine contractions (UCs). The hybrid model combines a convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network to capture complex nonlinear spatial and temporal features of the cCTG, while features from the EHR and US data were digitized and normalized. These data types were then fused to perform classification. The model architecture is detailed in Supplementary Figure S3.
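A minimal NumPy sketch of the fusion idea described above may help make the data flow concrete. It assumes a two-channel cCTG input (FHR and UC), one 1-D convolution with ReLU and max-pooling, a single-layer BiLSTM whose final forward and backward hidden states are concatenated, late fusion with a vector of already-normalized EHR and US features, and a sigmoid output taken as the probability of cesarean delivery. All layer sizes, the kernel width, the pooling, and the function names (`conv1d_relu`, `bilstm`, `predict_proba`, etc.) are hypothetical; the authors' actual architecture is given in Supplementary Figure S3 and was implemented in TensorFlow. The weights are random, so this only illustrates how each patient's data are compressed into one feature vector, not a trained predictor.

```python
import numpy as np

# Hypothetical sizes throughout; the real architecture is in Supplementary Figure S3.
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, w, b):
    """'Valid' 1-D convolution over time with ReLU. x: (T, C_in), w: (K, C_in, F), b: (F,)."""
    k = w.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-K+1, K, C_in)
    return np.maximum(np.einsum("tkc,kcf->tf", windows, w) + b, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max-pooling along time; drops a trailing partial window."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)


def lstm_last_state(x, w, u, b):
    """Run one LSTM direction over x (T, D); gates ordered i, f, g, o. Returns the final h."""
    h = np.zeros(u.shape[0])
    c = np.zeros(u.shape[0])
    for x_t in x:
        i, f, g, o = np.split(x_t @ w + h @ u + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def bilstm(x, fwd, bwd):
    """Concatenate the last hidden state of a forward pass and of a pass over reversed time."""
    return np.concatenate([lstm_last_state(x, *fwd), lstm_last_state(x[::-1], *bwd)])


def init_params(c_in=2, kernel=5, filters=8, hidden=16, n_tab=10):
    def lstm(d, h):
        return (rng.normal(0, 0.1, (d, 4 * h)), rng.normal(0, 0.1, (h, 4 * h)), np.zeros(4 * h))
    return {
        "conv_w": rng.normal(0, 0.1, (kernel, c_in, filters)),
        "conv_b": np.zeros(filters),
        "fwd": lstm(filters, hidden),
        "bwd": lstm(filters, hidden),
        "dense_w": rng.normal(0, 0.1, 2 * hidden + n_tab),
        "dense_b": 0.0,
    }


def predict_proba(ctg, tabular, p):
    """ctg: (T, 2) FHR/UC samples; tabular: normalized EHR + US features. Returns P(cesarean)."""
    seq = max_pool1d(conv1d_relu(ctg, p["conv_w"], p["conv_b"]))
    fused = np.concatenate([bilstm(seq, p["fwd"], p["bwd"]), tabular])  # one vector per patient
    return float(sigmoid(fused @ p["dense_w"] + p["dense_b"]))


params = init_params()
ctg = rng.normal(size=(120, 2))  # stand-in for a (much longer) 30-minute FHR/UC recording
tab = rng.normal(size=10)        # stand-in for normalized EHR + US features
print(predict_proba(ctg, tab, params))
```

In TensorFlow/Keras the same structure maps onto `Conv1D`, `MaxPooling1D`, `Bidirectional(LSTM(...))`, and `Concatenate` layers, which would also supply the gradients needed for training.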
Model training used the TensorFlow library, taking advantage of the parallel-processing capabilities of graphics processing units (GPUs). Stochastic gradient descent with momentum (momentum = 0.9) was used as the optimization algorithm, with cross-entropy as the loss function. The model was trained for 120 epochs with an initial learning rate of 0.01. To optimize learning dynamics, the learning rate was reduced by a factor of 10 after 15 and 90 epochs. To mitigate overfitting, L2 regularization with a factor of 0.0001 was applied.
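The learning-rate schedule and update rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' TensorFlow code: epochs are counted from 0, so the rate drops at the start of epoch 15 and epoch 90; the momentum update follows the Keras `SGD` convention (velocity = momentum × velocity − lr × gradient, then weight += velocity); and the L2 term is taken as 0.0001 × Σw², whose gradient is 2 × 0.0001 × w. The function names are hypothetical.

```python
BASE_LR = 0.01         # initial learning rate (from the text)
DECAY = 10.0           # divide the rate by 10 at each milestone
MILESTONES = (15, 90)  # assumed 0-indexed epochs at which the rate drops
EPOCHS = 120
MOMENTUM = 0.9
L2 = 1e-4              # assumed penalty L2 * sum(w**2) added to the cross-entropy loss


def learning_rate(epoch):
    """Step schedule: 0.01 for epochs 0-14, 0.001 for 15-89, 0.0001 for 90-119."""
    drops = sum(epoch >= m for m in MILESTONES)
    return BASE_LR / DECAY ** drops


def sgd_momentum_step(weights, grads, velocity, lr):
    """One SGD-with-momentum update with the L2 gradient folded in; returns (weights, velocity)."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        g_total = g + 2.0 * L2 * w       # gradient of the loss plus d/dw of L2 * w**2
        v = MOMENTUM * v - lr * g_total  # Keras-style velocity update
        new_w.append(w + v)
        new_v.append(v)
    return new_w, new_v


schedule = [learning_rate(e) for e in range(EPOCHS)]
print(schedule[0], schedule[15], schedule[90])
w, v = sgd_momentum_step([1.0], [0.5], [0.0], learning_rate(0))
print(w, v)
```

A step schedule like this keeps large updates early, when the loss is far from a minimum, and shrinks them later so that the weights settle rather than oscillate.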
Evaluation metrics
Model performance was assessed with 5-fold cross-validation, evaluating each model on accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), F1 score, and Brier score. The 95% confidence intervals for these metrics were calculated from the Gaussian distribution using a z-value of 1.95996. The metrics are defined as follows. Accuracy is the ratio of correctly predicted instances to the total number of instances; it measures overall performance but can be misleading when classes are imbalanced. Sensitivity is the proportion of actual positives correctly identified by the model, reflecting its ability to detect positive cases. Specificity is the proportion of actual negatives correctly identified by the model, measuring how well it avoids false positives. The AUC is the area under the ROC curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across decision thresholds; it measures how well a model distinguishes between classes, with 1 indicating a perfect classifier and 0.5 random guessing. The F1 score is the harmonic mean of precision and recall (sensitivity), balancing false positives against false negatives; it is useful when the class distribution is uneven or when the two error types have different costs. The Brier score is the mean squared difference between predicted probabilities and actual outcomes (0 or 1); it evaluates the accuracy of probabilistic predictions, with lower scores indicating better performance. These metrics are widely used to evaluate machine learning classifiers.
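The definitions above can be made concrete with a short plain-Python sketch. It assumes cesarean delivery is the positive class (label 1) and a 0.5 decision threshold for the threshold-dependent metrics, and it computes the AUC through its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The `normal_ci` helper shows a generic mean ± z·SE interval with z = 1.95996; which standard error the authors used is not stated, so the `se` argument is an assumption. All function names and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus the Brier score (which uses the raw probabilities)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
    }


def roc_auc(y_true, y_prob):
    """AUC via the Mann-Whitney statistic (O(n_pos * n_neg), fine for small cohorts)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def normal_ci(mean, se, z=1.95996):
    """Gaussian (Wald-type) 95% interval: mean +/- z * se."""
    return mean - z * se, mean + z * se


y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1, 0.2]
print(classification_metrics(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

A Wald-type interval is not constrained to [0, 1], so for bounded metrics such as the AUC its upper limit can exceed 1.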
Statistical analysis
Statistical analysis was performed with Python 3.7.12, scikit-learn 1.0.2, and SciPy 1.4.1. Continuous features were compared using Welch's t-test and dichotomous features using the chi-squared test, as appropriate.
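Both tests can be checked against the summary statistics reported in the Results. The sketch below is dependency-free Python, an assumption made so that it runs without SciPy (whose `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test); the t-distribution p-value comes from the regularized incomplete beta function, evaluated with a standard continued fraction. Applied to the reported BMI values (26.74 ± 3.19 kg/m2, n = 29, versus 24.82 ± 2.27 kg/m2, n = 76) it gives t ≈ 2.97 and p of roughly 0.005, close to the reported 0.0057 given that the inputs are rounded. The 2 × 2 chi-squared example collapses fetal position into occiput anterior versus other, with counts back-calculated from the reported percentages (74/76 and 1/29); this collapse is for illustration only and may differ from the test behind Supplementary Table S1. Function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last multiplicative factor ~ 1 -> converged
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(m1, s1, n1, m2, s2, n2):
    """Welch's unequal-variance t-test from summary statistics; returns (t, df, p)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch-Satterthwaite
    return t, df, t_two_sided_p(t, df)


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for [[a, b], [c, d]] without continuity correction (1 df)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # P(chi2_1 > x) = erfc(sqrt(x / 2))


# BMI, cesarean vs vaginal (mean, SD, n as reported in the Results)
print(welch_t_test(26.74, 3.19, 29, 24.82, 2.27, 76))
# Occiput anterior vs other position; rows = vaginal, cesarean (back-calculated counts)
print(chi2_2x2(74, 2, 1, 28))
```

With expected cell counts this small, some analysts would prefer a continuity-corrected or exact test; the uncorrected Pearson statistic is shown only because it is the textbook form.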
Results
Table 1 presents the clinical profiles of 105 patients undergoing vaginal and cesarean deliveries as recorded in the EHRs. Demographically and physically, no significant differences were observed between the vaginal (N = 76) and cesarean (N = 29) delivery groups concerning age, height, gestational age, pregnancy number, and parity. However, the body mass index (BMI) was significantly higher in the cesarean delivery group (26.74 ± 3.19 kg/m2) compared to the vaginal delivery group (24.82 ± 2.27 kg/m2), with a p-value of 0.0057. Maternal complications such as anemia, gestational diabetes, liver disease, thyroid disorders, and pregnancy-induced hypertension showed no significant differences between the groups. Yet, the incidences of premature rupture of membranes and amniotic fluid contamination were notably higher in the cesarean delivery group (p-values of 0.0435 and 0.0075, respectively). For neonatal outcomes, a higher proportion of male infants was noted in the cesarean group (p = 0.0309). Birth weights were comparable across both groups (p = 0.3480), as were the Apgar scores at 1, 5, and 10 minutes post-delivery (p > 0.05). A significant distinction was found in the umbilical cord pH levels, which were higher in the cesarean delivery group (7.28 ± 0.05) compared to the vaginal delivery group (7.20 ± 0.1), with a p-value of less than 0.0001.
Table 1. Clinical profile of patients in electronic health records (EHRs).
Note. ns, not significant.
In conjunction with the EHRs, US examination data and 30-minute cCTG recordings were also collected using the DTLMS. US examination revealed no significant differences between the groups in biparietal diameter, abdominal circumference, femur length, or head circumference. However, fetal position showed a highly significant difference: the occiput anterior position was observed in 97.37% of vaginal deliveries but in only 3.45% of cesarean deliveries. Conversely, the occiput transverse and occiput posterior positions predominated in the cesarean delivery group (75.86% and 20.69%, respectively) (Supplementary Table S1). In some cases of cesarean delivery, FHR changes and their temporal relationship to UCs in cCTG records showed marked fluctuations compared with vaginal delivery (Supplementary Figure S2). However, specific cCTG features such as baseline, accelerations, decelerations, variability, and the nature of uterine contractions did not differ significantly between the groups (Supplementary Table S2). Using this limited dataset of cCTGs, EHRs, and US examination data, a deep learning model (Supplementary Figure S3) was trained with five-fold cross-validation to assess the predictive power of different data combinations for delivery mode. Using cCTG alone, the model yielded an AUC of 0.4642 ± 0.0320 (95% confidence interval: 0.2509–0.6775). Combining EHRs with cCTG signals improved classification performance, with an AUC of 0.6447 ± 0.1014 (95% confidence interval: 0.4400–0.8494). The combination of all data sources (cCTG, EHRs, and US) gave the best performance, with an AUC of 0.9710 ± 0.0211 (95% confidence interval: 0.8992–1.0428) (Supplementary Figure S4). The same trend was observed in the other evaluation metrics: accuracy (ACC), sensitivity (SEN), specificity (SPE), F1 score, and Brier score (Table 2).
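The five-fold scheme can be sketched as a stratified split, so that every fold keeps roughly the cohort's 76:29 ratio of vaginal to cesarean deliveries. Whether the authors stratified their folds is not stated; this plain-Python version (the hypothetical `stratified_kfold`) shuffles each class and deals it round-robin across the folds, the same idea as scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)`.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split indices into k folds, preserving class proportions.

    Each class is shuffled independently and dealt round-robin across the folds;
    dealing continues where the previous class stopped so fold sizes stay balanced.
    """
    rnd = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = list(by_class[y])
        rnd.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)
    return folds


labels = [0] * 76 + [1] * 29  # 0 = vaginal, 1 = cesarean, as in the cohort
folds = stratified_kfold(labels, k=5, seed=42)
for i, test_idx in enumerate(folds):
    train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
    n_ces = sum(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}, cesarean in test={n_ces}")
```

Each patient appears in exactly one test fold, so pooled out-of-fold predictions cover all 105 patients; with only five or six cesarean cases per test fold, fold-level metrics can vary considerably.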
Training and validation accuracy (Acc) curves and precision-recall curves of the deep learning model are shown in Supplementary Figures S5 and S6.
Table 2. Performance of the deep learning model using different data combinations.
Note. CTG, cardiotocography; EHR, electronic health records; US, ultrasound examination results; Acc, Accuracy; SEN, Sensitivity; SPE, Specificity; F1, F1 Score; Brier, Brier Score.
Discussion
These results demonstrate the feasibility of using the DTLMS to predict the mode of delivery, with prediction accuracy increasing as more data modalities are analyzed by the model. Prior studies have explored various features in recordings from patients undergoing vaginal and cesarean deliveries.31–54 While these studies have substantially improved the characterization of delivery modes, they primarily relied on specific features extracted from data collected at individual stages of care, either antepartum 40,50 or intrapartum.31–39,41–54
It is important to note that certain features, such as the cCTG baseline, accelerations, and decelerations,31,55–58 are subjective and heavily reliant on caregivers' interpretation. In this study, we took an alternative approach: rather than characterizing delivery modes, we predicted the delivery mode from a broad spectrum of raw data gathered during the delivery process. Notably, while intrapartum cCTG and US imaging remain crucial for labor management, our findings underscore the gain in prediction accuracy provided by integrating supplementary information from EHRs.53,59 In recent years, fetal monitoring has advanced through the integration of novel technologies. Wearable devices with multi-modal sensors offer a promising solution for continuous, non-invasive monitoring. For example, Ghosh et al.60,61 and Mesbah et al. 62 have proposed approaches that leverage sensor fusion techniques and machine learning algorithms to improve and automate fetal movement detection. These advances highlight the trend toward more sophisticated sensor-based fetal monitoring systems.
Our study serves as a proof of concept, demonstrating the utility of computer-assisted analysis in predicting delivery mode. To further improve prediction accuracy, it is essential to follow comprehensive data collection guidelines such as the WHO Labour Care Guide (2020). 63 Optimizing data gathering and labeling is also critical for building databases from which the model can learn effectively. For the model to be considered reliable across clinical settings, independent validation studies are necessary. Randomized prospective trials should also be undertaken to assess the model's efficacy and its impact on clinical decision-making. Looking ahead, future research should develop models that use real-time data, allowing the model to improve continuously by integrating the data recorded at each stage of labor. 59 This would improve both the model's applicability and its reliability in clinical practice.
Once fully developed, the DTLMS holds considerable potential for routine clinical use, observational studies, and interventional trials. Its capability to dynamically integrate data from multiple sources, including cCTG, US examination data,64 and EHRs, across the stages of maternal and fetal health from pregnancy to delivery is particularly valuable. This study underscores the potential of the DTLMS, supported by artificial intelligence,65–68 to reassure patients about the feasibility of different delivery modes and to enable early preparation for necessary interventions.69–73 Because the system collects data in real time during the delivery process, its ability to predict the likelihood of vaginal delivery is expected to improve as labor progresses. This functionality could not only support decision-making regarding delivery mode but also contribute to the broader aim of reducing unnecessary cesarean deliveries, thereby improving maternal and fetal health outcomes.
However, this study has limitations. First, it is uncertain whether the device can reduce the number of cesarean sections. Although the system can collect information across the entire pregnancy-to-delivery lifecycle and the number of cesarean sections in the study is known, it remains unclear whether the device would actually reduce cesarean rates or merely predict delivery mode to improve efficiency. Its role in addressing the specific causes of cesarean section in different contexts, especially in low- and middle-income countries and sub-Saharan Africa, is also yet to be determined, and further research is needed to relate the device to existing evidence on reducing cesarean sections. Second, the dataset is relatively small compared with those of some other studies, which may limit the generalizability and robustness of the deep learning model. A small dataset can lead to overfitting or reduced accuracy when the model is applied to larger and more diverse populations, and it may not fully capture the complexity and variability of real-world scenarios. Future research should collect data from multiple centers to increase the size and diversity of the dataset and improve the model's generalizability and performance.
It is worth noting that while the current study focused on predicting delivery mode, birth weight adequacy is an important consideration in the overall assessment of pregnancy outcomes. Birth weight adequacy can have significant implications for the health of the newborn and may also be related to the mode of delivery. For example, macrosomia (a fetus significantly larger than average) may increase the likelihood of cesarean delivery because of potential difficulties during vaginal birth. Although this study did not assess the prediction of birth weight adequacy, it is an interesting avenue for future research. Incorporating factors related to birth weight into the model could enhance its ability to predict delivery mode and provide a more comprehensive understanding of the pregnancy process and its outcomes. Future studies could collect more detailed data on fetal growth and development throughout pregnancy, such as serial ultrasound measurements of fetal size and growth velocity, in combination with other clinical and demographic factors. This could help build a more integrated model that predicts both delivery mode and birth weight adequacy, giving clinicians a more powerful tool for prenatal assessment and decision-making. Furthermore, understanding the relationship between birth weight adequacy and the digital twin-empowered labor monitoring system could offer new insights. The real-time data collected by the system, including cCTG, US examination data, and EHRs, may contain information useful for developing predictive models of birth weight adequacy; for instance, changes in fetal growth patterns detected by serial ultrasounds, or cCTG alterations that might be associated with fetal growth, could be incorporated into future models.
Conclusion
In conclusion, this study presents a multimodal model for predicting delivery mode using data from a digital twin-empowered labor monitoring system. The model integrates cCTG, US examination data, and EHRs to offer a comprehensive approach to intrapartum maternal and fetal health monitoring. While the model shows promise, further research is needed to validate its efficacy in different clinical settings and to explore its impact on clinical decision-making. As healthcare continues to advance, technologies such as DTs and artificial intelligence hold considerable potential for improving patient outcomes. This multimodal model is a step toward personalized and predictive care during labor and delivery. Future research should focus on refining the model, extending it to other areas of maternal and fetal health, and investigating its potential to reduce unnecessary cesarean deliveries and improve overall health outcomes.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076241304934 - Supplemental material for A multimodal model in the prediction of the delivery mode using data from a digital twin-empowered labor monitoring system
Supplemental material, sj-docx-1-dhj-10.1177_20552076241304934 for A multimodal model in the prediction of the delivery mode using data from a digital twin-empowered labor monitoring system by Jieyun Bai, Xue Kang, Weishan Wang, Ziduo Yang, Weiguang Ou, Yuxin Huang and Yaosheng Lu in DIGITAL HEALTH
Footnotes
Acknowledgements
The authors thank the technical staff of Guangzhou Lianyin Medical Co., Ltd (Lian-Med, China) and researchers from The First Affiliated Hospital of Jinan University for obtaining data.
Contributorship
Jieyun Bai was involved in conception, organization and execution of research project and writing of manuscript. Weishan Wang and Xue Kang were involved in visualization, software, methodology, data curation and review and editing. Ziduo Yang and Yuxin Huang were involved in clinical assistance, data collection, data analysis and review and editing. Yaosheng Lu and Weiguang Ou were involved in organization and execution of research project and review of manuscript.
Data availability
The data cannot be made publicly available because of privacy and ethical restrictions. However, we have made available the US dataset used for the PSFHS challenge of MICCAI 2023 (https://ps-fh-aop-2023.grand-challenge.org/). This dataset comprises two parts: the PSFHS dataset (https://doi.org/10.5281/zenodo.10969427) and the JNU-IFM dataset (https://doi.org/10.6084/m9.figshare.14371652). The images from the PSFHS dataset were also used for the Intrapartum Ultrasound Grand Challenge (IUGC) 2024 of MICCAI 2024.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work is funded by Guangzhou Municipal Science and Technology Bureau Guangzhou Key Research and Development Program (2024B03J1289 and 2024B03J1283), Key Research and Development Program of Guangxi Province under Project No. 2023AB22074, Natural Science Foundation of Guangzhou under Project No. 202201010544, Natural Science Foundation of Guangdong Province under Project No. 2024A1515011886 and 2023A1515012833, Guangzhou Science and Technology Planning Project under Grant 2023B03J1297, National Key Research and Development Program of China under Project No. 2019YFC0120100 and 2019YFC0121907, National Natural Science Foundation of China under Project No. 61901192, and China Scholarship Council under Project No. 202206785002.
Guarantor
Jieyun Bai.
Supplemental material
Supplemental material for this article is available online.
References
