Abstract
The aim of this study is to develop a computational prediction model for implantation outcome after an embryo transfer cycle. Data on 500 patients and 1360 transferred embryos, including cleavage-stage and blastocyst-stage embryos and fresh or frozen embryos, were collected from April 2016 to February 2018. A dataset containing 82 attributes and a target label (indicating positive and negative implantation outcomes) was constructed. Six widely used machine learning approaches were compared on their performance in predicting embryo transfer outcomes. Feature selection procedures were also used to identify effective predictive factors and to determine the optimum number of features based on classifier performance. The results revealed that random forest was the best classifier (accuracy = 90.40% and area under the curve = 93.74%) with optimum features based on a 10-fold cross-validation test. According to the Support Vector Machine-Feature Selection algorithm, the optimal number of features was 78. Follicle stimulating hormone/human menopausal gonadotropin dosage for ovarian stimulation was the most important predictive factor across all examined embryo transfer features. The proposed machine learning-based prediction model could predict embryo transfer outcome and implantation of embryos with high accuracy, before the start of an embryo transfer cycle.
Introduction
Infertility is a disorder of the reproductive system diagnosed by the failure to conceive after 12 months or more of regular unprotected sexual intercourse. 1 Based on the latest definition by the World Health Organization, “infertility is a disease which generates disability as an impairment of function.” 2 Infertility is the most common global health complaint. 3 More than 186 million couples worldwide suffer from infertility, and the majority of infertile couples in developing countries are deprived of appropriate treatments. 3 There are many types of infertility treatment, including lifestyle changes (e.g. losing weight), medical treatments (e.g. use of drugs for ovulation induction), surgical treatments (e.g. laparoscopy), and assisted reproductive technologies (ARTs). 4 ARTs are advanced technologies in which human oocytes and sperm are handled and fertilized in vitro and embryos are transferred to the woman’s uterus to establish a clinical pregnancy. An ART cycle, which takes approximately 2 weeks, involves several sequential steps that are complicated, time-consuming, costly, and hard for infertile couples to endure.5,6 The live birth rate 1 for each complete ART cycle is 29.1 percent. 7 However, 38 to 49 percent of couples will remain unsuccessful, even after six treatment cycles. Therefore, contrary to general belief, ART does not guarantee success. 8
In vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) are two popular ART procedures that share nearly the same stages. 6 The main difference between IVF and ICSI is the method by which eggs are fertilized with sperm. 9 IVF is a conventional treatment involving standard insemination of oocytes with sperm outside the body, while ICSI is an extension of IVF, which is performed by injection of a selected sperm cell into the oocyte cytoplasm. 10 IVF is used for different causes of infertility, especially female factors, 11 while ICSI is a proper treatment option in severe male factor infertility, such as azoospermia. 12 An ART treatment cycle usually starts with controlled drug-induced ovarian stimulation to produce several mature eggs. Gonadotropins, such as human menopausal gonadotropin (HMG) and follicle-stimulating hormone (FSH), are used for stimulation in different protocols and prescriptions. Human chorionic gonadotropin (HCG) is also used as an ovulation trigger to stimulate final oocyte maturation before oocyte retrieval. With oocyte production, the cycle progresses to the oocyte retrieval phase for fertilization with sperm in the laboratory. After the eggs are fertilized, the resultant embryos are cultured, and then selected embryos at the cleavage or blastocyst stage are transferred into the woman’s uterus.5,13 Embryo transfer (ET) is the most critical stage in ART and involves many variables, strategies, and techniques. All of the aspects mentioned above are important for overall ART success. 14
Reliable and accurate prediction of ART outcome is considered an unsolved issue in the literature. 15 Considering the financial burden, physical and emotional risks, multiple pregnancies, complex treatment process, and low success rate, it is essential that infertile couples are well informed about their treatment with ART.6,8 On the other hand, there is weak concordance between clinicians on treatment decisions and pregnancy probability estimation. 16 To overcome these problems (i.e. predicting the probability of pregnancy), computational prediction models are a promising solution. Clinical prediction models estimate the treatment results with a strong likelihood and also allow the treatment process to be adapted by using a variety of related parameters such as patient parameters and other effective ART cycle-specific variables.8,17,18
To develop more accurate prediction models and algorithms with high-performance capacity, advanced computational approaches and data mining methods could be employed. 19 Machine learning and data mining techniques focus on developing computerized models and efficient predictive algorithms that detect hidden patterns in data to discover knowledge with high predictive accuracy. In contrast, traditional statistical methods have focused on assessing predefined models, which are not well suited to the complex challenges mentioned above.17,19,20
There are many types of machine learning approaches, such as Bayes nets, support vector machines (SVMs), decision trees (DTs), and so on. 21 In machine learning methods, most of the input data are used for training the algorithm(s). Indeed, the purpose of machine learning is to design and develop prediction models that enable the computer to solve a specific problem by learning from past data or experience. 22 A practical application of machine learning in medicine is clinical decision support systems that allow users, including health professionals and patients, to identify a suitable therapeutic plan.23,24
Machine learning methods are powerful computational tools for analyzing ART data and predicting treatment outcomes. However, the literature in this field is limited. 25 Previous studies rarely examined embryological and clinical data together and considered only a small number of effective variables.19,26 Moreover, the lack of categorization and ordering of the features in previous studies made the results difficult to interpret and compare. 27
In a recent publication, 28 we introduced 20 prediction models on ART. The prediction target of all these models is pregnancy (i.e. clinical pregnancy and ongoing pregnancy). However, prediction of ET outcome as a critical step is a major gap in the literature.
This study aims to construct a prediction model for ET outcome, using a comprehensive, varied feature set and different machine learning algorithms. Several steps were performed to develop the ET predictor. The first step was to identify predictive factors for ET, which are a combination of demographic, clinical, embryological, and ART cycle parameters related to infertile couples. The second step was to construct a computational model for predicting ET outcomes by applying machine learning algorithms. In the last step, we performed a comparative analysis of machine learning algorithms to determine which model(s) predicted ET outcomes with better performance in terms of accuracy, sensitivity, specificity, and so on. Figure 1 represents the overall process of the study.

Figure 1. Study design: (a) processes of attribute extraction, data acquisition, and embryo transfer (ET) dataset preparation and (b) overall steps of constructing a proposed machine learning-based ET prediction model. kNN: k-nearest neighbors; SVM: support vector machine.
Materials and methods
Data acquisition and preparation
The data were obtained from 500 patients undergoing IVF and ICSI treatment at the East Azerbaijan ACECR ART center (Tabriz, Iran) from April 2016 to February 2018. Of the 500 embryo transfer cases, 251 were recorded as β-HCG positive and 249 as β-HCG negative. We excluded other infertility treatment procedures at this clinic that do not include embryo transfer, such as intrauterine insemination. Complementary information about the recruited ET dataset is given in Figure 2. This study and the data collection phase were approved by the Ethics Committee of Tabriz University of Medical Sciences (IR.TBZMED.REC.1397.345).

Figure 2. Schematic representation of recruited ET dataset. β-HCG: beta-human chorionic gonadotropin; ET: embryo transfer; SET: single embryo transfer; DET: double embryo transfer; TET: triple embryo transfer; QET: four embryo transfer.
According to Figure 2, except for 22 single embryo transfer (SET) cycles, more than one embryo was transferred to each patient through different cycles. Therefore, the total number of embryos transferred in the dataset was 1360.
To prepare the dataset, all the data about ET stored in paper-based medical records were first collected in an electronic format and then preprocessed. Data preprocessing, one of the critical steps in machine learning, 29 consisted of handling missing values and outliers and applying normalization methods. 30 Missing values of numerical features were replaced with the median, and missing values of categorical attributes were filled with the mode of the corresponding feature.31–33 To obtain more reliable results, records of patients with embryo donation and surrogacy, which could result in misclassification and noise in prediction, were eliminated. Embryo donation and surrogacy are two phenomena of modern ART that introduce legal, ethical, and biological issues (e.g. genetic disparities of multiparents).34,35
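The imputation rule described above (median for numerical features, mode for categorical ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study does not name its normalization method, so min-max scaling is assumed here, and the function names and toy columns are hypothetical rather than taken from the study data.

```python
from statistics import median, mode

def impute_column(values, kind):
    """Replace None entries with the column median (numeric) or mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "numeric" else mode(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale numeric values to [0, 1] (assumed method); a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy columns, not taken from the study data.
age = impute_column([28, None, 35, 31], "numeric")
pcos = impute_column(["no", "yes", None, "no"], "categorical")
age_scaled = min_max_normalize(age)
```

Imputing before scaling keeps the filled-in values inside the observed range, so the rescaled column still spans exactly 0 to 1.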
Attribute extraction and selection
In this study, to predict implantation potential, all required variables were extracted from clinical guidelines, papers, and infertility specialists. After searching for and obtaining relevant features, we developed a checklist and conducted a focus group with experts, including 14 obstetricians and gynecologists, two embryologists, two medical geneticists, and one social medicine physician. During this process, we collected the experts’ opinions and attitudes about the initial feature set using a checklist based on a Likert-type scale. Moreover, they could comment at the end of the checklist in response to an open question about other useful variables not in the variable list. Based on the experts’ opinions, we added some variables (i.e. anemia, thyroid disease, prolactin hormone disorders, amenorrhea, dysmenorrhea, period status, hirsutism, galactorrhea). After the survey and reevaluation of each element, the final feature set was selected as potential predictors for ET and implantation (Figure 1(a)).
The final attributes consisted of two main groups: (1) patient-related features (e.g. demographics, diagnostic, and clinical characteristics) and (2) ART cycle features (e.g. oocyte stimulation/morphology data and embryological data). Patient-related data were further divided into female and male subcategories. The value of β-HCG was considered as the target variable (1 for positive β-HCG and 0 for negative β-HCG).
There were 59 and 23 attributes in groups 1 and 2, respectively. All of the recruited features have the potential to affect the performance of an algorithm. The features and their attribute types are summarized in Table 1.
Table 1. Description of extracted features in ET dataset.
For the prediction of embryo implantation, we examined six common machine learning (ML) algorithms on all groups of attributes. Furthermore, to identify effective features and the particular values that affect the outcome of embryo transfer, we applied feature selection (FS) and ranking algorithms. FS algorithms were used to identify attributes that have a significant prognostic effect on implantation outcome, and the relative weight of each feature was extracted. Classification algorithms were then tested separately with different numbers of weighted attributes to determine the feature set with superior performance. The selected subset of features was used in the rest of the experiments.
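The weight-then-sweep procedure described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the feature names, the weights, and the `score_fn` callback (standing in for training and evaluating a classifier on a feature subset) are all hypothetical.

```python
def rank_features(weights):
    """Sort feature names by descending absolute weight."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)

def best_subset_size(weights, score_fn, sizes):
    """Return (k, score) for the top-k feature subset with the highest score.

    score_fn is a hypothetical callback that evaluates a classifier on the
    given feature names and returns a number, e.g. cross-validated accuracy.
    """
    ranked = rank_features(weights)
    results = [(k, score_fn(ranked[:k])) for k in sizes]
    return max(results, key=lambda pair: pair[1])

# Toy weights and a stand-in scorer, for illustration only.
toy_weights = {"fsh_hmg_dose": 0.9, "contraception_duration": 0.7, "sperm_count": -0.1}
toy_score = lambda feats: 0.8 if len(feats) == 2 else 0.7
k, score = best_subset_size(toy_weights, toy_score, sizes=[1, 2, 3])
```

In this toy run the two highest-weighted features give the best score, so `k` is 2. In a real setting the scorer would be a cross-validated classifier, and ties between subset sizes would need an explicit rule.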
Implementing predictors with machine learning algorithms
In this step, six ML algorithms were fed with preprocessed data to determine their performance in ET outcome prediction. Comparative analysis of diverse classifiers enabled us to determine the best-fitting models for the employed dataset. To predict ET outcome, six of the most well-known prediction algorithms, including SVMs, neural network (NN), k-nearest neighbors (kNN), naive Bayes (NB), random forest (RF), and DT, were used to develop our predictor. The hypotheses underlying each of these algorithms were the minimization of empirical risk and the reduction of errors in the training set. These classifiers were chosen and tested with Orange data mining software. Figure 1(b) shows the overall stages of the proposed machine learning-based ET outcome prediction model.
Prediction assessment
A standard assessment method was essential to evaluate the performance of each algorithm. In other words, the prediction process needs two types of data: training and testing data. In this study, 80 percent of the dataset was used as a training set, and the remaining 20 percent was used as a test set. We used 10-fold cross-validation to assess the robustness of the approaches. The ET dataset was randomly divided into 10 equal-sized subsets, and the cross-validation process was repeated 10 times. Each time, one of the 10 subsets was used as the validation set for testing the model and the remaining nine subsets were put together to form a training dataset. Finally, the 10 results were averaged to produce a single estimate for each algorithm.
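The 10-fold procedure described above can be sketched with the Python standard library alone. The function names, the fixed seed, and the `fit_and_score` callback are assumptions made for illustration; the study itself ran its experiments in Orange.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(n_samples, fit_and_score, k=10):
    """Average a per-fold score; fit_and_score is a hypothetical callback."""
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# With 500 records, each round trains on 450 cases and validates on 50.
splits = list(k_fold_indices(500, k=10))
```

Every record appears in exactly one validation fold, so each of the 500 cases contributes once to the averaged estimate.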
The performance of the algorithms was evaluated in terms of common standard machine learning evaluation parameters. These parameters were computed based on the values of true negatives (TN), true positives (TP), false positives (FP), and false negatives (FN) as detailed below.
Accuracy (ACC): percentage of positive and negative β-HCG that was correctly predicted
Sensitivity (SN): percentage of positive β-HCG that was predicted correctly
Specificity (SP): percentage of negative β-HCG that was correctly predicted
Matthews correlation coefficient (MCC): this value ranges from −1 for the worst prediction to +1 for a perfect prediction; 0 indicates random prediction
Precision or positive predictive value (PPV)
Negative predictive value (NPV)
F-measure: this parameter is a combined evaluation of precision and recall
The area under the curve (AUC): this parameter is a threshold-independent evaluation of model performance. Its value ranges from 0 to 1, where 1 represents the best performance and 0 the worst. AUC = 0.5 when random ranking is used.
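Using the standard definitions of these metrics in terms of TP, TN, FP, and FN, the evaluation parameters listed above can be computed as in the sketch below. The function names and the example counts are hypothetical, and the AUC is computed here by pairwise ranking, which is one common equivalent formulation and not necessarily how the study's software computes it.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (assumes no empty class)."""
    sn = tp / (tp + fn)    # sensitivity (recall)
    ppv = tp / (tp + fp)   # precision
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SN": sn,
        "SP": tn / (tn + fp),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "F1": 2 * ppv * sn / (ppv + sn),
        "MCC": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts, not results from the study.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these example counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90. The pairwise AUC is quadratic in the number of samples, which is acceptable for a dataset of a few hundred records.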
Results
Attribute selection and predictive features
As mentioned before, accurate prediction of ET outcome requires a feature set of optimal size and quality; thus, precise attribute selection methods are required. Among the different ranking algorithms, SVM-FS resulted in better performance, so this FS algorithm was used to rank attributes and select the optimal number. Based on SVM-FS, 78 features were chosen as the optimal feature set.
According to the estimated feature weights, the FSH/HMG dosage was identified as the most effective predictor variable; the contraception duration and the number of germinal vesicle (GV) quality oocytes were other key features in the success of an ET cycle. On the other hand, the quality of injected metaphase II (MII) oocytes, sperm count, and male factor features had less predictive value for the ET outcome. The ranking and relative weights of these attributes, calculated with the SVM-FS algorithm, are given in Table 2.
Table 2. Scored variables related to the embryo transfer cycle, with SVM-FS.
BMI: body mass index; AFC: antral follicle count; PCOS: polycystic ovary syndrome; RIF: repeated implantation failure; RPL: recurrent pregnancy loss; TESE: testicular sperm extraction; PESE: percutaneous epididymal sperm extraction; FSH: follicle-stimulating hormone; LH: luteinizing hormone; HMG: human menopausal gonadotropin; MII: metaphase II; GV: germinal vesicle; ET: embryo transfer; PRP: platelet-rich plasma.
Classifier selection and predictive modeling
Six machine learning algorithms were employed to develop a model to predict ET outcome. Features were ranked by SVM-FS, and after examining the performance of the algorithms with different numbers of features, 78 features were selected. The performance of each algorithm, without and with FS, is summarized in Table 3.
Table 3. Performance of six algorithms without and with feature selection (82 vs. 78 features).
NB, naive Bayes; SVM, support vector machine; NN, neural network; RF, random forest; KNN, k-nearest neighbor; DT, decision tree; CA, classification accuracy; SN, sensitivity; SP, specificity; AUC, area under the curve; IS, information score; F1, F-measure; PPV, precision or positive predictive value; MCC, Matthews correlation coefficient
As highlighted in Table 3, the best classification performance based on the highest CA and AUC belonged to the NN and RF classifiers. Therefore, among the six classification algorithms, the NN and RF were considered as the best algorithms for ET data. Selected features after FS (78 features) in comparison with all features (82 features without FS) showed better performance (Table 3).
Figure 3 shows the AUC plots of different algorithms. The performances of the RF and NN are slightly better both before and after FS.

Figure 3. Area under the curve (AUC) plots of different algorithms: (a) all 82 attributes and (b) the optimum number of features selected by SVM-FS.
Discussion
The present study showed that machine learning methods could help to predict the implantation outcomes of embryo transfers by determining the essential factors in the ART treatment procedure. Employing more relevant attributes is crucial for building a functional and straightforward prediction model with high performance. 36 Therefore, we used an ET dataset that included comprehensive and detailed features of patient demographics, embryo parameters, and cycle characteristics, with a sufficient number of records to train a model with powerful machine learning prediction methods. In the reviewed literature, the maximum number of recruited features in ART outcome prediction models was 64, 18 whereas the minimum was four, in Wald et al. 37
To the best of our knowledge, most previous studies related to ART outcome prediction concentrated on only a small number of attributes and limited aspects, 26 which are unlikely to represent all effective factors in embryo implantation. Only five papers of 20 ART outcome prediction models used an almost comprehensive feature set.25,30,38–40 In contrast to earlier studies, however, we increased the number of features, for example by adding semen analysis parameters. 25 In this study, to achieve high prediction accuracy, pivotal factors in ART were considered as different feature groups and analyzed in detail (Table 1).
Remarkably, covering a more extensive range of embryo morphological data, including both cleavage-stage (i.e. day 2 and day 3) and blastocyst-stage embryos (i.e. day 5 or day 6 or more), is another important distinction of this study. Earlier studies mentioned the lack of a model combining embryo patterns of the cleavage and blastocyst stages as a limitation and suggested further investigation. 25 Also, most of them focused on cleavage stage-related parameters, 25 while blastocyst transfers provide a higher implantation rate compared with cleavage-stage transfers.41,42
There is a growing interest among embryologists in transferring multiple embryos to increase the chance of implantation and obtain high pregnancy rates, which has resulted in high rates of multiple pregnancies and births. Multiple pregnancies have been documented as a significant public health issue that leads to many maternal and neonatal complications and risks. Therefore, the SET strategy has been recognized worldwide as the only practical solution to overcome this problem and avoid multiple pregnancies in ART cycles, and many countries have established rules encouraging or mandating increased use of SET.19,41,43–45 Predictive models with decision support capabilities that are based on embryo assessment parameters may facilitate better selection of embryos with the highest implantation potential and support the adoption of a SET policy.
It is crucial to determine which features have a potential role in the prediction of ART treatment outcome. In many previous studies, the woman’s age was the most important attribute in the prediction of ART outcomes.6,15,19,25,30,37–39,46 According to the results of the present study, FSH/HMG dosage was a highly weighted feature. This feature is related to the controlled ovarian hyperstimulation (COH) phase, which is an initial procedure in the ART cycle to induce the growth of follicles with gonadotropins. This is in accordance with an earlier study that noted the importance of the total dose of gonadotropins, under different COH treatment protocols, for retrieving an adequate number of eggs and for the success of IVF. 47
Contraception duration was the other crucial prognostic factor determined by this study. According to earlier studies, the various contraceptive methods carry different health risks, and infertility is one substantial adverse effect. It has been found that long-term use of birth control methods, for example, an intrauterine device, is associated with an increased risk of impaired fertility by causing different types of infertility in women, such as tubal or ovulatory causes.48–50
Oocytes play a crucial role in the developmental competence of the embryo and, later on, in ART results. 51 Immature oocytes result in a significant reduction in IVF success. Oocytes that are arrested at the GV stage are therefore incompetent for fertilization with sperm and for embryo development. 52 On the other hand, usually all MII oocytes, which are mature and ready for fertilization, are collected and inseminated. 53 Hence, progression of the maturation cycle to the MII phase and the number of MII oocytes per ART cycle play an essential role in the chance of establishing pregnancy. 54 These findings are consistent with our feature ranking result, which identified the number of GV and MII quality oocytes as potential factors in the prediction of ET outcome. The feature ranking results also show that uterus depth is another effective feature in ET outcome. This finding is supported by previous studies that reported the impact of uterine cavity measurements on suitable ET and the subsequent pregnancy rate in ART cycles. 55
A direct comparison of the results presented in this study with those in the literature is not possible because of the diversity of research purposes, input data, analytical software, algorithms, and training/testing strategies, all of which play a vital role in model performance and in the selection of highly effective predictive features.
In this study, we used six common machine learning algorithms to develop a prediction model for ET outcomes. The performance of each algorithm was determined by evaluating how correctly it could predict whether embryos were implanted or not, using standard evaluation metrics. The area under the receiver-operating characteristic (ROC) curve is accepted as a reliable and popular performance measure for assessing the quality of classification algorithms in machine learning approaches. 18 The high value of AUC in this study (Table 3) shows the reliability of the presented approach for ET prediction.
The results showed that the prediction performance improved by applying FS and classification threshold optimization. Among the implemented algorithms, NN and RF showed superior performance to the NB, SVM, kNN, and DT models, particularly when using a reduced and ranked feature set and optimized threshold of sampling (10-fold cross-validation). The NN algorithm has been used as a single technique in two studies,46,56 and in other studies it has been used along with other algorithms.6,25,37,39,57–60 In accordance with our results, the NN algorithm has been selected over other algorithms as a more suitable method in ART outcome prediction.37,39,59 Another superior algorithm in our study was RF. In support of this result, RF was identified as the better method in three studies6,40,60 among the five that used the RF algorithm along with others. Also, in previous studies,18,39 the RF algorithm achieved performance close to that of the superior algorithm. Most studies in this domain have applied approaches based on classical statistical methods, such as logistic regression. Only a few studies in the literature were based on machine learning techniques, and these differed from the current research in terms of aim and target variable, number of features, recruited algorithms, and results.
In this study, we faced several limitations. One was the generalizability of the proposed model: since the input data in this work come from a single source, the model may work in a given clinic but cannot necessarily be transferred to other clinics without adapting the algorithm parameters to clinic-specific characteristics (e.g. culture conditions, fertilization method, media). Another fundamental limitation was the absence of any electronic documentation at the investigation center, which caused many problems and made data gathering and data entry very time-consuming. Illegibility of paper-based records, incomplete patient records, and missing values affected the performance of the classifiers and FS algorithms. Due to these limitations, we had to eliminate some important variables, such as culture medium, because the majority of their values were missing. Due to the lack of electronic datasets and the unavailability of public registries, data sharing is one of the major challenges in this domain. 25
Using implantation and the primary β-HCG test (after embryo transfer) rather than live birth as an endpoint for model development provides the opportunity to investigate different variables in the ART cycle. However, a positive β-HCG does not guarantee a live birth, which was not the focus of this study. Also, the β-HCG values collected from patient records were not homogeneous (different laboratories and immunoassay methods led to different levels of β-HCG), 61 which may influence the accuracy of the presented model. The issues mentioned above have also been documented in earlier studies.6,25,44
This study shows that the combination of different subsets of ET attributes and efficient machine learning algorithms can significantly improve the predictability of the embryo implantation outcome. Hence, the predictions of the model could help embryologists make more accurate decisions in embryo selection and minimize the current challenges in ART treatments. The proposed model can reduce the costs of ART treatments by preventing repeated ART cycles. The high cost of ART cycles is one of the major barriers and has significant economic effects on communities.62,63 Despite these high expenses, IVF/ICSI is not covered by health insurance. 64
Conclusion
The application of computational approaches in the prediction of human embryo implantation can increase the pregnancy rate after ART treatments. The proposed machine learning-based model can provide a clinical decision support tool for clinicians and infertile couples to consider the chances of success before the treatment procedure. Since this model integrates the experiences of all experts and the history of treatments into a single computational tool, learns from past cases, and analyzes several embryos and patient records, it can make predictions in minimum time with less subjectivity and human bias and with higher precision. Also, such an intelligent model is expected to be of benefit in selecting the embryo with the highest implantation potential for transfer in IVF/ICSI treatment and may be used as an educational tool for embryologists.
Our findings support the feasibility and benefits of these applications, and the results of the actual use of this tool in clinical practice in a prospective trial would be highly valuable. As future work, we recommend collecting similar datasets from various infertility clinics and applying the proposed predictive model to a wider range of ET data distributions, toward external validation and impact analysis of the model. A further extension of this study could focus on the ultimate goal of treatment (healthy birth after positive β-HCG). Rich datasets with the variability of intervening parameters in this domain, such as embryological factors (i.e. genetic screening, time-lapse monitoring, and morphokinetic parameters of the embryo), and further examinations on the female/male partner (i.e. metabolomics and extra hormonal tests) could improve the accuracy of this model.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research was a part of an MSc thesis that was approved and funded by Tabriz University of Medical Sciences.
