Abstract
This study assesses the effects of COVID-19 on the on-time performance of the US airline industry in disrupted situations. The deep learning techniques used are neural network regression, decision forest regression, boosted decision tree regression and multiclass logistic regression, and the best technique is identified. As prescriptive analytics, recommendations are made on what airlines should do to maintain on-time performance in disrupted situations. The performance of all the methods is satisfactory. The coefficient of determination is 0.86 for neural network regression, 0.85 for decision forest regression and 0.870984 for boosted decision tree regression. Thus boosted decision tree regression is the best of the three regression models. Multiclass logistic regression gives an overall accuracy and precision of 98.4% and a recall of 99%. Thus multiclass logistic regression is the best model for predicting flight delays during COVID-19. The confusion matrix for the multiclass logistic regression shows that 87.2% of the flights that were actually not delayed are predicted as not delayed, while 12.7% of the flights that were actually not delayed are wrongly predicted as delayed. The strength of the relationship between arrival delay and departure delay, carrier delay, late aircraft delay, weather delay and NAS delay is 94%, 53%, 35%, 21% and 14%, respectively. Air time has a weak negative relationship with arrival delay (the two are almost unrelated). Security delay and arrival delay are also almost unrelated, with a relationship strength of 1%. Based on these diagnostic analytics, the prescriptive recommendation is that airlines take due care to reduce departure delay, carrier delay, late aircraft delay, weather delay and NAS delay, whose effects on arrival delay in disrupted situations are 94%, 53%, 35%, 21% and 14%, respectively.
The proposed models have an MAE of 2% for neural network regression, decision forest regression and boosted decision tree regression, and an RMSE of approximately 11%, 12% and 11%, respectively.
Introduction
COVID-19 has drastically affected every walk of life, and the air transportation infrastructure is no exception. Numerous regions adopted travel restrictions in response to the pandemic arising from the spread of coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This affected both domestic and international travel across the world [1]. After an initial measure restricting the northern area of Lodi on February 21st, 2020, Italy became the first country to impose a nationwide lockdown on March 9th, 2020 [2]. On March 11th, 2020, two days after Italy announced its lockdown, the US barred entry to non-US travelers who had visited China, Iran, or any of the EU’s 26 member states. On March 16th, 2020, the restriction was expanded to non-US visitors who had been in Great Britain and Ireland [1]. Later on, the EU formally sealed the borders of all its member nations to almost all non-EU citizens. The US State Department released a Level 4 Global Health Travel Advisory on March 19th, 2020, cautioning all US citizens against foreign travel. This devastating series of events became a threat to the air transportation industry, which was forced to gradually put itself into a semi-comatose state in order to cope with the rapidly growing health and economic problems [3].
The “Airport Wait Times” information on the Customs and Border Protection (CBP) website describes passenger arrivals at US immigration across all airports of entry [4]. Monmousseau et al., [5] present a comprehensive overview of the accessible CBP dataset, as well as a study of the wait times at US airport immigration facilities from January 2013 to January 2019.
The air transportation infrastructure is critical to understand and research in pandemic circumstances from different viewpoints, such as disease transmission within planes [6], outbreak transmission through flights [7], and the impact of travel bans on the airline business [8]. The impact of the outbreak on traveller attitudes toward airlines remained the main subject of studies throughout the pandemic. Throughout 2019, Twitter users discussed the services of thirty-four major US airports, with a median of 13,255 tweets referring to an airport and 295,904 tweets discussing an airline, suggesting the need for better flight and airport services.
Flight bans and other measures implemented by a number of countries around the world are having a massive effect on the airline industry. Data collected by the US Department of Transportation's Bureau of Transportation Statistics (BTS) show the foreign and domestic flights affected from March 1st to April 22nd, 2020, based on BTS and CBP statistics as of June 24th, 2020. According to the dataset, daily domestic flights declined by 50% in the second half of March 2020. The industry was not technically asleep, however: airlines continued to operate numerous flights for fear of losing their slots [9] or because they needed to keep operating routes in order to obtain financial help [10], a circumstance resembling the sleep disorder known as a “nightmare”.
The uncertainty of the COVID-19 situation makes it difficult to predict the real future of air travel. Truong [11] developed neural network models for predicting international and domestic travel based on daily travel by distance, economic status, the COVID-19 situation, and travel bans. The study concluded that the weekly economic index (WEI) contributes substantially to air travel, that distance plays an important role in domestic and international travel, and that travel bans affected both international and domestic air travel. Moreover, air travel will take years to return to normal, even after the economy recovers from the shocks of COVID-19.
This study is organized as follows. After the introduction, the problem statement is presented in section 2. Related studies are presented in section 3. In section 4 the research gap is highlighted. In section 5 the theoretical framework is presented. Section 6 provides information about the dataset and factors. The formulation of the problem is presented in section 7. Results and discussion are in section 8. Responses to the hypotheses are recorded in section 9, and the study is concluded in section 10.
Problem statement
Critical analysis is required to understand the effect of COVID-19 on air transportation, including its effect on important issues such as disease transmission within planes, transmission through flights, and the impact of travel bans on the overall airline industry. Although flight bans are required to control the spread of COVID-19, they drastically affect the airline business. A large number of flight schedule disruptions have been reported because of COVID-19. The result is a negative effect on the on-time performance of airlines, with long flight delays. Passengers are becoming increasingly sensitive to service quality, so drastic schedule changes with long flight delays harm an airline's reputation and passenger satisfaction and cost time and money. Since the flight bans have been strictly implemented, a common question is how long a flight will be delayed. One possible answer is that the delay is indefinite or unknown, but such a response creates more uncertainty and confusion among passengers. Moreover, such a response is inadequate in a technologically advanced era. Thus, there is a need to predict correctly how long a flight will be delayed in disrupted situations. Furthermore, the accuracy and reliability of the prediction are especially important in pandemic situations, so a correct, reliable and accurate prediction method must be chosen. Modern artificial intelligence and machine learning techniques are reported to possess strong prediction capabilities. Thus, in this research, artificial intelligence and machine learning techniques are used to predict flight delays in the COVID-19 pandemic situation.
Due to the outbreak of COVID-19, many flights were delayed, causing a major disruption in flight traffic control. Correctly predicting the status of a flight is important information for the airline business and for better passenger service in the pandemic situation. This knowledge can save time, money and energy and result in better passenger service. Artificial intelligence and machine learning techniques are reported to have strong prediction capabilities. Thus the research problem of the proposed study is “How to exploit the prediction capabilities of artificial intelligence and machine learning techniques for predicting flight delays in COVID-19 pandemic situations?” The proposed research has the following objectives: to study the effect of the pandemic on flight delays with the help of modern AI tools; to use neural network regression, decision forest regression, boosted decision tree regression and multiclass logistic regression for predicting flight delays in the COVID-19 pandemic situation; and to compute feature-target correlations to identify the most important features contributing to flight delays in the pandemic situation.
The input-target correlations are computed to respond to the following hypotheses, following Seok et al., [12].
H1: The departure delay of the flight is directly related to the arrival delay in the COVID 19 situations.
H2: The air time of the flight has a positive relationship to the arrival delay in the COVID 19 situations.
H3: The carrier delay of the flight is directly related to the arrival delay in the COVID 19 situations.
H4: The weather delay of the flight has a positive relationship to the arrival delay in the COVID 19 situations.
H5: The NAS delay of the flight is directly related to the arrival delay in the COVID 19 situations.
H6: The security delay of the flight has a positive relation to the arrival delay in the COVID 19 situations.
H7: The late aircraft delay of the flight is positively related to the arrival delay in the COVID 19 situations.
Related studies
Airline flight delays in normal situations have been addressed by previous studies; however, they have not been properly addressed in the COVID-19 pandemic situation. In normal situations, predicting flight delays and cancellations can provide important information for decision making by airlines, airports, and air traffic control. It can also give early information to airline customers so that they can reschedule their itinerary beforehand. Rescheduling in advance can reduce the loss of time, money and service. Flight delays and cancellations are reported to have socio-economic and environmental effects [13, 14]. Zhixing et al., [15] concluded that flight network characteristics, resilience and flight delays are directly and indirectly related to each other. Wang et al., [16] used the structural properties of the network to reduce flight delays and compared the network structures of the USA and China. They used data from 1 August 2012 to 31 August 2013 covering 196 airports. Zhou et al., [17] analyzed the impact of flight delays using data on flight arrivals and departures in China from December 1st to 31st 2000.
Predicting flight delays and cancellations has been the main theme of various major studies in the past [18–21]. In [22], machine learning based regression was used to predict flight delays hours ahead of the flight operation. The flight delay states and flight schedule information were taken as input features to the regression model. The authors showed that on-time performance was largely affected by the delay states of the flight. Ding [23] used a multiple linear regression model to predict airline flight delays with 79.1% accuracy. Kenan et al., [24] addressed the reduction of flight delays with optional flights. Gui et al., [25] recommended the random forest technique for flight delay prediction, with 90.2% accuracy. Lambelho et al., [26] used LightGBM, multilayer perceptron, and random forests to predict flight delays and cancellations at Heathrow airport, using binary classification. Stefanovič et al., [27] predicted deviation from the scheduled flight time using machine learning for Lithuanian airports. Yazdi et al., [28] used the Levenberg-Marquardt technique for predicting flight delays.
Disease transmission within planes and the spread of outbreaks through flights became the basis for flight bans on air transport. Baspinar and Koyuncu [29] addressed the issue of flight delays caused by the spread of an epidemic. The authors studied two different epidemic models: a flight-based model and an airport-based model. They considered data from the 10 busiest airports in Europe to study the spread of infection in air traffic between 13:30 and 16:30 on June 02, 2015. Nowzari [30] commented on the spread of epidemics in a network. Perotti et al., [31] observed that network sparsity (low interaction time) slows down the Susceptible-Infectious (SI) spread of disease. Mou et al., [32] pointed out that temporal sparsity (inter-event interaction time) and the time of flight slow down the infection transmission rate. They used data on 1,627 airline routes with 14,268 flights of the Chinese Aviation Network for 2014. Bussell et al., [33] applied control theory strategies to control the spread of human, plant and animal diseases. They addressed disease management strategies for practically controlling the transmission of infectious diseases. Alamo et al., [34] used a 3M strategy (Monitor, Model and Manage) to control COVID-19 transmission. Li et al., [35] studied a delay propagation model using an integrated airport-based Susceptible-Infected-Recovered-Susceptible (ASIRS) epidemic model. They used the flight data of the Civil Aviation Administration of China (CAAC) from June to December 2015, covering 93630 flights.
Gaps in the literature
Alla et al., [36] used a multilayer perceptron neural network for predicting the flight arrival delay of American airlines. The data used by the authors covered American airlines for the period from 1 January to 31 December 2018, with a total of 760000 flights. In our proposed model the flight delay and cancellation data span January 2020 to June 2020, with 2745847 flights, covering the advent of the COVID-19 pandemic. Thus there is a promising population gap. Moreover, the study [36] considers only 10 input features to predict flight delay, whereas our proposed model focuses on the more important numerical features. Thus there are promising methodological, evidence, empirical, population and practical knowledge types of gaps with [36]. Zeng et al., [37] simulated data from 325 airports in the United States from 2015 to 2018, portraying the situation before the advent of COVID-19. This indicates that there are promising methodological, evidence, empirical, population and practical knowledge types of gaps with [37]. Niu et al., [38] used a control theory approach to flight delays in complex airline networks. Thus there are promising methodological, evidence, empirical, population and practical knowledge types of gaps with [38]. Zoutendijk and Mitici [39] used machine learning to reduce flight delays at Rotterdam The Hague Airport (RTM) between 1 January 2017 and 29 February 2020. A total of 17,365 departing and 17,336 arriving flights were taken into consideration. Because [39] considered only one airport with approximately 17365 flights, whereas the current model considers information on 2745847 flights, there are promising methodological, evidence, empirical, population and practical knowledge types of gaps with [39]. Bandyopadhyah et al., [40] used the data of January 2019 to train a flight cancellation model and tested it with the data of January 2020.
The proposed study possesses promising methodological, evidence, empirical, population and practical knowledge types of gaps with [40]. The research gaps with respect to these renowned studies are refined and presented in Table 1. From a thorough literature review, we find that no model has been trained for flight delay prediction in the COVID-19 disruption situation. Thus the proposed study trains and tests AI based prediction models for airline flight delay prediction in the COVID-19 disruption situation.
Identification of the gaps and its types
Table 1 summarizes that, with respect to most of the renowned research, the proposed study has methodological, evidence, empirical, population and practical knowledge types of gaps. Thus there exists a promising gap for the proposed research study to address.
Airline service quality and schedule disruption are major factors affecting the airline business. Studies have demonstrated that airline capacity reductions disproportionately affect passengers, emphasizing the discrepancies between quantifying aircraft delays and cancellations and evaluating the real passenger delay [41–44]. According to statistics from a prominent US airline, disrupted passengers, whose routes were disrupted due to a capacity drop, account for just 3% of overall passengers, but account for 39% of entire passenger inconvenience.
The inclusion of passenger-centric strategies was advocated in the USA [45] and Europe [46] when assessing the air transportation system. Cook et al., [47] made the first effort at incorporating passenger-oriented metrics. The ideas of Multimodal, Efficient Transportation in Airports and Collaborative Decision Making (META-CDM) were established to include passenger interests in airport decision-making procedures [48–50]. Although these efforts give passengers a prominent role, they nonetheless rely significantly on flight-centric data and hence suffer from the same latency limitations. After multiple years, the desired move from flight-centric to passenger-centric metrics has yet to be adopted by regulatory authorities. EUROCONTROL and the FAA offered timeliness indicators that combined airline and passenger perspectives into a unified picture (EUROCONTROL and FAA [51]). Lemer [52] called for integrated airport efficiency indicators that would address the needs of travelers, airlines, and airports, as well as the demands of many other players (such as restaurants or governments). Many studies have been conducted to better understand the travel experience, or at minimum the passenger expectations of aviation and airline efficiency. Tsaur et al., [53] proposed using fuzzy theory with surveys to assess airline service efficiency. Hunter [54] conducted a comprehensive survey of aviation perception studies from 1995 to 2006, highlighting the decline in customer support across the airline sector. De-Oña and De-Oña [55] reviewed survey-based research on public transit systems for more detailed information. They concluded that while researchers continue to increase the sophistication of models in order to better predict customer comfort in public transportation, managers and practitioners use simplified models to achieve their target of maximizing the service quality that passengers perceive for their money.
Passenger assessments undertaken for airports or airlines, although very comprehensive, remain restricted to very specific samples of passengers and brief time spans, and may not be representative. Tsaur et al., [53], for instance, had a sample of 211 passengers, while Pakdil and Aydın [56] had a sample of 385 passengers in their survey studies. Such surveys are often costly and time-consuming to conduct, which makes them inefficient and hard to update when assessing the impact of significant perturbations on the air transportation environment, such as the COVID-19 outbreak.
The widespread availability of smartphones has made it simpler to measure the performance of the air transport industry utilizing passenger-generated information. Passenger behavior at airports [57, 58] and transit stations [59] is studied using information from Wi-Fi hotspots and Bluetooth beacons, along with historical data. Information created by passengers’ smart phones and obtained by phone carriers, if appropriate, can be used to evaluate passengers’ door-to-door activity under both nominal and deteriorated circumstances [60–62]. From this literature it is clear that the airline passengers are increasingly becoming service quality sensitive.
Artificial intelligence and machine learning techniques have strong capabilities for predicting the occurrence of a future event. Minimum schedule disruption and the best passenger services affect the airline business ([41–48, 50]). Pakdil and Aydin [56] emphasized the need for the airline industry to be prepared for disruptions, emergencies and unexpected situations. Reliable expectations need efficient and accurate prediction mechanisms. Modern deep learning based machine learning algorithms can learn from big data for the purpose of predicting future unexpected situations. Thus in this study the learning power of modern deep learning algorithms is exploited to predict flight delays in the COVID-19 pandemic situation, and AI based computer integrated systems are developed to predict whether a flight will be delayed or not. Moreover, it is important to know which features contribute more to flight delays during the COVID-19 pandemic. The theoretical framework can be presented graphically as shown in Fig. (1). The framework depicts that, when passengers are sensitive to service quality, the outbreak of COVID-19 and the belief that the disease is transmitted through planes lead to a ban on air travel. Due to the ban, a huge flight schedule disruption occurs and passengers want to know whether a certain flight is delayed or not. Thus the powerful prediction tools of AI are used to respond to the need for flight delay prediction in the pandemic situation.

Graphical presentation of theoretical framework.
Thus we train our model on the delay information of 2196678 flights and then test its accuracy on the delay information of 549169 flights. In other words, the system learns from 2196678 flight delay / cancellation patterns and is then tested on 549169 flight delay patterns to check whether it can tell if a flight with specific features will be delayed.
Figure (2) shows the research design adopted in this study for predicting flight delays during the COVID-19 situation. The artificial intelligence techniques used for the prediction and interpretation of flight delays are neural network regression, decision forest regression, boosted decision tree regression and multiclass logistic regression.
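The 80/20 division of the 2745847 flights into training and testing samples can be illustrated with a minimal sketch. It assumes a simple random shuffle of row indices; the function name `train_test_split_indices` and the seed value are hypothetical and are not taken from the study's tooling.

```python
import numpy as np

def train_test_split_indices(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row indices and split them into training and testing parts."""
    rng = np.random.default_rng(seed)      # seed is an illustrative assumption
    shuffled = rng.permutation(n_rows)
    n_train = round(n_rows * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Total number of flights reported in the study.
train_idx, test_idx = train_test_split_indices(2_745_847)
print(len(train_idx), len(test_idx))  # 2196678 549169
```

Rounding 80% of 2745847 gives the 2196678 training flights, and the remaining 549169 flights form the testing sample.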

Research design of deep learning based flight delay prediction during COVID 19.
COVID-19 has severely crippled the global airline industry. All major air services were heavily reduced throughout the year 2020. The data on 2745847 flights with 47 delay features were obtained from Kaggle [63], which in turn obtained them from the United States Department of Transportation’s (DOT) Bureau of Transportation Statistics [64]. The Bureau publishes detailed information on the on-time performance of flights for major US carriers. The data cover January to June 2020 and contain all the relevant flight information from the top 10 United States flight carriers during the pandemic. The unit of analysis for this research is the US airline industry. The study will help to predict airline flight delays in the COVID-19 pandemic situation. In all deep learning techniques the data is usually subdivided into a training sample and a testing sample. It is common practice to allocate 80% of the data to the training sample and 20% to the testing sample. In other words, 80% of the dataset is given to the model for training and the remaining 20% is used to check how well the model has learned the dataset for prediction. Thus the total dataset of 2745847 flights is subdivided into training and testing samples containing the data of 2196678 and 549169 flights, respectively, during the COVID-19 situation. The following features are used for the prediction of flight delays in the COVID-19 situation.
AIR_TIME: Total time the airplane remains in the air, measured in minutes.
ARR_DELAY: Arrival delay, measured in minutes.
CARRIER_DELAY: Measured in minutes. A cancellation or delay due to circumstances within the control of an airline. For example, crew unavailability, maintenance, aircraft cleaning, baggage loading, fueling, etc.
DEP_DELAY: Departure delay, measured in minutes.
DISTANCE: Measured in miles from the origin to the destination airport.
LATE_AIRCRAFT_DELAY: Measured in minutes. This type of delay occurs when a previous flight arrives late, causing the present flight to depart late.
NAS_DELAY: Measured in minutes. Delays and cancellations due to the national aviation system. For example, airport operations, non-extreme weather conditions, heavy traffic volume, air traffic control.
SECURITY_DELAY: Measured in minutes. Delays / cancellations caused by the emergency evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, faulty security screening equipment, or waiting lines at security screening areas exceeding 29 minutes.
WEATHER_DELAY: Measured in minutes. Extreme meteorological conditions causing delays / cancellations of a flight. For example, winds, heavy rains, hurricane, tornado, floods, blizzard.
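The feature-target correlations used to respond to the hypotheses can be sketched as follows. This is a minimal illustration that assumes the Pearson correlation coefficient is the measure of relationship strength (the text does not name the coefficient); the helper name `feature_target_correlation` and the toy values are hypothetical and are not drawn from the study's dataset.

```python
import numpy as np

def feature_target_correlation(features, target):
    """Pearson correlation between each named feature column and the target."""
    target = np.asarray(target, dtype=float)
    return {
        name: float(np.corrcoef(np.asarray(values, dtype=float), target)[0, 1])
        for name, values in features.items()
    }

# Made-up toy values in minutes, for illustration only.
toy_features = {
    "DEP_DELAY": [0, 10, 25, 40, 5],
    "AIR_TIME": [120, 95, 130, 110, 100],
}
toy_arr_delay = [2, 12, 30, 41, 3]

correlations = feature_target_correlation(toy_features, toy_arr_delay)
for name, value in correlations.items():
    print(f"{name}: {value:+.2f}")
```

A coefficient close to +1 would support a direct relationship with ARR_DELAY, while a value near zero would indicate that the feature and the arrival delay are almost unrelated.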
Formulation of the Proposed Neural Network Regression for Flight Delay Prediction
A neural network is composed of interconnected nodes and edges forming layers. The first and last layers are termed the input layer and the output layer, respectively. There may be many hidden layers between these two layers. The majority of prediction problems are solved with one or a few hidden layers; however, deep neural networks may have even hundreds of hidden layers. Successive interconnected layers represent progressively higher levels of semantic depth. The network is trained on the given data to learn the relationship between inputs and outputs. The graph runs from the input layer through the hidden layers and ends at the output layer. The nodes of each layer are connected to the nodes of the next layer by weighted edges. To compute the output for an input, an activation function is used to calculate the value at each node. Each layer calculates a value based on the weighted sum of the values calculated at the nodes of the preceding layer.
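In the neural network, the inputs are multiplied by the weights at the hidden layers to compute the value at node “j”. For example, the output of the first hidden layer is given as (1). Here $x_i$ is the i-th input, $w^{(1)}_{ij}$ is the weight on the edge from input i to node “j” of the first hidden layer, and $f_1$ is the activation function of that layer.
\[ h^{(1)}_j = f_1\left(\sum_i w^{(1)}_{ij}\, x_i\right) \tag{1} \]
For the second hidden layer the output is given in (2). In the second layer $f_2$ is applied on the output of the first hidden layer.
\[ h^{(2)}_j = f_2\left(\sum_i w^{(2)}_{ij}\, h^{(1)}_i\right) \tag{2} \]
The output at the third hidden layer at node “j” can be calculated as in (3). In the third layer $f_3$ is applied on the output of the second hidden layer.
\[ h^{(3)}_j = f_3\left(\sum_i w^{(3)}_{ij}\, h^{(2)}_i\right) \tag{3} \]
Finally the output at the “j” th node of the output layer is given as (4). In the output layer $\theta_j$ is applied on the output of the third hidden layer.
\[ o_j = \theta_j\left(\sum_i w^{(4)}_{ij}\, h^{(3)}_i\right) \tag{4} \]
In case the network has “k” hidden layers, the output at the k th hidden layer is given as (5).
\[ h^{(k)}_j = f_k\left(\sum_i w^{(k)}_{ij}\, h^{(k-1)}_i\right) \tag{5} \]
Finally the output at the output layer is given in (6). In the output layer $\theta_j$ is applied on the output of the k th hidden layer.
\[ o_j = \theta_j\left(\sum_i w^{(k+1)}_{ij}\, h^{(k)}_i\right) \tag{6} \]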
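The layer-by-layer computation referred to in (1)–(6) can be sketched as a short forward pass. This is a minimal illustration under stated assumptions: the activation functions (tanh in the hidden layers, identity at the output), the layer sizes and the random weights are hypothetical choices, and bias terms are omitted because the text describes only weighted sums.

```python
import numpy as np

def forward(x, hidden_weights, output_weights,
            hidden_activation=np.tanh, output_activation=lambda z: z):
    """Propagate one input vector through k hidden layers and an output layer."""
    h = np.asarray(x, dtype=float)
    for w in hidden_weights:          # one weight matrix per hidden layer
        h = hidden_activation(w @ h)  # weighted sum of the previous layer, then activation
    return output_activation(output_weights @ h)

# Illustrative sizes: 8 inputs, two hidden layers of 4 nodes each, 1 output node.
rng = np.random.default_rng(0)
hidden = [rng.normal(size=(4, 8)), rng.normal(size=(4, 4))]
out = rng.normal(size=(1, 4))
prediction = forward(np.ones(8), hidden, out)
print(prediction.shape)  # (1,)
```

Each pass through the loop corresponds to one hidden layer, so a network with “k” hidden layers simply supplies k weight matrices.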
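The performance of neural network based models is measured in the form of mean absolute error (7), root mean square error (8), relative absolute error (9), relative squared error (10), and coefficient of determination (11).
\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| o_i - t_i \right| \tag{7} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( o_i - t_i \right)^2} \tag{8} \]
\[ \mathrm{RAE} = \frac{\sum_{i=1}^{n} \left| o_i - t_i \right|}{\sum_{i=1}^{n} \left| t_i - \bar{t} \right|} \tag{9} \]
\[ \mathrm{RSE} = \frac{\sum_{i=1}^{n} \left( o_i - t_i \right)^2}{\sum_{i=1}^{n} \left( t_i - \bar{t} \right)^2} \tag{10} \]
\[ R^2 = 1 - \mathrm{RSE} \tag{11} \]
Here, $o_i$ and $t_i$ denote the predicted target and the actual target, respectively, for the i-th of the n test observations, and $\bar{t}$ is the mean of the actual targets.
Thus Equations 7–11 are used for performance evaluation.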
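The five performance measures can be computed together in a few lines. The sketch below uses the standard textbook definitions of mean absolute error, root mean square error, relative absolute error, relative squared error and the coefficient of determination; the function name `regression_metrics` and the toy numbers are hypothetical.

```python
import numpy as np

def regression_metrics(predicted, actual):
    """Return MAE, RMSE, RAE, RSE and R^2 for predicted vs. actual targets."""
    o = np.asarray(predicted, dtype=float)
    t = np.asarray(actual, dtype=float)
    err = o - t               # prediction errors
    dev = t - t.mean()        # deviations of the actual targets from their mean
    rse = (err ** 2).sum() / (dev ** 2).sum()
    return {
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "RAE": np.abs(err).sum() / np.abs(dev).sum(),
        "RSE": rse,
        "R2": 1.0 - rse,
    }

# Toy example: three exact predictions and one that is off by 1.
metrics = regression_metrics([1, 2, 3, 5], [1, 2, 3, 4])
print(metrics)
```

For this toy input the errors are (0, 0, 0, 1), giving MAE = 0.25, RMSE = 0.5, RAE = 0.25, RSE = 0.2 and a coefficient of determination of 0.8.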
The machine learning techniques used in this study are neural network regression, decision forest regression, boosted decision tree regression and multiclass logistic regression. The parameters of the models are presented in Table (2). The neural network regression uses a cross entropy loss function. The loss function measures how close the prediction is to the actual value. Its value ranges from 0 to 1, with 0 representing a perfect match and 1 a complete mismatch. The learning rate is 0.005. The learning rate is a tuning parameter that sets the step size at each iteration as the method moves toward the minimum of the loss function. The network performed 100 iterations with a min-max normalizer. The normalizer transforms the original data into the [0, 1] range while preserving the relationships in the original data. The initial weight from which the network starts learning is 0.1. The system performs shuffling with a random seed. A random seed is the initial value of the random number generator from which the system starts the iterative process. The method allows an unknown level, which creates a group for unknown values in training and validation. This allows the method to better predict unknown values.
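The min-max normalizer mentioned above can be illustrated with a minimal sketch. It applies the usual rescaling (value minus minimum, divided by the range) to a single column; the function name `min_max_normalize` and the handling of constant columns are illustrative assumptions rather than details from the study.

```python
import numpy as np

def min_max_normalize(column):
    """Rescale a numeric column linearly so its values lie between 0 and 1."""
    values = np.asarray(column, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:                       # constant column: map everything to 0 (assumption)
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

scaled = min_max_normalize([10, 20, 40])
print(scaled)  # smallest value maps to 0, largest to 1
```

Because the mapping is linear, the ordering and relative spacing of the original values are preserved.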
Methods and parameters
The decision forest sets up 8 decision trees for the proposed problem. The maximum depth limits the depth of each tree in the forest; a deeper tree captures more information about the problem. The depth of a tree ranges from 1 to 32. The minimum number of samples per leaf is 1, that is, at least one sample is required to form a leaf (decision). The method allows an unknown level, which creates a group for unknown values in training and validation. This allows the method to better predict unknown values. The resampling method used is bagging, starting with a random seed of 5. In bagging, also known as bootstrap aggregating, each decision tree in the forest is grown on a sample drawn randomly with replacement from the original dataset, and drawing continues until the sample is the same size as the original dataset. Bagging uses voting as its aggregation policy. Each tree outputs an unnormalized histogram of labels, and the histograms are normalized into probabilities. Trees with high probabilities have a greater chance of determining the decision of the ensemble.
The boosted decision tree method sets up 100 decision trees. Ten samples are required to form a leaf (decision). The method allows an unknown level, which creates a group for unknown values in training and validation. This allows the method to better predict unknown values. The resampling method used is boosting. In boosting, each tree depends on the prior trees. The second tree corrects the errors of the first. Likewise, the third tree corrects the errors of the second and the first, and so on.
The multiclass logistic regression uses a combination of L1 and L2 regularization techniques. Regularization is a method for penalizing models with extreme coefficient values. In this method a penalty on the coefficient values is added to reduce the errors. L1 is used for sparse models, whereas L2 is used for non-sparse models. Sparse data is high dimensional data in which most of the values are zero. The optimization convergence value for the multiclass logistic regression is 0.0000007. The method allows an unknown level, which creates a group for unknown values in training and validation, thus better predicting unknown values.
Simulation models and machine learning models have opposite characteristics. In this study machine learning techniques are adopted, so the model may not necessarily exhibit the characteristic properties associated with simulation studies. The difference between the two is that in a simulation study the model is known exactly but the inputs are not, whereas in machine learning the inputs are known exactly but the model is not [66, 67].
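The bagging and voting steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the helper names `bootstrap_sample` and `vote` are hypothetical, the per-tree label histograms are made-up numbers, and averaging the normalized histograms is one plausible reading of the voting policy.

```python
import numpy as np

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return rng.integers(0, n_rows, size=n_rows)

def vote(histograms):
    """Normalize each tree's label histogram and average them across trees."""
    hist = np.asarray(histograms, dtype=float)
    per_tree = hist / hist.sum(axis=1, keepdims=True)  # probabilities per tree
    ensemble = per_tree.mean(axis=0)                   # aggregated probabilities
    return int(np.argmax(ensemble)), ensemble

rng = np.random.default_rng(5)            # random seed of 5, as stated in the text
sample = bootstrap_sample(10, rng)        # same size as the original toy dataset

# Made-up label counts from three trees for two labels (not delayed, delayed).
label, probabilities = vote([[3, 1], [1, 1], [4, 0]])
print(label, probabilities)
```

In this toy case the three trees give probabilities (0.75, 0.25), (0.5, 0.5) and (1, 0), so the ensemble probabilities are (0.75, 0.25) and label 0 is chosen.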
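The idea that each tree corrects the errors of the trees before it can be sketched with one-split trees (stumps) fitted to the current residuals. This is a simplified illustration under stated assumptions: squared-error residual fitting, a single numeric feature, a learning rate of 0.2 and the helper names `fit_stump` and `boost` are all hypothetical and are not details given in the text.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-threshold split of x that best fits the residual."""
    best = None
    for threshold in np.unique(x)[:-1]:    # skip the maximum so both sides are non-empty
        go_left = x <= threshold
        left, right = residual[go_left].mean(), residual[~go_left].mean()
        sse = ((residual - np.where(go_left, left, right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left, right)
    return best[1:]

def boost(x, y, n_trees=100, learning_rate=0.2):
    """Fit n_trees stumps in sequence, each one to the errors left by the previous ones."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    prediction = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_trees):
        threshold, left, right = fit_stump(x, y - prediction)
        prediction = prediction + learning_rate * np.where(x <= threshold, left, right)
        stumps.append((threshold, left, right))
    return prediction, stumps

x = np.arange(10.0)
y = x ** 2
fitted, stumps = boost(x, y)
print(len(stumps), float(((y - fitted) ** 2).sum()))
```

Each new stump is fitted to what the current ensemble still gets wrong, so the sum of squared errors on the training data does not increase as trees are added.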
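The combination of L1 and L2 regularization can be illustrated by adding both penalties to the cross entropy loss of a multiclass (softmax) logistic model. This is a minimal sketch under stated assumptions: the penalty weights `l1` and `l2`, the function names and the toy data are hypothetical, and the exact penalty form used in the study is not given in the text.

```python
import numpy as np

def softmax(scores):
    """Convert per-class scores into probabilities, row by row."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def regularized_loss(weights, features, labels, l1=1.0, l2=1.0):
    """Mean cross entropy plus L1 and L2 penalties on the coefficients."""
    probs = softmax(features @ weights)
    cross_entropy = -np.log(probs[np.arange(len(labels)), labels]).mean()
    penalty = l1 * np.abs(weights).sum() + l2 * (weights ** 2).sum()
    return cross_entropy + penalty

# Toy data: 2 observations, 2 features, 3 classes.
features = np.array([[1.0, 2.0], [3.0, 4.0]])
labels = np.array([0, 2])
loss_zero = regularized_loss(np.zeros((2, 3)), features, labels)
loss_ones = regularized_loss(np.ones((2, 3)), features, labels)
print(loss_zero, loss_ones)
```

Both weight matrices give equal probabilities to the three classes, so the cross entropy is the same, but the larger coefficients incur an additional penalty; this is how regularization discourages extreme coefficient values.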
After training on a set of 2196678 flight records, the model is tested on a set of 549169 records to check the performance of the learned model. The results of the proposed neural network regression are shown in Table 3. Table 3 takes the flight information data and predicts whether a flight with given flight information is going to be delayed. The second column of Table 3 shows whether the flight is going to be delayed, and the last column, named Scored Labels, shows the degree of assurance that the flight is going to be delayed. Only a few of the 549169 tested flights and their predictions are displayed in Table 3.
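The two set sizes quoted above correspond to a partition of about 80% for training and 20% for testing (2196678 of 2745847 records). A minimal sketch of such a split follows; the random shuffling, the seed and the function name are assumptions, because the section does not state how the records were assigned to the two sets.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Sizes reported in the text.
n_train, n_test = 2196678, 549169
train_share = n_train / (n_train + n_test)

# Hypothetical stand-in data to show the mechanics on a small scale.
rows = list(range(10))
train_rows, test_rows = train_test_split(rows)
```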
Table 4 shows the testing results of the decision forest regression for some of the 549169 tested flights. Table 4 takes the flight information data and predicts whether a flight with given flight information is going to be delayed. The second column of Table 4 shows the arrival delay status of the flight. The last two columns, named Scored Label Mean and Scored Label Standard Deviation, show the degree of assurance of the flight delay and the standard deviation of the prediction, respectively.
The testing results of the boosted decision tree are shown in Table 5. The model is trained on a set of 2196678 flight records and then tested on a set of 549169 records. The second column of Table 5 shows whether the flight is going to be delayed, and the last column, named Scored Labels, shows the degree of assurance that the flight is going to be delayed. Table 5 shows the predictions for only a few of the 549169 tested flights.
Moreover, the boosted decision tree method constructs one hundred decision trees. Two of these trees are shown in Fig. 3. In the interactive view, clicking on the edges of a tree displays the prediction probabilities of that decision tree.

Two decision trees out of 100, by boosted decision tree regression.
Table 6 shows the prediction results of the multi class logistic regression. The logistic regression predicts the status of a flight with three parameters, which are given in the last three columns for a flight with given information. The scored probabilities for the class “False” and for the class “True” give the probabilities that a flight belongs to the not delayed and the delayed category, respectively. Table 6 shows the predictions for some of the 549169 tested flights.
The performances of the three methods are compared in Table 7. The performances of all the methods are satisfactory. Table 7 shows that the mean absolute error of both the boosted decision tree regression and the decision forest regression is 0.02697. The coefficient of determination for the boosted decision tree is 0.870984, which means that 87% of the variation in the arrival delay is explained by the inputs of the model. Thus the boosted decision tree regression performs best among the three. The errors of the neural network regression, decision forest regression and boosted decision tree regression are plotted in Fig. 4.
Comparisons of the performances of the three methods for arrival delay prediction

Errors of the three methods for arrival delay prediction.
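The error measures used to compare the regression models in this section (MAE, RMSE, RAE, RSE and the coefficient of determination) can be computed as in the following sketch. The function name and the four example values are hypothetical and are not taken from the study's data; the formulas are the standard definitions of these measures.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Errors of the naive model that always predicts the mean of the actuals.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RSE": rse,
        "R2": 1.0 - rse,  # share of the variation explained by the model
    }


# Hypothetical targets (1 = delayed, 0 = not delayed) and model scores.
actual = [0.0, 0.0, 1.0, 1.0]
predicted = [0.1, 0.0, 0.9, 0.8]
metrics = regression_metrics(actual, predicted)
```

Since the coefficient of determination equals one minus the relative squared error, a model with an RSE near 0.13 has a coefficient of determination near 0.87.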
Table 8 shows that the multi class logistic regression gives an overall accuracy and precision of 98.4%, with a recall of 99%. Thus multi class logistic regression is the best model for predicting flight delays in the COVID-19 situations. The confusion matrix for the multi class logistic regression is shown in Fig. 5. It shows that 87.2% of the flights that were not delayed were predicted as not delayed by the proposed model, while 12.7% of the flights that were actually not delayed were wrongly predicted as delayed. No flight that was actually delayed was wrongly predicted as not delayed: all delayed flights were predicted as delayed by the model.
Results of the multi class logistic regression for arrival delay prediction

Confusion matrix for multi class logistic regression for arrival delay prediction.
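How accuracy, precision, recall and the row percentages of a confusion matrix follow from the four cell counts is shown in the sketch below. The counts are hypothetical and chosen only for illustration; they are not the counts behind Fig. 5. The delayed class is treated as the positive class.

```python
def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision and recall with 'delayed' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


def row_normalize(matrix):
    """Express every row (actual class) as shares of its predicted classes."""
    return [[cell / sum(row) for cell in row] for row in matrix]


# Hypothetical counts. Rows are actual classes, columns predicted classes,
# both in the order [not delayed, delayed].
tn, fp = 90, 10    # actually not delayed
fn, tp = 0, 900    # actually delayed
accuracy, precision, recall = classification_scores(tp, fp, fn, tn)
shares = row_normalize([[tn, fp], [fn, tp]])
```

With no delayed flight predicted as not delayed (`fn = 0`), the recall of the delayed class is 1, even though part of the not delayed flights is misclassified.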
The model is cross validated to assess its variability and reliability. Cross validation evaluates the performance of the model over a larger data space: instead of a part, it uses the entire training dataset for training and evaluation. In the process the data is subdivided into 10 folds (samples), a new model is built for each fold, and the accuracy metrics are returned for each fold. For the first model, the data in fold 1 is set aside for validation and the remaining folds are used for training. Thus, when the data is divided into 10 folds, 10 models are generated, each using one-tenth of the data for validation and nine-tenths for training.
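The fold procedure can be sketched as follows. The helper names, the stand-in data and the trivial mean-predicting model are hypothetical; the sketch only illustrates how each of the 10 folds serves once as the validation set while the other nine folds are used for training.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (fold, train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for fold, size in enumerate(sizes):
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield fold, train, validation
        start += size


def cross_validate(rows, targets, fit, score, k=10):
    """Build one model per fold and return the per-fold scores."""
    scores = []
    for _, train_idx, valid_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        scores.append(score(model,
                            [rows[i] for i in valid_idx],
                            [targets[i] for i in valid_idx]))
    return scores


# Hypothetical model: always predict the mean target of the training folds.
def fit_mean(rows, targets):
    return sum(targets) / len(targets)


def mean_absolute_error(model, rows, targets):
    return sum(abs(t - model) for t in targets) / len(targets)


rows = list(range(20))
targets = [0.0, 1.0] * 10
fold_scores = cross_validate(rows, targets, fit_mean, mean_absolute_error)
```

The spread of the per-fold scores indicates how much the performance of the model depends on the particular data it was trained on.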
The cross validation results for the neural network regression are presented in Table 9. The entire training dataset is divided into ten folds, numbered 0 to 9. The results show that the representative coefficient of determination over all the folds is 0.86. Moreover, the representative MAE, RMSE, RAE and RSE are 0.043, 0.11, 0.20 and 0.13, respectively.
Cross validation for the neural network regression
The cross validation results for the decision forest regression are presented in Table 10. Here too the entire training dataset is divided into ten folds, numbered 0 to 9. The results show that the representative coefficient of determination over all the folds is 0.85. Moreover, the representative MAE, RMSE, RAE and RSE are 0.026, 0.12, 0.12 and 0.14, respectively.
Cross validation for the decision forest
The cross validation results for the boosted decision tree regression are presented in Table 11. The entire training dataset is divided into ten folds, numbered 0 to 9. The results show that the representative coefficient of determination over all the folds is 0.87. Moreover, the representative MAE, RMSE, RAE and RSE are 0.026, 0.11, 0.11 and 0.12, respectively.
Cross validation for the boosted decision tree regression
The cross validation results for the multi class logistic regression are presented in Table 12. The entire training dataset is divided into ten folds, numbered 0 to 9. The representative precision and recall for the false class are 0.99 and 0.87, respectively. The representative precision and recall for the true class are 0.98 and 0.99, respectively.
Cross validation for the multiclass logistic regression
Time complexity is the total running time required for a computational model to complete its operations. The time complexity of algorithms is expressed using big O notation. Model complexity is determined by many factors, for example the number of training instances, the number of features, the number of layers (depth) and the number of neurons in the hidden layers (width) of a given model, as well as the linear or nonlinear nature of the problem. There are sophisticated and automated methods to control and reduce model complexity. These include feature and model selection, linear models, subset selection, shrinkage methods and regularization. Machine learning techniques take relatively little time to train, ranging from a few seconds to a few hours. Deep learning techniques have many parameters, large datasets and many hidden layers, and therefore take longer to train. For example, ResNet takes around two weeks to train. It is worth noting that time complexity is important in real-time systems. For training-based systems, time complexity matters less owing to the availability of high-performance computing resources such as GPUs, virtual machines and TPUs. Once trained, the models can be used as real-time systems with less complexity.
Let n denote the number of training examples, m the number of features and k’ the number of trees. The training time complexity of the neural network regression is O(n) and its test time complexity is O(m). The training time complexity of the decision tree algorithm is O(n*log(n)*m) and its test time complexity is O(m). The decision forest has training time complexity O(k’*n*log(n)*m) and test time complexity O(m*k’). Moreover, logistic regression has training time complexity O(n*m) and test time complexity O(m) [68].
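To give a feeling for how these expressions scale, the sketch below evaluates them for the training set size reported in this study. Only n = 2196678 and k’ = 8 are taken from the text; the number of features m = 10 is a hypothetical value, and the results are growth terms without constant factors, not running times.

```python
import math


def complexity_terms(n, m, k):
    """Evaluate the big O expressions quoted above, ignoring constants."""
    n_log_n = n * math.log2(n)  # the base of the logarithm only changes a constant
    return {
        "neural_network_train": n,
        "decision_tree_train": n_log_n * m,
        "decision_forest_train": k * n_log_n * m,
        "decision_forest_test": m * k,
        "logistic_regression_train": n * m,
        "logistic_regression_test": m,
    }


n = 2196678   # training examples, from the text
k = 8         # trees in the decision forest, from the text
m = 10        # number of features: hypothetical value for illustration
terms = complexity_terms(n, m, k)
tree_vs_logistic = (terms["decision_tree_train"]
                    / terms["logistic_regression_train"])
```

For this n, the ratio `tree_vs_logistic` equals log2(n), which is about 21, so under these expressions training a single tree costs roughly twenty times the term of logistic regression.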
Hypothesis testing helps to identify which features contribute more to flight delays in the COVID-19 situations. Figure 6 and Tables 13 and 14 show that the arrival delay is related, in decreasing order of strength, to departure delay, carrier delay, late aircraft delay, weather delay and NAS delay. Its relationship with the other variables is weak or absent.

Relationship / correlation of the arrival delay with related factors.
Strength of the relationship / correlation of the arrival delay with related factors
Correlation with probability of the arrival delay with related factors
Based on the calculations in Tables 13 and 14 and Fig. 6, the following deductions are drawn for the formulated hypotheses.
H1: The departure delay of the flight is directly related to the arrival delay in the COVID-19 situations.
H1 is accepted, as there is a strong relation of 0.94 (94%) between the departure delay of the flight and the arrival delay in the COVID-19 situations.
H2: The air time of the flight has a positive relationship to the arrival delay in the COVID-19 situations.
H2 is rejected, as there is a minute negative relation of –0.0046 between the air time and the arrival delay in the COVID-19 situations.
H3: The carrier delay of the flight is directly related to the arrival delay in the COVID-19 situations.
H3 is accepted, as there is a significant relation of 0.53 (53%) between the carrier delay of the flight and the arrival delay in the COVID-19 situations.
H4: The weather delay of the flight has a positive relationship to the arrival delay in the COVID-19 situations.
H4 is accepted, as there is a significant positive relation of 0.21 (21%) between the weather delay and the arrival delay in the COVID-19 situations.
H5: The NAS delay of the flight is directly related to the arrival delay in the COVID-19 situations.
H5 is accepted, as there is a positive relation between the NAS delay and the arrival delay, with a strength of 0.14 (14%).
H6: The security delay of the flight has a positive relation to the arrival delay in the COVID-19 situations.
H6 is rejected, as the positive relation between the security delay and the arrival delay is weak, with a strength of 0.016.
H7: The late aircraft delay of the flight is positively related to the arrival delay in the COVID-19 situations.
H7 is accepted, as the late aircraft delay and the arrival delay in the COVID-19 situations are significantly positively related, with a strength of 0.35 (35%).
From Figure 6 and Tables 13 and 14 it is clear that the arrival delay is related to departure delay, carrier delay, late aircraft delay, weather delay and NAS delay. The strengths of these relations are 94%, 53%, 35%, 21% and 14%, respectively. There is a weak negative relation between the air time and the arrival delay, so the two are almost unrelated. Security delay and arrival delay are also almost unrelated, with a strength of relationship of 1%.
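The strength of such a relationship can be measured with a correlation coefficient. The sketch below assumes the Pearson coefficient, which is not named explicitly in this section, and uses small hypothetical delay values in minutes rather than the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical delays in minutes for five flights.
departure_delay = [0, 5, 10, 30, 60]
arrival_delay = [2, 4, 12, 28, 65]
r_departure = pearson(departure_delay, arrival_delay)
```

A coefficient close to 1 indicates a strong direct relation, a value close to 0 indicates that the two quantities are almost unrelated, and a negative value indicates an inverse relation.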
Finally, the proposed model is compared with the models presented in the existing literature. The model is compared for MAE and RMSE with the studies of Alla et al. [36], Zoutendijk and Mitici [39] and Yu et al. [21], and for accuracy, precision and recall with the studies of Yu et al. [21] and Bandyopadhyah et al. [40]. Table 15 shows that the proposed Neural Network Regression, Decision Forest Regression and Boosted Decision Tree Regression each have an MAE of 2%, and RMSE values of approximately 11%, 12% and 11%, respectively. Table 15 also shows that the proposed techniques are better than those of Alla et al. [36], Zoutendijk and Mitici [39] and Yu et al. [21] in terms of MAE and RMSE. Furthermore, the Multi Class Logistic Regression is better in accuracy, precision and recall than the models of Yu et al. [21] and Bandyopadhyah et al. [40]. Figure 6 and Tables 13 and 14 illustrate the relationship of the arrival delay with departure delay, carrier delay, late aircraft delay, weather delay and NAS delay, with strengths of 94%, 53%, 35%, 21% and 14%, respectively. A weak negative relation, almost none, exists between the air time and the arrival delay. Security delay and arrival delay are almost unrelated, with a strength of relationship of 1%.
Comparisons of the proposed methods with literature
The proposed research can be extended in many directions. It can be extended to predict flight delays in other disrupted situations. AI-based flight delay prediction can be implemented for situations such as floods, snowfall, earthquakes, winds, hurricanes, blizzards and agitated weather conditions. Moreover, various air transport issues can be addressed using the techniques of artificial intelligence and machine learning, for example airline revenue maximization, security checking, on-time operations, air traffic prediction, flight-to-gate assignment, crew scheduling, fleet scheduling and flight operations.
Artificial intelligence techniques, namely neural network regression, decision forest regression, boosted decision tree regression and multi class logistic regression, are used for airline flight delay prediction. The performances of all the methods are satisfactory. The mean absolute error of both the boosted decision tree regression and the decision forest regression is 0.02697. The coefficient of determination is 0.86 for the neural network regression, 0.85 for the decision forest regression and 0.870984 for the boosted decision tree, which means that 87% of the variation in the arrival delay is explained by the model. Thus the boosted decision tree regression performs better than the other two regression methods. Multi class logistic regression gives an overall accuracy and precision of 98.4%, with a recall of 99%. Thus multi class logistic regression is the best model for predicting flight delays in the COVID-19 situations. The confusion matrix for the multi class logistic regression shows that 87.2% of the flights that are not delayed are predicted as not delayed by the proposed model, while 12.7% of the flights that are actually not delayed are wrongly predicted as delayed. The proposed Neural Network Regression, Decision Forest Regression and Boosted Decision Tree Regression each have an MAE of 2%, and RMSE values of approximately 11%, 12% and 11%, respectively. The strengths of the relation of the arrival delay with departure delay, carrier delay, late aircraft delay, weather delay and NAS delay are 94%, 53%, 35%, 21% and 14%, respectively. There is a weak negative relation, almost none, between the air time and the arrival delay. Security delay and arrival delay are almost unrelated, with a strength of relationship of 1%. The research can be extended in many directions in the future.
Possible future work includes flight delay prediction in agitated weather conditions, for example heavy snow, floods, hurricanes and blizzards, and in naturally disrupted situations. New methods can be applied to explore the dynamics of the system. The model can be extended to coordinated delays. Future models can address unforeseen situations such as storms, strikes and security breaches. Alternative flight plans should be worked out for future scheduling under disruptions.
Acknowledgments
We thank the respected reviewers and editors for their reviews and suggestions.
