Abstract
People around the world have experienced fundamental transformations during mass events. The Industrial Revolution, World War II, and the fall of the Berlin Wall are some of the events that have caused radical societal changes. COVID-19 has likewise been a mass experience for society. Determining the impact the pandemic has had on society shows that it is facilitating the transition to the so-called new normal. Istanbul is a multi-identity city where 16 million people have intensely experienced the pandemic’s impact. When the identities of cities around the world are examined, different city structures are seen to provide different data sets. This study models machine learning algorithms suited to the data set compiled for the 39 districts of Istanbul, which comprises 82 features. The aim of the study is to identify the changing societal trends during the COVID-19 pandemic using machine learning techniques. Thus, this work contributes to the literature and to practice in terms of redesigning cities for the post-COVID-19 period. Another contribution is that the proposed methodology provides clues on what people in cities consider important during a pandemic.
Introduction
All smart city practices aim to increase human welfare; however, public expectations of smart city practices differ from country to country. Different geographic locations have different needs; for instance, the need for clean water supply in African cities is not the same as in European cities because the European region has many sources of clean water. Similarly, people around the world have experienced the COVID-19 pandemic differently, although some regions have faced the same difficulties [44]. The measures and restrictions implemented worldwide have been diverse. Thus, the extent of closures, vaccination rates, and number of positive cases differ from country to country. Managing the pandemic has been difficult, especially in city centers with dense populations. These different experiences have led to different behavioral changes among the citizens of each city. Many studies have used multi-criteria decision-making approaches and expert opinions to detect these trends [6,9,13]. Recent developments in metropolitan cities require nimble city services, infrastructure, and communication networks that are managed in harmony at the metropolitan level [6]. Modern cities are complex systems made up of inhabitants, businesses, various modes of transport, huge communication networks, utilities, and public services [29]. For a metropolis, challenges arise from increases in the urban population and involve traffic density, air pollution, waste management, and inadequate infrastructural quality [40]. Managing a complex city entails addressing social inequality, human health, social balance, and social management in the region [21]. Considering all these dynamics, governments should manage complex cities innovatively [5].
Because experts had not previously experienced conditions such as COVID-19, the literature has a large gap on how to manage complex cities post-pandemic. Therefore, this paper addresses this gap by using machine learning techniques to quickly determine people’s behaviors and tendencies. The priorities and choices of the people living in Istanbul’s districts have changed profoundly throughout COVID-19. These changes allow Istanbulites’ needs to be determined for smart city applications. This paper proposes a model that shows the effects COVID-19 has had on public life in one specific metropolis. In addition, the proposed model helps administrators respond quickly to the dynamic changes occurring in cities. Finally, this paper tries to determine urban livability levels with respect to necessary conditions.
In this study, four machine learning algorithms (LR, BRR, RFR, and XGB) are preferred because the alternative approaches, such as neural network techniques (e.g., Recurrent Neural Networks (RNN) and Long Short-Term Memory networks (LSTM)), focus groups, and surveys, have limitations. For instance, since neural networks operate largely as closed systems, it is difficult to interpret the significance of features. In the LR, BRR, RFR, and XGB models, feature interpretation is relatively easy. In survey studies, meaningful outcomes require both pre- and post-disaster surveys, which is not possible because disasters such as COVID-19 cannot be predicted. The focus group method has a similar problem. Good research of this kind would require a survey or focus group study with the same population and the same questions before and during the pandemic. In this study, machine learning is instead used to detect a mass trend. Another reason is that the scope of such methods is limited, as they are generally applied to a small segment of the population. In contrast, this study uses secondary data. Secondary data, i.e., governmental data that cover all or a very high percentage of society, bring higher reliability to the system. This study also shows that Istanbul has an unusual feature compared to other European cities. Considering the isolated life during the pandemic, the city is not only a region where about 16 million people share their lives, but also the only region in Europe with such a high density due to its geographical structure. To compare multiple cities, the same features must be collected from each city for both the pre- and post-pandemic periods. Thus, collecting the data is difficult even though the method itself is very economical and fast.
The variables of education, economy, health, work, social life, environment, and infrastructure influence society’s housing preferences. Proximity to work, social life, transportation, and educational opportunities all play important roles in people’s location choices [46]. As a result, rental and purchase prices may be higher in city centers than in other regions.
The habitat choice of people who experienced the pandemic in a crowded city has changed.
This study seeks to answer the question of “how the preferences of people who live in a metropolis and experience a pandemic have changed”. Specifically, it aims to determine the changes in a society that has experienced a worldwide pandemic. For this purpose, a machine learning-based approach is developed. The approach is able to show how the residents of a metropolis react to a pandemic such as COVID-19. To the best of our knowledge, no study has comprehensively investigated this research question. The rest of the paper is organized as follows: Section 2 presents the relevant literature review. Section 3 describes the method used in the study. In Section 4, a real case application is conducted and assessed. Section 5 interprets the results of the evaluations. Section 6 compares the results of the study to similar ones in the recent literature.
Literature review
The increasing demand for urban living has caused cities’ populations to increase. Thus, cities’ limited resources cause problems for the people living in them, as seen in China’s major cities [24]. Sustainable solutions to these problems are indispensable for smart cities, which are living spaces whose citizens have the highest quality of life and economic prosperity [48]. The concept of smart cities is based on sustainability principles, and smart cities are intended to continue well into the future. In this case, new city services create a new way of doing business by integrating with information technologies. A smart city provides its residents with safe, sophisticated, and environmentally friendly services [35]. The smart city is an ultramodern urban space that meets the needs of businesses, institutions, and citizens in particular [20]. Moreover, smart cities must identify city residents’ dynamically changing needs and respond to changing conditions.
Using multi-criteria decision-making, Onnom et al.’s study classified urban people’s expectations under nine different categories: environment, recreation, safety, health, economy, transportation, public utility, population density, and education [30]. Each category has different priorities within people’s life choices. Serious threats and changes have also occurred in people’s lives as a result of COVID-19 [18]. Smart cities with better characteristics will create opportunities to increase the city’s resilience against events such as pandemics [36].
A smart city equipped with sensors generates large data sets. Machine learning and artificial intelligence techniques have provided successful results from these datasets [10]. The big data produced by smart cities comes in a variety of sizes and types. Therefore, smart solutions are required for standardizing the data [37]. Deep learning applications have provided early answers to the dynamic structures of cities [27].
Many parameters involved in machine learning must be optimized to produce a suitable model over complex datasets. Models with higher reliability are produced using optimal parameters [14]. Candidate parameter values are explored using approaches such as random and more complex searches among the parameters, after which the performance outputs are evaluated for each parameter setting in the model. Error measures such as maximum error (ME), mean absolute error (reported in scikit-learn as the negative mean absolute error, NMAE), and root mean squared error (RMSE) should be minimized when working on regression data, whereas explained variance (EV) should be as high as possible. In addition, the magnitude of the R-squared score is also an important measurement parameter [4]. The scikit-learn library provides reliable metric measurements for determining the relationship between outputs and inputs [12]. The ideal model should first be selected for the existing dataset, then the ideal parameters should be determined. Four different models are used in our study. These models are frequently used in hyperparameter applications. Zheng et al. used a linear regression model to determine high performance parameters for cyber-enabled [47]. Sun et al. made the best estimation in their study by comparing Bayes Ridge Linear and Random Forest Regression to determine the parameters that cause landslides [39]. AlThuwaynee et al. used the Extreme Gradient Boosting (XGB) Regressor and RFR models together in their index study [1]. In another study, Yuchi et al. found successful results for selecting features using the Random Forest Regressor (RFR), a machine learning tool [45]. Our study uses four different machine learning techniques to statistically estimate Istanbulites’ changing priorities according to their performances prior to and during COVID-19. These four models are chosen both because they have given successful results in indexing studies in the literature and so that they can be compared in our study.
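The regression measures named in this paragraph can be made concrete with a short sketch. It uses the textbook definitions of each measure and only the Python standard library; the function name `regression_metrics` and the sample numbers are illustrative assumptions, not values from the study. The mean absolute error is returned with a negative sign to mirror the "negative MAE" convention, in which values closer to zero are better.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return common regression measures for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n

    max_error = max(abs(r) for r in residuals)
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot

    # Explained variance: 1 - Var(residuals) / Var(y_true).
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    var_true = ss_tot / n
    ev = 1.0 - var_res / var_true

    return {"ME": max_error, "NMAE": -mae, "RMSE": rmse, "R2": r2, "EV": ev}


# Hypothetical observed and predicted values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
metrics = regression_metrics(y_true, y_pred)
```

Explained variance and R-squared differ only when the residuals have a non-zero mean, which is why the two values are close but not identical in this example.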
This study will enable smart city applications to identify the needs of a city and will help city managers make quick decisions in times of crisis such as COVID-19. The absence of a similar study for the COVID-19 period in the literature led us to conduct this study. The proposed framework provides meaningful results not only for COVID-19 but also for other disasters such as global warming, earthquakes, floods, cyber attacks, and droughts [19,38]. The same logic may be applied to measuring parameters before and during a disease and determining the changes in these parameters with high accuracy. Further, the effects of these changes on the human body can be predicted successfully [33]. One of this study’s innovative aspects is that it shows such effects on society. Community managers can easily use the proposed approach to redesign more sustainable and robust cities.
Problem statement and methodology
This study uses and compares four different forecasting models based on their performance values.
Linear regression (LR)
The linear regression equation is provided in Equation (1). In the equation,
Bayesian ridge regression (BRR)
Bayesian Ridge Regression is also used in hyperparameter studies and is shown in Equation (2). The α and λ parameters in this equation are estimated together by maximizing the log marginal likelihood, with gamma distributions as their priors. While
Here, n_iter is the maximum number of iterations and should be greater than or equal to 1. The tol parameter is the stopping criterion; the algorithm stops once the change in the estimated coefficients between iterations falls below tol.
Random forest regressor (RFR)
The RFR algorithm is an ensemble of classification and regression trees (CART) and provides good results for both classification and regression models [16]. The algorithm creates the regression estimate by averaging the results from each decision tree (CART) in the forest. The randomness in the RFR algorithm ensures that the data set assigned to each decision tree is a random sample and that each tree works with a different set of features. This randomness reduces both error variance and error values in the model. Figure 1 shows the k trees built from k bootstrap samples and the different predictions produced for the different feature sets in each tree. CART makes each decision by assigning a feature to each split point of the branches of the k trees. The accuracy of the decisions is evaluated using the training output. A decrease in error values indicates that the correct decision was made.

Random forest regression flowchart (adapted from [32]).
Extreme gradient boosting (XGB)
XGB is a machine learning algorithm introduced by Chen and Guestrin [8]. XGB gives successful results in regression and classification applications. Like RFR, XGB is built on CART decision trees.
η is the shrinkage parameter (i.e., learning rate)
M is the number of decision trees that make up the forest.
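The equation to which these two symbols belong is not reproduced here. As a reminder of where they usually appear, gradient boosting with shrinkage is commonly written as the additive update below, in which each new tree is scaled by the learning rate before being added to the running prediction. The symbols for the running prediction and the m-th tree are assumed notation, not taken from the original.

```latex
\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \eta \, f_m(x_i), \qquad m = 1, \dots, M
```

A smaller η makes each tree contribute less, so more trees (a larger M) are typically needed to reach the same fit.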
Figure 2 provides the pseudocode for the study’s use of RFR. The same code has been used for the other algorithms by replacing only the RFR portion of the algorithm.

The RFR model’s pseudocode.
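The workflow that the pseudocode figure describes can be sketched roughly as follows. This is a minimal illustration that assumes NumPy and scikit-learn are installed; the data are synthetic random numbers with the same shape as the study's table (39 rows, 50 inputs), and the helper name `fit_and_score`, the target formula, and all parameter values are assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (
    explained_variance_score,
    max_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split


def fit_and_score(estimator, X_train, X_test, y_train, y_test):
    """Fit any scikit-learn regressor and score it on both splits."""
    estimator.fit(X_train, y_train)
    scores = {}
    for split, X_part, y_part in (
        ("train", X_train, y_train),
        ("test", X_test, y_test),
    ):
        pred = estimator.predict(X_part)
        scores[split] = {
            "R2": r2_score(y_part, pred),
            "EV": explained_variance_score(y_part, pred),
            "ME": max_error(y_part, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_part, pred))),
        }
    return scores


# Synthetic stand-in data (NOT the study's data): 39 districts x 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(39, 50))
# Hypothetical target driven mostly by the first two features, plus noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=39)

# 70% training / 30% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = fit_and_score(model, X_train, X_test, y_train, y_test)

# Rank features by the forest's impurity-based importances (largest first).
ranking = np.argsort(model.feature_importances_)[::-1]
```

Because `fit_and_score` accepts any regressor, passing `LinearRegression()` or `BayesianRidge()` from `sklearn.linear_model` instead of the forest leaves the rest of the sketch unchanged, which mirrors the idea of replacing only the model portion.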
Another important step in the study is to determine which input features have the greatest effect on the output features. The focus of this study is not only on building a robust model, but also on detecting changes at different points in time.
SHAP (SHapley Additive exPlanations) is a method proposed by Lundberg and Lee to evaluate how features affect the outcomes [25]. Thus, SHAP provides an interpretation of the features in the trained models of complex learning methods. Moreover, the values of SHAP assign predictive importance (
Data acquisition
Istanbul was chosen for the study because its cosmopolitan structure provides more realistic results during the data analysis stage. In other districts of Turkey, people’s organic ties to their hometown are thought to introduce considerable subjectivity into their life choices. Istanbul meets the concept of urbanity in terms of both immigrant and citizen diversity. In our study, we created the dataset under nine main categories: economy, health, transportation, security, population, education, utility, environment, and value output. The first case in Turkey was detected in March 2020, and pandemic control measures began at that time.
Economy A data set on economy was created using OpenStreetMap (
Health The data on the number of health centers, the number of public and private hospitals, and the number of patient beds in each district in Istanbul in 2021 were taken from the Ministry of Health website (
Transportation The data on the Metro (subway), Metrobus, ferry transportation lines, ferry voyages, metro passengers, and journey numbers within the boundaries of the district were taken from the data web page (
Environment The OpenStreetMap and Ministry of Environment and Forestry (
Population District population data for each district were collected for the 0–14, 14–65, and 65+ age groups. Gender ratios and population counts were formed from the data obtained from the Population Directorate (
Education The number of schools, classes, students, and teachers was derived using data from the Istanbul National Education Directorate (
Security The data on fire stations, police stations, and emergency aid centers per district were used as the study’s safety data.
Utility The number of IMM wireless Internet points per district forms the data set for this category.
Value output The first case in Türkiye was detected in March 2020, and pandemic control measures began at that time. The real estate company Endeksa (

Increasing rate of real estate costs per square meter in Istanbul during COVID-19 according to Endeksa/2021.
The intercorrelations shown in Fig. 4 have been created for 82 different features of the 39 districts of Istanbul. The size and direction of the relationship between features are shown in different colors, with red areas representing a positive relationship and blue areas representing a negative relationship between features. A positive relationship exists when the value of one variable increases while the value of the other variable also increases. A negative relationship exists when the value of one variable increases while the value of the other variable decreases.

Correlation chart between features.
The machine learning techniques were selected by taking into consideration the continuous type of the data being used. For feature pairs with intercorrelations greater than 0.8 or less than −0.8, one feature of the pair is excluded from the data set to increase the prediction performance of the LR, BRR, RFR, and XGB models. The means and standard deviations of the results of the RFR and XGB models are shown in Table 1, where each model is run 10 times. According to the ratios in Table 1, the threshold of 0.8 is preferred because it resulted in a lower standard deviation on average. In addition, the results are interpreted after excluding the highly correlated features from the data set. This approach increases the reliability of the results and validates the model by reducing the error value. The number of features in the dataset has been decreased to 52 using this arrangement. For example, a strong positive relationship between the male and female ratios was detected, and thus only one of the two features was kept in the data set. The results are then interpreted accordingly. Housing costs per square meter in Istanbul have been divided into two outputs: pre-COVID-19 and during COVID-19. Thus, our dataset became a new dataset with 50 inputs and two different outputs. The four models were run 20 times in Python using the scikit-learn library, and the highest performance values were gathered. Table 2 gives these performance values and the feature rankings determined by each model for the two cases.
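The correlation-based reduction described in this paragraph can be illustrated with a small standard-library sketch. It computes the Pearson correlation for each pair and keeps a feature only if its absolute correlation with every already-kept feature does not exceed the 0.8 threshold. The rule for which member of a pair is dropped (here, the later one) is an assumption, as are the feature names `ManRatio` and `ParkArea` and all of the numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def drop_correlated(columns, threshold=0.8):
    """Keep features in order, skipping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical district-level values, for illustration only.
columns = {
    "ManRatio": [0.49, 0.50, 0.51, 0.52, 0.48],
    "WomanRatio": [0.51, 0.50, 0.49, 0.48, 0.52],
    "ParkArea": [3.0, 1.0, 4.0, 1.5, 2.0],
}
kept = drop_correlated(columns)
r_gender = pearson(columns["ManRatio"], columns["WomanRatio"])
```

In this toy table the two ratio columns are perfectly (negatively) correlated, so only the first of them survives, while `ParkArea` is kept because it is essentially uncorrelated with it.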
The dataset is divided into two groups (i.e., 30% test and 70% training data), with performance values being measured for both groups. The reason why the data is divided into 30% and 70% as test and train data is that the standard square values of the
RFR and XGB are seen to provide successful results in terms of all performance measures. RFR gave an
Performances of models used in determining features reduction ratio
Performances of models used in determining features’ weights

Comparison of RFR: (a) pre-COVID-19; (b) COVID-19.

Comparison of XGB performance: (c) pre-COVID-19; (d) COVID-19.
Figures 5 and 6 show the regressions for both the test (blue) and training (green) data. The inconsistency between test and training for XGB (Fig. 6(c)) is striking. The explained variance of the RFR model is 0.90. Explained variance describes the share of the variation in the output that is accounted for by the model rather than by error variance [33,42]. The value of 0.90 achieved by the model indicates a strong power of association and thus better predictions [33].

Istanbulites’ value ranges in January 2020 for the RFR.
The PersonPolice feature shows the number of people per police station. Figure 7 shows that the PersonPolice feature has a strong negative correlation with housing preference. A low number of persons per police station indicates that the district has more security measures. For instance, while there is one police station for every 41,018 persons in Beykoz district, this number is 191,479 in Esenyurt district, where immigration is high. Figure 6 shows that housing preference increases as security increases. PersonWaste is the amount of waste per person. It was formed by relating the amount of waste collected by the district municipalities to the population of the district. The consumption of prepared food increases in districts where the numbers of people living alone and of workers are high. The large number of cafes, restaurants, and fast-food establishments increases waste generation. In addition, waste generation increases in districts where people are constantly on the move during the day. A high rate of waste is a factor that negatively affects the real estate value of the district. There is a very strong relationship between the male ratio and the female ratio (correlation ratio 1.0). Due to this strong correlation, the WomanRatio attribute is excluded from the data set. Figure 7 shows that the male ratio has a negative impact on housing demand, while the female ratio has a positive impact.
Figures 7 and 8 show the RFR values for Istanbulites’ life preferences in the pre-COVID-19 and COVID-19 periods, respectively. Distinct changes in the rankings are easily detected. As shown in Fig. 8, SportFacility has a positive impact on outputs. According to the RFR model, districts with more park space per capita positively influenced the residential preferences of 1% of people during the pandemic period.

Istanbulites’ value ranges in January 2021 for the RFR.
This study finds that society’s housing preferences changed during the pandemic period, both according to the results of the model and because real estate prices in Istanbul did not change to the same extent in each district. Istanbulites’ trends as determined for the pandemic period show both negative and positive results. Figure 9 shows the changing trends for the available features. The PersonPolice feature showed a positive change of 2.2% between January 2020 and January 2021. Figure 9 is created by calculating the difference between the feature weights in 2021 and the feature weights in 2020.

The direction of Istanbulites’ value changes due to COVID-19.
Using the best model, estimations were made for each feature, and these indicated that the effect of each feature on the output matched Table 3 exactly. Table 3 lists the changes in the feature weights in order of their absolute values. According to Table 3, the greatest change is seen in the feature of green area per capita. The feature of PersonPolice has a positive relationship with the property values determined for both 2020 and 2021 according to the correlation table in Fig. 4. In other words, waste per person decreases square meter costs. Our study shows a 1.6% decrease in Istanbulites’ preference for green spaces compared to the pre-COVID-19 period. Although this ratio varies for the different models in the study, the change in this feature stands out for each trial.
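The two steps described here, taking the difference between the 2021 and 2020 feature weights and then ordering the changes by absolute value, can be sketched in a few lines. The function name `weight_changes`, the feature name `GreenArea`, and all weight values are hypothetical and chosen only to show the arithmetic; they are not results from the study.

```python
def weight_changes(weights_before, weights_after):
    """Return (feature, change) pairs sorted by absolute change, largest first."""
    changes = {
        name: weights_after[name] - weights_before[name]
        for name in weights_before
        if name in weights_after
    }
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical feature weights (fractions of total importance).
weights_2020 = {"PersonPolice": 0.10, "GreenArea": 0.08, "SportFacility": 0.02}
weights_2021 = {"PersonPolice": 0.13, "GreenArea": 0.06, "SportFacility": 0.03}

ranked = weight_changes(weights_2020, weights_2021)
```

Keeping the signed change while sorting on its absolute value preserves the direction of each trend, so a feature whose weight fell can still rank above one whose weight rose by a smaller amount.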
Due to increased immigration in Istanbul, the population of some districts has increased significantly over the past 20 years (Turkish Statistical Institute [TurkStat]). Owing to the new dense population structure in these districts, the number of police officers per capita has remained quite low, which creates a security gap in these districts [41]. According to the results of the model, demand for housing decreased during the pandemic in the areas where the number of police officers per capita is low and immigration continues. This can be interpreted to mean that the pandemic has led to safety concerns among Istanbulites. According to the model, this trend shows a rate of 2.2% for one year. Demand is declining especially where most of Istanbul’s residents live. While both immigration and new housing construction increased demand in these regions pre-pandemic, this trend seems to have been reversed during the pandemic, as shown in Fig. 3.
The first settlements of Istanbul are the central points of the city, which have relatively small flats that are preferred by young people and active workers. In these regions, the number of employed people, the consumption of fast food, and the amount of waste per person are high compared to other regions. In addition, since these areas are old settlements, there are very few green spaces and parks, which decreased even further, by 1.6%, during the pandemic. With the increase in remote working, demand for real estate in these regions has decreased according to the results of the model. During the pandemic, Istanbulites were put under lockdown, which in turn caused a decrease in physical activity. The demand for districts that include sports facilities increased during the pandemic.
How Istanbulites’ priorities have changed throughout COVID-19
Unsanitary workplaces such as those of the automotive supply, chemical, and leather industries are located in Istanbul’s remote districts, where settlement is sparse, urbanization is still ongoing, and the proportion of forests is high. According to the results of the model, the demand for these regions has increased. This trend shows that people have tended to move away from the city centers during the pandemic. A similar trend is observed among those who live near Metrobus stations. Bus rapid transit (BRT) accounts for 13% of public transportation in Istanbul. In 2019, BRT carried an average of 2,059,151 passengers per day [17]. According to Table 3, 0.7% of Istanbul residents tend to move away from BRT stations. As the change rates of the features presented in Table 3 become smaller, the tendency of society becomes more difficult to interpret, because the error of the model may cause the sign of the trend (positive or negative) to change.
In general, features such as the number of theaters or the number of banks are higher in downtown districts than in other districts. A negative trend in these features can be observed in Table 3. The number of features in the study is reduced from 80 to 50. Having 50 features for 39 districts simplified the model. In other words, a simplified set of features made the results easier to interpret. Furthermore, including two features with a very strong relationship (correlation above 0.8) but with different weights in the model makes the results difficult to interpret [11]. It is unusual for the number of features to be higher than the number of districts. However, the consistency of the training and test values is the strength of the model [22].
Unpredictable trends in the city may lead to unpredictable migration, which can cause infrastructural and sociological problems, especially for regions that are not prepared for immigration. Unwanted population mobility can be stopped only if managers know the reason for the mobility. For instance, the number of police officers per capita (given in the model’s results) can be balanced through the necessary planning, which in turn may partially prevent potential human mobility. In addition, anticipating potential mobility to the business centers located outside of the city may help managers prioritize infrastructure improvements in those regions. Creating smart cities requires the collaboration of several disciplines [2]. Therefore, it may be insufficient to interpret the results of the model using only one discipline.
The digital opportunities that society has newly encountered during the COVID-19 period and the increase in online orders have changed the preference values of city centers. Many situations that we describe as the new normal will continue after COVID-19. Therefore, these may become permanent in people’s value priorities. When considering these factors, the smart city index, life index, and sustainability index will need to be updated for the new normal. This study shows how the weights of certain features have changed in Istanbul. Variations in the preference values of approximately 16 million Istanbulites may reflect similar results for the pandemic conditions experienced by the whole world. However, each city has its own geopolitical, cultural, and social dynamics, so these variations cannot be expected to be the same for other cities.
Mujahed’s study in Amman, Jordan examined changes in citizens’ living habits during COVID-19 and observed a decrease in green space usage preferences, while the quarantine and virus panic during COVID-19 made staying at home effectively mandatory for people [28]. The low number of COVID-19 cases outside of cities with green spaces led to this change in their study. The security needs of Istanbul’s residents also changed significantly in this study. Chen et al.’s study conducted during COVID-19 in China determined that districts with high mobility displayed high rates of COVID-19 spread [7]. The green rate is low in districts with high mobility in Istanbul. As in Chen et al.’s study, the preference for green areas thus also caused a decrease in the preference weight of places with high mobility in our study.
Onnom et al.’s study conducted in pre-COVID-19 Thailand showed the sustainability value in the city to have been weighted as 5.553 for environment, 4.817 for safety, 3.168 for health, 2.362 for transportation, 1.926 for public utility, and 1.663 for education [30]. Although these values parallel those in Table 2 (pre-COVID) from our study, the weights for the factors of health and environment regarding the sustainability value in post-COVID-19 Thailand need to be updated. Saez et al. noted that each city has different dynamics in terms of sustainability, which makes standardization difficult [34].
Proposal for further studies: Our study determined people’s tendencies using a statistical method and will help future studies determine country-specific dynamics. Because expert opinions reflect current country conditions, the attributes are likely to have different significance weights for each country. Because expert judgment lacks high reliability in sudden events such as pandemics, our study has avoided subjective values by using statistical methods.
The proposed framework provides meaningful results not only for COVID-19 but also for other disasters such as global warming, earthquakes, floods, cyber-attacks, and droughts [19,39]. The same logic may be applied to measuring parameters before and during a disease and determining the changes in these parameters with high accuracy. One of this study’s innovative aspects is that it shows such effects on society. Community managers can easily use the proposed approach to redesign more sustainable and robust cities. As the structures of cities change dynamically, people’s expectations may change as well. Changes and trends in society’s priorities are elements that smart cities should detect. To observe these changes, the same survey would need to be conducted pre- and post-disaster. However, this may not be viable or sustainable because of the shortage of experts, as experienced during COVID-19. Determining the social orientation provides very important information for keeping the process under control. Otherwise, sudden events may cause chaos in society, and this chaos may have profound and irreparable consequences.
Limitations
Some features in this study have been observed to not belong to a single category. For example, whether the number of cafes in a district should be considered a social or economic feature is unclear. As such, the category to which the weights of such features belong is subjective. Therefore, statistical approaches and larger datasets, beyond the expert opinions used in city index studies, will increase the reliability of studies. A comparison of this study with similar cities in other cultures could expand its scope.
Acknowledgement
The authors would like to acknowledge that this paper is submitted in partial fulfilment of the requirements for the PhD degree at Yildiz Technical University. This study is produced within the scope of Kenan Mengüç’s PhD dissertation.
Conflict of interest
None to report.
