Abstract

The completion progress of residential development projects and the length of construction are frequently discussed in the construction industry, but rarely studied by urban modellers. Nonetheless, a realistic reflection of housing supply processes is important for urban microsimulation and land use modelling. To predict the dwelling units generated over space and time, this paper decomposes the housing supply process into two major components: housing starts and completions, the nature and modelling logic of which are quite different. This paper deals with the latter component, aiming to answer the question: how long will it take to complete construction of new dwellings? A Cox Proportional Hazard (CPH) model is employed to examine the “survival” rate of residential building projects and the probabilistic distribution of construction periods. Narrowing down the scope of research, this study investigates housing completions at the individual project level, and discusses the impact of structure type, surrounding land use, and accessibility on the housing completion rate. The Cities of Toronto, Hamilton, and Brampton in the Greater Toronto and Hamilton Area (GTHA) were selected for the empirical study, with each representing a different type of urban form to test model compatibility. The hazard models show good performance in replicating completion rates, and the impact of each factor on the hazard ratio indicates that single detached dwelling units with relatively homogeneous land use have the shortest completion time. This study could provide one component of a comprehensive framework for modelling housing supply, especially in urban microsimulation systems.

Keywords

housing supply, housing completions, Cox proportional hazard model, construction period, residential construction duration

Introduction

Two major components of the residential housing supply process are the decision of when to start a housing development project and the length of time it takes to complete construction after obtaining building permits from the government. These two processes determine the rate at which new housing comes onto the market, yet they are often modelled in a relatively simplistic manner. Construction time, in particular, is often not explicitly modelled at all. This paper addresses this gap in the literature by developing a model of housing construction durations for selected municipalities within the Greater Toronto and Hamilton Area (GTHA). In this paper, the length of the construction period is regarded as a random variable with a probabilistic distribution. Assuming the same average management ability across GTHA builders, this paper’s hypothesis is that the completion time for a housing project is similar, to some extent, to the survival time of patients or the service time of a working machine. Thus, a Cox Proportional Hazard (CPH) model might be employed to model the housing completion process.

Previous studies reveal that construction duration relates to the project size, housing demand variation, construction cost, and project management of contractors (Burrows et al., 2005; Kaka and Price, 1991; Kumaraswamy and Chan, 1995; Koo et al., 2010), which differ from the factors considered by developers when deciding when to start a project. In this study, the time to completion is represented as the “survival time” of the construction project, a random variable with a probabilistic distribution. The purpose of this paper is to predict the housing construction period and simulate the housing supply process within a broader urban microsimulation context. In particular, the modelling of housing completions and housing starts is closely connected to the Integrated Land use, Transportation, and Environment (ILUTE) model system (Miller and Salvini, 1998, 2001; Rosenfield et al., 2013; Salvini and Miller, 2005), aiming to provide one component at a micro level within the overall modelling structure of ILUTE.

The paper is organized as follows. Section 2 reviews the literature on construction duration studies and modelling techniques, and Section 3 discusses the modelling logic and framework for the residential construction process. Section 4 details the methodology of this study, the basic concepts of the CPH model, and its application to construction completion modelling. Section 5 describes the residential construction market in the GTHA and introduces the data employed in the model. Section 6 presents the CPH modelling results for three selected cities within the GTHA (Toronto, Brampton, and Hamilton), both in aggregate and disaggregated by dwelling type, and compares their prediction performance. Finally, Section 7 summarizes the findings of the study and discusses its limitations and future research directions.

Literature review

Construction duration and residential development processes have been widely discussed in the construction project management field. Previous research on residential construction duration indicates that factors affecting residential building progress mostly derive from three aspects: project cost and type of workload involved, locational features, and team management and project scheduling. Project cost is recognized as the best indicator of construction duration (Bashford et al., 2005; Choudhury and Rajan, 2003; Stoy et al., 2007; Walker and Vines, 2000) as it reflects the project size and quality of the work (Qiao et al., 2019). Mačková and Bašková (2014) use only project cost in a simple regression to estimate the construction duration. Bayram (2017) found that this works better than using the amount of work. Though management-related factors and contractors’ ability are found to influence project durations (Al-Momani, 2000; Aalen, 1989; Durdyev et al., 2017; Odeh and Battaineh, 2002), these are difficult to assess quantitatively and include in the model. Beyond the factors mentioned, other potential factors include finances, availability of resources, site conditions, site access, facility design type, and project complexity (Martin et al., 2006; Qiao et al., 2019).

Researchers have attempted to develop models to predict construction duration for better project planning and risk management. Multivariate regression, temporal autoregressive models, and machine learning algorithms have been applied, but hazard models have seldom been considered. Multivariate regression is the most commonly applied method in previous studies (Burrows et al., 2005; Chan and Kumaraswamy, 1999; Mackova et al., 2017; Stoy et al., 2007). Koo et al. (2010) applied a hybrid method of multivariate analysis, case-based reasoning, artificial neural networks, and Monte-Carlo simulation to predict construction duration and achieved a prediction accuracy of 93.3%. Anysz and Buczkowski (2019) developed software to predict construction duration and to model which combinations of project properties can cause significant delay. Although the hybrid method achieved high accuracy and simulation-based methods have been explored for construction duration, such approaches could be overly burdensome within an urban microsimulation system. The only research that applied a hazard model is by Qiao et al. (2019), who estimated the construction duration of highway projects by type. The methods above can identify the major factors that cause delays; however, a large number of factors affect project duration, and this inherent variability leads to a large variance in project duration. Probabilistic methods introduce stochastic elements into the model and give a probabilistic description of the duration instead of modelling an exact outcome (Greene and Hensher, 2010; Irfan et al., 2011; Qiao et al., 2019). Construction duration, or the time to completion, is by nature probabilistically distributed, and a hazard model is suitable for analyzing this “time to occurrence” type of issue.

Built on the review of variables and methodologies applied in previous studies, this study applies a hazard model for modelling the construction duration. The major contribution of this study is that instead of a simplified treatment of housing supply in previous urban models, this study investigates the housing supply process in a comprehensive two-stage framework (housing starts and housing completion), and applies a survival model to predict residential construction duration within an urban microsimulation system.

The logic of modelling housing supply in urban microsimulation systems

The housing market module of ILUTE involves various processes including residential mobility decisions, housing supply, auctioning, and market clearing (Rosenfield et al., 2013). The market entry, property valuation, location choice, and auctioning modules are well structured, while the supply module only considers zoning, interest rate, type, size, location, and value of the new dwelling units. More detailed and realistic modelling is required for the supply side of the ILUTE model system.

Residential supply involves several decisions and actions of agents: the decision of developers to start a project, choices of dwelling type, location and number of dwelling units to build in the project, and the construction work of builders to complete the project. The construction process is complex, involving many detailed steps that are relatively fixed and routine, the details of which might not be of critical interest for urban microsimulation models such as ILUTE. Thus, the challenge in such models is to capture key explanatory variables consistent with the level of detail in such model systems that provide a robust, systematic prediction of construction times.

Following Farooq and Hurtubia’s (2011) framework for urban built-space modelling, the two-stage modelling structure of housing supply used in this study is proposed in Figure 1, with this paper focusing on housing completion (phase two). As shown in Figure 1, from vacant land to new housing supply, developers need to decide when and where to start the project and its size (number of units), and then the construction activity of builders determines when the project will be completed and when the new housing units will be available on the market.

Figure 1.

Structure of housing supply modelling with two phases.

In phase two in Figure 1, the model system needs to determine the new housing supply available within the market, which is achieved sequentially by modelling the construction duration of the started projects. The total construction period may vary with the structure type and market conditions. Builders rely heavily on subcontractors to do the site work, and cooperate with each other locally, with each subcontractor specializing in a specific aspect of construction. Therefore, the construction length is also influenced by the labour rate, the availability of specialized labour, and material cost (Buzzelli and Harris, 2003; Farooq et al., 2010). From the perspective of stock-flow theories, the state change from under construction to completed should follow a state-changing rate over time. Given, however, the multiplicity of factors influencing a project’s completion time (many of which are inevitably unknown within any land use modelling exercise), it is reasonable to model construction durations as a stochastic process. Empirical evidence shows that, indeed, construction durations generally follow a probabilistic distribution (Greene and Hensher, 2010; Irfan et al., 2011), such as an exponential, Weibull, or Gamma distribution. Given this, a survival or hazard model would appear to be well suited to model the distribution of housing completions.

Methodology: Survival analysis of construction duration

Survival analysis examines and models the time it takes for events to occur (Fox and Weisberg, 2002; Schoenfeld, 1982). The analysis originated in the medical and engineering fields, where the typical events analyzed are human deaths or the end of a machine’s working life, from which the term “survival analysis” derives. From these origins, survival analysis has seen much broader application in other fields in recent decades. Following Fox and Weisberg (2002), we reiterate the important concepts of survival analysis to build the discussion context, and explain the Cox Proportional Hazard (CPH) regression in detail below.

Survival analysis

The major objectives of survival/hazard analysis are estimating the distribution of survival times, predicting the hazard probability at certain time points, and examining the relationship between survival and its predictors or covariates. The project duration is the length of time between project starts and completions, which is considered as a survival time. Taking $T$ as the random variable representing the survival time or construction duration, we define the cumulative distribution function as $P (t) = \Pr (T \leq t)$ and probability density function as $p (t) = \frac{d P (t)}{d t}$ . Then the survival function is $S (t) = \Pr (T > t) = 1 - P (t)$ . The hazard function, which assesses the instantaneous risk of “death” (i.e., change of state) at time $t$ , is

$$ h (t) = \lim_{\Delta t \to 0} \frac{\Pr [(t \leq T \leq t + \Delta t) \mid T \geq t]}{\Delta t} \tag{1} $$

The functional form of the hazard model can be derived from different assumptions concerning the distribution of survival time. For instance, if the survival times are assumed to follow an exponential distribution, then the density function is $p (t) = v e^{- v t}$ and the hazard rate is constant, $h (t) = v$ . If the survival time is assumed to be Weibull distributed, then the hazard model is $\log (h (t)) = v + \rho \log (t)$ ; with a Gompertz distribution, the hazard model is $\log (h (t)) = v + \rho t$ .
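
As a worked step connecting these definitions, the hazard can be expressed through the density and survival functions defined above. This is a standard identity for continuous survival times, added here as an illustration rather than taken from the original derivation; it makes the constant hazard of the exponential case explicit:

$$ h (t) = \frac{p (t)}{S (t)} = - \frac{d}{d t} \log S (t), \qquad \text{so for } p (t) = v e^{- v t}: \quad S (t) = e^{- v t}, \quad h (t) = \frac{v e^{- v t}}{e^{- v t}} = v $$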

Kaplan–Meier estimation and log rank test

The Kaplan–Meier estimator is a non-parametric statistic that allows us to estimate the survival function (Kaplan and Meier, 1958). This statistic gives the probability that an individual patient will survive past a particular time $t$ . At $t = 0$ , the Kaplan–Meier estimator is 1, and as $t$ approaches infinity, the estimator goes to 0. In theory, with an infinitely large dataset and $t$ measured to the second, the corresponding function of t against survival probability is smooth. The KM survival curve, a plot of the KM survival probability against time, provides a useful summary of the data that can be used to estimate measures such as median survival time.
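
The product-limit calculation behind the KM curve can be sketched in a few lines. The following is a minimal illustration in plain Python, not the implementation used in this study; the function name `kaplan_meier` and the toy durations are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: observed time for each project (e.g. months)
    events:    1 if completion was observed, 0 if censored
    """
    completions = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # every project leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = completions.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of the risk set that "survives" t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]      # completed and censored cases both drop out
    return curve

# Toy data (hypothetical): durations in months, 0 marks a censored project
toy_durations = [6, 8, 8, 12, 12, 15, 20]
toy_events = [1, 1, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(toy_durations, toy_events)
```

Censored projects contribute to the risk set up to their last observed time but never trigger a drop in the curve, which is why the estimator can use incomplete projects without biasing the survival estimate downward.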

The log-rank test is a large-sample chi-square test that uses as its criterion a statistic that provides an overall comparison of the KM curves being compared (Kleinbaum and Klein, 2012). This statistic makes use of observed versus expected cell counts for the log-rank statistic defined by each of the ordered failure times for the entire set of data being analyzed. Thus, using the log-rank test, we can identify whether different categories of observations have significantly different KM curves, which can provide a reference for the Cox Proportional Hazard model as well.
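
The observed-versus-expected logic of the log-rank test can be illustrated for the two-group case. This is a simplified sketch in plain Python under the usual hypergeometric variance; the function name `logrank_test` and its list-based inputs are assumptions for illustration, and a statistical package would normally be used in practice.

```python
import math

def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) on 1 df."""
    all_durations = durations_a + durations_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_durations, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)   # group A still at risk at t
        n_b = sum(1 for d in durations_b if d >= t)   # group B still at risk at t
        d_a = sum(1 for d, e in zip(durations_a, events_a) if e and d == t)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        # Observed events in A minus the count expected if both groups share one hazard
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail with 1 df
    return chi2, p_value
```

For example, `logrank_test([2, 3, 4], [1, 1, 1], [10, 11, 12], [1, 1, 1])` compares a fast-completing group against a slow one, while two identical groups give a statistic of zero.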

Cox proportional hazard model

Kaplan–Meier curves and log-rank tests are useful only when using categorical explanatory variables, but not for the quantitative predictors. An alternative method is the Cox Proportional Hazard (CPH) regression analysis, which works for both quantitative and categorical predictor variables (Bradburn et al., 2003; David, 1972).

The purpose of the CPH model is to evaluate simultaneously the effect of several factors on survival (Bradburn et al., 2003). Using the CPH model, it is possible to examine the relationship between the covariates and the survival rate. Most importantly, this examination enables the specification of a linear-like model for the log hazard. For example, a parametric model based on the exponential distribution may be written as

$$ \log (h (t_{i})) = \alpha + \beta_{1} x_{i 1} + \beta_{2} x_{i 2} + \dots + \beta_{k} x_{i k} \tag{3} $$

$$ h (t_{i}) = h_{0} \times \exp (\beta_{1} x_{i 1} + \beta_{2} x_{i 2} + \dots + \beta_{k} x_{i k}) \tag{4} $$

where the coefficients measure the impact of covariates, and the term $h_{0}$ is called the baseline hazard. It corresponds to the value of the hazard if all the $x_{i}$ are equal to 0. The notation $h (t_{i})$ indicates that the hazard may vary over time. There are similar parametric regression models based on other survival distributions.

The Cox model, in contrast, leaves the baseline hazard function $\alpha (t) = \log (h_{0} (t))$ unspecified:

$$ \log (h (t_{i})) = \alpha (t_{i}) + \beta_{1} x_{i 1} + \beta_{2} x_{i 2} + \dots + \beta_{k} x_{i k} \tag{5} $$

$$ h (t_{i}) = h_{0} (t_{i}) \times \exp (\beta_{1} x_{i 1} + \beta_{2} x_{i 2} + \dots + \beta_{k} x_{i k}) \tag{6} $$

This model is semi-parametric because, while the baseline hazard can take any form, the covariates enter the model linearly. The hazard ratio for any two cases at time $t_{i}$ is independent of the baseline hazard $h_{0} (t_{i})$ . The coefficients of each covariate do not vary over time either. Consequently, the Cox model is a proportional-hazards model (Bradburn et al., 2003).
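
The independence from the baseline hazard follows directly from the multiplicative form of the model. As a short worked step, writing $h_{i} (t)$ and $h_{j} (t)$ for the hazards of two projects $i$ and $j$ at the same time $t$ (a notation introduced here only for illustration), the baseline term cancels in the ratio:

$$ \frac{h_{i} (t)}{h_{j} (t)} = \frac{h_{0} (t) \exp (\beta_{1} x_{i 1} + \dots + \beta_{k} x_{i k})}{h_{0} (t) \exp (\beta_{1} x_{j 1} + \dots + \beta_{k} x_{j k})} = \exp \left( \sum_{m = 1}^{k} \beta_{m} (x_{i m} - x_{j m}) \right) $$

The right-hand side contains no $t$, so the ratio is constant over time, which is the proportional-hazards property.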

In this paper, the CPH model is applied to estimate the duration of residential construction, or the “survival length” of residential construction projects. Factors covering structure type, land use, magnitude of the construction, location, and labour costs are included in the model, drawing on the broad literature on construction process management.

Empirical study

Study area

The empirical study area for this paper is the Greater Toronto and Hamilton Area (GTHA), where considerable residential development has occurred in the past few decades. Like other urban metropolitan regions in North America, the GTHA developed a strong urban core within the City of Toronto, and gradually sprawled to a wide range of suburban regions, where residential, industrial, and commercial buildings developed at lower density. The old Toronto downtown has more densely built high-rise condo towers, while, due to limited land availability, the detached houses located within the central urban region were mostly built a long time ago, with ongoing renovation occurring. Townhouses and single and semi-detached dwelling units are the major dwelling types at the urban fringe, while some condo apartments have also been built at the major highway intersections. The cities of Hamilton and Brampton are typical sub-centres in the GTHA that interact with the City of Toronto.

Three municipalities in the GTHA have relatively complete housing construction datasets. The City of Toronto is the typical old regional centre with a longer history of investment and economic development. Brampton is a traditionally industrial suburban city, and the City of Hamilton is a smaller regional centre that has increasingly been integrated with the larger GTHA metropolitan region. The demographic and economic features of the three cities are listed in Table 1.

Table 1.

Case study cities’ demographic and socio-economic features.

	Toronto	Hamilton	Brampton
Population (million)	2.73	0.54	0.59
Area (sq.km)	630	1117	266
Private dwellings (million)	1.179	0.223	0.173
Average household income (CAD)	102,721	87,775	98,855
Average dwelling value (CAD)	754,015	430,555	570,344

The number of housing completions by dwelling type is plotted in Figure 2 for the census subdivisions in the GTHA. The curve for each census subdivision shows a relatively stable trend over the past 30 years.² Brampton has the highest number of single detached dwellings completed, especially since 2000, while most apartments completed are located in the City of Toronto. The semi-detached units follow a similar trend to the detached units, declining in recent years due to increasing land prices and limited availability. Townhouse completions, on the other hand, have been rising, especially since 2015, as dwelling units with more compact interior design and affordable prices are generally preferred by the younger generation. In this study, we selected three census subdivisions, Toronto, Hamilton, and Brampton, as the study area, to compare how the CPH model performs when applied to various urban contexts (Table 1).

Figure 2.

The monthly number of housing completions by CSD¹ by dwelling types. Source: CMHC Housing Market Database.

Data

The individual project-level building permit datasets for Toronto, Hamilton, and Brampton were collected to conduct the analysis. The construction length for the entire GTHA was also collected from Canada Mortgage and Housing Corporation (CMHC) and is plotted in Figure 3. The average construction length closely follows a gamma distribution, which is consistent with Hammadi’s (2020) research on housing supply in the City of Toronto. The yearly changes in construction length indicate that the parameters of the CPH model might need to be adjusted over time, and that construction has been taking longer during the past decade. The distribution of construction time indicates that the housing completion rate might well be estimated through a hazard model.

Figure 3.

Distribution of construction duration in the GTHA.

In this study, the censored³ cases are defined as issued building permits whose projects had not been finished by the last observation period (the end of 2020), while the survival time of a project is defined as its construction length, from the building permit issuance date through to the completion date. The time from application to issuance date, that is, the time taken by the administrative department to issue the building permit, is not included in the construction time of this study and is assumed to be constant in the overall housing supply model system, since the issuance time is fairly short and varies little compared to the construction duration. The construction length is computed in months. Data were collected from the urban building permits open database for 2010 to 2020 for the City of Toronto (35,583 records, with 23,180 projects completed and 12,403 under construction) and Brampton (20,112 records, with 18,984 projects completed and 1,128 under construction), and for 2008 to 2020 for Hamilton (15,756 records, with 13,473 projects completed and 2,283 under construction).
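
The definitions of survival time and censoring above translate into a simple record-building step. The sketch below is a hypothetical illustration in Python: the function names, the whole-month rounding rule, and the example dates are assumptions, and the actual permit datasets may store dates and completion status differently.

```python
from datetime import date

OBSERVATION_END = date(2020, 12, 31)  # last observation period stated in the text

def months_between(start, end):
    """Whole calendar months from start to end (an assumed rounding rule)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month has not been fully elapsed
    return max(months, 0)

def to_survival_record(issued, completed=None):
    """Return (duration_months, event) for one building permit.

    event = 1 if completion was observed by OBSERVATION_END, else 0 (censored).
    """
    if completed is not None and completed <= OBSERVATION_END:
        return months_between(issued, completed), 1
    # Still under construction at the end of observation: right-censored
    return months_between(issued, OBSERVATION_END), 0

# Hypothetical permits: one completed project, one still under construction
finished = to_survival_record(date(2019, 3, 15), date(2020, 1, 20))
ongoing = to_survival_record(date(2020, 6, 1))
```

A censored record still carries information: the project is known to have lasted at least that many months, which is what the hazard model uses.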

After referring to existing studies on housing supply and the construction process, and considering data availability, 13 covariates were selected and employed in the CPH model, as described in detail in Table 2. The project-specific variables, such as structure type, area, number of dwelling units created, number of bedrooms, and construction cost, are collected from the city-level building permits open database. The land cost of each residential project is missing from the building permits dataset; land cost might affect the construction duration, as developers would have incentives to shorten the construction period to avoid additional capital cost, although many other factors may intervene to mitigate this effect. The locational variables, namely the distance to the CBD, the population and job density, and the housing stock of the dissemination area (DA) where the project is located, are collected from the census. The monthly construction labour cost data and the number of units under construction in the project’s DA are collected from CMHC, and the land use data for the DA from the University of Toronto’s land use dataset. Multiple datasets were collected and fused into one project-based database, in which each record identifies one construction project and the columns hold its construction length, locational features, project-specific features, land use, and the housing market features of its neighbourhood. This final database is the input to the CPH model.

Table 2.

Variables affecting construction process and housing completion rate.

Variable	Abbreviation	Description	Source
Structure type	Semi-detached, row, apartment	Dummy variables for four dwelling types, single detached as the default	City-level building permits open database
Area	Area	The occupied land area of the project	City-level building permits open database
Dwelling units created	Num_dwelling	Number of units to be built by the construction project	City-level building permits open database
Number of bedrooms	Num_bed	Number of bedrooms included in the construction project	City-level building permits open database
Construction cost	Cost (CAD)	Construction material cost	City-level building permits open database
Population density	Pop_den (per km²)	Population density in the dissemination area (DA)	Canadian census
Job density	Job_den (per km²)	Job density in the DA	Canadian census
Number of housing stock	Housing_stock	Number of housing stock in the DA	Canadian census
Distance to the CBD	DistCBD	Euclidean distance to the CBD of the CSD	Author calculated from ArcGIS
Construction labour price	Labour_rate	The hourly wage of construction labour	Canada Mortgage and Housing Corporation
Number of units under construction	Under_construct	Number of units under construction in the DA	Canada Mortgage and Housing Corporation
Percentage of open area	Popen	Percentage of open area in the DA	University of Toronto land use dataset
Percentage of industrial land	Pind	Percentage of industrial land in the DA/of the project prior to construction	University of Toronto land use dataset

Results

In this section, the KM estimation applied to the samples in City of Toronto is first examined (results included in Supplementary File 1 due to limited space), with the log-rank test results being used to distinguish whether each factor is significantly impacting the construction duration, thereby providing a foundation for variable selection for the CPH model. Three diagnostic tests are then performed on the dataset to check if the model with variables selected could satisfy the assumption for the Cox Proportional Hazard model, with detailed diagnostic results in Supplementary File 2. Next, the CPH model results and application for Hamilton and Brampton are analyzed in detail, with prediction performances compared for the three different urban forms and dwelling types.

CPH model results for Toronto, Hamilton, and Brampton

The model results for Toronto are shown in Table 3. Similar to previous studies, dwelling types, dwelling units created, construction cost, location, and labour availability are found to be jointly significant in affecting the hazard ratio. As discussed in Section 5.2, the dummy variable for area appears to exhibit a non-proportional hazard, and the log-rank test also shows that the relationship between area and construction length might vary over time, which could explain its insignificance in the modelling results. Industrial land use is also found to be insignificant; however, the coefficient sign is in the expected direction, that is, projects located on industrial land are more likely to take longer to complete. Population density and job density have significant positive effects on the hazard ratio, indicating that projects located in higher density neighbourhoods will be completed faster. The p-values for the likelihood ratio, Wald, and log-rank tests all show that the model is significant in fitting the sample data. These are asymptotically equivalent tests of the omnibus null hypothesis that all of the parameters are zero. The concordance reached 0.599, which is the probability of agreement for any two randomly chosen observations, where in this case agreement means that the observation with the shorter survival time of the two also has the larger risk score.
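
The concordance described above can be illustrated with a direct pairwise count. The sketch below is a simplified version of Harrell's concordance index in plain Python, written for illustration only; the function name `concordance_index` and the toy values are hypothetical, and it does not reproduce the exact tie handling of any particular statistical package.

```python
def concordance_index(durations, events, risk_scores):
    """Share of comparable pairs in which the shorter-lived case has the larger risk score."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when case i has an observed event
            # strictly before case j's observed (or censored) time.
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half agreement
    return concordant / comparable

# Toy data (hypothetical): four projects, the third one censored
toy_c = concordance_index(
    durations=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[2.0, 1.0, 1.5, 0.5],
)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported 0.599 indicates that the model ranks completion times better than chance but with considerable unexplained variation.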

Table 3.

CPH modelling result in Toronto.

Variables⁴	Coef	exp(coef)	se(coef)	z	Pr(>|z|)
Semi-detached	−0.210	0.810	0.025	−8.51	0.00***
Row	−0.391	0.677	0.021	−19.05	0.00***
Apartment	−0.475	0.622	0.099	−4.80	0.00***
Area_groupSmall	−0.020	0.980	0.018	−1.10	0.27
DWELLING_UNITS_CREATED	−0.002	0.998	0.000	−3.61	0.00***
Cost_groupSmall	0.140	0.869	0.030	4.66	0.00***
Pop_den	0.019	1.019	0.002	11.46	0.00***
Job_den	0.024	1.024	0.003	8.13	0.00***
Housingstock	−0.070	0.932	0.038	−1.86	0.06
distCBD	0.019	1.019	0.001	12.80	0.00***
Multiuse_group_indusYes	−0.036	0.965	0.112	−0.32	0.75
Labour rate	−0.061	0.940	0.003	−21.65	0.00***
Underconstruct	−0.021	0.979	0.001	−32.53	0.00***

Concordance = 0.599 (SE = 0.002).

R-squared = 0.113,732.

Likelihood ratio test = 2799 on 13 df, p = <0.

Wald test = 2748 on 13 df, p = <0.

Score (logrank) test = 2798 on 13 df, p = <0.

The modelling results indicate that detached units with smaller project size and lower cost, located far away from the CBD but with considerable population density, are more likely to complete faster. The coefficient for dwelling units created is −0.002, with a hazard ratio (HR, or exp(coef)) of 0.998, indicating a statistically significant relationship between the size of the project and an increased construction period (decreased risk of completion). The hazard ratios of covariates are interpretable as multiplicative effects on the hazard, or relative probability of completion. For example, holding the other covariates constant, a one-unit increase in the number of dwelling units multiplies the completion hazard by 0.998, a decrease of 0.2%. It can be concluded that larger project size is associated with a lower completion rate. Similarly, a project cost under 800,000 CAD increases the completion hazard by 13.1%; a 1-km increase in the distance to the CBD increases it by 1.9%; a one percent increase in the labour rate multiplies it by 0.94; and a one-unit increase in the number of dwelling units under construction at the same time multiplies it by 0.979.
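
The conversion from a coefficient to a multiplicative effect can be checked with a short calculation. The sketch below uses four coefficients copied from Table 3; the helper names `hazard_ratio` and `percent_change` are introduced only for illustration. Because the published coefficients are rounded to three decimals, recomputed ratios may differ from the table in the last digit.

```python
import math

# Coefficients copied from Table 3 (Toronto); values are rounded as published
coefficients = {
    "DWELLING_UNITS_CREATED": -0.002,
    "distCBD": 0.019,
    "Labour rate": -0.061,
    "Underconstruct": -0.021,
}

def hazard_ratio(coef, delta=1.0):
    """Multiplicative change in the hazard for a change of `delta` in the covariate."""
    return math.exp(coef * delta)

def percent_change(coef, delta=1.0):
    """The same effect expressed as a percentage change in the hazard."""
    return (hazard_ratio(coef, delta) - 1.0) * 100.0

effects = {
    name: (round(hazard_ratio(b), 3), round(percent_change(b), 1))
    for name, b in coefficients.items()
}

# Effects compound multiplicatively: ten additional dwelling units
ten_units = hazard_ratio(coefficients["DWELLING_UNITS_CREATED"], delta=10)
```

The `delta` argument shows why small per-unit effects still matter for large projects: the hazard ratio for ten extra units is exp(10 × −0.002), not ten times the one-unit percentage.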

The Cox proportional hazard model can be visualized through a forest plot (Supplementary Figure 5), which shows the hazard ratios derived from the model for all covariates. Briefly, an HR > 1 indicates an increased risk of “death” (here, an increased probability of completion), while an HR < 1 indicates a decreased risk. For instance, as shown in the forest plot, holding other variables constant, semi-detached units have a completion probability 0.81 times that of detached units, and row units and apartments have 0.68 and 0.62 times that of detached units, respectively, indicating that detached units are more likely to complete faster.

The estimated Cox proportional hazard model is then applied to generate predictions of the hazard rate for the sample, and the prediction confusion matrix⁵ is shown in Table 4. The overall accuracy reaches 63.04%, which is a reasonable result given the complexity of the process being modelled. The model performs very well in predicting project completions (positive): of the projects that had already been completed at the observation time, 80.41% are correctly identified as completed. The model does not work as well in identifying projects that are actually still under construction (negative), with only 30.57% of the not-completed projects being correctly predicted. Thus, the model is inclined to predict projects as completed. This result is not overly surprising, given that longer than usual completion times are likely due to a wide variety of idiosyncratic factors that are often difficult to capture systematically. This tendency in the model will require more attention in operational application, possibly within the overall model system calibration process.

Table 4.

Confusion matrix of the CPH model prediction for Toronto.

		Actual
		Positive	Negative
Predicted	Positive	18,640	8,612	68.40%	Positive predictive value
Predicted	Negative	4,540	3,791	45.50%	Negative predictive value
		80.41%	30.57%	Accuracy = 63.04%
		Sensitivity	Specificity	—
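
The percentages in Table 4 follow from the four cell counts, as the short check below illustrates. The counts are copied from the table; the variable names and the `rate` helper are introduced only for this illustration.

```python
# Cell counts copied from Table 4 (Toronto)
tp, fp = 18640, 8612   # predicted completed: actually completed / actually under construction
fn, tn = 4540, 3791    # predicted under construction: actually completed / actually under construction

def rate(numerator, denominator):
    """Percentage rounded to two decimals, matching the table's precision."""
    return round(100.0 * numerator / denominator, 2)

metrics = {
    "sensitivity": rate(tp, tp + fn),              # completed projects correctly identified
    "specificity": rate(tn, tn + fp),              # ongoing projects correctly identified
    "positive_predictive_value": rate(tp, tp + fp),
    "negative_predictive_value": rate(tn, tn + fn),
    "accuracy": rate(tp + tn, tp + fp + fn + tn),
}
```

The column totals also match the Toronto sample described in the data section: 23,180 completed and 12,403 under-construction projects, for 35,583 records in total.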

As discussed in Section 5.2, some variables included in the model show non-proportional hazards in the CPH model; thus, it is helpful to also estimate an additive hazard regression model, which allows the coefficients to vary over time. We therefore applied Aalen’s additive hazard regression model to examine the changes in coefficients over time, as shown in Figure 4. The coefficient of the dummy variable for apartment shows a significant declining trend over time, indicating that, as time passes, apartment projects have an increasingly negative impact on construction completion relative to other projects and tend to experience even longer construction periods. Construction cost shows similar behaviour. Given that the changes in coefficient estimates over time are slight for most variables, and given that the CPH model achieves reasonable prediction accuracy, it is still recommended to apply the CPH model for housing completion modelling in ILUTE-type applications.

Figure 4.

Variation of coefficients over time for the additive hazard model.

The CPH model is further applied to the cities of Hamilton and Brampton, which represent two different urban forms, to check how the model performs in different urban contexts. The results for Hamilton are shown in Table 5. Note that project size and area were not included in the Hamilton model due to limited data availability. The effect of dwelling type on construction completion differs between Hamilton and Toronto: the apartment type multiplies the relative completion probability by a factor of 0.314 (compared to 0.622 for Toronto), indicating that apartment projects have even longer construction durations relative to detached units in Hamilton, a typical satellite city in the metropolitan area. Other notable differences are the coefficients for housing stock and distance to the CBD. Since Hamilton’s old urban core is not as developed as Toronto’s, housing capital, labour, and consumption still tend to be concentrated in the densely built area. Projects located in the denser residential areas closer to the CBD are more likely to complete faster (a one-unit increase in the housing stock multiplies the relative completion probability by a factor of 2.2, and a 1-km increase in distance from the CBD multiplies it by a factor of 0.983). The confusion matrix in Table 6 shows that the CPH model reaches an overall accuracy of 72.68%, better than for Toronto, and that the model also works well in correctly predicting the projects under construction (76.61%). Of the projects predicted as completed, 94.78% are actually completed, indicating that the prediction for completion is quite accurate. The improved fit of the Hamilton model relative to the Toronto model may well reflect the smaller, relatively less complex and less dynamic nature of housing supply in Hamilton. The relatively homogeneous residential projects in a smaller market like Hamilton are more predictable, and the CPH model can be more sensitive and specific in identifying completed and under-construction projects over time than in a context such as Toronto that involves more complexity and uncertainty.
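
The factors quoted above are the exponentiated coefficients from Table 5. Strictly speaking, exp(coef) in a Cox model scales the instantaneous completion hazard, not a probability. A short sketch of the arithmetic, where the helper name `hazard_ratio` is ours:

```python
import math

# Coefficients taken from the Hamilton model (Table 5)
hamilton_coef = {"Apartment": -1.159, "Housingstock": 0.789, "distCBD": -0.017}

def hazard_ratio(coef, delta=1.0):
    """Multiplicative change in the completion hazard for a `delta`-unit change."""
    return math.exp(coef * delta)

for name, coef in hamilton_coef.items():
    print(f"{name}: exp(coef) = {hazard_ratio(coef):.3f}")

# Effects compound multiplicatively, e.g. a site 10 km further from the CBD
print(f"distCBD, +10 km: {hazard_ratio(hamilton_coef['distCBD'], 10):.3f}")
```

The last line illustrates why a per-kilometre factor of 0.983 is not negligible: over 10 km it compounds to roughly 0.84.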

Table 5.

CPH model results in Hamilton.

	Coef	exp (coef)	se (coef)	z	Pr (>|z|)
Semi-detached	−0.329	0.720	0.054	−6.05	0.00***
Row	−0.270	0.764	0.029	−9.37	0.00***
Apartment	−1.159	0.314	0.134	−8.68	0.00***
Pop_den	0.047	1.048	0.010	4.87	0.00***
Job_den	−0.186	0.830	0.028	−6.68	0.00***
Housingstock	0.789	2.201	0.028	28.14	0.00***
distCBD	−0.017	0.983	0.002	−7.08	0.00***
Popen	−0.001	0.999	0.000	−2.78	0.01**
Pind	0.007	1.007	0.001	5.49	0.00***
Labour rate	−0.023	0.977	0.004	−5.95	0.00***
Underconstruct	0.248	1.282	0.026	9.45	0.00***

Concordance = 0.605 (SE = 0.003).

R-squared = 0.09702204.

Likelihood ratio test = 1375 on 11 df, p = <0.

Wald test = 1393 on 11 df, p = <0.

Score (logrank) test = 1424 on 11 df, p = <0.

Table 6.

Confusion matrix of the CPH model prediction for Hamilton.

		Actual
		Positive	Negative
Predicted	Positive	9702	534	94.78%	Positive predictive value
Predicted	Negative	3771	1749	31.68%	Negative predictive value
		72.01%	76.61%	Accuracy = 72.68%
		Sensitivity	Specificity

The CPH model for Brampton reaches an overall accuracy of 61.55% and performs better in identifying the completed projects (Table 7). However, specificity is only 26.68%, so about 73% of the observed negative cases are predicted as positive, which shows that the model again tends to overpredict project completions; in addition, almost 96% of the predicted negative cases are in fact positive (negative predictive value of 4.18%). The coefficients are quite similar to those from the CPH model for Toronto in terms of significance and direction, as shown in Table 8: dwelling units with larger area and more bedrooms have a lower relative probability of being completed, and projects located far from the CBD and constructed when the labour rate is lower have shorter construction durations.

Table 7.

Confusion matrix of the CPH model prediction for Brampton.

		Actual
		Positive	Negative
Predicted	Positive	12,077	827	93.59%	Positive predictive value
Predicted	Negative	6907	301	4.18%	Negative predictive value
		63.62%	26.68%	Accuracy = 61.55%
		Sensitivity	Specificity
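
Two different error rates can be read off Table 7 for the negative class, and they are easy to confuse. The sketch below computes both from the cell counts; the variable names are ours.

```python
# Cell counts taken from Table 7 (Brampton)
tp, fp = 12077, 827   # predicted positive: actually completed / actually unfinished
fn, tn = 6907, 301    # predicted negative: actually completed / actually unfinished

# Share of ACTUAL negatives (unfinished projects) that are predicted as completed
false_positive_rate = fp / (fp + tn)   # equals 1 - specificity

# Share of PREDICTED negatives that are in fact completed
false_omission_rate = fn / (fn + tn)   # equals 1 - negative predictive value

print(f"false positive rate: {100 * false_positive_rate:.2f}%")
print(f"false omission rate: {100 * false_omission_rate:.2f}%")
```

The first rate (about 73%) is the one that measures overprediction of completions among unfinished projects. The second (about 96%) describes how unreliable a "not completed" prediction is.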

Table 8.

CPH modelling result in Brampton.

	Coef	exp (coef)	se (coef)	z	Pr (>|z|)
Semi-detached	−0.457	0.633	0.026	−17.42	0.00***
Row	−0.311	0.732	0.024	−12.83	0.00***
Apartment	−0.370	0.691	0.036	−10.24	0.00***
Area	−0.170	0.844	0.015	−11.45	0.00***
BedRoom	−0.032	0.968	0.010	−3.38	0.00***
Pop_den	0.064	1.066	0.006	10.31	0.00***
Job_den	−0.286	0.751	0.031	−9.28	0.00***
Housingstock	−0.073	0.930	0.010	−7.60	0.00***
distCBD	0.066	1.068	0.005	13.87	0.00***
Popen	−0.004	0.996	0.000	−10.57	0.00***
Pind	0.019	1.019	0.002	7.95	0.00***
Labour rate	−1.584	0.205	3.845	−0.41	0.68
Underconstruct	0.019	1.019	0.007	2.60	0.01**

Concordance = 0.584 (SE = 0.002).

R-squared = 0.04474588

Likelihood ratio test = 869 on 13 df, p = <0.

Wald test = 924.7 on 13 df, p = <0.

Score (logrank) test = 921.6 on 13 df, p = <0.

CPH model results by dwelling types

In order to compare the important covariates for different types of residential construction project, the CPH model is further applied to the three cities using separate models for the different dwelling types. Since townhouse and semi-detached dwellings have similar physical characteristics in terms of privacy, structure, and design, these two types were combined as a single “attached” dwelling type. The modelling results in Tables 9, 10, and 11 indicate significant differences in the influential factors on construction length by dwelling types.

Table 9.

CPH modelling result by dwelling types in Toronto.

	Apartment			Single detached			Attached
Variables	coef	exp (coef)	Pr	coef	exp (coef)	Pr	coef	exp (coef)	Pr
Area_groupSmall	0.595	1.814	0.01**	0.115	1.122	0***	−0.354	0.702	0***
DWELLING_UNITS_CREATED	−0.001	0.999	0.21	−0.144	0.866	0***	−0.016	0.984	0**
Cost_groupSmall	−0.526	0.591	0***	−0.121	0.886	0***	0.375	1.455	0***
Pop_den	0.006	1.006	0.43	0.025	1.025	0***	0.023	1.023	0***
Job_den	0.001	1.001	0.93	0.033	1.033	0***	0.048	1.05	0***
Housingstock	0.003	1.003	0.98	−0.117	0.89	0.07	−0.092	0.912	0.08
distCBD	−0.009	0.991	0.49	0.015	1.015	0***	0.043	1.044	0***
Multiuse_group_indusYes	−0.058	0.943	0.71	−0.185	0.831	0.85	0.458	1.581	0.01*
LabourRate	−0.061	0.941	0.03*	−0.031	0.97	0***	−0.123	0.885	0***
Underconstruct	0.003	1.003	0.85	0.002	1.002	0.33	0.036	1.036	0***

Table 10.

CPH modelling result by dwelling types in Hamilton.

	Apartment			Single detached			Attached
Variables	coef	exp (coef)	Pr	coef	exp (coef)	Pr	coef	exp (coef)	Pr
Pop_den	−0.115	0.891	0.08	0.088	1.092	0.00***	−0.055	0.947	0.04*
Job_den	−0.038	0.963	0.56	−0.285	0.752	0.00***	−0.033	0.968	0.64
Housingstock	−0.558	0.573	0.66	0.842	2.321	0.00***	0.398	1.488	0.00***
distCBD	0.054	1.055	0.30	−0.019	0.981	0.00***	−0.019	0.981	0.01**
Popen	−0.043	0.958	0.00***	−0.001	0.999	0.22	0.000	1.000	0.87
Pind	−0.019	0.982	0.05*	0.008	1.008	0.00***	0.037	1.037	0.00***
LabourRate	−0.136	0.873	0.01*	−0.015	0.985	0.00***	−0.054	0.947	0.00***
Underconstruct	0.001	1.001	1.00	0.208	1.232	0.00***	0.414	1.513	0.00***

Table 11.

CPH modelling result by dwelling types in Brampton.

	Apartment			Single detached			Attached
Variables	coef	exp (coef)	Pr	coef	exp (coef)	Pr	coef	exp (coef)	Pr
Area	0.545	1.724	0.00***	−0.176	0.839	0.00***	0.091	1.096	0.11
BedRoom	0.135	1.145	0.00***	−0.038	0.963	0.00**	−0.072	0.931	0.00***
Pop_den	0.042	1.043	0.14	0.010	1.010	0.29	0.084	1.088	0.00***
Job_den	−2.408	0.090	0.00***	−0.232	0.793	0.00***	−0.176	0.839	0.00***
Housingstock	0.173	1.189	0.00**	−0.047	0.954	0.00***	−0.112	0.894	0.00***
distCBD	−0.119	0.888	0.00***	0.084	1.087	0.00***	0.117	1.124	0.00***
Popen	−0.002	0.998	0.21	−0.005	0.995	0.00***	−0.008	0.992	0.00***
Pind	0.262	1.299	0.00**	0.010	1.010	0.03*	0.020	1.020	0.00***
Labour rate	−92.180	0.000	0.00***	−1.417	0.242	0.79	18.200	7997	0.00**
Underconstruct	−0.161	0.851	0.00***	−0.003	0.997	0.75	0.076	1.079	0.00***

Apartment projects are more affected by the construction area, construction cost, previous land use of the construction site, and size of the project, and less influenced by location and density. Apartment projects occupying small areas with higher cost have a relatively higher probability of being completed. The CPH model for apartment projects performs better in Brampton than in Toronto and Hamilton, partly because the heterogeneity of apartment building projects in the more developed cities introduces more uncertainty.

Single detached projects are affected by all the factors. In general, for the three cities, single detached residential projects with smaller area, fewer bedrooms, and fewer dwelling units created take less time to complete. In terms of location, similar to the results of the full model, distance to the CBD and population density affect construction length in the same direction in Brampton and Toronto: projects far from the CBD with considerable population density finish more quickly than those located near the city centre. In Hamilton, the sign for distance to the CBD is reversed (Table 10), so projects closer to the CBD finish more quickly.

Attached units are also affected by all the factors included. Unlike the other two types, attached unit projects with larger occupied area, lower cost, and locations far from the central area finish more quickly.

Model performance evaluation

The prediction performances for different dwelling types are compared for the three cities in Tables 12, 13, and 14. For the City of Toronto, the model works well in predicting single detached residential construction projects but tends to overpredict completions; that is, a large number of the projects under construction were identified as completed. As shown in Supplementary Figure 6, the confidence interval for apartments spreads over a larger range of survival probability, indicating more uncertainty. Supplementary Figure 6 also shows clearly that the survival probability for apartments is higher than for single detached and attached projects, which means that apartment projects take longer to construct in general. This result is not unexpected given the relative complexity of most apartment projects and the comparatively limited number of apartment samples.
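
The link between a hazard ratio below one and a higher survival curve can be illustrated with the Cox identity S(t | x) = S0(t)^HR. In the sketch below, the baseline curve is hypothetical and chosen only for illustration. Only the hazard ratio of 0.622 comes from the text (the apartment effect quoted for the Toronto full model), so the output indicates direction and not the magnitudes in Supplementary Figure 6.

```python
# Hypothetical baseline survival S0(t): probability that a reference project
# is still under construction after t months (illustrative values only).
months = [0, 6, 12, 18, 24, 36]
baseline_survival = [1.00, 0.85, 0.55, 0.35, 0.20, 0.08]

def survival_with_hazard_ratio(s0, hazard_ratio):
    """Cox model identity: S(t | x) = S0(t) ** hazard_ratio."""
    return [s ** hazard_ratio for s in s0]

# 0.622 is the apartment hazard ratio quoted for Toronto in the text
apartment_survival = survival_with_hazard_ratio(baseline_survival, 0.622)

for t, s_ref, s_apt in zip(months, baseline_survival, apartment_survival):
    print(f"{t:>2} months  reference {s_ref:.2f}  apartment {s_apt:.2f}")
```

Because the hazard ratio is below one, the apartment curve lies on or above the reference curve at every time point, which is what "taking longer to construct" means in survival terms.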

Table 12.

Performance indicators of the CPH models by dwelling types for City of Toronto.

	Sensitivity, %	Specificity, %	Positive predictive value, %	Negative predictive value, %	Accuracy, %
Full	80.41	30.57	68.40	45.50	63.04
Apartment	84.94	12.46	44.52	50.00	45.27
Single detached	77.90	43.27	76.36	45.43	67.57
Attached	82.96	25.60	61.05	51.67	59.12

Table 13.

Performance indicators of the CPH models by dwelling types for Hamilton.

	Sensitivity, %	Specificity, %	Positive predictive value, %	Negative predictive value, %	Accuracy, %
Full	72.01	76.61	94.78	31.68	72.68
Apartment	87.10	10.00	50.00	42.86	49.18
Single detached	71.92	77.41	95.45	29.48	72.64
Attached	75.40	76.55	92.49	44.82	75.64

Table 14.

Performance indicators of the CPH models by dwelling types for Brampton.

	Sensitivity, %	Specificity, %	Positive predictive value, %	Negative predictive value, %	Accuracy, %
Full	63.62	26.68	93.59	4.18	61.55
Apartment	62.72	40.70	94.25	6.58	61.38
Single detached	64.50	21.23	93.53	3.28	62.18
Attached	63.96	34.17	93.94	5.61	62.21

The CPH model for Hamilton achieves an accuracy of around 75% for single detached and attached projects (Table 13 and Supplementary Figure 7). However, the model did not perform well in specificity (the percentage of correctly identified actual negative cases among all the actual negative cases) for apartments or in negative predictive value (the percentage of correctly identified actual negative cases among all the predicted negative cases) for single detached units. The KM estimates generate survival plots very similar to those of the CPH model, which could indicate that the CPH model fits the sample well. For Brampton, the model shows similar prediction performance (Table 14 and Supplementary Figure 8) and tends to overpredict the number of completed cases, which would require adjustment when applied in an operational housing supply modelling system.
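
For readers less familiar with the KM estimates used as the comparison here, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. The five-project sample is hypothetical and is not drawn from the paper's data.

```python
def kaplan_meier(durations, completed):
    """Kaplan-Meier estimate of S(t) at each observed completion time.

    durations: months observed for each project
    completed: 1 if the project finished at that time, 0 if it was still
               under construction when observation ended (right-censored)
    """
    event_times = sorted({t for t, c in zip(durations, completed) if c})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, completed) if d == t and c)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical sample of five projects (not from the paper's data)
curve = kaplan_meier(durations=[5, 8, 8, 12, 15], completed=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.2f}")
```

Censored projects stay in the at-risk set until their observation ends but never count as events, which is how still-under-construction projects contribute information without a known completion date.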

Conclusion and discussion

Previous urban models typically contain an oversimplified representation of the housing supply market; however, a realistic urban microsimulation system requires a more refined representation of the land development and housing provision process. The housing supply process has two major stages: starts and completions. This study focuses on modelling housing construction durations within the context of eventually modelling housing supply in the ILUTE urban microsimulation system. It contributes to the literature on both urban modelling and housing market modelling: future urban microsimulation models can structure the housing supply side as indicated in this paper, and future studies of the residential supply market can further investigate the stochastic properties of residential construction. Applying the Cox Proportional Hazard Model, this study develops a modelling framework for housing completion and provides a new approach to modelling the supply side of housing markets, offering both methodological and theoretical insights for urban planners and modellers.

The CPH model presented in this paper achieves reasonable prediction accuracy in the empirical studies. The predictions cannot be compared directly with other examples in the literature; however, the study provides a new and effective way to model residential construction duration and housing completion. Despite a tendency to overpredict completions, the model could accurately predict around 90% of the completions and reaches an overall accuracy of around 65% for the three case study cities examined, which is much better than the KM estimation (see the KM estimation in Supplementary File 3). Prediction performance differs across the three cities because of differences in the complexity of the residential construction process and housing market. In general, homogeneous real estate markets (such as Hamilton) have less uncertainty and are more predictable. In addition, structure type, project size, construction cost, labour cost, the previous land use of the land parcel, location, and density are found to significantly affect construction durations. This influence was found to be stable over time for most variables. However, non-proportional hazards still exist for some variables, and additive hazard regression could be applied as a complement to inspect non-proportional effects within the main model.
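
One way a fitted survival curve could feed a microsimulation is by drawing a construction duration for each project through inverse-transform sampling. The sketch below is an assumption about such a use and is not the procedure implemented in ILUTE. The baseline curve is hypothetical, and `sample_duration` is an illustrative helper name.

```python
import random

def sample_duration(months, survival, rng):
    """Inverse-transform draw: first grid time at which S(t) falls to or below u."""
    u = rng.random()
    for t, s in zip(months, survival):
        if s <= u:
            return t
    return None  # not completed within the grid (treated as censored)

months = [6, 12, 18, 24, 36]
baseline_survival = [0.85, 0.55, 0.35, 0.20, 0.08]  # hypothetical S0(t)
hazard_ratio = 0.622  # apartment effect quoted for Toronto in the text
survival = [s ** hazard_ratio for s in baseline_survival]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = [sample_duration(months, survival, rng) for _ in range(1000)]
completed = [d for d in draws if d is not None]
print(f"completed within 36 months: {len(completed)} of {len(draws)}")
```

Drawing durations in this way preserves the full distribution of construction periods, which classifying each project as completed or not at a single observation time does not.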

Dwelling type is found to have a fundamental effect on construction duration, and separate models for different dwelling types are recommended. Apartment projects are not predicted as well as other dwelling types; modelling them could involve more factors that better capture real estate market fluctuations. Further research focussed on apartment building construction could help to update the apartment supply model. Empirical studies in the three cities with different urban forms also reveal that the model is applicable in different urban contexts: the parameters need adjustment, but the overall modelling framework remains valid. Unfortunately, we do not have a uniform dataset for the entire GTHA at this time. Another limitation of the study is that weather conditions and the construction ability of contractors affect construction duration but are not included in the model due to limited data availability. Future work could explore additional factors and parameters to improve the transferability of the model.

Supplemental Material

Supplemental Material - Predicting housing construction period based on a cox proportional hazard model––an empirical study of housing completions in the greater Toronto and Hamilton area

Supplemental Material for Predicting housing construction period based on a cox proportional hazard model––an empirical study of housing completions in the greater Toronto and Hamilton area by Yu Zhang and Eric J Miller in Environment and Planning B: Urban Analytics and City Science

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yu Zhang

Supplemental Material

Supplemental material for this article is available online.

Notes

Yu Zhang is a PhD candidate at the University of Toronto and a graduate researcher at the University of Toronto Transportation Research Institute (UTTRI). Her research focuses on agent-based urban microsimulation, with particular interest in integrated land use and transportation modelling. She has experience in analysing urban development trends, the housing market, and accessibility through various methods. Her recent work builds the framework for simulating the housing supply market in the Integrated Land use, Transportation and Environment (ILUTE) microsimulation system.

Eric J Miller is Past Chair of the U.S. Transportation Research Board (TRB) Committee on Travel Behavior and Values, Member Emeritus of the TRB Transportation Demand Forecasting Committee and Past Chair of the International Association for Travel Behaviour Research (IATBR). He served on the US National Academy of Sciences Committee for Determination of the State of the Practice in Metropolitan Area Travel Forecasting. He has chaired or been a member of numerous travel demand modelling peer review panels throughout North America. He is the developer of GTAModel, an advanced regional travel demand modelling system used by municipalities in the Greater Toronto Area (GTA) to forecast travel demand that is based on TASHA, a state-of-the-art agent-based microsimulation model of activity and travel, and ILUTE, an integrated land use-transportation model system for the GTA.

References

Aalen (1989) A linear regression model for the analysis of life times. Statistics in Medicine 8(8): 907–925.

Al-Momani (2000) Construction delay: a quantitative analysis. International Journal of Project Management 18(1): 51–59.

Anysz and Buczkowski (2019) The association analysis for risk evaluation of significant delay occurrence in the completion date of construction project. International Journal of Environmental Science and Technology 16(9): 5369–5374.

Bashford, Walsh and Sawhney (2005) Production system loading–cycle time relationship in residential construction. Journal of Construction Engineering and Management 131(1): 15–22.

Bayram (2017) Duration prediction models for construction projects: in terms of cost or physical characteristics? KSCE Journal of Civil Engineering 21(6): 2049–2060.

Bradburn, Clark, Love, et al. (2003) Survival analysis part II: multivariate data analysis–an introduction to concepts and methods. British Journal of Cancer 89(3): 431–436.

Burrows, Pegg and Martin (2005) Predicting building construction duration. Anaesthesiology Intensive Therapy.

Buzzelli and Harris (2003) Small is transient: housebuilding firms in Ontario, Canada 1978-98. Housing Studies 18(3): 369–386.

Chan and Kumaraswamy (1999) Modelling and predicting construction durations in Hong Kong public housing. Construction Management & Economics 17(3): 351–362.

Choudhury and Rajan (2003) Time-cost relationship for residential construction in Texas. CIB Report 284: 73.

Cox (1972) Regression models and life tables (with discussion). Journal of the Royal Statistical Society 34(2): 187–220.

Durdyev, Omarov and Ismail (2017) Causes of delay in residential construction projects in Cambodia. Cogent Engineering 4(1): 1291117.

Farooq and Hurtubia (2011) A Unified Framework of Urban Built-Space Evolution. 11th Swiss Transportation Research Conference, Ascona (Vol. 371,: 372–373.

Farooq, Miller and Haider (2010) Hedonic analysis of office space rent. Transportation Research Record 2174(1): 118–127.

Fox and Weisberg (2002) Cox Proportional-Hazards Regression for Survival Data. An R and S-PLUS Companion to Applied Regression.

Greene and Hensher (2010) Ordered choices and heterogeneity in attribute processing. Journal of Transport Economics and Policy (JTEP) 44(3): 331–364.

Hammadi (2020) Modelling Transportation System Impacts of Housing Supply Dynamics. University of Toronto.

Irfan, Khurshid, Anastasopoulos, et al. (2011) Planning-stage estimation of highway project duration on the basis of anticipated project cost, project type, and contract type. International Journal of Project Management 29(1): 78–92.

Kaka and Price (1991) Relationship between value and duration of construction projects. Construction Management and Economics 9(4): 383–400.

Kaplan and Meier (1958) Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53(282): 457–481.

Kleinbaum and Klein (2012) Kaplan-Meier Survival Curves and the Log-Rank Test. Survival Analysis. Springer.

Koo, Hong, Hyun, et al. (2010) A CBR-based hybrid model for predicting a construction duration and cost based on project characteristics in multi-family housing projects. Canadian Journal of Civil Engineering 37(5): 739–752.

Kumaraswamy and Chan (1995) Determinants of construction duration. Construction Management and Economics 13(3): 209–217.

Mačková and Bašková (2014) Applicability of Bromilow's time-cost model for residential projects in Slovakia. Selected Scientific Papers-Journal of Civil Engineering 9(2): 5–12.

Mackova, Kozlovska, Baskova, et al. (2017) Construction-duration prediction model for residential buildings in Slovak Republic based on computer simulation. International Journal of Applied Engineering Research 12(13): 3590–3599.

Martin, Burrows and Pegg (2006) Predicting construction duration of building projects. Congreso FIGOctubre de.

Miller and Salvini (1998) The Integrated Land Use, Transportation, Environment (ILUTE) Modelling System: A Framework. Washington, DC: Proceedings 77th Annual Meeting of the Transportation Research Board.

Miller and Salvini (2001) The integrated land use, transportation, environment (ILUTE) microsimulation modelling system: Description and current status. Travel Behaviour Research: The Leading Edge 2001: 711–724.

Odeh and Battaineh (2002) Causes of construction delay: traditional contracts. International Journal of Project Management 20(1): 67–73.

Qiao, Labi and Fricker (2019) Hazard-based duration models for predicting actual duration of highway projects using nonparametric and parametric survival analysis. Journal of Management in Engineering 35(6): 04019024.

Rosenfield, Chingcuanco and Miller (2013) Agent-based housing market microsimulation for integrated land use, transportation, environment model system. Procedia Computer Science 19: 841–846.

Salvini and Miller (2005) ILUTE: An operational prototype of a comprehensive microsimulation model of urban systems. Networks and Spatial Economics 5(2): 217–234.

Schoenfeld (1982) Partial residuals for the proportional hazards regression model. Biometrika 69(1): 239–241.

Stoy, Dreier and Schalcher (2007) Construction Duration of Residential Building Projects in Germany. Engineering: Construction and Architectural Management.

Walker and Vines (2000) Australian Multi‐unit Residential Project Construction Time Performance Factors. Engineering: Construction and Architectural Management.
