Sage Journals: Discover world-class research

Abstract

Accurate destination choice modeling is vital for transportation and urban planning, influencing demand prediction, infrastructure development, and policy assessment. Conventional methods for choice set generation, both deterministic and stochastic, often fail to capture individual variability, require extensive data, and inadequately represent decision-making processes, which compromises their reliability. This study introduces a novel application of variational autoencoders (VAEs) for generating choice sets in destination choice modeling, focusing on home-based work trips in Montreal, Canada. By leveraging the deep generative capabilities of VAEs, the approach enhances the realism and diversity of choice sets and accounts for the implicit perception and availability of alternatives. Our methodology integrates VAE-generated choice sets with a multinomial logit model to predict destination choices. The results show significant improvements in model fit and predictive accuracy. The VAE-based model achieved a $ρ^{2}$ value of 0.5733 and an accuracy of 49.61%, outperforming the other models. The importance-sampling-based model followed, with a $ρ^{2}$ of 0.3123 and an accuracy of 41.19%, while the simple random-sampling-based model achieved a $ρ^{2}$ of 0.2528 and an accuracy of 30.10%. The model considering all available alternatives performed worst, with a $ρ^{2}$ of 0.1532 and an accuracy of 10.82%. These results highlight the effectiveness of VAEs for choice set generation, leading to more reliable destination choice models. All models were evaluated using the same number of alternatives and observations, ensuring a fair comparison and demonstrating the superior performance of the VAE-based approach.

Keywords

destination choice modeling; variational autoencoders (VAEs); choice set generation

Introduction

Destination choice modeling is an important component in transportation and urban planning, involving the prediction of individuals' location choices among numerous potential destinations. Accurate destination choice models are essential for applications such as demand prediction, infrastructure planning, and policy evaluation. The process typically comprises two main steps: choice set generation and the modeling of choices from the generated choice sets. These two steps are required because the actual set of alternatives considered by individuals before making a decision is generally unobservable and can vary significantly among individuals; therefore, any misspecification in the choice set can lead to biased and inconsistent parameter estimates, affecting the reliability of the model's predictions ( 1 ).

The generation of choice sets poses several challenges. First, it is generally impossible to identify the exact alternatives an individual considers before making a decision. Moreover, when dealing with a vast number of potential alternatives, enumerating all feasible options becomes impractical ( 2 , 3 ). To address these issues, various choice set generation methods have been developed, which can generally be categorized into deterministic and stochastic approaches.

Deterministic methods involve predefined rules or criteria to generate choice sets. Common examples include rule-based methods where alternatives are included or excluded based on specific attributes or thresholds. For instance, a rule might exclude destinations beyond a certain travel time or distance. While straightforward, deterministic methods often oversimplify the decision-making process by excluding viable alternatives that do not meet rigid criteria, potentially leading to biased choice sets ( 4 ). Moreover, deterministic methods rely on predefined generation rules, leading to the generation of the same set of alternatives for a given observation each time. This contradicts the concept that the true consideration set can differ among individuals.

Stochastic methods incorporate randomness into the choice set generation process, using techniques such as random sampling and importance sampling. These methods aim to represent the decision-making process more flexibly by allowing the generated choice sets to vary across individuals or scenarios. Although stochastic methods are, in principle, computationally manageable, their complexity can grow in practical applications, particularly with large datasets, numerous alternatives, or the need for repeated sampling and weighting adjustments ( 5 , 6 ). Additionally, because choice sets are large and the true consideration set is inherently difficult to recognize, these methods are often simplified to remain computationally feasible. For instance, Cascetta et al. generated choice sets by modeling the implicit availability or perception of alternatives using a simple binomial logit model ( 7 ).

These limitations of traditional methods call for a more comprehensive modeling framework for choice set formation. The concept of choice set generation is also prevalent in other areas of transportation research, especially where there are many potential alternatives. A notable example is route choice modeling, where recent advances have been particularly significant. For instance, Yao and Bekhor combined machine learning (ML) with discrete choice models to generate choice sets with high prediction accuracy ( 8 ).

Similarly, deep generative methods such as generative adversarial networks (GANs) and variational autoencoders (VAEs) have demonstrated significant success in generating realistic high-dimensional data across various domains, including image synthesis and natural language processing ( 9 , 10 ). These models learn the underlying data distribution and use it to generate new realistic samples, making them highly suitable for applications requiring the generation of realistic alternatives. GANs function through an adversarial training framework, in which a generator produces synthetic samples aimed at deceiving a discriminator tasked with distinguishing between real and generated data. This mechanism enables GAN models to generate visually compelling and detailed synthetic samples. However, despite these strengths, GANs possess certain limitations that become particularly critical in choice set generation contexts. Most notably, GANs do not provide an explicit probabilistic model of data, as they do not inherently estimate data likelihood. Instead, they rely entirely on adversarial competition, which can lead to unstable training, mode collapse (limited diversity among generated samples), and convergence issues ( 11 ). Additionally, in the context of choice set generation, the actual consideration sets of individuals are typically unknown, meaning there is no clearly defined “real” dataset to effectively implement the adversarial framework.

In contrast, VAEs stand out as particularly well-suited for choice set generation because of several key attributes. VAEs are typically self-supervised; their performance is directly evaluated by comparing regenerated samples against the original input data. This characteristic is particularly beneficial in scenarios involving unlabeled datasets, as is the case in choice set generation. Given the typically unknown true consideration sets, the VAE framework provides a flexible means of estimating the likelihood that an alternative belongs within an individual’s implicit consideration set. Furthermore, VAEs explicitly infer structured latent representations from observed samples. This explicit probabilistic modeling enables meaningful interpretation of latent factors underlying decision-making processes. Together, these features make VAEs especially appropriate and effective for generating realistic and behaviorally meaningful choice sets. In the context of route choice modeling, Yao and Bekhor proposed a VAE-based approach that considers the implicit perception and availability of alternatives ( 12 ). Their method maximizes the likelihood of including chosen alternatives in the generated choice set by modeling the underlying generation process. Their study demonstrated that the VAE method could effectively reproduce true values and outperform traditional methods in goodness-of-fit and predictive performance.

The potential of VAEs in choice set generation extends beyond route choice modeling, because the underlying principles of choice set generation and perception of alternatives also apply to related domains such as destination choice modeling. However, to our knowledge, these deep generative models, especially VAEs, have not yet been adapted for choice set generation in destination choice modeling. By leveraging the VAE approach, we can enhance the generation of destination choice sets, accounting for the implicit perception of alternatives and improving the robustness of destination choice models.

In this paper, we draw inspiration from the VAE-based choice set generation method proposed by Yao and Bekhor and adapt its principles to the context of choice set generation in destination choice modeling ( 12 ). We compute the implicit perception and availability of alternatives from these generated sets and implement them in a multinomial logit (MNL) model to predict the destination choices of home-based work trips, using a case study in Montreal, Canada.

This study makes two key contributions. First, we address the challenges of large and unobservable true choice sets in destination choice modeling by using a VAE to generate choice sets. Second, we provide a comprehensive evaluation of the VAE approach using the Montreal Origin-Destination (OD) Survey dataset, demonstrating its applicability and effectiveness in real-world scenarios ( 36 ).

The rest of the paper is structured as follows. The Literature Review discusses previous methods of choice set generation, while the Methodology section details the VAE method used in this paper for choice set generation. The Data section details the dataset used in the study and the attributes incorporated into the models. The Results and Discussion section presents the findings, and the Conclusion summarizes the key insights and implications of the study.

Literature Review

Choice set formation is a critical step in the specification, estimation, and application of destination choice models. As noted by Thill, misspecification of choice sets can adversely affect parameter estimates and the accuracy of predicted choice probabilities ( 13 ). Accurate definition of the destination choice set is essential because the number of potential alternatives can be extremely large, especially in urban contexts where hundreds of zones are involved. This section provides a detailed review of traditional and advanced methods for choice set generation in discrete choice modeling, with traditional methods broadly categorized into deterministic and stochastic approaches.

Deterministic Methods

Deterministic methods, such as those used by Scott and He, and Ben-Akiva and Lerman, rely on predefined rules to generate alternatives ( 14 , 15 ). These methods, while systematic and straightforward, often fail to capture the variability in individual preferences and can result in overly simplistic choice sets. For example, Bhat et al. employed a deterministic approach to model destination choice for work and shopping trips in Boston, using the generalized cost (time and monetary) as a key determinant in generating choice sets ( 16 ). This method categorized alternatives based on fixed criteria, failing to account for individual preferences that might lead to more diverse or unconventional choices. Bowman and Ben-Akiva applied deterministic rules in an activity-based travel demand model, focusing on factors such as time, distance, and monetary cost to generate choice sets ( 17 ). Although clear and simple, this method often led to the exclusion of feasible alternatives that did not meet predetermined criteria, such as destinations offering unique amenities or opportunities outside predefined bounds. Sivakumar and Bhat used a generalized cost approach, including both time and monetary aspects, to generate choice sets for non-work trips ( 18 ). Despite detailed cost considerations, this deterministic method excluded relevant alternatives because of rigid predefined criteria. Finally, the time-space prism approach, based on time geography, maps activity spaces where individuals travel and engage in activities ( 19 ). While providing a more dynamic and flexible representation of travel behavior than other deterministic methods, it still imposes deterministic constraints that can overlook viable alternatives. These limitations highlight the failure of deterministic methods to account for variability in individual preferences, often resulting in the exclusion of significant alternatives.

Stochastic Methods

Stochastic methods introduce randomness to capture variability in decision-making processes. For example, Pel et al. used random utility models to generate choice sets, integrating randomness to better reflect real-world decision-making processes ( 20 ). Molloy used a zonal-based approach that incorporated randomness by considering different zones for destination choices in Ontario ( 21 ). This method captured spatial variability but required extensive data on zone characteristics and boundaries. Additionally, Tsoleridis et al. proposed a probabilistic approach to choice set formation by integrating the concept of activity spaces in mode and destination choice modeling ( 22 ). This approach involves inherent randomness in representing individuals’ habitual activity spaces. Similarly, Elgar et al. introduced the concept of anchor points in modeling the location decisions of office firms ( 23 ). Their method involves creating choice sets that reflect the spatial structure and constraints faced by firms through the introduction of random anchor points.

In addition, sampling methods are frequently used to introduce randomness into choice set generation. Two commonly employed sampling approaches are random sampling and importance sampling. In random sampling, a sample of locations is drawn from the universe of feasible choices to constitute the consideration set, with every alternative in the universe having the same probability of selection. This method is straightforward and gives all potential choices an equal chance of inclusion, which can provide a broad representation of possible alternatives. In importance sampling, by contrast, some destinations are treated as more desirable on the basis of variables such as size and distance. Destinations are sampled by Monte Carlo simulation from an importance probability distribution, which requires correction factors to preserve the properties of the maximum likelihood estimator.
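A minimal Python sketch of simple random sampling may make the contrast concrete. It assumes zones are identified by hashable IDs and that the chosen destination is always retained so the observation can be estimated (a common convention, assumed here); the function name `random_choice_set` is hypothetical.

```python
import random

def random_choice_set(universe, chosen, size, rng=None):
    """Simple random sampling of a choice set with `size` alternatives.

    Non-chosen alternatives are drawn uniformly without replacement from the
    universe; the chosen alternative is always kept (assumed convention).
    """
    if rng is None:
        rng = random.Random()
    others = [alt for alt in universe if alt != chosen]
    return [chosen] + rng.sample(others, size - 1)

# Usage with hypothetical zone IDs 1..500
zones = list(range(1, 501))
choice_set = random_choice_set(zones, chosen=42, size=10, rng=random.Random(0))
```

Because every alternative in a set built this way is equally likely to have produced that set, the multinomial logit can be estimated on it without correction terms; importance sampling breaks this symmetry, which is why it needs the correction factors mentioned above.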

Stochastic methods that incorporate these sampling techniques have been extensively applied in transportation and urban planning research. Jonnalagadda et al. employed a combination of stratification by importance and random sampling to generate choice sets for work and intermediate trips in San Francisco ( 24 ). This method introduced variability while still relying heavily on initial stratification criteria. Pozsgay and Bhat implemented a simple random method to generate choice sets for leisure trips in Dallas-Fort Worth ( 25 ). While this approach allowed for a broader range of alternatives, it also increased the potential inclusion of irrelevant options. Similarly, Kim et al. and Mishra et al. applied random sampling techniques in their comparative studies of aggregate and disaggregate models, aiming to include a diverse range of alternatives but often struggling with ensuring adequate representation of all relevant alternatives ( 26 , 27 ). Hammadou et al. used a spatial analysis approach with random sampling in Antwerp, emphasizing the importance of spatial dimensions in choice set generation ( 28 ). While this method enhanced the spatial realism of choice sets, it faced challenges in ensuring the relevance of included alternatives. Li et al. applied geographically stratified importance sampling for destination choice modeling in Maryland ( 29 ). Consistent with this, Shiftan used a stratified-by-importance method for generating choice sets in trip chaining models ( 30 ). This approach categorized alternatives by importance but often neglected less obvious yet significant alternatives, such as destinations that individuals might consider because of unique preferences or specific constraints. To elucidate this further, consider a situation where individuals, in the context of work destination choices, may select jobs in faraway locations because of limited local job opportunities or the need for specialized employment. 
Thus, although stochastic methods introduce flexibility, they tend to oversimplify the choice set formation process and struggle to fully reflect the actual consideration set of individuals.

Recent Advances in Choice Set Generation

As mentioned before, recent advances in choice set generation using ML and deep learning (DL) algorithms have been primarily applied in route choice models. Because of the similarities between route choice models and destination choice models, particularly in handling many alternatives, these advanced methods hold significant potential for application in choice set generation for destination choice models.

In this regard, Yao and Bekhor proposed a novel data-driven framework for choice set generation in route choice modeling by combining ML techniques with observed route data ( 8 ). Their method utilized clustering and classification to detect patterns in travelers' revealed preferences, enabling the generation of personalized and behaviorally realistic choice sets without relying on arbitrary rules. This method demonstrated high prediction accuracy and strong explanatory power compared with conventional methods. Another innovative approach is the "Neural Choice by Elimination" framework introduced by Tran et al. ( 31 ). This framework integrates DL into probabilistic sequential choice models and proved competitive against state-of-the-art learning-to-rank methods.

VAEs have been used for choice set generation in route choice modeling. Yao and Bekhor proposed using VAEs to generate choice sets by maximizing the likelihood of including chosen alternatives ( 12 ). This method effectively bridges ML methods with traditional choice models by incorporating implicit availability and perception of alternatives. Liu et al. proposed a hybrid choice set generation approach for route choice modeling using a conditional VAE ( 32 ). Their method builds on the earlier VAE framework by Yao and Bekhor ( 12 ) and incorporates individual- and OD-specific features as conditioning variables, allowing the model to generate alternatives that are more personalized and context-specific.

In summary, traditional choice set generation methods, whether deterministic or stochastic, face limitations that can affect their practicality and accuracy. Deterministic methods often exclude feasible alternatives because of strict criteria, while stochastic methods, despite their flexibility, may still struggle to retrieve the true consideration set and rely on simplified representations of the decision-making process. Both approaches have inherent challenges that can affect the reliability of destination choice models. Recent advances in using ML methods for choice set generation highlight the potential of these techniques to enhance the predictive power and reliability of choice models. Despite these advances, there remains a significant gap in evaluating such methods for choice set generation in destination choice modeling. VAEs are self-supervised, which is advantageous for handling unlabeled data; they are effective at learning the latent distribution of observed samples; and they have shown promising results in previous studies. This paper therefore implements a VAE approach to generate choice sets in destination choice modeling, and our findings demonstrate the effectiveness of this approach in this new context.

Methodology

The methodology for choice set generation in destination choice modeling consists of three main components: choice set generation using VAEs, computing the implicit availability/perception of alternatives, and the discrete choice model for destination choice prediction. This section closely follows Yao and Bekhor and we refer interested readers to their paper for further discussions and in-depth explanations of the mathematical foundations ( 12 ).

Choice Set Generation Using VAE

The VAE model architecture comprises an encoder and a decoder. The encoder maps observed alternatives to a lower-dimensional latent space capturing the essential features of the alternatives. The decoder reconstructs the alternatives from the latent space, generating new samples consistent with the observed data.

The encoder $Φ$ is an inference model that transforms the observed input data into latent space. Mathematically, the encoder network maps the input data $j$ to a latent representation $z$ with the parameters $ϕ$ :

p_{ϕ} (z ∣ j) = N (z; μ_{ϕ} (j), σ_{ϕ}^{2} (j))

(1)

where

$μ_{ϕ} (j)$ = the mean of the latent variable $z$ for the input data $j$ , and

$σ_{ϕ}^{2} (j)$ = the variance of the latent variable $z$ for the input data $j$ .
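Drawing $z$ from Equation 1 is usually done with the reparameterization trick, $z = μ_{ϕ}(j) + σ_{ϕ}(j) ε$ with $ε ∼ N(0, I)$, which keeps the sample differentiable with respect to the encoder parameters. The section does not state this detail, so the plain-Python sketch below is an assumption; it also evaluates $\ln p_{ϕ}(z ∣ j)$ for a diagonal Gaussian, a quantity needed later for the BC metric. The function names are hypothetical, and the encoder is assumed to output a log-variance per latent dimension.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps, eps ~ N(0, 1), per latent dimension.

    mu and log_var stand for the encoder outputs mu_phi(j) and log sigma_phi^2(j).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def log_encoder_density(z, mu, log_var):
    """ln p_phi(z | j) for the diagonal Gaussian of Equation 1."""
    return sum(
        -0.5 * (math.log(2 * math.pi) + lv + (zi - m) ** 2 / math.exp(lv))
        for zi, m, lv in zip(z, mu, log_var)
    )
```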

The decoder $Θ$ is a generative model that reconstructs the observed input data as $j^{'}$ from the latent variables. The decoder network maps the latent variables $z$ back to the data space with the parameters $θ$ :

q_{θ} (j^{'} ∣ z) = N (j^{'}; μ_{θ} (z), σ^{2} I)

(2)

where

$μ_{θ} (z)$ = the mean of the reconstructed data,

$σ^{2}$ = the variance, and

I = the identity matrix.
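Evaluating the decoder density of Equation 2 is a one-liner, and writing it out shows why, with a fixed $σ^{2}$, maximizing the decoder likelihood is equivalent to minimizing the squared reconstruction error that appears in the training objective. The sketch assumes plain Python lists for $j$ and $μ_{θ}(z)$; the function name is hypothetical.

```python
import math

def log_decoder_likelihood(j, mu_theta, sigma2):
    """ln q_theta(j | z) = ln N(j; mu_theta(z), sigma2 * I).

    j is the observed (normalized) alternative and mu_theta the decoder mean
    for a given z; both are sequences of the same length d.
    """
    d = len(j)
    sq_err = sum((a - b) ** 2 for a, b in zip(j, mu_theta))
    # -d/2 * ln(2*pi*sigma2) is constant in the parameters; only sq_err varies
    return -0.5 * (d * math.log(2 * math.pi * sigma2) + sq_err / sigma2)
```

Apart from the constant first term, the log-likelihood is $-\lVert j - μ_{θ}(z) \rVert^{2} / (2σ^{2})$, a rescaled negative squared error.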

The VAE model aims to maximize the likelihood of including chosen alternatives in the generated choice set by learning the underlying distribution of the alternatives. In this study, that distribution is the distance distribution of chosen trip destinations across census tracts in the study area: the normalized distance between origin and destination zones is the primary attribute. This focus ensures that the generated alternatives follow realistic spatial patterns observed in the data, consistent with the critical role of distance in destination choice modeling. While the VAE framework can incorporate multiple distributions, such as population density, total residents, and trip density, this study focuses on distance as a foundational attribute for model evaluation. Including additional distributions in the latent space could enrich the generated choice sets and is an important direction for future work.
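The section does not specify how origin-destination distances are measured or normalized. The sketch below assumes straight-line distances between projected census-tract centroids and min-max scaling to [0, 1]; both choices, and the function names, are illustrative only.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two projected centroids (x, y), e.g., in km."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```

In practice the scaling bounds would be fixed on the training data and reused for test observations, so that generated and observed distances share one scale.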

The objective function, specifically formulated in this paper for training the VAE, includes three main components: the reconstruction loss, the Kullback-Leibler (KL) divergence, and the total correlation regularizer. This combination helps in learning a compact representation of the data while promoting independence among the latent variables. The objective function is expressed as:

L = E_{p_{ϕ} (z ∣ j)} [MSE (j, j^{'})] + KL (p_{ϕ} (z ∣ j) ∥ p (z)) + TC (z, μ, \log σ^{2})

(3)

where

$L$ = total loss,

$E_{p_{ϕ} (z ∣ j)}$ = the expected value over the approximate posterior distribution $p_{ϕ} (z ∣ j)$ defined by the encoder (Equation 1),

$MSE (j, j^{'})$ = the mean squared error between the input $j$ and the reconstructed output $j^{'}$ ,

$KL (p_{ϕ} (z ∣ j) ∥ p (z))$ = the Kullback-Leibler divergence between the approximate posterior distribution $p_{ϕ} (z ∣ j)$ and the prior distribution $p (z)$ , and

$TC (z, μ, \log σ^{2})$ = the total correlation term (which regularizes the independence among the latent variables).

The reconstruction loss (mean squared error) measures how well the model reconstructs the input data, ensuring that the generated alternatives are close to the original ones ( 10 ). The KL divergence pulls the approximate posterior toward the standard normal prior, promoting a smooth latent space from which meaningful samples can be drawn ( 10 ). The total correlation term further encourages independence among the latent variables, reducing redundancy and improving the quality of the latent representations ( 33 ). This objective function is optimized using stochastic gradient descent with the Adam optimizer ( 34 ). The parameters of the encoder and decoder networks are updated to minimize this loss, effectively learning the distribution of the alternatives and enabling the generation of realistic and diverse choice sets. The model is trained by drawing multiple samples from the approximate posterior distribution and using these samples to calculate the expected reconstruction loss and regularization terms.
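To show how the three terms of Equation 3 can be computed for one minibatch, here is a plain-Python sketch. Assumptions: a diagonal Gaussian encoder with a standard normal prior (so the KL term has a closed form), unit weights on all three terms as Equation 3 is written, and a total correlation estimated with the minibatch-weighted sampling estimator known from the β-TCVAE literature. The section does not say which TC estimator the authors use, and in practice an autodiff framework with Adam would minimize this quantity; all names are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_normal(x, mu, log_var):
    """Log-density of a univariate normal parameterized by its log-variance."""
    return -0.5 * (math.log(2 * math.pi) + log_var + (x - mu) ** 2 / math.exp(log_var))

def mse(j, j_rec):
    """Mean squared reconstruction error for one observation."""
    return sum((a - b) ** 2 for a, b in zip(j, j_rec)) / len(j)

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)) for one observation."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))

def total_correlation(z, mu, log_var, dataset_size):
    """Minibatch-weighted-sampling estimate of TC(z) for one batch.

    z, mu, log_var: one list per batch item, each with one value per latent dimension.
    dataset_size: number of training observations N.
    """
    batch = len(z)
    dims = len(z[0])
    log_nm = math.log(dataset_size * batch)
    tc = 0.0
    for i in range(batch):
        # log density of z_i (per dimension) under the encoder of every batch item b
        per_dim = [[log_normal(z[i][k], mu[b][k], log_var[b][k]) for k in range(dims)]
                   for b in range(batch)]
        log_q_joint = logsumexp([sum(row) for row in per_dim]) - log_nm
        log_q_marginals = sum(
            logsumexp([per_dim[b][k] for b in range(batch)]) - log_nm for k in range(dims)
        )
        tc += log_q_joint - log_q_marginals
    return tc / batch

def vae_loss(j_batch, j_rec_batch, z, mu, log_var, dataset_size):
    """Equation 3 averaged over a batch: reconstruction MSE + KL + TC (unit weights)."""
    batch = len(j_batch)
    rec = sum(mse(a, b) for a, b in zip(j_batch, j_rec_batch)) / batch
    kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mu, log_var)) / batch
    return rec + kl + total_correlation(z, mu, log_var, dataset_size)
```

With a single latent dimension the joint and marginal densities coincide, so this TC estimate is exactly zero; the term only matters when $z$ has several dimensions.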

The training procedure for the VAE involves several key steps. First, the attributes of the observed alternatives are normalized to ensure consistent scaling across features. Next, the encoder maps the preprocessed input data into a lower-dimensional latent space, capturing the underlying structure of the alternatives in a compact set of latent variables. Once these latent representations are learned, the decoder generates new alternatives from samples of this latent space, so that they follow the same underlying distribution as the observed data. Finally, the model optimizes an importance-weighted log-likelihood objective, ensuring that the reconstructed and generated alternatives remain statistically consistent with the patterns observed in the original dataset.

Implicit Perception of Alternatives

In choice modeling contexts, individuals typically do not explicitly reveal all alternatives they consider; instead, some alternatives are implicitly perceived or considered within their choice sets without direct indication. We define the implicit availability/perception of alternatives as the likelihood that an alternative is implicitly included within an individual’s choice set, even though such perception is not directly observable or explicitly stated. This approach assumes that each alternative has an implicit degree of availability or perception represented as a probability. To quantify these implicit perception probabilities, we use the Bayesian conditional (BC) metric ( $B C_{nj}$ ), which estimates the probability that a given alternative is implicitly considered, based on latent representations learned from observed choices through a VAE. Formally, the log-likelihood of implicitly perceiving alternative $j$ is expressed as:

\begin{aligned} \ln q_{θ} (j) &= \ln \int p (z ∣ j) \frac{p (z) q_{θ} (j ∣ z)}{p (z ∣ j)} dz \\ &= \ln E_{p (z ∣ j)} [\frac{p (z) q_{θ} (j ∣ z)}{p (z ∣ j)}] \end{aligned}

(4)

where

$q_{θ} (j)$ = the marginal likelihood of observing alternative $j$ , obtained by integrating the decoder's conditional probability $q_{θ} (j | z)$ against the prior $p (z)$ ; Equation 4 rewrites this integral as an expectation under the posterior $p (z | j)$ .

Because the posterior distribution $p (z | j)$ is intractable, we use a variational distribution $p_{ϕ} (z | j)$ as a practical approximation. Applying Jensen's inequality then yields a lower bound on the log-likelihood, following the importance-weighted autoencoder objective ( 35 ):

\ln q_{θ} (j) \geq E_{z_{1}, \ldots, z_{S} \sim p_{ϕ} (z ∣ j)} [\ln \frac{1}{S} \sum_{i = 1}^{S} \frac{p (z_{i}) q_{θ} (j ∣ z_{i})}{p_{ϕ} (z_{i} ∣ j)}]

(5)

Consequently, the $B C_{nj}$ term is computed via Monte Carlo approximation using $S$ samples drawn from the approximate posterior distribution:

\ln B C_{nj} \approx \ln (\frac{1}{S} \sum_{i = 1}^{S} \frac{p (z_{i}) q_{θ} (j ∣ z_{i})}{p_{ϕ} (z_{i} ∣ j)})

(6)

Further theoretical details and broader context are comprehensively discussed by Yao and Bekhor ( 12 ). The process of choice set generation and implicit perception estimation is summarized as follows.

Alternative Generation Steps

1. Draw $z$ at random from the prior distribution $p (z)$ .

2. Draw a new alternative $j$ from the decoder $Θ$ , $q_{θ} (j | z)$ , given $z$ from step 1.
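The two generation steps can be sketched in plain Python as follows, assuming a standard normal prior and a Gaussian decoder with a fixed standard deviation. `toy_decoder_mean` is a hypothetical stand-in for a trained decoder that maps a one-dimensional $z$ to a normalized distance; how generated attribute values are matched to specific census tracts is not described in this section and is not shown.

```python
import math
import random

def generate_alternatives(decoder_mean, latent_dim, sigma, n, rng):
    """Generate n alternatives following the two generation steps.

    decoder_mean(z) returns mu_theta(z); sigma is the decoder standard
    deviation (the square root of sigma^2 in Equation 2).
    """
    alternatives = []
    for _ in range(n):
        # Step 1: draw z from the standard normal prior p(z)
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Step 2: draw j from q_theta(j | z) = N(mu_theta(z), sigma^2 I)
        mu = decoder_mean(z)
        alternatives.append([m + sigma * rng.gauss(0.0, 1.0) for m in mu])
    return alternatives

def toy_decoder_mean(z):
    """Hypothetical decoder: squashes a 1-D z into (0, 1), read as a normalized distance."""
    return [1.0 / (1.0 + math.exp(-z[0]))]
```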

Implicit Perception Estimation Steps

1. Draw $z$ from the encoder $Φ$ , $p_{ϕ} (z | j)$ , for the generated/chosen alternative $j$ .

2. Evaluate the likelihood $q_{θ} (j | z)$ of the decoder $Θ$ at the $z$ from step 1.

3. Evaluate the prior density $p (z)$ at the same $z$ .

4. Compute $B C_{j} = \frac{p (z) q_{θ} (j | z)}{p_{ϕ} (z | j)}$ .

5. Repeat steps 1–4 for $S$ random draws.

6. Calculate $\ln B C_{nj}$ using Equation 6.
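A minimal sketch of the estimation steps (Equation 6) follows, under simplifying assumptions: a scalar alternative $j$ and scalar $z$, a standard normal prior, a Gaussian encoder returning a mean and a variance, and a Gaussian decoder with fixed variance. In steps 2 and 3 the decoder likelihood and the prior are evaluated at the sampled $z$, as Equation 6 requires. The ratio is accumulated in log space with a log-sum-exp so that small densities do not underflow. The names `ln_bc`, `encoder`, and `decoder_mean` are hypothetical.

```python
import math
import random

def log_normal(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ln_bc(j, encoder, decoder_mean, decoder_var, num_samples, rng):
    """Monte Carlo estimate of ln BC_nj (Equation 6) for a scalar alternative j.

    encoder(j) -> (mean, variance) of p_phi(z | j); decoder_mean(z) -> mu_theta(z).
    """
    enc_mu, enc_var = encoder(j)
    log_w = []
    for _ in range(num_samples):                              # step 5: S draws
        z = rng.gauss(enc_mu, math.sqrt(enc_var))             # step 1: z ~ p_phi(z | j)
        log_lik = log_normal(j, decoder_mean(z), decoder_var) # step 2: ln q_theta(j | z)
        log_prior = log_normal(z, 0.0, 1.0)                   # step 3: ln p(z)
        log_post = log_normal(z, enc_mu, enc_var)             # ln p_phi(z | j)
        log_w.append(log_prior + log_lik - log_post)          # step 4: ln of one ratio
    return logsumexp(log_w) - math.log(num_samples)           # step 6: ln of the mean
```

A useful sanity check: for a linear-Gaussian toy model (prior $N(0, 1)$, decoder $N(z, s^{2})$) the exact posterior is $N(j/(1+s^{2}), s^{2}/(1+s^{2}))$. Using it as the encoder makes every weight equal to the marginal $N(j; 0, 1+s^{2})$, so the estimate is exact even for small $S$; a cruder encoder still converges to the same value, only more slowly.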

To validate the generated choice sets, we adopted a two-step approach. First, we compared the distribution of generated alternatives with the distribution of chosen alternatives in the dataset. This comparison involved visual methods, such as density plots, and statistical measures. These methods were employed to assess how well the generated choice sets represent observed behavior.
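The section does not name the statistical measures used. As one possible example (an assumption, not necessarily the authors' choice), the two-sample Kolmogorov-Smirnov statistic measures the largest gap between the empirical distribution functions of generated and chosen normalized distances: 0 means identical empirical CDFs, and 1 means the samples do not overlap at all. The function name is hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # advance both samples past every value <= x (handles ties)
        while i < n_a and a[i] <= x:
            i += 1
        while j < n_b and b[j] <= x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d
```

Once one sample is exhausted, the remaining steps can only shrink the gap, so the loop can stop there.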

In the second step of validation, the generated alternatives were used to create choice sets for individuals, incorporating the $B C_{j}$ term into the destination choice model. The $B C_{j}$ term adjusts the likelihood of alternatives being included in the consideration set. This integration allowed us to evaluate the VAE-based model against traditional methods for choice set generation, enabling a direct comparison of their performance.

To provide robust benchmarks, we included two traditional methods for choice set generation: simple random sampling and importance sampling. Simple random sampling serves as a baseline, where alternatives are selected purely at random, without any behavioral considerations. The importance sampling method was implemented following a gravity-based framework introduced by Ben-Akiva and Lerman ( 1 ). In this framework, the probability of selecting a destination zone is proportional to its size and inversely proportional to its distance from the traveler’s origin. This ensures that larger and closer zones are more likely to be included in the choice set, reflecting the behavioral tendency of travelers to prefer nearby, attractive destinations. The probability of choosing a destination zone $i$ for a given origin $o$ is expressed as:

P (i) \propto {\tilde{M}}_{i} e^{- α d_{oi}}

(7)

where

${\tilde{M}}_{i}$ = the size or attractiveness of the destination zone, such as its population density or area,

$d_{oi}$ = the distance between the origin and destination zones, and

$α$ = a parameter capturing the distance decay effect, that is, how the likelihood of selecting a destination decreases with increasing distance; following Ben-Akiva and Lerman, $α = \frac{2}{\bar{d}}$ , where $\bar{d}$ is the mean trip distance ( 1 ).

The implementation of importance sampling involves several steps. First, a size measure, such as population density or area, is assigned to each destination zone. When specific size data is unavailable, zones are assumed to have equal size, making distance the primary determinant of selection probability. The distance decay parameter $α$ is then calculated using the mean trip distance to capture how the likelihood of choosing a destination decreases with increasing distance. Using these parameters, the unnormalized probability for each destination is calculated for every traveler’s origin, and these probabilities are normalized across all destinations to produce a valid probability distribution. Once the probability distribution is established, alternatives are sampled to generate a fixed-size choice set for each traveler. The sampling process ensures that higher-probability destinations are more frequently included in the choice set, creating a realistic representation of feasible alternatives. To maintain accuracy and consistency, the traveler’s actual chosen destination is always included in the choice set by replacing one of the sampled alternatives if necessary. This process allowed us to compare the VAE-based method against both simple random sampling and importance sampling under consistent conditions. Detailed results from this evaluation are presented in the Results and Discussion section.
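The importance-sampling benchmark (Equation 7) can be sketched as follows. Assumptions not stated in the section: zones have IDs, distances come from a precomputed origin-destination table, sampling is weighted and without replacement, and when the chosen destination is missing it replaces a randomly selected sampled alternative. Zones absent from `size` default to a size of 1, matching the equal-size fallback described above. All function names are hypothetical.

```python
import math
import random

def alpha_from_mean_distance(trip_distances):
    """Distance decay parameter alpha = 2 / mean trip distance."""
    return 2.0 / (sum(trip_distances) / len(trip_distances))

def importance_probabilities(origin, zones, dist, size, alpha):
    """Normalized P(i) proportional to M_i * exp(-alpha * d_oi) over all destination zones."""
    w = {i: size.get(i, 1.0) * math.exp(-alpha * dist[(origin, i)]) for i in zones}
    total = sum(w.values())
    return {i: wi / total for i, wi in w.items()}

def importance_choice_set(origin, chosen, zones, dist, size, alpha, k, rng):
    """Sample k distinct zones by importance, then make sure the chosen zone is included."""
    probs = importance_probabilities(origin, zones, dist, size, alpha)
    remaining = list(zones)
    sampled = []
    for _ in range(k):
        pick = rng.choices(remaining, weights=[probs[z] for z in remaining], k=1)[0]
        sampled.append(pick)
        remaining.remove(pick)
    if chosen not in sampled:
        sampled[rng.randrange(k)] = chosen  # replace one sampled alternative
    return sampled
```

Closer and larger zones receive higher weights, so they appear in the sampled sets more often than under simple random sampling.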

Discrete Choice Model for Destination Prediction

We employ an MNL model for destination choice prediction. The utility $U_{in}$ of individual $n$ for alternative $i$ can be written as:

U_{in} = V_{in} + \ln B C_{ni} + ϵ_{in}

(8)

where

$V_{in}$ = the systematic part of the utility function ($V_{in} = α + \sum_{a} β_{a} x_{ain}$, where $α$ = a constant, $β_{a}$ = generic coefficient for attribute $a$, and $x_{ain}$ = value of attribute $a$ for alternative $i$ of individual $n$),

$\ln BC_{ni}$ = the logarithm of the implicit perception (BC metric) of alternative $i$ by individual $n$, and

$ϵ_{in}$ = the random error term.

The probability of choosing alternative $i$ given the choice set $C_{n}$ is:

P_{ni} = \frac{B C_{ni} \exp (V_{in})}{\sum_{j \in C_{n}} B C_{nj} \exp (V_{jn})}

(9)
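Equation 9 follows from Equation 8 because $\exp(V_{in} + \ln BC_{ni}) = BC_{ni} \exp(V_{in})$. The short NumPy sketch below computes these probabilities for one individual; the function name is hypothetical, and the max-subtraction is a standard numerical-stability step that leaves the probabilities unchanged.

```python
import numpy as np


def mnl_probabilities(V, BC):
    """Choice probabilities of Equation 9 for one individual.

    V  : systematic utilities V_in for the alternatives in C_n
    BC : implicit perception weights BC_ni (must be > 0)
    """
    # Adding ln(BC) to V is equivalent to multiplying exp(V) by BC.
    adjusted = V + np.log(BC)
    adjusted -= adjusted.max()  # subtract the max to avoid overflow in exp
    expu = np.exp(adjusted)
    return expu / expu.sum()
```

With equal utilities, an alternative whose BC weight is twice that of the others receives twice their probability: `mnl_probabilities(np.zeros(3), np.array([2.0, 1.0, 1.0]))` gives `[0.5, 0.25, 0.25]`.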

Figure 1 illustrates the methodology employed in this paper for choice set generation and destination choice modeling using a VAE. The process starts with data preprocessing to normalize the data. The VAE generates choice sets through an encoder, which compresses input data into latent variables, and a decoder, which reconstructs the data. The encoder consists of two hidden layers producing the mean ($μ_{ϕ}(j)$) and log-variance ($\log σ_{ϕ}^{2}(j)$), while the decoder reconstructs the data from a sampled $z$. The implicit perception of alternatives is computed using the BC metric. By integrating the generated choice sets and implicit availabilities into the discrete choice model, we estimate MNL destination choice models on a training subset. The estimated models are then applied to the testing subset to evaluate prediction performance.

Figure 1.

Process of implementing the suggested variational autoencoder (VAE) approach in destination choice modeling.

Study Area and Data

The primary focus of this study is the Island of Montreal, a part of the Greater Montreal metropolitan area. The study area covers a population of approximately 1.9 million people over an area of 462 square kilometers.

The dataset used in this study comes from the Enquête Origine-Destination 2018 (origin-destination, OD) survey, which has been conducted every 5 years since 1970 ( 36 ). For the first time in 2018, the survey included an online questionnaire alongside traditional telephone interviews. The survey targeted weekday travel behavior and involved approximately 70,000 households, representing nearly 170,000 individuals and accounting for 393,826 trips. Participants were randomly selected and asked detailed questions about their household characteristics, individual demographics, and travel behavior, capturing all trips made by each household member on the day before the interview.

The OD data covered the Greater Montreal area, but in this study, we filtered our data and focused on the Island of Montreal, as most trip destinations were located within this area. Moreover, the data were filtered to focus specifically on home-based work trips. Individuals under 16 years old and trips with unreasonable travel times (exceeding 3 hours or having zero travel time) were excluded from the analysis. The final database comprised 19,389 observations, representing 4.92% of the total dataset. Figure 2 shows the density distribution of each zone selected as a destination in the study area.
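The exclusion rules above can be expressed as a single filter. The sketch below is illustrative only: the `Trip` record and its field names are hypothetical stand-ins, since the OD survey's actual variable names and coding differ.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical fields; the OD survey's real variable names differ.
    purpose: str            # e.g. "home-based work"
    age: int                # traveler's age in years
    travel_time_min: float  # reported travel time in minutes
    on_island: bool         # trip falls within the Island of Montreal


def filter_trips(trips):
    """Apply the study's exclusion rules to a list of Trip records."""
    return [
        t for t in trips
        if t.on_island
        and t.purpose == "home-based work"
        and t.age >= 16                   # exclude individuals under 16
        and 0 < t.travel_time_min <= 180  # drop zero or > 3 h travel times
    ]
```

As a consistency check on the reported share, 19,389 retained observations out of 393,826 trips is about 4.92%.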

Figure 2.

Density (per square kilometer) distribution of each zone selected as a destination in the study area.

Census tract data further enriched the dataset, providing additional attributes such as population, number of dwellings, employment rate, density, and the number of occupations in different categories. The study area was divided into 533 census tract zones, forming the basis for analyzing travel patterns and modeling destination choices on the Island of Montreal. Figure 3 illustrates the total number of residents’ occupations (all categories) and the population density per square kilometer in each census tract of the study area.

Figure 3.

Distribution of: (a) total number of residents’ occupations (all categories) by census tract, and (b) population density (per square kilometer) by census tract on the Island of Montreal.

Travel time and distance were initially considered as impedance measures for each trip between the zone centroids. However, because of a high correlation between distance and travel time, the models were estimated using only the distance attribute. Table 1 provides summary statistics of the data and Table 2 details the attributes used in the estimated destination choice models.

Table 1.

Summary Statistics of the Data Features

	Metric
Attribute	Average	SD	Min.	Max.
Distance between zones (km)	14	9	0	56
Total zone population (2016 census)	3,644	1,732	0	8,183
Total private dwellings	1,762	798	0	4,887
Population density (per square kilometer)	8,258	5,497	0	50,278
Employment rate (%)	58	12	0	85
Total number of all occupations	1,843	815	0	4,415

Note: Max. = maximum; Min. = minimum; SD = standard deviation.

Table 2.

Summary of Attributes Applied in Destination Choice Models

Attribute	Description
B_DIST	Distance between zones (m)
B_POP	Total zone population (2016 census)
B_TPD	Total private dwellings
B_PDENS	Population density (per square kilometer)
B_EMP	Employment rate
B_MGMT	Total number of management occupations
B_BFA	Total number of business, finance, and administration occupations
B_NAS	Total number of natural and applied sciences and related occupations
B_HLTH	Total number of health occupations
B_ELSCG	Total number of occupations in education, law and social, community, and government services
B_ACRS	Total number of occupations in art, culture, recreation, and sport
B_SALES	Total number of sales and service occupations
B_TRADE	Total number of trades, transport and equipment operators, and related occupations
B_NRA	Total number of natural resources, agriculture, and related production occupations
B_MFG	Total number of occupations in manufacturing and utilities

Results and Discussion

To train the VAE, a vector of normalized distances between the origin and all destinations is used as the attributes for each chosen alternative. The VAE is then trained to learn the distribution of chosen alternatives. Based on this distribution, 30 new alternatives are generated for each observation, resulting in a total of 581,670 generated alternatives (30 alternatives per observation × 19,389 observations). The BC term for these generated alternatives is calculated and incorporated into Equation 8 to predict destination choices. We normalized the input data using MinMaxScaler and used the Tanh activation function in the hidden layers of both the encoder and decoder. Batch normalization and a dropout rate of 0.5 were applied to improve training stability and reduce overfitting. Weights were initialized using He initialization ( 37 ). The latent space dimension was set to 20, with hidden layers of 128 neurons each. The encoder consists of two hidden layers, while the decoder has an additional Softplus layer to ensure that the generated values are non-negative, since the input values represent distances. To determine the optimal configuration, we tested a range of hyperparameters, as shown in Table 3. The final chosen values are underlined in the table.
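A minimal PyTorch sketch of the architecture described above follows. It assumes 533-dimensional inputs (one scaled distance per census tract), a squared-error reconstruction term, and a decoder that mirrors the two-layer encoder; these choices, the class name `DistanceVAE`, and the helpers `_block` and `vae_loss` are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


def _block(d_in, d_out, p_drop):
    # Linear -> BatchNorm -> Tanh -> Dropout, with He (Kaiming normal) weights.
    lin = nn.Linear(d_in, d_out)
    nn.init.kaiming_normal_(lin.weight)
    return nn.Sequential(lin, nn.BatchNorm1d(d_out), nn.Tanh(), nn.Dropout(p_drop))


class DistanceVAE(nn.Module):
    """Hypothetical VAE over MinMax-scaled distance vectors."""

    def __init__(self, n_features=533, hidden=128, latent=20, p_drop=0.5):
        super().__init__()
        # Encoder: two hidden layers, then linear heads for mu and log-variance.
        self.encoder = nn.Sequential(_block(n_features, hidden, p_drop),
                                     _block(hidden, hidden, p_drop))
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Decoder (assumed to mirror the encoder) ends in Softplus so outputs are >= 0.
        self.decoder = nn.Sequential(_block(latent, hidden, p_drop),
                                     _block(hidden, hidden, p_drop),
                                     nn.Linear(hidden, n_features), nn.Softplus())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

    @torch.no_grad()
    def generate(self, n):
        """Decode n draws from the N(0, I) prior into new alternatives."""
        self.eval()  # use running BatchNorm statistics and disable dropout
        z = torch.randn(n, self.to_mu.out_features)
        return self.decoder(z)


def vae_loss(x_hat, x, mu, logvar):
    # ELBO-style loss: squared reconstruction error plus KL(q(z|x) || N(0, I)).
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training would follow the usual loop with an optimizer such as `torch.optim.Adam(model.parameters(), lr=0.001)`; the choice of Adam is an assumption, since only the learning rate is reported in Table 3.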

Table 3.

Specifications of Neural Network Variational Autoencoder Model

Item	Values
Data normalization	StandardScaler, MinMaxScaler
Activation function	ReLU, Tanh
Encoder hidden layers	1, 2, 3
Decoder hidden layers	1, 2, 3
Initialization	He initialization (Kaiming normal)
Dropout	True (p = 0.5)
Batch normalization	True
Batch size	32, 64, 128
Latent space dimension	5, 10, 20
Hidden layer dimensions	64, 128, 256
Learning rate	0.001
Random draws (S) in Monte Carlo	100

The primary criteria for evaluating each hyperparameter combination were the alignment of the generated alternatives’ distribution with the chosen alternatives’ distribution and the final log-likelihood of the destination choice model. We visually and statistically compared the density plots of generated and chosen alternatives to assess how well the VAE model captured the underlying data distribution, and we used higher log-likelihood values to indicate better performance in capturing the observed data. Based on these evaluation criteria, we selected the hyperparameters that resulted in the generated alternatives having a distribution most similar to that of the chosen alternatives and achieved the highest log-likelihood in the destination choice model.
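The search space in Table 3 can be enumerated as follows. This sketch assumes an exhaustive grid over the listed values (the paper does not state that every combination was tried); the fixed settings (He initialization, dropout 0.5, batch normalization, learning rate 0.001) are left out of the grid. The scoring function is a hypothetical placeholder, because the paper does not specify how distributional similarity and final log-likelihood were combined into a single ranking.

```python
from itertools import product

# Candidate values from Table 3 (searched items only).
SEARCH_SPACE = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "activation": ["ReLU", "Tanh"],
    "encoder_layers": [1, 2, 3],
    "decoder_layers": [1, 2, 3],
    "batch_size": [32, 64, 128],
    "latent_dim": [5, 10, 20],
    "hidden_dim": [64, 128, 256],
}


def grid(space):
    """Yield every hyperparameter combination as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best(space, score):
    """Return the configuration maximizing a user-supplied score(config)."""
    return max(grid(space), key=score)
```

A full grid over these values contains 2 × 2 × 3⁵ = 972 configurations, each requiring a VAE training run and a choice model estimation.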

Alternative Generation Using VAE

To assess how well the alternatives generated by the VAE reproduce the distribution of the chosen alternatives, we mapped each generated alternative to an actual destination zone using a KDTree algorithm for efficient nearest-neighbor search ( 38 ). A KDTree is constructed from the scaled values of the original dataset and queried to identify the closest destination for each generated alternative. Parallel processing is used to speed up this mapping.
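The mapping step can be sketched with SciPy's `cKDTree`, whose `query` method accepts a `workers` argument (SciPy 1.6 or later) that parallelizes queries across CPU cores. The function name and the feature layout of the inputs are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def map_to_nearest_zone(zone_features, generated, workers=-1):
    """Map each generated alternative to the index of the closest real zone.

    zone_features : (n_zones, d) scaled attributes of the real destinations
    generated     : (n_generated, d) attributes produced by the VAE decoder
    """
    tree = cKDTree(zone_features)
    # k=1 returns the single nearest zone; workers=-1 uses all CPU cores.
    _, idx = tree.query(generated, k=1, workers=workers)
    return idx
```

The returned indices identify the zones that populate each generated choice set.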

Evaluating the quality of generated choice sets is inherently challenging because true consideration sets are unobservable. In our study, we compared the distribution of chosen alternatives from the survey with the distribution of alternatives generated by the VAE. This comparison is not common in the literature because traditional deterministic and stochastic methods do not explicitly aim to capture the underlying distribution of consideration sets. By contrast, the VAE approach seeks to approximate the true distribution of individuals’ consideration sets, which makes this comparison a meaningful way to evaluate its performance.

Figure 4 compares the density of chosen alternatives with that of the alternatives generated by the VAE across all zones. The chosen alternatives show distinct patterns, and the alignment of the peaks and variations in the density of generated alternatives with those of the chosen alternatives suggests that the VAE model learns the underlying data distribution well, enabling it to generate plausible and diverse alternatives. However, the trained VAE tends to produce higher density peaks in certain zones, most noticeably at the first peak. This may occur because, when sampling from the latent space, the model can generate alternatives biased toward regions with higher prior probability, concentrating the generated density around popular zones. This bias could be beneficial for choice set generation, because zones with higher probabilities of being chosen are better represented in the choice sets. The quality of the generated alternatives is further validated by the estimation results of the destination choice model that uses them, discussed below. To quantitatively assess the similarity between the two distributions, we employed two metrics: the histogram intersection index and the Jensen-Shannon (JS) divergence. The histogram intersection index, introduced by Swain and Ballard, measures the overlap between the density distributions of the chosen and generated alternatives ( 39 ). It yielded a score of 0.8634, indicating that 86.34% of the total area under the density curves of the chosen and generated alternatives overlaps. This high score suggests that the VAE model has effectively captured the overall pattern of the chosen alternatives’ distribution.

Figure 4.

Density plot of chosen and generated alternatives.

The JS divergence, introduced by Lin, is a symmetric measure of divergence between two probability distributions that provides an additional perspective on their similarity ( 40 ). It produced a value of 0.1402; with base-2 logarithms, the JS divergence is bounded between 0 and 1, and lower values indicate greater similarity. This relatively low divergence further confirms that the generated alternatives closely mimic the distributional characteristics of the chosen alternatives.
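Both similarity metrics can be computed from per-zone frequency counts of chosen and generated alternatives, for instance obtained with `np.bincount(zone_ids, minlength=533)`; this binning is an assumption, since the paper describes the comparison in terms of density curves. The sketch uses base-2 logarithms so that the JS divergence lies in [0, 1]. Note that `scipy.spatial.distance.jensenshannon` returns the JS *distance* (the square root of the divergence), not the divergence itself.

```python
import numpy as np


def histogram_intersection(p, q):
    """Swain-Ballard intersection of two normalized histograms (1 = identical)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return np.minimum(p, q).sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a = 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Two identical histograms give an intersection of 1 and a divergence of 0; two histograms with disjoint support give an intersection of 0 and a divergence of 1.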

Further validation is provided by the cumulative distribution functions (CDFs) of chosen and generated alternatives in Figure 5. The VAE-generated alternatives exhibit a cumulative distribution that closely follows the trajectory of the chosen alternatives’ CDF, further validating the effectiveness of the VAE model in replicating the observed choice behavior.

Figure 5.

Cumulative distribution functions of chosen and generated alternatives.

To provide additional support for the behavioral realism of choice sets generated by different methods, we analyzed how frequently each generated set included the actual chosen destination. Figure 6 compares the VAE-based method, importance sampling, and simple random sampling, each generating sets of 30 alternatives per individual. The VAE-generated choice sets included the actual chosen alternative in approximately 25.15% of cases, substantially more often than importance sampling (7.66%) and simple random sampling (5.59%). These results further suggest that the VAE approach generates choice sets that more closely reflect observed decision-making behavior.
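The inclusion frequency can be computed as below. The sketch assumes the rate is measured on the raw generated sets, before any step that forces the chosen destination into the set (otherwise every method would score 100%), and that VAE outputs have already been mapped to zone IDs; the function name is hypothetical.

```python
import numpy as np


def inclusion_rate(choice_sets, chosen):
    """Share of observations whose generated set contains the observed choice.

    choice_sets : (n_obs, set_size) integer zone IDs from a generation method
    chosen      : (n_obs,) observed destination zone IDs
    """
    choice_sets = np.asarray(choice_sets)
    chosen = np.asarray(chosen)
    # Compare each row against its own observed destination.
    hits = (choice_sets == chosen[:, None]).any(axis=1)
    return hits.mean()
```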

Figure 6.

Inclusion frequency of chosen alternatives by choice set generation method.

Model Estimates

The model estimates for the destination choice models provide a comparative analysis of the performance of four different models: the destination choice model with all alternatives available (DC-FA), the destination choice model with choice sets generated by a simple random sample method (DC-SRS), the destination choice model with choice sets generated by importance sampling (DC-IS), and the destination choice model with choice sets generated by VAE (DC-VAE). All these models follow the MNL framework.

Validation is essential to detect overfitting, in which the model fits the estimation dataset well but performs poorly on other data. The validation process involves estimating the model on a subset of the dataset and then applying the estimated parameters to the remaining validation subset. In this study, 80% of the database (15,517 of 19,389 observations) is used for estimation and 20% (3,872 of 19,389 observations) for validation. Prediction accuracy, the percentage of observations for which the predicted choice matches the observed choice, is used as the performance measure. Performance measures are detailed in Table 4, including final log-likelihood, $ρ^{2}$, and validation results as the percentage of correct predictions. In the validation process, the alternative with the highest predicted probability was labeled as the predicted choice. This aligns with standard practice in discrete choice modeling, grounded in the random utility maximization framework, where the alternative with the maximum utility is selected as the predicted outcome ( 41 , 42 ).
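The hold-out procedure and accuracy measure can be sketched as follows. A simple random permutation split is an assumption, since the paper does not describe how observations were assigned (the study reports 15,517 estimation and 3,872 validation observations); the function names are hypothetical.

```python
import numpy as np


def train_validation_split(n_obs, frac_train=0.8, seed=0):
    """Randomly partition observation indices into estimation and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    n_train = int(round(frac_train * n_obs))
    return idx[:n_train], idx[n_train:]


def prediction_accuracy(prob, chosen_pos):
    """Percentage of observations whose highest-probability alternative was chosen.

    prob       : (n_obs, n_alts) predicted probabilities P_ni over each choice set
    chosen_pos : (n_obs,) column index of the observed choice within each set
    """
    predicted = np.argmax(prob, axis=1)  # max-probability rule
    return 100.0 * np.mean(predicted == np.asarray(chosen_pos))
```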

Table 4.

Goodness-of-Fit and Prediction Performance Results

Criteria	DC-FA	DC-SRS	DC-IS	DC-VAE
LL(0)	−97,423.8	−53,285.2	−52,776.4	−46,138.3
LL(final)	−82,497.8	−39,815.2	−36,295.9	−41,566.3
$ρ^{2}$	0.1532	0.2528	0.3123	0.5733
Prediction accuracy (%)	10.82	30.10	41.19	49.61

The goodness-of-fit and prediction performance metrics provide a comparative evaluation of the destination choice models. DC-IS achieves the best (least negative) final log-likelihood at −36,295.9, followed by DC-SRS at −39,815.2, DC-VAE at −41,566.3, and DC-FA at −82,497.8. However, final log-likelihood alone does not fully capture overall model performance, because each model is estimated on different choice sets and therefore starts from a different null log-likelihood, LL(0). Each model must also be assessed in terms of its explanatory power and predictive accuracy, where clear differences are observed.

The $ρ^{2}$ value for DC-VAE is 0.5733, higher than 0.3123 for DC-IS, 0.2528 for DC-SRS, and 0.1532 for DC-FA. Because $ρ^{2}$ measures the improvement in log-likelihood relative to the null model, this result suggests that DC-VAE represents the underlying destination choice behavior more effectively. While DC-IS offers a substantial improvement over DC-SRS and DC-FA, DC-VAE offers the highest explanatory power among the models.

Prediction accuracy, the percentage of correctly predicted choices, highlights further distinctions between the models. DC-VAE achieves the highest accuracy at 49.61%, followed by 41.19% for DC-IS, 30.10% for DC-SRS, and 10.82% for DC-FA. The improved accuracy of DC-VAE underscores its ability to generate realistic choice sets that closely align with observed behavior, thereby enhancing predictive performance. DC-IS also demonstrates strong predictive accuracy compared with DC-SRS and DC-FA, confirming the effectiveness of probabilistic approaches such as importance sampling in improving predictive performance.

To investigate the practicality and scalability of the proposed VAE method in large-scale applications, we compared the computational cost of the different choice set generation methods (DC-FA, DC-SRS, DC-IS, and DC-VAE) and of the subsequent discrete choice model estimations. All experiments were executed on a system with an Intel Core i7-14700K CPU, 32 GB of RAM, and an NVIDIA RTX 4000 Ada GPU. The VAE model was implemented using PyTorch, while discrete choice models were estimated using the Apollo package (version 0.3.2) ( 43 ). The computational times for each method are summarized in Table 5.

Table 5.

Computational Times for Different Methods

Method	Choice set generation time	Model estimation time
DC-FA	Not applicable (no generation needed)	8 min 59 s
DC-SRS	Negligible (random draw)	14 min 20 s
DC-IS	70 s	23 min 40 s
DC-VAE	55 s (VAE training); 2 s (VAE inference)	14 min 58 s

The VAE approach has a non-trivial initial computational overhead for training (approximately 55 s for 100 epochs), but training is a one-time investment. Once trained, the VAE generates choice sets very efficiently, requiring only about 2 s, much faster than the importance sampling method (70 s). Although simple random sampling (DC-SRS) requires negligible computational effort because of its straightforward random draws, its predictive accuracy is considerably worse than that of the VAE-based approach. Furthermore, while DC-IS offers better predictive accuracy than DC-SRS, it requires substantially longer choice set generation and model estimation times and exhibits some behavioral inconsistencies in its parameter estimates. In contrast, DC-VAE strikes a balance: after its initial training period, it combines efficient choice set generation with superior predictive performance.

The parameter estimates for the destination choice models are presented in Table 6. All parameter estimates are statistically significant at the 5% level, as evidenced by absolute t-statistics exceeding the critical value of 1.96. In all models except DC-IS, the coefficient for distance (B_DIST) is negative, consistent with the well-established behavioral principle that individuals prefer closer destinations; with distance entering utility linearly, this implies a negative exponential relationship between distance and choice probability. This negative exponential form represents the diminishing likelihood of selecting a destination as its distance increases, reflecting travel cost minimization and distance decay effects. However, the positive B_DIST coefficient in DC-IS contradicts this expectation and warrants closer examination. In the importance sampling approach, alternatives closer to the origin are disproportionately oversampled. As a result, the actual chosen destination is often farther away than most of the sampled alternatives in its choice set. When the model estimates the likelihood of choosing a destination, it finds that longer distances are systematically associated with chosen alternatives relative to the sampled set, producing an artificially positive relationship between distance and destination selection. This is a methodological artifact rather than a genuine behavioral preference. Our findings underline that importance sampling, while offering better predictive performance than simpler methods such as random sampling, can introduce substantial methodological bias. Using importance sampling naively in choice set generation can therefore lead to unrealistic behavioral interpretations and compromised parameter validity.

Table 6.

Parameter Estimation Results

	DC-FA			DC-SRS			DC-IS			DC-VAE
Parameter	Estimate	SD	t-statistic	Estimate	SD	t-statistic	Estimate	SD	t-statistic	Estimate	SD	t-statistic
B_DIST	−0.00013	0.00000	−92.86505	−0.00013	0.00000	−87.92436	0.00005	0.00000	31.32098	−0.00012	0.00000	−54.90858
B_POP	−0.00052	0.00002	−27.69448	−0.00046	0.00002	−21.51371	−0.00079	0.00002	−32.12891	−0.00015	0.00003	−5.73683
B_TPD	0.00192	0.00002	101.73988	0.00186	0.00002	81.18517	0.00145	0.00002	60.04421	0.00128	0.00003	43.37709
B_PDENS	−0.00018	0.00000	−72.39896	−0.00018	0.00000	−64.69791	−0.00033	0.00000	−101.96581	−0.00016	0.00000	−47.44018
B_EMP	0.01199	0.00082	14.69555	0.01164	0.00090	12.88107	−0.03022	0.00145	−20.79715	0.00409	0.00117	3.51100
B_MGMT	na	na	na	na	na	na	0.00143	0.00013	10.63379	na	na	na
B_BFA	na	na	na	−0.00063	0.00016	−3.84728	−0.00054	0.00018	−3.01689	na	na	na
B_NAS	0.00380	0.00012	31.27627	0.00416	0.00015	28.65650	0.00599	0.00018	33.64031	0.00154	0.00020	7.78505
B_HLTH	−0.00103	0.00022	−4.63026	−0.00084	0.00024	−3.49030	na	na	na	−0.00417	0.00033	−12.75917
B_ELSCG	−0.00258	0.00016	−16.59753	−0.00269	0.00017	−16.00697	−0.00141	0.00016	−8.66114	−0.00082	0.00021	−4.02349
B_ACRS	−0.00195	0.00017	−11.28348	−0.00153	0.00019	−8.19437	na	na	na	0.00054	0.00023	2.31528
B_SALES	−0.00141	0.00011	−12.49068	−0.00132	0.00012	−11.03827	−0.00060	0.00012	−4.81167	−0.00101	0.00016	−6.28159
B_TRADE	−0.00483	0.00019	−26.08305	−0.00448	0.00020	−22.60898	−0.00257	0.00022	−11.71397	−0.00467	0.00024	−19.40658
B_NRA	0.00463	0.00105	4.38762	0.00379	0.00113	3.36400	0.00365	0.00115	3.16210	0.00652	0.00162	4.03712
B_MFG	0.00545	0.00028	19.78491	0.00484	0.00031	15.74534	0.00548	0.00030	18.26407	0.00516	0.00037	14.00625

Note: DC-FA = destination choice model considering all alternatives available; DC-IS = destination choice model with choice sets generated by importance sampling; DC-SRS = destination choice model with choice sets generated by a simple random sample method; DC-VAE = destination choice model with choice sets generated by variational autoencoder; SD = standard deviation. B_ACRS = Total number of occupations in art, culture, recreation, and sport; B_BFA = Total number of business, finance, and administration occupations; B_DIST = Distance between zones (m); B_ELSCG = Total number of occupations in education, law and social, community, and government services; B_EMP = Employment rate; B_HLTH = Total number of health occupations; B_MFG = Total number of occupations in manufacturing and utilities; B_MGMT = Total number of management occupations; B_NAS = Total number of natural and applied sciences and related occupations; B_NRA = Total number of natural resources, agriculture, and related production occupations; B_PDENS = Population density (per square kilometer); B_POP = Total zone population (2016 census); B_SALES = Total number of sales and service occupations; B_TPD = Total private dwellings; B_TRADE = Total number of trades, transport and equipment operators, and related occupations; na = not applicable.

Turning to the other attributes, the negative coefficient for total zone population (B_POP) across all models suggests that zones with larger populations are less likely to be chosen. This effect is strongest in the DC-IS model and weakest in the VAE-based model, reflecting differences in how these models account for population effects. For total private dwellings (B_TPD), the positive coefficient indicates that zones with more dwellings are more likely to be chosen, with the strongest effect in the DC-FA and DC-SRS models, followed by DC-IS, while the VAE-based model shows a more modest effect. The negative coefficient for population density (B_PDENS) suggests that higher-density zones are less attractive as destinations, with the effect most pronounced in the DC-IS model and least pronounced in the VAE-based model. The coefficient for employment rate (B_EMP) is positive in most models, indicating that zones with higher employment rates are generally more attractive. However, the DC-IS model shows a negative coefficient, which may reflect complex interactions within the choice sets generated through importance sampling. Occupation-related attributes, such as B_BFA, B_NAS, B_HLTH, and others, exhibit varying effects across the models, highlighting differences in how each method interprets these attributes. For instance, the positive coefficient for B_NAS is strongest in the DC-IS model, whereas the negative coefficient for B_HLTH is most pronounced in the VAE-based model. These variations underscore the differing strengths and limitations of each model in capturing the impact of destination attributes on choice behavior.

By closely examining the estimated parameters in Table 6 alongside the performance metrics in Table 4, we see that the DC-VAE model produced the most intuitive and realistic parameters, as well as showing strong overall performance. This alignment suggests the DC-VAE method better captures genuine decision-making behavior. In contrast, simpler methods such as DC-FA and DC-SRS, despite yielding generally reasonable parameter estimates, struggled to achieve similarly high predictive accuracy and model fit. This emphasizes that both behavioral realism and predictive performance need to be jointly considered when comparing alternative approaches. The DC-IS method fell between these extremes: it improved predictive performance compared with simpler methods but still exhibited some behavioral inconsistencies in its parameter estimates.

According to Bernardin et al., the most frequently used validation method for destination choice models involves comparing trip length frequency distributions ( 44 ). Figure 7 compares the density distributions of travel distances for the real data and the four destination choice models. The DC-FA model captures the general shape of the distribution but deviates noticeably, overestimating the frequency of destinations at short distances (around 5–10 km) and underestimating those at longer distances. Its peak is sharper than that of the real data, indicating poor alignment, and in some ranges it behaves irregularly. This behavior may arise because DC-FA treats all alternatives as equally available, disregarding behavioral or contextual factors such as distance or attractiveness, which often play a critical role in shaping destination choices. The DC-SRS model, while introducing stochasticity, shows significant discrepancies, particularly overestimating shorter distances and underestimating longer ones. It also fails to capture the variability observed in mid-range distances (10–20 km), resulting in a flatter overall distribution. The DC-IS model performs better than both DC-FA and DC-SRS, with a distribution closer to the real data. However, it slightly underestimates mid-range distances (10–15 km) and deviates at longer distances (15–25 km). In contrast, the DC-VAE model aligns most closely with the real data, particularly in capturing the peak around 10 km and the gradual decrease at longer distances. Its shape and spread closely replicate the real data, making it the most accurate of the evaluated models. To quantify these observations, we employed the Kullback-Leibler (KL) divergence, a measure of the difference between two probability distributions ( 45 ). KL divergence was chosen because it captures subtle differences across the entire distribution by comparing normalized probability densities, offering a complementary perspective to the visual analysis.

Figure 7.

Comparison of trip length distributions.

To ensure a fair and robust comparison, kernel density estimation with a standardized bandwidth was applied to smooth all distributions consistently, and the analysis was restricted to the range of 0–40 km, which covers the most relevant distances in the dataset. The KL divergence results demonstrate that the DC-VAE model achieves the smallest divergence (0.0197), indicating the closest match to the real data. The DC-IS and DC-SRS models follow with divergences of 0.0694 and 0.0891, respectively, showing reasonable alignment but some discrepancies. Conversely, the DC-FA model has the largest divergence (4.8588), highlighting significant deviations from the real data. These findings confirm that the DC-VAE model not only visually aligns better with the real data but also statistically provides the most accurate representation of travel distance distributions.
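The KL comparison can be sketched with SciPy's `gaussian_kde`. Several details here are assumptions, because the paper does not report them: the divergence direction (taken as KL(real ‖ model)), the kernel width (a hypothetical 1 km), the evaluation grid, and the small epsilon guarding against log(0). Because a scalar `bw_method` is a factor that SciPy multiplies by each sample's standard deviation, the sketch divides by that standard deviation so every distribution is smoothed with the same absolute kernel width.

```python
import numpy as np
from scipy.stats import gaussian_kde


def _kde(sample, bandwidth_km):
    sample = np.asarray(sample, float)
    # A scalar bw_method is scaled by the sample std (ddof=1), so dividing by it
    # gives every sample the same kernel standard deviation, in km.
    return gaussian_kde(sample, bw_method=bandwidth_km / sample.std(ddof=1))


def kl_divergence_kde(real, model, bandwidth_km=1.0, lo=0.0, hi=40.0, n_grid=801):
    """KL(real || model) between KDE-smoothed trip-length densities on [lo, hi]."""
    grid = np.linspace(lo, hi, n_grid)
    dx = grid[1] - grid[0]
    p = _kde(real, bandwidth_km)(grid)
    q = _kde(model, bandwidth_km)(grid)
    # Renormalize on the truncated range; eps guards against log(0).
    eps = 1e-12
    p = p / (p.sum() * dx) + eps
    q = q / (q.sum() * dx) + eps
    return float(np.sum(p * np.log(p / q)) * dx)
```

Because KL divergence is asymmetric, swapping `real` and `model` generally changes the value, so the direction should be reported alongside the results.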

Box plots of origin-destination distances for the four models and the real data (Figure 8) provide additional insight. The real data show a relatively wide spread, with a median distance of around 10 km and a broad interquartile range (IQR), indicating substantial variability in travel distances. Several outliers at longer distances highlight the diversity of observed destination choices.

Figure 8.

Distance frequency distribution comparison across models and real data.

The DC-FA model underestimates the spread of the real data, with a narrower IQR and a lower median distance. It fails to capture longer-distance choices effectively, as shown by its smaller number of outliers. This misrepresentation may stem from its rigid assumption that all alternatives are equally available, which disregards key behavioral factors. The DC-SRS model shows a spread closer to the real data than DC-FA but has a slightly lower median. It captures some longer-distance choices, but fewer than in the real data, as evidenced by its smaller number of outliers and flatter overall distribution. The DC-IS model shows notable improvements over both DC-FA and DC-SRS, with a wider IQR and a median closer to the real data. It also captures a reasonable number of outliers, representing longer-distance trips more accurately. However, DC-IS slightly overestimates choices at longer distances, which introduces some deviations from the real data. The DC-VAE model aligns most closely with the real data, effectively capturing the median, IQR, and outliers. It replicates the spread of travel distances observed in the real data and accurately represents longer-distance trips.
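The box plot statistics discussed above (median, IQR, and outlier count) can be summarized numerically with the sketch below. It assumes the conventional Tukey rule, in which points more than 1.5 × IQR beyond the quartiles are flagged as outliers (matplotlib's default whisker length); the function name is hypothetical.

```python
import numpy as np


def boxplot_summary(distances, whisker=1.5):
    """Median, IQR, and count of Tukey outliers beyond whisker * IQR."""
    d = np.asarray(distances, float)
    q1, median, q3 = np.percentile(d, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = d[(d < lo) | (d > hi)]
    return {"median": median, "iqr": iqr, "n_outliers": int(outliers.size)}
```

Applying it to the observed and model-predicted trip distances gives the same quantities compared visually in Figure 8.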

Overall, the box plot analysis highlights that DC-VAE is the best-performing method, as it most closely matches the observed variability and central tendency of the real data. DC-IS follows as the second-best method, with noticeable improvements over DC-SRS and DC-FA. This analysis reinforces conclusions from the density plot comparison, further confirming that the VAE-based model offers the most effective approach for capturing realistic destination choice behavior.

Conclusions

This study has explored the potential of using VAEs to enhance choice set generation in destination choice models, focusing on home-based work trips in Montreal, Canada.

The VAE method integrates variational Bayesian techniques with autoencoders to estimate the probability distribution for the underlying choice set generation process, aiming to maximize the likelihood of including the chosen alternatives in the choice set. Initially, the VAE maps alternatives to a lower-dimensional latent space using an encoder model, then generates new alternatives with a decoder model based on these latent representations. The choice set is implicitly created by sampling from this latent space and feeding the samples into the decoder model. Additionally, the VAE model can produce the implicit perception measure when generating new alternatives.

The findings of this study revealed the VAE model’s capability to effectively capture the underlying distribution of chosen alternatives, producing plausible and diverse alternatives. This capability was demonstrated through density plots and CDFs, which exhibited a high degree of similarity between the chosen and generated alternatives. Such visual and quantitative alignment suggests that the VAE can mimic individuals’ decision-making processes, thereby generating realistic and varied choice sets that enhance the robustness of the destination choice model. This alignment underscores the VAE’s potential to address the biases and limitations of traditional deterministic and stochastic methods, which often fail to capture the full spectrum of potential alternatives.
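The alignment between chosen and generated alternatives can be quantified as well as plotted. The reference list includes histogram intersection ( 39 ), the Jensen-Shannon divergence ( 40 ), and the Kullback-Leibler divergence ( 45 ). The sketch below implements the first two, plus the Kolmogorov-Smirnov statistic (the largest vertical gap between two empirical CDFs). This section does not state which measures, bins, or variables the study used, so the functions and settings here are illustrative.

```python
import math
from bisect import bisect_right

def histogram(values, edges):
    """Share of values in each bin [edges[i], edges[i+1]); the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if i == len(counts) and v == edges[-1]:
            i -= 1  # include the right edge in the last bin
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

def histogram_intersection(p, q):
    """Overlap of two normalized histograms: 1 = identical, 0 = disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ks_statistic(xs, ys):
    """Largest absolute gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    return max(abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys)) for t in grid)
```

For example, `histogram(chosen_km, edges)` and `histogram(generated_km, edges)`, built on the same hypothetical distance bins, can be compared with the first two functions. `ks_statistic` works directly on the raw samples.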

Moreover, the application of the VAE method to generate choice sets for destination choice models was validated using real-world data and four model structures: DC-FA, DC-SRS, DC-IS, and DC-VAE. The results indicated that the model using VAE-generated choice sets (DC-VAE) outperformed those using traditional choice sets in both model fit and predictive accuracy. Specifically, DC-VAE achieved the highest $ρ^{2}$ (0.5733) and the highest prediction accuracy (49.61%) of all models. These findings suggest that the VAE method not only improves the accuracy of choice set generation but also enhances the predictive power and reliability of destination choice models. This advancement is particularly significant, as it demonstrates the practical applicability and effectiveness of VAEs in a real-world context, potentially setting a new standard for choice set generation in destination choice modeling.
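The $ρ^{2}$ and accuracy comparison can be made concrete with a small sketch. It makes three assumptions:

- $ρ^{2}$ is McFadden's measure relative to an equal-shares null model, LL(0) = Σ ln(1/J_n).
- Choice probabilities come from a multinomial logit.
- A prediction counts as correct when the chosen alternative has the highest probability.

The exact definitions used in the study may differ, and the function names are hypothetical. Because LL(0) depends on the choice set size J_n, this form of $ρ^{2}$ is only comparable across models estimated on choice sets of the same size.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities over one choice set."""
    v_max = max(utilities)  # subtract the max before exponentiating for numerical stability
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def evaluate(utilities_per_obs, chosen_idx):
    """Log-likelihood, McFadden rho^2 against the equal-shares null, and top-1 accuracy."""
    ll = ll_null = 0.0
    hits = 0
    for v, c in zip(utilities_per_obs, chosen_idx):
        p = mnl_probabilities(v)
        ll += math.log(p[c])
        ll_null += math.log(1.0 / len(v))  # null model: every alternative equally likely
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == c)
    rho2 = 1.0 - ll / ll_null
    return ll, rho2, hits / len(chosen_idx)

# Two hypothetical observations, each with three alternatives.
ll, rho2, acc = evaluate([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0, 2])
```

In this example, the first observation's chosen alternative has the highest probability and the second's does not, so the accuracy is 0.5.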

While this study focused on destination choice modeling for home-based work trips, future research should extend the VAE-based approach to other trip types and different urban areas. This would involve comparing the prediction accuracy of discrete choice models with choice sets generated by VAEs across various contexts and trip purposes. Additionally, using actual job data instead of resident occupation data for census tracts may provide more explanatory power in destination models, making it a valuable direction for future research. Future studies should also examine the performance of destination choice models with various choice set sizes generated by different methods. This would help identify the optimal choice set size and method that balances model complexity with predictive accuracy. Furthermore, integrating VAEs with other DL techniques developed for choice modeling (e.g., Wang et al., Wong and Farooq, and Kamal and Farooq) could enhance model performance and predictive accuracy ( 46 – 48 ).
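Experiments with different choice set sizes and generation methods need a sampler whose size is a parameter. The sketch below draws a set that always contains the chosen alternative, using either simple random sampling or weighted sampling without replacement. The distance-decay weights are hypothetical and are not necessarily the importance function behind DC-IS. Estimating an MNL on importance-sampled sets normally also requires a sampling-correction term in each utility, which is omitted here.

```python
import math
import random

def sample_choice_set(chosen, n_alts, size, rng, weights=None):
    """Return the chosen alternative plus (size - 1) distinct non-chosen alternatives.

    weights=None: simple random sampling (each non-chosen alternative equally likely).
    weights given: sequential weighted draws without replacement, an
    importance-sampling-style scheme; all weights must be positive.
    """
    pool = [j for j in range(n_alts) if j != chosen]
    if weights is None:
        others = rng.sample(pool, size - 1)
    else:
        w = [weights[j] for j in pool]
        others = []
        for _ in range(size - 1):
            k = rng.choices(range(len(pool)), weights=w)[0]
            others.append(pool.pop(k))
            w.pop(k)
    return [chosen] + others

# Hypothetical distances to 100 zones and distance-decay weights (illustrative only).
rng = random.Random(42)
dist_km = [rng.uniform(1, 60) for _ in range(100)]
decay = [math.exp(-d / 15) for d in dist_km]

# One simple-random and one weighted choice set per candidate size for chosen zone 7.
sets = {
    size: (sample_choice_set(7, 100, size, rng),
           sample_choice_set(7, 100, size, rng, weights=decay))
    for size in (5, 10, 20)
}
```

Each size/method combination would then feed a separate model estimation, so model fit and accuracy can be compared along both dimensions.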

Investigating whether the VAE and the ML-based choice model can be trained simultaneously presents an exciting opportunity. Simultaneous training could streamline the modeling process and potentially lead to even more-accurate and efficient models, because the generative and predictive components would be optimized together. Another future direction could be to adapt choice set generation methods developed for route choice modeling, such as Metropolis-Hastings-based sampling and path size logit-based sampling, to destination choice modeling ( 49 , 50 ). Comparing these methods with the VAE approach would help clarify their relative strengths and weaknesses in this new context. It is also important to note that the methods explored in this study rely on sampling mechanisms, so their outputs may vary across replications. Because of the computational burden of estimating multiple discrete choice models for each method, this study conducted a single full replication per approach. While this setup was sufficient to highlight clear differences in predictive performance and behavioral realism across methods, future work should run multiple replications to better assess the variability and robustness of the results.
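Assessing variability across replications amounts to rerunning the whole stochastic pipeline with different random seeds and summarizing the spread of the resulting metric. In the sketch below, `toy_run` is a hypothetical stand-in for one replication (choice set generation, model estimation, and validation). In practice, it would wrap the study's full pipeline.

```python
import random
import statistics

def replicate(run_once, n_reps, base_seed=0):
    """Repeat a stochastic choice-set-plus-estimation pipeline with different seeds.

    run_once(rng) must return one scalar metric (e.g., validation accuracy).
    Returns the mean, the sample standard deviation, and all per-replication scores.
    """
    scores = [run_once(random.Random(base_seed + r)) for r in range(n_reps)]
    return statistics.mean(scores), statistics.stdev(scores), scores

def toy_run(rng, n_obs=200, hit_rate=0.45):
    """Hypothetical replication: share of n_obs predictions that happen to be correct."""
    return sum(rng.random() < hit_rate for _ in range(n_obs)) / n_obs

mean_acc, sd_acc, runs = replicate(toy_run, n_reps=10)
```

Seeding each replication explicitly makes the whole experiment reproducible. The standard deviation across seeds then indicates whether a gap between two methods exceeds run-to-run noise.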

Although the VAE-based approach shows promise, it also has potential limitations. The quality of generated choice sets depends heavily on the quality and quantity of the training data. Additionally, while the computational complexity of VAEs can be significant during training, they benefit from modern hardware acceleration (e.g., GPUs) and efficient optimization algorithms, making them scalable for large datasets. Compared with stochastic methods, VAEs offer a unified framework for choice set generation that avoids the need for iterative sampling during inference. Future work should explore techniques for further optimizing VAE training and evaluating the impact of different data preprocessing methods on choice set generation.

Footnotes

Acknowledgements

The authors would like to thank Autorité régionale de transport métropolitain (ARTM) for providing the origin-destination survey data used in this study.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: H. Haghi, Z. Patterson, B. Farooq; data collection: xxx; analysis and interpretation of results: H. Haghi, Z. Patterson, B. Farooq; draft manuscript preparation: H. Haghi, Z. Patterson, B. Farooq. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Canada First Research Excellence Fund and the Bridging Divides program funded under it.

ORCID iDs

Houman Haghi

Zachary Patterson

Bilal Farooq

References

1. Ben-Akiva, M. E., and S. R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand, Vol. 9. MIT Press, Cambridge, MA, 1985.

2. Ben-Akiva and Boccara. Discrete Choice Models with Latent Choice Sets. International Journal of Research in Marketing, Vol. 12, No. 1, 1995, pp. 9–24.

3. Swait and Ben-Akiva. Empirical Test of a Constrained Choice Discrete Model: Mode Choice in Sao Paulo, Brazil. Transportation Research Part B: Methodological, Vol. 21, No. 2, 1987, pp. 103–115.

4. de Dios Ortúzar and L. G. Willumsen. Modelling Transport, 5th ed. John Wiley and Sons, Hoboken, NJ, 2024.

5. Bierlaire, Bolduc, and McFadden. The Estimation of Generalized Extreme Value Models from Choice-Based Samples. Transportation Research Part B: Methodological, Vol. 42, No. 4, 2008, pp. 381–394.

6. Frejinger, Bierlaire, and Ben-Akiva. Sampling of Alternatives for Route Choice Modeling. Transportation Research Part B: Methodological, Vol. 43, No. 10, 2009, pp. 984–994.

7. Cascetta, Russo, F. A. Viola, and Vitetta. A Model of Route Perception in Urban Road Networks. Transportation Research Part B: Methodological, Vol. 36, No. 7, 2002, pp. 577–592.

8. Yao and Bekhor. Data-Driven Choice Set Generation and Estimation of Route Choice Models. Transportation Research Part C: Emerging Technologies, Vol. 121, 2020, p. 102832.

9. Goodfellow, Pouget-Abadie, Mirza, Warde-Farley, Ozair, Courville, and Bengio. Generative Adversarial Nets. Advances in Neural Information Processing Systems, Vol. 27, 2014, pp. 2672–2680.

10. Kingma, D. P., and Welling. Auto-Encoding Variational Bayes. arXiv Preprint arXiv:1312.6114, 2013.

11. Arjovsky, Chintala, and Bottou. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Proceedings of Machine Learning Research, Vol. 70, 2017, pp. 214–223.

12. Yao and Bekhor. A Variational Autoencoder Approach for Choice Set Generation and Implicit Perception of Alternatives in Choice Modeling. Transportation Research Part B: Methodological, Vol. 158, 2022, pp. 273–294.

13. Thill, J.-C. Choice Set Formation for Destination Choice Modelling. Progress in Human Geography, Vol. 16, No. 3, 1992, pp. 361–382.

14. Scott, D. M., and S. Y. Modeling Constrained Destination Choice for Shopping: A GIS-Based, Time-Geographic Approach. Journal of Transport Geography, Vol. 23, 2012, pp. 60–71.

15. Ben-Akiva and S. R. Lerman. Disaggregate Travel and Mobility-Choice Models and Measures of Accessibility. In Behavioural Travel Modelling (D. A. Hensher and P. R. Stopher, eds.), Routledge, London, 2021, pp. 654–679.

16. Bhat, Govindarajan, and Pulugurta. Disaggregate Attraction-End Choice Modeling: Formulation and Empirical Analysis. Transportation Research Record: Journal of the Transportation Research Board, 1998. 1645: 60–68.

17. Bowman, J. L., and M. E. Ben-Akiva. Activity-Based Disaggregate Travel Demand Model System with Activity Schedules. Transportation Research Part A: Policy and Practice, Vol. 35, No. 1, 2001, pp. 1–28.

18. Sivakumar and C. R. Bhat. Comprehensive, Unified Framework for Analyzing Spatial Location Choice. Transportation Research Record: Journal of the Transportation Research Board, 2007. 2003: 103–111.

19. Miller, H. J., and S. A. Bridwell. A Field-Based Theory for Time Geography. Annals of the Association of American Geographers, Vol. 99, No. 1, 2009, pp. 49–75.

20. Pel, A. J., M. C. Bliemer, and S. P. Hoogendoorn. A Review on Travel Behaviour Modelling in Dynamic Traffic Simulation Models for Evacuations. Transportation, Vol. 39, 2012, pp. 97–123.

21. Molloy. Development of a Destination Choice Model for Ontario. Master’s thesis. Technical University of Munich, Germany, 2016, p. 74.

22. Tsoleridis, C. F. Choudhury, and Hess. Probabilistic Choice Set Formation Incorporating Activity Spaces into the Context of Mode and Destination Choice Modelling. Journal of Transport Geography, Vol. 108, 2023, p. 103567.

23. Elgar, Farooq, and E. J. Miller. Modeling Location Decisions of Office Firms: Introducing Anchor Points and Constructing Choice Sets in the Model System. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2133: 56–63.

24. Jonnalagadda, Freedman, W. A. Davidson, and J. D. Hunt. Development of Microsimulation Activity-Based Model for San Francisco: Destination and Mode Choice Models. Transportation Research Record: Journal of the Transportation Research Board, 2001. 1777: 25–35.

25. Pozsgay, M. A., and C. R. Bhat. Destination Choice Modeling for Home-Based Recreational Trips: Analysis and Implications for Land Use, Transportation, and Air Quality Planning. Transportation Research Record: Journal of the Transportation Research Board, 2001. 1777: 47–54.

26. Kim, C. G. Choi, Cho, and Kim. A Comparative Study of Aggregate and Disaggregate Gravity Models Using Seoul Metropolitan Subway Trip Data. Transportation Planning and Technology, Vol. 32, No. 1, 2009, pp. 59–70.

27. Mishra, Wang, Zhu, Moeckel, and Mahapatra. Comparison between Gravity and Destination Choice Models for Trip Distribution in Maryland. Transportation Research Board 92nd Annual Meeting, Washington, D.C., 2013.

28. Hammadou, Thomas, Verhetsel, and Witlox. How to Incorporate the Spatial Dimension in Destination Choice Models: The Case of Antwerp. Transportation Planning and Technology, Vol. 31, No. 2, 2008, pp. 153–181.

29. M.-T., L.-F. Chow, Zhao, and S.-C. Geographically Stratified Importance Sampling for the Calibration of Aggregated Destination Choice Models for Trip Distribution. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1935: 85–92.

30. Shiftan. Practical Approach to Model Trip Chaining. Transportation Research Record: Journal of the Transportation Research Board, 1998. 1645: 17–23.

31. Tran, Phung, and Venkatesh. Choice by Elimination via Deep Neural Networks. arXiv Preprint arXiv:1602.05285, 2016.

32. Liu, Gao, Song, and Zhang. Enhancing Choice-Set Generation and Route Choice Modeling with Data- and Knowledge-Driven Approach. Transportation Research Part C: Emerging Technologies, Vol. 162, 2024, p. 104618.

33. Chen, R. T. Q., R. B. Grosse, and D. K. Duvenaud. Isolating Sources of Disentanglement in Variational Autoencoders. Advances in Neural Information Processing Systems, Vol. 31, 2018, pp. 2610–2620.

34. Rezende, D. J., Mohamed, and Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Proceedings of Machine Learning Research, Vol. 32, 2014, pp. 1278–1286.

35. Burda, Grosse, and Salakhutdinov. Importance Weighted Autoencoders. arXiv Preprint arXiv:1509.00519, 2015.

36. Autorité régionale de transport métropolitain. Enquête Origine-Destination 2018. 2018. https://www.artm.quebec/planification/enqueteod/.

37. Zhang, Ren, and Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proc., IEEE International Conference on Computer Vision, Santiago, Chile, IEEE, New York, 2015, pp. 1026–1034.

38. Goldberger, G. E. Hinton, Roweis, and R. R. Salakhutdinov. Neighbourhood Components Analysis. Advances in Neural Information Processing Systems, Vol. 17, 2004, pp. 513–520.

39. Swain, M. J., and D. H. Ballard. Color Indexing. International Journal of Computer Vision, Vol. 7, No. 1, 1991, pp. 11–32.

40. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information Theory, Vol. 37, No. 1, 1991, pp. 145–151.

41. McFadden. Conditional Logit Analysis of Qualitative Choice Behavior. In Frontiers in Econometrics (Zarembka, ed.), Academic Press, New York, NY, pp. 105–142.

42. Train, K. E. Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge, 2009.

43. Hess and Palma. Apollo: A Flexible, Powerful and Customisable Freeware Package for Choice Model Estimation and Application. Journal of Choice Modelling, Vol. 32, 2019, p. 100170.

44. Bernardin, Daniels, and Chen. How-To: Model Destination Choice. Technical Report, Report No. FHWA-HEP-18-080. United States. Federal Highway Administration, U.S. Department of Transportation, Washington, D.C., 2018. Corporate Contributor: RSG, Inc. https://rosap.ntl.bts.gov/view/dot/55803.

45. Kullback and R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, Vol. 22, No. 1, 1951, pp. 79–86.

46. Wang and Zhao. Deep Neural Networks for Choice Analysis: Extracting Complete Economic Information for Interpretation. Transportation Research Part C: Emerging Technologies, Vol. 118, 2020, p. 102701.

47. Wong and Farooq. ResLogit: A Residual Neural Network Logit Model for Data-Driven Choice Modelling. Transportation Research Part C: Emerging Technologies, Vol. 126, 2021, p. 103050.

48. Kamal and Farooq. Ordinal-ResLogit: Interpretable Deep Residual Neural Networks for Ordered Choices. Journal of Choice Modelling, Vol. 50, 2024, p. 100454.

49. Flötteröd and Bierlaire. Metropolis–Hastings Sampling of Paths. Transportation Research Part B: Methodological, Vol. 48, 2013, pp. 53–66.

50. Sobhani, H. A. Aliabadi, and Farooq. Metropolis-Hasting Based Expanded Path Size Logit Model for Cyclists’ Route Choice Using GPS Data. International Journal of Transportation Science and Technology, Vol. 8, No. 2, 2019, pp. 161–175.

Choice Set Generation in Work Destination Choice Modeling with Variational Autoencoders
