Abstract
This paper considers which work-related trip patterns are included in household travel surveys and which in commercial travel surveys, and whether certain patterns are distinctly underrepresented in either. The study is structured as a comparison between data from a household travel survey and data from a commercial travel survey, both conducted in Germany within a short period of each other. We applied cluster analysis to identify differences between the data sets and to identify work-related travel patterns. The results show that work-related travel patterns are quite complex. Although some patterns are covered in both surveys, the travel patterns of mobile workers in particular are not well represented in the household travel survey. Furthermore, our analysis shows that not all commercial trips are made with motorized vehicles: a considerable share of work-related trips are undertaken by public transport or active modes of transport, which the commercial travel survey does not cover. The results indicate that researchers and transport planners creating travel demand models need to pay more attention to work-related travel behavior and acknowledge that, depending on the study area, traditional household travel surveys may not provide a complete sample of the population. However, simply adding data on commercial trips from commercial travel demand models to data from household travel surveys does not provide a complete picture of work-related travel either.
Keywords
To this day, travel behavior analyses and travel demand models still rely on data from household travel surveys (HTSs). Although information and communications technology, and especially global navigation satellite system technology, have simplified the survey process in some cases, many traditional and nationwide travel surveys still rely on manual input. In these cases, underreporting of trips is a greater problem because there are no mechanisms to validate trip characteristics such as number, start times, and distances. Previous research shows that work-related trips have long been affected by this underreporting. In 1990, Brög and Winter (
This paper investigates which work-related trip patterns are included in HTSs and commercial travel surveys (CTSs), and whether certain patterns are distinctly underrepresented in either. In this study, work-related trip patterns are considered at the individual level and cover all trips undertaken in the course of the respondents' work; commuting trips are not regarded as work-related trips. The study is structured as a comparison between data from a traditional HTS and data from a CTS, both conducted in Germany within a short period of each other. We applied cluster analysis to better understand the differences between the data sets and to identify work-related travel patterns. Recognizing that the results of cluster analyses are often ambiguous, this study aims to provide general indications of how well the different surveys cover work-related trip patterns, and to identify gaps and redundancies in the data.
Identifying which travel patterns might be missing from surveys can be difficult because, in most cases, work-related variables (be they travel purpose or workplace information) are considered only in scant detail. However, commercial travel is just as complex as its private counterpart. For example, tradespeople tend to make trips with several different purposes: providing a service to a customer, transporting material to a construction site, or shopping to purchase material. Although these different purposes and professions entail different travel behavior patterns, traditional HTSs account for these trips using only a single trip purpose (work) and very broad categories of work status (full-time, part-time, unemployed) (see, for example [
Previous studies have identified several factors that influence work-related travel. For example, Mohino et al. (
In addition to the influence of occupation on work-related travel, previous studies have also identified the relationship between company characteristics and work-related trip generation. Steinmeyer and Wagner (
To gain insight into work-related travel patterns, several efforts have been made to gather information on commercial transport, focusing on light commercial vehicles. Hunt et al. (
In the following section, we explain the data used for the analysis, including its preparation and descriptive analysis. We continue by describing the multivariate analysis method used (cluster analysis). The results section of the paper contains the outcome of our analyses, which we discuss in the subsequent section. The conclusion of this paper addresses the main outcomes of our work and its implications.
Materials and Methods
This study relies on two main sources of data: HTS and CTS. We describe these data and their processing below.
Data
To capture commercial travel patterns, the German Federal Ministry of Transport and Digital Infrastructure commissioned the nationwide vehicle-based travel survey Motorized Transport in Germany (Kraftfahrzeugverkehr in Deutschland [KiD]). Recognizing that statistics and data sources already exist for freight traffic by larger vehicles, KiD focused on commercial travel by light vehicles. The data are divided into four data sets: vehicle data, trip data, trip chain data, and geospatial data. KiD was carried out in 2002 and 2010. The sample of KiD 2010, which we used as the database for our analyses of commercial trips, includes data on 70,249 vehicles, with which 177,377 trips were undertaken on the survey day (
As a proxy for traditional HTSs, we used data from Mobility in Germany (Mobilität in Deutschland [MiD]). MiD is a nationwide HTS commissioned by the German Federal Ministry of Transport and Digital Infrastructure to capture households' daily travel behavior. Respondents were asked to report general household information and their travel behavior in a one-day travel diary. MiD was conducted in 2002, 2008, and 2017. Although choosing the most recent data is generally sensible, we opted to use MiD 2008 for two reasons: it is temporally closer to KiD 2010, which allows a more stable comparison, and it contains a section on regular work-related trips. MiD 2008 consists of four data sets: car data, household data, person data, and trip data. For this study, we used the base sample, which includes information on 25,922 households, 60,713 persons, 193,290 trips, and 34,601 cars (
Data Preparation
Although the HTS and CTS data sets share characteristics, they are not comparable without adjustments. For our analysis, we first had to choose variables and then prepare and merge the data accordingly. Because our goal is to identify travel patterns, we used trip-related data. In the CTS, both the trip and the trip chain data sets include trip-related information. Although the trip chain data set includes fewer variables, these are already grouped by vehicle and summarize all relevant characteristics of the trip chains: cumulative travel time, cumulative activity duration, time of first work-related trip, cumulative trip distance, and number of stops. The data already come in a prepared format, but there are many missing values (see Table 1), with activity duration having the most. Although we recognize that this variable might have high explanatory value, we chose to exclude it from the cluster analysis because we would otherwise have lost too many observations. Imputation was not suitable either, because we would have had to use the same variables for imputation and cluster analysis, which would have biased the results. Instead, we imputed activity duration and analyzed this variable after conducting the cluster analysis. Cumulative travel time has fewer missing values than activity duration, but we excluded it as well because it is correlated with cumulative trip distance, and using correlated variables would again have biased the cluster analysis. Finally, we removed entries with a missing time of first trip or cumulative trip distance, resulting in 27,306 observations.
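The variable selection and complete-case filtering described above can be sketched as follows. This is a minimal sketch under assumptions: each trip chain is a plain Python dictionary, and the field names (`first_trip_hour`, `distance_km`, `n_stops`, `travel_time_min`, `activity_min`) are hypothetical, not the KiD 2010 variable names. Rows are dropped only when a cluster variable is missing, so a missing activity duration does not cost an observation, and a Pearson correlation between travel time and distance illustrates the kind of check that motivates dropping one of two correlated variables.

```python
import math

# Hypothetical CTS trip-chain records; field names are illustrative only.
# None marks a missing value.
records = [
    {"first_trip_hour": 7.0, "distance_km": 45.0, "n_stops": 4,
     "travel_time_min": 70.0, "activity_min": None},
    {"first_trip_hour": 6.5, "distance_km": 120.0, "n_stops": 3,
     "travel_time_min": 150.0, "activity_min": 240.0},
    {"first_trip_hour": None, "distance_km": 30.0, "n_stops": 2,
     "travel_time_min": 40.0, "activity_min": None},
    {"first_trip_hour": 9.0, "distance_km": None, "n_stops": 5,
     "travel_time_min": None, "activity_min": 60.0},
    {"first_trip_hour": 8.0, "distance_km": 15.0, "n_stops": 6,
     "travel_time_min": 35.0, "activity_min": None},
]

# Variables used for clustering; activity duration and travel time are left out.
CLUSTER_VARS = ("first_trip_hour", "distance_km", "n_stops")


def complete_cases(rows, required):
    """Keep only rows in which every required variable is present."""
    return [r for r in rows if all(r.get(v) is not None for v in required)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


kept = complete_cases(records, CLUSTER_VARS)
with_activity = complete_cases(records, CLUSTER_VARS + ("activity_min",))
r = pearson([row["distance_km"] for row in kept],
            [row["travel_time_min"] for row in kept])
print(len(kept), len(with_activity), round(r, 3))
```

Also requiring `activity_min` would leave only one of the five toy rows, which illustrates on a small scale why excluding that variable preserves observations.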
Summary Statistics of CTS and HTS
Based on the variables chosen from the CTS, we selected the corresponding variables in the HTS. For this step, we selected only work-related trips and determined the start time of each respondent's first such trip. We then grouped the data by respondent and summed the distances traveled and the number of stops. This resulted in 1,222 observations. Finally, we merged the two data sets into one data set with 28,528 observations (27,306 from the CTS and 1,222 from the HTS), which is used in the following analysis.
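A minimal sketch of this aggregation step, assuming HTS trips are available as tuples with hypothetical fields and purpose labels. It keeps only work-related trips, takes each respondent's earliest start time, sums distances, counts trips as stops (an assumption, since the mapping of stops to trips is not spelled out above), and appends the result to the CTS records with a source label so that the two surveys can still be told apart after clustering.

```python
from collections import defaultdict

# Hypothetical HTS trip records: (respondent_id, purpose, start_hour, distance_km).
hts_trips = [
    (1, "work_related", 9.5, 12.0),
    (1, "work_related", 11.0, 8.0),
    (1, "shopping", 17.0, 3.0),
    (2, "work_related", 8.0, 40.0),
    (3, "commute", 7.5, 20.0),   # commuting is not counted as work-related
]


def summarize_by_respondent(trips, purpose="work_related"):
    """Collapse a respondent's work-related trips into one trip-chain record.

    Assumption: the number of stops equals the number of work-related trips.
    """
    groups = defaultdict(list)
    for pid, trip_purpose, start, dist in trips:
        if trip_purpose == purpose:
            groups[pid].append((start, dist))
    return [
        {
            "id": pid,
            "first_trip_hour": min(start for start, _ in rows),
            "distance_km": sum(dist for _, dist in rows),
            "n_stops": len(rows),
            "source": "HTS",
        }
        for pid, rows in sorted(groups.items())
    ]


# Hypothetical CTS trip chain, already aggregated per vehicle.
cts_chains = [{"id": "v1", "first_trip_hour": 7.0, "distance_km": 45.0,
               "n_stops": 4, "source": "CTS"}]

hts_chains = summarize_by_respondent(hts_trips)
merged = cts_chains + hts_chains   # one table with a source label per row
print(len(merged))
```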
Descriptive Analysis
To gain a better overview of the data sets and to identify differences in variables, we first present a descriptive analysis of the individual variables of the two data sets. The trip-related variables are summarized in Table 1. For variables with missing values, the statistics are based on the available values only and therefore do not necessarily represent the full sample.
The analysis shows that the first trip of the day in the CTS tends to start earlier than the first trip in the HTS: the median start time of the first trip is 7 a.m. in the CTS and two hours later, 9 a.m., in the HTS. Whereas the median cumulative trip duration (in minutes) in the CTS is quite close to that in the HTS, the CTS includes trip chains of considerably longer duration. The same holds true for cumulative trip distance (in km): the CTS includes longer trips, and its median cumulative trip distance is 6.5 km longer. The median number of trips per respondent is the same in both surveys, whereas the mean is twice as high in the CTS. Furthermore, the CTS includes trip chains with up to 343 trips, over 30 times the maximum number of trips per respondent in the HTS. Although the CTS includes some shorter activities, activity durations are generally longer in the CTS than in the HTS.
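Because the statistics are computed on the available values only, a per-variable summary has to skip missing entries. The sketch below assumes missing values are stored as `None` and uses only the standard `statistics` module; the two toy lists are invented and merely illustrate how a few very long trip chains can leave the median unchanged while inflating the mean, as observed above for the number of trips.

```python
import statistics


def summarize(values):
    """Summary statistics computed on the non-missing values only (None = missing)."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "min": min(present),
        "median": statistics.median(present),
        "mean": statistics.mean(present),
        "max": max(present),
    }


# Invented numbers of trips per trip chain; one CTS chain is very long.
cts_trips = [1, 2, 3, 3, 40]
hts_trips = [2, 3, 3, 4, None]

for name, values in (("CTS", cts_trips), ("HTS", hts_trips)):
    print(name, summarize(values))
```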
Cluster Analysis
To identify travel patterns with regard to work-related trips and analyze differences between traditional HTS data and designated CTS data, we used cluster analysis.
Cluster analysis is a method of pattern recognition for finding groups in a population of individuals. It works by maximizing the similarity within groups and the dissimilarity between groups (
There are many different clustering methods and the choice is highly dependent on the objective of the study and the available data. Cluster analysis methods can broadly be split into three groups: probabilistic and generative models; distance-based; and density-based (
In this study, the objective was to obtain stable clusters with small within-cluster variance. Ward's method, which at each step merges the pair of clusters that yields the smallest increase in within-cluster variance, is a suitable criterion for this objective. We conducted the cluster analysis in R using the stats package (
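To make the criterion concrete, the sketch below is a naive pure-Python version of Ward's agglomerative clustering, not the stats implementation used in the study (in R, the comparable call is `hclust()` with `method = "ward.D2"` on a Euclidean distance matrix; which variant was used is not stated). At each step it merges the pair of clusters whose merge increases the total within-cluster sum of squares the least; for clusters of sizes n_a and n_b with centroids c_a and c_b, that increase is n_a n_b / (n_a + n_b) times the squared distance between the centroids. Summing the merge costs gives the within-cluster sum of squares for every number of clusters, a quantity commonly shown in elbow plots. Because the method works on Euclidean distances, variables on different scales (hours, kilometers, counts) would normally be standardized first; whether and how this was done here is not stated, and the toy points are invented.

```python
import itertools


def merge_cost(a, b):
    """Ward's criterion: increase in total within-cluster sum of squares
    caused by merging clusters a and b, each given as (size, centroid)."""
    (na, ca), (nb, cb) = a, b
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * squared


def ward(points, k):
    """Naive O(n^3) agglomerative clustering with Ward's criterion.

    Returns labels for k clusters and the cost of every merge performed.
    """
    clusters = {i: (1, list(p)) for i, p in enumerate(points)}
    members = {i: [i] for i in range(len(points))}
    costs = []
    while len(clusters) > k:
        # Merge the pair that increases the within-cluster sum of squares least.
        cost, a, b = min((merge_cost(clusters[i], clusters[j]), i, j)
                         for i, j in itertools.combinations(sorted(clusters), 2))
        (na, ca), (nb, cb) = clusters[a], clusters.pop(b)
        n = na + nb
        clusters[a] = (n, [(na * x + nb * y) / n for x, y in zip(ca, cb)])
        members[a].extend(members.pop(b))
        costs.append(cost)
    labels = [0] * len(points)
    for label, key in enumerate(sorted(members)):
        for idx in members[key]:
            labels[idx] = label
    return labels, costs


def wss_curve(points):
    """Total within-cluster sum of squares for each number of clusters n..1."""
    _, costs = ward(points, 1)
    curve, total = {len(points): 0.0}, 0.0
    for merges, cost in enumerate(costs, start=1):
        total += cost
        curve[len(points) - merges] = total
    return curve


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, _ = ward(points, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(wss_curve(points))
```

The naive pair search is only practical for small samples; for tens of thousands of observations, an optimized implementation such as the one in R's stats package is the realistic choice.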

Elbow plot of Ward cluster analysis.
The graph shows several possible solutions; both a 5-cluster and an 8-cluster solution appear sensible. To choose the final cluster model, we computed validation measures for both solutions. Validation measures are generally divided into external and internal measures. Because no external information is available in this study, only internal validation measures are used. Although several different measures are available, there is no guidance on which is best (
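One widely used internal validation measure is the average silhouette width, sketched below in plain Python; whether it was among the measures considered in this study is not stated, so it serves only as an illustrative example. For each observation, it compares the mean distance to the other members of its own cluster (a) with the smallest mean distance to any other cluster (b) and computes (b - a) / max(a, b); values near 1 indicate compact, well-separated clusters. The points and labelings are invented.

```python
import math


def mean_dist(i, points, labels, cluster):
    """Mean distance from point i to the other members of `cluster`."""
    d = [math.dist(points[i], q) for j, (q, lab) in enumerate(zip(points, labels))
         if lab == cluster and j != i]
    return sum(d) / len(d)


def mean_silhouette(points, labels):
    """Average silhouette width; singletons contribute 0. Needs >= 2 clusters."""
    clusters = set(labels)
    total = 0.0
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        a = mean_dist(i, points, labels, own)
        b = min(mean_dist(i, points, labels, c) for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])
poor = mean_silhouette(points, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(poor, 3))
```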
Internal Validation of Possible Cluster Solutions
Analysis of the validation measures shows that most measures support the 5-cluster solution. Of the eight considered measures, only two support the 8-cluster solution. This ambiguity could be explained by the Dunn index, for example, not being suitable for data with sub-clusters (
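The Dunn index divides the smallest distance between observations in different clusters by the largest within-cluster diameter, so higher values indicate compact, well-separated clusters. The sketch below implements this original form on invented data (several variants with other separation and diameter definitions exist). Because both terms depend on single extreme pairs of points, a solution that splits a group into two tight sub-groups can score higher than one that keeps them together, as in the toy example, where the three-cluster labeling scores 5.0 and the two-cluster labeling about 2.94.

```python
import itertools
import math


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Assumes at least two clusters and at least one cluster with two or more points.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    diameter = max(math.dist(p, q)
                   for g in clusters for p, q in itertools.combinations(g, 2))
    separation = min(math.dist(p, q)
                     for g, h in itertools.combinations(clusters, 2)
                     for p in g for q in h)
    return separation / diameter


# Invented data: two nearby tight groups and one distant group.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0), (20, 1)]
two = dunn_index(points, [0, 0, 0, 0, 1, 1])    # nearby groups merged
three = dunn_index(points, [0, 0, 1, 1, 2, 2])  # nearby groups split
print(round(two, 2), three)
```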
Results and Discussion
In this section, we present the results of the cluster analysis. We first compare the characteristics of the different clusters with each other and then analyze which information is represented in the HTS and the CTS. At the end of this section, we consider the implications of our findings and discuss the limitations of the study.
Cluster Characteristics
The characteristics of each cluster in the 5-cluster solution are presented in Table 3. Although all clusters contain observations from the CTS, cluster 5 does not contain any observations from the HTS. Cluster 4 also includes relatively few observations from the HTS, which already indicates that, although the two surveys cover similar patterns, some patterns are absent from the HTS or included only to a small degree. The largest cluster is cluster 3, with a little over half of all observations. It contains observations from both the CTS and the HTS.
Cluster Characteristics
The characteristics of the cluster variables in the different clusters are presented in Figure 2.
With regard to cumulative trip distance, clusters 2 and 4 show much higher distances than the other clusters: observations in cluster 2 have a mean distance of 250 km and those in cluster 4 a mean distance of 470 km. All other clusters have mean cumulative distances between 40 km and 115 km. Cluster 5 is the most distinct group with respect to the total number of trips: whereas all other clusters show an average number of trips below 10, observations in cluster 5 have an average of just over 100 trips. Considering the start of the first trip, drivers in clusters 1 and 2 do not undertake any trips at night, and those in cluster 1 start their first trip comparatively late. Although the mean start time of drivers' first trips in cluster 5 is similar to that in cluster 2, the variance is higher, especially toward the early hours of the day. Clusters 3 and 4 exceed this variance, and cluster 4 in particular includes patterns in which the first trip of the day is undertaken rather early.

Boxplots of start time, trip distance, and number of trips.
With regard to the difference between the data sources, most clusters show similar values in both surveys, which suggests that the unequal number of observations from the two surveys is generally unproblematic. The only notable deviation is in the start times in cluster 3, where the start times in the HTS vary much more across the day. The characteristics of this cluster are therefore heavily influenced by the many observations from the CTS.
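Checking how each survey contributes to each cluster amounts to a cross-tabulation of cluster labels against data source, plus per-group summaries of the cluster variables. A minimal sketch using only the standard library, with invented labels, sources, and start times; a missing (cluster, source) combination, such as HTS observations in cluster 5, simply has a count of zero.

```python
import statistics
from collections import Counter, defaultdict

# Invented example: cluster label, data source, and first-trip start hour per observation.
labels = [1, 1, 1, 3, 3, 3, 5, 5]
sources = ["CTS", "CTS", "HTS", "CTS", "HTS", "HTS", "CTS", "CTS"]
start_hour = [8.0, 9.0, 10.0, 6.0, 7.0, 15.0, 5.5, 6.5]


def composition(labels, sources):
    """Count observations per (cluster, source) pair."""
    return Counter(zip(labels, sources))


def median_by_group(labels, sources, values):
    """Median of a cluster variable for every (cluster, source) pair present."""
    groups = defaultdict(list)
    for label, source, value in zip(labels, sources, values):
        groups[(label, source)].append(value)
    return {key: statistics.median(vals) for key, vals in sorted(groups.items())}


counts = composition(labels, sources)
medians = median_by_group(labels, sources, start_hour)
for cluster in sorted(set(labels)):
    print(cluster, counts[(cluster, "CTS")], counts[(cluster, "HTS")])
```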
Subsequently, we conducted a bivariate analysis of the cluster variables; the bivariate plots are presented in Figure 3. They show that the patterns in cluster 1 have the least variation in all variables. The start times are limited to the regular hours of a working day, and neither the number of trips nor the cumulative trip distance in this cluster is particularly high, which is consistent with the rather late start times: because these drivers do not undertake many or long trips, they do not need to start early in the day. This contrasts with cluster 2, in which start times vary much more over the course of the day and cumulative trip distances are higher. With regard to the relationship between trip distance and number of trips, drivers in cluster 2 undertake relatively few but long trips. This also holds true for cluster 4, in which this characteristic is even more pronounced. Drivers in cluster 4 do not show a particular pattern with regard to the start time of the first trip; however, their long distances are achieved with relatively few trips. Cluster 3 is comparable with cluster 1 but has greater variance in all dimensions. The start times of the first trip are spread over the course of the day, and trip patterns with high cumulative trip distances and more trips tend to start earlier in the day. Cluster 5 includes trip patterns with many trips throughout the day, which is achieved by starting relatively early in the day and undertaking shorter trips.

Bivariate plots colored by cluster.
Our results show that both data sets and, therefore, both survey methods capture different travel patterns. Based on the cluster characteristics, we have identified four groups with distinct work-related travel patterns: average mobile workers (cluster 3); late starters (cluster 1); long-distance travelers (clusters 2 and 4); and highly active workers (cluster 5). The groups that include two clusters all have one cluster with moderate characteristics and one with more extreme characteristics.
The average mobile workers are well represented in both surveys. They tend to start their day early in the morning, suggesting that they start their first trip from home rather than from their business location. The mean cumulative trip distance of this cluster suggests that these workers stay within their own area of operation. The same is true for the late starters; however, the start time of their first trip suggests that they begin their working day at the office or place of business. Cluster 2 especially shows late start times, and although its cumulative trip distances are not particularly low, workers in this cluster undertake only 2.33 trips on average. The long-distance clusters each have very high average cumulative trip distances, and cluster 5 does not contain any observation under 78 km. Cluster 4 has a maximum cumulative trip distance of 2,500 km. To manage such long distances, these travelers start their trips early in the day. Although they make more trips a day on average than the late starters (5.17 and 4.84, respectively), these travelers do not make many trips per se. In contrast, the highly active workers undertake many trips. Not only is the mean number of trips high in this cluster, but the minimum is also considerably higher: workers in cluster 5 make a minimum of 42 trips and as many as 462 trips a day. Although observations in this cluster show higher cumulative trip distances than those of the late starters, these workers cannot be regarded as long-distance drivers, especially given the high number of trips, which suggests that each individual trip is relatively short.
After analyzing the characteristics of the trip variables of the clusters, we further analyzed the following: activity duration of the respective trips; the sociodemographic characteristics of the workers; the characteristics of the vehicles used to undertake the trips; industry sector; and trip purposes. Because this information has missing values, we were not able to include it in the cluster analysis. However, it still provides further insights into the trip patterns represented in the different survey types.
Activity Duration
As mentioned before, we did not include activity duration in the cluster variables; however, it is still a valuable characteristic concerning trip patterns. Therefore, we opted to impute missing values and to analyze the activity duration by cluster. We tested different imputation methods and chose the one that yielded the best results when comparing observed and imputed data. We applied multivariate imputation using the R package
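One way to compare imputation methods against observed data is to hide some observed values, impute them, and measure the error. The sketch below does this for two deliberately simple methods, median imputation and a one-predictor linear regression. It is a stand-in that illustrates the evaluation idea, not the multivariate imputation applied in the study, and the predictor, the values, and the use of root-mean-square error as the score are all assumptions.

```python
import math
import statistics

# Invented data: a hypothetical predictor (e.g., number of stops) and activity
# duration in minutes. In a real application, the held-out values would be
# observed durations that are temporarily hidden from the imputation.
stops = [1, 2, 3, 4, 5, 6, 7, 8]
duration = [480, 420, 370, 300, 260, 190, 140, 80]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(predicted, observed):
    """Root-mean-square error between imputed and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def evaluate(xs, ys, holdout):
    """Hide ys at the holdout indices, impute them two ways, and score each way."""
    train = [i for i in range(len(ys)) if i not in holdout]
    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    observed = [ys[i] for i in holdout]
    med = statistics.median(train_y)
    slope, intercept = fit_line(train_x, train_y)
    return {
        "median": rmse([med] * len(holdout), observed),
        "regression": rmse([intercept + slope * xs[i] for i in holdout], observed),
    }


scores = evaluate(stops, duration, holdout=[1, 5])
print(scores)
```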

Activity duration by cluster.
The plot shows that cluster 1 includes patterns with relatively short activities. The same holds true for clusters 2 and 4, but with more variance toward longer activities. Cluster 3 has a peak of activities lasting around 500 min, which corresponds to the regular length of a working day. The results for cluster 5 are somewhat ambiguous: activities of short duration are expected, because this cluster includes trip patterns with many trips and, therefore, many short activities. However, a considerable number of trips are followed by activities of very long duration that cannot be explained intuitively. One possible explanation is that these observations include overnight activities.
Sociodemographic Characteristics of Workers
Figure 5 shows the relative distribution of the workers’ age in the top plot and their gender in the bottom plot. The age distribution is somewhat similar in both surveys and across clusters, indicating that both surveys represent the age of the working population with a slight shift toward older workers in the HTS.

Sociodemographic characteristics of drivers by cluster and data source: (
Considering gender, we can identify differences both between the surveys and across clusters. Fewer female workers are represented in the CTS than in the HTS, which indicates that the CTS is less likely to include jobs that are more commonly held by women. There are also differences across clusters: female workers are more likely to be included in clusters 1 and 3 and less likely to be included in clusters 2 and 4. The latter include trip patterns that start rather early in the day. Such activities tend to be less compatible with child care obligations, which in Germany are still predominantly the woman's responsibility.
Vehicle Characteristics
Figure 6 shows the characteristics of the vehicles. The top left plot shows vehicle age in the two surveys and across clusters, and the top right plot shows vehicle mileage. The vehicles reported in the CTS are generally newer than those reported in the HTS. This suggests that the HTS primarily captures privately owned vehicles, whereas the CTS captures commercially owned vehicles, which generally have a shorter life cycle (e.g., for economic reasons). Nevertheless, differences between clusters are observable as well. In both the HTS and the CTS, clusters 2 and 4 include more vehicles less than 5 years old than the other clusters. These results are especially striking considering the mileage of these vehicles: over 60% of vehicles in the CTS have a mileage of over 100,000 km. This is in line with the definition of these clusters as representing long-distance travelers: the more a vehicle is used, the shorter its service life.

Vehicle characteristics: Vehicle age (top left), mileage (top right), vehicle type (bottom).
Comparing vehicle types between the surveys is more difficult because the CTS covers only motorized vehicles, whereas the HTS covers all modes of transport. Nevertheless, some interesting effects are visible. Workers in the CTS use trucks much more often than those in the HTS, in which the car is clearly the dominant mode of transport. Furthermore, there is a general discrepancy between the surveys with regard to clusters 2 and 4: the large share of heavy trucks in the CTS indicates that these trips are undertaken for transportation purposes, whereas in the HTS a large share of these trips are undertaken by public transport, suggesting that different trip and tour purposes underlie the same patterns. Additionally, although some truck trips are reported in the HTS, none at all are reported in cluster 5. This suggests not only that there is a systematic difference between the surveys but also that the response burden of reporting these trip patterns is too high for them to be captured in the HTS. The modes of transport reported in the HTS also show that a considerable proportion of work-related trips are undertaken using active modes of transport, so work-related trips are not always made with motorized vehicles.
Industry Sectors and Trip Purposes
Figure 7 shows the relative distribution of the vehicles and trips according to economic sector, industry sector, and purpose by cluster. The industry sectors are classified according to the Statistical Classification of Economic Activities in the European Community (

Relative distribution of vehicles and trips according to economic sector by data source (top left), industry sector (top right) and purpose (bottom) by cluster.
Considering the economic sector, there are relevant differences across clusters and between the two surveys. As expected, the share of vehicles belonging to the primary sector is relatively low, because this sector includes industries that deal with raw products and are mostly fixed to single locations. This sector does not appear in the HTS at all. The secondary sector includes manufacturing businesses; although these industries are included in the HTS to some extent, their share is still relatively low. Most observations are attributed to the tertiary sector, also known as the service sector.
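Grouping industry codes into economic sectors and computing their relative distribution per cluster can be sketched as follows. The mapping of the 21 NACE Rev. 2 section letters (A to U) to primary, secondary, and tertiary sectors is an assumption based on a common convention (mining, section B, is sometimes counted as secondary instead), not necessarily the grouping used in this study, and the cluster labels and sections in the example are invented.

```python
from collections import Counter, defaultdict

# Assumed mapping from NACE Rev. 2 section letters to three economic sectors.
# Conventions differ (mining, section B, is sometimes counted as secondary),
# so this mapping is illustrative, not the one used in the study.
SECTOR = {"A": "primary", "B": "primary"}
SECTOR.update({section: "secondary" for section in "CDEF"})
SECTOR.update({section: "tertiary" for section in "GHIJKLMNOPQRSTU"})


def sector_shares(clusters, sections):
    """Relative distribution of economic sectors within each cluster."""
    counts = defaultdict(Counter)
    for cluster, section in zip(clusters, sections):
        counts[cluster][SECTOR[section]] += 1
    return {
        cluster: {sector: n / sum(c.values()) for sector, n in sorted(c.items())}
        for cluster, c in sorted(counts.items())
    }


# Invented example: cluster label and NACE section of each vehicle.
clusters = [3, 3, 3, 3, 4, 4]
sections = ["F", "F", "G", "C", "H", "H"]
print(sector_shares(clusters, sections))
```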
Clusters 1 and 2 show very similar distributions of both industry sector and trip purpose, and this mostly holds true for the other characteristics as well. Because the most striking difference between them is the start time of the first trip, these clusters appear to include workers from similar occupations who work during different parts of the day.
In cluster 3, the construction sector has the largest share. This is supported by the distribution of trip purposes, in which the provision of a service makes up about 50%. The results indicate that this cluster mostly includes trip patterns of tradespeople who start their first trip fairly early to go to the construction site. This is also consistent with the findings that this cluster includes fewer women and that, in the HTS, it has a large share of motorized vehicles and even some trucks. However, because a considerable share of trips are made by public transport, which is poorly suited to tradespeople, we can assume that this cluster also contains trip patterns that resemble those of tradespeople but probably belong to workers in other occupations.
The industry sector most prevalent in clusters 4 and 5 is transportation, and transportation of goods accounts for the largest share of trip purposes in these clusters. These clusters also include the fewest observations from the HTS, indicating that freight trips are generally not well represented in the HTS. Combined with the findings on vehicle characteristics, the results indicate that clusters 4 and 5 include many urban parcel deliveries, which would involve high mileage but be undertaken using light trucks.
Implications and Limitations of the Study
There are several overarching implications of our study for survey designers, transport planners, and policymakers. Our results show that work-related travel patterns are quite complex and diverse. Although there are some patterns that are well represented in both the HTS and CTS, trips undertaken by highly mobile workers in particular are not present in the HTS. These results are consistent with findings in previous studies on underrepresentation of trips in HTSs (
Our analyses further show that the travel patterns are formed according to different jobs and with different vehicles. Although this and other establishment-based (
Although we found that some travel patterns are not represented in the HTS at all, our findings show that there is considerable overlap between the HTS and the CTS. This indicates that the surveys do not simply complement each other but partly gather redundant data. When the data are used for forecasting and travel demand modeling, this may lead to biased results if not handled correctly, because simply superimposing trips from both surveys may not reproduce the true work-related travel demand. This also needs to be considered in the evaluation of policy measures. Car-free policies often only consider private cars and specifically exclude commercial vehicles (
There are also some limitations of this study worth noting. Hilsop (
The work presented here is specific to Germany, where the available data sources allowed a comprehensive analysis. Although we may expect similar effects in other countries, further research is needed to confirm this assumption. Furthermore, both data sources are relatively old, and the findings need to be validated against newer data. This is especially important because work-related transport is strongly correlated with commercial activity and the economic situation, both of which can be very dynamic. However, whereas HTSs are conducted regularly, CTSs are not.
This study is intended as an exploratory analysis of work-related trip patterns and should be regarded as a first step toward improving knowledge about commercial transport and the required data sources. Although the variables used in the cluster analysis are generally consistent across the two surveys, we recognize that future work should supplement the data used in this study with other sources to improve the balance between CTS and HTS observations.
Future work will include analysis based on other data sources such as sensor data in urban areas and GPS tracking from vehicles to assess whether these passive data sources could supplement HTSs, which are known to have a high response burden. Furthermore, it should be examined whether and how different economic situations influence work-related mobility and whether this can be seen in the surveys.
Conclusion
This study examines work-related trip patterns in traditional HTSs and CTSs. It is intended to increase insights into work-related travel patterns and their representation in the respective surveys.
The results show that work-related travel patterns are quite complex. Although some patterns are covered in both the HTS and the CTS, the travel patterns of mobile workers from the transportation sector in particular are not well represented in the HTS. Conversely, the CTS usually only surveys trips undertaken by motorized vehicles, whereas our analysis shows that a considerable share of work-related trips are undertaken using public transport or active modes of transport. This is an important factor when assessing policy measures: because some work-related trip patterns involve nonmotorized transport, policies focusing on commercial transport should target not only vehicle types but also modal shifts.
The results indicate that researchers and transport planners creating travel demand models need to pay more attention to work-related travel behavior. They should acknowledge that, depending on the study area, traditional HTSs may not provide a complete sample of the population, but that simply adding data on commercial trips from commercial travel demand models to data from HTSs does not provide a complete picture of work-related travel either. The design of separate surveys is not problematic in itself, but such surveys should include variables that can be used to identify which work-related trip patterns are included in which survey and whether supplemental data sources are needed.
Footnotes
Acknowledgements
We would like to thank the five anonymous referees who reviewed this paper for providing helpful suggestions and comments to improve the manuscript.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: A. Reiffer, L. Barthelmes, M. Kagerbauer, P. Vortisch; data collection: A. Reiffer; analysis and interpretation of results: A. Reiffer, L. Barthelmes; draft manuscript preparation: A. Reiffer, L. Barthelmes. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We acknowledge support by the KIT-Publication Fund of the Karlsruhe Institute of Technology.
Data Accessibility Statement
The data sets used in this research are available from the Traffic Clearinghouse at the German Aerospace Center (DLR).
