Abstract
The World Tourism Organization (UNWTO) is the major source of internationally comparable data on tourism. However, UNWTO data has two drawbacks: it focuses on international trips and ignores differences between regions within individual countries. Alternative sources of big data are increasingly used to enhance tourism statistics. In this paper, we combine traditional information sources with a gridded population dataset and Airbnb data to address the limitations of UNWTO statistics. We produce a map of world tourism destinations measured by the number of tourism visits and tourism expenditure in 2019, before the COVID-19 pandemic. We then identify hot spots of tourism and compare the level of spatial concentration of tourism to that of global population and economy. The results illustrate how supply and demand shape the global distribution of tourism, highlight the dominance of domestic travel in global tourism mobility, and may help in planning tourism policy in the face of current global challenges.
Introduction
Tourism mobility and the tourism industry are geographically uneven phenomena. The spatial distribution of tourism flows and their economic effects have long been among the interests of tourism geographers (Lew, Hall, and Timothy 2008; Pearce 1995). The most important source of quantitative data on the numbers and economic effects of tourism in different parts of the world is the World Tourism Organization (UNWTO). However, UNWTO data on tourism trips has two major drawbacks. First, while comprehensive in terms of international flows, it is incomplete and patchy in describing domestic travels, which represent the majority of global tourism mobility (Bigano et al. 2007; UNWTO 2020a). Second, UNWTO data treat entire countries as tourism destinations, while the distribution of tourism visits is not uniform within individual countries: usually, tourists visit some cities and regions more frequently than others. Many countries gather and publish tourism statistics on lower territorial levels, but statistical methodologies are incoherent in terms of the territorial divisions, definitions, and methods of collecting data. Hence, it is difficult to combine data from national sources to produce a globally comparable dataset.
In this paper, we attempt to combine traditional information sources (UNWTO and national statistics) with alternative sources of large-volume geo-referenced data (gridded population dataset Global Human Settlement Population Layer (GHS-POP) 2015, and geo-referenced data on Airbnb offers in 2019) to produce a detailed and globally comparable map of tourism destinations measured by the number of tourism visits and the amount of tourism expenditure. Based on the result, we identify major global tourism hotspots, and measure the level of spatial concentration of tourism visits and expenditure worldwide in relation to global population and economy. The findings provide insight into the territorial differences in the magnitude of social, environmental, and economic impacts of tourism. They may also help to better understand the geographic dimension of the current challenges that tourism faces, including recovery after the COVID-19 pandemic (Zenker and Kock 2020) and mitigation of, and adaptation to, climate change (Scott, Hall, and Gössling 2012).
Before presenting the detailed methods and results of the analysis, we discuss the existing literature on the determinants and patterns of tourism flows, and review the recent applications of geo-referenced data to enhance tourism statistics. The paper is concluded with a discussion of the results, in terms of the implications for tourism geography and tourism management practices. In the Appendices and Supplemental Files, we provide detailed results of the analysis for further re-use.
Literature Review
Patterns of Tourism Flows and Distribution of Tourism Destinations
Tourism destinations are not evenly distributed across the world. Some countries, regions, and cities are visited more often than others. The first explanation for this is the uneven distribution of tourism resources that attract visitors (Lew 1987; Warszyńska and Jackowski 1978). Destinations for leisure tourism are often located near seacoasts in favorable climatic conditions (Gómez Martín 2005), and specific natural conditions are required to participate in active forms of tourism such as ski tourism (Hinch and Higham 2011). Sightseeing tourism concentrates in locations of high value for cultural and natural heritage, including those recognized by United Nations Educational, Scientific and Cultural Organization (UNESCO) as World Heritage Sites (Yang, Lin, and Han 2010). Yet, the simple existence of physical and cultural resources must be supplemented by the development of tourism industry: infrastructure that creates industrialized attractions, accommodation, transportation, and other services (Leiper 1979; Smith 1994).
Looking at the destination perspective alone is not sufficient to understand the geography of tourism flows. One should consider the relations between tourism origins and destinations, and the transit routes (Leiper 1979). A core–periphery model proposed by Christaller (1964) expects tourism to disperse over the peripheries, following a different pattern than other economic activities, which tend to group together in central places. A core–periphery relation was historically visible in international tourism statistics, when European and American countries were clearly divided into central economies representing tourism origins (e.g., United States of America, West Germany) and internal peripheries serving as tourism destinations (e.g., Spain and Portugal; Williams and Zelinsky 1970). At a lower territorial scale, the central areas creating tourism demand are mainly cities. The geographic scope of travel is limited by the time and cost of transport, hence distance decay (McKercher, Chan, and Lam 2008) explains the formation of urban recreational hinterlands (Greer and Wall 1979). The distance of travel depends on its purpose, length, and recurrence, so to understand distance decay it is important to consider a wide spectrum of human mobility (Hall 2005).
The core–periphery pattern is no longer sufficient to describe the contemporary geography of tourism flows. Cities are no longer mere generators of tourism trips, but increasingly also tourism destinations. Cities attract visitors with historic sites, congresses and business meetings, cultural and sport events, and shopping opportunities (Law 1993). In recent decades, attracting tourists has been treated by urban governments and stakeholders as part of the strategy in global competition between cities (Abrahamson 2004). In Europe, before the pandemic, urban tourism grew faster than tourism in non-urban areas, and the growth of urban tourism was accompanied by increasing concerns about overtourism (Milano 2017; Nilsson 2020).
Tourism mobility patterns are not constant over time. The evolution of tourism destinations follows various paths and stages from exploration to decline, as described by the Tourism Area Life Cycle model (Butler 1980) or evolutionary approaches to tourism geography (Brouder et al. 2017). Tourism geographers have recently been less engaged in identifying and measuring tourism flows, focusing instead on tourism impacts and tourists’ experiences (Hall and Page 2014; Williams 2009). Yet, understanding spatial patterns of tourism is still relevant not only due to ongoing transformations of tourism demand and destinations following global socio-economic processes, but also because of the need to mitigate and adapt to climate change (Peeters and Landré 2011; Scott, Hall, and Gössling 2012). The current study joins the developing line of research utilizing big-data sources (Bauder 2019; Li et al. 2018) to understand the spatial patterns of tourism on spatial and temporal scales more detailed than those offered by conventional statistics.
New Methodological Approaches to the Statistical Mapping of Tourism Destinations
Presenting and explaining the patterns of international tourism flows faces problems of availability of statistical data and its comparability across countries. The World Tourism Organization (UNWTO) has popularized a standard definition of international tourism trips and collects globally comparable data on international tourism arrivals for most countries of the world. This data has its limitations, however. Apart from imperfect comparability of data from various countries and missing data, two basic problems arise. First, data on domestic tourism, the majority of global tourism mobility, is only fragmentary. Second, statistics usually apply to entire countries, so variations within individual countries are not presented.
As a result, interpretations of UNWTO data may be biased by the configuration of political borders. The high number of tourists visiting European countries (half of international tourist arrivals; UNWTO 2021a) is partially a result of the political fragmentation of this part of the world. For example, if the state of Florida were treated as a separate country, then, with an estimated number of yearly arrivals exceeding 100 million (Visit Florida 2019), it could surpass France as the largest tourism destination. Also, the statistics of international mobility in East Asia are influenced by the fact that all travel between China and the Special Administrative Regions of Hong Kong and Macau is considered international (Pratt and Tolkach 2018). If these trips were treated as domestic instead, the number of international tourist arrivals to Hong Kong would decrease by 69.5%, and to Macau by 89.5% (UNWTO 2020b).
Some efforts were made to address these issues. Bigano et al. (2007) created a database of domestic and international tourism arrivals in administrative regions based on national statistical sources from 1995. Peeters and Landré (2011) estimated the numbers of domestic tourism to measure global length of tourist travels. Mastercard (2019) and Euromonitor (Yasmeen 2019) publish yearly reports on arrivals and expenditures of international tourists in major global tourism cities, based on official statistics and their own estimations. A consortium of Austrian academic, statistical and tourism marketing organizations maintains TourMIS database on European countries, selected cities and attractions (TourMIS 2021; Wöber 2003). It is based on official statistical sources coupled with the data supplied by tourism professionals from all over Europe, which makes it the most comprehensive database on European city tourism, but also limits the scope and comparability of data.
In recent years, user-generated big data have provided new opportunities for measuring tourism flows with high spatial and temporal resolution (Eurostat 2017; Li et al. 2018). Previous studies have shown that information about tourism distribution can be acquired from mobile tracking techniques (Deville et al. 2014; Saluveer et al. 2020), geo-located tweets (Hawelka et al. 2014) and photographs (Preis, Botta, and Moat 2019; Wood et al. 2013), travel reviews (Tilly, Fischbach, and Schoder 2015), website traffic indicators (Gunter and Önder 2016), social media activity (Chen, Becken, and Stantic 2021; Önder, Gunter, and Gindl 2019), or web travel diaries (Zeng 2018). Vaguet and Cebeillac (2021) showed that Airbnb data can be used to infer the spatial and temporal characteristics of tourist trips, taking visitors to Iceland as a sample.
So far, the only example of the use of big data to improve the detail of tourism distribution statistics for a large area is the study by Batista e Silva et al. (2018). The authors integrated traditional statistics with the location of accommodation establishments derived from the Booking.com and TripAdvisor platforms to create a database of tourism in 100 × 100-m grid cells in four seasons of the year. Methodologically, the study by Batista e Silva et al. (2018) drew on the idea of dasymetric mapping, which integrates statistical data with ancillary (e.g., land-use) data to show a more accurate distribution of population (Leyk et al. 2019; Semenov-Tian-Shansky 1928). The dasymetric method was used in its classical form to map tourism at smaller spatial scales (Vaz and Campos 2013). The geographic database elaborated by Batista e Silva et al. (2018) is similar to multiple grid population databases (Leyk et al. 2019) and the grid GDP database by Kummu, Taka, and Guillaume (2018). This approach relies on statistics that are available from Eurostat, so it cannot be replicated for areas other than Europe. Our study, in turn, has a global scope and the results enable comparisons across different parts of the world.
Research Procedure, Data, and Methods
We followed a three-step research procedure outlined in Figure 1. First, based on official statistical sources, we collected data on the numbers of international tourism arrivals, domestic tourism trips, and tourism expenditure in countries and administrative units of large countries. At this stage we also estimated missing data where necessary. Second, we disaggregated the numbers of tourists (international tourist arrivals and domestic tourism trips) in each country and administrative unit into hexagonal grid cells using statistical dasymetric modeling with area, population, and locations of Airbnb offers used as ancillary variables. To estimate the functions linking these variables with the number of tourist visits we used Eurostat (2020a) data on the numbers of tourists visiting second level regions according to the Nomenclature of Territorial Units for Statistics (NUTS-2 units) in the European Economic Area. After estimating numbers of domestic and international tourists visiting each grid cell, we calculated the estimated local expenditures of international and domestic tourists, and ratios of tourism visits per inhabitant and tourist expenditures per Gross Domestic Product (GDP). Third, we found the global hotspots of the largest concentration and intensity of tourism visits and expenditures, and we calculated the concentration index of tourism visits to compare it with the territorial concentration of global population and GDP. The analysis was done in R, and cartographic presentation in ArcGIS. We presented results on maps and appended datasets. Apart from that, we append the R file and raw data files produced at consecutive steps of the analysis, which enables the readers to reproduce the analysis. Below, we describe in detail each stage of the procedure.

Figure 1. Research procedure.
Collecting and Complementing Official Statistics on Tourism in Countries and Sub-National Territorial Units
The first step of the analysis comprised the collection of data on numbers of inbound tourism arrivals and domestic tourism trips and tourism expenditure by country and territory provided by World Tourism Organization (UNWTO) and World Travel & Tourism Council (WTTC) statistics (UNWTO 2020b; World Bank 2020). We then estimated missing values in the dataset. Later, where possible, we disaggregated the data into smaller territorial divisions of large countries (Figure 1).
Statistics on countries and dependent territories
The UNWTO statistical database includes data on inbound tourism arrivals and domestic tourism trips in 222 countries and dependent territories of the world (UNWTO 2020b). Data is based on information from national statistical institutions. Some countries have not yet delivered data for the year 2019. In such situations, we used the last available data, extrapolating them proportionally to average global growth. We used data on total inbound arrivals, including same-day visits, for countries that do not separately report the numbers of tourist arrivals. They are mostly island countries with a naturally limited number of international same-day visits. At this stage, we excluded from the analysis countries and territories for which UNWTO did not provide data on inbound arrivals or for which data estimation was impossible due to missing socio-economic data in the World Bank (2020) database. As a result, 195 countries and territories were used in further analysis.
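The extrapolation rule for countries without 2019 data can be illustrated with a minimal Python sketch. It reads "proportionally to average global growth" as scaling by the ratio of global totals, which is an assumption; the function name and all numbers are hypothetical, and the original analysis was done in R.

```python
def extrapolate_to_2019(last_value, last_year, global_totals):
    """Scale the last reported national value by global growth up to 2019.

    Assumption: growth is taken as the ratio of global totals between
    2019 and the last year with national data.
    """
    growth = global_totals[2019] / global_totals[last_year]
    return last_value * growth


# Hypothetical global totals (illustrative numbers, not UNWTO figures)
global_totals = {2017: 1000.0, 2018: 1050.0, 2019: 1100.0}

# A country that last reported 2.0 million arrivals in 2017
estimate = extrapolate_to_2019(2.0, 2017, global_totals)
```

Under this reading, a country's share in the global total is held constant between the last reported year and 2019.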
Of those 195, only 75 had data on domestic trips available (including data extrapolated from earlier years in cases where data for 2019 was missing). We estimated the missing data using linear regression models, as Bigano et al. (2007) and Peeters and Landré (2011) did. As explanatory variables, we used socio-economic indicators from the World Bank (2020). Where necessary, earlier figures were extrapolated to 2019 based on data from previous years and the global trend. We filled missing numbers in the database using National Institute of Statistics and Economic Studies (INSEE 2019) statistics on the populations of French overseas departments, and figures for Cuba and Taiwan compiled from the Central Intelligence Agency, International Monetary Fund (IMF), and United Nations databases. We assumed that the number of domestic trips per capita is affected by the level of socio-economic development of the country, the availability of domestic destinations measured by country area, and differences between global regions. We tried various specifications of the model to obtain the best fit with all predictors (including any region as a categorical variable) significant at p < .05. The final equation is as follows (see Appendix A for detailed estimation of the regression model parameters):
Tourist activity is highly dependent on the level of economic development measured by gross national income per capita in purchasing power parity (GNI). Its impact on tourism travels is non-linear, so we included both raw values and their natural logarithm as predictors. We assumed the area of a country would raise the number of residents’ domestic trips (by increasing choice of domestic destinations), but this effect would be limited in the case of big countries. So the
There are also important differences between countries in various regions of the world. Hence, we included dummy variables denoting location in various parts of the world according to the UNWTO division, with Europe as a reference region.
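The structure of such a model can be sketched in Python as follows. This is not the fitted equation reported in Appendix A: the data and coefficients are synthetic, the region list is a hypothetical subset, and using the logarithm of area to dampen the effect of very large countries is an assumption. The sketch only shows how the predictors described above (GNI, its natural logarithm, area, and regional dummies with Europe as reference) enter an ordinary least squares fit.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


REGIONS = ("Europe", "Americas", "Asia")  # hypothetical subset; Europe is the reference


def design_row(gni_thousand, area_km2, region):
    """Intercept, GNI, ln(GNI), ln(area) (assumed form), and regional dummies."""
    dummies = [1.0 if region == r else 0.0 for r in REGIONS[1:]]
    return [1.0, gni_thousand, math.log(gni_thousand), math.log(area_km2)] + dummies


# Synthetic countries with noise-free outcomes, so the fit recovers true_beta
rng = random.Random(0)
true_beta = [0.5, 0.02, 0.8, 0.1, -0.3, -0.6]
rows, y = [], []
for _ in range(60):
    row = design_row(rng.uniform(1, 60), 10 ** rng.uniform(3, 7), rng.choice(REGIONS))
    rows.append(row)
    y.append(sum(b * v for b, v in zip(true_beta, row)))

beta = ols(rows, y)
```

With real data the outcome would be the reported number of domestic trips per capita, and the fitted coefficients would then be applied to the countries with missing values.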
Data on inbound and domestic tourism expenditure are derived from the World Travel and Tourism Council and were accessed via the World Bank (2020) database. Twenty-three and twenty-one countries and territories had missing data on inbound and domestic tourism expenditure, respectively. We estimated these values using the numbers of inbound arrivals/domestic trips and nominal GDP per capita in current USD as predictors. The equations are as follows (see Appendix A for detailed estimation of the regression model parameters):
The resulting data on countries is available in the countries.shp file in the Supplemental Files.
Disaggregating data into administrative regions of selected countries
To enhance the spatial resolution of the dataset, we divided some countries into smaller territorial units and calculated tourist visits and expenditure in each unit based on national statistical sources. We performed this operation only for countries with relatively large areas, large numbers of tourists, and the availability of appropriate data.
In the case of EU countries, the United Kingdom, Norway, and Switzerland, we used Eurostat (2020a) data on the number of international and domestic tourism arrivals in NUTS-2 territorial units in 2019 to refine the previously obtained data on countries. We did not directly use Eurostat data for the entire countries, as for some countries they differ from those provided by UNWTO, due to methodological differences. Instead, we used Eurostat data to proportionally divide the numbers of international arrivals, domestic trips and, similarly, tourism expenditure into territorial units of countries.
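The proportional division can be sketched in Python as follows. The function name, region labels, and numbers are hypothetical; the point is only that regional reference figures supply the shares while the national total stays fixed.

```python
def split_national_total(national_total, regional_reference):
    """Divide a national total across regions in proportion to reference values."""
    reference_sum = sum(regional_reference.values())
    return {
        region: national_total * value / reference_sum
        for region, value in regional_reference.items()
    }


# Hypothetical regional arrivals from a Eurostat-style table (illustrative only)
eurostat_arrivals = {"Region A": 4.0, "Region B": 1.0, "Region C": 5.0}

# Hypothetical national total from the country-level dataset
unwto_total = 20.0

regional = split_national_total(unwto_total, eurostat_arrivals)
```

Because only the shares are taken from the regional source, methodological differences in the national totals of the two sources do not affect the result.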
In the case of Canada we followed a similar procedure, assuming that the 2019 distribution of total tourism arrivals and expenditures in 13 provinces and territories was proportional to the distribution of tourism demand in 2014, while the distribution of international tourism arrivals and expenditures was proportional to international exports from tourism in 2014 (Statistics Canada 2018). In India we divided the numbers of tourism visits and expenditure across states and union territories proportionally to the distribution of inbound tourism arrivals and domestic tourism trips in 2017 according to the Indian Statistical Office (Government of India 2018). In the case of Indonesia, the basis for the division of national numbers into provinces was the distribution of international and domestic guests in both classified and non-classified hotels in 2018 (Statistics Indonesia 2020). In Turkey, we assumed that the distribution of international and domestic tourism visits and expenditure in provinces in 2019 was proportional to the distribution of international and domestic tourism guests in tourism establishments in 2016 based on the last available data published by the Republic of Turkey Ministry of Culture and Tourism (2017a, 2017b).
In the case of Russia we based the disaggregation on the statistics on foreign citizens and Russian citizens accommodated in collective accommodation facilities (Federal State Statistic Service 2021a, 2021b). Note that, for data consistency, we use Russian statistics on Republic of Crimea and Sevastopol, despite their disputed political status, as Ukrainian tourism statistics do not provide information about the region (Ukrstat 2019). Still, this data should be treated with caution (StopFake 2018).
A more refined calculation was performed in the case of the United States of America, based on several statistical sources. We assumed that the distribution of total tourism visits and expenditures across 50 states and the District of Columbia was proportional to the mean of the share of total tourism expenditures and of the total employment in tourism in 2018 according to the U.S. Travel Association (2020). The distribution of inbound tourism arrivals was calculated based on data of the National Travel and Tourism Office (2020), by adding together separately calculated overseas arrivals, arrivals from Canada, and arrivals from Mexico. Arrivals from Mexico were in turn aggregated from air trips and arrivals by ground transportation, the latter estimated based on the distribution of the population of Mexican origin, assuming a large proportion of visiting friends and relatives (VFR) tourism. Later, total tourism expenditure was divided into inbound and domestic tourism based on the proportions between domestic and international visits in states.
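Two of the steps described for the United States can be sketched in Python: weighting states by the mean of two shares, and splitting a state's expenditure by its mix of visits. State labels and all numbers are hypothetical, and the simple visit-proportional split is one reading of the procedure, not a confirmed implementation.

```python
def mean_share(expenditure, employment):
    """State weights: mean of the expenditure share and the employment share."""
    exp_total = sum(expenditure.values())
    emp_total = sum(employment.values())
    return {
        state: 0.5 * (expenditure[state] / exp_total + employment[state] / emp_total)
        for state in expenditure
    }


def split_expenditure(total_expenditure, international_visits, domestic_visits):
    """Split a state's expenditure in proportion to its visit counts (assumed rule)."""
    share_international = international_visits / (international_visits + domestic_visits)
    inbound = total_expenditure * share_international
    return inbound, total_expenditure - inbound


# Hypothetical two-state example
weights = mean_share({"X": 80.0, "Y": 20.0}, {"X": 60.0, "Y": 40.0})
inbound, domestic = split_expenditure(100.0, 10.0, 30.0)
```

The weights sum to one by construction, so they can be multiplied directly by the national totals.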
The detailed source data for state disaggregation is collected in the states.xlsx file in the data collection. The final data for sub-national units are provided in the states_nuts.shp file.
Disaggregating Tourism Numbers Into Grid Cells
The next step of the analysis was to disaggregate the numbers of tourists and tourism expenditures into territorial units smaller than the countries or administrative regions (Figure 1). We use the term total tourist visits to denote the sum of the unit’s share in total international tourist arrivals and its share in the endpoints of domestic tourist trips. Due to differences in administrative divisions of countries around the world, we used geometric grid cells as the units of analysis. We cannot assume that tourist numbers are uniformly distributed over the territories of the countries and regions. Therefore, we applied dasymetric mapping with statistically derived weights informed by multiple ancillary variables (Leyk et al. 2019): area, population, and the number of Airbnb reviews.
The assumption that the distribution of tourism is somehow proportional to the area of territorial units is justified by Christaller’s (1964) model of tourism location. Yet, hosting tourists requires infrastructure and usually happens in populated places, so we used population as a second ancillary variable. However, these two variables do not suffice to notice variability in tourism location caused by the distribution of natural conditions, attractions, amenities, accessibility, etc. Following the approach of Batista e Silva et al. (2018), we assumed that the supply of accommodation offered on global internet platforms can serve as a proxy of the distribution of tourism visits. We used the Airbnb platform due to its global presence. Airbnb, established in 2008, is now the largest peer-to-peer rental platform in the world (Dolnicar 2018), marketing more than seven million rental homes, apartments, rooms, and other forms of accommodation (Airbnb 2020). Even though the original idea of the company was to sell unused space in one’s own dwelling in periods of high demand for accommodation, it now serves mostly for commercial rental of tourism accommodation (Adamiak 2019), so we can expect the distribution of the offer to be representative for other forms of tourism accommodation. The general correlation between Airbnb presence and other tourism accommodation and tourism activity has been proven on the scale of individual cities (Gutiérrez et al. 2017; Yang and Mao 2018) and country (Adamiak et al. 2019). We used the aggregate number of reviews posted to listings rather than the number of listings, as listings include those that are not really used by tourists, while it is the review that marks actual tourist activity. We did not use Airbnb data as the only ancillary data, but combined it with area and population, because of the differences in the levels of use of Airbnb in different countries (Adamiak 2019).
To estimate the link functions that enable the disaggregation of total tourism numbers (international arrivals and domestic trips) for countries or administrative units into smaller grid cells using ancillary variables, we used ready data on NUTS-2 units of countries covered by Eurostat statistics. Then, we used the same models to predict the distribution of visits within all regions and countries for which the total numbers were determined in the first stage of the analysis.
Source data: Natural Earth, Global Human Settlement, and Airbnb
Source files for geographic analysis were based on the Natural Earth (2020) open geographic information database and Eurostat (2020b) Nomenclature of Territorial Units for Statistics (NUTS) area limits at the scale 1:10 million. Vector data manipulations were done in sf and rmapshaper packages in R (Pebesma 2018; Teucher and Russell 2020). For counting population, we used the Global Human Settlement Population Layer (GHS-POP) published by the European Commission Joint Research Centre (Schiavina, Freire, and MacManus 2019). We used data for 2015 with a spatial resolution of 30 arcseconds. We calculated populations in spatial units using functions of the raster package for R (Hijmans 2020). We calculated nominal Gross Domestic Product (GDP) in spatial units by disaggregating values for countries (World Bank 2020) proportionally to population.
We obtained the information on the offers on the Airbnb platform and numbers of reviews (which indicate the rental demand for each listing) through web-scraping the platform website utilizing a Python script (Slee 2018). We performed the data collection in September 2019, and only listings available for rent any time in the following months were saved in our dataset. After removing wrongly located listings, the raw database included 5,718,551 listings. In several countries there were no listings or only a few. They were mainly small island countries or territories, and countries where American firms cannot operate due to United States government sanctions. Therefore, apart from the countries not considered anyway due to lack of UNWTO data, we excluded Iran from further analysis at this stage.
The study uses the number of reviews posted to Airbnb listings to indicate the frequency of Airbnb use and general tourism visits to the destination. We aggregated the numbers of reviews in territorial units, treating reviews as an indication of the number of rental transactions. Therefore, we treated them as transactional data rather than textual user-generated data (Li et al. 2018). Not all Airbnb stays result in a review: in 2014, 67% of guests wrote a review after their stay (Fradkin, Grewal, and Holtz 2018); Inside Airbnb estimates this share at 30.5% (Inside Airbnb, 2021). The propensity to post reviews may differ between countries, tourists’ nationalities, socio-economic background of guests, among others. We are not able to account for most of these potential differences. However, the international differences in the probability of posting a review would not affect our estimations because Airbnb review data is used only to disaggregate tourist numbers within countries, not between countries.
Estimating functions linking predictors with tourist arrivals based on NUTS 2 Eurostat data
To link the proposed explanatory factors with the actual distribution of tourism visits at the sub-national scale, we estimated the parameters of linear regression models based on the data on tourism arrivals for NUTS-2 regions of European countries (European Union, the United Kingdom, Switzerland, and Norway) provided by Eurostat (2020a). We built the models to predict the share of a country’s tourism that falls into a specific region. Both predictors and response variables are compositional data (Aitchison 1986), so we performed center log-ratio transformations of all variables. We constructed the models to obtain minimum residual standard error and keep all variables significant at p < .05. The resulting equations are:
Detailed model parameters estimations are provided in Appendix B. Multi-collinearity check showed that variance inflation factor (VIF) values were not higher than 2 in both models. The models reveal that Airbnb activity distribution predicts international tourism visits distribution quite efficiently. In the case of domestic tourism, the use of population and area greatly improved model fit.
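For readers unfamiliar with compositional data, the center log-ratio (CLR) transformation applied to the regional shares can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes strictly positive shares; how zero shares were handled is not specified here.

```python
import math


def clr(shares):
    """Center log-ratio: log of each share minus the mean of the logs."""
    logs = [math.log(s) for s in shares]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_inverse(z):
    """Map CLR coordinates back to shares that sum to one."""
    exps = [math.exp(value) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical shares of three regions in a country's tourist arrivals
shares = [0.5, 0.3, 0.2]
transformed = clr(shares)
recovered = clr_inverse(transformed)
```

CLR coordinates sum to zero within each country, and the inverse transformation returns predicted values to shares that can be multiplied by a national total.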
The dataset based on which we modeled the link functions was saved into the nuts_modelling.shp file in Supplemental Files.
Creating the geometric grid and disaggregating data
We divided each country (and each administrative unit of larger countries) into geometric spatial units for which we estimated the numbers of tourist visits and tourism expenditures using the data and models described above. We did not use square grid cells because their size varies with latitude. Instead, we used a grid of hexagons, projecting a Goldberg polyhedron similar to a soccer ball on the Earth’s surface (Goldberg 1934) using the dggridR package in R (Barnes 2018). We divided the world into an Icosahedral Snyder Equal Area Aperture 3 Hexagonal Grid with scale factor 8, which resulted in 65,612 units with an area of 7,774 km² each. Twelve units are pentagons that are five sixths the size of the others, but they are mostly located at sea or in desert areas.
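The reported number and size of grid cells can be checked with a short Python calculation. It assumes the usual counting rule for aperture-3 icosahedral grids (10 × 3^r + 2 cells at resolution r) and an approximate value for the Earth's surface area; both the constant and the function names are assumptions introduced for illustration.

```python
EARTH_SURFACE_KM2 = 510_065_622  # assumed approximate surface area of the Earth


def isea3h_cells(resolution):
    """Number of cells in an aperture-3 icosahedral grid (assumed counting rule)."""
    return 10 * 3 ** resolution + 2


def hexagon_area_km2(resolution):
    """Area of one hexagon, given 12 pentagons of 5/6 hexagon size.

    (N - 12) hexagons plus 12 * 5/6 hexagon-equivalents give N - 2 units.
    """
    return EARTH_SURFACE_KM2 / (isea3h_cells(resolution) - 2)


cells = isea3h_cells(8)
area = hexagon_area_km2(8)
```

Under these assumptions the calculation gives 65,612 cells and a hexagon area that rounds to 7,774 km², in line with the figures above.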
We temporarily divided geometric polygons containing borders between countries or administrative regions into smaller areas by intersecting them by political borders. In each of the 25,344 parts of hexagons (excluding sea and areas of the countries for which no data was available) we calculated the area, population, and number of Airbnb reviews, and based on this we estimated the numbers of international and domestic tourist visits using the link functions developed in the previous section. Then, we also disaggregated the sums of tourism expenditure into parts of hexagons assuming that the international and domestic tourism expenditures are proportional to the numbers of international and domestic tourists, respectively. The effect of this step was stored in the subhex.shp file in Supplemental Files.
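The allocation of a regional total to parts of hexagons can be sketched in Python as follows. The coefficients are hypothetical placeholders rather than the fitted values from Appendix B, the log-linear weight is a simplified stand-in for the link functions, and adding 1 before taking logarithms (to avoid log of zero in empty parts) is an assumption.

```python
import math


def dasymetric_allocate(total, parts, coefs):
    """Distribute a regional total across parts using log-linear weights."""
    scores = [
        coefs["area"] * math.log(p["area"] + 1)
        + coefs["population"] * math.log(p["population"] + 1)
        + coefs["reviews"] * math.log(p["reviews"] + 1)
        for p in parts
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


# Hypothetical parts of hexagons within one region
parts = [
    {"area": 3000.0, "population": 50_000, "reviews": 20},
    {"area": 4000.0, "population": 900_000, "reviews": 5000},
    {"area": 774.0, "population": 10_000, "reviews": 0},
]
coefs = {"area": 0.1, "population": 0.3, "reviews": 0.6}  # placeholder values

visits = dasymetric_allocate(1_000_000, parts, coefs)

# Expenditure is assumed proportional to the allocated visits
regional_expenditure = 500.0
expenditure = [regional_expenditure * v / sum(visits) for v in visits]
```

Whatever the weights, the allocated visits and expenditures always add up to the regional totals, which is the defining property of dasymetric disaggregation.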
We then merged the parts of hexagons back into 19,491 original hexagonal grid cells (excluding sea and countries with no data). As the models tend to overestimate the numbers of tourists in sparsely populated areas, which greatly affects the cartographic presentation of the results, we filtered out hexagons that had both fewer than 150,000 inhabitants (which roughly corresponds to 20 inhabitants per square kilometer) and less than USD 150 million of GDP, leaving a final dataset of 9,601 hexagons. These accounted for 49.3% of all grid cells and 51.1% of their total area, but concentrated 99.1% of population, 99.8% of GDP, 99.0% of tourism visits, and 99.1% of tourism expenditure.
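The merge and filter steps can be sketched in Python as follows. Field names and example values are hypothetical. A hexagon is dropped only when it falls below both thresholds at the same time.

```python
from collections import defaultdict

MIN_POPULATION = 150_000
MIN_GDP_USD = 150e6


def merge_parts(parts):
    """Sum part-level values back to their parent hexagon."""
    merged = defaultdict(lambda: {"population": 0.0, "gdp": 0.0, "visits": 0.0})
    for part in parts:
        for key in ("population", "gdp", "visits"):
            merged[part["hex_id"]][key] += part[key]
    return dict(merged)


def keep_hexagon(hexagon):
    """Keep a hexagon unless it is below both thresholds simultaneously."""
    return not (hexagon["population"] < MIN_POPULATION and hexagon["gdp"] < MIN_GDP_USD)


# Hypothetical parts: hexagon 1 is split by a border into two parts
parts = [
    {"hex_id": 1, "population": 100_000, "gdp": 100e6, "visits": 10.0},
    {"hex_id": 1, "population": 80_000, "gdp": 40e6, "visits": 5.0},
    {"hex_id": 2, "population": 50_000, "gdp": 200e6, "visits": 8.0},
    {"hex_id": 3, "population": 1_000, "gdp": 1e6, "visits": 3.0},
]

hexagons = merge_parts(parts)
kept_ids = sorted(h for h, values in hexagons.items() if keep_hexagon(values))
```

In this example hexagon 1 passes on population after merging, hexagon 2 passes on GDP, and hexagon 3 is removed.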
Finally, we divided the total number of tourism visits in each grid cell by its population to obtain a relative measurement of the intensity of tourism. In similar manner, we divided tourism expenditure in each grid cell by its total nominal GDP.
Finding Tourism Hotspots and Measuring Territorial Concentration of Tourism Activity
In the last step of the analysis (Figure 1), we attempted to find the general patterns in the maps produced by employing hot spot analysis using the spdep package in R (Bivand and Wong 2018). We performed the analysis using five input variables: the absolute numbers of tourist visits, international tourist visits and tourism expenditures, and the relative importance of tourism in relation to local population and GDP. To delimit hotspots, we used the Getis–Ord Gi* metric (Getis and Ord 1992) with a contiguity-based matrix of spatial relationship. Before the analysis we took natural logarithms of all variables to normalize their distributions. We set a threshold of p = .05 to determine the positive clusters (hotspots) and negative clusters (cold spots) of the values of five variables.
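As an illustration of the statistic, the following Python sketch computes Gi* z-scores with binary contiguity weights that include the focal cell. It is written from the standard Getis-Ord formula rather than taken from spdep, and the cell values, the chain-shaped neighbor structure, and the function names are all hypothetical. A two-sided threshold of p = .05 corresponds to |z| > 1.96.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores for binary contiguity weights; the focal cell is included."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}      # Gi* counts the cell itself
        w_sum = len(window)                   # binary weights: sum w = sum w^2
        local_sum = sum(values[j] for j in window)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


def classify(z, critical=1.96):
    """Two-sided test at p = .05."""
    if z > critical:
        return "hot"
    if z < -critical:
        return "cold"
    return "none"


# Hypothetical chain of 10 cells; each cell touches the previous and next one
visits = [1, 1, 1, 1, 1, 1, 1, 50, 60, 55]
log_visits = [math.log(v) for v in visits]    # log transform, as in the text
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(visits)]
             for i in range(len(visits))]

z = getis_ord_gi_star(log_visits, neighbors)
classes = [classify(value) for value in z]
```

In this toy example the cell in the middle of the high-value run is flagged as a hotspot, while the low-value cells have negative but non-significant scores. The sketch does not handle the degenerate cases of constant values or a cell neighboring every other cell.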
The final dataset of hexagons combining original data, estimated numbers of tourism visits and expenditures, and assignment of grid cells to hotspot areas, was stored in the file hexagon.shp in the attached dataset.
To measure the level of spatial concentration of tourism and compare it with the concentration of population and global economy we employed the Gini coefficient and Lorenz curve. Both methods were originally designed to measure the concentration of income and wealth in the society (Lorenz 1905), but later they were applied to measure geographic concentration of industries (Krugman 1991) and other social and natural phenomena. In tourism studies, both methods are primarily used to measure tourism seasonality (Wanhill 1980), but also spatial concentration of tourist arrivals (Lacher and Nepal 2013).
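A minimal Python sketch of both measures is given below. It assumes that all grid cells have equal area, so that each cell contributes an equal step on the horizontal axis of the Lorenz curve; cells clipped by borders and the twelve pentagons would need area weights, which are omitted here. The function names and example values are hypothetical.

```python
def lorenz_curve(values):
    """Cumulative share of the total, with cells ordered from lowest to highest."""
    ordered = sorted(values)
    total = sum(ordered)
    cumulative = [0.0]
    running = 0.0
    for v in ordered:
        running += v
        cumulative.append(running / total)
    return cumulative


def gini(values):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    curve = lorenz_curve(values)
    n = len(values)
    # Trapezoid rule with equal-width steps of 1/n
    area_under = sum((curve[k - 1] + curve[k]) / 2 for k in range(1, n + 1)) / n
    return 1 - 2 * area_under


even = gini([5, 5, 5, 5])            # evenly spread: 0.0
concentrated = gini([0, 0, 0, 10])   # everything in one of four cells: 0.75
```

With a finite number of cells the maximum value is (n − 1)/n, which is why full concentration in one of four cells gives 0.75 rather than 1.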
Results
Tourist Visits and Tourist Expenditures in Countries
The UNWTO dataset contains information on 1.426 billion international and 4.611 billion domestic tourist trips in the analyzed countries in 2019. After assessing the missing values on domestic trips (3.764 billion), we estimated the total number of domestic trips at 8.375 billion, almost six times more than international trips. This proportion roughly corresponds to the one assessed by UNWTO in previous editions of Tourism Highlights (UNWTO 2015). The distribution of domestic and total tourism visits across parts of the world differs significantly from the distribution of international trips alone (Table 1). Europe, which accounts for half of international tourist arrivals, has only a one-fifth share in the global number of tourist visits when domestic trips are included in the calculation. Asia and the Pacific account for over half of tourism mobility, which corresponds to the share of this part of the world in global population. The shares of Africa and the Middle East in total tourism mobility are much smaller than their similarly small shares in international arrivals, while the shares of the Americas are similar to those calculated only based on international tourism figures.
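The headline proportions follow directly from the three totals reported above. The short Python calculation below only restates that arithmetic; the variable names are illustrative.

```python
# Trip totals in billions, as reported in the text
international = 1.426
domestic_reported = 4.611
domestic_imputed = 3.764

domestic_total = domestic_reported + domestic_imputed        # 8.375
ratio = domestic_total / international                       # about 5.9
domestic_share = domestic_total / (domestic_total + international)  # about 0.85
```

The ratio of about 5.9 is the "almost six times" in the text, and the domestic share of about 85% is the figure cited later in the Discussion.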
International Tourism Arrivals, Domestic Tourism Trips, and Tourism Expenditure by UNWTO Global Region.
International tourism has a higher share in total tourism expenditure than in the number of tourist trips. Still, almost three quarters of global tourism spending is related to domestic travels. When expenditures are taken into account, the position of European and American countries is relatively higher than when only the number of visits is considered: almost one third of total expenditure falls to European countries, which is only a little less than Asia and the Pacific.
The highest tourist traffic is observed in the countries with the highest populations. The 30 countries with the largest aggregate number of international tourist arrivals and domestic trips include 19 of the 30 most populated countries in the world (Table 2, Figure 2). In China and India, the number of tourists is twice as high as in the third country, the United States, and five to eight times as high as in the next countries (Japan, Russia, and France). In the countries with the highest numbers of tourist visits, domestic tourists predominate, and inbound tourists usually do not exceed 10% of the total. Only in European countries (e.g., France, Spain, and Germany) is the proportion of international tourists higher. On the other hand, tourism develops almost exclusively based on international tourists in many small and low-income countries of Africa, the Antilles, and Oceania. A predominance of international visitors is also observed in the south-east of Europe (Albania, Croatia, Montenegro, Malta) and the Arab states of the Persian Gulf.
Countries with the Largest Total Numbers of Total Tourism Visits and Largest Tourism Expenditure.
Data on domestic tourism trips/expenditure based on estimation.

Numbers of international tourist arrivals and domestic tourist trips in countries.
The distribution of tourism expenditure does not directly correspond to the numbers of tourism visits (Table 2, Figure 3). While the USA and China are the leaders of both lists (tourists spent over a trillion dollars in each of these countries), in terms of tourism expenditures they are followed mostly by large Western European countries (Germany, the United Kingdom, Italy, and France). The share of international tourism in tourism expenditure is relatively high in South-East Asia, Southern Europe, and the Middle East, as well as the Caribbean countries and Africa.

International and domestic tourism expenditure in countries.
Distribution of Tourism Visits and Expenditure in Grid Cells
The analysis of the data at the lower spatial scale of homogeneous geometric fields shows great variations in the density of tourism visits and expenditure within individual countries. Major concentrations of tourism visits are found in Europe, East Asia, and India (Figure 4). The highest tourist attendance is observed in large metropolises (not only national capitals) and their surroundings. Twenty-seven of the thirty most visited grid cells contain large Asian cities (12 in China [including Hong Kong and Macau]; 11 in India; as well as Seoul, Tokyo, Bangkok, and Jakarta; Table 3). In Europe, the largest number of tourists visit capital cities and grid cells on the Mediterranean coast and along the highly urbanized belt stretching from Northern Italy to England. In the Americas, the largest number of tourists visit the north-eastern megalopolis of the USA, California, Florida, central Mexico, and the Rio de Janeiro region. There are only minor tourism clusters on the African continent, for example, some grid cells located in South Africa, Egypt, Morocco, and Tunisia.

Total number of tourist visits in grid cells.
Grid Cells with the Highest Total Numbers of Tourist Visits, Highest Numbers of International Tourist Visits and Largest Tourism Expenditure.
International tourism visits are evidently more concentrated in Europe, while the relative position of Mainland China and India is lower (Figure 5). The group of 30 grid cells most frequently visited by foreign tourists includes 11 European capitals and six other tourist destinations in Europe (including Turkey). Apart from major cities, they also include leisure destinations in the Mediterranean region (Antalya, Palma) and the Alps (Innsbruck; Table 3). In Asia and the Middle East, Macau, Hong Kong, Bangkok, Singapore, and Dubai are among the top international tourism destinations. Ignoring domestic tourism also elevates the relative position of Central America, the Caribbean, and South-East Asia.

Number of international tourist visits in grid cells.
The distribution of tourism expenditure depends not only on the number of tourist visits, but also on the level of economic development of a given region, which also translates into the cost of living. Hence the high position of north-western and Alpine Europe, East Asia, and urbanized parts of the United States (Figure 6). Grid cells with the highest amounts of money spent by tourists include major European and Asian metropolises (particularly the ones often visited by international tourists), as well as major North American and Middle Eastern destinations (Table 3).

Tourism expenditure in grid cells.
Figures 7 and 8 present the spatial variation of the intensity of tourism relative to the size of local population and economy. Unlike the patterns seen in Figures 4 and 5, they do not follow the global distribution of population and economic product. Europe retains high values, along with North America, Australia, and New Zealand. Regional differences are explained by the access to leisure amenities: sea (in Europe, coasts of China, and North America), mountains (the Alps, the Rocky Mountains, Yunnan, and Sichuan provinces), and favorable climatic conditions (areas to the south of major population concentrations in the northern hemisphere). Apart from primary world destinations, a high ratio of tourists per population characterizes regions with relatively low population densities but high tourist activity of local residents, for example, the Nordic countries.

Tourism visits per 1,000 inhabitants in grid cells.

Tourism expenditure per 100 USD of GDP in grid cells.
When the relative importance of tourism for the local economy is measured, southern Europe, Central America and the Caribbean, as well as parts of South-East Asia, stand out. In the Middle East, coastal resorts on the Red Sea and cities on the southern coast of the Persian Gulf are characterized by high relative tourism numbers and expenditures. Moreover, island locations are often particularly tourism-intensive, including European islands (the Balearic Islands, Canary Islands, Greek and Croatian Islands, and Iceland), Hawaii, Maldives, and New Zealand.
World maps presented in Figures 4 to 8 are too general to read the patterns of presented variables on the regional or national scale, so a more detailed supplementary map is attached to the article. A section of this map is demonstrated in Figure 9. Semicircle sizes are proportional to the numbers of tourists visiting a given grid cell (left semicircles) and tourism expenditure (right semicircles). The color tint displays the intensity of tourism, while color hue informs about the share of international tourists in total tourism visits and expenditure. Apart from this map, borders of grid cells and basic indicators of tourism intensity may be consulted on a web map (http://puma.uci.umk.pl/~czeslaw/global-destinations/), and a shapefile with the resulting dataset (hexagons.shp) is provided as a Supplemental File for further re-use.

Fragment of World map of tourism destinations (printed version contains a simplified figure—see online version for the full version; entire map in Supplemental File).
Hotspots of Global Tourism
Hotspot analysis provides a generalized image of the territorial variation in tourism size and intensity (Figure 10). Out of four major clusters of tourism visits (Europe, East Asia, India, and western North America) only Europe is also a large hotspot of international tourism. Smaller hotspots exist in various parts of the world—mainly East and South-East Asia and North America. When tourism expenditure is measured instead of the number of visits, eastern North America takes the place of India as the third global concentration. Low tourism numbers and expenditures are clustered in central Africa and also, with regard to foreign tourists, in Asia.

Hotspots and cold spots of tourism in hexagonal grid cells: (A) Total tourism visits. (B) Visits of international tourists. (C) Total tourism expenditure. (D) Tourist visits per population. (E) Tourist expenditure per GDP.
This pattern changes when the intensity of tourism is presented in relative terms. Clusters of high tourism visits per capita are present mostly in Europe, North America, Australia, and New Zealand, thanks to high numbers of domestic and short-distance international tourism trips. Hotspots of tourism intensity measured in monetary terms are more dispersed and located away from the core areas of the global economy: on the Mediterranean coasts, in Central, South-East and Southern Asia, as well as in Central America and western North America. Again, in most regions of Central Africa, tourism does not play an important role, even in relation to the total economy.
Global Concentration of Tourism
The Lorenz curves and Gini coefficients presented in Figure 11 were computed only for the 9,601 grid cells that remained after excluding those in sparsely populated areas. Still, over half of the population and two thirds of the economic productivity are concentrated in just 10% of the area of these grid cells. The distribution of tourism visits turns out to be more spatially concentrated than the distribution of population. Tourism expenditure is also slightly less dispersed over the territory than general economic product. The global distribution of international travels is substantially more concentrated on small territories than domestic tourism mobility, both in terms of the number of visits and expenditure.

Lorenz curves and Gini coefficients presenting the concentration of tourism in grid cells.
Discussion
The study aimed to map and describe the distribution of global tourism destinations with more spatial detail than conventional statistics allow. This was done by employing novel big-data sources: a gridded population database and geo-referenced data on Airbnb accommodation offers. The resulting maps enable comparison of the total and relative numbers of tourist visits and tourism expenditure in geometric grid cells covering most countries of the world. The maps confirm that the distribution of tourism results from the distribution of demand (population and its purchasing power) and supply (natural amenities and tourism resorts). The maps of absolute number of tourist visits and total tourism expenditure roughly correspond to the maps of total population and economic product, respectively. If relative instead of absolute values are considered, semi-peripheral locations not too distant from the major population and economic centers and offering favorable natural conditions stand out.
Apart from confirming the established knowledge, the results help to confront some common misconceptions about the spatial patterns of global tourism. First, they emphasize the dominant role of domestic tourism in the structure of global tourism flows. Domestic trips are not only the vast majority (85%) of total tourist travels, but they also account for three-quarters of the global tourism economy. The distribution of domestic destinations often differs from the distribution of international destinations within a country, and domestic trips tend to be more dispersed over the territory of a country than inbound visits. Despite the growing number of geographical studies acknowledging the importance of domestic tourism (Rogerson 2015), it is still often considered less important than international travels, not least because of the difference in the availability of statistical data (Peeters and Landré 2011). Second, considering both international and domestic tourism reduces the perceptual dominance of Europe, or more generally of high-income countries, on the global map of tourism. Their share in tourism mobility and economy is considerably smaller than that of the aggregated middle-income countries of Asia (including China and India), the Americas, Eastern Europe, and the Middle East. Low-income countries in Africa and Asia, apart from small areas (often islands), turn out to be tourism cold spots. This means that, first, tourism demand and economy follow the national income along an S-shaped curve, and second, that with a growing economy tourism tends to disperse over the territories of countries.
The study adds depth and detail to the conventional tourism statistics, yet it still inherits some of their limitations. First, the two measures of tourism that are used, the number of visits and expenditure, do not sufficiently describe the complexity of tourism behaviors and impacts. Tourists can stay at a destination for varied lengths of time, so the number of nights is often a better measurement of the volume of tourism than the number of visits. Taking into account the number of nights spent by tourists instead of the number of visits would probably reduce the relative position of capital cities, which are often destinations for short business trips, and increase the position of coastal and mountain destinations visited for longer leisure stays. In the case of the second measure, tourist expenditure, the problem of localizing economic impact arises: we assumed that the entire expenditure is bound to the destination, while in fact, for example, the cost of transportation may be incurred in other locations. Evaluating and localizing the total economic impact of tourism would require the analysis of both the supply and demand for goods and services associated with tourism activity. Such a perspective is the basis of tourism satellite accounts, which are usually evaluated for entire countries, and the possibility of their international comparison, even at the European Union level, is very limited (Eurostat 2019). Also, tourism statistics are usually based on tourism accommodation statistics, and a large part of tourism stays remain unobserved (De Cantis et al. 2015), including stays in second homes (Müller and Hall 2018), with friends and relatives (Jackson 1990), and in informal accommodation establishments, including peer-to-peer accommodation (Guttentag 2015).
Differences in national methodologies of tourism statistics result in limitations in the comparability of the data about the numbers of both international arrivals (e.g., differences between countries measuring arrivals at frontiers and at accommodation establishments) and domestic trips. Comparing the reported numbers of domestic trips with predictions of the model used for extrapolating the missing data gives some idea about the extent of such bias. For example, the reported numbers of domestic trips are extremely low compared to the estimation in some Eastern and Southern European countries (e.g., Belarus—2.5 times lower, Italy—2.4 times lower), while they are much higher in India (5 times), Finland (2.2 times), and Czechia (2 times). This may well be a result of specific factors not measured by the model, but may also stem from data incompatibility.
The second source of the limitations is the employed procedure of disaggregating tourism visits into smaller territorial units. We assumed that each tourist trip is directed only to a single destination, while in fact a tourist may visit multiple places during one trip. This means that our dataset may underrate the number of visits in grid cells, particularly international visits in large countries. We decided on the size of reference grid cells as a trade-off between larger grid cells producing more robust estimations and the greater detail offered by smaller cells. In Europe this size of grid cells may appear large (e.g., Batista e Silva et al. 2018, used 100 × 100-m grid cells for the dataset and presented the results at 10 × 10-km resolution), yet it enables the entire world to be covered and legible world maps to be designed. Finally, the formula for disaggregating the numbers of tourists and tourism expenditures into grid cells was evaluated based on European data and it is not obvious that it is appropriate to other parts of the world. To verify the validity of this estimation, we confronted our results concerning the distribution of international visits in China with official Chinese statistics on international tourist visits to province-level administrative divisions of the country (National Bureau of Statistics of China 2019; we did not use this data in the first step of the analysis due to the lack of official data on the provincial distribution of domestic visits). We assumed that the distribution of tourism within each grid cell is constant over territory, so part of the error results from the provincial borders intersecting grid cells. Still, in most provinces the difference between the estimated and actual numbers of international tourist arrivals is lower than 50% (Figure 12). High discrepancy only pertained to provinces with low absolute numbers of international tourism visits (bottom left corner of the chart).

Estimation of international tourism arrivals to provincial-level administrative divisions of China compared with the national statistics.
From the perspective of destination management and tourism policy, the results may help more informed planning of tourism development on large territorial scales. The predominance of domestic tourism and common locational difference between domestic and international tourism destinations support the idea that, particularly in middle-income countries, tourism development should be directed to the domestic, and not only international market, to obtain a territorially balanced distribution of its economic benefits (Rogerson 2015; Seckelmann 2002; UNWTO 2020a). Considering the challenges of mitigating climate change and the growing contribution of tourism mobility, particularly long-haul air travel, to greenhouse gas emissions, the current research gives support to the claim that global tourism development is not inherently dependent on long-haul travel and could develop on the basis of shorter trips with the use of ground transportation (Peeters and Eijgelaar 2014; Peeters and Landré 2011). A detailed map of tourism destinations may also help to add precision to the evaluation of the future impacts of climate change (Scott, Hall, and Gössling 2019). Many global tourism clusters are located in the parts of the world particularly endangered by increasing water deficits and more frequent extremely hot days (e.g., Mediterranean coasts, Caribbean and Central America, southern China, islands in the tropical zone; IPCC 2018, 2020).
The results of the current study may also help monitor the territorial effects of the COVID-19 pandemic. Tourism has been among the sectors of the economy most affected by the spread of the pandemic and the non-pharmaceutical interventions employed to contain it (Gössling, Scott, and Hall 2021). These impacts have varied geographically between and within countries (UNWTO 2021b; Yang et al. 2021). Due to border closures, international tourism was particularly affected (UNWTO estimates international tourism to have fallen by 74% in 2020 compared to the previous year), while domestic and proximity tourism is expected to recover more quickly (Romagosa 2020; UNWTO 2020a; Zenker and Kock 2020). The reduction in international mobility may be long-term due to continuous administrative and economic constraints, perceived barriers, and structural changes in the tourism industry. The current study identifies areas particularly economically dependent on tourism in general and on international tourism in particular, which may help forecast the future development of tourism destinations in the post-COVID reality.
The presented study belongs to a growing body of research employing big-data sources to analyze the spatial patterns of tourism. Such an approach has been extensively developed during the pandemic as it enables tracking dynamic changes in a near-real-time manner. Booking data, Internet searches, user-generated data, and mobile device data have been used for monitoring the situation in passenger transportation, hotel and short-term rental markets (Gallego and Font 2021; Gössling et al. 2021; Napierała, Leśniewska-Napierała, and Burski 2020; Nhamo, Dube, and Chikodzi 2020; Yang et al. 2021). The method used in the current study may contribute to this development, and two avenues of its development seem particularly promising. First, repeating similar studies in the future will be useful to explore the time trend, for example, global changes in the level of concentration of tourism on general spatial scale (Lacher and Nepal 2013) and changes in the location of major tourism hotspots. The current pandemic may accelerate such changes, as, apart from the short-term impacts, long-term processes are hypothesized to occur, for example, changes in destination images, development of peripheral tourism destinations at the cost of central ones, reduction in business trips, shift of transport modes toward private cars, or new more sustainable paths of destination and business development (Li, Nguyen, and Coca-Stefaniak 2020; Niewiadomski 2020; Zenker and Kock 2020). The second possible development of the method is to include the origin of tourism travels in the database. Considering tourism origins, destinations, and transit routes may enable, for example, disaggregation of the contribution that tourist travels to specific destinations make to greenhouse gas emissions (Peeters and Landré 2011). In the post-COVID context, it may also help to track the long-term changes in travel behavior, for example, the pace of return of international mobility. Mapping tourism origins could be based on user-generated content on Internet platforms and later estimations considering location and level of urbanization.
Supplemental Material
sj-7z-1-jtr-10.1177_00472875211051418 – Supplemental material for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19
Supplemental material, sj-7z-1-jtr-10.1177_00472875211051418 for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19 by Czesław Adamiak and Barbara Szyda in Journal of Travel Research
Supplemental Material
sj-7z-2-jtr-10.1177_00472875211051418 – Supplemental material for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19
Supplemental material, sj-7z-2-jtr-10.1177_00472875211051418 for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19 by Czesław Adamiak and Barbara Szyda in Journal of Travel Research
Supplemental Material
sj-png-3-jtr-10.1177_00472875211051418 – Supplemental material for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19
Supplemental material, sj-png-3-jtr-10.1177_00472875211051418 for Combining Conventional Statistics and Big Data to Map Global Tourism Destinations Before COVID-19 by Czesław Adamiak and Barbara Szyda in Journal of Travel Research
Footnotes
Appendix
| Variable | SD | Coefficient | SE | t | p |
|---|---|---|---|---|---|
| Dependent variable | | | | | |
| Inbound trips | 1.114 | | | | |
| Independent variables | | | | | |
| Area | 1.045 | −0.168 | 0.033 | −5.096 | .000 |
| Population | 0.728 | 0.297 | 0.061 | 4.878 | .000 |
| Airbnb reviews | 1.229 | 0.698 | 0.034 | 20.617 | .000 |

N = 278.
R² = 0.779.
Adjusted R² = 0.776.
Residual standard error = 0.526.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
Author Biographies
References
