Abstract
International migration affects the socio-economic and spatial development of a country. However, mainly due to data limitations, little attention has been given to expatriates in China. Specifically, academia lacks a clear understanding of the factors that influence expatriates’ migration to China and cannot accurately predict the size of the expatriate population. To address these issues, drawing on migration theory from a psychological perspective, we used the Google Trends search index for keywords related to “China” to reflect the mobility intentions, information, and aspirations of the expatriate population, and compared the index with the actual number of expatriates recorded in China’s Seventh Census. Correlation and regression analyses were performed to reveal the interaction between expatriates’ migration intentions and migration behaviors. The study demonstrates that searching for information on China’s political situation is the primary indicator of expatriate migration to China, whereas factors such as travel and climate can further predict expatriates’ decision to move to China. In addition, the study demonstrates that Google Trends data can be used as an effective data source for studying expatriate population relocation in China. This paper can provide a reference for formulating policies in China, such as economic and trade policies, population policies, and social integration policies.
Introduction
Transnational migration has received widespread attention due to its far-reaching economic, social, and political impacts on outgoing and incoming countries (McAreavey, 2017; McGhee et al., 2012). Many theoretical and empirical studies have adequately addressed the unidirectional flow of migrants from developing countries to developed countries in North America, Europe, and Oceania (Alba and Nee, 2003; King, 2015; Lo et al., 2017; Paul, 2011). Among them, macro factors such as economic levels, welfare systems, and social inclusiveness of outflows and inflows, as well as micro factors such as individual characteristics of migrants, family factors, and perceived willingness, have become key explanatory variables in transnational migration motivation studies (Huang and Cheng, 2014; Hum and Simpson, 2004; Li et al., 2009a, 2009b). However, given that Western scholars have long viewed China as a migrant-sending country, theoretical insights and data collection on expatriate migration to China are still lacking.
Since the post-1978 economic reform, along with China’s rapid economic growth and expanding international influence, the number of expatriates in China has steadily increased. China has become a typical migrant-receiving country worldwide (Guo et al., 2022). Studies have focused on the “transnational class” destined for China in the context of globalization, emphasizing that most of them are involved in cross-border trade and the distribution of manufactured products, thus facilitating the establishment of new interactions between their home countries and China (Lyons et al., 2008). Moreover, many foreign enterprise agglomerations and international communities have emerged in various cities in China (Kown, 1997; Liang and Billon, 2018). According to the Seventh Census, there are 845,697 expatriates from over 150 countries within China. These expatriate populations have had a profound impact on China’s population policy, public facility allocation, and community governance (Guo et al., 2022). In this context, understanding the willingness of expatriate populations to migrate to China can help to accurately predict the flow of expatriates migrating to China and serve central and local governments in formulating various migration-related policies. These policies aim to facilitate the interaction between China and the countries of origin at the economic and trade levels, help the expatriate population integrate into Chinese society, and promote the sustainable development of China’s social environment.
Migration studies in China focus on internal rural–urban migration (Fan, 2011; Gao et al., 2024; Liu et al., 2015), and research on the expatriate population is still lacking. China’s rural–urban migration studies rely on data from population censuses, sample surveys, and monitoring data on the dynamics of migrant populations (Lin et al., 2023; Wei and Gong, 2019; Xu et al., 2023). However, tracking the micro characteristics of expatriates moving to China, including their socio-economic characteristics, source country information, migration motivation, and decision-making information, is often difficult. Furthermore, migration decision-making, which determines migration behaviors, is a less-examined field than migration behaviors in research and practice (Czaika et al., 2021; Fishbein and Ajzen, 1977). The process of migration decision-making has drawn more scholarly attention only in recent years, and information, an important dimension of migration decision-making, requires further study (Czaika et al., 2021; Thompson, 2017; Zittoun, 2020). Therefore, there is an urgent need to improve data collection, research methodologies, and the empirical and theoretical understanding of migration decision-making to provide a scientific basis for accurately portraying the scale and characteristics of expatriate migration to China and expatriate decision-making.
In recent years, with the rapid development of information technology, big data based on mobile communication data, social network data, and real-time location data on population migration, provided by internet companies or communication providers, have been widely used in academic research (Abel and Sander, 2014; Alexander et al., 2020; Ginsberg et al., 2008). In this study, we use the Google Trends index as a proxy indicator to characterize expatriates’ intention to move, supplemented by data from China’s Seventh Population Census, in an attempt to reveal the key factors influencing the expatriate population’s intention to move. Specifically, this study aims to answer the following questions: What are the origin status and spatial distribution of expatriates in China, and what are the search terms in Google Trends that characterize expatriates’ relocation intentions? To what extent does the migration intention characterized by search terms influence expatriates’ migration to China? To fill the above research gaps, we used the keyword search index data related to “China” in Google Trends, which can reflect expatriates’ intention and desire to migrate to China, and conducted correlation and regression analyses against the number of expatriates from each country published in China’s Seventh Population Census data. The results systematically reveal the interaction between the intention of foreigners to move to China and their actual migration behavior and improve the analytical framework of expatriates’ migration flow in the context of China.
This paper is organized as follows. The next section reviews the research lineage of individual migration intentions and introduces the application of big data in emerging migration research. This is followed by a section systematically explaining the data sources and methods used in this study. The next section first demonstrates the spatial distribution pattern of the expatriate population in China with the help of GIS spatial visualization tools and then further reveals the interaction between the migration intention of the expatriates and their migration behavior through correlation analysis and linear regression analysis. The “Discussion” section discusses and summarizes the feasibility and effectiveness of the Google Trends index in the study of expatriates’ relocation to China based on the above analysis.
A review of migration intentions and information collection
Theories of migration behaviors
There are some well-established theories in the study of migration, specifically migration behaviors. The literature emphasizes that migration is influenced by the economic, social, and political environments of both out- and in-migration sites (Heberle, 1938; Taylor, 1999). Ravenstein’s “Seven Laws of Migration” suggest that the main purpose of migration is to improve economic conditions. Neoclassical economic theory argues that the fundamental motivation for mobility lies in the pursuit of maximizing economic benefits and that the expected urban-rural income gap is the core element that promotes mobility (Ellis, 1998; Stark and Bloom, 1985). In addition, the global city theory emphasizes the role of economic globalization in facilitating international migration (Sassen, 2001), and labor market segmentation theory suggests that there is a dual labor market within the country of destination and that immigrants fill in the lower tiers of the labor market that native laborers are unwilling to enter (Piracha and Vadean, 2010). Immigration policy not only directly removes intermediate barriers to migration by granting the right to enter the country but also regulates economic factors by reducing the risk of migration; among the many policies available for controlling population mobility, it is the most direct and easiest means for the government to regulate such mobility (Papademetriou and Sumption, 2013).
Several theories pay more attention to migrants’ social attributes and mainly include human capital theory and social network theory. The former theory emphasizes the role of individual characteristics of migrants, such as education, skills, language, and work experience, in the migration process (Grubel and Scott, 1966; Schultz, 1975). Highlighting the social dimension of migration, the latter focuses on the embeddedness of migrants in the social network of the destination country, suggesting that the social network not only provides financial support for migrants to increase their willingness to migrate and settle in the destination country but also provides emotional support to help migrants adapt and integrate into the mainstream society of the destination country (Huang et al., 2018). The above theories explain migration behaviors such as geographical mobility and retention, and spatial distribution.
Theories of migration decision-making
The above theories are consistent with the rational choice perspective, which emphasizes the rational trade-offs and judgments that individuals make when deciding whether to migrate. However, these theories have not sufficiently explored the psychological factors of individual migrants, including migration motivations, aspirations, and beliefs (de Haas and Fokkema, 2011; Massey et al., 1993). Although economic and work-related factors tend to dominate the migration decision-making process of migrants, economic rewards per se are not necessarily the ultimate goal of migration but rather a means to fulfill the specific needs and desires of migrants (De and Gordon, 2000). In fact, the study of migration decision-making from a psychological perspective has become a new direction in recent migration research (Carling and Pettersen, 2014; de Haas and Fokkema, 2011; Gubhaju and de Jong, 2009). The Theory of Planned Behavior (TPB) is a typical theory in this field of research (Fishbein and Ajzen, 1977), emphasizing that migrants’ intention to migrate is the primary determinant of their migration behavior. Furthermore, the potential desire to migrate has been shown to be closely related to the information migrants receive about the destination country, which moderates migrants’ attitudes and perceptions toward migration and thus influences their migration decisions (Thompson, 2017; Zittoun, 2020).
Acquiring knowledge and information about future life is an important prerequisite for shaping migrants’ desire (intention) to migrate (Appadurai, 2004; Goodman, 1981). People’s knowledge and understanding of social, economic, political, and other opportunities may change their migration intentions (Appadurai, 2004). As shown in Figure 1, as migrants actively collect and process an increasing amount of information related to the destination country, migrants imagine the future migration process and future life at the destination based on the information collected, and this process further regulates migrants’ migration aspirations and migration behavior (Halfacree, 2004). This information comes from education, social networks, social media, web pages, etc. (Hugo, 2003). The literature suggests that in the context of the highly developed internet, search engines that provide users with accurate search results have become one of the key ways for migrants to obtain information about their country of migration (Gibson et al., 2010). Migrants use search engines to search for information about the target country before moving, assess the risks, costs, and feasibility of moving to a potential destination country, form an informational perception of the destination, and judge whether it meets their expectations, which in turn influences migrants’ migration intentions and behaviors (Fantazzini et al., 2021).

Figure 1. Process of the immigration decision.
In this context, the search for information about the destination country plays an important role in the migration process. Potential migrant populations collect and process various types of information about available migration opportunities at different stages of the migration journey, including information about potential destinations, possible entry routes, and possible job opportunities (Baláž et al., 2016; Czaika et al., 2021). Moreover, they assess potential risks and uncertainties associated with migration journeys and outcomes, such as language difficulties, health risks, unemployment, and crime rates. Research has demonstrated that a range of online big data can present a full picture of migrants’ information search for destination countries and predict the migration movements of international migrants. For example, studies have been conducted to measure international migration rates through the use of Facebook’s advertising platform (Zagheni et al., 2017). Hughes et al. (2016) utilized approximately 500,000 pieces of Twitter data to infer population migration patterns and further validated the relationships between characteristics such as age, gender, and nationality of Twitter users and their migration patterns. Google search terms have been shown to be useful in exploring factors influencing Puerto Rican migration to the United States, as well as Puerto Rican immigrants’ search preferences for five favorable states in the U.S. (Vicéns-Feliberty and Ricketts, 2016). Wladyka (2013), through a case study of labor mobility from Latin America to Spain, tested the ability of Google search data to predict international migration.
However, we find that some gaps remain in the existing studies, especially in the Chinese context. First, most migration research in China focuses on domestic mobility from rural to urban areas, resulting in a wealth of relevant research (Zhu, 2017), whereas few studies have systematically examined the expatriate population migrating to China. Second, existing studies on the expatriate population in China have focused mainly on the migration dynamics, transnational ties, and integration patterns of the expatriate population (Li et al., 2009a, 2009b; Lyons et al., 2008), and the understanding of the mechanisms of interaction between migrants’ individual migratory intentions and their migratory behaviors is insufficient. Third, most of the existing studies on the expatriate population in China are qualitative, whereas quantitative studies are lacking (Guo et al., 2022). Therefore, with the help of Google Trends data, this study demonstrates that the search intensity of keywords highly relevant to China in the Google search engine is a sign of migration intention and reveals the possible factors affecting the migration intention of the expatriate population.
Data and methodology
Data
Using the internet to search for information is the main way people obtain information about the outside world. People primarily use search engines to search for information directly. In recent years, Google Trends data have been shown to predict international migration (Choi and Varian, 2012; Fantazzini et al., 2021). Therefore, three main types of data were used in this study: the Google Trends index, data on expatriates and their countries of origin published in the Seventh Population Census of China, and data related to the country of origin of the expatriate population as a control variable.
The Google Trends index is derived from the Google Trends website (trends.google.com), which provides keyword search data from January 2004 to the present. The website reports the relative search frequency of a keyword and its trend over time. Specifically, relative search frequency refers to the ratio of the number of searches for a keyword to the number of searches for all other keywords in the Google search engine in a given time period, and it reflects the search intensity of that keyword. Focusing on keyword search intensity rather than the total number of searches eliminates the effect of the growth in the number of internet users caused by internet penetration. Therefore, keyword search intensity was the core index of this study.
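The reason for preferring a relative measure can be illustrated with a minimal sketch. It assumes that search intensity is a per-period search share rescaled so that the peak period equals 100, which is how Google Trends presents its values; the function name and all counts below are hypothetical, since Google does not publish raw search counts.

```python
def relative_intensity(keyword_counts, total_counts):
    """Per-period search share, rescaled so the peak period equals 100.

    Both arguments are hypothetical raw counts used only for illustration.
    """
    if len(keyword_counts) != len(total_counts):
        raise ValueError("series must have the same length")
    shares = [k / t if t > 0 else 0.0 for k, t in zip(keyword_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0.0 for _ in shares]
    return [100.0 * (s / peak) for s in shares]


# Raw searches for the keyword double between the two periods, but so does
# overall search volume (for example, through wider internet penetration),
# so the relative intensity is unchanged.
stable_interest = relative_intensity([50, 100], [1000, 2000])

# Here the keyword's share of all searches triples while volume is constant.
rising_interest = relative_intensity([10, 30], [1000, 1000])
```

In the first case the raw count doubles yet the intensity stays flat, which is the effect of internet penetration that the relative measure is meant to remove.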
Second, the information on the number of expatriates and their countries of origin used in this paper is derived from the China Census Yearbook 2020, which reports the Seventh Census data. Research has shown that migrants’ potential migration behavior usually lags behind their search behavior (Ginsberg et al., 2008). Scholars have correlated data on search behavior with data on the actual migration of immigrants, and the results show that the lag is usually about one year (Wanner, 2021). Therefore, given that this study used actual migration data of the expatriate population in 2020, the Google Trends index search data used was from 2019.
Third, our empirical model incorporates several key control variables, including origin countries’ GDP, population size, official language, education level, and cultural influence, as well as China’s entry policy. These variables are recognized as key factors influencing population migration in traditional migration studies (Hao et al., 2016). The data on the control variables were partly obtained from statistics published by the United Nations Department of Economic and Social Affairs, International Migration 2019, and partly from the 2019 data published by the UNESCO Institute for Statistics (UIS). The remaining data related to China’s entry policies are sourced from the consular services of the Ministry of Foreign Affairs of China and the policy data on the official website of the National Immigration Administration of China.
Methodology
The methodology consists of three steps: index selection, spatial analysis, and correlation and regression analysis. The first step of index selection helps to select possible keywords to predict migration. The second step of spatial analysis examines the spatial pattern of these keywords and migration to understand the geographical differentiation among the variables in detail. The last step of correlation and regression analysis examines the relationship between the selected keywords and migration and identifies the main keywords that can predict migration.
The Google Trends index covers many keywords in various fields (Fantazzini et al., 2021). Therefore, this study extracted keywords that reflect expatriates’ predicted migration intentions from the numerous keywords on the Google Trends website as the independent variables of the model. The specific process was as follows:
Based on the literature and Figure 1, we systematically summarized the influencing factors of migrants’ migration intentions, including objective and subjective factors in Table 1 (de Haas and Fokkema, 2011; Guo et al., 2022; Huang et al., 2018). The initial list of keywords includes individual characteristics, policies, economic and cultural factors, living environments, migration opportunities, migration aspirations, and behavioral perceptions.
We further expanded and supplemented the initial keywords list with the help of web search tools, including relevant concepts, phenomena, and trends, to generate a keyword thesaurus.
The keywords in the keyword thesaurus were combined with “China” or “Chinese” to form phrases, and phrases that were not included in the Google Trends index or had fewer than 15 valid data points were excluded, thus forming the final keyword list.
The keywords that were similar in concept were merged. The total number of keywords selected in this study was eight: China economy, China live, China travel, Chinese culture, China map, China policy, China health, and China climate.
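The phrase construction and exclusion rule in the steps above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sample series are invented, and treating a "valid data point" as any non-missing value is an assumption, since the text does not define validity precisely.

```python
MIN_VALID_POINTS = 15  # exclusion threshold stated in the text


def build_phrases(terms, prefixes=("China", "Chinese")):
    """Combine each thesaurus term with 'China' and 'Chinese'."""
    return [f"{prefix} {term}" for term in terms for prefix in prefixes]


def filter_phrases(series_by_phrase, min_points=MIN_VALID_POINTS):
    """Keep phrases whose series has at least `min_points` valid values.

    Assumption: a value is valid when it is not None. Phrases that are
    absent from the index are simply not present in `series_by_phrase`.
    """
    kept = []
    for phrase, series in series_by_phrase.items():
        valid = [value for value in series if value is not None]
        if len(valid) >= min_points:
            kept.append(phrase)
    return kept


# Hypothetical index series for two candidate phrases.
candidates = {
    "China travel": [10] * 20,                # 20 valid points, kept
    "China welfare": [5] * 3 + [None] * 17,   # 3 valid points, excluded
}
final_list = filter_phrases(candidates)
```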
Table 1. List of initial keywords.
Typically, expatriate populations conduct keyword searches in the official language of their country (Rumbaut and Massey, 2013). Therefore, this study translated the selected keywords into the official language of each country and searched the translated keywords on the Google Trends website. For each keyword, we then obtained the Google Trends index in each language environment and summed these indices to obtain the keyword’s final Google Trends index. For example, we translated “China live” into the official languages of different countries, searched for it on the Google Trends website, and retrieved results in the languages listed in Table 2. Specifically, the search languages used in this study included English, Simplified Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Portuguese, Urdu, Indonesian, German, Japanese, Malay, Korean, Vietnamese, Thai, and Burmese.
Table 2. Translation of “China live” in different languages.
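The aggregation across languages described above amounts to a simple sum per keyword. The sketch below assumes that a language with no retrievable data is skipped rather than counted as zero-valued evidence; the function name and index values are hypothetical.

```python
def combined_index(index_by_language):
    """Sum one keyword's index over search languages.

    Languages with no retrievable data (None) are skipped, which is an
    assumption about how missing values are handled.
    """
    return sum(value for value in index_by_language.values() if value is not None)


# Hypothetical index values for "China live" in one country, keyed by language.
china_live = {"English": 40, "Spanish": 55, "Arabic": None}
china_live_total = combined_index(china_live)
```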
Additionally, we measured the control variables in the following ways:
We adopted the per capita GDP and population size figures of the source countries of expatriates in 2019.
According to the descriptive analysis regarding the popular languages expatriates use for searching information, we classified the official languages of the source countries into four categories, namely English, Spanish, Arabic, and other languages.
To represent the general educational level of expatriates, we tallied the proportion of education expenditure by governments of various countries to their GDP in 2019 as a continuous variable.
To measure the dissemination capacity of the native culture of the source countries of expatriates, we used the proportion of cultural product exports of each source country among all products in 2019 as a continuous variable.
Given the impact of China’s entry policies on immigration, we analyzed China’s visa policies for various countries in 2019 and classified them into three levels based on the difficulty of obtaining the visas, namely visa-free countries, ordinary visas, and visas requiring special permission or security review.
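One way to encode the control variables listed above is sketched below. The category labels follow the text, but the field names, the numeric visa codes, and the choice of "other languages" as the omitted reference group are assumptions made for illustration.

```python
# Named language categories from the text; every other language is "Other".
LANGUAGE_CATEGORIES = ("English", "Spanish", "Arabic")

# Hypothetical ordinal codes for the three visa levels described in the text.
VISA_LEVELS = {"visa_free": 0, "ordinary": 1, "special_review": 2}


def language_dummies(official_language):
    """One 0/1 indicator per named language; 'Other' is the reference group."""
    return {
        f"lang_{name.lower()}": int(official_language == name)
        for name in LANGUAGE_CATEGORIES
    }


def encode_controls(record):
    """Turn one source-country record into a row of model-ready controls."""
    row = {
        "gdp_per_capita": record["gdp_per_capita"],
        "population": record["population"],
        "education_share": record["education_share"],              # share of GDP
        "cultural_export_share": record["cultural_export_share"],  # share of exports
        "visa_level": VISA_LEVELS[record["visa_policy"]],
    }
    row.update(language_dummies(record["official_language"]))
    return row


# Hypothetical source country; none of these figures come from the study.
example_row = encode_controls({
    "gdp_per_capita": 12000.0,
    "population": 50_000_000,
    "education_share": 0.045,
    "cultural_export_share": 0.012,
    "visa_policy": "ordinary",
    "official_language": "Spanish",
})
```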
This study integrated the methods of spatial visualization, correlation analysis, linear regression, and interaction effect analysis. First, with the help of ArcGIS 10.6 software, we performed spatial visualization operations on the countries of origin of the expatriate population in China and the provinces of residence in China to form a global understanding of the expatriate population in China. Second, to further analyze the interaction between web search keywords and migration decisions, this study analyzed the Google Trends index of eight selected keywords and correlated them with the actual number of expatriates in China from each country. The correlation between the Google Trends index of each keyword and the actual number of expatriates in China in each country was assessed by calculating Pearson’s correlation coefficient (Pearson’s r). With the support of Stata 17.0 software, linear regression analysis screened out significantly related keywords and excluded variables that were not closely related to migration decision-making to improve the explanatory power and predictive ability of the model. In addition, to assess the reliability of the Google Trends index in predicting expatriate migration, we constructed a benchmark model, following traditional international migration models, that took only the population size, GDP per capita, and other migration-related factors of the source country as independent variables. We further analyzed whether the regression model that included the Google Trends index significantly improved the prediction accuracy of expatriate migration by comparing the goodness-of-fit (R² value) of the two models. A larger R² value indicated a better fit of the model. Third, we applied the interaction effect analysis to examine the potential interactions between these related keywords.
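The correlation and model-comparison steps can be sketched in plain Python. This is not the Stata procedure used in the study: it is a minimal ordinary least squares fit via the normal equations, and the variable names and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R² of an OLS fit of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data for eight source countries.
ln_population = [15.2, 16.1, 17.3, 18.0, 18.4, 19.1, 20.2, 21.0]
trend_policy = [12.0, 30.0, 8.0, 45.0, 22.0, 60.0, 35.0, 70.0]
ln_expatriates = [6.1, 7.4, 6.9, 8.8, 8.0, 9.9, 9.3, 10.8]

r_policy = pearson_r(trend_policy, ln_expatriates)
r2_benchmark = r_squared([ln_population], ln_expatriates)
r2_extended = r_squared([ln_population, trend_policy], ln_expatriates)
```

Because in-sample R² cannot decrease when a predictor is added to a nested model, the size of the improvement matters more than its sign when the two models are compared.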
Empirical results
Descriptive analysis
Descriptive analysis has been carried out in this section, including the spatial distribution of expatriates’ origins and destinations and the search intensity of Google Trends in different language contexts, both of which are closely related to the control variables in the following regression models. Specifically, information on the trajectories of expatriates helps to explain their decision-making process, as the social, economic, and political backgrounds of different source countries and destinations vary greatly. Statistically analyzing the search intensity in different languages contributes to selecting the most popular searching languages. Additionally, such information also implies that countries with the same languages tend to establish deeper trade relations or cultural exchanges with China.
First, the spatial distribution of the source countries of expatriates in China was mapped by taking each country in the world as a unit of analysis, with the number of expatriates migrating to China from each country in the unit as the core indicator. Figure 2 shows that the main source countries of expatriates in China are located in Asia, North America, Europe, and Oceania. According to the Seventh Census data, countries with more than 5,000 expatriates migrating to China include Myanmar, Vietnam, South Korea, Japan, Pakistan, India, Cambodia, the Philippines, Laos, Malaysia, North Korea, and Singapore in Asia; the United States and Canada in North America; the United Kingdom, France, Germany, and Russia in Europe; and Australia in Oceania.

Figure 2. Spatial distribution of the origin countries of expatriates.
Second, the spatial distribution of the expatriate population in China was mapped by taking the provinces, municipalities, and autonomous regions of China as the unit of analysis and using the number of expatriate inflows in the unit as an indicator. Figure 3 shows that expatriates in China are concentrated mainly in Beijing, Guangzhou, Yunnan, and the southeastern coastal provinces. Specifically, Beijing is the capital city of China and the center of Chinese politics, economy, science and technology, and culture; thus, it is one of the top choices for expatriates to come to China (Wu and Webber, 2003). Guangzhou is China’s historic international trade port and one of the cities at the forefront of China’s reform and opening up, attracting many traders from Africa, the Middle East, and Southeast Asia (Li et al., 2009a, 2009b; Mathews, 2015; Zhang, 2008). Bordering several Asian countries, Yunnan Province is the radiation center for South Asia and Southeast Asia under China’s “One Belt, One Road” strategy, which is of strategic significance in cooperation with countries along the Belt and Road, and many expatriates from Southeast Asian and South Asian countries gather there (Wu et al., 2015). Meanwhile, as a tourist destination, Yunnan Province has attracted many expatriate residents from Europe, the United States, Japan, and South Korea due to its livable environment and tourism resources (Guo and Spanposh, 2015). The southeastern coastal provinces, which were the first to be developed in China after the reform and opening up, are home to a number of cities aiming to become “global cities” (Feng, 2023; Ponzini, 2021), such as Shanghai, Shenzhen, and Xiamen, as well as emerging global cities such as Ningbo and Fuzhou, which play crucial roles in promoting China’s deeper connections with the world.

Figure 3. Spatial distribution of expatriates in China.
We counted the search intensity of Google Trends for the eight keywords in different linguistic contexts (Figure 4). Overall, Spanish, one of the most spoken languages in the world, has the greatest search intensity for the eight keywords, ranking first among all languages. Given that the main countries where Spanish is spoken are concentrated in Latin America and Europe (Spain), we can infer that immigrants from these countries or regions are more willing to search for information about China. Second, the English environment is second only to the Spanish environment in terms of search intensity for the eight keywords. Importantly, the search intensity for the keywords China Climate and China Economy is significantly greater in English than in Spanish, suggesting that expatriates for whom English is their official language are more concerned about China’s climate change and economic development. Third, we notice that the group of potential expatriates using Arabic is more concerned with China’s economic development and geography, the group of potential immigrants using Malay is particularly concerned with China’s climate, and the group of potential immigrants using Russian pays particular attention to China’s policies, geography, and economic development. The linguistic contexts are correlated to the online information search for migration.

Figure 4. Online search intensity of keywords in different languages.
Modeling results
With reference to the literature on migration behaviors and migration intentions, this study used the Google Trends index of specific search terms that reflect the migration intentions of expatriates, as well as the economic and demographic data of expatriates’ countries of origin, to explore the factors that affect the migration of expatriates to China. The analysis consists of the following two steps: first, we constructed a correlation model to determine whether there is a correlation between the Google Trends indices of the eight selected keywords and the number of expatriates coming to China from each country; second, we conducted linear regression analyses on the significantly correlated independent variables together with the control variables to reveal the key factors influencing expatriate migration from each country to China.
First, this study analyzed the Google Trends index of eight keywords (China live, China map, China policy, China travel, Chinese culture, China economy, China climate, and China health) and correlated them with the number of expatriates living in China from each country around the world. Table 3 shows the results of the correlation analysis between the Google Trends indices of the eight selected keywords and the number of expatriates from each country in China. The results show that there is a correlation between the Google Trends indices of all eight keywords and the number of expatriates in China. Among them, two keywords, China policy and China economy, were strongly correlated with the number of expatriates from various countries, with correlation coefficients exceeding 0.5; the correlation coefficients of the remaining six keywords were all greater than 0.3, signifying a moderate correlation. The results of the significance test indicate that the correlations of all eight keywords are significant; for six keywords (China live, China map, China policy, China travel, Chinese culture, and China economy), the p value of the correlation with the number of expatriates from different countries migrating to China was less than 0.01.
Table 3. Results of the correlation analysis.
Note: *p < 0.10, **p < 0.05, and ***p < 0.01.
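The correlation step described above can be illustrated with a minimal sketch. It assumes that a Pearson correlation was used, which the text does not state explicitly, and it runs on synthetic stand-in data rather than the study's data. The function name `pearson_with_t` and all variable names are hypothetical.

```python
import numpy as np

def pearson_with_t(x, y):
    """Return Pearson's r and the t statistic for H0: r = 0 (n - 2 degrees of freedom)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((x.size - 2) / (1.0 - r ** 2))
    return float(r), float(t)

# Synthetic stand-in data (NOT the study's data): 30 hypothetical origin countries.
rng = np.random.default_rng(0)
expats = rng.lognormal(mean=8.0, sigma=1.0, size=30)  # hypothetical expatriate counts
keywords = ["China live", "China map", "China policy", "China travel",
            "China culture", "China economy", "China climate", "China health"]
# Hypothetical search indices, loosely tied to the counts plus noise.
trends = {kw: np.log(expats) + rng.normal(scale=1.5, size=30) for kw in keywords}

for kw, index in trends.items():
    r, t = pearson_with_t(index, expats)
    print(f"{kw:15s} r = {r:+.3f}  t = {t:+.2f}")
```

The p value follows from comparing the t statistic with a two-sided t distribution with n − 2 degrees of freedom; `scipy.stats.pearsonr` returns both the coefficient and the p value directly if SciPy is available.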
In the second step, we set up the initial linear regression model (Table 4, Model 1), taking the natural logarithm of the population size, the natural logarithm of per capita GDP, the language of origin, the education level of origin, the cultural influence of origin, and China’s entry policy as independent variables, and the natural logarithm of the expatriate population in China as the dependent variable. In the collinearity test of this model, the variance inflation factor (VIF) of the independent variables was 1.77, below the threshold of 5 (Table 5, Model 1); that is, there was no multicollinearity problem among these independent variables. The regression coefficient of the logarithm of the population of origin was 0.612 (t = 4.460, p < 0.001), which means that the logarithm of population has a significant positive effect on the logarithm of the number of expatriates moving to China. An expatriate population that comes from countries with large population sizes may indicate that China has a significant position in the global economy and has stepped into the ranks of high-income economies with strong regional or global influence (Sassen, 2005). Moreover, compared with expatriates who speak other languages, expatriates who speak Arabic have a lower tendency to move to China, as the coefficient of language of origin was −1.096 (t = −1.810, p = 0.076 < 0.10). The welfare systems and good living environments in Arab countries may have reduced the likelihood that residents of these countries migrate to China. The remaining control variables were not significantly associated with the logarithm of the expatriate population in China. Nevertheless, countries with visa-free travel to China have more citizens in China than other countries, which suggests that visa policies may matter for transnational migration to China.
However, the goodness-of-fit value (R²) of the initial linear regression model is only 0.433, which suggests that the model lacks key explanatory variables and therefore fits the data poorly.
Table 4. Results of the linear regression model.
Note: *p < 0.10, **p < 0.05, and ***p < 0.01.
Table 5. Collinearity test results for the independent and control variables.
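As a rough illustration of how a log-linear model such as Model 1 and its collinearity test can be computed, the sketch below fits ordinary least squares and derives each VIF as 1 / (1 − R²_j), where R²_j is obtained by regressing predictor j on the remaining predictors. The helper names `ols` and `vif`, the variable names, and the synthetic data are assumptions made for illustration; they do not reproduce the study's data or software.

```python
import numpy as np

def ols(X, y):
    """Fit y = b0 + X b by least squares; return coefficients (intercept first) and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_tot = ((y - y.mean()) ** 2).sum()
    return beta, 1.0 - (resid @ resid) / ss_tot

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        _, r2 = ols(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic stand-in data (NOT the study's data): 40 hypothetical origin countries.
rng = np.random.default_rng(1)
n = 40
log_pop = rng.normal(16.0, 1.5, n)                  # ln(population of origin)
log_gdp = rng.normal(9.0, 1.0, n)                   # ln(per capita GDP of origin)
visa_free = rng.integers(0, 2, n).astype(float)     # hypothetical entry-policy dummy
X = np.column_stack([log_pop, log_gdp, visa_free])
log_expats = 0.6 * log_pop + rng.normal(0.0, 1.0, n)  # arbitrary illustrative relation

beta, r2 = ols(X, log_expats)
print("coefficients (intercept first):", np.round(beta, 3))
print("R^2:", round(float(r2), 3))
print("VIF per predictor:", np.round(vif(X), 2))
```

A VIF close to 1 means that a predictor is nearly uncorrelated with the others; the thresholds of 5 and 10 used in the text are common rules of thumb. The function `variance_inflation_factor` in `statsmodels` offers an equivalent calculation.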
To improve the fit, we next constructed an updated linear regression model (Table 4, Model 2) by adding the Google Trends indices of the eight keywords as independent variables, retaining the natural logarithm of the population size, the natural logarithm of per capita GDP, the language of origin, the education level of origin, the cultural influence of origin, and China’s entry policy as control variables and the logarithm of the expatriate population of each country in China as the dependent variable. The goodness-of-fit value (R²) of this model reached 0.922, indicating high explanatory power and supporting the feasibility and effectiveness of the Google Trends index in explaining the migration behavior of the expatriate population in China. Furthermore, in the updated model the logarithm of per capita GDP of origin is closely related to the dependent variable. One possible explanation is that countries with high GDP per capita have harsh competitive environments and high costs of living, which may lead potential migrants from these countries to move to China, where labor market demand is high and the cost of living is lower. The change in the significance of this control variable indicates that adding the Google Trends indices improved the model specification. However, the VIF values of the independent variables in this model were greater than 10 (Table 5, Model 2), which indicates multicollinearity, i.e., strong linear relationships among the independent variables.
Therefore, we adopted stepwise regression to address the multicollinearity among the independent variables (Table 4, Model 3). After several tests, adding five independent variables, namely “China policy”, “China travel”, “China climate”, “China culture”, and “China economy”, to the initial regression model gave the best fit, predictive ability, and significance. The goodness-of-fit (R²) of this model reached 0.901, which means that the model explains 90.1% of the variation in the logarithm of the expatriate population in China, a substantial improvement over the initial regression model. Model 3 in Table 5 shows that the VIF values of the independent variables are all less than 10, indicating no serious multicollinearity among the variables. With respect to significance, “China policy” (t = 4.900, p < 0.001) and “China travel” (t = 2.250, p = 0.039 < 0.05) have significant positive effects on the migration of expatriates to China, whereas “China culture” (t = −1.970, p = 0.066 < 0.10) has a negative effect. Of the Google Trends indices, “China policy” is the most significant.
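The text does not specify the selection criterion used in the stepwise procedure. Purely as an illustration, the sketch below implements one common variant, forward selection by adjusted R², in which the control variables are always retained and keyword indices are added one at a time while the gain exceeds a small threshold. The function names, the `min_gain` threshold, and the synthetic data are assumptions and are not taken from the study.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit with intercept; X has shape (n, k)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_stepwise(base, candidates, y, min_gain=0.01):
    """Greedily add the candidate column that most improves adjusted R^2.

    base:       (n, k) array of control variables that always stay in the model.
    candidates: dict mapping a name to an (n,) array.
    Stops when the best remaining candidate improves adjusted R^2 by less than min_gain.
    """
    selected, current = [], base
    best = adjusted_r2(current, y)
    remaining = dict(candidates)
    while remaining:
        scores = {name: adjusted_r2(np.column_stack([current, col]), y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break
        best = scores[name]
        current = np.column_stack([current, remaining.pop(name)])
        selected.append(name)
    return selected, best

# Synthetic stand-in data (NOT the study's data).
rng = np.random.default_rng(2)
n = 60
base = np.column_stack([rng.normal(16.0, 1.5, n), rng.normal(9.0, 1.0, n)])  # hypothetical controls
names = ["China policy", "China travel", "China culture", "China health"]
candidates = {kw: rng.uniform(0.0, 100.0, n) for kw in names}
y = (0.5 * base[:, 0] + 0.04 * candidates["China policy"]
     + 0.02 * candidates["China travel"] + rng.normal(0.0, 0.5, n))

selected, score = forward_stepwise(base, candidates, y)
print("selected keywords:", selected)
print("adjusted R^2:", round(float(score), 3))
```

Other criteria, such as entry and removal thresholds on p values or information criteria, are equally common; checking the VIF values of the selected model afterwards, as the text does for Model 3, remains a separate step.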
Expatriates’ collection of policy information related to China positively affects their migration to China. On the one hand, policy information helps expatriates keep abreast of China’s visa policies, work permits, tax regulations, social welfare system, etc., and helps them assess the likelihood and potential risks of migration (Guo et al., 2022; Ponzini, 2021). On the other hand, the fact that collecting information on Chinese policies promotes migration choices reflects the strong attractiveness of China’s globally oriented policies. For example, the Chinese government’s “green card” policy for foreign professionals, subsidies for high-end talent, and inclusive policies to improve the living and working conditions of expatriates convey to the world the stability of China’s policies. The sustainability of China’s economic and social development will stimulate the migration decisions of potential migrant groups (Xiang, 2016). Expatriates’ collection of tourism information related to China, including information on China’s natural scenery, historical sites, and cultural activities, can make potential migrant groups interested in China and stimulate their willingness to migrate; moreover, it can be interpreted as expatriates’ attempt to gain initial knowledge about China by choosing it as a short-term tourist destination. Migrants may then gain in-depth knowledge of China’s job market, education, entrepreneurship, and other matters during subsequent trips, laying the foundation for the eventual implementation of their migration plans. Thus, expatriates’ collection of information on Chinese tourism facilitates their migration behavior.
Notably, the Google Trends index of “China culture” (t = −1.970, p = 0.066 < 0.10) has a negative effect on the expatriate population’s relocation to China; i.e., the expatriate population’s collection of information related to China’s culture hinders their relocation to China to a certain extent. The possible reasons for this are the differences between Chinese and foreign cultures and the pressure of cultural integration. First, with an in-depth understanding of Chinese culture, including etiquette habits, language, and social norms, the expatriate population will realize the significant differences between Chinese culture and their own culture, which will reduce their willingness to migrate (Chang and Kim, 2016). Second, the collection of information about Chinese culture may strengthen expatriates’ awareness of their status and enable them to anticipate the pressure of integration after migration, as well as possible social discrimination and cultural isolation (Wang and Lau, 2008).
To reveal the potential interactions between these significantly related variables, we further set up a linear regression model incorporating “China policy”, “China travel”, “China climate”, “China culture”, and “China economy”, together with interaction variables generated as the product of each pairwise combination of these variables. To mitigate multicollinearity concerns, we employed a stepwise procedure to iteratively include the interaction variables in the model. As shown in Table 6, the results revealed statistically significant interaction effects between “China policy” and “China economy” and between “China policy” and “China culture”. However, the VIF values exceeding 10 in both Models 4 and 5 of Table 7 indicated substantial multicollinearity between these interaction variables and the other variables, compromising the models’ explanatory power and coefficient stability.
Table 6. Results of the interactive linear regression model.
Note: *p < 0.10, **p < 0.05, and ***p < 0.01.
Table 7. Collinearity test results for the interactive, independent, and control variables.
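The construction of the interaction variables, the product of each pairwise combination of the five keyword indices, can be sketched as follows. The function name `pairwise_interactions` and the synthetic 0 to 100 indices are assumptions for illustration only.

```python
from itertools import combinations
import numpy as np

def pairwise_interactions(columns):
    """Return {"a x b": a * b} for every unordered pair of named columns."""
    return {f"{a} x {b}": columns[a] * columns[b]
            for a, b in combinations(columns, 2)}

# Synthetic stand-in indices on a 0-100 scale (NOT the study's data).
rng = np.random.default_rng(3)
names = ["China policy", "China travel", "China climate", "China culture", "China economy"]
columns = {name: rng.uniform(0.0, 100.0, 40) for name in names}

interactions = pairwise_interactions(columns)
print(len(interactions), "interaction terms")  # 5 choose 2 = 10 pairs

# A product of two non-negative indices tends to correlate with each of its components,
# which is one source of the inflated VIF values that interaction models often show.
term = interactions["China policy x China economy"]
for name in ("China policy", "China economy"):
    r = np.corrcoef(term, columns[name])[0, 1]
    print(f"corr(policy x economy, {name}) = {r:.2f}")
```

Because each product term shares information with its two components, adding such terms to a model that already contains the components often raises the VIF values, which is consistent with the multicollinearity reported for Models 4 and 5.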
Based on Google Trends data, this study explores the key factors influencing expatriates’ migration to China via correlation analysis and linear regression analysis. The results show that the search intensity of the keywords “China policy” and “China travel” is closely associated with the number of expatriates in China. The regression analysis reveals that both have a significant positive effect on migration behavior, indicating that policy factors and China’s tourism attractiveness play a dominant role in expatriates’ relocation decisions. In particular, migrants’ collection of information on China’s policies can help them assess the possibility and risk of relocation more comprehensively, whereas the collection of tourism information can stimulate migrants’ interest in China and lay the foundation for long-term settlement in China through short-term tourism. However, the collection of Chinese cultural information has a slight negative effect, suggesting that the expatriate population’s concern about cultural differences may hinder migration intentions to some extent. Taken together, the results of this study confirm that the Google Trends index effectively reflects expatriates’ intention to move to China and that information retrieval related to policy, tourism, and culture can influence expatriates’ actual decision to migrate.
Discussion
The significant relationships between migration and the three keywords of policy, travel, and culture are further explained as follows. China’s policies, such as visa regulations, work permits, tax incentives, and permanent residency cards for expatriates, play a pivotal role in shaping migrants’ perceptions of the feasibility and attractiveness of moving to China. When individuals consider migrating to a foreign land, they require assurance that the host country’s policies are stable, inclusive, and conducive to their settlement and integration. The findings suggest that China’s efforts to streamline and publicize its migration policies have effectively communicated a message of openness and stability to potential migrants worldwide. On this basis, the prominence of “China policy” searches underscores the need for transparent and accessible immigration regulations. Simplifying visa procedures, enhancing digital platforms for policy dissemination, and promoting initiatives like the “green card” system for high-skilled migrants could amplify China’s attractiveness.
The positive correlation between the keyword “China travel” and expatriate migration highlights the dual role of tourism in migration. Tourism serves as a gateway for migrants to gain firsthand experience of China’s cultural, social, and economic environment. It allows them to assess the compatibility of their aspirations with the realities of life in China. This is consistent with the idea that migration decisions are often preceded by exploratory trips or temporary stays. The rich tourism resources of China, including its historical sites, natural landscapes, and cultural festivals, not only attract short-term visitors but also pique the interest of potential long-term migrants. For instance, during business travel, migrants can establish networks, gather practical information, and develop a more nuanced understanding of the opportunities and challenges that await them in China. Amid the trend of reverse globalization, travel to China is likely to increase in response to China’s continued opening-up policies, such as the Belt and Road Initiative.
The results also indicate that collecting information about “China culture” may reduce the probability of expatriates migrating to China, which highlights the dual role of cultural awareness in expatriates’ migration decisions. While understanding the host-country culture is often assumed to facilitate migration and integration, our findings suggest that deeper exposure to cultural differences may instead amplify the barriers that expatriates perceive, triggering anxiety about migration and adaptation. We further suggest that the lack of visible examples of successful cultural integration between expatriates and Chinese people may be the main reason that the collection of Chinese cultural information has a negative impact on the migration behavior of expatriates.
In an era of data-driven governance, this study exemplifies how digital traces can illuminate human mobility patterns. For China, leveraging these insights could enhance its global competitiveness in attracting talent, fostering cross-cultural exchange, and sustaining its transition from a labor exporter to a migration destination. As globalization faces headwinds in the 2020s, understanding the interplay of policy, culture, and information will remain pivotal in navigating the complexities of transnational migration.
Conclusions
Expatriate migration has accompanied the process of globalization in China. Since the implementation of China’s Belt and Road Initiative, new employment opportunities, unprecedented convenience, and stronger incentives have prompted expatriates to migrate to China (Guo et al., 2022). Traditional transnational migration research has taken the lead in exploring the triggers of transnational migration, emphasizing that it is a rational decision of individual migrants or families (Hao and Tang, 2015; Qian et al., 2016). Recent studies have shown that migrants with similar socio-economic characteristics have differentiated migration behaviors, emphasizing that in addition to objective factors such as economy, employment, education, and welfare, psychological factors such as migrants’ migration intentions, aspirations, and perceptions still have an important influence on their migration behaviors (Li et al., 2024; Lin et al., 2023; Xu et al., 2023). However, owing to the limitations of traditional survey data, few studies have measured migrants’ migration intentions and information perceptions, and there is a relative lack of attention given to the group of expatriates migrating to China. Furthermore, existing studies are inadequate for analyzing how transnational migrants’ migration intentions and information perceptions affect their migration behaviors.
This paper examines international migration to China using the Google Trends index and the expatriate population data released by China’s Seventh Population Census. It finds that, among foreign internet users grouped by language, Spanish speakers account for the highest proportion of searches for the eight keywords in the Google Trends indices (China live, China map, China policy, China travel, China culture, China economy, China climate, and China health), with English speakers ranking second. The results also reveal a significant correlation between the number of expatriates in China and each of the eight keywords. This suggests that the expatriate population’s understanding of China focuses on the above eight aspects and that expatriates aim to obtain knowledge and information closely related to their own migration intentions through internet searches.
Importantly, linear regression analyses further reveal that the collection of information about China’s policies, travel, and culture significantly influences migration decisions, whereas other key information is not statistically significant. Specifically, the relationship between the Google Trends index of Chinese policies and the number of expatriates from each country is the strongest, suggesting that Chinese policies and institutions may be the primary factors that expatriates consider when moving to China. The Google Trends index of travel positively facilitates expatriates’ migration to China, whereas the Google Trends index of Chinese culture constrains expatriates’ migration to China.
This study makes the following contributions. At the theoretical level, it constructs a migration decision-making framework from a psychological perspective, emphasizing that migration behavior is affected not only by objective factors such as economy, employment, and education but also by psychological factors such as migrants’ individual migratory intentions, aspirations, and information perceptions, which improves the traditional framework of cross-country migration analysis (de Haas and Fokkema, 2011). Importantly, as an empirical analysis, this study confirms that migration intentions and information perceptions can be measured via online search data (Choi and Varian, 2012; Fantazzini et al., 2021). On this basis, this study overcomes the limitations of data sources in traditional migration research on China and suggests that the use of the Google Trends index can provide a novel dataset and a comprehensive understanding of the migration intentions and information perceptions of transnational migrant populations. The results of this study can help the Chinese government formulate policies on expatriate populations and accurately allocate public resources.
Future research can examine the correlation and causality between the Google Trends index and population migration in greater depth through multi-method approaches. Combining Google Trends with other big data sources, such as Twitter sentiment analysis or LinkedIn job mobility patterns, could triangulate migration intentions more robustly. Machine learning models, particularly deep learning algorithms, might improve the prediction of population migration by capturing non-linear interactions between variables. Moreover, longitudinal designs tracking migrants’ search behaviors pre- and post-migration could elucidate how intentions evolve into actions or abandonment.
Footnotes
Acknowledgements
No acknowledgment.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by the National Natural Science Foundation of China (Grant Number 42471275), the Fund of Humanities and Social Science of the Ministry of Education of China (Grant Number 24YJAZH031), and the Fundamental Research Funds for the Central Universities (Grant Number 2042024kf0004).
Ethical approval and informed consent statements
Not applicable. This study uses online public data and no human subject data, and no ethical approval and informed consent statements are required or needed.
Consent to participate
Not applicable. There are no human subjects or animal subjects involved in this research.
Consent for publication
Not applicable
