
Abstract

Urban mobility analysis using Twitter as a proxy has gained significant attention in various application fields; however, long-term validation studies are scarce. This paper addresses this gap by assessing the reliability of Twitter data for modeling inner-urban mobility dynamics over a 27-month period in the metropolitan area of Rio de Janeiro, Brazil. The evaluation involves the validation of Twitter-derived mobility estimates at both temporal and spatial scales, employing over 1.6 × 10¹¹ mobile phone records of around three million users during the non-stationary mobility period from April 2020 to June 2022, which coincided with the COVID-19 pandemic. The results highlight the need for caution when using Twitter for short-term modeling of urban mobility flows. Short-term inference can be influenced by Twitter policy changes and the availability of publicly accessible tweets. On the other hand, this long-term study demonstrates that employing multiple mobility metrics simultaneously, analyzing dynamic and static mobility changes concurrently, and employing robust preprocessing techniques such as rolling window downsampling can enhance the inference capabilities of Twitter data. These novel insights gained from a long-term perspective are vital, as Twitter - rebranded to X in 2023 - is extensively used by researchers worldwide to infer human movement patterns. Since conclusions drawn from studies using Twitter could be used to inform public policy, emergency response, and urban planning, evaluating the reliability of this data is of utmost importance.

Keywords

human mobility, urban, Twitter, X, mobile phone records, Rio de Janeiro, COVID-19

Introduction

The substantial increase in the volume of geodata collected worldwide on human mobility behavior has the potential to yield valuable insights about various application domains, including urban transportation planning and epidemiology (Barbosa et al., 2018). By leveraging information on human trajectories, urban planners and policymakers can create more livable, sustainable, and responsive cities that cater to the needs of their inhabitants. This ranges from optimizing traffic flow and more efficient resource allocation to understanding infectious disease dynamics (Ruan et al., 2020; Wang et al., 2021a, 2021b). However, the availability of freely-accessible mobility data sources with high spatio-temporal resolution is limited, which often hampers quantitative research on unexplained phenomena associated with human movement patterns. Consequently, researchers have commonly resorted to open-access and georeferenced Twitter data as a proxy for inferring human mobility patterns. Twitter, a social media platform named X since 2023, enables users to tag their online posts with geocoordinates. Inferring mobility patterns from this data involves tracking the successive tweet locations of individuals over time. These locations typically do not represent trajectories in the conventional sense of semi-continuous paths but rather a random collection of locations with temporal references. Nonetheless, given that not all individuals use Twitter and not all content is posted with geocoordinates, there exists a concern regarding potential biases in this data and its inference capabilities for mobility patterns of the general population (Tsou et al., 2017; Zhao et al., 2021).

In the literature, there is a paucity of studies that justify and validate the use of Twitter as a reliable proxy for mobility patterns, particularly on a small spatial scale where Twitter data may be extremely sparse. The interest in using Twitter data for mobility-related urban phenomena, however, is increasingly high, encompassing real-time event monitoring, for example of traffic congestion and accidents (Bao et al., 2017; Zia et al., 2022), disaster relief to improve coordination of rescue efforts (Reynard and Shirgaokar, 2019; Wang and Taylor, 2018), social sensing of urban land use (Soliman et al., 2017), urban planning (Milusheva et al., 2021), as well as the early detection and analysis of disease outbreaks (Bisanzio et al., 2020a, 2020b; Huang et al., 2020a, 2020b). Validation studies that exist on larger scales have employed survey data (Terroso-Saenz et al., 2022a, 2022b), census tracts (Petutschnig et al., 2022), or tourism statistics (Hawelka et al., 2014; Provenzano et al., 2018) for evaluation purposes. At the urban scale, similar data sources have been utilized, but only five validation studies have been conducted to the best of our knowledge. Kurkcu et al. (2016) compared Twitter data with regional household travel surveys by calculating various mobility metrics, such as the radius of gyration and origin-destination flows, for New York City. However, this study did not examine temporal mobility trends over longer time periods. Lenormand et al. (2014) performed a comparison of Twitter data, mobile phone records, and census statistics, assessing spatio-temporal mobility metrics for Barcelona and Madrid. This study compared datasets from two different time frames, which we consider to have limited validity, particularly during non-stationary periods like pandemics. The same limitation applies to the studies conducted by Qian et al. (2018), Steiger et al. (2015), and Osorio-Arjona and García-Palomares (2019), as they used either survey or census data from earlier years than when the Twitter data was collected.

Consolidating the aforementioned findings highlights the research gap concerning long-term validation studies of inner-urban mobility metrics extracted from Twitter data. More specifically, this relates to employing a time-overlapping validation set to assess the accuracy and reliability of Twitter-derived mobility estimates for urban areas over an extended time period. Given these limitations, this paper introduces a novel urban validation study comparing long-term mobility dynamics extracted from geolocated tweets with mobile phone records covering a time frame of 27 months. The research was carried out in the city of Rio de Janeiro during and after COVID-19-related lockdowns, specifically from April 6th, 2020 to June 30th, 2022. The second-largest city in Brazil was chosen due to the availability of mobile phone records and the extensive use of Twitter in the country, which ranks fourth globally in terms of Twitter usage (Statista, 2023). Furthermore, the metropolitan area of Rio de Janeiro, with its nearly 14 million inhabitants, provided a suitable urban landscape to address research objectives related to urban science. More specifically, we addressed the following two research questions (RQs):

• RQ1: To what extent can the method of rolling window downsampling assist in counteracting the scarcity of daily-geocoded tweet sequences in cities?

• RQ2: How similar are urban mobility patterns derived from Twitter to long-term spatio-temporal mobility metrics derived from mobile phone data?

Materials and methods

In order to answer the derived research questions, we propose a consecutive framework of data processing, modelling, and validation (cf. Figure 1). The processing part describes the retrieval and filtering of applied datasets as well as the generation of individual movement trajectories and collective origin-destination (OD) matrices. In the modelling section, a stack of five representative spatio-temporal mobility metrics was calculated. The validation part was divided into two studies: (i) a dynamic assessment of long-term mobility trends and (ii) a static validation of mobility change detection capabilities.

Figure 1.

Workflow for the comparison of inner-urban mobility metrics derived from Twitter and mobile phone records. The quantitative validation study is divided into two parts: (i) a long-term trend analysis and (ii) a validation of Twitter’s capability for static mobility change detection.

Data processing

Twitter data

Twitter data was derived from the publicly available Twitter API v2 (Twitter, Inc. 2023b) with special terms of use for academic research (Twitter, Inc. 2023a). The research licence supported the collection of more precise, complete, and unbiased datasets than the publicly available API for commercial use. API access policies and privacy concerns can undergo constant change. We treated all data in accordance with stringent privacy by design guidelines published in Kounadi et al. (2018) and Kounadi and Resch (2018). During the API request we specified an API token, start and end timestamps for the period of analysis, details regarding the case study region presented as rectangular bounding boxes, and two parameters to filter retweeted content and tweets lacking geolocation. Twitter stores geotags implicitly via place IDs. A place ID can be either a point of interest (POI) such as a bus stop close to a user location, a neighborhood, a city or a country name. As we are only interested in inner-urban movement patterns for the city of Rio de Janeiro, tweets with any ‘place type’ larger than a city were excluded from the API request. This selection resulted in 696,235 tweets for the whole study period of 27 months.

After this initial data retrieval, tweets located inside the rectangular bounding boxes but outside the city boundaries of Rio de Janeiro were removed. Since we encountered issues with the shape and naming of city districts within Twitter, tweets with the ‘place type’ tag ‘neighborhood’ were additionally filtered out. This filtering step was applied to prevent possible spatial distortion of the Twitter data and consequently also resulted in a higher spatial resolution of geocoded tweets. The final analysis was therefore conducted on tweets of ‘place type = poi’ only. The ratio of tweets per user exhibited notable heterogeneity over time (cf. Figure 2 - top right), as indicated by a standard deviation of 20.96. To address this imbalance, we employed a supplementary bot filtering technique. Through a comprehensive examination of tweet distributions across all users, we identified and filtered out tweets originating from potential bot accounts by implementing a maximum daily tweet threshold of 50 and a maximum daily tweet share threshold of one percent. This was found to be consistent with methodologies employed in other Twitter studies (Osorio-Arjona and García-Palomares, 2019; Terroso-Saenz et al., 2022b).
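The two bot-filtering thresholds can be illustrated with a minimal sketch. It assumes that the "daily tweet share" is a user's fraction of all tweets collected on a given day and that a flagged account is dropped entirely; neither detail is specified in the text, and the function names are hypothetical.

```python
from collections import Counter

MAX_DAILY_TWEETS = 50   # maximum daily tweet threshold stated in the text
MAX_DAILY_SHARE = 0.01  # maximum daily tweet share of one percent


def flag_potential_bots(tweets):
    """Return user ids that exceed either daily threshold on any day.

    `tweets` is a list of (user_id, day) pairs, one pair per tweet.
    """
    per_user_day = Counter(tweets)               # (user, day) -> tweet count
    per_day = Counter(day for _, day in tweets)  # day -> total tweet count
    flagged = set()
    for (user, day), count in per_user_day.items():
        share = count / per_day[day]
        if count > MAX_DAILY_TWEETS or share > MAX_DAILY_SHARE:
            flagged.add(user)
    return flagged


def drop_bot_tweets(tweets):
    """Remove every tweet of a flagged account (assumption: whole-account removal)."""
    bots = flag_potential_bots(tweets)
    return [tweet for tweet in tweets if tweet[0] not in bots]
```

Removing only the offending days instead of the whole account would be an equally plausible reading and would only change `drop_bot_tweets`.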

Figure 2.

Sankey diagram of Twitter filtering process (top left); daily geolocated Twitter stream used for long-term validation study with corresponding number of unique users and tweet amount per user in the city of Rio de Janeiro (top right); histogram of inter-tweet time for study period in the city of Rio de Janeiro (bottom left); schematic rolling window downsampling concept for temporal Twitter signals (bottom right).

After data cleaning, 420,518 geolocated tweets from 107,500 unique users were used to build individual user movement sequences with the scikit-mobility Python library (Pappalardo et al., 2022). To address the daily scarcity of geolocated tweet sequences from individual users (cf. Figure 2 - bottom left), a rolling window downsampling approach was implemented. This method, contingent upon the chosen window size, can increase data volume, enabling the calculation of individual movement trajectories by enhancing the length of tweet sequences from unique Twitter users (Li, 2008). It also enables the calculation of daily mobility metrics while effectively smoothing out short-term fluctuations and outliers, thereby preserving the temporal trend within the dataset. This collective functionality renders it a suitable approach for deriving daily mobility trend signals from limited datasets, such as daily geo-tagged tweets from urban areas, aligning with the necessary objectives of the study. From a practical standpoint, this method involves aggregating and sequencing tweets accumulated over multiple days to compute mobility metrics specifically for a single day positioned at the center of the aggregation window (cf. Figure 2 - bottom right).

However, it is crucial to acknowledge that this method may also entail certain adverse consequences, such as diminished granularity or analytical precision. To address this concern, we applied a rolling window downsampling approach using a grid search across odd window sizes spanning from three to 31 days, resulting in 15 distinct temporal signals. The optimal choice of these window sizes to calculate urban mobility metrics was evaluated as described in Section “Sensitivity of rolling window size”. The selection of the range of window sizes employed for the grid search was predicated on the objective of encompassing around 75% of individual displacements identified in the Twitter data (cf. Figure 2 - bottom left).
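The rolling window downsampling concept can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: tweets from all days in a centered window are pooled per user and ordered in time, and the resulting sequences are assigned to the center day. The data layout and function names are assumptions.

```python
from datetime import date, timedelta

# Odd window sizes 3, 5, ..., 31 give the 15 signals of the grid search.
WINDOW_SIZES = list(range(3, 32, 2))


def window_days(center, window_size):
    """Return the days covered by a window centered on `center`."""
    half = window_size // 2
    return [center + timedelta(days=k) for k in range(-half, half + 1)]


def pooled_sequences(tweets_by_day, center, window_size):
    """Pool tweets of all days in the window into per-user sequences.

    `tweets_by_day` maps a date to a list of (user_id, timestamp, location)
    tuples. The returned dict maps each user to a time-ordered list of
    (timestamp, location) pairs that is attributed to the center day.
    """
    if window_size % 2 == 0:
        raise ValueError("window size must be odd to have a center day")
    sequences = {}
    for day in window_days(center, window_size):
        for user, timestamp, location in tweets_by_day.get(day, []):
            sequences.setdefault(user, []).append((timestamp, location))
    for sequence in sequences.values():
        sequence.sort(key=lambda item: item[0])
    return sequences
```

A user who tweets once per day contributes no movement with a 1-day window but a sequence of three locations with a 3-day window, which is the mechanism by which the method lengthens sparse tweet sequences.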

Individual human movement trajectories were retrieved from the list of temporally-ordered tweet locations of single users. The collection of these sequences over all users contributed to the generation of collective OD matrices. During this process each tweet location was matched to one of the 163 neighborhoods present in the city of Rio de Janeiro, characterized by a different ratio of tweets per capita for the residential population (cf. Figure 3). The scale of neighborhoods was chosen to align with many census statistics, which could be potentially relevant for follow-up studies. The geographic matching process resulted in daily OD matrices of shape 163 × 163 used for the subsequent calculation of spatio-temporal mobility metrics as explained in Section “Mobility metrics”. To enhance comparison capabilities with mobile phone data, OD matrix entries on the diagonal were set to zero and normalized by the amount of measured movements, which is equal to the remaining sum of OD matrix entries.
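As a minimal sketch of the OD matrix construction described above, the following code counts consecutive neighborhood pairs per user, sets the diagonal to zero, and normalizes by the remaining number of movements. It assumes that tweet locations have already been matched to integer neighborhood indices; the function names are hypothetical.

```python
def build_od_matrix(sequences, n_zones):
    """Count movements between consecutive zones of each user.

    `sequences` maps a user id to a temporally ordered list of zone indices.
    """
    od = [[0.0] * n_zones for _ in range(n_zones)]
    for zones in sequences.values():
        for origin, destination in zip(zones, zones[1:]):
            od[origin][destination] += 1.0
    return od


def normalize_od(od):
    """Zero the diagonal and divide by the remaining sum of entries."""
    n = len(od)
    cleaned = [[0.0 if i == j else od[i][j] for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in cleaned)
    if total == 0:
        return cleaned  # no inter-zone movement recorded
    return [[value / total for value in row] for row in cleaned]
```

For the study area, `n_zones` would be 163, so each daily matrix has shape 163 × 163 and its entries sum to one after normalization.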

Figure 3.

Ratio of tweets per capita for the residential neighborhood population in the city of Rio de Janeiro.

Mobile phone data

As a validation set we used anonymized mobile phone records provided by a large Brazilian telecommunications company. The dataset included individual antenna connections from approximately three million unique users over a time period of 27 months. This is equal to an approximated penetration rate of around 45% for the population of the city of Rio de Janeiro. The temporal resolution of the raw data was 5 min. The data was provided at the level of the antennas (cf. Figure 4 - top right). The mobile phone user is typically connected to the closest antenna, which is used as a proxy for the position of the user at this point in time. The number of antennas in our data set varied daily between 1,200 and 1,250 due to technical failures of some antennas. An antenna connection from a user was recorded when sending a text message, using mobile internet data, or making a call. We retrieved and processed the data of 164,250 million mobile phone records via the distributed computing tool Apache Spark as well as the GPU-accelerated parallel computing framework Dask using the mobilkit Python library (Ubaldi et al., 2021). As a first cleaning step, we dropped connections with antennas outside the city boundaries. In order to derive human movement patterns, we generated a sequence of antenna connections for each user over the whole time period using a machine with 7 TB of local scratch. To increase the informative power of successive antenna connections for inferring human movement patterns, we introduced a lower bound (LB) and upper bound (UB) as filters for the inter event time (IET) between sequential antenna connections from a single user as proposed by Zhao et al. (2019). As a result, successive antenna connections between which less than 15 min (LB) or more than 4 hours (UB) elapsed were not counted as movements (cf. Figure 4 - top left). The introduction of an LB was justified by the fact that antenna congestion can cause the user to jump back and forth between antennas without physically moving. A UB was introduced to avoid counting movements that were not necessarily made in a direct way. The lower threshold was selected based on Zhao et al. (2019) and Schlosser et al. (2020). The choice of the upper threshold was inspired by Barboza et al. (2021). OD matrices were created based on IET-filtered daily user sequences. The entries in the diagonal of daily OD matrices were set equal to zero. To ensure comparability with the OD matrices of Twitter data, the OD matrices were normalized by the overall amount of movement activity before being converted from antenna format (1,250 × 1,250) to district format (163 × 163) using methods from Fabrikant (2017) (cf. Figure 4 - bottom right).
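The IET filter can be sketched in a few lines. The sketch assumes a time-ordered list of antenna connections for a single user and treats both bounds as inclusive, which the text does not specify; the function name is hypothetical.

```python
from datetime import datetime, timedelta

LOWER_BOUND = timedelta(minutes=15)  # LB from the text
UPPER_BOUND = timedelta(hours=4)     # UB from the text


def iet_filtered_movements(connections):
    """Return (origin, destination) antenna pairs that pass the IET filter.

    `connections` is a time-ordered list of (timestamp, antenna_id) tuples
    for one user. Pairs separated by less than the LB or more than the UB
    are not counted as movements.
    """
    movements = []
    for (t0, a0), (t1, a1) in zip(connections, connections[1:]):
        inter_event_time = t1 - t0
        if LOWER_BOUND <= inter_event_time <= UPPER_BOUND:
            movements.append((a0, a1))
    return movements
```

In the usage below, the 5-minute jump from A to B and the 5-hour gap from C to D are discarded, while B to C and D to E are kept:

```python
day = datetime(2020, 4, 6)
trace = [
    (day.replace(hour=8, minute=0), "A"),
    (day.replace(hour=8, minute=5), "B"),
    (day.replace(hour=8, minute=30), "C"),
    (day.replace(hour=13, minute=30), "D"),
    (day.replace(hour=14, minute=0), "E"),
]
print(iet_filtered_movements(trace))
```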

Figure 4.

Schematic time line of recorded antenna connections with IET-filtering (top left); resulting antenna to antenna OD matrix flows for the first day of analysis derived from mobile phone data, where darker color represents larger movements (top right); formula for OD matrix conversion from antenna to admin level with schematic illustrations of “antenna to admin” (orange) and “admin to antenna” (grey) matrix calculation (bottom right); resulting admin to admin OD matrix heatmap for the first day of analysis using mobile phone data, where darker shade of green describes a higher percentage of measured movement in the city (bottom left).

Mobility metrics

In order to answer our second research question, whether Twitter is a good proxy for modeling inner-urban human movement patterns, we calculated five spatio-temporal mobility metrics. These include the (i) total number of movements, (ii) the average movement distance of individuals, (iii) land use activity metrics, (iv) graph modularity, and (v) the radius of gyration (cf. Supplemental Appendix Figure A1).

M_{t} = \sum_{i = 1}^{163} \sum_{j = 1}^{163} a_{i, j}

(1)

OD_{t} = \begin{pmatrix} a_{1, 1} & \dots & a_{1, 163} \\ \vdots & \ddots & \vdots \\ a_{163, 1} & \dots & a_{163, 163} \end{pmatrix}

(2)

\bar{D}_{t} = \frac{1}{M_{t}} \sum_{u = 1}^{U} \left( \sum_{i = 1}^{n_{u} - 1} \sqrt{(x_{u, i + 1} - x_{u, i})^{2} + (y_{u, i + 1} - y_{u, i})^{2}} \right)

(3)

Inspired by previous research on human movement patterns (Aletta et al., 2020; de Haas et al., 2020; Hensher et al., 2021; Li et al., 2021; Mützel and Scheiner, 2022; Schlosser et al., 2020), the total number of all movements, denoted as M_t (cf. Formula (1)), was calculated using daily OD matrices, where a_{i,j} = 0 for i = j (cf. Formula (2)). The daily average travel distance over all users U, denoted as \bar{D}_t, was derived from the n_u visited locations in the IET-filtered user sequence of each user u (cf. Formula (3)). For each movement of user u on day t from the i-th visited location (x_{u,i}, y_{u,i}) to the (i + 1)-th visited location, the Euclidean distance between the two locations was calculated. The geolocations of sequential tweets were used as location coordinates for Twitter data. For mobile data, the distance between antennas was utilized. The sum of all tracked paths was then divided by the total number of considered movements, M_t, to calculate the average travel distance in kilometers, following the precedent set by other research papers (Abdullah et al., 2020, 2021; Engle et al., 2020; Fatmi, 2020; Gao et al., 2020a, 2020b; Pardo et al., 2021; Park et al., 2022).
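Formulas (1) and (3) can be illustrated with a short sketch. It assumes that coordinates are already projected to a planar system in kilometers and that the number of considered movements equals the number of consecutive location pairs in the filtered sequences; both points are assumptions, as are the function names.

```python
import math


def total_movements(od):
    """Formula (1): sum of all off-diagonal OD matrix entries."""
    n = len(od)
    return sum(od[i][j] for i in range(n) for j in range(n) if i != j)


def average_movement_distance(user_paths):
    """Formula (3): summed Euclidean step lengths divided by movement count.

    `user_paths` maps a user id to a temporally ordered list of (x, y)
    coordinates in kilometers.
    """
    total_distance = 0.0
    n_movements = 0
    for path in user_paths.values():
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            total_distance += math.hypot(x1 - x0, y1 - y0)
            n_movements += 1
    return total_distance / n_movements if n_movements else 0.0
```

With geographic coordinates in degrees, a great-circle distance or a prior projection would be needed instead of `math.hypot`.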

\% \text{ activity in residential area}_{t} = \frac{\text{Number of tweets or mobile activity in residential areas on day } t}{\text{Total number of tweets or mobile activity on day } t} \times 100\,\%

(4)

We calculated land use-dependent activity metrics using land use land cover maps from the DATA.RIO portal (Municipality of Rio de Janeiro, 2022). These metrics can provide information about the percentage of Twitter or mobile activity that can be assigned to a certain land use structure (Aktay et al., 2020; Da Cavalcante Silva et al., 2021; Hakim et al., 2021; Nanda et al., 2022; Ossimetha et al., 2021; Paez, 2020; Saha et al., 2020, 2021; Shumway-Cook et al., 2005; Sulyok and Walker, 2020; Zhu et al., 2020). In our analysis, we measured the percentage of activity for six types of typical urban land cover categories (residential, public, leisure, industry, education, commerce) present in the city of Rio de Janeiro. For the validation of Twitter, only the percentage of activity in residential areas was used as a representative of this mobility metric type (cf. Formula (4)). The inclusion of all land use-dependent activity metrics was rejected to improve clarity and diversify the analyzed mobility metrics in this study. Several metrics of land use-dependent activity were considered redundant. Residential areas were chosen as the land class of highest interest as they promised the highest variability related to lockdown style policies. For calculating land use-dependent mobility metrics, tweet POI and antenna location were used correspondingly.

The graph modularity, a measure indicating the extent of links within communities compared to links between communities (cf. Figure 5), was calculated using the Louvain algorithm. Modularity, denoted by Q, is computed as the difference between the observed fraction of intra-community edges and the expected fraction if edges were distributed randomly (Blondel et al., 2008). It is defined as follows

Q_{t} = \frac{1}{2 m_{t}} \sum_{i j} \left( A_{i j, t} - \frac{k_{i, t} k_{j, t}}{2 m_{t}} \right) \delta(c_{i, t}, c_{j, t})

(5)

where A_ij is the element in the adjacency matrix representing the connection between nodes i and j, k_i and k_j are the degrees of nodes i and j respectively, m is the total number of edges, and δ(c_i, c_j) is 1 if nodes i and j belong to the same community and 0 otherwise. In our context, nodes represent neighborhoods, and edge weights represent the sum of traced movements between neighborhoods. The Louvain algorithm operates on an undirected graph constructed using the origin-destination (OD) matrices specified beforehand in Formula (2), which were initially directed to represent one-way movements between origins and destinations and made undirected by multiplying them with their transposes. This process ensures that each element A_ij of the adjacency matrix represents the total movements between nodes i and j, accounting for both directions, in contrast to a_i,j which represents one-way movements. The Louvain modularity value ranges from −0.5 to 1, where higher values indicate mobility networks with more inner-community movements than outer-community movements (Heiler et al., 2020; Newman, 2006; Yildirimoglu and Kim, 2018).
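Formula (5) can be evaluated directly for a given community assignment, as in the following sketch. Two assumptions are made: the directed OD matrix is symmetrized by summing both directions, chosen here so that A_ij equals the total movements between i and j, and the community labels are taken as given, for example from a Louvain implementation, rather than computed. The function names are hypothetical.

```python
def symmetrize(od):
    """Build an undirected adjacency matrix (assumption: A = OD + OD^T)."""
    n = len(od)
    return [[od[i][j] + od[j][i] for j in range(n)] for i in range(n)]


def modularity(adjacency, communities):
    """Evaluate Formula (5) for a weighted, undirected adjacency matrix.

    `communities[i]` is the community label of node i.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]  # k_i
    two_m = sum(degrees)                       # 2m
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m
```

For two disconnected pairs of neighborhoods, assigning each pair to its own community yields Q = 0.5, whereas placing all nodes in a single community yields Q = 0.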

R_{g, t} = \frac{1}{M_{t}} \sum_{u = 1}^{U} \sqrt{\frac{1}{n_{u, t}} \sum_{i = 1}^{n_{u, t}} \left[ (x_{u, i, t} - \bar{x}_{u, t})^{2} + (y_{u, i, t} - \bar{y}_{u, t})^{2} \right]}

(6)

Figure 5.

Schematic concept of graph modularity and radius of gyration. Graph modularity is a measure for the strength of the division of a graph into communities, based on the density of connections within communities compared to connections between communities. The radius of gyration refers to the average travel distance of an individual measured from the center of its movement circle, representing the overall distribution of visited places.

The radius of gyration R_g indicates the average radius of movement of a single user u (cf. Figure 5). We averaged this value over all recorded users U and calculated it on a daily basis. Analogous to the methodology applied in computing the average movement distance, we conducted distance calculations between the Twitter POIs and the antenna locations, respectively. Both calculations were run on the basis of the IET-filtered user sequences (Hernando et al., 2021; Kishore et al., 2020; Liu et al., 2018; Wang and Taylor, 2014). The variables \bar{x}_{u,t} and \bar{y}_{u,t} correspond to the means of the x-coordinates and y-coordinates of the locations visited by user u on day t.
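The per-user radius of gyration and its daily average can be sketched as follows. The sketch follows the verbal description and averages over users; planar coordinates in kilometers and the function names are assumptions.

```python
import math


def radius_of_gyration(points):
    """Root mean squared distance of visited points from their centroid.

    `points` is a non-empty list of (x, y) coordinates of one user on one day.
    """
    n = len(points)
    center_x = sum(x for x, _ in points) / n
    center_y = sum(y for _, y in points) / n
    squared = sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in points)
    return math.sqrt(squared / n)


def mean_radius_of_gyration(user_points):
    """Average the per-user radius over all users with recorded points."""
    radii = [radius_of_gyration(p) for p in user_points.values() if p]
    return sum(radii) / len(radii) if radii else 0.0
```

A user who alternates between two locations 2 km apart has a radius of gyration of 1 km, while a user who stays at one location has a radius of 0 km.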

Long-term validation of urban mobility patterns derived from Twitter

The long-term validation of urban mobility metrics derived from Twitter was conducted over a non-stationary mobility period of 817 days. This time period covers the major peaks of the COVID-19 pandemic including subsequent months with high to low mobility restrictions implemented by the local state and municipal government of Rio de Janeiro (Mathieu et al., 2020). The capability of Twitter as a data source to detect long-term mobility change in urban environments was evaluated using mobile phone records. To justify the utilization of mobile phone records as a ‘ground-truth’ validation set in our case study, we previously tested spatio-temporal mobility metrics derived from mobile phone data as valid evaluation sets for modeling real-world human movement behavior at an urban scale. For this evaluation, we obtained the stringency index for the city of Rio de Janeiro (cf. Figure 6), which is a globally-standardized indicator of politically-implemented mobility restrictions affecting human movement behaviour (Mathieu et al., 2020). It is a widely-used indicator derived from ordinal measurements for containment, closure policies, and public information campaigns. For the whole study time period during and after the COVID-19 pandemic, we calculated an average absolute Pearson correlation coefficient of 0.7 between all mobility metrics and the stringency index. The graph modularity mobility metric showed the highest overall Pearson correlation coefficient of 0.77. The main advantage of mobile phone records over the stringency index as an assessment dataset for this case study was the high temporal resolution of mobility measurements on a daily basis.

Figure 6.

Stringency index recording the strictness of lockdown style policies in the city of Rio de Janeiro and graph modularity measurements derived from mobile phone data (red). On- and offset time periods indicate manually selected time frames of high to low mobility restrictions defined for static mobility change detection analysis.

The quantitative assessment involved the computation of moving window synchrony among long-term mobility trend signals indicating individual and collective mobility metrics derived from Twitter and mobile phone data as outlined in Section “Mobility metrics”. Time series synchrony denotes the extent to which time series exhibit similar patterns across multiple time steps. Unlike correlation, which quantifies the strength and direction of the linear relationship between time series, synchrony characterizes the temporal alignment and similarity in temporal patterns. We approximated the moving window synchrony by calculating the daily Pearson’s correlation coefficients with a window size of 60 days.

Long-term trend signals of calculated daily mobility metrics were generated by applying a moving average of 28 days and MinMax-Standardization considering the whole time frame of analysis. The moving average size for trend decomposition was selected based on visual diagnostics to remove weekly oscillations and outliers that appear due to technical antenna failures (cf. Supplemental Appendix Figure A1). The moving average size of 28 days seemed to generate a plausible trade-off signal between long-term trend and short-term mobility changes. Absolute moving window synchrony surpassing values of 0.7 was classified as indicating a high level of alignment, while values below 0.3 were considered to signify a weak tendency to exhibit similar temporal patterns. Intermediate moving window synchrony values ranging from 0.3 to 0.7 represented moderate alignment of events and changes in our study.
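The trend extraction and the moving window synchrony can be approximated with the following sketch. The text does not state whether the windows are centered or trailing; this sketch simply emits one value per full window, and all function names are hypothetical.

```python
import math


def moving_average(values, window=28):
    """Mean over each full window of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def min_max_scale(values):
    """Rescale values to the range [0, 1] over the whole series."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant window
    return cov / math.sqrt(var_x * var_y)


def moving_window_synchrony(xs, ys, window=60):
    """Pearson correlation within each full window of two trend signals."""
    return [pearson(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]


def classify_alignment(r):
    """Apply the thresholds from the text to an absolute synchrony value."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "high"
    if magnitude < 0.3:
        return "weak"
    return "moderate"
```

A typical chain would be `min_max_scale(moving_average(metric))` for both data sources, followed by `moving_window_synchrony` on the two resulting trend signals.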

Three on- and offset periods were defined based on the stringency index to evaluate the capability of Twitter to additionally measure static change detection (cf. Figure 6). Considering implemented mobility restrictions in the city of Rio de Janeiro, we classified the two-month time periods from April 6th to June 6th in 2020 and 2021 as lockdown style periods (onset) and the time frame from April 6th to June 6th in 2022 as post-lockdown period (offset). With this selection, our goal was to include time intervals that exhibit diverse levels of human mobility, independent of potential seasonal fluctuations, throughout the 3 years of analysis. We selected a two-month interval period starting at the beginning of our analysis to capture both static mobility circumstances and their associated changes. The outcomes of the static mobility change detection were displayed through boxplots and compared with weekday/weekend onsets and offsets extracted from the entire analysis time period. To provide statistical quantification for static urban mobility changes, we conducted Mann-Whitney U tests between on- and offset periods, applying a significance level of 0.05.
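For illustration, the comparison of an onset and an offset period can be sketched with a plain two-sided Mann-Whitney U test. This is not the authors' implementation: it uses the normal approximation without tie or continuity correction, which is an assumption, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math

ALPHA = 0.05  # threshold stated in the text


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p-value) using the normal approximation."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b],
                    key=lambda pair: pair[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


def differs(sample_a, sample_b, alpha=ALPHA):
    """True if the two periods differ at the given threshold."""
    return mann_whitney_u(sample_a, sample_b)[1] < alpha
```

Here `sample_a` and `sample_b` would hold the daily values of one mobility metric during an onset and an offset period, respectively.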

Results and discussion

RQ1: evaluation of rolling window downsampling

Examining the initial time period of analysis spanning from April 2020 to September 2020 (cf. Figure 7), all computed mobility metrics derived from Twitter exhibited discernible patterns that aligned with our expectations based on the implemented lockdown measures in the city of Rio de Janeiro. Notably, while the long-term trend of the graph modularity metrics and the percentage of activity in residential areas decreased, the long-term trends of average movement distance, overall movement volume, and the radius of gyration increased.

Figure 7.

Standardized inner-urban mobility metrics derived from daily tweet sequences applying rolling window downsampling (RWDS). Results of 7-day and 27-day rolling windows (dark blue) are compared with the daily raw and trend signal of Twitter mobility metrics without applying RWDS (light blue). The trend signals are calculated using a moving average of 28 days. Non-standardized mobility metrics derived from Twitter for an 11-day rolling window size are visualized in Supplemental Appendix Figure A2.

During the subsequent time period from September 2020 to May 2021, all mobility metrics derived from Twitter, except the percentage of activity in residential areas, displayed unexpected changes. They all showed a rapid shift starting in February 2021, contradicting our assumption of more or less constant mobility behaviour in that time period. Coinciding with this period, there was a sharp decline in the number of geolocated tweets collected via the public Twitter API (cf. Figure 2). We hypothesize that this decline can be attributed to changes in the terms of use implemented by Twitter. However, official evidence of regulatory changes during that specific time period has not been found. Additional experiments using a constant amount of tweets per day, derived from the 98th percentile of tweet volume in the corresponding rolling window subset, showed a similar shift in mobility metrics (cf. Supplemental Appendix Figure A3). This highlights the robustness of calculated mobility metrics in the face of daily fluctuations in the number of tweets.

For the analysis period subsequent to May 2021, the calculated mobility metrics once again aligned with our expectations and confirmed our knowledge of fewer mobility restrictions implemented in the city of Rio de Janeiro following the COVID-19 pandemic.

The results also demonstrate that, while a moving average can effectively eliminate weekly fluctuations and data noise, it does not suffice for generating accurate long-term trends for all considered mobility metrics in this analysis. However, when combined with the specifically designed rolling window downsampling (RWDS) approach, more precise long-term mobility trends can be derived. This effect becomes particularly evident when examining the calculated graph modularity metrics in our case study, as the modularity values between the 1-day window size signals and the seven- or 27-day rolling window size signals exhibit larger differences. In contrast, for other calculated mobility metrics, the impact of RWDS appears to have relatively low significance and yields effects comparable to those obtained by calculating a 1-day window trend signal. Supplementary materials provide corresponding results of daily mobility metrics calculated without applying a moving average (cf. Supplemental Appendix Figure A3). The influence of different rolling window sizes is more extensively investigated in the subsequent section in conjunction with long-term trends derived from mobile phone data.

RQ2: validation of long-term urban mobility patterns derived from Twitter

Long-term validations of urban mobility metrics derived from Twitter are infrequent, despite the well-established usage of Twitter applications in various research domains worldwide. However, the outcomes of our comprehensive long-term validation study emphasize the need for caution when utilizing Twitter data for urban studies within restricted time frames. Although urban mobility metrics derived from Twitter may exhibit high correlation values with mobility metrics computed from mobile phone data during short time periods, long-term validation with mobile phone data reveals fluctuating deviations (cf. Figure 9). This phenomenon can potentially give rise to erroneous assumptions when relying solely on Twitter as a reliable source for modeling human movement patterns.

Sensitivity of rolling window size

The results presented earlier in Section “Evaluation of rolling window downsampling” demonstrate that the RWDS method is a valuable tool for addressing the data scarcity challenge associated with urban Twitter data and deriving more precise long-term mobility trends. However, additional findings highlight the significant dependence of these findings on the chosen rolling window size (cf. Figure 8). In our experiments we observed the highest average correlation value between mobility metrics from Twitter and mobile data when using an 11-day rolling window size. Increasing the window size from 1 day to 3 days had the most pronounced effect on the calculated Pearson correlation values. For window sizes exceeding 11 days, the correlation values remained consistently high but showed a slight flattening. This can be attributed to the loss of high-resolution information resulting from the application of larger window sizes beyond 11 days. These findings align with our expectations regarding the functionality of the RWDS method described in Section “Twitter data”. The mean movement distance index yielded the highest average Pearson correlation coefficient among all considered mobility metrics, achieving its peak of 0.48 at the 11-day rolling window downsampling size (cf. Supplemental Appendix Table 1).
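The window-size sensitivity check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes that RWDS pools the raw observations of the trailing `window` days before a metric (here, a simple mean) is computed, it runs on synthetic data, and the names `pearson`, `pooled_metric`, and `sweep` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pooled_metric(daily_obs, window):
    """Mean of all observations pooled over the trailing `window` days.

    Assumption: this trailing-window pooling stands in for RWDS; the
    exact windowing rule is defined in the paper's methods section.
    """
    out = []
    for day in range(len(daily_obs)):
        start = max(0, day - window + 1)
        pooled = [v for obs in daily_obs[start:day + 1] for v in obs]
        out.append(sum(pooled) / len(pooled) if pooled else float("nan"))
    return out


def sweep(daily_obs, reference, windows):
    """Correlation between the pooled sparse signal and a reference, per window size."""
    return {w: pearson(pooled_metric(daily_obs, w), reference) for w in windows}


# Synthetic data: a dense reference signal and a sparse, noisy proxy
# with one to five observations per day.
random.seed(0)
days = 200
reference = [10 + 3 * math.sin(2 * math.pi * d / 100) for d in range(days)]
daily_obs = [
    [reference[d] + random.gauss(0, 4) for _ in range(random.randint(1, 5))]
    for d in range(days)
]
results = sweep(daily_obs, reference, windows=[1, 3, 7, 11, 27])
for w, r in results.items():
    print(f"window={w:2d} days  r={r:.2f}")
```

On real data, the same sweep would be repeated for each mobility metric, with the mobile phone signal taking the role of `reference`.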

Figure 8.

Long-term mobility metrics derived from Twitter (blue), applying a rolling window size of 11 days, and from mobile phone data (red), including moving window synchrony of 60 days (black), where on- and offset represent time periods of high and low mobility restrictions. The moving window correlations exhibited statistical significance, except for transitional phases between positive and negative synchrony. Non-standardized mobility metrics are visualized in Supplemental Appendix Figures A1 and A2 (cf. Supplemental Appendix Figure A4 for a more detailed visualization).

Long-term mobility trend

During the dynamic analysis of the long-term trend of calculated mobility metrics using moving window synchrony, it becomes evident that the Pearson’s correlation coefficients exhibit substantial variations over time for all the calculated mobility signals (cf. Figure 9). We observed the occurrence of short time periods characterized by both extremely high and extremely low correlation values. These findings indicate that the informative capacity of mobility metrics derived from Twitter exhibits temporal variability and is strongly contingent upon the chosen time frame for analysis. During the initial phase of the study period, when the most stringent mobility restrictions were implemented (cf. Figure 6), we observed high positive correlation values across all metrics simultaneously. Conversely, we did not observe similar prolonged time periods characterized by a weak alignment, as indicated by low Pearson’s correlation coefficients around zero. Notably, higher moving window correlation values exhibited greater statistical significance than lower values.
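A moving window synchrony of this kind can be sketched as a rolling Pearson correlation. The following is a minimal illustration under stated assumptions rather than the authors' code: it uses a trailing 60-day window (the paper's window alignment is not restated here), synthetic signals, and hypothetical names (`moving_window_synchrony`, `twitter`, `mobile`).

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def moving_window_synchrony(x, y, window=60):
    """Pearson r over each trailing window; None until a full window exists."""
    out = []
    for end in range(len(x)):
        start = end - window + 1
        if start < 0:
            out.append(None)  # not enough history for a full window yet
        else:
            out.append(pearson(x[start:end + 1], y[start:end + 1]))
    return out


# Synthetic signals: aligned during the first 120 days, opposed afterwards.
days = 240
twitter = [math.sin(2 * math.pi * d / 30) for d in range(days)]
mobile = [v if d < 120 else -v for d, v in enumerate(twitter)]
sync = moving_window_synchrony(twitter, mobile, window=60)
print(sync[59], sync[150], sync[239])
```

In this toy example the synchrony is strongly positive while the signals agree, passes through values near zero in the transition, and turns strongly negative once they diverge, which mirrors the time-dependent alignment discussed above.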

Figure 9.

Mean Pearson’s correlation coefficients calculated over the whole time period of analysis between mobility metrics derived from Twitter and mobile phone data considering varying window sizes for RWDS.

To eliminate the possibility of spurious correlations, all time series were examined for unit roots using the appropriate version of the Dickey-Fuller test before calculating Pearson correlation coefficients. The test results indicated that seven out of ten time series were stationary, allowing for the calculation of Pearson correlation coefficients. However, the Twitter-based time series for “Number of movements”, “Graph modularity”, and “% activity in residential areas” were non-stationary. Following the “Standard sequence of steps for dealing with non-stationary time series” outlined by Studenmund (2017), we tested the pairs of Twitter and mobile phone time series for these three metrics for cointegration using the Engle-Granger test. The results indicated that the time series for “Number of movements” and “% activity in residential areas” were cointegrated at a confidence level of 95%, while the time series for “Graph modularity” were cointegrated at a confidence level of 90%. According to Studenmund (2017), if the variables have unit roots and are also cointegrated, the Pearson correlation coefficient can be calculated in the original units, thereby ruling out spurious correlations.
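For reference, the two-step Engle-Granger procedure can be written out as follows. This is the generic textbook form, not a specification taken from the paper; the symbols $T_t$ (Twitter-derived metric), $M_t$ (mobile-phone-derived metric), and the lag order $p$ are notation introduced here for illustration only.

```latex
% Step 1: cointegrating regression (OLS) between the two non-stationary series
T_t = \alpha + \beta M_t + u_t

% Step 2: (augmented) Dickey-Fuller regression on the estimated residuals \hat{u}_t
\Delta \hat{u}_t = \gamma \hat{u}_{t-1} + \sum_{i=1}^{p} \delta_i \, \Delta \hat{u}_{t-i} + \varepsilon_t

% Hypotheses:
%   H_0: \gamma = 0   (unit root in the residuals, no cointegration)
%   H_1: \gamma < 0   (stationary residuals, the series are cointegrated)
```

Because the residuals are estimated rather than observed, the test statistic for $\gamma$ is compared against Engle-Granger critical values instead of the standard Dickey-Fuller ones.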

Static mobility change detection

Additional findings from a static change detection analysis reinforce the results of our long-term trend analysis (cf. Figure 10). While it is evident that Twitter data does not always accurately capture long-term mobility trends, it does have the potential to detect significant (cf. Supplemental Appendix Table 2) inner-urban mobility changes measured by mobile phone data and indicate the correct direction of the shift. In our case study, this holds true for all the measured variables except for the percentage of activity in residential areas during the time period of the second onset. In summary, we conclude that both the Twitter and mobile phone datasets synchronously detected the shift in inner-urban human movement behavior between the years 2020, 2021, and 2022, attributable to COVID-19 lockdown policies. Static mobility changes between weekdays and weekends were not detected to be significant (cf. Supplemental Appendix Table 2) when testing both datasets, leading to the conclusion that Twitter can be a useful substitute for mobile phone records when trying to derive the direction of static inner-urban mobility shifts.
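The caption of Figure 10 names the Mann-Whitney U test as the significance test behind this static change detection. A minimal sketch of such a comparison is given below. It is not the authors' implementation: the p-value uses a normal approximation without tie or continuity correction, the sample values are synthetic, and the names `mann_whitney_u`, `lockdown`, and `post_lockdown` are hypothetical.

```python
import math
import statistics


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample `a`, approximate p-value). Ties get
    average ranks; the variance is not tie-corrected and no continuity
    correction is applied, so the p-value is only a rough estimate.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u_a, p


# Synthetic metric values for two periods (illustrative numbers only).
lockdown = [3.1, 2.8, 3.4, 2.9, 3.0, 3.3, 2.7, 3.2]
post_lockdown = [4.0, 4.4, 3.9, 4.6, 4.1, 4.3, 3.8, 4.5]

u, p = mann_whitney_u(lockdown, post_lockdown)
direction = (
    "increase"
    if statistics.median(post_lockdown) > statistics.median(lockdown)
    else "decrease"
)
print(f"U={u:.1f}, p={p:.4f}, direction={direction}")
```

Running the same test on the Twitter-derived and the mobile-phone-derived samples and comparing both the significance and the `direction` of the shift corresponds to the agreement check described in the paragraph above.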

Figure 10.

Static urban mobility change detection applying Twitter and mobile phone data. Comparison between weekday/weekend (top row) and lockdown style/post lockdown style time periods (bottom row). p-values of the applied Mann-Whitney U tests for static urban mobility change detection are listed in Supplemental Appendix Table 2.

Limitations

We performed a sensitivity analysis of various window sizes for RWDS. Thereby, we employed a combination of different modeling techniques. This included a dynamic mobility trend analysis and a static mobility change detection. In addition, we considered a set of five distinct mobility metrics. However, our findings show certain limitations, primarily stemming from the choice of a 28-day moving average for trend calculation, a 60-day window synchrony time frame for analyzing dynamic alignment of trend signals, and the temporal selection of on- and offsets for static change detection analysis. Furthermore, our results may be subject to potential biases due to the uneven distribution of Twitter user groups within the overall population (Li et al., 2013; Malik et al., 2015). We did not account for the spatial distribution of inferential uncertainty in our analysis either, although districts with fewer geocoded tweets can be expected to exhibit a higher degree of uncertainty (Huang and Carley, 2019; Huang and Wong, 2015). This particularly affects the graph modularity metrics calculated based on daily OD matrices. The spatial distortion in the applied datasets is supported by the low correlation of non-zero OD matrix entries aggregated over the entire analysis period (cf. Figure 11). Additional results from spatial data exploration, which highlight these issues, are provided in the supplementary GitHub repository (Knoblauch and Groß, 2023).

Figure 11.

Comparison of temporally aggregated OD matrix entries from Twitter and mobile phone data without considering zero values. Here an OD matrix entry represents a movement between two distinct neighborhoods.

To address these limitations, several approaches might be applicable: Recent studies on semantic analysis (Hu et al., 2023; Serere et al., 2023) demonstrate promising results in deriving geolocalized information from tweet texts of non-geolocated tweets, which could enhance the Twitter dataset with supplemental geoinformation. Another approach involves utilizing the locations provided in user profiles as a further source of geoinformation. However, it should be noted that these techniques have limited applicability in the context of inner-urban mobility studies (Nguyen et al., 2022).

Another aspect of discussion in our long-term validation study pertains to the disparate spatial and temporal resolutions of the employed datasets. Additionally, the raw Twitter data utilized represents less than one percent of the total mobile phone records used in this validation study, leading to a substantial imbalance with potential implications on our validation outcomes (Zhao et al., 2021). Furthermore, certain assumptions were made during the pre-processing stage to facilitate the generation of our validation signal. These assumptions include the selection of lower and upper bounds for IET filtering and the assumption of a uniform distribution of cellular activity in space when converting antenna-based OD matrices into neighborhood-based mobility flows. Additionally, we assumed that the sequential activities of individual users directly represent movements, disregarding the possibility of detours which may introduce a bias in our results. However, we believe that the overall impact of these constraints is relatively minor. We anticipate that conducting supplementary sensitivity analyses on the model parameters would not alter the main findings of this novel long-term validation study, primarily because all parameters and steps were carefully chosen and justified, as described in Section “Materials and methods”.

Conclusion

Our findings demonstrate the effectiveness of employing rolling window downsampling as a viable strategy to address the limited availability of geolocated tweets in urban areas (cf. Figure 7). Our results indicate that Twitter has the potential to capture short-term changes in mobility at an inner-urban scale (cf. Figure 10), although long-term disparities were observed when compared to mobility metrics derived from mobile phone data in our case study (cf. Figure 9). To enhance the reliability of short-term inference from Twitter data on inner-urban human movement patterns, we propose a combination of multiple analysis techniques, including dynamic and static mobility change detection, simultaneous consideration of various human movement metrics, and sensitivity analysis for modeling parameters. Implementing these approaches can significantly mitigate the risk of false inference in diverse application domains where Twitter is commonly utilized as an open-source proxy for deriving human movement patterns.

Considering the increasingly stringent open-access limitations on Twitter data, this long-term study also establishes a foundation for assessing the validity of upcoming social media platforms. Voluntarily shared geo-social media data can be a powerful and promising tool, especially in locations where other mobility data sources are not openly accessible or too costly to generate. Since the availability of data sources significantly impacts applications, future research should encompass not only data performance metrics for delineating mobility patterns but also sustainability in terms of long-lasting and openly accessible APIs. Another research option could involve the fusion of data from multiple sources such as Waze, GDELT, Facebook, Instagram, Reddit, Telegram, YouTube, or Weibo. The methods developed in this paper could then be transferred to other geo-social media platforms. In addition, the developed methods and generated insights can still be applied using the paid API access plans offered by Twitter.

By conducting this study, our aim was not only to support researchers in effectively utilizing social media data for modeling human movement patterns but also to gain valuable insights into human mobility within the city of Rio de Janeiro, Brazil. These findings open up new avenues for future research on unexplained mobility-driven phenomena in urban science, such as the location of informal economy (López-García, 2023), accessibility impacts of transport policy (Pereira, 2019), and inner-urban transmission processes of mosquito-borne diseases (Ramadona et al., 2019).

Supplemental Material

Supplemental Material for Long-term validation of inner-urban mobility metrics derived from Twitter by Steffen Knoblauch, Simon Gross, Sven Lautenbach, Antonio A de A Rocha, Marta C González, Bernd Resch, Dorian Arifi, Thomas Jänisch, Ivonne Morales, Alexander Zipf in Environment and Planning B: Urban Analytics and City Science.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation (DFG) [grant number 451956976].

Data availability statement

All digitally shareable materials necessary to reproduce the reported methodology have been made available in a public, open-access repository (https://doi.org/10.5281/zenodo.8305678).

Steffen Knoblauch is a PhD candidate at the Graduate School of Mathematical and Computational Methods for the Sciences (HGS MathComp) at Heidelberg University, under the supervision of Prof. Dr. Alexander Zipf, head of the GIScience Research Group. Prior to joining Heidelberg University, he worked as a part-time research assistant at HeiGIT gGmbH in Heidelberg. He earned his B.Sc. and M.Sc. in Industrial Engineering from the Karlsruhe Institute of Technology (KIT), specializing in Data Science, Machine Learning, Stochastic Optimization, and Market Engineering. During his studies, he gained practical experience working for various companies, including the start-up Lilium GmbH and Daimler, among others. He had previous scientific affiliations with the “Smart Grids and Energy Markets” research group at KIT and the FZI Research Center for Information Technology in Karlsruhe. His current research focuses on spatio-temporal modeling, deep learning, and big spatial data analytics.

Simon Groß is a master's student in geoinformatics at the University of Vienna, having completed his bachelor's degree in geography at Heidelberg University. Previously, he worked as a student assistant at the GIScience research group at Heidelberg University.

Sven Lautenbach is adjunct professor at Heidelberg University and chief scientist at HeiGIT. He is involved in the research activities of the GIScience Research Group at Heidelberg University. He holds a Diplom (equivalent to an M.Sc.) in geography from the University of Heidelberg, Germany, and a Diplom (equivalent to an M.Sc.) in applied system science from the University of Osnabrück, Germany. He received his Dr. rer. nat. (equivalent to a PhD) from the University of Osnabrück, Germany. Afterwards he worked as a postdoctoral researcher/senior scientist at the department of computational landscape ecology at the Helmholtz Centre for Environmental Research – UFZ in Leipzig, Germany. During that time he acted as a substitute for an assistant professorship in geomatics at the Humboldt University, Berlin, Germany. Afterwards he worked as an assistant professor for land use modelling and ecosystem services at the agricultural faculty of the University of Bonn and successfully passed his midterm evaluation. In addition, he worked as adjunct faculty for George Mason University, Fairfax, VA, USA.

Antonio Augusto de Aragão Rocha has been Associate Professor in the Computer Science Department of the Institute of Computing at the Fluminense Federal University since 2011. He received MSc and PhD degrees in Computer and Systems Engineering (PESC/COPPE) from the Federal University of Rio de Janeiro (UFRJ), Brazil, in 2003 and 2010, respectively. During his PhD, in 2008–2009, he was a visiting scholar in Computer Science at the University of Massachusetts Amherst (UMass). He worked as a post-doc researcher at UFRJ, supported by INCT WebScience. He received his bachelor's degree in Computer Science from the University of Salvador (UNIFACS) in 2000. He has held a Research Productivity Fellowship granted by CNPq since 2014. His areas of interest include performance evaluation, traffic engineering, network measurement, next generation Internet, network science and security systems. Dr. Antonio Rocha has published many papers in important journals and conferences and his work has received a few awards.

Marta C Gonzalez is Associate Professor of City and Regional Planning at the University of California, Berkeley, and a Physics Research faculty in the Energy Technology Area (ETA) at the Lawrence Berkeley National Laboratory (Berkeley Lab). With the support of several companies, cities and foundations, her research team develops computer models to analyze digital traces of information mediated by devices. They process this information to manage the demand in urban infrastructures in relation to energy and mobility. Her recent research uses billions of mobile phone records to understand the appearance of traffic jams and the integration of electric vehicles into the grid, smart meter data records to compare the policy of solar energy adoption and card transactions to identify habits in spending behavior. Prior to joining Berkeley, Marta worked as an Associate Professor of Civil and Environmental Engineering at MIT, a member of the Operations Research Center and the Center for Advanced Urbanism. She is a member of the scientific council of technology companies such as Gran Data, PTV and the Pecan Street Project consortium.

Bernd Resch is an Associate Professor at University of Salzburg’s Department of Geoinformatics – Z_GIS and a Visiting Scholar at Harvard University (USA). Bernd Resch did his PhD in the area of “Live Geography” (real-time monitoring of environmental geo-processes) together with University of Salzburg and MIT. His research interest revolves around understanding cities as complex systems through analysing a variety of digital data sources, focusing on developing machine learning algorithms to analyse human-generated data like social media posts and physiological measurements from wearable sensors. The findings are relevant to a number of fields including urban research, disaster management, epidemiology, and others. Bernd received the Theodor Körner Award for his work on “Urban Emotions”. Amongst a variety of other functions, he is an Editorial Board Member of IJHG, IJGI and PLOS ONE, a scientific committee member of various international conferences (having chaired several conferences), an Associated Faculty Member of the doctoral college “GIScience”, and an Executive Board member of Spatial Services GmbH.

Dorian Arifi is a PhD student and holds a M.Sc. in Data Science from the University of Salzburg with a particular focus on Deep Learning and Database Management. Prior to that, he earned a B.Sc. in Economics from the LMU Munich, where he specialized in statistical analysis and behavioral economics. His research interests include the development of Artificial Intelligence models for geospatial data analysis and Natural Language Processing for social media analysis.

Thomas Jänisch is an Infectious Disease Epidemiologist and Clinical Scientist. For the last 15 years, he has coordinated multicentric observational clinical research projects on arbovirus infections like Dengue or Zika. His research was instrumental in providing key evidence for the WHO Dengue classification of 2009. After that, he focused on warning signs for severe Dengue and on standardized severe disease endpoints in Dengue. When the Zika epidemic hit Latin America in 2016, he was able to use the existing network of partners to mount a research response against Zika. Dr. Jaenisch is involved in large multicentric pregnant women and children cohorts in Latin America as well as in Data Sharing and Harmonization of Infectious Disease cohorts, building on the ongoing Zika birth cohorts. Dr. Jaenisch was trained as a Medical Doctor in Germany, obtained a PhD in International Health at Johns Hopkins University Bloomberg School of Public Health, and worked in Tropical Medicine and Global Health at Heidelberg University Hospital from 2005–2019. He was recently recruited as the Director of the new Arbovirus Research Consortium (ARC), located at the Center for Global Health at the Colorado School of Public Health.

Ivonne Morales currently serves as an Epidemiologist at University Hospital Heidelberg. He previously worked as a Postdoctoral Research Fellow at the National Institutes of Health in the USA. He holds a Master's degree in Public Health from Johns Hopkins Bloomberg School of Public Health and earned his PhD in Virology from Heidelberg University.

Alexander Zipf has been chair of GIScience (Geoinformatics) at Heidelberg University (Department of Geography) since late 2009. He is a member of the Centre for Scientific Computing (IWR) and a founding member of the Heidelberg Center for the Environment (HCE). From 2012–2014 he was Managing Director of the Department of Geography, Heidelberg University. In 2011–2012 he acted as Vice Dean of the Faculty for Chemistry and Geosciences, Heidelberg University. Currently he is establishing the Heidelberg Institute for Geoinformation Technology (HeiGIT).

References

Abdullah

Dias

Muley

, et al. (2020) Exploring the impacts of covid-19 on travel behavior and mode preferences. Transportation Research Interdisciplinary Perspectives 8: 100255. DOI: 10.1016/j.trip.2020.100255.

Abdullah

Ali

Hussain

, et al. (2021) Measuring changes in travel behavior pattern due to covid-19 in a developing country: a case study of Pakistan. Transport Policy 108: 21–33. DOI: 10.1016/j.tranpol.2021.04.023.

Aktay

Bavadekar

Cossoul

, et al. (2020) Google covid-19 community mobility reports: anonymization process description (version 1.1). DOI: 10.48550/arXiv.2004.04145.

Aletta

Brinchi

Carrese

, et al. (2020) Analysing urban traffic volumes and mapping noise emissions in rome (Italy) in the context of containment measures for the covid-19 disease. Noise Mapping 7(1): 114–122. DOI: 10.1515/noise-2020-0010.

Bao

Liu

, et al. (2017) Incorporating twitter-based human activity information in spatial analysis of crashes in urban areas. Accident Analysis & Prevention 106: 358–369. DOI: 10.1016/j.aap.2017.06.012.

Barbosa

Barthelemy

Ghoshal

, et al. (2018) Human mobility: models and applications. Physics Reports 734: 1–74. DOI: 10.1016/j.physrep.2018.01.001.

Barboza

MHC

Alencar

Chaves

, et al. (2021) Identifying human mobility patterns in the rio de janeiro metropolitan area using call detail records. Transportation Research Record: Journal of the Transportation Research Board 2675(4): 213–221. DOI: 10.1177/0361198120977655.

Bisanzio

Kraemer

MUG

Bogoch

, et al. (2020a) Use of twitter social media activity as a proxy for human mobility to predict the spatiotemporal spread of covid-19 at global scale. Geospatial health 15(1). DOI: 10.4081/gh.2020.882.

Bisanzio

Kraemer

MUG

Brewer

, et al. (2020b) Geolocated twitter social media data to describe the geographic spread of sars-cov-2. Journal of Travel Medicine 27(5): taaa120. DOI: 10.1093/jtm/taaa120.

10.

Blondel

Guillaume

Lambiotte

, et al. (2008) Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10): P10008. DOI: 10.1088/1742-5468/2008/10/P10008.

11.

Da Cavalcante Silva

Monteiro de Almeida

Oliveira

, et al. (2021) Comparing community mobility reduction between first and second covid-19 waves. Transport Policy 112: 114–124. DOI: 10.1016/j.tranpol.2021.08.004.

12.

de Haas

Faber

Hamersma

(2020) How covid-19 and the Dutch ’intelligent lockdown’ change activities, work and travel behaviour: evidence from longitudinal data in The Netherlands. Transportation Research Interdisciplinary Perspectives 6: 100150. DOI: 10.1016/j.trip.2020.100150.

13.

Engle

Stromme

Zhou

(2020) Staying at home: mobility effects of covid-19. SSRN Electronic Journal. DOI: 10.2139/ssrn.3565703.

14.

Fabrikant

(2017) A guide to creating a mobility matrix: cell towers, chiefdoms, and anonymized call detail records. The Medium. https://mikefabrikant.medium.com/cell-towers-chiefdoms-and-anonymized-call-detail-records-a-guide-to-creating-a-mobility-matrix-d2d5c1bafb68.

15.

Fatmi

(2020) Covid-19 impact on urban mobility. Journal of Urban Management 9(3): 270–275. DOI: 10.1016/j.jum.2020.08.002.

16.

Gao

Rao

Kang

, et al. (2020a) Mapping county-level mobility pattern changes in the United States in response to covid-19. SIGSPATIAL Special 12(1): 16–26. DOI: 10.1145/3404820.3404824.

17.

Gao

Rao

Kang

, et al. (2020b) Association of mobile phone location data indications of travel and stay-at-home mandates with covid- 19 infection rates in the us. JAMA Network Open 3(9): e2020485. DOI: 10.1001/jamanetworkopen.2020.20485.

18.

Hakim

Victory

Chevinsky

, et al. (2021) Mitigation policies, community mobility, and covid-19 case counts in Australia, Japan, Hong Kong, and Singapore. Public Health 194: 238–244. DOI: 10.1016/j.puhe.2021.02.001.

19.

Hawelka

Sitko

Beinat

, et al. (2014) Geo-located twitter as proxy for global mobility patterns. Cartography and Geographic Information Science 41(3): 260–271. DOI: 10.1080/15230406.2014.890072.

20.

Heiler

Reisch

Hurt

, et al. (2020) Country-wide mobility changes observed using mobile phone data during covid-19 pandemic. In: 2020 IEEE international conference on big data (big data), Atlanta, GA, 10–13 December 2020. IEEE, pp. 3123–3132.

21.

Hensher

Beck

Wei

(2021) Working from home and its implications for strategic transport modelling based on the early days of the covid-19 pandemic. Transportation Research Part A: Policy and Practice 148: 64–78. DOI: 10.1016/j.tra.2021.03.027.

22.

Hernando

Mateo

Bayer

, et al. (2021) Radius of Gyration as Predictor of COVID-19 Deaths Trend with Three-Weeks Offset. DOI: 10.1101/2021.01.30.21250708.

23.

Resch

, et al. (2023) Geographic information extraction from texts (geoext). In: Kamps

Goeuriot

Crestani

, et al. (eds) Advances in Information Retrieval, Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, Vol. 13982, 398–404. DOI: 10.1007/978-3-031-28241-6_44.

24.

Huang

Carley

(2019) A large-scale empirical study of geotagging behavior on twitter. In: Spezzano

Chen

Xiao

(eds) Proceedings of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, Vancouver British Columbia Canada, 27–30 August 2019. New York, NY, USA: ACM, 365–373. DOI: 10.1145/3341161.3342870.

25.

Huang

Wong

DWS

(2015) Modeling and visualizing regular human mobility patterns with uncertainty: an example using twitter data. Annals of the Association of American Geographers 105(6): 1179–1197. DOI: 10.1080/00045608.2015.1081120.

26.

Huang

Jiang

, et al. (2020a) Twitter, human mobility, and covid-19. DOI: 10.48550/arXiv.2007.01100.

27.

Huang

Jiang

, et al. (2020b) Twitter reveals human mobility dynamics during the covid-19 pandemic. PLoS One 15(11): e0241957. DOI: 10.1371/journal.pone.0241957.

28.

Kishore

Kiang

Engø-Monsen

, et al. (2020) Measuring mobility to monitor travel and physical distancing interventions: a common framework for mobile phone data analysis. The Lancet. Digital health 2(11): e622–e628. DOI: 10.1016/S2589-7500(20)30193-X.

29.

Knoblauch

Gross

(2023) Github repository for this manuscript called long-term validation of inner-urban mobility metrics derived from twitter. Zenodo. doi: 10.5281/zenodo.8304597.

30.

Kounadi

Resch

(2018) A geoprivacy by design guideline for research campaigns that use participatory sensing data. Journal of Empirical Research on Human Research Ethics 13(3): 203–222. DOI: 10.1177/1556264618759877.

31.

Kounadi

Resch

Petutschnig

(2018) Privacy threats and protection recommendations for the use of geosocial network data in research. Social Sciences 7(10): 191. DOI: 10.3390/socsci7100191.

32.

Kurkcu

Ozbay

Morgul

(2016) Evaluating the usability of geo-located twitter as a tool for human activity and mobility patterns: a case study for new york city. In: Transportation research board’s 95th annual meeting, Washington, D.C., 10–14 January 2016, pp. 1–20.

33.

Lenormand

Picornell

Cantú-Ros

, et al. (2014) Cross-checking different sources of mobility information. PLoS One 9(8): e105184. DOI: 10.1371/journal.pone.0105184.

34.

(2008) Upsampling and downsampling. https://www.eetimes.com/multirate-dsp-part-1-upsampling-and-downsampling/.

35.

Goodchild

(2013) Spatial, temporal, and socioeconomic patterns in the use of twitter and flickr. Cartography and Geographic Information Science 40(2): 61–77. DOI: 10.1080/15230406.2013.777139.

36.

Zhao

Haitao

, et al. (2021) How did micro-mobility change in response to covid-19 pandemic? A case study based on spatial-temporal-semantic analytics. Computers, Environment and Urban Systems 90: 101703. DOI: 10.1016/j.compenvurbsys.2021.101703.

37.

Liu

Yang

Zhao

, et al. (2018) Temporal understanding of human mobility: a multi-time scale analysis. PLoS One 13(11): e0207697. DOI: 10.1371/journal.pone.0207697.

38.

López-García

(2023) Worker Mobility and Urban Policy in Latin America: Policy Interactions and Urban Outcomes in Mexico City. Routledge Advances in Regional Economics, Science and Policy. 1 edition. New York, NY: Routledge.

39.

Malik

Lamba

Nakos

, et al. (2015) Population bias in geotagged tweets. Proceedings of the International AAAI Conference on Web and Social Media 9(4): 18–27. DOI: 10.1609/icwsm.v9i4.14688.

40.

Mathieu

Ritchie

Rode´s-Guirao

, et al. (2020) Coronavirus pandemic (covid-19). https://ourworldindata.org/coronavirus.

41.

Milusheva

Marty

Bedoya

, et al. (2021) Applying machine learning and geolocation techniques to social media data (twitter) to develop a resource for urban planning. PLoS One 16(2): e0244317. DOI: 10.1371/journal.pone.0244317.

42.

Municipality of Rio de Janeiro (2022) Land use land cover (lulc) data. https://www.data.rio/apps/PCRJ::uso-do-solo-1/about.

43.

Mützel

Scheiner

(2022) Investigating spatio-temporal mobility patterns and changes in metro usage under the impact of covid-19 using taipei metro smart card data. Public Transport 14(2): 343–366. DOI: 10.1007/s12469-021-00280-2.

44.

Nanda

Nursetyo

Ramadona

, et al. (2022) Community mobility and covid-19 dynamics in jakarta, Indonesia. International Journal of Environmental Research and Public Health 19(11): 6671. DOI: 10.3390/ijerph19116671.

45. Newman MEJ (2006) Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 103(23): 8577–8582. DOI: 10.1073/pnas.0601602103.

46. Nguyen, Tsolak, Karmann, et al. (2022) Efficient and reliable geocoding of German Twitter data to enable spatial data linkage to official statistics and other data sources. Frontiers in Sociology 7: 910111. DOI: 10.3389/fsoc.2022.910111.

47. Osorio-Arjona and García-Palomares (2019) Social media and urban mobility: using Twitter to calculate home-work travel matrices. Cities 89: 268–280. DOI: 10.1016/j.cities.2019.03.006.

48. Ossimetha, Kosar, et al. (2021) Socioeconomic disparities in community mobility reduction and COVID-19 growth. Mayo Clinic Proceedings 96(1): 78–85. DOI: 10.1016/j.mayocp.2020.10.019.

49. Paez (2020) Using Google community mobility reports to investigate the incidence of COVID-19 in the United States. Findings. DOI: 10.32866/001c.12976.

50. Pappalardo, Simini, Barlacchi, et al. (2022) scikit-mobility: a Python library for the analysis, generation, and risk assessment of mobility data. Journal of Statistical Software 103(4): 1–38. DOI: 10.18637/jss.v103.i04.

51. Pardo, Zapata-Bedoya, Ramirez-Varela, et al. (2021) COVID-19 and public transport: an overview and recommendations applicable to Latin America. Infectio 25(3): 182. DOI: 10.22354/in.v25i3.944.

52. Park, Kim, CST (2022) Analysis of travel mobility under COVID-19: application of network science. Journal of Travel & Tourism Marketing 39(3): 335–352. DOI: 10.1080/10548408.2022.2089954.

53. Pereira (2019) Future accessibility impacts of transport policy scenarios: equity and sensitivity to travel time thresholds for bus rapid transit expansion in Rio de Janeiro. Journal of Transport Geography 74: 321–332. DOI: 10.1016/j.jtrangeo.2018.12.005.

54. Petutschnig, Albrecht, Resch, et al. (2022) Commuter mobility patterns in social media: correlating Twitter and LODES data. ISPRS International Journal of Geo-Information 11(1): 15. DOI: 10.3390/ijgi11010015.

55. Provenzano, Hawelka and Baggio (2018) The mobility network of European tourists: a longitudinal study and a comparison with geo-located Twitter data. Tourism Review 73(1): 28–43. DOI: 10.1108/TR-03-2017-0052.

56. Qian, Kats, Malinchik, et al. (2018) Geo-tagged social media data as a proxy for urban mobility. In: Hoffman (ed) Advances in Cross-Cultural Decision Making, Advances in Intelligent Systems and Computing. Cham: Springer International Publishing, Vol. 610, 29–40. DOI: 10.1007/978-3-319-60747-4_4.

57. Ramadona, Tozan, Lazuardi, et al. (2019) A combination of incidence data and mobility proxies from social media predicts the intra-urban spread of dengue in Yogyakarta, Indonesia. PLoS Neglected Tropical Diseases 13(4): e0007298. DOI: 10.1371/journal.pntd.0007298.

58. Reynard and Shirgaokar (2019) Harnessing the power of machine learning: can Twitter data be useful in guiding resource allocation decisions during a natural disaster? Transportation Research Part D: Transport and Environment 77: 449–463. DOI: 10.1016/j.trd.2019.03.002.

59. Ruan, Bao, Liang, et al. (2020) Dynamic public resource allocation based on human mobility prediction. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4(1): 1–22. DOI: 10.1145/3380986.

60. Saha, Barman and Chouhan (2020) Lockdown for COVID-19 and its impact on community mobility in India: an analysis of the COVID-19 community mobility reports, 2020. Children and Youth Services Review 116: 105160. DOI: 10.1016/j.childyouth.2020.105160.

61. Saha, Mondal and Chouhan (2021) Spatial-temporal variations in community mobility during lockdown, unlock, and the second wave of COVID-19 in India: a data-based analysis using Google’s community mobility reports. Spatial and Spatio-Temporal Epidemiology 39: 100442. DOI: 10.1016/j.sste.2021.100442.

62. Schlosser, Maier, Jack, et al. (2020) COVID-19 lockdown induces disease-mitigating structural changes in mobility networks. Proceedings of the National Academy of Sciences of the United States of America 117(52): 32883–32890. DOI: 10.1073/pnas.2012326117.

63. Serere, Resch and Havas (2023) Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection. PLoS One 18(3): e0282942. DOI: 10.1371/journal.pone.0282942.

64. Shumway-Cook, Patla, Stewart, et al. (2005) Assessing environmentally determined mobility disability: self-report versus observed community mobility. Journal of the American Geriatrics Society 53(4): 700–704. DOI: 10.1111/j.1532-5415.2005.53222.x.

65. Soliman, Soltani, Yin, et al. (2017) Social sensing of urban land use based on analysis of Twitter users’ mobility patterns. PLoS One 12(7): e0181657. DOI: 10.1371/journal.pone.0181657.

66. Statista (2023) Number of active Twitter users in selected countries. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/.

67. Steiger, Westerholt, Resch, et al. (2015) Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data. Computers, Environment and Urban Systems 54: 255–265. DOI: 10.1016/j.compenvurbsys.2015.09.007.

68. Studenmund (2017) A Practical Guide to Using Econometrics. 7th edition, global edition. Harlow, England: Pearson.

69. Sulyok and Walker (2020) Community movement and COVID-19: a global study using Google’s community mobility reports. Epidemiology and Infection 148: e284. DOI: 10.1017/S0950268820002757.

70. Terroso-Saenz, Muñoz, Arcas, et al. (2022a) An analysis of Twitter as a relevant human mobility proxy: a comparative approach in Spain during the COVID-19 pandemic. GeoInformatica: 1–30. DOI: 10.1007/s10707-021-00460-z.

71. Terroso-Saenz, Muñoz, Arcas, et al. (2022b) Can Twitter be a reliable proxy to characterize nation-wide human mobility? A case study of Spain. Social Science Computer Review 41(3): 886–904. DOI: 10.1177/08944393211071071.

72. Tsou, Zhang and Jung (2017) Identifying data noises, user biases, and system errors in geo-tagged Twitter messages (tweets). DOI: 10.48550/arXiv.1712.02433.

73. Twitter, Inc (2023a) Twitter API for academic research. https://developer.twitter.com/en/products/twitter-api/academic-research.

74. Twitter, Inc (2023b) Twitter API documentation. https://developer.twitter.com/en/docs/twitter-api.

75. Ubaldi, Yabe, Jones NKW, et al. (2021) Mobilkit: a Python toolkit for urban resilience and disaster risk management analytics using high frequency human mobility data. DOI: 10.48550/arXiv.2107.14297.

76. Wang and Taylor (2014) Quantifying human mobility perturbation and resilience in Hurricane Sandy. PLoS One 9(11): e112608. DOI: 10.1371/journal.pone.0112608.

77. Wang and Taylor (2018) Coupling sentiment and human mobility in natural disasters: a Twitter-based study of the 2014 South Napa earthquake. Natural Hazards 92(2): 907–925. DOI: 10.1007/s11069-018-3231-1.

78. Wang, Zhang, Chan EHW, et al. (2021a) A review of human mobility research based on big data and its implication for smart city development. ISPRS International Journal of Geo-Information 10(1): 13. DOI: 10.3390/ijgi10010013.

79. Wang, Lai, Huang, et al. (2021b) Estimating traffic flow in large road networks based on multi-source traffic data. IEEE Transactions on Intelligent Transportation Systems 22(9): 5672–5683. DOI: 10.1109/TITS.2020.2988801.

80. Yildirimoglu and Kim (2018) Identification of communities in urban mobility networks using multi-layer graphs of network traffic. Transportation Research Part C: Emerging Technologies 89: 254–267. DOI: 10.1016/j.trc.2018.02.015.

81. Zhao, Shaw, Yin, et al. (2019) The effect of temporal sampling intervals on typical human mobility indicators obtained from mobile phone location data. International Journal of Geographical Information Science 33(7): 1471–1495. DOI: 10.1080/13658816.2019.1584805.

82. Zhao, Yin, et al. (2021) Data and model biases in social media analyses: a case study of COVID-19 tweets. In: AMIA ... Annual Symposium Proceedings. AMIA Symposium 2021, San Diego, California, 30 October–3 November 2021, 1264–1273.

83. Zhu, Mishra, Han, et al. (2020) Social distancing in Latin America during the COVID-19 pandemic: an analysis using the stringency index and Google community mobility reports. Journal of Travel Medicine 27(8): taaa125. DOI: 10.1093/jtm/taaa125.

84. Zia, Fürle, Ludwig, et al. (2022) Socialmedia2traffic: derivation of traffic information from social media data. ISPRS International Journal of Geo-Information 11(9): 482. DOI: 10.3390/ijgi11090482.

