Research and application of the global positioning system (GPS) clustering algorithm based on multilevel functions

Abstract

With the rapid development and widespread adoption of wearable technology, a new type of lifelog data is being collected and used in numerous studies. We refer to these data as informative lifelogs; they usually contain GPS data, images, videos, text, and so on. GPS trajectory data in lifelogs are typically categorized into continuous and discrete trajectories. Finding points of interest (POIs) in discrete trajectories is challenging and has so far received little attention. This paper proposes LP-DBSCAN, a model for mining personal trajectories from discrete GPS trajectory data. LP-DBSCAN exploits the hierarchical structure information implied by GPS trajectory data: it is a variable-levels, variable-parameters clustering method based on the DBSCAN algorithm that increases the precision of POI discovery. The method is evaluated systematically on the Liu lifelog dataset. For GPS data that are unevenly distributed geographically, the experimental results demonstrate that the proposed algorithm identifies POIs more accurately and mitigates the adverse effects of the global parameters used by the traditional DBSCAN algorithm.

Keywords

Personal big data; lifelog; points of interest; discrete trajectory; DBSCAN

1. Introduction

The importance of personal big data and lifelog data lies in their ability to provide valuable insights into individuals' behaviors, preferences, and patterns. These datasets offer a wealth of information that can be leveraged for various purposes, such as personalized recommendations, targeted marketing strategies, and advances in healthcare and well-being. Understanding and analyzing personal big data and lifelog data can lead to better decision-making, improved services, and a deeper understanding of human behavior in the digital age.

With the rapid advancement of science and technology, attention has turned to a class of personally stored data, mainly personal health data and lifelog data, referred to as Personal Big Data (PBD) [1, 2]. The key distinction between PBD and Big Data is that PBD is generated by a specific person, and such data are also of great importance. Lifelogs, in which individuals digitally record various aspects of their daily lives, are one source of PBD [3, 4, 5], and they are produced for a variety of scenes and purposes [6, 7, 8, 9]. As a typical representative of PBD, lifelogs possess all the traits associated with big data, including data diversity, enormous scale, and little information in each data unit. However, lifelog data are also privacy-sensitive [10, 11], and their collection, management, storage, processing, and presentation raise many issues [12, 13, 14, 15].

There are many ways to record lifelogs today, including writing blogs or tweets, uploading images or videos to social media sites and applications, and making cell phone recordings. These methods let users record their lives actively. When such data, including GPS data, are generated during life recording, they can be used to analyze the locations or areas that interest users. For instance, photos taken during social occasions and posted on Twitter include geolocation information. A POI, also known as an information point, is any point on a map that is significant for reasons other than its geography (for example, a university, a hospital, or an airport) and that carries location and other attribute information. POI detection is therefore crucial for uncovering user behavior patterns and preferences. However, WeChat, Twitter, and other social media platforms are not used exclusively to record personal lifelogs; their content tends to express users' personal opinions. Currently, no long-term public dataset dedicated to recording lifelogs exists worldwide.

A lifelog typically contains the user's GPS track information, which is divided into continuous and discrete types that differ in time granularity. Continuous tracks typically use seconds or minutes as the basic unit and usually come from portable GPS positioning devices or smartphones, whereas discrete tracks typically use hours or days as the basic unit and usually come from lifelogs [1, 16].

Data sources for POI mining fall into several types, such as textual information from social networking sites [17, 18, 19] and GPS track information [20, 21, 22, 33]. These sources can be used to explore travel purposes [22, 23], detect movement patterns [24], discover hotspot areas [18, 19, 20, 21, 25, 26, 27, 28, 29, 30, 31, 32], analyze spatial structure [34, 35], and so on. The following discussion focuses on studies that use smart devices such as cell phones to collect user GPS track information [26, 27, 28, 29, 30, 31, 32]. These studies analyze individual travel behavior and evaluate activity spaces by improving clustering algorithms in various respects, clustering the raw data to obtain POIs. For instance, [26] converts indoor location data into stay-point sequences with rich semantic information, measures trajectory similarity with a weighted edit distance algorithm, clusters the trajectories of indoor positioning data, and effectively discovers customers' behavioral patterns in indoor environments. In [27], data from 95 respondents were analyzed with POIs introduced as opportunity indicators, showing how GPS-based accessibility metrics of activity spaces can be used to investigate travel behavior. For events in network space, [28] proposed NS-DBSCAN, a density-based clustering method that can precisely identify regions with a high concentration of events. Based on nearly six months of cell phone tracking data, [29] used a spatiotemporal detection algorithm to cluster and identify 93 locations popular with leisure travelers. To lay a foundation for promoting healthy aging in cities, [30] evaluated the activity spaces of 76 older adults using smartphone GPS data collected over 102 consecutive days. By comparing the similarity of Wi-Fi measurements, [31] combined Wi-Fi and GPS data to identify POIs and user behavior patterns. Finally, [32] addressed trip-end identification with smartphones, proposing a clustering algorithm based on spatiotemporal density that considers both the spatial and the temporal density of trajectory points, and further proposed three optimization models to refine the recognition results.

In summary, most current research on lifelog data concerns continuous GPS data, and research on discrete GPS data is scarce. Moreover, continuous and discrete GPS datasets differ notably: discrete data are more likely to concentrate in specific areas and show pronounced differences in distribution. In the lifelog data we gathered, for instance, some areas contain a large amount of data, such as Shenyang City in Liaoning Province with 8,265 records, whereas others contain very little, such as Siping City in Jilin Province with only 680 records and Dalian City in Liaoning Province with only 38 records. These sparse areas may nonetheless be very important to users. When DBSCAN is applied to discrete GPS data with a single global set of parameters, the clustering quality is poor. To extract POIs from the discrete GPS data in lifelogs, we design a variable-levels, variable-parameters, density-based clustering method (LP-DBSCAN) and apply it to the Liu lifelog dataset. The experimental results show that, compared with the conventional density-based clustering method, LP-DBSCAN increases the accuracy of the clustering results while maintaining the scalability and robustness of the algorithm.

The organization of the remaining sections of this article is as follows. Section 2 provides a brief introduction to the problem definition and the meaning of symbols. In Section 3, an improved algorithm model is proposed. Section 4 presents the clustering results and evaluates the effectiveness of the LP-DBSCAN algorithm on the dataset. Finally, Section 5 concludes this study, discussing the impact of the proposed algorithm on current research and providing some suggestions for future studies.

2. Problem statement

This paper studies POI extraction from a discrete lifelog dataset containing GPS data. The raw data are first clustered, and the clustering results are then analyzed to determine the user's areas of interest, from which POIs are finally extracted.

For the convenience of description, relevant terms and symbols are defined as follows:

Let $D$ be a GPS track information database that stores the id, date, GPS location, description, and other information of a large number of moving objects sampled at different times. In this paper, only the position information of the sampled points is retained for analysis. The resulting dataset is $S=\{(x_{i},y_{i})\}_{i=1,\ldots,N}$, where $x_{i}$ denotes a record and $y_{i}$ denotes its label. Each $x_{i}$ is a GPS point belonging to a trajectory and is represented by the two-tuple $x_{i}=(\textit{lng}_{i},\textit{lat}_{i})$, where $\textit{lng}_{i}$ denotes longitude and $\textit{lat}_{i}$ denotes latitude. Each $y_{i}$ comprises multi-level, multi-category labels, $y_{i}=\{y_{i}^{j}\mid 1\leqslant j\leqslant M,\ j\in\mathbb{N}^{\ast}\}$, where $M$ denotes the total number of levels, $j$ denotes the level, and $y_{i}^{j}$ is the label of the GPS point $x_{i}$ at level $j$.

The simplest and most effective way to stratify data containing GPS information is to divide it by administrative region. For example, the administrative divisions of the United States include the national, state, county, and municipal levels; in China, data can be divided by country, province, city, and county. Table 1 shows examples.

Table 1
Examples of division by administrative region

	GPS point $(\textit{lng}_{i},\textit{lat}_{i})$	Country label ($j=1$)	Province label ($j=2$)	City label ($j=3$)
$x_{1}$	(123.420656, 41.771474)	China ( $y_{1}^{1}$ )	Liaoning province ( $y_{1}^{2}$ )	Shenyang city ( $y_{1}^{3}$ )
$x_{2}$	(124.379913, 43.169697)	China ( $y_{2}^{1}$ )	Jilin province ( $y_{2}^{2}$ )	Siping city ( $y_{2}^{3}$ )
$x_{3}$	(114.372097, 30.544315)	China ( $y_{3}^{1}$ )	Hubei province ( $y_{3}^{2}$ )	Wuhan city ( $y_{3}^{3}$ )

Here $x_{1}$, $x_{2}$, and $x_{3}$ are three different GPS points. Each GPS point is represented by its longitude-latitude coordinate tuple, and Label gives the point's multi-level labels. In this example, $M=3$ and the three levels are country, province, and city. For instance, $y_{1}^{2}$ is the label of point $x_{1}$ at the second level (Liaoning province), and $y_{3}^{1}$ is the label of point $x_{3}$ at the first level (China).
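To make the notation concrete, the following minimal Python sketch stores each record $x_{i}$ together with its labels $y_{i}$ and encodes the three rows of Table 1. The `GPSPoint` class and its `label` method are hypothetical names introduced for illustration only; they are not part of the LP-DBSCAN description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GPSPoint:
    """One record x_i with its multi-level labels y_i (hypothetical container)."""
    lng: float               # longitude, lng_i
    lat: float               # latitude, lat_i
    labels: tuple[str, ...]  # labels[j - 1] is y_i^j, the label at level j

    def label(self, j: int) -> str:
        """Return y_i^j for a 1-based level j (1 <= j <= M)."""
        if not 1 <= j <= len(self.labels):
            raise ValueError(f"level {j} outside 1..{len(self.labels)}")
        return self.labels[j - 1]


# The three rows of Table 1 (M = 3: country, province, city).
S = [
    GPSPoint(123.420656, 41.771474, ("China", "Liaoning province", "Shenyang city")),
    GPSPoint(124.379913, 43.169697, ("China", "Jilin province", "Siping city")),
    GPSPoint(114.372097, 30.544315, ("China", "Hubei province", "Wuhan city")),
]

print(S[0].label(2))  # y_1^2 -> Liaoning province
print(S[2].label(1))  # y_3^1 -> China
```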

In this paper, clustering is conducted using the LP-DBSCAN algorithm, based on $\{{x_{i}}\}_{i=1,\ldots,N}$ . Subsequently, the clustering results are analyzed to extract POI. Finally, $y_{i}$ is used to calculate the accuracy of the obtained results.

3. Algorithm model

DBSCAN or its improved variants are typically used to cluster GPS points and extract POIs [11, 24, 25, 29, 30, 31, 34, 35]. However, their accuracy on this kind of problem is limited by two defects. First, clustering is performed at a single level: if the clustering results are arranged by administrative region into a clustering tree, all clusters (leaf nodes) lie at the same level. Second, globally predetermined parameters are used, which cannot adapt to the geographic distribution of GPS points. To improve robustness by exploiting the characteristics of the dataset, this paper proposes an algorithm model with variable levels and variable parameters, based on the traditional DBSCAN algorithm, to solve these problems.

(1) Variable levels

The conventional DBSCAN algorithm works well only when the data distribution is fairly balanced. GPS data are closely related to the division of geographic regions, and this relationship can be used as a priori knowledge in GPS clustering analysis. The relationship is evident from an intuitive visualization of the data, which reveals obvious regional differences in the distribution of GPS points. Because the number of GPS coordinates varies significantly across regions, the data are first split into coarse-grained categories according to the number of GPS coordinates in each region.

Inspired by hierarchical clustering [28, 36], this paper performs a top-down hierarchical decomposition of the dataset. Based on the relationships between samples $x_{i}$, all data are treated as one class at the top level and then divided into a few subclasses, each of which is recursively decomposed until further decomposition is impossible. This decomposition ultimately yields a tree structure. With a conventional clustering method, all clusters belong to the same level, which corresponds to a clustering tree whose leaf nodes all lie at the same depth.

To achieve a better clustering effect, this paper clusters $x_{i}$ only down to the level the data require when generating the clustering tree: the distribution of samples $x_{i}$ is considered when decomposing leaf nodes, and decomposition stops once the number of samples in a leaf node does not exceed the threshold $th_{i}$ of its level. This strategy effectively avoids errors caused by the unbalanced distribution of node samples and saves unnecessary computation. The scheme is shown in Algorithm 1.

Algorithm 1: Variable Levels Division
Input: Sample set $S=\{X,Y\}$; list of variable levels division thresholds $th=[th_{1},\ldots,th_{M}]$; current level $i$ (initially $i=1$)
Output: Clustering results $Z$ (a set of leaf clusters)
1. Initialization: $Z=\emptyset$
2. While $X\neq\emptyset$
3. Take the cluster $C\subseteq X$ of points that share the same level-$i$ label
4. If $\textit{card}(C)>th_{i}$ and $i<M$
5. $Z=Z\cup\text{Algorithm 1}(C,th,i+1)$
6. Else
7. $Z=Z\cup\{C\}$
8. Endif
9. $X=X\backslash C$
10. Endwhile
11. Return $Z$
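The recursion in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each point is a hypothetical `(lng, lat, labels)` tuple, the clusters at level $i$ are the groups of points sharing the same level-$i$ administrative label, and decomposition also stops at the last level covered by the threshold list. The thresholds in the usage example are toy values chosen to keep the example small, not the settings used in Section 4.

```python
from collections import defaultdict


def variable_levels_division(points, th, level=1):
    """Sketch of Algorithm 1 (variable levels division).

    points: list of (lng, lat, labels) tuples; labels[j - 1] is the label at level j.
    th:     per-level thresholds [th_1, ..., th_M].
    Returns a list of (level, members) leaf clusters.
    """
    # Group the current node's points by their administrative label at this level.
    groups = defaultdict(list)
    for p in points:
        groups[p[2][level - 1]].append(p)

    leaves = []
    for members in groups.values():
        can_descend = level < len(th) and level < len(members[0][2])
        if can_descend and len(members) > th[level - 1]:
            # Too many points: decompose this region at the next level.
            leaves.extend(variable_levels_division(members, th, level + 1))
        else:
            # At most th_i points (or no finer level): keep it as a leaf node.
            leaves.append((level, members))
    return leaves


# Toy data: 4 points in Shenyang, 1 in Dalian, 2 in Siping (hypothetical).
pts = (
    [(123.42, 41.77, ("China", "Liaoning", "Shenyang"))] * 4
    + [(121.61, 38.91, ("China", "Liaoning", "Dalian"))]
    + [(124.38, 43.17, ("China", "Jilin", "Siping"))] * 2
)
leaves = variable_levels_division(pts, th=[3, 3, 1])
for lvl, members in leaves:
    print(lvl, members[0][2][lvl - 1], len(members))
```

With these toy thresholds, Liaoning (5 points) exceeds $th_{2}=3$ and is split into city-level leaves, while Jilin (2 points) stays a province-level leaf, so leaves end up at different depths of the clustering tree.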

(2) Variable parameters

Building on the initial division described above, this section proposes a DBSCAN-based variable parameters division method that further refines the data. To cluster large numbers of unevenly distributed samples, DBSCAN is modified so that it no longer relies on global parameters.

The parameters are adjusted dynamically according to the local density of the samples. Specifically, each leaf node of the clustering tree obtained in the previous subsection is taken as input, and the neighborhood radius is defined as $\textit{eps}=n\alpha_{i}$, where $n$ is the total number of samples in the leaf node and $\alpha_{i}$ is the shrinkage factor of the level at which the node is located; $\alpha_{i}$ is supplied by the user and set based on experience. Because the data are unevenly distributed, the amount of data can differ greatly between leaf nodes. By changing the neighborhood radius dynamically, the method avoids both misjudging sparse regions as noise and merging large amounts of data into a single category, which enables accurate clustering of unevenly distributed data. The scheme is shown in Algorithm 2.

Algorithm 2: Variable Parameters Division
Input: Roughly divided set of leaf nodes $T=\{t_{1},t_{2},\ldots,t_{K}\}$; variable parameters shrinkage factors $\alpha_{i}$, where $i$ indicates the level of a node; minimum number of samples $\textit{MinPts}$
Output: Clustering results $R$
1. Initialization: $R=\emptyset$
2. For each $t_{k}$ in $T$:
3. $n=\textit{card}(t_{k})$; let $i$ be the level of $t_{k}$
4. $R=R\cup\textit{DBSCAN}(t_{k},\textit{eps}=n\alpha_{i},\textit{MinPts})$
5. Endfor
6. Return $R$
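A corresponding sketch of Algorithm 2, again a minimal illustration rather than the authors' code, runs a plain DBSCAN on each leaf with $\textit{eps}=n\alpha_{i}$. It assumes that points are `(lng, lat, ...)` tuples, that distances are Haversine distances in kilometres (the paper does not state the unit of eps), and that a point counts itself toward MinPts; the helper names `haversine_km`, `dbscan`, and `variable_parameters_division` are hypothetical. The brute-force neighbor search is quadratic in the leaf size, which is acceptable for small leaf nodes but not tuned for large data.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lng, lat) points given in degrees."""
    lng1, lat1, lng2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 = noise, 0, 1, ... = cluster id)."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # j is a core point: expand through it
                queue.extend(more)
    return labels


def variable_parameters_division(leaves, alpha, min_pts):
    """Sketch of Algorithm 2: DBSCAN on each leaf with eps = n * alpha_level."""
    results = []
    for level, pts in leaves:
        eps = len(pts) * alpha[level - 1]  # eps = n * alpha_i
        results.append((level, eps, dbscan(pts, eps, min_pts)))
    return results


# One hypothetical city-level leaf: three nearby points and two distant ones.
pts = [(123.420, 41.770), (123.421, 41.770), (123.420, 41.771),
       (124.38, 43.17), (114.37, 30.54)]
res = variable_parameters_division([(3, pts)], alpha=[0.4, 0.3, 0.2, 0.02], min_pts=3)
print(res)
```

In the example, the leaf has $n=5$ points at level 3, so eps = 5 × 0.2 = 1 km under the kilometre assumption; the three nearby points (roughly 0.1 km apart) form cluster 0 and the two distant points are labeled noise (-1).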

The algorithm divides the GPS data according to administrative level information (province, city, district, etc.). For every cluster at a given level, it compares the amount of data in the cluster with the threshold. If the amount does not exceed the threshold, no further division is needed: the parameters are calculated directly and the cluster is input into DBSCAN for cluster analysis. Otherwise, the cluster is divided at the next level. The model diagram is shown in Fig. 1.

Figure 1.

Model diagram of variable levels and variable parameters.

Specifically, LP-DBSCAN filters regions by their number of coordinates using a threshold list $th_{i}$, where $i$ is the level of the current cluster. Regions with more coordinates than $th_{i}$ advance to the next level of division, whereas regions with at most $th_{i}$ coordinates are input directly into the variable parameters division method for clustering. As shown in Fig. 1, the data from Siping City, Jilin Province, exceed the predetermined threshold at the city level, so they are divided further by administrative division into two regions, Tiedong District and Tiexi District. By contrast, the data from Changchun City, Jilin Province (designated Cluster A), are sparse and do not exceed the predetermined threshold, so they are not divided further and are clustered immediately. Their eps parameter is calculated from the variable parameters shrinkage factor $\alpha_{i}$, and the parameters are then fed into the DBSCAN algorithm for clustering.
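As a worked illustration, assume the levels are numbered as in Table 1 extended by one level (country $j=1$, province $j=2$, city $j=3$, district $j=4$) and take the settings used in Table 3, $th=[1000,500,200,50]$ and $\alpha=[0.4,0.3,0.2,0.02]$. The level mapping is our assumption, and the Siping and Dalian record counts quoted in Section 1 are used at face value even though they may include data outside the single-author subset analyzed in Section 4. Under these assumptions, Siping is split by district, Dalian becomes a city-level leaf, and a district leaf receives a radius that grows linearly with its size (the unit of eps is not stated in the paper):

$$
\begin{aligned}
\textit{card}(\text{Siping}) = 680 &> th_{3} = 200 \;\Rightarrow\; \text{divide Siping by district } (j=4),\\
\textit{card}(\text{Dalian}) = 38 &\leqslant th_{3} = 200 \;\Rightarrow\; \text{leaf at } j=3,\ \textit{eps} = 38 \times \alpha_{3} = 38 \times 0.2 = 7.6,\\
\textit{eps} = n\alpha_{4} = 0.02\,n &\leqslant 0.02 \times th_{4} = 0.02 \times 50 = 1 \quad \text{for a district leaf with } n \leqslant th_{4}.
\end{aligned}
$$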

4. Results and discussion

In this paper, we use the Liu lifelog dataset to verify the algorithm's effectiveness. The dataset was collected between 2011 and 2022 through voluntary reporting using an app developed by our team. It consists of 25,737 real data entries with six attributes: ID, date, longitude, latitude, description, and address. Because the dataset contains information from multiple authors, we selected the data of only one author, 10,252 records, for analysis. After preprocessing, 10,237 GPS points remained for testing.

The dataset is first preprocessed: noisy data caused by GPS positioning offsets are removed manually, and duplicate data caused by software issues or device shaking are removed programmatically. Inspection of the data also reveals missing values. Missing values are generally handled by leaving them as they are, deleting them, or filling them in. Because the true latitude and longitude of rows with missing values cannot be recovered from other data, they cannot be filled in, so these rows are deleted. Finally, the latitude and longitude information is converted through a web map interface into hierarchically structured data, such as country, province, city, district, and street, which are added as extended attribute columns.
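The deletion-based cleaning described above might look like the following Python sketch. The record layout (dicts with `date`, `lng`, and `lat` keys, coordinates as floats or `None`) and the duplicate criterion (identical date and coordinates) are our assumptions, since the paper specifies neither; the manual removal of GPS-offset noise and the web-map label lookup are not covered.

```python
import math


def preprocess(records):
    """Drop rows with missing coordinates, then exact duplicates (hypothetical layout).

    records: list of dicts with "date", "lng", "lat" keys; coordinates are floats or None.
    Keeps the first occurrence of each (date, lng, lat) combination.
    """
    cleaned, seen = [], set()
    for r in records:
        lng, lat = r.get("lng"), r.get("lat")
        # Missing (None or NaN) coordinates cannot be imputed, so the row is deleted.
        if lng is None or lat is None or math.isnan(lng) or math.isnan(lat):
            continue
        key = (r.get("date"), lng, lat)
        if key in seen:  # assumed duplicate, e.g. from software issues or shaking
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


raw = [
    {"date": "2020-01-01", "lng": 123.42, "lat": 41.77},
    {"date": "2020-01-01", "lng": 123.42, "lat": 41.77},       # duplicate
    {"date": "2020-01-02", "lng": None, "lat": 41.00},         # missing longitude
    {"date": "2020-01-03", "lng": float("nan"), "lat": 41.00}, # missing longitude
    {"date": "2020-01-04", "lng": 124.38, "lat": 43.17},
]
print(len(preprocess(raw)))  # 2
```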

To demonstrate the effectiveness of the algorithm, we run the traditional DBSCAN algorithm and the LP-DBSCAN algorithm on the same dataset. The traditional algorithm takes the radius eps and the minimum number of samples MinPts as input parameters, whereas LP-DBSCAN takes MinPts, the variable levels division thresholds $th_{i}$, and the variable parameters shrinkage factors $\alpha_{i}$ as input parameters for clustering and extracting POIs. The Haversine distance is used as the distance between two GPS points because it closely approximates the actual distance on the Earth's surface. MinPts is set to 20, 40, or 60 for testing; like the other parameters, it is selected based on experience and the characteristics of the dataset. If MinPts is set too small, the data tend to be split into many small clusters; if it is set too large, fewer clusters remain and more points, including those in sparse regions, are labeled as noise, as Table 2 shows. Table 2 reports the clustering results of the traditional DBSCAN algorithm, and Table 3 reports those of the LP-DBSCAN algorithm.
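The paper does not print the distance formula. For reference, the standard Haversine distance between $x_{i}$ and $x_{k}$, with latitudes $\varphi$ and longitudes $\lambda$ converted to radians and $R$ the Earth's radius (commonly taken as the mean radius, about 6371 km; this value is our choice), is

$$
d(x_{i},x_{k}) = 2R\arcsin\sqrt{\sin^{2}\!\left(\frac{\varphi_{k}-\varphi_{i}}{2}\right)+\cos\varphi_{i}\cos\varphi_{k}\,\sin^{2}\!\left(\frac{\lambda_{k}-\lambda_{i}}{2}\right)}.
$$

As a quick check, two points one degree of latitude apart on the same meridian give $d=2R\arcsin(\sin 0.5^{\circ})=R\pi/180\approx 111.2$ km.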

Table 2
Clustering results of the DBSCAN algorithm

Clustering algorithm	Parameters	Number of clusters	Number of noise points	Accuracy
DBSCAN algorithm	$\textit{Eps}=$ 200, $\textit{MinPts}=$ 20	38	3464	0.968
	$\textit{Eps}=$ 500, $\textit{MinPts}=$ 20	29	2248	0.915
	$\textit{Eps}=$ 700, $\textit{MinPts}=$ 20	20	1931	0.846
	$\textit{Eps}=$ 200, $\textit{MinPts}=$ 40	19	4270	0.992
	$\textit{Eps}=$ 500, $\textit{MinPts}=$ 40	16	2813	0.921
	$\textit{Eps}=$ 700, $\textit{MinPts}=$ 40	14	2429	0.874
	$\textit{Eps}=$ 200, $\textit{MinPts}=$ 60	9	5022	0.995
	$\textit{Eps}=$ 500, $\textit{MinPts}=$ 60	11	3297	0.936
	$\textit{Eps}=$ 700, $\textit{MinPts}=$ 60	9	2769	0.879

Figure 2.

DBSCAN clustering results: Eps $=$ 200, MinPts $=$ 20.

Table 3

Clustering results of the LP-DBSCAN algorithm

Clustering algorithm	Parameters	Number of clusters	Number of noise points	Accuracy (Level 1)	Accuracy (Level 2)	Accuracy (Level 3)	Accuracy (Level 4)
LP-DBSCAN algorithm	$\textit{MinPts}=$ 20 $th=$ [1000, 500, 200, 50] $\alpha=$ [0.4, 0.3, 0.2, 0.02]	41	2733	1	0.9997	0.9997	0.9412
	$\textit{MinPts}=$ 40 $th=$ [1000, 500, 200, 50] $\alpha=$ [0.4, 0.3, 0.2, 0.02]	23	3496	1	0.9998	0.9998	0.9698
	$\textit{MinPts}=$ 60 $th=$ [1000, 500, 200, 50] $\alpha=$ [0.4, 0.3, 0.2, 0.02]	11	4319	1	0.9998	0.9998	0.9670

Figure 3.

LP-DBSCAN clustering results: $\textit{MinPts}=$ 20, $th=$ [1000, 500, 200, 50], $\alpha=$ [0.4, 0.3, 0.2, 0.02].

Discrete lifelogs involve a degree of randomness, and the distribution of their GPS data is irregular. The clustering results in Table 2 show that GPS points in overly sparse regions are identified as noise points. Because GPS point clustering differs from other types of point clustering, internal evaluation indices such as the Silhouette Coefficient or the Calinski-Harabasz Index cannot accurately reflect the quality of the clustering results, so they are not calculated in this paper.

Instead, we calculate an accuracy rate $e$: the proportion of clustered GPS points that are not incorrectly merged, judged against the administrative division labels. A cluster is incorrectly merged if its points carry different true labels at a given level. For example, if cluster $C$ contains $a+b$ points, of which $a$ belong to region $p$ and the remaining $b$ belong to region $q$, then merging all $a+b$ points into one cluster is an incorrect merge, and the number of GPS points in $C$ that are not incorrectly merged is $\max(a,b)$. If the total number of GPS points is $N$, the number of noise points is $n$, and the total number of points that are not incorrectly merged over all clusters in the result is $m$, then $e=\frac{m}{N-n}$.
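The accuracy rate can be computed as in the following Python sketch, which generalizes $\max(a,b)$ to the size of the majority region in each cluster when more than two regions are involved. Under this reading, $e$ coincides with the standard purity measure computed over non-noise points. The function name `merge_accuracy` and the use of -1 as the noise marker are our assumptions.

```python
from collections import Counter


def merge_accuracy(cluster_labels, region_labels, noise=-1):
    """Accuracy e = m / (N - n): share of clustered points not incorrectly merged.

    cluster_labels: cluster id per GPS point (noise points marked with `noise`).
    region_labels:  true administrative label per point at the chosen level.
    """
    clusters = {}
    for c, r in zip(cluster_labels, region_labels):
        if c != noise:
            clusters.setdefault(c, Counter())[r] += 1
    # Points of each cluster's majority region are not incorrectly merged:
    # max(a, b) in the two-region case.
    m = sum(max(counts.values()) for counts in clusters.values())
    N = len(cluster_labels)
    n = sum(1 for c in cluster_labels if c == noise)
    if N == n:
        raise ValueError("all points are noise; e is undefined")
    return m / (N - n)


# Cluster 0 mixes regions p (3 points) and q (1 point); cluster 1 is pure q.
e = merge_accuracy([0, 0, 0, 0, 1, 1, -1], ["p", "p", "p", "q", "q", "q", "p"])
print(e)  # m = 3 + 2 = 5, N - n = 6
```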

However, the accuracy rate gives only a rough estimate of whether a particular set of parameters is feasible, and the results still need to be judged manually. Figures 2 and 3 plot some of the results of the two algorithms on a map for visual inspection.

Fig. 2 shows the result of the traditional DBSCAN algorithm. Because it uses a single set of global parameters as input for clustering, it incorrectly labels some points as noise. These points may nevertheless form the user's regions of interest, such as regions containing POIs like Shenyang University of Chemical Technology, Shenyang Second Hospital of Traditional Chinese Medicine, and Shenyang Taoxian International Airport. The variable parameters idea of the LP-DBSCAN algorithm can overcome this shortcoming.

Figure 4.

POI 1 of LP-DBSCAN algorithm clustering results.

Figure 5.

POI 2 of LP-DBSCAN algorithm clustering results.

Figures 4 and 5 show screenshots of the visualized LP-DBSCAN clustering results. They depict two areas that one of the lifelog authors visits frequently, and both were confirmed to be among the POIs for this city. The number of noise points does not differ significantly between the results of the two algorithms. Moreover, these noise points cannot be identified during data preprocessing, so they cannot be removed beforehand. Nevertheless, the LP-DBSCAN algorithm helps overcome the limitations of conventional algorithms in locating POIs.

The experimental results show that, compared with the traditional DBSCAN algorithm, LP-DBSCAN extracts users' POIs well from geographically unevenly distributed GPS point sets, with improved division accuracy.

5. Conclusion

To address the low accuracy of POI extraction from personal behavior trajectories, particularly discrete trajectories, this paper proposes the LP-DBSCAN algorithm. LP-DBSCAN considers not only the density of the sample neighborhood but also the hierarchical structure of administrative divisions implied by GPS data. The algorithm has two aspects, variable levels and variable parameters: different thresholds are set for clusters at different levels, different eps parameters are set for the clusters to be clustered, and a clustering tree structure is finally generated. The advantage of this method is that GPS points can be clustered reasonably by exploiting the uneven distribution of the data itself, yielding the user's POIs. Although the proposed algorithm obtains POIs with high accuracy, many noise points remain. Future work should analyze the areas containing noise points with low visit frequency and improve the algorithm accordingly, further reducing the impact of noise points and improving performance. The method could also be studied in other fields; for example, it could be applied to urban planning to help analyze people's travel behavior and urban hotspots, or to social media data to extract users' points of interest and activity patterns.

References

1. Yen, Ang, Chu, Tsai, Huang, Chen. Visual lifelog retrieval: humans and machines interpretation on first-person images. Multimed Tools Appl. 2023. doi: 10.1007/s11042-023-14344-x.

2. Shen, Guo, Shen, Duan, Dong, Zhang, et al. Personal big data pricing method based on differential privacy. Comput Secur. 2022; 113: 102529.

3. Ribeiro, Trifan, Neves AJR. Lifelog retrieval from daily digital data: Narrative review. JMIR mHealth uHealth. 2022; 10(5): e30517.

4. Nestik, Zhuravlev. Big data analysis in psychology and social sciences: perspective directions of research. Psikhol Zh. 2019; 40(6): 5-17.

5. Jalal, Batool, Kim. Sustainable wearable system: Human behavior modeling for life-logging activities using K-ary tree hashing classifier. Sustainability. 2020; 12(24): 10324.

6. Gupta, Crane, Gurrin. Considerations on privacy in the era of digitally logged lives. Online Inform Rev. 2021; 45(2): 278-296.

7. Alam, Graham. Memento: a prototype search engine for LSC 2021. Multimed Tools Appl. 2023. doi: 10.1007/s11042-023-15067-9.

8. Lee, Ryu. Comparison of the change in interpretative stances of lifelog photos versus manually captured photos over time. Online Inform Rev. 2020; 44(2): 521-541.

9. Sugawara, Ochi, Yamashita, Yamauchi, Saigusa, Wagata, et al. Maternity Log study: A longitudinal lifelog monitoring and multiomics analysis for the early prediction of complicated pregnancy. BMJ Open. 2019; 9(2): e025939.

10. Ksibi, Alluhaidan ASD, Salhi, El-Rahman. Overview of lifelogging: Current challenges and advances. IEEE Access. 2021; 9: 62630-62641.

11. Liu, Rehman. Personal trajectory analysis based on informative lifelogging. Multimed Tools Appl. 2021; 80(14): 22177-22191.

12. Lee, Urtnasan, Hwang, Lee, Koh, Youk. Concept and proof of the lifelog bigdata platform for digital healthcare and precision medicine on the cloud. Yonsei Med J. 2022; 63: S84-S92.

13. Del Molino, Lin, Fang, Subbaraju, Lim. Lifelog image retrieval based on semantic relevance mapping. ACM T Multim Comput. 2021; 17(3): 92.

14. Seo, Choi, Sung. Recommendation of indoor luminous environment for occupants using big data analysis based on machine learning. Build Environ. 2021; 198: 107835.

15. Bum, Choo, Whang. Image-Based Lifelogging: User Emotion Perspective. CMC-Comput Mater Con. 2021; 67(2): 1963-1977.

16. Kim. Feature-first add-on for trajectory simplification in lifelog applications. Sensors. 2020; 20(7): 1852.

17. Gui. Analysis of enterprise social media intelligence acquisition based on data crawler technology. Entrep Res J. 2021; 11(2): 3-23.

18. Nguyen, Shin. An improved density-based approach to spatio-textual clustering on social media. IEEE Access. 2019; 7: 27217-27230.

19. Tran, Shin. An improved approach for estimating social POI boundaries with textual attributes on social media. Knowl-Based Syst. 2021; 213: 106710.

20. Jiang, Guan, Yang. Measuring taxi accessibility using grid-based method with trajectory data. Sustainability. 2018; 10(9): 3187.

21. Dong, Chen, Zhang, Guo, Qiu. A spatio-temporal flow model of urban dockless shared bikes based on points of interest clustering. ISPRS Int J Geo-Inf. 2019; 8(8): 345.

22. Chen, Liao, Xie, Wang, Zhao. Trip2Vec: A deep embedding approach for clustering and profiling taxi trip purposes. Pers Ubiquit Comput. 2019; 23(1): 53-66.

23. Xing, Wang. Exploring travel patterns and trip purposes of dockless bike-sharing by analyzing massive bike-sharing data in Shanghai, China. J Transp Geogr. 2020; 87: 102787.

24. Cui, Zhong, Wang. Anomalous urban mobility pattern detection based on GPS trajectories and POI data. ISPRS Int J Geo-Inf. 2019; 8(7): 308.

25. Chen, Tao, Gao, Zhou. Applicability evaluation of several spatial clustering methods in spatiotemporal data mining of floating car trajectory. ISPRS Int J Geo-Inf. 2021; 10(3): 161.

26. Cheng, Yue, Pei. Clustering indoor positioning data using E-DBSCAN. ISPRS Int J Geo-Inf. 2021; 10(10): 669.

27. van Dijk, Krygsman. Analyzing travel behavior by using GPS-Based activity spaces and opportunity indicators. J Urban Technol. 2018; 25(2): 105-124.

28. Wang, Ren, Luo, Tian. NS-DBSCAN: A density-based clustering algorithm in network space. ISPRS Int J Geo-Inf. 2019; 8(5): 218.

29. Ponce-Lopez, Ferreira Jr. Identifying and characterizing popular non-work destinations by clustering cellphone and point-of-interest data. Cities. 2021; 113: 103158.

30. Yin, Zhan. Identifying the daily activity spaces of older adults living in a high-density urban area: A study using the smartphone-based global positioning system trajectory in Shanghai. Sustainability. 2021; 13(9): 5003.

31. Marakkalage, Lau BPL, Zhou, Liu, Yuen, Yow, Chong. WiFi fingerprint clustering for urban mobility analysis. IEEE Access. 2021; 9: 69527-69538.

32. Yao, Yang, Guo, Jin. Trip end identification based on spatial-temporal clustering algorithm using smartphone positioning data. Expert Syst Appl. 2022; 197: 116734.

33. Liu, Rehman. Toward storytelling from personal informative lifelogging. Multimed Tools Appl. 2021; 80(13): 19649-19673.

34. Rethinking the identification of urban centers from the perspective of function distribution: A framework based on point-of-interest data. Sustainability. 2020; 12(4): 1543.

35. Dong, Yang, Zhang. Study on the spatial classification of construction land types in Chinese cities: A case study in Zhejiang province. Land. 2021; 10(5): 523.

36. Gao, Molloy, Axhausen. Trip purpose imputation using GPS trajectories with machine learning. ISPRS Int J Geo-Inf. 2021; 10(11): 775.
