Mining spatial–temporal motion pattern for vessel recognition

Abstract

Vessel recognition is mostly accomplished by sensing targets and extracting target features, without taking advantage of spatial and temporal motion features. With maritime situation management systems widely applied, vessels’ spatial and temporal state information can be obtained from many kinds of distributed sensors; such information is easy to accumulate over long periods but is often left unused in databases. To extract valuable information from large-scale stored trajectories for unknown vessel recognition, this article proposes a spatial and temporal constrained trajectory similarity model and a mining algorithm based on it, which searches for trajectories with similar motion features. Based on the idea of finding matching points between trajectories, baseline matching points are first defined to provide a time reference for trajectories recorded at different times; the almost matching points are then obtained by setting spatial and temporal constraints, and the similarity of pairwise almost matching points is defined, from which the spatial and temporal similarity of trajectories is derived. By searching for the matching points between trajectories, the similar motion pattern is extracted. Experiments on real data sets show that the proposed algorithm is useful for mining similar moving behavior from historical trajectories, that the motion feature is strengthened as the trajectory length increases, and that the support for vessels with unknown properties is higher than with other models.

Keywords

Spatial and temporal constrained trajectory similarity, trajectory mining, target recognition, information fusion

Introduction

Target recognition is well known as one of the key functions in a distributed multisensor information fusion system. The general approaches for target recognition are accomplished by extracting target features such as electromagnetic (Jacobs and Sullivan,¹ Liu et al.,² Wang et al.³), radiation source (Zhang et al.,⁴ Hung et al.,⁵ Zhao et al.⁶), optical imaging (Wang et al.,⁷ Su,⁸ Li⁹), or microwave imaging characteristics (Novak et al.,¹⁰ Duman and Cetin,¹¹ Tang et al.¹²) with single or multiple sensors. However, the objects’ trajectory features (spatial–temporal motion patterns) have been neither fully utilized nor thoroughly researched, even though the trajectories stored in databases contain abundant motion patterns that remain to be exploited. With the rapid development of distributed multisensor data fusion systems in maritime surveillance, vessels’ spatial–temporal state can be easily acquired and the data are usually managed in the database, resulting in accumulation of a great amount of multisource spatial–temporal trajectory data, which record different types of vessels in the regional sea area. These trajectory data serve as an important foundation for understanding maritime traffic conditions and vessels’ mobility behavior. However, it is impossible for all sensors to cover the same region at all times because of limitations in position, circumstance, working principle, and so on. It is essential to utilize trajectory data with effective identity attributes to reinforce sensors that have a strong positioning ability but a weak recognition ability. Thus, we explore mining spatial–temporal motion features through trajectory analysis to establish such a relation among multisource trajectories.

Spatiotemporal trajectory analysis has been applied to many aspects such as commuting choice, transportation management, commercial recommendation, urban planning, tourism service, and criminal investigation (Zheng,¹³ Feng and Zhu,¹⁴ Yuan et al.¹⁵). For instance, Palma et al.¹⁶ and Liu et al.¹⁷ extract hot regions from trajectory databases, Chen et al.¹⁸ and Yin et al.¹⁹ mine frequent active routes for moving objects, and Zhang et al.²⁰ find the periodic activities hidden in the trajectories. The abovementioned studies mainly focus on the trajectories of objects that move on predefined spatial networks such as road segments, railways, and invisible air routes. Focusing on maritime situational awareness, Pallotta et al.²¹ found traffic routes, route entrances and exits, and stationary areas by analyzing stored automatic identification system (AIS) data. However, the existing literature on the issue we are concerned with is scarce. Zhu et al.²² is the only work we could find, but its validation data contain only two vessels, which makes the experiment less convincing.

A basic method for trajectory feature correlation is calculating similarity by defining proper similarity models. Detailed discussions of the trajectory similarity measure can be found in Wang et al.²³ and Toohey and Duckham.²⁴ Some applications of trajectory analysis are mainly based on similarity measure. For example, moving objects can be correlated by searching similar moving trajectories in predefined spatial networks (road segments, railways, and so on; Tiakas et al.²⁵). Accuracy of location-based services could be improved by mining users’ long-term activity similarity based on their trajectories (Lv et al.²⁶). Clustering and classification of moving objects in video image could be achieved by measuring the similarity of the position and direction of trajectories (Wei et al.²⁷). Users’ movement behavior can be captured by searching spatial and temporal clues in trajectories and clustering similar trajectories into groups in the condition of urban road networks (Hung et al.²⁸). Anomaly objects and suspicious activities can be extracted with fusion of data and information, including kinematic features, geospatial features, contextual information, and maritime domain knowledge and vessels motion patterns can be modeled based on machine learning approaches (Shahir et al.²⁹).

Many existing methods measure the similarity from the perspective of spatial attributes, such as the longest common subsequence (LCSS), edit distance with real penalty (ERP), edit distance on real sequence (EDR), Hausdorff distance (HD), and the improvements of these methods (Chen et al.³⁰). These methods either lack consideration of temporal factors or can deal only with trajectories of equal length. In contrast, dynamic time warping (DTW; Keogh and Ratanamahatana³¹), spatial assembling distance (SpADe; Chen et al.,³² Chen et al.³³), w-constrained discrete Fréchet (wDF) distance (Ding et al.³⁴), clue-aware trajectory similarity (CATS; Hung et al.²⁸), and so on can be used to compare trajectory similarity from the perspective of temporal attributes. The DTW, SpADe, and wDF methods have been thoroughly analyzed in the literature (Hung et al.²⁸), but they are either not applicable to comparing similar motion behaviors or subject to too many constraints. Moreover, the CATS model has difficulty dealing with large-scale historical trajectories because of its simple setting of the temporal threshold for searching clues.

According to the existing research results above, we find that targets with the same characteristics show a certain spatial and temporal similarity in their trajectories. As for vessel targets, vessels of different types always differ in length, width, tonnage, draft depth, and sailing speed, and so do their trajectories. Targets with similar characteristics have similar motion patterns, which can help us extract features to confirm the targets’ identities. To recognize targets with unknown identities, the main idea of this article is to search the historical trajectory database for as many similar trajectories with close motion patterns as possible and to extract information from them to support vessel identification. To distinguish trajectories by the differences in targets’ moving characteristics, motion characteristics must be highlighted when measuring the similarity between trajectories.

In a distributed multisensor system, targets’ trajectories from different sensors differ in positioning methods, ranging accuracy, and time intervals. Even if two trajectories come from the same source but are recorded at different times, it is impossible for them to record the same motion pattern completely. Taking radars as an example, the sampling interval of detection data can be very short, very long, or even aperiodic, owing to the different carrying platforms, bands, and purposes of the radars.

Hung et al.²⁸ believe that there are many clues that may be extracted from spatially and temporally co-located data points from trajectories with similar movement behaviors and cluster similar trajectories into groups by spatial and temporal clues, which inspired our work in this article greatly. If some sensing manners can continuously record the trajectories of two targets with the same movement behavior, then two data points representing the same spatial–temporal state will be recorded, respectively, which should be fully matching. Since the data recording manner cannot completely record the target activities continuously but by discretely sampling data points, approximately matching data points should be found in some certain spatial and temporal ranges representing the same movement behavior, which we call “almost matching.” Based on the basic ideas above, a spatial and temporal constrained trajectory similarity (STCTS) model is proposed as the main contribution of this article, which can find the approximately matching points on two similar trajectories by spatial and temporal constraints and then calculate similarity. Based on this model, an STCTS-based similar trajectory mining algorithm is proposed to search similar trajectories from historical trajectories database. Finally, the validity of the model and algorithm is verified using real data sets, and the recognition of targets with unknown attributes can be realized.

Problem modeling

A trajectory $T_{i}$ is a time-ordered sequence of sampled spatial data points, expressed as $T_{i} = < p_{i, 1}, p_{i, 2}, \ldots, p_{i, n} >$ , where $p_{i, k} = (l_{i, k}, t_{i, k})$ $(k \in \{1, 2, \ldots, n\})$ represents the location $l_{i, k}$ of the target at the moment $t_{i, k}$ , $t_{i, k} < t_{i, k + 1}$ , and $n$ represents the length of the trajectory. The location $l_{i, k}$ usually represents a data point in a two-dimensional or three-dimensional space. Trajectory $T_{i}$ is actually a spatial–temporal prism structure (Liu et al.,¹⁷ Hung et al.²⁸). If a target moves in a two-dimensional plane, then the projection of trajectory $T_{i}$ on the XY-plane is a curve connecting the data points $l_{i, k}$ . In addition, the trajectory $T_{i}$ has a start time $t_{i, 1}$ and a stop time $t_{i, n}$ . The sketch is shown in Figure 1.
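As a minimal sketch of this definition, a trajectory can be held as a time-ordered list of points. The names `Point`, `is_valid_trajectory`, and `time_span` are illustrative and do not come from the original work; planar coordinates and timestamps in seconds are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    """One data point p_{i,k} = (l_{i,k}, t_{i,k}); planar location assumed."""
    x: float
    y: float
    t: float  # record time, assumed to be in seconds


Trajectory = List[Point]


def is_valid_trajectory(traj: Trajectory) -> bool:
    """Check the ordering constraint t_{i,k} < t_{i,k+1}."""
    return all(a.t < b.t for a, b in zip(traj, traj[1:]))


def time_span(traj: Trajectory) -> Tuple[float, float]:
    """Return (start time t_{i,1}, stop time t_{i,n}) of a non-empty trajectory."""
    return traj[0].t, traj[-1].t


# Hypothetical three-point trajectory used only for illustration.
example = [Point(0.0, 0.0, 0.0), Point(3.0, 4.0, 10.0), Point(6.0, 8.0, 20.0)]
```

The trajectory length $n$ is simply `len(example)` in this representation.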

Figure 1.

Sketch of trajectory and similar trajectories.

To determine whether the motion patterns of two trajectories are similar, we should compare them not only from the perspective of space but also in terms of the time when the moving behavior occurs. As shown in Figure 1, the start time of trajectory $T_{1}$ is $t_{1}$ and the end time is $t_{3}$ , and the start time of trajectory $T_{4}$ is $t_{6}$ while the end time is $t_{8}$ . If the trajectories meet the conditions (1) the projections of $T_{1}$ and $T_{4}$ on the XY-plane are close and (2) the start and end times of $T_{1}$ and $T_{4}$ are close, then we can say that $T_{1}$ and $T_{4}$ are similar. We cannot say two trajectories are similar if they fail either of the above conditions. In this article, we propose an STCTS model to quantitatively evaluate the two conditions above. The related contents will be introduced in section “Trajectory similarity measure.”

In the actual situation, it is generally difficult to directly measure the similarity between the whole trajectories. In most cases, affected by factors such as tidal variation, wind, waves, ship avoidance, and so on, there are only subtrajectory segments that meet the similarity under spatial and temporal constraints. The notion of similar subtrajectory segments is somewhat analogous to the common subsequence in LCSS model, but this similar subtrajectory is a spatial–temporal prism structure here.

Considering a trajectory with unknown identity and a set of historical trajectories, we hope that all trajectories similar to the current motion pattern can be mined from the historical data set based on the STCTS model, and the current target with unknown identity can be confirmed from the perspective of spatial–temporal trajectory motion features. To realize the purpose of identifying targets with unknown identities, the historical data set should provide targets’ identity information.

In this article, a data set provided by the AIS is used for validation experiments. AIS can automatically broadcast a ship’s dynamics and some other information to all other installations in a self-organized manner. In AIS, each vessel has a unique identity MMSI (Maritime Mobile Service Identity) registered by the International Maritime Organization (IMO; Harati-Mokhtari et al.³⁵). The AIS transmitters on vessels broadcast positioning data provided by positioning devices (such as the Global Positioning System (GPS)) through AIS messages; the current mainstream positioning equipment is differential GPS, whose positioning accuracy is better than 10 m. AIS can also provide ship static data such as vessel name, call sign, size, type, destination, and so on. According to AIS technical standards, the interval of AIS dynamic messages provided by sailing vessels is 2–10 s, so the data sampling rate is high. AIS data are sent out in broadcast form, which makes them easy to obtain, and a single message is small and easy to store. For these reasons, AIS data are very appropriate as historical motion pattern records, and we used them for the validation experiment.

Trajectory similarity measure

If some sensors could continuously record the trajectories of two targets with similar motion behavior, the same motion behavior would be recorded as one data point on each of the two trajectories, and, ignoring bias, the two data points would match exactly. This match is theoretical, and we call such points matching points.

Definition 1

Matching points. Data points represent exactly the same motion pattern from two trajectories.

Matching points come in pairs: there is one data point on each of the two trajectories, and both represent the same moving behavior. The definition of matching points is too strict to be satisfied in practice. Because of sensor performance, working principles, measurement bias, sampling rate, and other factors, data points are generally sampled discretely, so it is difficult to record synchronously the data points that represent the same behavior. We therefore need to set spatial and temporal constraints to select approximately matching points and to measure their similarity. We call such approximately matching points spatial–temporal joint constrained matching points, or spatial–temporal matching points for short.

Since the timestamps of the two compared trajectories usually differ, it is necessary to set a relative start time on each trajectory in order to place temporal constraints on data points. Before giving the definition of the spatial–temporal matching points, we first define the concept of the baseline matching point.

Definition 2

Baseline matching point. A matching point that represents the relative start time.

We need to determine one data point on each of the two trajectories as the baseline matching point, and there are three ways to do so:

Specify a data point as baseline matching point on two trajectories, respectively;

Specify a data point on one trajectory, and set some rules to search one optimal matching point on the other trajectory;

Set some rules to search the optimal matching point on the two trajectories, respectively.

Theoretically, the spatial–temporal matching points representing the same motion pattern from two trajectories should be close enough to each other in space and be close enough relative to the respective baseline matching point in time. The definition of spatial–temporal matching points is given below.

Definition 3

Spatial–temporal matching points. Given a spatial threshold $ϵ$ , a temporal threshold $τ$ , $p_{i, k} \in$ $T_{i}$ is a data point on trajectory $T_{i}$ and $p_{i, d_{i}}$ $\in T_{i}$ is the baseline matching point, $p_{j, ℓ} \in$ $T_{j}$ is a data point on trajectory $T_{j}$ and $p_{j, d_{j}}$ $\in T_{j}$ is the baseline matching point. Then, we call $p_{j, ℓ}$ as a spatial–temporal matching point of $p_{i, k}$ if $p_{j, ℓ}$ satisfies the following conditions, which is denoted as $< p_{i, k}, p_{j, ℓ} >$

$| (t_{i, k} - t_{i, d_{i}}) - (t_{j, ℓ} - t_{j, d_{j}}) | \leq τ$ ;

$dist (l_{i, k}, l_{j, ℓ}) \leq ϵ$ , where $dist (\cdot, \cdot)$ denotes the Euclidean distance between two data points.

If not specified in the following, matching points refers to the spatial–temporal matching points.
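The two conditions of Definition 3 can be sketched as a single predicate. This is an illustrative sketch only: the names `Point` and `is_match` and the sample coordinates are hypothetical, and planar locations are assumed.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    x: float  # planar position (assumed already projected)
    y: float
    t: float  # record time


def is_match(p: Point, base_p: Point, q: Point, base_q: Point,
             eps: float, tau: float) -> bool:
    """Return True if q is a spatial-temporal matching point of p."""
    # Temporal condition: compare times measured relative to each baseline point.
    relative_gap = abs((p.t - base_p.t) - (q.t - base_q.t))
    # Spatial condition: Euclidean distance between the two locations.
    distance = math.hypot(p.x - q.x, p.y - q.y)
    return relative_gap <= tau and distance <= eps


# Hypothetical data; the baseline times are arbitrary illustrative values.
base_1 = Point(10.0, 10.0, 10.0)
base_2 = Point(10.0, 10.0, 71.0)
p = Point(4.0, 4.0, 4.0)
q_near = Point(5.0, 5.0, 63.0)  # relative gap 2, distance sqrt(2)
q_late = Point(5.0, 5.0, 50.0)  # relative gap 15, fails the temporal condition
```

With $ϵ = 5$ and $τ = 4$ , `q_near` is accepted and `q_late` is rejected even though both lie at the same location, which shows that the relative time offset alone can rule out a spatially close point.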

Figure 2 shows an example of baseline matching points and matching points. In Figure 2, $T_{1}$ and $T_{2}$ represent two trajectories with the same moving behavior recorded at different times. The underlined number represents the record time of the data point. The data points $p_{1, 4}$ on $T_{1}$ and $p_{2, 2}$ on $T_{2}$ are the baseline matching points, connected by the thick dotted double-headed arrow in Figure 2. Therefore, the relative start time of $T_{1}$ is $t_{1, 4} = 10$ and that of $T_{2}$ is $t_{2, 2} = 71$ . Given a spatial threshold $ϵ = 5$ and a temporal threshold $τ = 4$ , data points $p_{1, 2}$ and $p_{2, 1}$ are a pair of matching points, since they satisfy $dist (l_{1, 2}, l_{2, 1}) \leq 5$ and $| (t_{1, 2} - t_{1, 4}) - (t_{2, 1} - t_{2, 2}) | \leq 4$ at the same time. No other data point on $T_{2}$ satisfies both conditions, so under the thresholds $ϵ = 5$ and $τ = 4$ , the only matching point of $p_{1, 2}$ on $T_{2}$ is $p_{2, 1}$ . Under the same conditions, $p_{2, 4}$ and $p_{2, 5}$ are matching points of $p_{1, 5}$ on $T_{2}$ , and $p_{2, 8}$ and $p_{2, 9}$ are matching points of $p_{1, 8}$ on $T_{2}$ , as shown with the thin dotted arrows in Figure 2.

Figure 2.

Sketch of baseline matching points and matching points.

For any data point on reference trajectory, the quantity of matching points on the other trajectory may be none, only one, or more than one. How to distinguish these matching points and which one can represent the state of target most approximately needs to be compared quantitatively.

Definition 4

The similarity between matching points. Given a spatial threshold $ϵ$ and a temporal threshold $τ$ , $p_{i, k} \in T_{i}$ is a data point on trajectory $T_{i}$ , $p_{j, ℓ} \in T_{j}$ is a data point on trajectory $T_{j}$ , and $p_{j, ℓ}$ is a matching point of $p_{i, k}$ on trajectory $T_{j}$ . Then the similarity between matching points $< p_{i, k}, p_{j, ℓ} >$ is defined as $f_{ϵ} (p_{i, k}, p_{j, ℓ}) = 1 - \frac{dist (l_{i, k}, l_{j, ℓ})}{ϵ}$ , where $dist (\cdot, \cdot)$ denotes Euclidean distance between two data points.

The range of function $f_{ϵ} (p_{i, k}, p_{j, ℓ})$ is $[0, 1]$ ; that is, the closer the two data points are in space, the larger the function value and the more similar the points. If the locations of two data points are exactly the same, the function value is 1. The parameter $ϵ$ is not only a spatial threshold but also tolerates the spatial bias of sensor measurements. Basically, $f_{ϵ} (p_{i, k}, p_{j, ℓ})$ performs a continuous space quantization (i.e. from 0 to 1), which reflects the closeness between two data points, in contrast to the discrete space quantization in LCSS and EDR (i.e. 0 or 1; Chen et al.³⁰).
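The contrast between the continuous score of Definition 4 and a binary LCSS-style match can be illustrated with a short sketch. The function names are hypothetical, and `binary_match` is only a simplified stand-in for the 0/1 quantization mentioned above, not the full LCSS or EDR model.

```python
import math
from typing import Tuple

Location = Tuple[float, float]


def point_similarity(loc_p: Location, loc_q: Location, eps: float) -> float:
    """f_eps = 1 - dist / eps, defined only for points within eps of each other."""
    d = math.dist(loc_p, loc_q)
    if d > eps:
        raise ValueError("the points are farther apart than the spatial threshold")
    return 1.0 - d / eps


def binary_match(loc_p: Location, loc_q: Location, eps: float) -> int:
    """Discrete 0/1 quantization: 1 if the points are within eps, else 0."""
    return 1 if math.dist(loc_p, loc_q) <= eps else 0
```

For two locations at distance $\sqrt{2}$ with $ϵ = 5$ , `point_similarity` gives about 0.717 while `binary_match` gives 1, so only the continuous score distinguishes a near match from an exact one.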

Definition 5

Optimal matching point. Given a spatial threshold $ϵ$ , a temporal threshold $τ$ , and a data point $p_{i, k} \in T_{i}$ on trajectory $T_{i}$ , let $M_{j} \subseteq T_{j}$ be the set of matching points of $p_{i, k}$ on trajectory $T_{j}$ . We call $p_{j, ℓ} \in M_{j}$ the optimal matching point of $p_{i, k}$ if $p_{j, ℓ}$ satisfies $score_{ϵ, τ} (p_{i, k}, p_{j, ℓ}) = \max \{ f_{ϵ} (p_{i, k}, p_{j, s}) \mid p_{j, s} \in M_{j} \}$ , and $score_{ϵ, τ} (p_{i, k}, p_{j, ℓ})$ is the similarity measure of $< p_{i, k}, p_{j, ℓ} >$ .

The optimal matching point is the matching point that best reflects the motion pattern of the reference data point among all its matching points. If two trajectories represent a similar motion pattern, we should find as many optimal matching points between them as possible. We use the example in Figure 2 to illustrate the selection of optimal matching points, as shown in Figure 3. Given $ϵ = 5$ and $τ = 4$ , all the optimal matching points of $T_{1}$ on $T_{2}$ in the example of Figure 2 are marked with dotted arrow lines in Figure 3. As shown in Figure 3, due to missing data points, measurement error, temporal delay, and other factors, data points from the two trajectories do not match one to one. A data point on the reference trajectory may have no match, such as $p_{1, 3}$ on $T_{1}$ , as there is no data point on $T_{2}$ satisfying both the spatial and temporal conditions. There may also be data points on the compared trajectory that are not the optimal matching point of any data point on the reference trajectory, such as $p_{2, 3}$ , $p_{2, 7}$ , and $p_{2, 8}$ on $T_{2}$ . It is also possible that more than one data point on the reference trajectory shares the same optimal matching point. For example, the optimal matching point of both $p_{1, 1}$ and $p_{1, 2}$ is $p_{2, 1}$ , and the optimal matching point of both $p_{1, 8}$ and $p_{1, 9}$ is $p_{2, 9}$ . Among the unmatched data points, $p_{2, 8}$ on $T_{2}$ differs from the others. As shown in Figure 2, both $p_{2, 8}$ and $p_{2, 9}$ are potential matching points of $p_{1, 8}$ . Because the similarity $score_{5, 4} (p_{1, 8}, p_{2, 8})$ is smaller than $score_{5, 4} (p_{1, 8}, p_{2, 9})$ , the optimal matching point of $p_{1, 8}$ is $p_{2, 9}$ rather than $p_{2, 8}$ . However, if we search data points on $T_{1}$ for the data points on $T_{2}$ with the same spatial and temporal thresholds, $p_{1, 8}$ is the optimal matching point of $p_{2, 8}$ , while $p_{1, 9}$ is the optimal matching point of $p_{2, 9}$ , because the reference data points have changed. The procedure for selecting the optimal matching point is summarized as follows. Step 1: determine whether each candidate data point satisfies both the spatial and temporal conditions; if so, it is a potential matching point. Step 2: calculate the similarity score between the reference data point and every potential matching point. Step 3: choose the potential matching point with the maximum score as the optimal matching point of the reference data point.
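The three-step selection procedure can be sketched as follows. The function name `optimal_match`, the tuple-based return value, and the sample coordinates are assumptions made for illustration; ties are resolved here in favor of the earliest candidate, which the text does not specify.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Point(NamedTuple):
    x: float
    y: float
    t: float


def optimal_match(p: Point, base_p: Point, candidates: List[Point],
                  base_q: Point, eps: float, tau: float
                  ) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the optimal matching point of p, or None."""
    best: Optional[Tuple[int, float]] = None
    for idx, q in enumerate(candidates):
        d = math.hypot(p.x - q.x, p.y - q.y)
        gap = abs((p.t - base_p.t) - (q.t - base_q.t))
        # Step 1: keep only candidates meeting both constraints.
        if d > eps or gap > tau:
            continue
        # Step 2: similarity score of the potential matching point.
        score = 1.0 - d / eps
        # Step 3: remember the candidate with the maximum score.
        if best is None or score > best[1]:
            best = (idx, score)
    return best


# Hypothetical data: the baseline points are the first point of each trajectory.
base_a = Point(0.0, 0.0, 0.0)
p_ref = Point(8.0, 8.0, 40.0)
traj_b = [Point(0.0, 0.0, 100.0), Point(5.0, 8.0, 138.0),
          Point(8.0, 9.0, 141.0), Point(30.0, 30.0, 142.0)]
result = optimal_match(p_ref, base_a, traj_b, traj_b[0], eps=5.0, tau=4.0)
```

In this example two candidates pass Step 1 (indices 1 and 2), and index 2 is selected because it is spatially closer to the reference data point.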

Figure 3.

Sketch of optimal matching points and an illustrative example for computing similarity.

If the sampled data points are sparse, we can increase the spatial and temporal thresholds appropriately according to the sampling rate. As for missing data, if the loss in some areas is not very serious, we can likewise increase the spatial and temporal thresholds appropriately, as in the sparse case. If the loss is very serious, however, we consider that such trajectories are no longer of high value, because they lack sufficient information and are no longer reliable; they are not the focus of this article.

With the definition of optimal matching point and the similarity between matching points, we can define the spatial–temporal similarity between trajectories.

Definition 6

Spatial–temporal similarity between data point and trajectory. Given a spatial threshold $ϵ$ , a temporal threshold $τ$ , $p_{i, k} \in T_{i}$ is a data point on trajectory $T_{i}$ , and $p_{j, ℓ}$ is the optimal matching point of $p_{i, k}$ on trajectory $T_{j}$ . The spatial–temporal similarity between $p_{i, k}$ and $T_{j}$ is defined as $S_{ϵ, τ} (p_{i, k}, T_{j}) = score_{ϵ, τ} (p_{i, k}, p_{j, ℓ})$ .

Definition 7

Spatial–temporal similarity between trajectories. Given a spatial threshold $ϵ$ and a temporal threshold $τ$ , the spatial–temporal similarity between trajectory $T_{i}$ and $T_{j}$ is defined as $TS_{ϵ, τ} (T_{i}, T_{j}) = \frac{1}{| T_{i} |} \times \sum_{p_{i, k} \in T_{i}} S_{ϵ, τ} (p_{i, k}, T_{j})$ , where $| T_{i} |$ represents the number of data points of $T_{i}$ .

Figure 3 illustrates an example of calculating the spatial–temporal similarity between trajectories. Assume that the spatial threshold is $ϵ = 5$ and the temporal threshold is $τ = 4$ . Then $S_{5, 4} (p_{1, 1}, T_{2}) = score_{5, 4} (p_{1, 1}, p_{2, 1}) = 1 - dist (l_{1, 1}, l_{2, 1}) / 5 = 1 - \sqrt{2} / 5 \approx 0.717$ , and the similarity of the other data points can be calculated analogously, that is, $S_{5, 4} (p_{1, 2}, T_{2}) = score_{5, 4} (p_{1, 2}, p_{2, 1}) = 0.5$ , $S_{5, 4} (p_{1, 3}, T_{2}) = 0$ , $S_{5, 4} (p_{1, 4}, T_{2}) = score_{5, 4} (p_{1, 4}, p_{2, 2}) = 1$ , $S_{5, 4} (p_{1, 5}, T_{2}) = score_{5, 4} (p_{1, 5}, p_{2, 4}) \approx 0.717$ , $S_{5, 4} (p_{1, 6}, T_{2}) = score_{5, 4} (p_{1, 6}, p_{2, 5}) = 0.9$ , $S_{5, 4} (p_{1, 7}, T_{2}) = score_{5, 4} (p_{1, 7}, p_{2, 6}) = 0.8$ , $S_{5, 4} (p_{1, 8}, T_{2}) = score_{5, 4} (p_{1, 8}, p_{2, 9}) \approx 0.717$ , and $S_{5, 4} (p_{1, 9}, T_{2}) = score_{5, 4} (p_{1, 9}, p_{2, 9}) = 0.8$ . Consequently, $TS_{5, 4} (T_{1}, T_{2}) = (0.717 + 0.5 + 0 + 1 + 0.717 + 0.9 + 0.8 + 0.717 + 0.8) / 9 \approx 0.683$ .
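Definitions 6 and 7 can be combined into a short sketch, shown here together with a check of the arithmetic in the worked example. The function names and the small time-shifted test trajectories are hypothetical; the coordinates of Figure 3 are not available, so only the per-point scores quoted in the text are reused. A data point without any matching point contributes 0, as for $p_{1, 3}$ in the example.

```python
import math
from typing import List, NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    t: float


def point_to_trajectory_similarity(p: Point, base_p: Point, other: List[Point],
                                   base_q: Point, eps: float, tau: float) -> float:
    """S(p, T_j): score of the optimal matching point, or 0 if there is none."""
    best = 0.0
    for q in other:
        d = math.hypot(p.x - q.x, p.y - q.y)
        gap = abs((p.t - base_p.t) - (q.t - base_q.t))
        if d <= eps and gap <= tau:
            best = max(best, 1.0 - d / eps)
    return best


def trajectory_similarity(traj_i: List[Point], base_i: int,
                          traj_j: List[Point], base_j: int,
                          eps: float, tau: float) -> float:
    """TS(T_i, T_j): mean of S(p, T_j) over all data points p of T_i."""
    if not traj_i:
        return 0.0
    total = sum(
        point_to_trajectory_similarity(p, traj_i[base_i], traj_j,
                                       traj_j[base_j], eps, tau)
        for p in traj_i
    )
    return total / len(traj_i)


# Per-point scores quoted in the worked example for T_1 against T_2.
paper_scores = [0.717, 0.5, 0.0, 1.0, 0.717, 0.9, 0.8, 0.717, 0.8]
paper_ts = sum(paper_scores) / len(paper_scores)

# Hypothetical check: a copy of a trajectory shifted in time is fully similar.
track = [Point(0.0, 0.0, 0.0), Point(1.0, 0.0, 10.0), Point(2.0, 0.0, 20.0)]
shifted = [Point(q.x, q.y, q.t + 100.0) for q in track]
shifted_ts = trajectory_similarity(track, 0, shifted, 0, eps=5.0, tau=4.0)
```

The time-shifted copy illustrates why the baseline matching points matter: without relative times, the two records would never satisfy the temporal constraint.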

When we calculate $TS_{ϵ, τ} (T_{i}, T_{j})$ , we search for matching points on $T_{j}$ for each data point of $T_{i}$ . If we reverse the roles of the two trajectories, the similarity measure will be different because the matching point pairs differ, as shown in Figures 2 and 3. For example, when we calculate $TS_{ϵ, τ} (T_{2}, T_{1})$ , the optimal matching point of $p_{2, 7}$ is $p_{1, 7}$ , and the optimal matching point of $p_{2, 8}$ is $p_{1, 8}$ , yet when calculating $TS_{ϵ, τ} (T_{1}, T_{2})$ , neither $p_{2, 7}$ nor $p_{2, 8}$ is the optimal matching point of any data point on $T_{1}$ . Consequently, $TS_{ϵ, τ} (T_{i}, T_{j})$ and $TS_{ϵ, τ} (T_{j}, T_{i})$ are generally not equal. However, as long as $T_{i}$ and $T_{j}$ are similar enough, either $TS_{ϵ, τ} (T_{i}, T_{j})$ or $TS_{ϵ, τ} (T_{j}, T_{i})$ is sufficient to represent the similarity of $T_{i}$ and $T_{j}$ .
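The asymmetry can be reproduced with a small, purely hypothetical example in which one trajectory is sampled twice as densely as the other. The compact function `ts` is an illustrative restatement of Definition 7 using `(x, y, t)` tuples, with the first point of each trajectory assumed to be the baseline matching point.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t)


def ts(traj_i: List[Sample], traj_j: List[Sample], eps: float, tau: float,
       base_i: int = 0, base_j: int = 0) -> float:
    """TS(T_i, T_j): average best score of each point of T_i against T_j."""
    ti0, tj0 = traj_i[base_i][2], traj_j[base_j][2]
    total = 0.0
    for xi, yi, ti in traj_i:
        scores = [
            1.0 - math.hypot(xi - xj, yi - yj) / eps
            for xj, yj, tj in traj_j
            if math.hypot(xi - xj, yi - yj) <= eps
            and abs((ti - ti0) - (tj - tj0)) <= tau
        ]
        total += max(scores, default=0.0)  # unmatched points contribute 0
    return total / len(traj_i)


# Hypothetical data: same path, but `dense` has twice as many samples.
sparse = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)]
dense = [(0.0, 0.0, 100.0), (5.0, 0.0, 105.0),
         (10.0, 0.0, 110.0), (15.0, 0.0, 115.0)]

forward = ts(sparse, dense, eps=2.0, tau=4.0)
backward = ts(dense, sparse, eps=2.0, tau=4.0)
```

Here every point of `sparse` finds an exact match on `dense`, while half of the points of `dense` find none on `sparse`, so the two directions yield 1.0 and 0.5, respectively.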

Experiment and discussion

Experimental environment

Two real data sets are used for the experiment: an AIS data set serving as the historical motion pattern reference and a maritime radar data set serving as the test data whose identities are to be confirmed. The historical AIS data set was collected by shore-based AIS receiving equipment and covers part of the Yellow Sea between Yantai port and Dalian port, with longitude from 121.2°E to 122.2°E and latitude from 37.7°N to 38.7°N. The historical data set spans 1 to 31 August 2015 and contains 24,886 trajectories from 8444 vessels, with 13,220,910 data points in total. The radar data set was reported by a shore-based sea surveillance radar whose scanning period is about 20 s, and its spatial coverage is the same as that of the AIS data set. The radar data span 13:23 to 14:57 on 1 September 2015, and the time intervals of the radar data range from 20 to 60 s depending on the stability of tracking (a stably tracked target is reported every three scans), so the radar trajectories differ in length. Meanwhile, all the targets in the radar data set have correlated AIS information, which supplies the true vessel type for evaluation.

It is worth noting that for most vessels, the time needed to sail across the experimental area is 3.5–5 h, so the time spans of the historical AIS trajectories and the radar trajectories differ widely. It is therefore essential to cut each AIS trajectory so that its time span is close to that of the radar trajectory. First of all, we determine the baseline points on the trajectories. We specify the latest data point on the radar trajectory as its baseline point, and the baseline point on the AIS trajectory is determined by the maximum similarity measure. Because the baseline matching point provides the reference start time for the other data points, it cannot be determined by time and can only be determined by space. The baseline data point on the historical trajectory must be the data point most similar to the baseline data point on the reference trajectory. If the distance between the baseline data points is larger than the spatial threshold $ϵ$ , we consider that this historical trajectory is not a similar trajectory.
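The baseline selection and trajectory cutting described above might be sketched as follows. The function names are hypothetical, and the cutting rule in `cut_to_radar_span` (keeping AIS points whose time relative to the baseline falls inside the radar trajectory's relative time span, padded by $τ$ ) is an assumption, since the text does not give the exact rule.

```python
import math
from typing import List, NamedTuple, Optional


class Point(NamedTuple):
    x: float
    y: float
    t: float


def find_baseline_index(radar_baseline: Point, ais: List[Point],
                        eps: float) -> Optional[int]:
    """Index of the spatially closest AIS point, or None if it is beyond eps."""
    if not ais:
        return None

    def distance(k: int) -> float:
        return math.hypot(ais[k].x - radar_baseline.x, ais[k].y - radar_baseline.y)

    idx = min(range(len(ais)), key=distance)
    # A baseline farther away than eps means the trajectory is not similar.
    return idx if distance(idx) <= eps else None


def cut_to_radar_span(radar: List[Point], ais: List[Point],
                      ais_base_idx: int, tau: float) -> List[Point]:
    """Assumed rule: keep AIS points inside the radar's relative time span."""
    radar_base = radar[-1]                  # latest radar point is the baseline
    earliest = radar[0].t - radar_base.t    # relative start time, at most 0
    ais_base = ais[ais_base_idx]
    return [q for q in ais if earliest - tau <= q.t - ais_base.t <= tau]


# Hypothetical data: a 40 s radar track and a longer AIS track on a nearby path.
radar = [Point(0.0, 0.0, 1000.0), Point(100.0, 0.0, 1020.0), Point(200.0, 0.0, 1040.0)]
ais = [Point(-300.0 + 50.0 * k, 10.0, 10.0 * k) for k in range(20)]
base_idx = find_baseline_index(radar[-1], ais, eps=1000.0)
segment = cut_to_radar_span(radar, ais, base_idx, tau=10.0)
```

In this example the AIS baseline is the point at x = 200, and the cut segment keeps seven AIS points covering relative times from 50 s before to 10 s after the baseline.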

In the experiment, each trajectory in the test data set is regarded in turn as the reference trajectory, and similar trajectories are searched for in the historical data set according to the trajectory data index, with all target hits and the hits of the same vessel type recorded separately. Finally, the support is derived.
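The mining loop can be sketched as below. This is an illustration under stated assumptions: the function `mine_similar`, the use of `>=` for the similarity threshold, and the reading of support as the ratio of same-type hits to all hits are not spelled out in the text, and the similarity function is passed in so that any model can be plugged in.

```python
from typing import Any, Callable, Iterable, Tuple


def mine_similar(reference: Any,
                 history: Iterable[Tuple[str, Any]],
                 similarity: Callable[[Any, Any], float],
                 threshold: float,
                 true_type: str) -> Tuple[int, int, float]:
    """Return (total hits, correct hits, support) for one reference trajectory."""
    total_hits = 0
    correct_hits = 0
    for vessel_type, trajectory in history:
        # A historical trajectory is a hit if it is similar enough (assumed >=).
        if similarity(reference, trajectory) >= threshold:
            total_hits += 1
            if vessel_type == true_type:
                correct_hits += 1
    # Assumed definition: support = correct hits / total hits.
    support = correct_hits / total_hits if total_hits else 0.0
    return total_hits, correct_hits, support


# Hypothetical stand-in: each "trajectory" is just a precomputed similarity score.
toy_history = [("cargo", 0.9), ("cargo", 0.5), ("tanker", 0.45), ("fishing", 0.1)]
hits, correct, support = mine_similar(
    reference=None,
    history=toy_history,
    similarity=lambda ref, score: score,
    threshold=0.4,
    true_type="cargo",
)
```

With a threshold of 0.4, three of the four toy records are hits and two of them share the reference type, giving a support of 2/3.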

Experiment results

The experimental parameters are set as follows: $ϵ = 1000 m$ , $τ = 120 s$ .

The experimental results are shown in Figures 4 and 5, which show, respectively, the total hits and correct hits under different similarity thresholds and the correct-type support under different similarity thresholds.

From the results shown in Figures 4 and 5, the hits decrease gradually as the trajectory length increases, while the support increases. This illustrates that the STCTS-based mining algorithm can accumulate and highlight the motion characteristics of targets. The motion significance of the target trajectory is strengthened as the number of data points increases. As the similarity threshold increases, the hits of some radar targets drop to 0, which contributes nothing to our work, so we did not adopt a larger similarity threshold in our experiments.

Figure 4.

Total hits and correct hits.

Figure 5.

Support of correct type hits.

Figure 6(a) and (b) show the hits and support along the trajectory length of radar target 7 with a similarity threshold of 0.4.

Figure 6.

Mining results along trajectory length: (a) Hits of radar target 7. (b) Support of radar target 7.

Performance evaluation

In Wang et al.²³ and Toohey and Duckham,²⁴ the existing trajectory similarity measures such as HD, DTW, LCSS, EDR, ERP, SpADe, and wDF, together with the methods based on these basic ideas, are introduced in detail with comprehensive experimental analysis. Based on that analysis and the specific work in this article, we conduct comparative experiments with the modified Hausdorff distance (MHD) and interpolated modified Hausdorff distance (IMHD) in Wei et al.²⁷ and Shao et al.,³⁶ and LCSS in Michail et al.³⁷ MHD and IMHD are two improved approaches based on HD. Two path similarity functions, S1 and S2, based on LCSS were proposed in Michail et al.,³⁷ measuring the similarity of two trajectories by the ratio of the number of points in the common parts to the total number of points in the trajectories.
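For orientation, one widely used formulation of the modified Hausdorff distance takes the larger of the two directed mean nearest-point distances. The sketch below implements that formulation only; the MHD and IMHD variants in the cited works may differ in detail (for example, in interpolation or normalization), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Location = Tuple[float, float]


def directed_mean_min_distance(a: List[Location], b: List[Location]) -> float:
    """Mean, over points of a, of the distance to the nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: List[Location], b: List[Location]) -> float:
    """Larger of the two directed mean nearest-point distances."""
    return max(directed_mean_min_distance(a, b), directed_mean_min_distance(b, a))


# Hypothetical point sets; time stamps play no role in this measure.
path_a = [(0.0, 0.0), (1.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (5.0, 1.0)]
mhd = modified_hausdorff(path_a, path_b)
```

Unlike STCTS, this measure is purely spatial, which is one reason the comparison in this section uses the same cut subtrajectories for every model.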

Different from the experiment in section “Experiment results,” we use the whole radar trajectory as the reference trajectory to mine similar trajectories in the historical AIS trajectories. Because we cut a subtrajectory from the AIS trajectory when computing the similarity between a radar trajectory and an AIS trajectory, we do the same for the other models.

Since the proposed STCTS-based mining algorithm uses a spatial threshold, a temporal threshold, and a similarity threshold, the parameters of the proposed method in the comparative experiment are set the same as in section “Experiment results” to keep the experimental conditions uniform. The spatial threshold in the similar trajectory mining methods based on the MHD, IMHD, and LCSS models is set to the value corresponding to each similarity threshold.

The experiments run on a desktop computer with Windows 7, an Intel Core i5-4570 CPU, and 8 GB of memory; all algorithms are written in C#, and the development environment is Visual Studio 2010. The data set used for the comparative experiment is the same as in section “Experimental environment,” but we use only the whole trajectories in the validation data set as reference trajectories. The run time of the algorithms and the support of the same-type vessels are recorded. To reduce random factors, we repeat the experiment 10 times and take the average as the final result. The run time of the various methods is shown in Table 1. In addition, MHD and IMHD output the distance between trajectories, so the spatial threshold is applied to their output. In LCSS, the spatial threshold is used during the search, and different thresholds lead to different results, so LCSS is run separately for each spatial threshold.

Table 1.

Runtime of different models.

Model	Runtime (s)
STCTS	133.45
MHD	362.10
IMHD	1855.07
LCSS ($ϵ = 600$)	97.88
LCSS ($ϵ = 400$)	88.94

STCTS: spatial and temporal constrained trajectory similarity; MHD: modified Hausdorff distance; IMHD: interpolated modified Hausdorff distance; LCSS: longest common subsequence.

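As shown in Table 1, STCTS and LCSS have the best time performance: STCTS takes 133.45 s, which is roughly 2.7 times faster than MHD (362.10 s) and about 14 times faster than IMHD (1855.07 s). Because the LCSS model does not need to compare temporal attributes, its time cost (88.94–97.88 s) is lower than that of STCTS.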

As shown in Table 2, the proposed STCTS-based mining algorithm obtains the highest support at every threshold (0.8505–0.8861, compared with at most 0.7945 for the other models), which suggests that the STCTS model measures the spatial–temporal motion pattern better. As an example, Figure 7(a)–(d) shows the similar trajectories mined by the different methods for radar target 150.

Table 2.

Hits and supports with different thresholds.

Similarity threshold (spatial threshold)	STCTS total hits	STCTS correct hits	STCTS support	LCSS total hits	LCSS correct hits	LCSS support	MHD total hits	MHD correct hits	MHD support	IMHD total hits	IMHD correct hits	IMHD support
0.1 (900)	4829	4107	0.8505	7640	5969	0.7813	3916	2986	0.7625	3921	2991	0.7628
0.2 (800)	2798	2388	0.8535	6637	5191	0.7821	3340	2519	0.7542	3343	2522	0.7544
0.3 (700)	1725	1485	0.8609	5668	4432	0.7819	2696	2013	0.7467	2700	2015	0.7463
0.4 (600)	1102	958	0.8693	4710	3696	0.7847	2061	1543	0.7487	2067	1548	0.7489
0.5 (500)	648	566	0.8735	3665	2896	0.7902	1413	1036	0.7332	1417	1038	0.7325
0.6 (400)	360	319	0.8861	2599	2065	0.7945	811	600	0.7398	816	602	0.7377

STCTS: spatial and temporal constrained trajectory similarity; LCSS: longest common subsequence; MHD: modified Hausdorff distance; IMHD: interpolated modified Hausdorff distance.
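The Support column of Table 2 is consistent with support being the ratio of correct hits to total hits. The short Python sketch below recomputes it for the first and last thresholds of each model; the function name and data layout are hypothetical, and the ratio definition is inferred from the table values rather than quoted from the text.

```python
def support(total_hits, correct_hits):
    """Share of mined trajectories counted as correct hits."""
    if total_hits == 0:
        return 0.0
    return correct_hits / total_hits


# (total hits, correct hits) copied from Table 2 for thresholds 0.1 and 0.6.
table_2 = {
    "STCTS": [(4829, 4107), (360, 319)],
    "LCSS": [(7640, 5969), (2599, 2065)],
    "MHD": [(3916, 2986), (811, 600)],
    "IMHD": [(3921, 2991), (816, 602)],
}

# Round to four decimals, the precision used in the table.
supports = {
    model: [round(support(total, correct), 4) for total, correct in rows]
    for model, rows in table_2.items()
}
```

Under this reading, a stricter threshold returns fewer trajectories in total, and for STCTS a larger share of them are correct hits.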

Figure 7.

Similar trajectories mining results with different methods: (a) STCTS-based method. (b) LCSS-based method. (c) MHD-based method. (d) IMHD-based method.

According to the comparative experiment results above, we can draw the following conclusions:

Similar trajectory mining can help to identify targets: the support is relatively high (above 0.73) for every similarity model tested;

The proposed STCTS model measures the motion characteristics of a trajectory better than the compared models. The mining algorithm based on it effectively extracts trajectories with similar spatial–temporal motion patterns and obtains a higher support.

Conclusion

Based on the idea of searching for similar matching points, we define spatial and temporal thresholds to constrain the range of approximately matching points, and then choose the matching point with the highest similarity, that is, the optimal matching point, to measure the similarity between two trajectories. Based on the STCTS model, we propose an algorithm that mines similar motion patterns from historical trajectories, and targets with unknown identity are recognized by comparing the current trajectory of interest with historical trajectories. Experimental results on real data sets show that the STCTS-based mining algorithm effectively mines similar motion patterns from a historical data set and provides higher support for target recognition than the compared models.

The proposed algorithm can be applied to day-to-day maritime situation awareness: it can run in the background, searching for historical trajectories similar to the current ones. In addition, several special trajectory motion patterns can be set in advance so that attention is focused on the targets matching those patterns.

Footnotes

Handling Editor: Janos Botzheim

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China (61531020) and by the Special Funds of Taishan Scholars Construction Engineering.

ORCID iD

Lu Sun

References

1. Jacobs, Sullivan JAO. Automatic target recognition using sequences of high resolution radar range-profiles. IEEE T Aero Elec Sys 2000; 36(2): 364–381.

2. Liu, Yuan, et al. Progress in radar automatic target recognition based on high range resolution profile. J Electron Info Technol 2005; 27(8): 1328–1334.

3. Wang, Zhao, et al. Radar target recognition algorithm based on RCS observation sequence-set-valued identification method. J Syst Sci Complex 2016; 29(3): 573–588.

4. Zhang, Guan. Study on radar emitter recognition signal based on rough sets and RBF neural network. In: Proceedings of the international conference on machine learning and cybernetics, Hebei, China, 12–15 July 2009, pp.1225–1230. New York: IEEE.

5. Hung, Lin, Chu. An extended algorithm of similarity measures and its application to radar target recognition based on intuitionistic fuzzy sets. J Test Eval 2015; 43(4): 878–887.

6. Zhao, Zhu, Liu, et al. Ship target identification method based on multi-dimensional signal feature fusion. J Mil Commun Technol 2016; 37(1): 48–5173.

7. Wang, Tian. State-of-the-art of ship detection and recognition in optical remotely sensed imagery. Acta Automat Sin 2011; 37(9): 1029–1039.

8. Su. Research on ship recognition with image processing. Comput Digit Eng 2015; 43(7): 1207–1211.

9. Li. Atomic potential matching: an evolutionary target recognition approach based on edge features. Int J Light Electron Opt 2016; 127(5): 3162–3168.

10. Novak, Owirka, Weaver. Automatic target recognition using enhanced resolution SAR data. IEEE T Aero Elec Sys 1999; 35(1): 157–175.

11. Duman, Cetin. Target detection and classification in SAR images using region covariance and co-difference. In: Proceedings of SPIE—the international society for optical engineering, Orlando, FL, 29 April 2009, pp.73370–73378. New York: SPIE.

12. Tang, Gao. Target classification of ISAR images based on feature space optimisation of local non-negative matrix factorisation. IET Signal Pr 2012; 6(5): 494–502.

13. Zheng. Trajectory data mining: an overview. ACM T Intel Syst Tech 2015; 6(3): 1–41.

14. Feng, Zhu. A survey on trajectory data mining: techniques and applications. IEEE Access 2016; 14: 2056–2067.

15. Yuan, Sun, Zhao, et al. A review of moving object trajectory clustering algorithms. Artif Intell Rev 2016; 47: 123–144.

16. Palma, Bogorny, Kuijpers, et al. A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the 2008 ACM symposium on applied computing, Fortaleza, 16–20 March 2008, pp.863–868. New York: ACM.

17. Liu, Xiao, Ding, et al. Discovery of hot region in trajectory databases. J Softw 2013; 24(8): 1816–1835.

18. Chen, Shen, Zhou. Discovering popular routes from trajectories. In: Proceedings of the 2011 IEEE 27th international conference on data engineering (ICDE), Hannover, 11–16 April 2011, pp.900–911. New York: IEEE.

19. Yin, Yao, et al. Mining frequent spatio-temporal items in trajectory data. Int J Database Theory Appl 2015; 8(4): 149–156.

20. Zhang, Lee. Periodic pattern mining for spatio-temporal trajectories: a survey. In: Proceedings of the 2015 10th international conference on intelligent systems and knowledge engineering (ISKE), Taipei, Taiwan, 24–27 November 2015, pp.306–313. New York: IEEE.

21. Pallotta, Vespe, Bryan. Traffic knowledge discovery from AIS data. In: Proceedings of the 2013 16th international conference on information fusion (FUSION), Istanbul, 9–12 July 2013, pp.1996–2003. New York: IEEE.

22. Zhu, Shao. Trajectory similarity measure based on multiple movement features. Geomat Info Sci Wuhan Univ 2017; 42(12): 1703–1710.

23. Wang, Mueen, Ding, et al. Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Disc 2013; 26(2): 275–309.

24. Toohey, Duckham. Trajectory similarity measures. J Spatial Inf Sci 2015; 7(1): 43–50.

25. Tiakas, Papadopoulos, Nanopoulos, et al. Searching for similar trajectories in spatial networks. J Syst Softw 2009; 82(5): 772–788.

26. Chen. Mining user similarity based on routine activities. Inf Sci 2013; 236(1): 17–32.

27. Wei, Teng, et al. Trajectory classification based on Hausdorff distance and longest common subsequence. J Electron Inf Technol 2013; 35(4): 784–790.

28. Hung, Wen-Chih, Wang-Chien. Clustering and aggregating clues of trajectories for mining trajectory patterns and routes. VLDB J 2015; 24(2): 169–192.

29. Shahir, Glasser, Shahir, et al. Maritime situation analysis framework: vessel interaction classification and anomaly detection. In: Proceedings of the 3rd IEEE international conference on big data, Santa Clara, CA, 29 October–1 November 2015, pp.1279–1289. New York: IEEE.

30. Chen, Özsu, Oria. Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD international conference on management of data, Baltimore, MD, 14–16 June 2005, pp.491–502. New York: ACM.

31. Keogh, Ratanamahatana. Exact indexing of dynamic time warping. Knowl Inf Syst 2005; 7(3): 358–386.

32. Chen, Nascimento, Ooi, et al. SpADe: on shape-based pattern detection in streaming time series. In: Proceedings of the IEEE international conference on data engineering, Istanbul, 15–20 April 2007, pp.786–795. New York: IEEE.

33. Chen, Nascimento. Effective and efficient shape-based pattern detection over streaming time series. IEEE T Knowl Data Eng 2012; 24(2): 265–278.

34. Ding, Trajcevski, Scheuermann. Efficient similarity join of large sets of moving object trajectories. In: Proceedings of the 15th international symposium on temporal representation and reasoning (TIME ’08), Montreal, QC, Canada, 16–18 June 2008, pp.79–87. New York: IEEE.

35. Harati-Mokhtari, Wall, Brooks, et al. Automatic identification system (AIS): data reliability and human error implications. J Navig 2007; 60: 373–389.

36. Shao, Cai. A modified Hausdorff distance based algorithm for 2-dimensional spatial trajectory matching. In: Proceedings of the 2010 5th international conference on computer science & education, Hefei, China, 24–27 August 2010. New York: IEEE.

37. Vlachos, Gunopoulos, Kollios. Discovering similar multidimensional trajectories. In: Proceedings of the international conference on data engineering, San Jose, CA, 26 February–1 March 2002, pp.673–684.