Abstract
As a widely used technique for discovering the developmental trajectory of a specific field of science and technology, main path analysis armed with a global search strategy prefers longer citation paths to shorter ones. Although longer paths may provide more details on the development of a field than shorter ones, the themes of the documents along them may be less coherent. Accordingly, a new measure, named intermediacy, was proposed in the literature for recognising important scientific publications. However, intermediacy is applicable only to citation networks with a single target node and a single source node. To loosen this limitation and to benefit from both main path analysis and intermediacy, this work proposes an alternative approach for discovering developmental trajectories that combines node importance and edge importance via edge and node integrated modes. Extensive experimental results on the weak signals and education fields indicate that similar trajectories can be obtained through these two integrated modes, and that our discovered trajectories encode richer implications than those from main path analysis and intermediacy. In addition, our framework scales well to large citation networks.
Keywords
1. Introduction
As early as 1964, Garfield et al. [1] observed that the historical context of science could be represented by a series of important events in chronological order, and they therefore proposed tracing the sequential connections of key articles in a citation network. Since then, many scholars have been dedicated to singling out the critical documents that played historically important roles in the process of knowledge diffusion [2–4]. However, as the citation network of a focal field becomes larger and more complicated, identifying important nodes becomes non-trivial. Mapping citation networks and highlighting the structural backbone of a developing scientific field is an effective way to describe its developmental trajectory [5], explore the flow of scientific knowledge [5,6] and conduct a literature review [7].
As a representative approach in this direction, main path analysis, pioneered by Hummon and Doreian [2], treats the citation links in a citation network as channels of knowledge diffusion between scientific publications. Citation-based traversal counts [2,8] are used to measure the significance of each link for knowledge flow. A skeleton structure can then be extracted from the citation network of a domain of interest through a forward, backward or two-way search strategy, applied either locally or globally [9]. This skeleton structure is usually referred to as the main paths, through which the important developments in a particular field can be understood.
However, Šubelj et al. [10] found that main path analysis armed with the global search strategy prefers longer citation paths to shorter ones. Although longer paths may provide more details on the development of a field than shorter ones, the themes of the documents along them may be less coherent [11,12]. Accordingly, Šubelj et al. [10] put forward a new measure, named intermediacy, for recognising important scientific publications that connect an older scholarly article and a more recent one in a citation network. The older article and the more recent one are called the target and the source, respectively. Publications with a high intermediacy seem to play a major role in the historical development from the target to the source.
In this respect, intermediacy seems to be very similar to centrality [3,13]. However, unlike centrality, intermediacy is measured relative to a pair of source and target nodes, not relative to the citation network as a whole. In other words, intermediacy is applicable only to citation networks with a single target node and a single source node. If the nodes that are cited but cite no other nodes are viewed as targets, and the nodes that cite other nodes but are cited by none are viewed as sources, a citation network usually contains multiple targets and multiple sources, and Šubelj et al. [10] did not provide any guideline on how to deal with this situation. To loosen this limitation on multiple targets and sources, and to combine the benefits of citation-based traversal counts and intermediacy, this work proposes an alternative approach for discovering developmental trajectories, with the following main contributions.
A novel approach for tracing scientific developments is proposed that combines main path analysis and intermediacy through node and edge integrated modes.
In contrast to the original intermediacy, our approach is applicable to many real-world situations because it transforms a citation network into its standard form.
The developmental trajectories discovered for the fields of weak signals and education show that our framework can serve as an alternative to main path analysis.
The remainder of this article is organised as follows. After main path analysis and intermediacy are briefly reviewed in Section 2, Section 3 puts forward a framework for tracing knowledge development and illustrates it with a running example for ease of understanding. Section 4 describes the datasets on weak signals and education, collected with different search strategies from the Scopus and Web of Science databases, respectively, together with the parameter settings. To determine which edge and node weighting approaches to combine and to gain insights into the relationships between these weighting approaches, Section 5 conducts a correlation analysis. Section 6 then comprehensively compares the developmental trajectories of the weak signals and education fields obtained from main path analysis, intermediacy and our approach. Section 7 concludes this work and discusses its limitations and directions for future studies.
2. Related work
Before delving into more specifics, a discussion of the literature pertinent to main path analysis and intermediacy is in order.
2.1. Main path analysis
Main path analysis, pioneered by Hummon and Doreian [2], focuses on the importance of links. The main paths, consisting of important links, reflect the main knowledge trajectory of a targeted domain. Since a citation network is, in principle, a directed acyclic graph, global or local priority-first search algorithms [14] are usually used to extract these main paths. To measure link importance, the following traversal weighting methods have been proposed in the literature: Node Pair Projection Count (NPPC) [2], Search Path Link Count (SPLC) [2], Search Path Node Pair (SPNP) [2], Search Path Count (SPC) [8] and so on.
Among these weighting methods, the NPPC method considers only the reachability between nodes. The SPLC method counts, over different search paths, the node pairs whose connecting paths pass through a link, with the destination restricted to a sink node. The SPNP method counts such node pairs without restricting the destination. The SPC method measures the number of times a citation link is traversed when all possible citation paths from all sources to all sinks are run through. Because of its high computational complexity, the NPPC method is rarely used in real-world applications. Liu et al. [6] analysed the commonalities and differences among SPNP, SPLC and SPC with a messenger-and-tollway analogy, and concluded that the SPLC is closer to the actual process of knowledge diffusion.
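To make the three path-counting schemes concrete, the Python sketch below computes SPC, SPLC and SPNP weights for every link of a small citation DAG. It assumes the common path-count formulation in which SPC counts source-to-sink paths through a link, SPLC lets every node act as an origin while paths still end at a sink, and SPNP lets every node act as both origin and destination. The function name `traversal_counts` and the toy network are hypothetical illustrations, not the implementation used in this study.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def traversal_counts(edges):
    """Return SPC, SPLC and SPNP weights for every edge (u, v) of a citation DAG.

    Edges point from the cited document u to the citing document v.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)
    # TopologicalSorter takes a mapping node -> predecessors and rejects cycles.
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())

    f_src, f_all = {}, {}  # paths ending at n: starting at a source only / at any node
    for n in order:
        f_src[n] = sum(f_src[p] for p in pred[n]) if pred[n] else 1
        f_all[n] = 1 + sum(f_all[p] for p in pred[n])

    b_sink, b_all = {}, {}  # paths starting at n: ending at a sink only / at any node
    for n in reversed(order):
        b_sink[n] = sum(b_sink[s] for s in succ[n]) if succ[n] else 1
        b_all[n] = 1 + sum(b_all[s] for s in succ[n])

    return {
        (u, v): {
            "SPC": f_src[u] * b_sink[v],   # source-to-sink paths through (u, v)
            "SPLC": f_all[u] * b_sink[v],  # any-node-to-sink paths through (u, v)
            "SPNP": f_all[u] * b_all[v],   # any-node-to-any-node paths through (u, v)
        }
        for u, v in edges
    }


# Hypothetical toy network: sources A and B, sinks E and F.
TOY_EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("D", "F")]
```

On this toy network the link B→D lies on two source-to-sink paths (B→D→E and B→D→F), so its SPC and SPLC weights are both 2, while its SPNP weight of 3 also counts the path B→D that ends at D.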
Intuitively, if these edge weights reflect the amount of knowledge flow and each node produces a piece of knowledge, the total inflow into an intermediate node (i.e. a node with non-zero in-degree and non-zero out-degree) should be less than the total outflow from that node. For any intermediate node, its weighted in-degree (WiD, i.e. the sum of the weights of all incoming arcs) and weighted out-degree (WoD, i.e. the sum of the weights of all outgoing arcs) naturally reflect the total knowledge inflow and outflow, respectively. Accordingly, Kuan [15] re-investigated these three popular traversal counting methods by comparing their WiDs and WoDs. Under the SPC, an intermediate node was found to always have identical WiD and WoD [15], which is consistent with the observation that the SPC obeys Kirchhoff's node law [8]. Under the SPLC, the WiD of an intermediate node is always less than its WoD, whereas under the SPNP its WiD may be less than, greater than or equal to its WoD [15]. Therefore, Kuan [15] argued that the SPNP should not be dismissed so easily.
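Kuan's [15] comparison of WiDs and WoDs can be checked numerically on a small example. The self-contained sketch below assumes the path-count formulation in which SPC counts source-to-sink paths, SPLC lets every node start a path, and SPNP lets every node start or end one; it weights the edges of a hypothetical toy DAG accordingly and returns the (WiD, WoD) pair of every intermediate node. The function name `wid_wod` is illustrative only.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def wid_wod(edges, scheme):
    """(WiD, WoD) of every intermediate node when edges are weighted by SPC, SPLC or SPNP."""
    if scheme not in ("SPC", "SPLC", "SPNP"):
        raise ValueError("scheme must be 'SPC', 'SPLC' or 'SPNP'")
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:  # edges point from cited to citing
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)
    order = list(TopologicalSorter({n: pred[n] for n in nodes}).static_order())

    fwd = {}  # number of counted paths arriving at n
    for n in order:
        if scheme == "SPC":  # paths must start at a source
            fwd[n] = sum(fwd[p] for p in pred[n]) if pred[n] else 1
        else:  # SPLC and SPNP: every node may start a path
            fwd[n] = 1 + sum(fwd[p] for p in pred[n])
    bwd = {}  # number of counted paths leaving n
    for n in reversed(order):
        if scheme == "SPNP":  # every node may end a path
            bwd[n] = 1 + sum(bwd[s] for s in succ[n])
        else:  # SPC and SPLC: paths must end at a sink
            bwd[n] = sum(bwd[s] for s in succ[n]) if succ[n] else 1

    weight = {(u, v): fwd[u] * bwd[v] for u, v in edges}
    return {
        n: (sum(weight[(p, n)] for p in pred[n]), sum(weight[(n, s)] for s in succ[n]))
        for n in sorted(nodes)
        if pred[n] and succ[n]  # intermediate: non-zero in-degree and out-degree
    }


# Hypothetical toy network: sources A and B, sinks E and F.
TOY_EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("D", "F")]
```

On this toy network, SPC gives WiD = WoD = 2 for both intermediate nodes C and D; SPLC gives WiD < WoD ((2, 3) for C and (2, 4) for D); and SPNP gives (4, 3) for C but (3, 4) for D, so under SPNP the inequality can go either way.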
Several variants of main path analysis have also been put forward for different situations. Jiang et al. [16] developed a solution for cycles in a citation network. It is well known that not all references have the same value to a citing publication [17,18]; on the basis of full texts, Hao [19] equipped main path analysis with the ability to identify important citations. Another well-known phenomenon is that the strength of knowledge decays as it flows along a citation chain; to deal with this, Liu and Kuan [20] proposed a decay version of main path analysis. In addition, Liu and Lu [9] integrated multiple variants. As for search strategies, forward search from sources to sinks [2], backward search from sinks to sources [9] and bi-directional search from key routes [9] have been used in the literature.
2.2. Intermediacy
Different from main path analysis, intermediacy [10] emphasises the importance of nodes rather than that of edges. In more detail, given a target node
In this way, the probability that there exists an active target–source path, which passes the node
It is worth mentioning that each edge in the network
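Because intermediacy is defined through randomly activated edges, it lends itself to Monte Carlo estimation, which is the approach this study follows from Šubelj et al. [10]. The Python sketch below is our own minimal reading of that idea, not their code: in each run, every edge is kept independently with probability `p`, and a node is counted when it is reachable from the target and can itself reach the source through active edges, which in a DAG means it lies on an active target–source path (edges point from cited to citing). The names `intermediacy_mc`, `runs` and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def _reachable(start, adj):
    """All nodes reachable from start (start included) by iterative depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def intermediacy_mc(edges, target, source, p=0.5, runs=10_000, seed=0):
    """Monte Carlo estimate of intermediacy in a citation DAG (edges: cited -> citing).

    In each run every edge is active independently with probability p; a node scores
    when it is reachable from the target and can reach the source via active edges.
    """
    rng = random.Random(seed)
    hits = defaultdict(int)
    for _ in range(runs):
        forward, backward = defaultdict(list), defaultdict(list)
        for u, v in edges:
            if rng.random() < p:
                forward[u].append(v)
                backward[v].append(u)
        # empty whenever no active target-to-source path exists in this run
        for n in _reachable(target, forward) & _reachable(source, backward):
            hits[n] += 1
    nodes = {n for edge in edges for n in edge}
    return {n: hits[n] / runs for n in nodes}
```

For a single chain target→A→source with `p = 0.5`, the estimate for A approaches 0.5 × 0.5 = 0.25, since both edges must be active at once.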
3. Research framework and methodology
From Section 2, it can readily be seen that traversal counting approaches, such as SPLC [2], SPNP [2] and SPC [8], measure the importance of each edge, whereas intermediacy [10] measures the importance of each node. To integrate them, two modes can be devised. (1) The intermediacy is converted to an edge weighting counterpart and then combined with a focal traversal counting approach, as shown by the blue lines in Figure 1. (2) A focal traversal counting approach is converted to a node weighting counterpart and then combined with the intermediacy, as shown by the red lines in Figure 1. In this study, the former is referred to as the edge integrated mode and the latter as the node integrated mode. Each phase in Figure 1 is described in detail below. To facilitate understanding, the citation network illustrated in Figure 2(a) is used as a running example throughout this section. In this network, each node denotes a document, and each edge points from the cited document to the citing one, indicating that knowledge in the cited document flows to the citing one.

Research framework for discovering developmental trajectory by integrating main path analysis and intermediacy. Note that the SPC-WiD, SPC-WoD and SPC-WxD are equivalent to each other [15].

A running example of the citation network. (a) An original citation network, (b) the citation network in standard form, (c) the citation network with intermediacy to weight the nodes, (d) the citation network with normalised intermediacy to weight the nodes, (e) the citation network with intermediacy-both to weight the edges, (f) the citation network with SPLC to weight the edges, (g) the citation network with normalised SPLC to weight the edges, (h) the citation network with SPLC-WxD to weight the nodes and (i) the citation network with normalised SPLC-WxD to weight the nodes.
3.1. Transform to standard form
As mentioned in Section 1, intermediacy is applicable only to citation networks with a single target and a single source. To loosen this limitation, we transform a citation network into its standard form [8] by extending the set of nodes with a common target
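A minimal Python sketch of this transformation is given below. It assumes, in line with the orientation used in this paper (edges point from cited to citing), that the original targets are the nodes without incoming edges and the original sources are the nodes without outgoing edges; the labels `T*` and `S*` for the common target and common source are hypothetical placeholders.

```python
def to_standard_form(edges, common_target="T*", common_source="S*"):
    """Add a common target and a common source to a citation DAG (edges: cited -> citing).

    Original targets = nodes with no incoming edge (they cite nothing in the dataset);
    original sources = nodes with no outgoing edge (nothing in the dataset cites them).
    Isolated documents do not appear in an edge list and are therefore ignored.
    """
    nodes = {n for edge in edges for n in edge}
    if common_target in nodes or common_source in nodes:
        raise ValueError("choose labels that do not clash with existing nodes")
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    new_edges = list(edges)
    # the common target feeds every original target
    new_edges += [(common_target, n) for n in sorted(nodes - has_in)]
    # every original source flows into the common source
    new_edges += [(n, common_source) for n in sorted(nodes - has_out)]
    return new_edges
```

Applied to the toy edge list A→C, B→C, B→D, C→E, D→E, D→F, it adds T*→A, T*→B, E→S* and F→S*, after which every original node is an intermediate node and the network has a single target–source pair.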
3.2. Calculate intermediacy and convert to edge weights
By following the Monte Carlo algorithm in the study by Šubelj et al. [10], one can readily calculate the intermediacy for each node in the transformed citation network, as depicted in Figure 2(c). Although the intermediacy value reflects the importance of the corresponding node from the target
To facilitate subsequent integration with a focal traversal counting method, one can convert the intermediacy of each node into the weights of its incoming edges (intermediacy-in for short) or its outgoing edges (intermediacy-out for short). Alternatively, each edge can be weighted by the average of the intermediacies of its head and tail nodes (intermediacy-both for short). It is not difficult to see that the first conversion method assigns the same weight to the incident edges of the source
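The three conversions can be written in a few lines. The sketch below assumes that edges point from the cited node `u` to the citing node `v` and that `inter` maps every node to its already estimated intermediacy; intermediacy-in gives edge (u, v) the intermediacy of the node it enters, intermediacy-out the intermediacy of the node it leaves, and intermediacy-both their mean. The function name is hypothetical.

```python
def intermediacy_to_edges(edges, inter):
    """Edge weights derived from node intermediacy (edges point from cited u to citing v)."""
    return {
        (u, v): {
            "intermediacy-in": inter[v],   # intermediacy of the node the edge enters
            "intermediacy-out": inter[u],  # intermediacy of the node the edge leaves
            "intermediacy-both": (inter[u] + inter[v]) / 2,
        }
        for u, v in edges
    }
```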
3.3. Calculate edge weights and convert to node weights
In the literature, there are many approaches for estimating the importance of each citation link (cf. Subsection 2.1). Here, the SPLC is taken as a representative of traversal counting methods, as shown in Figure 2(f). To directly estimate the importance of each node from these edge weights, Liu et al. [6] proposed that the importance of each node could be measured as the average of incident and outgoing edge weights. For narrative convenience, a suffix ‘-Liu’ is appended to each traversal counting method, such as SPC-Liu, SPLC-Liu and SPNP-Liu in Figure 1.
In addition, each node can also be weighted by following the idea of a focal traversal counting method, for example, SPLC. Kuan [15] developed two counterparts for weighting each node through the WiDs and WoDs of its preceding and succeeding nodes, referred to as SPLC-WiD and SPLC-WoD (cf. Figure 1). A third counterpart assigns the node weight as the average of its SPLC-WiD and SPLC-WoD, which was named SPLC-WxD in the study by Kuan [15] (cf. Figure 1). Thus, three node weighting alternatives can be obtained for each traversal counting method, although Kuan [15] suggested that the SPLC and SPNP should be preferred in real-world applications. Figure 2(h) depicts the citation network with SPLC-WxD used to weight the nodes.
It is noteworthy that, since the SPC follows Kirchhoff's node law [8,15], the three SPC-based node weighting counterparts are equivalent to each other. Therefore, hereinafter, SPC-WxD is used to refer collectively to these three methods.
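The node weighting counterparts can be sketched as follows. This is a simplified illustration under stated assumptions: we read X-WiD and X-WoD as a node's own weighted in-degree and out-degree under traversal count X, X-WxD as their mean, and the '-Liu' variant as the mean weight of all edges attached to the node; the exact definitions in Kuan [15] and Liu et al. [6] may differ in detail. The function name and the SPC weights in the usage example are hypothetical.

```python
from collections import defaultdict


def edge_to_node_weights(edge_w):
    """Node weights derived from edge weights {(u, v): w} (edges: cited -> citing).

    Assumed readings: WiD/WoD = the node's weighted in-/out-degree, WxD = their mean,
    Liu = mean weight over all edges attached to the node.
    """
    wid, wod = defaultdict(float), defaultdict(float)
    attached = defaultdict(list)
    for (u, v), w in edge_w.items():
        wod[u] += w
        wid[v] += w
        attached[u].append(w)
        attached[v].append(w)
    return {
        n: {
            "WiD": wid[n],
            "WoD": wod[n],
            "WxD": (wid[n] + wod[n]) / 2,
            "Liu": sum(ws) / len(ws),
        }
        for n, ws in attached.items()
    }
```

With the SPC weights A→C: 1, B→C: 1, B→D: 2, C→E: 2, D→E: 1 and D→F: 1, the intermediate nodes C and D each obtain WiD = WoD = WxD = 2, reflecting the equivalence noted above; for the target A, which has no incoming edges, the three values differ (0, 1 and 0.5), which is one reason to work on the standard form, where every original node is intermediate.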
3.4. Normalise edge/node weights
Since different edge/node weighting approaches follow different calculation rules (cf. Subsections 3.2 and 3.3), their values are likely to differ in magnitude. Hence, before combining them, the resulting weights should be linearly normalised with equations (2) and (3) to the interval
Here,
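The sketch below assumes that the linear normalisation in equations (2) and (3) takes the common min-max form, which maps the smallest weight to 0 and the largest to 1: w' = (w - min) / (max - min). It applies equally to a dictionary of edge weights or of node weights. Returning zeros when all weights are equal is our own convention to avoid division by zero, and the function name is hypothetical.

```python
def min_max_normalise(weights):
    """Linearly rescale {key: weight} to [0, 1]; keys may be edges or nodes."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:  # all weights equal: our convention is to map them to 0.0
        return {k: 0.0 for k in weights}
    return {k: (w - lo) / (hi - lo) for k, w in weights.items()}
```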
3.5. Combine edge/node weights
From Figure 1, one can see that multiple edge and node weighting approaches are taken into consideration simultaneously in this study. In principle, all of these edge/node weighting approaches could be combined by linear weighting. However, this work aims to discover developmental trajectories by benefiting from both main path analysis and intermediacy. Hence, only the intermediacy-based method is integrated with a traversal counting–based one in this study, with equations (4) and (5)
Here,
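A minimal sketch of the linear combination is given below, assuming that equations (4) and (5) take the convex form w = λ · w_inter + (1 − λ) · w_trav over min-max-normalised weights, where w_inter comes from the intermediacy side and w_trav from the traversal counting side; which term λ multiplies is our assumption. The same function serves the edge integrated mode (keys are edges) and the node integrated mode (keys are nodes); the name `combine` is hypothetical.

```python
def combine(inter_w, trav_w, lam):
    """Convex combination lam * inter_w + (1 - lam) * trav_w of two normalised weightings.

    Both dicts must share the same keys (edges in the edge integrated mode,
    nodes in the node integrated mode).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if inter_w.keys() != trav_w.keys():
        raise ValueError("both weightings must cover the same edges or nodes")
    return {k: lam * inter_w[k] + (1.0 - lam) * trav_w[k] for k in inter_w}
```

Under this assumed form, λ = 1 reduces to the intermediacy-based weighting and λ = 0 to the traversal counting–based one.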
3.6. Priority first search
Once the edge/node weights of a citation network are ready, many priority-first search algorithms [14] can be used, such as forward search from targets to sources [2], backward search from sources to targets [9] and bi-directional search from key routes [5,7,9]. For simplicity, only forward search from targets to sources is used in this work. To better understand the effect of the edge and node integrated modes, several candidate values of
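For the global variant of forward search, the main path is the target-to-source path with the largest total weight, which in a DAG can be found by dynamic programming over a topological order. The sketch below illustrates this for edge weights (a node-weighted version would accumulate node weights instead); the function name and toy weights are hypothetical, and it returns only the single best path rather than a ranked list of top paths.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def global_main_path(edge_w, start, end):
    """Start-to-end path maximising the total edge weight in a DAG (global search)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edge_w:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)
    best, back = {start: 0.0}, {}  # best total weight reaching each node, and its predecessor
    for n in TopologicalSorter({m: pred[m] for m in nodes}).static_order():
        if n not in best:  # n cannot be reached from start
            continue
        for s in succ[n]:
            candidate = best[n] + edge_w[(n, s)]
            if s not in best or candidate > best[s]:
                best[s], back[s] = candidate, n
    if end not in best:
        return None, float("-inf")
    path = [end]
    while path[-1] != start:  # walk the predecessor links back to start
        path.append(back[path[-1]])
    return path[::-1], best[end]
```

With toy weights s→a: 1, s→b: 5, a→t: 10 and b→t: 1, a local (greedy) search would first take the heaviest link s→b and end with a total of 6, whereas this global search returns s→a→t with a total of 11.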
The resulting ranks for the top four main paths obtained by combining edge weights.
The resulting ranks for the top four main paths obtained by combining node weights.

The corresponding ranks with different values for
Table 2 illustrates the resulting ranks for these paths with different values for
By comparing these situations, one can conclude that the intermediacy-based weighting method favours shorter paths, while the traversal counts–based weighting method favours longer paths. This observation is in line with the study by Šubelj et al. [10].
Finally, similar to the study by Verspagen [25], one can merge multiple top main paths to form the developmental trajectory of a domain of interest. If the top four main paths are merged, three different developmental trajectories can be obtained from the citation network in Figure 2(a), as shown in Figure 4, and this result appears to be independent of which integrated mode is used. These three trajectories correspond to different values of the tradeoff hyper-parameter
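Merging top-ranked paths can be illustrated with a brute-force sketch that enumerates every start-to-end path, ranks the paths by total edge weight and takes the union of the links on the top `k` of them. Enumeration is exponential in general, so this is only suitable for toy graphs such as the running example; the names `all_paths` and `merged_trajectory` are hypothetical, and this study itself relies on priority-first search [14] rather than enumeration.

```python
def all_paths(succ, node, end):
    """Yield every path from node to end in a DAG given as {node: [successors]}."""
    if node == end:
        yield [end]
        return
    for nxt in succ.get(node, []):
        for rest in all_paths(succ, nxt, end):
            yield [node] + rest


def merged_trajectory(edge_w, start, end, k=4):
    """Rank all start-to-end paths by total edge weight and merge the links of the top k."""
    succ = {}
    for u, v in edge_w:
        succ.setdefault(u, []).append(v)
    scored = [
        (sum(edge_w[(a, b)] for a, b in zip(path, path[1:])), path)
        for path in all_paths(succ, start, end)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep enumeration order
    top = [path for _, path in scored[:k]]
    merged = {(a, b) for path in top for a, b in zip(path, path[1:])}
    return top, merged
```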

The developmental trajectory of the citation network in Figure 2(a) (a)
4. Datasets and parameter setting
Two bibliographic datasets are used in this study to empirically evaluate our framework (cf. Figure 1). The first dataset consists of only 204 scientific publications on weak signals, collected from the Scopus database with a designed query. The second comprises 7424 scholarly articles from eight journals in the field of education, collected from the Web of Science database. The two datasets were obtained with different strategies (a query-based strategy for the first and a journal-based strategy for the second) and differ in scale. In this way, the robustness and scalability of our framework can be evaluated comprehensively.
4.1. Weak signals
To find the publications related to weak signals research and ensure the quality of the analysis, we collected publications from the Scopus database on 20 August 2020 through the library of Beijing University of Technology. The following search strategy is adopted: TITLE-ABS-KEY (‘weak sign*’ OR ‘horizon scan*’ OR ‘environmental scan*’ OR ‘seed* of change’ OR ‘wild card*’ OR ‘black swan*’ OR ‘early warning sign*’ OR ‘future sign*’ OR ‘emerging sign*’ OR ‘anticipation of the future’ OR ‘strategic surprise’). The publication years span from 1975 to 2020, and the document type is limited to article, review and proceedings paper. The subject areas include computer science; social sciences; business, management and accounting; economics, econometrics and finance; and decision sciences.
This case was selected because we are well acquainted with the topic. To make sure that all publications pertain to weak signals, the retrieved articles were checked carefully one by one by reading their titles and abstracts. During this manual screening, several important articles were found to be missing from the original dataset, and we manually added them, together with their references, to the dataset. Finally, 204 articles are included in this dataset for further analysis. The distribution of the number of articles over years is illustrated in Figure 5. It can be seen from Figure 5 that the number of articles increased slowly before 2012 but rapidly after 2012, which shows that this field has been receiving more and more attention.

Distribution of number of publications over year for weak signals dataset.
4.2. Education
According to the opinions of domain experts and the 2020 Journal Citation Reports, eight journals in the education discipline are chosen, as listed in Table 3. The publications in these journals were collected from the Web of Science database on 26 August 2021 through the library of Peking University. The document types are limited to Article, Article; Early Access, Article; Proceedings Paper, Database Review, Reprint and Review, and the publication years span from 1983 to 2020. In the end, 7424 scholarly articles are obtained, and Figure 6 reports the number of publications in each journal per year. In addition, to construct the citation network, the Digital Object Identifier (DOI) of each cited reference is further cleaned with the method given in the study by Xu et al. [26].
The statistics of eight journals in the education discipline.

The distribution of number of publications over year in the education dataset.
From Figure 6, one can observe an obvious increasing trend in the number of publications for several journals, such as British Journal of Educational Technology and British Educational Research Journal. In general, however, the number of articles published per year in most journals is relatively flat. Laakso [27] found that the average number of articles published yearly by a scholarly journal in the Social Sciences and Humanities is roughly between 16 and 18. As Table 3 shows, most journals are close to this average, the two exceptions being American Educational Research Journal and Harvard Educational Review.
4.3. Parameter setting
Three core parameters are involved in our approach: the probability
As for the number of top main paths
There remain the trade-off hyper-parameters
5. Correlation analysis
Our framework for developmental trajectory discovery involves multiple edge and node weighting approaches (cf. Figure 1). Intuitively, it is preferable to combine two weighting methods with a low correlation, so that the discovered developmental trajectory can encode richer implications, such as the characteristics of the topological structure and the importance of nodes. Therefore, to gain insight into the relationships among these weighting approaches, Spearman's rank correlation coefficients are calculated in this section on the weak signals and education datasets, respectively, as depicted in Figures 7–10. Apart from the weighting approaches in Figure 1, three groups of centrality-based measures, namely degree centrality, closeness centrality and betweenness centrality, are also taken into consideration. In this way, there are 15 edge weighting approaches and 14 node weighting approaches in total.
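For readers who want to reproduce this kind of analysis, Spearman's coefficient between two weighting approaches is the Pearson correlation of their rank vectors over the same edges (or nodes), with tied values receiving the average of the ranks they span. The dependency-free Python sketch below implements exactly that; in practice `scipy.stats.spearmanr` computes the same quantity. The function names and the example vectors in the tests are made up for illustration.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(x) + 1) / 2  # mean of ranks 1..n, unchanged by averaging ties
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sx = sum((a - mean) ** 2 for a in rx) ** 0.5
    sy = sum((b - mean) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("undefined for a constant input")
    return cov / (sx * sy)
```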

The Spearman’s rank correlation coefficients among edge weighting approaches on the weak signals dataset.

The Spearman’s rank correlation coefficients among edge weighting approaches on the education dataset.

The Spearman’s rank correlation coefficients among node weighting approaches on the weak signals dataset.

The Spearman’s rank correlation coefficients among node weighting approaches on the education dataset.
In addition, the average ranks of the edge and node weighting approaches on the weak signals and education datasets are reported in Tables 4 and 5, respectively. To calculate the average rank of each weighting approach, a rank is first assigned to each approach according to the correlation coefficients between this approach and each of the others (cf. each row/column in Figures 7–10). For example, the intermediacy-in, intermediacy-out and intermediacy-both are ranked second, fourth and third in the first column of Figure 7, and the intermediacy is ranked second in the first column of Figure 9. The resulting ranks of each approach are then averaged to obtain its average rank. This average rank can be viewed as an indicator of how much an approach differs from the others: the smaller the average rank, the greater the disagreement with the others.
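The average-rank indicator can be sketched as follows, under our reading of the procedure: in each column of the correlation matrix, the other approaches are ranked by their correlation with that column's approach in ascending order (rank 1 = lowest correlation, the approach itself excluded), and each approach's ranks are then averaged over all columns. Tie handling and the function name are assumptions.

```python
def average_rank(names, corr):
    """Average rank of each approach given a symmetric correlation matrix corr[i][j].

    In every column j the other approaches are ranked by their correlation with
    approach j in ascending order (rank 1 = least correlated); ties keep list order.
    A small average rank therefore means strong disagreement with the others.
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two approaches")
    totals = [0.0] * n
    for j in range(n):
        others = sorted((i for i in range(n) if i != j), key=lambda k: corr[k][j])
        for rank, i in enumerate(others, start=1):
            totals[i] += rank
    return {names[i]: totals[i] / (n - 1) for i in range(n)}
```

With three approaches X, Y and Z whose pairwise correlations are 0.9 (X, Y), 0.1 (X, Z) and 0.2 (Y, Z), Z obtains the smallest average rank (1.0), marking it as the approach that disagrees most with the others.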
The average ranks of edge weighting approaches on the weak signals and education datasets. The best edge weighting approach in terms of the average rank is shown in bold.
SPC: search path count; SPLC: search path link count; SPNP: search path node pair.
The average ranks of node weighting approaches on the weak signals and education datasets. The best node weighting approach in terms of the average rank is shown in bold.
SPC: search path count; SPLC: search path link count; SPNP: search path node pair.
From Figures 7 and 8, several interesting phenomena can be observed. (1) As expected, the SPC, SPLC and SPNP are highly correlated with one another. In our opinion, this should be attributed to the similar traversal counting idea they follow. It is this high correlation that often leads main path analyses based on these traversal counting approaches to produce almost the same results [8,28]. (2) Among the three edge weighting counterparts of each node weighting approach, the both-variant shows a strong correlation with the out-variant and a weak correlation with the in-variant, whereas there seems to be no relationship between the out-variant and the in-variant. Take intermediacy as an example: the correlation coefficients among its edge weighting counterparts on the weak signals dataset are 0.835 (intermediacy-both versus intermediacy-out), 0.327 (intermediacy-both versus intermediacy-in) and −0.094 (intermediacy-out versus intermediacy-in). (3) The traversal counting methods, including SPC, SPLC and SPNP, show higher correlation with the three edge weighting counterparts of betweenness centrality than with those of the other centralities and intermediacy. On the contrary, the edge weighting counterparts of intermediacy show higher correlation with the traversal counting methods and those of degree and closeness centralities.
As for node weighting approaches, several prominent observations can also be summarised from Figures 9 and 10. (1) The WxD variant is highly correlated with the Liu variant, for example, SPC-WxD versus SPC-Liu. (2) The WxD variant shows a strong correlation with the WiD and WoD variants, whereas the WiD variant shows a weak correlation with the WoD variant. For example, the correlation coefficients among the node weighting counterparts of SPNP on the education dataset are 0.747 (SPNP-WxD versus SPNP-WiD), 0.660 (SPNP-WxD versus SPNP-WoD) and 0.280 (SPNP-WiD versus SPNP-WoD). (3) Variants of the same type (WxD, WiD, WoD or Liu) agree strongly with one another across traversal counting methods. For instance, a high correlation can readily be observed among SPC-WxD, SPLC-WxD and SPNP-WxD. Again, we attribute this to the similar idea these methods follow. (4) The intermediacy shows a higher correlation with degree centrality than with closeness centrality, betweenness centrality and all node weighting counterparts. The centralities show similarly weak correlations with all node weighting counterparts.
From the average ranks in Tables 4 and 5, it is easy to see that the intermediacy-in and the intermediacy rank first among the edge and node weighting approaches, respectively, on the weak signals and education datasets. This indicates that the intermediacy follows a very different idea from traversal counting approaches and centralities, which is the main reason why the intermediacy is chosen as one component of our integrated approach for discovering developmental trajectories. Once this component is fixed, the other component can be readily determined from the row or column corresponding to the intermediacy-in in Figures 7 and 8 and to the intermediacy in Figures 9 and 10. As mentioned in the first paragraph of this section, the approach showing the weakest correlation with the intermediacy-in or intermediacy should be preferred. In our case studies, the SPNP is used as the other component in the edge integrated mode on both the weak signals and education datasets, and SPNP-WoD and SPNP-Liu are used as the other component in the node integrated mode on the weak signals and education datasets, respectively.
6. Developmental trajectories
This section presents the developmental trajectories discovered by edge and node weighting approaches on the weak signals and education datasets. These serve as empirical illustrations of the use of our approach.
6.1. Weak signals
From Figures 11 and 12, it is not difficult to see that the developmental trajectories discovered by the SPNP and SPNP-WoD contain the most publications, while those discovered by the intermediacy-in and intermediacy include the fewest. The number of nodes in the trajectories discovered by our approach lies between the two. In our opinion, this phenomenon is related to the following characteristics: main path analysis armed with a global search strategy favours longer paths (the longest path consists of 10 nodes), whereas intermediacy favours shorter ones (the longest path comprises 8 nodes). Therefore, our method can be viewed as a tradeoff between main path analysis and intermediacy. In addition, a comparison of Figures 11(c) and 12(c) shows that similar trajectories can be obtained through the edge and node integrated modes.

The developmental trajectories of the weak signals field discovered by edge weighting approaches: (a) intermediacy-in, (b) SPNP and (c) intermediacy-in + SPNP.

The developmental trajectories of the weak signals field discovered by node weighting approaches: (a) intermediacy, (b) SPNP-WoD and (c) intermediacy + SPNP-WoD.
Another interesting phenomenon is that a divergence–convergence–divergence pattern can be observed in the developmental trajectories discovered by the intermediacy and our approach, whereas only a divergence pattern can be observed in the trajectory discovered by main path analysis. In other words, some pivot articles, such as Hiltunen [29] and Thorleuchter and Van den Poel [30], are highlighted by the intermediacy and our approach. In terms of practical significance, Hiltunen [29] developed a deeper theoretical understanding of weak signals and re-defined them with three characteristics: the signal, the issue and the interpretation. This definition underpins the (semi-)automatic identification of weak signals. Thorleuchter and Van den Poel [30] proposed an idea mining algorithm to recognise relevant textual patterns for solving a given strategic problem, which is very different from the standard filtering algorithms in environmental scanning procedures. This algorithm can obtain patterns with low relevant information content.
In addition, it is noteworthy that the ground-breaking work [31] disappears from the trajectory by the main path analysis and more recent articles are emphasised by the main path analysis. Ansoff [31] introduced the term weak signals, and argued that the chances of strategic surprise can be minimised by coping properly with weak signals. In the initial development of the studies on weak signals, an important role was actually played by Ansoff [31]. Koivisto et al. [32], Lee and Park [33] and Kim et al. [34] discussed three applications of weak signals in the fields of mass transport attacks, ethical issues in artificial intelligence and policy research. These three academic articles did not make much theoretical contribution to the development of weak signals. Surprisingly, a review article by Holopainen and Toivonen [35] appears in the trajectory by the intermediacy. The intermediacy seems to give equal attention to initial and recent stages in terms of edge weights. In summary, our integrated framework can benefit from main path analysis and intermediacy approaches while overcoming their shortcomings.
In what follows, the developmental trajectory of weak signals research obtained through the edge integrated mode (cf. Figure 11(c)) is described in more detail. Following the divergence–convergence–divergence pattern, the development of weak signals research goes roughly through three distinct stages. In the emergence stage, from the concept of weak signals and related theories [31] to environmental scanning [36], the field began to receive extensive attention. In the exploration stage, Hiltunen [29], a key pivot publication, developed a three-dimensional theoretical model of future signals, which supported the quantitative identification of weak signals. After that, studies on weak signals mainly focused on perception frameworks and identification approaches. In the development stage, from 2015 until now, research on weak signals has tended to further improve recognition approaches.
6.1.1. The emergence stage: from 1975 to 2008
This stage involves three articles. Ansoff [31] first discussed the positive implications of weak signals for managing strategic surprises arising from strategic discontinuities. The strategic planning process requires clear and specific information to help organisations estimate potential impacts, develop countermeasures and formulate strategic plans at an early stage. However, such information is difficult to obtain, so managers need to interpret weak signals in the environment and convert them into concrete actions to handle threats and opportunities. Thomas [36] analysed new techniques of environmental scanning that are closely related to business planning and help organisations identify important environmental factors in order to develop broad strategic and operational plans. After that, Mendonça et al. [37] regarded weak signals as early manifestations responding to organisational changes, as shown in Table 6.
6.1.2. The exploration stage: from 2008 to 2015
In the exploration stage, owing to the lack of a formal definition of weak signals, Hiltunen [29] developed a three-dimensional model of future signals from a semiotic perspective and defined the theoretical framework through three dimensions spanning subjective and objective aspects: the signal, the issue and the interpretation. Weak signals were further grouped into two categories: early information and first symptoms. Early information is a type of weak signal with few signals of low visibility. First symptoms are a type of weak signal with many signals that are visible but difficult to interpret. Kuosa [38] presented a future signals sense-making framework (FSSF) to perceive emerging futures. The framework includes three levels (weak signals, drivers and trends) and divides future knowledge into two broad categories, namely disruptive information and linearly evolving information. The former disrupts and shocks existing knowledge, whereas the latter maintains it.
After that, several weak signal recognition approaches were proposed on the basis of the triadic model of future signals [29]. Yoon [39] used text mining to extract keywords and construct a keyword portfolio to identify weak signal topics in online news articles. Thorleuchter and Van den Poel [40] explored text information from the Internet with Latent Semantic Indexing (LSI) to identify and analyse weak signals. Then, Thorleuchter et al. [41] improved LSI to capture changes in weak signals over time. Furthermore, Thorleuchter and Van den Poel [30] proposed a novel idea mining methodology to extract and analyse weak signals (Table 6).
A list of publications appearing in the developmental trajectories of the weak signals field.
6.1.3. The development stage: from 2015 to now
In the development stage, Kim and Lee [46] relied on futuristic data, a future-oriented segment of online data, to attenuate the bias caused by the randomness of data sources. Two characteristics, high rarity and paradigm unrelatedness, were operationalised by Kim and Lee [46] with the Local Outlier Factor (LOF) method. Then, Griol-Barres et al. [42] provided a feasible and comprehensive way of integrating text mining and natural language processing techniques, applying it to different kinds of large document collections to discover weak signals. Cwik et al. [43] proposed a model for enhancing the visibility of weak signals in order to deal with upcoming threats and risks. Apart from text mining and natural language processing techniques, Garcia-Nunes and da Silva [44] recently introduced a semi-automatic solution involving a conceptual system based on ontologies of organisational knowledge. van Veen et al. [45] discussed, from the perspective of managers, several ways to reduce signal loss and obtain more critical signals in the selective process of the perceptual filter, as shown in Table 6.
6.2. Education
In this subsection, we analyse the literature in the education dataset (Table 7). The top main path is taken as the developmental trajectory of the education field, as shown in Figure 13. Surprisingly, each edge weighting approach and its node weighting counterpart (intermediacy-in versus intermediacy, SPNP versus SPNP-Liu) capture exactly the same trajectory. On the weak signals dataset, very similar but not identical trajectories can be observed by comparing Figures 11 and 12. This suggests that, in practical applications, it matters little whether an edge or a node weighting method is used.
A list of publications appearing in the developmental trajectories of the education field.

The developmental trajectories of the education field discovered by node weighting approaches: (a) intermediacy-in, (b) SPNP, (c) intermediacy-in + SPNP; and by edge weighting approaches: (d) intermediacy, (e) SPNP-Liu and (f) intermediacy + SPNP-Liu.
In terms of the number of publications, the trajectory from main path analysis covers the most publications (21 nodes), followed by our approach (20 nodes), while the trajectory from intermediacy involves the fewest (16 nodes). This again empirically confirms the observation by Šubelj et al. [10] that main path analysis armed with the global search strategy prefers longer citation paths to shorter ones. On closer inspection, the trajectory from our approach consists of articles drawn from the trajectories of main path analysis and intermediacy (namely, [47–49,51,53,56,57,59] and [70,73–76,78,79,81,82]) together with three publications that appear in neither (namely, [61,65,69]). Hence, our method can be viewed as a trade-off between main path analysis and intermediacy.
More specifically, the first half of the trajectory discovered by intermediacy [50,52,54,55,58] focuses on the self-improvement of teachers, followed by reflections on the development of citizenship education in different countries [63,64,68]. These publications are covered by neither main path analysis nor our approach. Many articles that appear in our trajectory but are missing from the main path analysis trajectory, such as [61,65,69,70,73,74], offer a critical perspective or propose new approaches and greatly contribute to knowledge dissemination. By contrast, the publications that appear in the main path analysis trajectory but are missing from ours [60,62,66,67,71,72] mainly provide further explanations of existing phenomena. In the following, the developmental trajectory from our approach is described in detail.
Since the 1980s, comparative education has been discussed in many developing countries, including Colombia and Tanzania [48]. However, educational aspirations in this period did not match labour needs. Diversified academic curricula combining prevocational subjects therefore emerged [47], with the aim of making educational policy-making and planning more rational [49]. However, the attempt to extend a general curriculum through vocationalisation was not smooth [53]. To tackle this problem, Adams [51] provided two basic modes of educational planning: the interactive mode and the rational mode. Psacharopoulos [53] suggested how specific plans might be implemented and put forward several substantive issues and detailed suggestions.
From the 1990s onwards, education research was subdivided into several research topics, such as the relationship between culture and class [56], modernity and post-modernity [57], transitologies between historical and future perspectives [59], the impact of globalisation and internationalisation [61], the impact of global interconnectivity and transnationalism [65], international educational transfer [69,73] and the global convergence of educational and cultural worlds [70]. In this period, educational ethnography, a qualitative research method based on observation and informed by postmodernism, was widely explored [74].
With the growing internationalisation of education, international organisations have played an increasingly important role in assessing the quality of education. In recent years, cross-country comparative research, represented by the measurement of academic performance, has developed rapidly. The Programme for International Student Assessment (PISA), the world's premier yardstick for measuring the performance of students from different countries, was an important move towards international comparative education research [76]. PISA came to promote high-performing education reform after Auld and Morris [75] provided a new paradigm to overcome its inherent issues. However, global recognition of the new PISA remained divisive, especially in low- and middle-income countries [79]. In a recent piece, Komatsu and Rappleye [76] critically discussed whether PISA scores lead to economic growth, pointing to flawed statistics. Grey and Morris [78] suggested that the media can serve as an intermediary to promote the development of PISA. In addition, robust discussion of decolonising theories [81] and of diaspora scholarship and theorisation in modern society has been surging across the education field [82].
7. Discussions and conclusion
It is well documented that the historical sequence of science can be discovered by analysing the citation network among scientific publications [1]. As the citation network of a focal field becomes larger and more complicated, it becomes very difficult to highlight the structural backbone of knowledge flows. For this purpose, Hummon and Doreian [2] proposed main path analysis on the basis of citation links, and Batagelj [8] put forward an efficient algorithm for determining the edge weights. Since then, this approach has been widely used to uncover the developmental trajectory of a domain of interest.
Recently, Šubelj et al. [10] found that main path analysis armed with the global search strategy favours longer citation paths over shorter ones. As a result, the documents along the same main path may not be thematically coherent [11,12]. Thereupon, Šubelj et al. [10] proposed a new measure, named intermediacy, for recognising important scientific publications rather than important citation links. However, intermediacy is only applicable to a citation network with one single target node and one single source node, which greatly limits its scope of application. Furthermore, Šubelj et al. [10] offer no guideline for handling networks that violate this condition.
To loosen this limitation and alleviate the undesirable characteristic of conventional main path analysis, this study proposes a novel developmental trajectory method through edge and node integrated modes. In more detail, a citation network is first transformed into its standard form by extending the set of nodes with a common target and a common source and adding the corresponding edges. Then, node importance and link importance are calculated separately and combined through a trade-off hyper-parameter. Finally, the top main paths are extracted with the priority-first search algorithm to form the developmental trajectory. In this way, the trajectory uncovered by our approach benefits simultaneously from important citation links and important nodes. It is worth noting that our integration strategy is also applicable to the variants of main path analysis in Subsection 2.1. To illustrate the feasibility and scalability of our method, the fields of weak signals and education are taken as two case studies.
Several conclusions can be drawn from our case studies. (1) Although our framework involves both edge and node integrated modes, the two modes yield similar developmental trajectories. (2) Intermediacy seems to follow a very different idea from conventional centralities and traversal counting approaches, which makes intermediacy and its edge weighting counterparts an appealing component of our integrated framework. (3) Since our framework combines the two edge/node weighting approaches with the weakest correlation, the discovered trajectory encodes richer implications than those from main path analysis and intermediacy, especially on the education dataset. (4) Our framework expands the scope of application of intermediacy and benefits from citation-based traversal counts and intermediacy simultaneously. Meanwhile, our framework scales well to large citation networks.
However, our work is still subject to limitations. There is no rule of thumb for determining the trade-off hyper-parameter or the number of top main paths. In future work, we will design an indicator based on the network structure of the developmental trajectory, so that these parameters can be tuned according to this indicator or to prior knowledge from users. In addition, a scientific verification of our methodology still needs further investigation in future work.
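For readers unfamiliar with traversal counts, the following Python sketch computes two edge weights commonly associated with Batagelj [8]: search path count (SPC), the number of source-to-sink paths passing through an arc, and search path node pair (SPNP), the number of vertex pairs joined by a path through the arc. It assumes a simple acyclic network whose edges point in the direction of knowledge flow (cited to citing). The function names are our own, and the sketch is an illustration rather than the implementation used in this study.

```python
from graphlib import TopologicalSorter


def _index(nodes, edges):
    """Return predecessor lists, successor lists and a topological order of a DAG."""
    pred = {n: [] for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    # TopologicalSorter expects node -> predecessors; it raises CycleError on cycles.
    order = list(TopologicalSorter(pred).static_order())
    return pred, succ, order


def spc_weights(nodes, edges):
    """SPC: number of source-to-sink paths passing through each arc."""
    pred, succ, order = _index(nodes, edges)
    n_in = {}   # number of paths from any source to x
    for x in order:
        n_in[x] = sum(n_in[p] for p in pred[x]) if pred[x] else 1
    n_out = {}  # number of paths from x to any sink
    for x in reversed(order):
        n_out[x] = sum(n_out[s] for s in succ[x]) if succ[x] else 1
    return {(u, v): n_in[u] * n_out[v] for u, v in edges}


def spnp_weights(nodes, edges):
    """SPNP: number of vertex pairs joined by a path through each arc."""
    pred, succ, order = _index(nodes, edges)
    bit = {n: 1 << i for i, n in enumerate(order)}
    anc = {}    # bitmask of x and every vertex that can reach x
    for x in order:
        mask = bit[x]
        for p in pred[x]:
            mask |= anc[p]
        anc[x] = mask
    desc = {}   # bitmask of x and every vertex reachable from x
    for x in reversed(order):
        mask = bit[x]
        for s in succ[x]:
            mask |= desc[s]
        desc[x] = mask
    return {(u, v): bin(anc[u]).count("1") * bin(desc[v]).count("1") for u, v in edges}
```

For the four arcs a→c, b→c, c→d and c→e, SPC assigns 2 to arc (a, c), since only the paths a-c-d and a-c-e use it, whereas SPNP assigns 3 to arc (c, d), counting the pairs (a, d), (b, d) and (c, d). Counting paths needs only integer sums, while counting pairs needs reachability sets, which the sketch stores as bitmasks.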
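To make the single-source, single-target restriction concrete, the sketch below estimates intermediacy by Monte Carlo sampling. It assumes the definition we understand from Šubelj et al. [10]: the intermediacy of a node v is the probability that v lies on a path from the source s to the target t when every edge is retained independently with probability p. Both s and t must be supplied explicitly, which is why a network with several sources or targets is not directly covered. Sampling is a simplification chosen for illustration and is not the computation procedure of Šubelj et al. [10]; the function names are hypothetical.

```python
import random


def _reach(start, adj, kept):
    """Nodes reachable from start using only arcs contained in `kept`."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if (x, y) in kept and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def intermediacy_mc(edges, s, t, p=0.5, samples=10_000, seed=0):
    """Monte Carlo estimate of Pr[v lies on an s-t path] when each arc survives with prob. p."""
    rng = random.Random(seed)
    fwd, bwd = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)
    hits = {n: 0 for e in edges for n in e}
    for _ in range(samples):
        kept = {e for e in edges if rng.random() < p}
        kept_rev = {(v, u) for u, v in kept}
        # In an acyclic network, v is on a surviving s-t path iff s reaches v and v reaches t.
        on_path = _reach(s, fwd, kept) & _reach(t, bwd, kept_rev)
        for n in on_path:
            hits[n] = hits.get(n, 0) + 1
    return {n: h / samples for n, h in hits.items()}
```

On the chain s→a→t with p = 0.5, both arcs must survive for a to lie on an s-t path, so the estimate for a should approach 0.25 as the number of samples grows.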
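The three steps described above can be sketched in Python as follows. Several pieces are assumptions made only for illustration: the labels `SRC` and `TGT` for the added common source and target; a convex combination of max-normalised edge scores and endpoint-averaged node scores as the trade-off controlled by the hyper-parameter `lam`; and a best-first enumeration of the k heaviest source-to-target paths as one way to realise a priority-first search. Edges are assumed to point in the direction of knowledge flow, the network is assumed acyclic, and the edge and node scores themselves (for example SPNP and intermediacy) are passed in rather than computed here.

```python
import heapq
from graphlib import TopologicalSorter

# Hypothetical labels for the common source and common target added to the network.
SRC, TGT = "__source__", "__target__"


def standard_form(nodes, edges):
    """Link SRC to every node without in-edges and every node without out-edges to TGT."""
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    new_edges = list(edges)
    new_edges += [(SRC, n) for n in nodes if n not in has_in]
    new_edges += [(n, TGT) for n in nodes if n not in has_out]
    return list(nodes) + [SRC, TGT], new_edges


def combine(edges, edge_score, node_score, lam=0.5):
    """Assumed trade-off: lam * edge score + (1 - lam) * mean endpoint node score,
    each normalised by its maximum; missing scores count as 0."""
    e_max = max(edge_score.values(), default=0) or 1.0
    n_max = max(node_score.values(), default=0) or 1.0
    weights = {}
    for u, v in edges:
        e = edge_score.get((u, v), 0.0) / e_max
        n = (node_score.get(u, 0.0) + node_score.get(v, 0.0)) / (2 * n_max)
        weights[(u, v)] = lam * e + (1 - lam) * n
    return weights


def top_k_paths(nodes, edges, weights, k=1):
    """Best-first enumeration of the k heaviest SRC -> TGT paths in a DAG."""
    succ = {n: [] for n in nodes}
    pred = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].add(u)
    order = list(TopologicalSorter(pred).static_order())
    # best[x] = largest total weight of any x -> TGT path; used as an exact bound.
    best = {n: float("-inf") for n in nodes}
    best[TGT] = 0.0
    for x in reversed(order):
        for y in succ[x]:
            best[x] = max(best[x], weights[(x, y)] + best[y])
    # Heap entries: (-(weight so far + best completion), weight so far, partial path).
    heap = [(-best[SRC], 0.0, [SRC])]
    found = []
    while heap and len(found) < k:
        _, gained, path = heapq.heappop(heap)
        x = path[-1]
        if x == TGT:
            found.append((gained, path[1:-1]))  # drop the artificial endpoints
            continue
        for y in succ[x]:
            if best[y] > float("-inf"):
                g = gained + weights[(x, y)]
                heapq.heappush(heap, (-(g + best[y]), g, path + [y]))
    return found
```

Because the completion bound is exact, complete paths leave the heap in non-increasing order of total weight, so the first k popped are the k heaviest. For the arcs a→c, b→c and c→d with edge scores 2, 1 and 3 and `lam = 1`, the two paths returned are a, c, d followed by b, c, d; in this sketch the trajectory would be the union of the nodes on the returned paths.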
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Natural Science Foundation of China under grant numbers 72074014 and 72004012.
