
Abstract

As a widely used technique for discovering the developmental trajectory of a specific field of science and technology, main path analysis armed with a global search strategy prefers longer citation paths to shorter ones. Longer main paths may provide more details on the development of a field than shorter ones, but the themes of the documents along them may be less coherent. For this reason, a measure named intermediacy was proposed in the literature for recognising important scientific publications. However, intermediacy is only applicable to a citation network with one single target node and one single source node. To loosen this limitation and to benefit from both main path analysis and intermediacy, this work proposes an alternative approach for discovering developmental trajectories that combines node importance and edge importance via edge and node integrated modes. Extensive experimental results on the weak signals and education fields indicate that similar trajectories can be obtained through these two integrated modes, and that richer implications are encoded in the discovered trajectories than in those from main path analysis or intermediacy alone. In addition, our framework scales well to a large citation network.

Keywords

Developmental trajectory discovery; education; integrated framework; intermediacy; main path analysis; traversal count; weak signals

1. Introduction

As early as 1964, Garfield et al. [1] observed that the historical context of science could be represented by a series of important events in chronological order, so they proposed to trace the sequential connection of key articles in a citation network. Since then, many scholars have been dedicated to picking out the critical documents that played historically important roles in the process of knowledge diffusion [2–4]. However, as the citation network in a focal field becomes larger and more complicated, it is not trivial to identify important nodes. Mapping citation networks and highlighting the structural backbone of a developing scientific field is an effective way to describe the developmental trajectory [5], explore the flow of scientific knowledge [5,6] and conduct a literature review [7].

As a representative approach in this direction, main path analysis, pioneered by Hummon and Doreian [2], takes the citation links in a citation network as the knowledge diffusion channels between scientific publications. The citation-based traversal counts [2,8] are utilised to measure the resulting knowledge significance. Then, one can obtain a skeleton structure from the citation network of a domain of interest through a forward/backward/two-way local/global search strategy [9]. This skeleton structure is usually referred to as the main paths, through which important developments in a particular field can be understood.

However, Šubelj et al. [10] found that main path analysis armed with the global search strategy preferred longer citation paths to shorter ones. Longer paths may provide more details on the development of a field than shorter ones, but the themes of the documents along them may be less coherent [11,12]. For this reason, a new measure, named intermediacy, was put forward by Šubelj et al. [10] for recognising important scientific publications that connect an older scholarly article and a more recent one in a citation network. The older article and the more recent one are called the target and the source, respectively.¹ The publications with a high intermediacy seem to play a major role in the historical development from the target to the source.

In this way, intermediacy seems to be very similar to centrality [3,13]. However, intermediacy is measured relative to a pair of source and target nodes, not relative to a citation network as a whole. In other words, intermediacy is only applicable to a citation network with one single target node and one single source node. If the nodes that are cited but cite no other nodes are viewed as targets, and the nodes that cite others but are cited by none as sources, a real citation network typically contains several of each, and Šubelj et al. [10] did not provide any guideline on how to deal with this situation. To loosen this limitation and to combine the benefits of citation-based traversal counts and intermediacy, this work proposes an alternative approach for discovering developmental trajectories with the following main contributions.

A novel approach for tracing scientific developments is proposed by combining main path analysis and intermediacy with node and edge integrated modes.

In contrast to the original intermediacy, our approach is applicable to many real-world situations by transforming a citation network to its standard form.

Discovered developmental trajectories of the fields of weak signals and education show that our framework can serve as an alternative to main path analysis.

The remainder of this article is organised as follows. After main path analysis and intermediacy are briefly reviewed in Section 2, a framework for tracing knowledge development is put forward in Section 3, where a running example is also given for ease of understanding. Section 4 describes the datasets pertaining to weak signals and education, which are collected with different search strategies from the Scopus and Web of Science databases, respectively, together with the parameter settings. To determine which edge and node weighting approaches should be combined and to obtain insights into the relationship between these weighting approaches, Section 5 conducts a correlation analysis. Then, Section 6 comprehensively compares the developmental trajectories of the weak signals and education fields obtained from main path analysis, intermediacy and our approach. Section 7 concludes this work with its possible limitations and directions for future studies.

2. Related work

Before delving into more specifics, a discussion of the literature pertinent to main path analysis and intermediacy is in order.

2.1. Main path analysis

Main path analysis, pioneered by Hummon and Doreian [2], pays more attention to the importance of links. The main paths consisting of important links reflect the main knowledge trajectory of a targeted domain. Since a citation network is actually a directed acyclic graph, a global or local priority first search algorithm [14] is usually utilised to extract these main paths. To measure the importance, the following traversal weighting methods are proposed in the literature: Node Pair Projection Count (NPPC) [2], Search Path Link Count (SPLC) [2], Search Path Node Pair (SPNP) [2], Search Path Count (SPC) [8] and so on.

Among these weighting methods, the NPPC method only considers the reachability of different nodes. The SPLC method counts the node pairs passing through a link when considering different search paths, but the destination must be one of the sink nodes. The SPNP approach calculates the number of node pairs passing through any link when considering different search paths. The SPC approach measures the total number of times that a citation link is traversed if one runs through all the possible citation paths from all the sources to all the sinks. Because of its high computational complexity, the NPPC method is rarely used in real-world applications. Liu et al. [6] analysed the commonality and specialty among SPNP, SPLC and SPC with a messenger-and-tollway analogy, and concluded that the SPLC was closer to the actual situation of knowledge diffusion.
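To make the SPC and SPLC counts concrete, the following is a minimal sketch rather than the implementation used in the cited works. It assumes that edges point along the knowledge flow (cited to citing), that sources are the zero in-degree nodes and sinks the zero out-degree nodes, and that SPLC counts paths starting at the tail of a link or at any of its ancestors, which is a common reading in the main path literature. The function names are hypothetical.

```python
from collections import defaultdict

def topological_order(nodes, edges):
    """Kahn's algorithm; edges are (u, v) pairs pointing along the knowledge flow."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    order = [n for n in nodes if indeg[n] == 0]
    for n in order:  # the list grows while it is being iterated
        for w in succ[n]:
            indeg[w] -= 1
            if indeg[w] == 0:
                order.append(w)
    return order

def traversal_counts(nodes, edges):
    """Return (spc, splc) dictionaries keyed by edge, under the assumptions above."""
    order = topological_order(nodes, edges)
    pred, succ = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    from_sources, from_any = {}, {}
    for v in order:
        # paths reaching v from any zero in-degree node
        from_sources[v] = sum(from_sources[u] for u in pred[v]) if pred[v] else 1
        # paths ending at v that start at v itself or at any ancestor of v
        from_any[v] = 1 + sum(from_any[u] for u in pred[v])
    to_sinks = {}
    for v in reversed(order):
        # paths leaving v and ending at any zero out-degree node
        to_sinks[v] = sum(to_sinks[w] for w in succ[v]) if succ[v] else 1
    spc = {(u, v): from_sources[u] * to_sinks[v] for u, v in edges}
    splc = {(u, v): from_any[u] * to_sinks[v] for u, v in edges}
    return spc, splc

nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]
spc, splc = traversal_counts(nodes, edges)
print(spc)
print(splc)
```

In this toy network the two counts agree on the links leaving the sources and differ on the links leaving the intermediate node C, because SPLC also counts the paths that start at C itself.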

Intuitively, if these edge weights reflect the amount of knowledge flow and each node produces a piece of knowledge, the total inflow into an intermediate node (i.e. a node with non-zero in-degree and non-zero out-degree) should be less than the total outflow out of that node. Naturally, for any intermediate node, its weighted in-degree (WiD) (i.e. the sum of all incoming arc weights) and weighted out-degree (WoD) (i.e. the sum of all outgoing arc weights) can reflect the total knowledge inflow and knowledge outflow, respectively. Accordingly, Kuan [15] re-investigated these three popular traversal counting methods by comparing their WiDs and WoDs. An intermediate node under the SPC was found to always have identical WiD and WoD [15], which is in line with the observation that the SPC follows Kirchhoff’s node law [8]. Under the SPLC, the WiD of an intermediate node is always less than its WoD, but its WiD under the SPNP may be less than, greater than or equal to its WoD [15]. Therefore, Kuan [15] argued that the SPNP should not be dismissed so easily.
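The WiD/WoD relations for the SPC and the SPLC can be checked with a short derivation. The notation $f$, $a$ and $b$ below is introduced only for this sketch and is not taken from the cited works; it assumes the usual path-counting definitions of the two methods and applies to an intermediate node $v$.

```latex
% Notation (assumed for this sketch):
%   f(u): number of paths from any source to node u (f = 1 at a source)
%   a(u): number of paths ending at u that start at u or at any ancestor of u
%   b(v): number of paths from node v to any sink (b = 1 at a sink)
% so that SPC(u,v) = f(u) b(v) and SPLC(u,v) = a(u) b(v).
\begin{align*}
f(v) &= \sum_{u \to v} f(u), \qquad
a(v) = 1 + \sum_{u \to v} a(u), \qquad
b(v) = \sum_{v \to w} b(w) \\
\mathrm{WiD}_{\mathrm{SPC}}(v)  &= \sum_{u \to v} f(u)\, b(v) = f(v)\, b(v) \\
\mathrm{WoD}_{\mathrm{SPC}}(v)  &= \sum_{v \to w} f(v)\, b(w) = f(v)\, b(v) \\
\mathrm{WiD}_{\mathrm{SPLC}}(v) &= \sum_{u \to v} a(u)\, b(v) = \bigl(a(v) - 1\bigr)\, b(v) \\
\mathrm{WoD}_{\mathrm{SPLC}}(v) &= \sum_{v \to w} a(v)\, b(w) = a(v)\, b(v)
\end{align*}
```

Under these definitions the SPC inflow and outflow coincide, while the SPLC outflow exceeds the inflow by $b(v) \geq 1$, which matches the behaviour reported by Kuan [15].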

Several variations of main path analysis have also been put forward for different situations. Jiang et al. [16] developed a solution for the cycles in a citation network. It is well known that not all references have the same value to a citing publication [17,18]. On the basis of full texts, Hao [19] armed main path analysis with the ability to identify important citations. Another well-known phenomenon is that the strength of knowledge decays as it flows along a citation chain. To deal with this case, Liu and Kuan [20] proposed a decay version of main path analysis. In addition, Liu and Lu [9] integrated multiple variants. As for the search strategy, forward search from sources to sinks [2], backward search from sinks to sources [9] and bi-directional search from key routes [9] are covered in the literature.

2.2. Intermediacy

Different from main path analysis, intermediacy [10] emphasises the importance of nodes rather than that of edges. In more detail, given a target node $t$ and a source node $s$, a sub-network $G = (V, E)$ with the set of nodes $V$ and the set of edges $E$, connecting the target to the source, is extracted from the citation network of interest. Thus, this sub-network is composed of multiple target–source paths. Before illustrating how to calculate the intermediacy, an active/inactive switch is introduced for each edge. If all edges on a path from $u \in V$ to $v \in V$ are active, the path is active, denoted as $X_{u, v} = 1$. Otherwise, the path is inactive, denoted as $X_{u, v} = 0$. If a node $v \in V$ is located on an active target–source path, the node is active, represented as $X_{s, t} (v) = 1$. Otherwise, the node is inactive, represented as $X_{s, t} (v) = 0$.

In this way, the probability that there exists an active target–source path passing through the node $v \in V$ is $\phi_{v} = \Pr (X_{s, t} (v) = 1) = \Pr (X_{s, v} = 1) \Pr (X_{v, t} = 1)$. This probability is exactly the probability that the node $v \in V$ is active, namely the intermediacy of the node $v$. On closer examination, it is not difficult to see that the key to intermediacy is how to efficiently calculate the probability that there is an active path between two nodes in a network. As a matter of fact, this is known as the network reliability problem, which is NP-hard [21]. In other words, no polynomial-time algorithm is known for it, and none exists unless P = NP. Therefore, a Monte Carlo algorithm was proposed by Šubelj et al. [10] to approximate the intermediacy. Please refer to the study by Šubelj et al. [10] for more details.
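The idea behind a Monte Carlo approximation can be illustrated with the sketch below. It is a naive sampler written for this illustration, not the algorithm of Šubelj et al. [10]; it assumes that every edge is active independently with the same probability `p`, that edges are given as pairs oriented from `s` towards `t`, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def reachable(start, adjacency):
    """Nodes reachable from `start` via the given adjacency lists (start included)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def estimate_intermediacy(nodes, edges, s, t, p, samples=10000, seed=0):
    """Estimate, for every node v, the probability that v lies on an active s-t path."""
    rng = random.Random(seed)
    hits = {v: 0 for v in nodes}
    for _ in range(samples):
        fwd, bwd = defaultdict(list), defaultdict(list)
        for u, v in edges:
            if rng.random() < p:  # the edge is active with probability p
                fwd[u].append(v)
                bwd[v].append(u)
        # v is active iff s reaches v and v reaches t over active edges only
        for v in reachable(s, fwd) & reachable(t, bwd):
            hits[v] += 1
    return {v: hits[v] / samples for v in nodes}

nodes = ["s", "a", "b", "c", "t"]
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "c"), ("c", "t")]
print(estimate_intermediacy(nodes, edges, "s", "t", p=0.5))
```

In this toy network, node `a` sits on a two-edge path and nodes `b` and `c` on a three-edge path, so with `p = 0.5` the exact values are 0.25 and 0.125, respectively, and the estimates should lie close to them.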

It is worth mentioning that each edge in the network $G$ is assigned the same probability $p \in (0, 1)$ of being active, and the intermediacy depends largely on this parameter. However, two intuitively desirable properties of intermediacy, referred to as path addition and path contraction, are independent of this parameter. It is the path contraction property that makes intermediacy tend to favour shorter paths over longer ones. This is a fundamental difference between intermediacy and main path analysis. In addition, the requirement of one single target node and one single source node severely limits the scope of applications of intermediacy, whereas main path analysis does not have this limitation. The complementarity of these two methods inspires this study.

3. Research framework and methodology

From Section 2, it can be readily seen that the traversal counting approaches, such as SPLC [2], SPNP [2] and SPC [8], measure the importance of each edge, whereas intermediacy [10] measures the importance of each node. To integrate them, two modes are possible. (1) The intermediacy is converted to its edge weighting counterpart, and then combined with a focal traversal counting approach, as shown by blue lines in Figure 1. (2) A focal traversal counting approach is converted to its node weighting counterpart, and then combined with the intermediacy, as shown by red lines in Figure 1. The former is referred to as the edge integrated mode, and the latter as the node integrated mode in this study. In what follows, each phase in Figure 1 is described at length. To facilitate understanding of our methodology, the citation network illustrated in Figure 2(a) is taken as a running example in this section. In this citation network, each node denotes a document, and the cited node points to the citing one, indicating that knowledge in the cited document flows to the citing one.

Figure 1.

Research framework for discovering developmental trajectory by integrating main path analysis and intermediacy. Note that the SPC-WiD, SPC-WoD and SPC-WxD are equivalent to each other [15].

Figure 2.

A running example of the citation network. (a) An original citation network, (b) the citation network in standard form, (c) the citation network with intermediacy to weight the nodes, (d) the citation network with normalised intermediacy to weight the nodes, (e) the citation network with intermediacy-both to weight the edges, (f) the citation network with SPLC to weight the edges, (g) the citation network with normalised SPLC to weight the edges, (h) the citation network with SPLC-WxD to weight the nodes and (i) the citation network with normalised SPLC-WxD to weight the nodes.

3.1. Transform to standard form

As mentioned in Section 1, the intermediacy is only applicable to a citation network with one single target and one single source. To loosen this limit, we transform a citation network to its standard form [8] by extending the set of nodes with a common target $t$ and a common source $s$, and by adding the corresponding edges.² In more detail, this article denotes the nodes with zero in-degree but non-zero out-degree as $V^{out}$, and those with zero out-degree but non-zero in-degree as $V^{in}$. The following edges are added to the original citation network: the edges from the target $t$ to each node in $V^{out}$, and those from each node in $V^{in}$ to the source $s$. Thus, one can convert a citation network to its counterpart with a single target node and a single source node. Furthermore, the developmental trajectory discovered from this transformed network is equivalent to that from the original citation network [8]. For the citation network in Figure 2(a), $V^{out} = \{1, 2\}$, $V^{in} = \{17, 18\}$, and the standard form is shown in Figure 2(b).
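The transformation to standard form can be sketched in a few lines. The snippet below is an illustration only: it assumes that edges are given as (cited, citing) pairs, that every node appears in at least one edge, and that the labels `"t"` and `"s"` are not already used by a document. The function name is hypothetical.

```python
def to_standard_form(edges, target="t", source="s"):
    """Add a common target and a common source to a list of (cited, citing) edges."""
    nodes = {n for e in edges for n in e}
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    v_out = sorted(nodes - has_in)   # zero in-degree, non-zero out-degree
    v_in = sorted(nodes - has_out)   # zero out-degree, non-zero in-degree
    extra = [(target, v) for v in v_out] + [(v, source) for v in v_in]
    return list(edges) + extra, v_out, v_in

edges = [(1, 3), (2, 3), (3, 4), (3, 5)]
standard, v_out, v_in = to_standard_form(edges)
print(v_out, v_in)
print(standard)
```

For this small network, nodes 1 and 2 form $V^{out}$ and nodes 4 and 5 form $V^{in}$, so four edges are added in total.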

3.2. Calculate intermediacy and convert to edge weights

By following the Monte Carlo algorithm in the study by Šubelj et al. [10], one can readily calculate the intermediacy for each node in the transformed citation network, as depicted in Figure 2(c). Although the intermediacy value reflects the importance of the corresponding node from the target $t$ to the source $s$, Šubelj et al. [10] claimed that the intermediacy values were most useful from an ordinal perspective. In other words, it is the rank of each node in terms of intermediacy value that matters rather than its absolute value. Therefore, the weight of each node is re-assigned in this study with a harmonic counting scheme [22,23] based on the ranking of intermediacy values. More specifically, let $r_{v} (1 \leq v \leq n)$ denote the rank of the node $v$ in a given citation network with $n$ nodes. The weight of the node $v$ can be formally expressed as follows:

$$\mathrm{interm}(v) = \frac{\frac{1}{r_{v}}}{\sum_{v' = 1}^{n} \frac{1}{r_{v'}}} \tag{1}$$
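Equation (1) can be illustrated with a small sketch. It assumes that rank 1 goes to the node with the highest intermediacy and that ties are broken by input order, which the text does not specify. The function name is hypothetical.

```python
def harmonic_rank_weights(intermediacy):
    """Map {node: intermediacy} to {node: weight} following equation (1)."""
    ranked = sorted(intermediacy, key=intermediacy.get, reverse=True)
    rank = {v: i + 1 for i, v in enumerate(ranked)}  # rank 1 = highest intermediacy
    total = sum(1.0 / r for r in rank.values())
    return {v: (1.0 / rank[v]) / total for v in intermediacy}

weights = harmonic_rank_weights({"a": 0.9, "b": 0.5, "c": 0.1})
print(weights)
```

With three nodes the harmonic sum is $1 + 1/2 + 1/3 = 11/6$, so the weights are $6/11$, $3/11$ and $2/11$ and add up to one.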

To facilitate subsequent integration with a focal traversal counting method, one can convert the intermediacy of each node to the weights of its incoming edges (intermediacy-in for short) or outgoing edges (intermediacy-out for short). Each edge can also be weighted as the average of the intermediacies of its head and tail nodes (intermediacy-both for short). It is not difficult to see that the first conversion method assigns the same weight to all incoming edges of the source $s$, and the second one allocates the same weight to all outgoing edges of the target $t$. In other words, the source $s$ in the first method and the target $t$ in the second method do not actually contribute to the developmental trajectory at all. The intermediacy-both is similar to the way Kuan [15] weighted a node by its incoming and outgoing edges. Figure 2(e) illustrates the citation network with intermediacy-both to weight the edges.
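The three conversions can be sketched as follows. This is one reading of the description above, assuming that edges are (tail, head) pairs along the knowledge flow and that node weights are stored in a dictionary. The function name and the `mode` labels are hypothetical.

```python
def node_to_edge_weights(edges, node_weight, mode="both"):
    """Convert node weights to edge weights (intermediacy-in, -out or -both)."""
    if mode == "in":    # each edge inherits the weight of the node it enters
        return {(u, v): node_weight[v] for u, v in edges}
    if mode == "out":   # each edge inherits the weight of the node it leaves
        return {(u, v): node_weight[u] for u, v in edges}
    if mode == "both":  # average of the two end nodes
        return {(u, v): (node_weight[u] + node_weight[v]) / 2 for u, v in edges}
    raise ValueError(f"unknown mode: {mode}")

edges = [("t", "a"), ("a", "s")]
node_weight = {"t": 1.0, "a": 0.5, "s": 1.0}
for mode in ("in", "out", "both"):
    print(mode, node_to_edge_weights(edges, node_weight, mode))
```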

3.3. Calculate edge weights and convert to node weights

In the literature, there are many approaches for estimating the importance of each citation link (cf. Subsection 2.1). Here, the SPLC is taken as a representative of traversal counting methods, as shown in Figure 2(f). To directly estimate the importance of each node from these edge weights, Liu et al. [6] proposed that the importance of each node could be measured as the average of incident and outgoing edge weights. For narrative convenience, a suffix ‘-Liu’ is appended to each traversal counting method, such as SPC-Liu, SPLC-Liu and SPNP-Liu in Figure 1.

In addition, each node can also be weighted by following the idea of a focal traversal counting method, for example, SPLC. Kuan [15] developed two counterparts for weighting each node through the WiDs and WoDs of its preceding and succeeding nodes, referred to as SPLC-WiD and SPLC-WoD (cf. Figure 1). Another counterpart allocates the node weight by averaging its SPLC-WiD and SPLC-WoD, which was named SPLC-WxD in the study by Kuan [15] (cf. Figure 1). Thus, for each traversal counting method, three node weighting alternatives can be obtained, though Kuan [15] suggested that the SPLC and SPNP should be preferred in real-world applications. Here, Figure 2(h) depicts the citation network with SPLC-WxD to weight the nodes.
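A simple way to picture the WiD, WoD and WxD node weights is sketched below. It assumes the plain definitions given in Section 2.1 (WiD as the sum of incoming edge weights, WoD as the sum of outgoing edge weights) and takes WxD as their average; the exact formulation of Kuan [15] may treat boundary nodes differently. The function name is hypothetical.

```python
from collections import defaultdict

def weighted_degrees(edge_weight):
    """edge_weight: {(tail, head): weight}. Returns {node: (WiD, WoD, WxD)}."""
    wid, wod = defaultdict(float), defaultdict(float)
    for (u, v), w in edge_weight.items():
        wod[u] += w  # weight leaving the tail node
        wid[v] += w  # weight entering the head node
    nodes = set(wid) | set(wod)
    return {n: (wid[n], wod[n], (wid[n] + wod[n]) / 2) for n in nodes}

# Illustrative SPLC-style edge weights for a five-node toy network
splc = {("A", "C"): 2, ("B", "C"): 2, ("C", "D"): 3, ("C", "E"): 3}
print(weighted_degrees(splc)["C"])
```

For the intermediate node C the inflow is 4 and the outflow is 6, so its averaged weight is 5.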

It is noteworthy that since the SPC follows Kirchhoff’s node rule [8,15], three SPC-based counterparts for node weights are actually equivalent to each other. Therefore, hereinafter, the SPC-WxD is chosen to collectively refer to these three methods.

3.4. Normalise edge/node weights

Since different edge/node weighting approaches follow different calculation rules (cf. Subsections 3.2 and 3.3), their values may well be of different magnitudes. Hence, before combining them, the resulting weights should be linearly normalised with equations (2) and (3) to the interval $[\underline{\ell}, \bar{\ell}]$ $(0 < \underline{\ell} < \bar{\ell})$. It is worth mentioning that a non-zero lower bound $\underline{\ell}$ prevents the nodes/edges with the minimum value from being removed from a citation network:

$$\mathrm{interm}'(\cdot) = (\bar{\ell} - \underline{\ell}) \times \frac{\mathrm{interm}(\cdot) - \min_{\mathrm{interm}}}{\max_{\mathrm{interm}} - \min_{\mathrm{interm}}} + \underline{\ell} \tag{2}$$

$$\mathrm{traversal}'(\cdot) = (\bar{\ell} - \underline{\ell}) \times \frac{\mathrm{traversal}(\cdot) - \min_{\mathrm{traversal}}}{\max_{\mathrm{traversal}} - \min_{\mathrm{traversal}}} + \underline{\ell} \tag{3}$$

Here, $\min_{\mathrm{interm}}$ and $\max_{\mathrm{interm}}$ denote the minimum and maximum values of all intermediacy-based node/edge weights, respectively; $\min_{\mathrm{traversal}}$ and $\max_{\mathrm{traversal}}$ are the respective minimum and maximum values of all traversal counting–based node/edge weights. After normalisation with $\underline{\ell} = 0.01$ and $\bar{\ell} = 0.99$, Figure 2(c) can be transformed to Figure 2(d), Figure 2(f) to Figure 2(g) and Figure 2(h) to Figure 2(i).
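The linear normalisation of equations (2) and (3) can be sketched as a single helper, since both equations share the same form. Mapping all-equal weights to the lower bound is an assumption made here to avoid a division by zero; the text does not cover that case. The function name is hypothetical.

```python
def normalise(weights, lower=0.01, upper=0.99):
    """Rescale {key: weight} linearly to the interval [lower, upper]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:  # assumption: identical weights all map to the lower bound
        return {k: lower for k in weights}
    scale = (upper - lower) / (hi - lo)
    return {k: (w - lo) * scale + lower for k, w in weights.items()}

print(normalise({"a": 0, "b": 5, "c": 10}))
```

The minimum is mapped to 0.01, the maximum to 0.99 and the midpoint to 0.5 (up to floating-point rounding), so no weight becomes zero.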

3.5. Combine edge/node weights

From Figure 1, one can see that multiple edge and node weighting approaches are simultaneously taken into consideration in this study. In principle, all edge/node weight calculation approaches could be combined by means of linear weighting. However, this work aims to discover developmental trajectories by benefitting from main path analysis and intermediacy. Hence, only the intermediacy-based method is integrated with the traversal counting–based one in this study, with equations (4) and (5):

$$\mathrm{weight}(e) = \alpha \cdot \mathrm{interm}'(e) + (1 - \alpha) \cdot \mathrm{traversal}'(e) \tag{4}$$

$$\mathrm{weight}(v) = \beta \cdot \mathrm{interm}'(v) + (1 - \beta) \cdot \mathrm{traversal}'(v) \tag{5}$$

Here, $α \in [0, 1]$ and $β \in [0, 1]$ . To highlight the distinction between edge and node integrated modes, two different hyper-parameters $α$ and $β$ are utilised to control the tradeoff between two different weight calculation methods. When $α = 1$ or $β = 1$ , the intermediacy-based weighting approach is in action (cf. equation (2)). The traversal counting–based weighting approach (cf. equation (3)) is delivered if $α = 0$ or $β = 0$ .
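Equations (4) and (5) share the same convex-combination form, so a single helper can illustrate both. The sketch assumes that both inputs are already normalised and keyed by the same edges or nodes. The function and parameter names are hypothetical.

```python
def combine(interm_norm, traversal_norm, tradeoff):
    """Convex combination of two normalised weightings (alpha or beta as `tradeoff`)."""
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")
    return {k: tradeoff * interm_norm[k] + (1 - tradeoff) * traversal_norm[k]
            for k in interm_norm}

interm_norm = {("t", "a"): 0.99, ("a", "s"): 0.01}
traversal_norm = {("t", "a"): 0.01, ("a", "s"): 0.99}
print(combine(interm_norm, traversal_norm, 0.6))
```

Setting the trade-off to 1 returns the intermediacy-based weights unchanged, and setting it to 0 returns the traversal counting–based weights.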

3.6. Priority first search

Once the edge/node weights of a citation network are ready, many priority first search algorithms [14] can be used, such as forward search from targets to sources [2], backward search from sources to targets [9], and bi-directional search from key routes [5,7,9] and so on. For simplicity, only forward search from targets to sources is utilised in this work. To better understand the effect of edge and node integrated modes, several candidate values of $α$ (cf. the first column in Table 1) and $β$ (cf. the first column in Table 2) are considered here. Top main paths in terms of the sum of edge/node weights are reported in Figure 3 after removing the common target $t$ and the common source $s$ . Note that the algorithm in the study by Eppstein [24] is utilised here to extract several top main paths from a weighted directed network.
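Extracting the top main paths by total edge weight can be illustrated on a small network. The sketch below is not Eppstein's algorithm [24]; it swaps in a simpler dynamic programme over a topological order that keeps the `k` heaviest partial paths per node, which is adequate for small acyclic networks. Node labels are assumed to be mutually comparable (for example, all strings), and the function name is hypothetical.

```python
import heapq
from collections import defaultdict

def top_k_paths(edge_weight, start, end, k):
    """Return the k heaviest start-to-end paths of a DAG as (total_weight, path) pairs."""
    succ, indeg = defaultdict(list), defaultdict(int)
    nodes = set()
    for (u, v), w in edge_weight.items():
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    order = [n for n in nodes if indeg[n] == 0]
    for n in order:  # Kahn's algorithm; the list grows while it is iterated
        for v, _ in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    best = defaultdict(list)  # node -> up to k best (weight, path) pairs
    best[start] = [(0.0, (start,))]
    for u in order:
        for v, w in succ[u]:
            candidates = best[v] + [(s + w, p + (v,)) for s, p in best[u]]
            best[v] = heapq.nlargest(k, candidates)
    return best[end]

edge_weight = {("t", "a"): 1.0, ("a", "s"): 1.0, ("t", "b"): 0.5,
               ("b", "c"): 0.5, ("c", "s"): 0.5, ("a", "c"): 0.2}
for total, path in top_k_paths(edge_weight, "t", "s", 2):
    print(round(total, 2), path)
```

For node weights, the same routine can be reused after moving each node's weight onto its incoming edges, which is again an assumption of this sketch rather than a step described in the text.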

Table 1.

The resulting ranks for top four main paths by combining edge weights (–: the path is not among the top four).

$α$	(a)	(b)	(c)	(d)	(e)	(f)
0.00	3	2	1	–	4	–
(0.00,0.19]	3	2	1	–	4	–
(0.19,0.29]	4	2	1	–	3	–
(0.29,0.61]	4	3	1	–	2	–
(0.61,0.81]	–	3	1	4	2	–
(0.81,0.83]	–	–	1	3	2	4
(0.83,1.00]	–	–	2	4	1	3

Table 2.

The resulting ranks for top four main paths by combining node weights (–: the path is not among the top four).

$β$	(a)	(b)	(c)	(d)	(e)	(f)
0.00	4	2	1	–	3	–
(0.00,0.19]	4	2	1	–	3	–
(0.19,0.52]	4	3	1	–	2	–
(0.52,0.79]	–	3	1	4	2	–
(0.79,0.81]	–	4	1	3	2	–
(0.81,0.84]	–	–	1	3	2	4
(0.84,1.00]	–	–	2	4	1	3

Figure 3.

Top main paths sorted by the sum of edge weights. Panels (a), (b), (c), (d), (e) and (f) correspond to the main-path columns in Tables 1 and 2.

The corresponding ranks with different values for $α$ are reported in Table 1. When $α = 0.00$, only the SPLC method is at work, and the top four paths rank as follows: (c), (b), (a) and (e). At the other extreme (namely, $α = 1.00$), only the intermediacy-both method is at play, and the top four paths rank as follows: (e), (c), (f) and (d). As $α$ increases from 0.00 to 0.61, the rank of the longest main path (a) drops from third to fourth, and a shorter main path (e) rises from fourth to second. For $α \in (0.61, 0.81]$, the main path (a) disappears from the top four and another main path (d) enters in fourth place; (d) is promoted to third once $α$ exceeds 0.81. For $α \in (0.83, 1.00]$, the main paths (e) and (f) exchange positions with the main paths (c) and (d), respectively.

Table 2 reports the resulting ranks for these paths with different values for $β$, and similar phenomena can be observed. When $β = 0.00$, only the SPLC-WxD approach is in action, and the top four paths rank in the following order: (c), (b), (e) and (a). At the other extreme (i.e. $β = 1.00$), only the intermediacy is at work, and the top four paths rank as follows: (e), (c), (f) and (d). As the hyper-parameter $β$ increases, the longest main path (a) gradually falls out of the top four, and two shorter main paths (d) and (f) enter the top four. In addition, the main path (e) is promoted progressively from third to first.

By comparing these situations, one can conclude that the intermediacy-based weighting method favours shorter paths, while the traversal counts–based weighting method favours longer paths. This observation is in line with the study by Šubelj et al. [10].

In the end, similar to the study by Verspagen [25], one can merge multiple top main paths to form a developmental trajectory of a domain of interest. If the top four main paths are merged, three different developmental trajectories can be obtained from the citation network in Figure 2(a), as shown in Figure 4. The result does not appear to depend on which integrated mode is utilised. These three trajectories correspond to different values of the trade-off hyper-parameters $α$ and $β$. From Figure 4, two interesting phenomena can be observed. (1) Several main paths, such as (c) and (e) in Figure 3, are shared by these three trajectories, and a longer path is preferred when $α \leq 0.61$ or $β \leq 0.52$ (cf. left branches of each panel in Figure 4). (2) If $α$ or $β$ assumes a value in the interval (0.0, 1.0), the discovered developmental trajectory simultaneously reflects the characteristics of the topological structure and the importance of nodes. However, the trajectories seem to be insensitive to these hyper-parameters.

Figure 4.

The developmental trajectory of the citation network in Figure 2(a): (a) $α \in [0.00, 0.61]$ or $β \in [0.00, 0.52]$, (b) $α \in (0.61, 0.81]$ or $β \in (0.52, 0.81]$ and (c) $α \in (0.81, 1.00]$ or $β \in (0.81, 1.00]$.

4. Datasets and parameter setting

Two bibliographic datasets are utilised in this study to empirically evaluate our framework (cf. Figure 1). The first dataset only consists of 204 scientific publications about weak signals, which is collected according to a designed query from the Scopus database. We also collect 7424 scholarly articles in eight journals in the field of education from the Web of Science database. These two datasets are obtained by following different strategies (query-based strategy for the first dataset and journal-based strategy for the second dataset), and different scales of documents are involved in these two datasets. In this way, the robustness and scalability of our framework can be evaluated comprehensively.

4.1. Weak signals

To find the publications related to weak signals research and ensure the quality of the analysis, we collected publications from the Scopus database on 20 August 2020 from the library of Beijing University of Technology. The following search strategy is adopted here: TITLE-ABS-KEY (‘weak sign*’ OR ‘horizon scan*’ OR ‘environmental scan*’ OR ‘seed* of change’ OR ‘wild card*’ OR ‘black swan*’ OR ‘early warning sign*’ OR ‘future sign*’ OR ‘emerging sign*’ OR ‘anticipation of the future’ OR ‘strategic surprise’). The publication year spans from 1975 to 2020, and the document type is limited to article, review, and proceedings paper. The research areas include computer science; social sciences; business, management and accounting; economics, econometrics and finance; and decision sciences.

This case is selected because we are well acquainted with the topic. To make sure that all publications pertain to weak signals, the scholarly articles in the retrieved results are checked carefully one by one by reading the corresponding titles and abstracts. In the process of manual screening and inspection, several important articles are found to be missing from our original dataset. We manually add them with their references to the dataset. Finally, 204 articles are included in this dataset for further analysis. The distribution of the number of scholarly articles over the years is illustrated in Figure 5. It can be seen from Figure 5 that the number of articles increased slowly before 2012 and rapidly thereafter. This shows that this field has been receiving more and more attention.

Figure 5.

Distribution of number of publications over year for weak signals dataset.

4.2. Education

According to the opinions of domain experts and 2020 Journal Citation Report, eight journals in the education discipline are chosen, as illustrated in Table 3. The publications in these journals were collected from the Web of Science database on 26 August 2021 from the library of Peking University. The document type is limited to Article, Article; Early Access, Article; Proceedings Paper, Database Review, Reprint, and Review, and publication year spans from 1983 to 2020. In the end, the number of scholarly articles is 7424, and Figure 6 reports the number distribution of publications in each journal over years. In addition, to construct the citation network, the Digital Object Identifier (DOI) of each cited reference is further cleaned with the method given in the study by Xu et al. [26].

Table 3.

The statistics of eight journals in the education discipline.

Journal	ISSN	Impact factor	Year range	No. of publications	Average no. of articles per year
Review of Educational Research	0034-6543	12.565	1983–2020	706	18.579
British Journal of Educational Technology	0007-1013	4.929	1983–2020	1756	46.211
American Educational Research Journal	0002-8312	4.811	1983–2020	367	9.658
Harvard Educational Review	0017-8055	2.935	1983–2020	513	13.500
British Educational Research Journal	0141-1926	2.752	2000–2020	888	42.286
Comparative Education	0305-0068	2.453	1983–2020	767	20.184
Oxford Review of Education	0305-4985	2.055	1983–2020	898	23.632
Comparative Education Review	0010-4086	1.896	1983–2020	715	18.816

Figure 6.

The distribution of number of publications over year in the education dataset.

From Figure 6, one can observe an obvious trend of increase in terms of the number of publications for several journals, such as British Journal of Educational Technology and British Educational Research Journal. However, in general, the trend in the number of articles published in most journals is relatively flat. Laakso [27] found that the average number of articles published yearly in each scholarly journal in the Social Sciences and Humanities is roughly between 16 and 18 articles. From Table 3, the majority of journals are close to this average number of articles per year. Two exceptions are American Educational Research Journal and Harvard Educational Review.

4.3. Parameter setting

Three core parameters are involved in our approach: the probability $p$ that an edge is active, the number of top main paths $k$, and the trade-off hyper-parameters $α$ and $β$. Following the suggestion by Šubelj et al. [10], this article takes $p = n / (2 m)$, where $n$ and $m$ denote, respectively, the number of nodes and edges in a citation network. In our case, we have $p \approx 0.11$ and $p \approx 0.18$ for the weak signals dataset and the education dataset, respectively.

As for the number of top main paths $k$, there is no rule of thumb so far. In our opinion, too large a value will result in an overly complicated developmental trajectory, which prevents the prominent publications from being highlighted. At the other extreme, too small a value will lead to an overly simple developmental trajectory, from which many prominent publications are missing. For the weak signals dataset, the number of top main paths is varied from 5 to 20 with a unit step size, and an exploratory analysis is then conducted. When the number of top main paths is fixed to 7 and 6 for the edge and node integrated modes, respectively, our expected trajectory including critical publications emerges. For the education dataset, only the top one main path is considered in this study, merely to demonstrate the difference between intermediacy and main path analysis and the scalability of our approach.

There remain the trade-off hyper-parameters $α$ and $β$ to be discussed. According to our observations, developmental trajectories of a focal domain are not actually insensitive to these two hyper-parameters (cf. Section 4). Therefore, after trial and error, this work fixes these two hyper-parameters to 0.6 and 0.8 in the weak signals and education datasets, respectively.

5. Correlation analysis

Our framework for developmental trajectory discovery involves multiple edge and node weighting approaches (cf. Figure 1). Intuitively, it is better to combine two weighting methods with a low correlation, so that the discovered developmental trajectory can encode richer implications, such as the characteristics of the topological structure and the importance of nodes. Therefore, to gain insight into the relationships among these weighting approaches, Spearman’s rank correlation coefficients are calculated in this section on the weak signals and education datasets, as depicted in Figures 7–10. Apart from the weighting approaches in Figure 1, three groups of centrality-based measures, namely degree centrality, closeness centrality and betweenness centrality, are also taken into consideration. In this way, there are 15 edge weighting approaches and 14 node weighting approaches in total.
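For readers unfamiliar with the statistic, the following sketch computes Spearman’s rank correlation between two weight vectors in plain Python. It uses the standard definition (the Pearson correlation of the rank vectors, with tied values sharing their mean rank). The function names and the toy weights are hypothetical and are not taken from either dataset.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one vector is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical weights of the same five edges under two weighting approaches.
approach_a = [1, 2, 3, 4, 6]
approach_b = [0.9, 0.1, 0.5, 0.3, 0.2]
print(round(spearman(approach_a, approach_b), 3))  # -0.4
```

Applying such a function to every pair of weighting approaches yields a symmetric matrix of the kind shown in Figures 7–10.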

Figure 7.

The Spearman’s rank correlation coefficients among edge weighting approaches on the weak signals dataset.

Figure 8.

The Spearman’s rank correlation coefficients among edge weighting approaches on the education dataset.

Figure 9.

The Spearman’s rank correlation coefficients among node weighting approaches on the weak signals dataset.

Figure 10.

The Spearman’s rank correlation coefficients among node weighting approaches on the education dataset.

In addition, the average ranks of the edge and node weighting approaches on the weak signals and education datasets are reported in Tables 4 and 5, respectively. To calculate the average rank of a weighting approach, a rank is first assigned to it in each row/column of Figures 7–10 according to its correlation coefficient with the approach heading that row/column. For example, the intermediacy-in, intermediacy-out and intermediacy-both are ranked second, fourth and third in the first column of Figure 7, and the intermediacy is ranked second in the first column of Figure 9. The resulting ranks of each approach are then averaged to obtain its average rank. This average rank can be viewed as an indicator of how much an approach differs from the others: the smaller the average rank, the greater the disagreement with the others.
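The ranking procedure can be sketched as follows, under two assumptions that the text does not state explicitly: within each column the approach itself is excluded, and rank 1 goes to the approach with the lowest (signed) correlation. This reading appears consistent with Tables 4 and 5, whose entries are multiples of 1/14 and 1/13, respectively. The function name and the toy matrix are hypothetical.

```python
def average_rank_per_approach(names, corr):
    """corr[i][j] is the correlation between approaches i and j (symmetric)."""
    n = len(names)
    totals = {name: 0.0 for name in names}
    for j in range(n):  # one column per reference approach
        others = [i for i in range(n) if i != j]
        # Assumption: rank 1 = lowest correlation with approach j; ties keep input order.
        others.sort(key=lambda i: corr[i][j])
        for rank, i in enumerate(others, start=1):
            totals[names[i]] += rank
    # Each approach is ranked once in every column except its own.
    return {name: totals[name] / (n - 1) for name in names}


# Hypothetical 3 x 3 correlation matrix, for illustration only.
names = ["A", "B", "C"]
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(average_rank_per_approach(names, corr))
# {'A': 1.5, 'B': 2.0, 'C': 1.0}
```

In this toy example, approach C obtains the smallest average rank because it correlates least with the other two.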

Table 4.

The average ranks of edge weighting approaches on the weak signals and education datasets. The best edge weighting approach in terms of the average rank, i.e. the one with the smallest value, is intermediacy-in on both datasets.

Approach	Weak signals	Education
SPC	8.500	7.714
SPLC	8.286	7.500
SPNP	8.143	7.500
Intermediacy-out	7.071	4.500
Intermediacy-in	5.357	3.786
Intermediacy-both	7.500	5.857
Degree-out	6.000	7.214
Degree-in	8.071	7.786
Degree-both	8.857	10.786
Betweenness-out	6.929	6.571
Betweenness-in	5.929	6.071
Betweenness-both	8.571	10.214
Closeness-out	6.714	8.571
Closeness-in	7.357	8.571
Closeness-both	9.214	9.857

SPC: search path count; SPLC: search path link count; SPNP: search path node pair.

Table 5.

The average ranks of node weighting approaches on the weak signals and education datasets. The best node weighting approach in terms of the average rank, i.e. the one with the smallest value, is intermediacy on both datasets.

Approach	Weak signals	Education
SPC-WoD	9.615	9.923
SPC-Liu	7.923	7.538
SPLC-WiD	5.846	6.385
SPLC-WoD	4.538	5.923
SPLC-WxD	10.308	10.462
SPLC-Liu	8.692	8.231
SPNP-WiD	6.615	7.385
SPNP-WoD	3.769	5.308
SPNP-WxD	10.538	10.462
SPNP-Liu	8.538	8.308
Intermediacy	1.846	1.000
Degree	8.077	6.308
Closeness	7.538	5.077
Betweenness	4.154	5.692

SPC: search path count; SPLC: search path link count; SPNP: search path node pair.

From Figures 7 and 8, several interesting phenomena can be observed. (1) As expected, the SPC, SPLC and SPNP are highly correlated with one another. In our opinion, this should be attributed to the similar traversal counting idea that they follow. It is this high correlation that leads main path analysis based on these traversal counting approaches to often produce almost the same results [8,28]. (2) As for the three edge weighting counterparts of each node weighting approach, the both-variant shows a strong correlation with the out-variant and a weak correlation with the in-variant, while there seems to be no relationship between the out-variant and the in-variant. Take intermediacy as an example. The correlation coefficients among its edge weighting counterparts on the weak signals dataset are 0.835 (intermediacy-both versus intermediacy-out), 0.327 (intermediacy-both versus intermediacy-in) and −0.094 (intermediacy-out versus intermediacy-in). (3) The traversal counting methods, including SPC, SPLC and SPNP, show a higher correlation with the three edge weighting counterparts of betweenness centrality than with those of the other centralities and intermediacy. By contrast, the edge weighting counterparts of intermediacy show higher correlation with the traversal counting methods and those of degree and closeness centralities.

As for node weighting approaches, several prominent observations can also be summarised from Figures 9 and 10. (1) The WxD variant is highly correlated with the Liu variant, such as SPC-WxD versus SPC-Liu. (2) The WxD variant shows a strong correlation with the WiD and WoD variants, and the WiD variant shows a weak correlation with the WoD variant. For example, the correlation coefficients among the node weighting counterparts of SPNP on the education dataset are 0.747 (SPNP-WxD versus SPNP-WiD), 0.660 (SPNP-WxD versus SPNP-WoD) and 0.280 (SPNP-WiD versus SPNP-WoD). (3) For each of the WxD, WiD, WoD and Liu variants, the versions built on different traversal counts agree strongly with one another. For instance, one can readily observe a high correlation among SPC-WxD, SPLC-WxD and SPNP-WxD. Again, we argue that this should be attributed to the similar idea they follow. (4) The intermediacy shows a higher correlation with degree centrality than with closeness centrality, betweenness centrality and all node weighting counterparts. The centralities show a similarly weak correlation with all node weighting counterparts.

From the average ranks in Tables 4 and 5, it is easy to see that the intermediacy-in and intermediacy rank first on both the weak signals and education datasets. This indicates that the intermediacy follows a very different idea from the traversal counting approaches and centralities. This is the main reason why the intermediacy is chosen as one component of our integrated approach for discovering developmental trajectories. Once this component is fixed, the other component can be readily determined from the row or column corresponding to the intermediacy-in in Figures 7 and 8 and to the intermediacy in Figures 9 and 10. As mentioned in the first paragraph of this section, the approach showing the weakest correlation with the intermediacy-in or intermediacy should be preferred. In our case studies, the SPNP is utilised as the other component in the edge integrated mode on both the weak signals and education datasets, and SPNP-WoD and SPNP-Liu are utilised as the other component in the node integrated mode on the weak signals and education datasets, respectively.
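The selection rule in this paragraph amounts to a minimum search over one row of the correlation matrix. The sketch below assumes that "weakest" refers to the smallest absolute coefficient, which the text does not state explicitly. The function name, candidate names and values are hypothetical placeholders rather than figures from either dataset.

```python
def weakest_partner(anchor, corr_with_anchor):
    """Return the approach whose correlation with the anchor is smallest in magnitude."""
    candidates = {
        name: abs(rho) for name, rho in corr_with_anchor.items() if name != anchor
    }
    if not candidates:
        raise ValueError("no candidate approaches supplied")
    return min(candidates, key=candidates.get)


# Hypothetical correlations between an anchor approach and three candidates.
row = {"anchor": 1.0, "candidate_1": 0.35, "candidate_2": -0.12, "candidate_3": 0.21}
print(weakest_partner("anchor", row))  # candidate_2
```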

6. Developmental trajectories

This section presents developmental trajectories discovered by edge and node weighting approaches on the weak signals and education datasets. This serves as empirical illustrations of the use of our approach.

6.1. Weak signals

From Figures 11 and 12, it is not difficult to see that the developmental trajectories discovered by the SPNP and SPNP-WoD contain the most publications, while those discovered by the intermediacy-in and intermediacy contain the fewest. That is to say, the number of nodes in the trajectory discovered by our approach lies between those of the other trajectories. In our opinion, this phenomenon is related to the following characteristics: main path analysis armed with the global search strategy favours longer paths (the longest path consists of 10 nodes), whereas the intermediacy favours shorter ones (the longest path comprises 8 nodes). Therefore, our method can be viewed as a trade-off between main path analysis and intermediacy. In addition, a comparison of Figures 11(c) and 12(c) shows that similar trajectories are obtained through the edge and node integrated modes.

Figure 11.

The developmental trajectories of the weak signals field discovered by edge weighting approaches: (a) intermediacy-in, (b) SPNP and (c) intermediacy-in + SPNP.

Figure 12.

The developmental trajectories of the weak signals field discovered by node weighting approaches: (a) intermediacy, (b) SPNP-WoD and (c) intermediacy + SPNP-WoD.

Another interesting phenomenon is that a divergence–convergence–divergence pattern can be observed in the developmental trajectories discovered by the intermediacy and our approach, whereas only a divergence pattern can be observed in the trajectory discovered by main path analysis. In other words, some pivot articles, such as Hiltunen [29] and Thorleuchter and Van den Poel [30], are highlighted by the intermediacy and our approach. In terms of practical significance, Hiltunen [29] developed a deeper theoretical understanding of weak signals and re-defined them through three dimensions: the signal, the issue and the interpretation. This definition lays the groundwork for identifying weak signals (semi-)automatically. Thorleuchter and Van den Poel [30] proposed an idea mining algorithm that recognises textual patterns relevant to a given strategic problem, which is very different from the standard filtering algorithms used in environmental scanning procedures. This algorithm can also capture patterns with low relevant information content.

In addition, it is noteworthy that the ground-breaking work [31] disappears from the trajectory by the main path analysis and more recent articles are emphasised by the main path analysis. Ansoff [31] introduced the term weak signals, and argued that the chances of strategic surprise can be minimised by coping properly with weak signals. In the initial development of the studies on weak signals, an important role was actually played by Ansoff [31]. Koivisto et al. [32], Lee and Park [33] and Kim et al. [34] discussed three applications of weak signals in the fields of mass transport attacks, ethical issues in artificial intelligence and policy research. These three academic articles did not make much theoretical contribution to the development of weak signals. Surprisingly, a review article by Holopainen and Toivonen [35] appears in the trajectory by the intermediacy. The intermediacy seems to give equal attention to initial and recent stages in terms of edge weights. In summary, our integrated framework can benefit from main path analysis and intermediacy approaches while overcoming their shortcomings.

In what follows, the developmental trajectory of weak signals research obtained through the edge integrated mode (cf. Figure 11(c)) is described in more detail. According to the divergence–convergence–divergence pattern, the development of weak signals research goes roughly through three distinct stages. In the emergence stage, from the weak signals concept and related theories [31] to environmental scanning [36], the weak signals field began to receive extensive attention. In the exploration stage, as a key pivot publication, Hiltunen [29] developed a three-dimensional theoretical model of the future sign, which provided support for quantitatively identifying weak signals. After that, the studies on weak signals mainly focused on perception frameworks and identification approaches. From 2015 onwards, research on weak signals has tended to further improve the recognition approaches.

6.1.1. The emergence stage: from 1975 to 2008

This stage involves three articles. Ansoff [31] first discussed the positive implications of weak signals for managing strategic surprises under strategic discontinuity. The strategic planning process requires clear and specific information to help organisers estimate potential impacts, develop countermeasures and formulate strategic plans at an early stage. However, this information is difficult to obtain, so managers need to interpret weak signals in the environment and convert them into concrete actions to handle threats and opportunities. Thomas [36] analysed new techniques of environmental scanning that are closely related to business planning and help organisers identify important factors in the environment in order to develop broad strategic and operational plans. After that, Mendonça et al. [37] suggested weak signals as an early manifestation of organisational changes, as shown in Table 6.

6.1.2. The exploration stage: from 2008 to 2015

In the exploration stage, motivated by the lack of a formal definition of weak signals, Hiltunen [29] developed a three-dimensional model of future signals from a semiotic perspective and defined the theoretical framework along three subjective and objective dimensions: the signal, the issue and the interpretation. The weak signals were further grouped into two categories: early information and first symptoms. Early information is a type of weak signal with a small number of signals and low visibility. First symptoms are a type of weak signal with a large number of signals that are visible but difficult to interpret. Kuosa [38] presented a future signals sense-making framework (FSSF) to perceive emerging futures. The framework includes three levels, namely weak signals, drivers and trends, and divides future knowledge into two broad categories: disruptive information and linearly evolving information. The former disrupts and shocks the existing knowledge, and the latter maintains it.

After that, several weak signal recognition approaches were proposed on the basis of the triadic model of future signals [29]. Yoon [39] utilised text mining to extract keywords and construct a keyword portfolio to identify weak signal topics in online news articles. Thorleuchter and Van den Poel [40] explored text information from the Internet with Latent Semantic Indexing (LSI) to identify and analyse weak signals. Then, Thorleuchter et al. [41] improved the LSI to capture the change of weak signals over time. Furthermore, Thorleuchter and Van den Poel [30] proposed a novel idea mining methodology to extract and analyse weak signals (Table 6).

Table 6.

A list of publications appearing in the developmental trajectories of the weak signals field.

Node ID	Publication	Intermediacy	Main path analysis	Our approach
AnsoffH1975	Ansoff [31]	√		√
ThomasP1980	Thomas [36]		√	√
MendoncaS2004	Mendonça et al. [37]		√	√
HiltunenE2008	Hiltunen [29]	√	√	√
KuosaT2010	Kuosa [38]	√	√	√
YoonJ2012	Yoon [39]	√	√	√
HolopainenM2012	Holopainen and Toivonen [35]	√
ThorleuchterD2013	Thorleuchter and van den Poel [40]	√	√	√
ThorleuchterD2014	Thorleuchter et al. [41]	√	√	√
ThorleuchterD2015	Thorleuchter and van den Poel [30]	√	√	√
KoivistoR2016	Koivisto et al. [32]		√
KimJ2017	Kim and Lee [46]	√	√	√
LeeY2018	Lee and Park [33]		√
KimH2018	Kim et al. [34]		√
Griol-BarresI2019a	Griol-Barres et al. [42]		√	√
CwikB2019	Cwik et al. [43]		√	√
Garcia-NunesP2019	Garcia-Nunes and da Silva [44]	√	√	√
Van VeenB2019	van Veen et al. [45]		√	√

6.1.3. The development stage: from 2015 to now

In the development stage, Kim and Lee [46] favoured futuristic data, a future-oriented segment of online data, to attenuate the bias caused by the randomness of data sources. Two characteristics, high rarity and paradigm unrelatedness, are operationalised in the study by Kim and Lee [46] with the help of the Local Outlier Factor (LOF) method. Then, Griol-Barres et al. [42] provided a feasible and comprehensive way of integrating text mining and natural language processing techniques and applied it to different kinds of large documents to discover weak signals. Cwik et al. [43] suggested a model for enhancing the visibility of weak signals to deal with upcoming threats and risks. Apart from text mining and natural language processing techniques, a semi-automatic solution involving a conceptual system was recently introduced by Garcia-Nunes and da Silva [44] on the basis of ontologies about organisational knowledge. van Veen et al. [45] discussed, from the perspective of managers, several ways to reduce signal loss and obtain more critical signals in the selective process of the perceptual filter, as shown in Table 6.

6.2. Education

In this subsection, we analyse the literature in the education dataset (Table 7). The top main path is taken as the developmental trajectory of the education field, as shown in Figure 13. Surprisingly, the same trajectory is captured by each edge weighting approach and its node weighting counterpart, such as intermediacy-in versus intermediacy and SPNP versus SPNP-Liu. Very similar but not identical trajectories can be observed on the weak signals dataset by comparing Figures 11 and 12. This indicates that it does not seem to matter much whether an edge or a node weighting method is utilised in practical applications.

Table 7.

A list of publications appearing in the developmental trajectories of the education field.

Node ID	Publication	Intermediacy	Main path analysis	Our approach
LillisK1983	Lillis and Hogan [47]		√	√
PsacharopoulosG1985	Psacharopoulos and Loxley [48]		√	√
KleesS1986	Klees [49]		√	√
ShulmanL1987	Shulman [50]	√
AdamsD1988	Adams [51]		√	√
BorkoH1989	Borko and Livingston [52]	√
PsacharopoulosG1990	Psacharopoulos [53]		√	√
SabersD1991	Sabers et al. [54]	√
KaganD1992	Kagan [55]	√
WelchA1993	Welch [56]		√	√
CowenR1996	Cowen [57]		√	√
LeatD1999	Leat [58]	√
CowenR2000	Cowen [59]		√	√
KazamiasA2001	Kazamias [60]		√
WelchA2001	Welch [61]			√
NovoaA2003	Novoa and Yariv-Mashal [62]		√
DaviesL2005	Davies and Issitt [63]	√
SuárezD2008	Suárez [64]	√
CarneyS2009	Carney [65]			√
GrekS2009	Grek et al. [66]		√
LingardB2011	Lingard and Rawolle [67]		√
MoonR2011	Moon and Koo [68]	√
RappleyeJ2011	Rappleye et al. [69]			√
CarneyS2012	Carney et al. [70]	√		√
SellarS2013	Sellar and Lingard [71]		√
CrossleyM2014	Crossley [72]		√	√
CowenR2014	Cowen [73]	√	√
SchweisfurthM2014	Schweisfurth [74]	√		√
AuldE2016	Auld and Morris [75]	√	√	√
KomatsuH2017	Komatsu and Rappleye [76]	√	√	√
ElliottJ2018	Elliott et al. [77]	√
GreyS2018	Grey and Morris [78]	√	√	√
AuldE2018	Auld et al. [79]		√	√
DimmockC2019	Dimmock [80]	√
VickersE2020	Vickers [81]		√	√
KimT2020	Kim [82]		√	√

Figure 13.

The developmental trajectories of the education field discovered by edge weighting approaches: (a) intermediacy-in, (b) SPNP, (c) intermediacy-in + SPNP; and by node weighting approaches: (d) intermediacy, (e) SPNP-Liu and (f) intermediacy + SPNP-Liu.

From the perspective of the number of publications, the trajectory from main path analysis covers the most publications (21 nodes), followed by that from our approach (20 nodes), and the trajectory from intermediacy involves the fewest articles (16 nodes). This again empirically validates the observation by Šubelj et al. [10]: main path analysis armed with the global search strategy prefers longer citation paths to shorter ones. On closer observation, it can be seen that the trajectory from our approach consists of articles in the trajectories from the main path analysis and intermediacy approaches (namely, [47–49,51,53,56,57,59] and [70,73–76,78,79,81,82]) as well as three unseen publications (namely, [61,65,69]). Hence, our method can be viewed as a trade-off between main path analysis and intermediacy.

More specifically, the first half of trajectory [50,52,54,55,58] discovered by the intermediacy focuses on the improvement of teachers themselves, followed by a discussion of the reflection on the development of citizenship education in different countries [63,64,68]. These publications are not covered by main path analysis and our approach. Many scholarly articles appearing in our trajectory but missing from that by main path analysis provide a critical perspective or propose new approaches, such as [61,65,69,70,73,74], which greatly contribute to knowledge dissemination. The scientific publications missing from our trajectory but appearing in that by main path analysis provide more explanations on the existing phenomenon [60,62,66,67,71,72]. In the following, the developmental trajectory from our approach will be described at length.

Since the 1980s, comparative education has been discussed in many developing countries, including Colombia and Tanzania [48]. However, educational aspirations in this period did not coincide with labour needs. Thereupon, diversified academic curricula that incorporated prevocational subjects emerged [47] to rationally improve educational policy-making and planning [49]. However, the road to extending a general curriculum through vocationalisation was not smooth [53]. To tackle this problem, Adams [51] provided two basic modes of educational planning, the interactive and rational modes. Psacharopoulos [53] suggested that the specific plan might be used to implement and put forward several substantive issues and detailed suggestions.

After the 1990s, education research was subdivided into several research topics, such as the relationship between culture and class [56], the modernity and post-modernity [57], the transitologies between historical and future perspective [59], the globalisation and internationalisation impact [61], the global interconnectivity and transnationalism impact [65], international educational transfer [69,73] and global convergence of educational and cultural worlds [70]. In this period, the qualitative research method of educational ethnography observations benefitting from postmodernism has been widely explored [74].

With the gradual increase in the degree of internationalisation of education, the role of international organisations in testing the quality of education has become more and more important. In recent years, cross-country comparative research represented by academic performance measurement has developed rapidly. The Programme for International Student Assessment (PISA), as the world’s premier yardstick for measuring the performance of students from different countries, was an important move towards international comparative education research [76]. The PISA promoted high-performing education reform since Auld and Morris [75] provided a new paradigm to overcome its inherent issue. However, global recognition of the new PISA remained divisive, especially in low- and middle-income countries [79]. In a recent piece, Komatsu and Rappleye [76] critically discussed whether PISA scores lead to economic growth, given the flawed statistics behind this claim. Grey and Morris [78] suggested that the media can serve as an intermediary to promote the development of PISA. In addition, robust discussion of decolonisation theories [81] and of diaspora scholarship and theorisation in modern society [82] has been surging across the education field.

7. Discussions and conclusion

It is well documented that the historical sequence of science can be discovered through analysing the citation network among scientific publications [1]. As the citation network in a focal field becomes larger and more complicated, it is very difficult to highlight the structural backbone of knowledge flows. For this purpose, Hummon and Doreian [2] proposed the main path analysis approach on the basis of citation links, and Batagelj [8] put forward an efficient algorithm for determining the edge weights. Since then, this approach has been widely utilised to uncover the developmental trajectory of a domain of interest.

Recently, Šubelj et al. [10] found that the main path analysis approach armed with the global search strategy favours longer citation paths over shorter ones. This characteristic may reduce the thematic coherence of the documents along the same main path [11,12]. Thereupon, a new measure, named intermediacy, was proposed by Šubelj et al. [10] for recognising important scientific publications rather than important citation links. However, the intermediacy is only applicable to a citation network with one single target node and one single source node, which greatly limits the scope of application of this measure. Furthermore, Šubelj et al. [10] provide no guideline for dealing with this situation.

To loosen this limitation and alleviate the undesirable characteristic of conventional main path analysis, a novel developmental trajectory discovery method is proposed in this study through edge and node integrated modes. In more detail, a citation network is transformed to its standard form by extending the set of nodes with a common target and a common source and by adding the corresponding edges. Then, the node importance and link importance are calculated separately and combined through a trade-off hyper-parameter. Finally, the top main paths are extracted with a priority-first search algorithm to form the developmental trajectory. In this way, the trajectory uncovered by our alternative approach is able to benefit simultaneously from the characteristics of important citation links and important nodes. It is worth noting that our integration strategy is also applicable to the variants of main path analysis in Subsection 2.1. Finally, to illustrate the feasibility and scalability of our method, the fields of weak signals and education are taken as two case studies.
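The three steps summarised above can be sketched on a toy network. The sketch rests on several assumptions that should not be read as our exact formulation: edges point from the cited to the citing publication, the network is acyclic, the node scores are hypothetical stand-ins for a node importance such as intermediacy, the combination rule (a convex combination of the scaled SPC weight of an edge and the scaled score of its head node) is only one plausible choice, and a dynamic programme over a topological order replaces the priority-first search and returns only the top one path.

```python
from graphlib import TopologicalSorter

SOURCE, SINK = "s", "t"  # hypothetical labels for the added common source and sink


def to_standard_form(edges):
    """Link a common source to all start nodes and all end nodes to a common sink."""
    nodes = {n for edge in edges for n in edge}
    heads = {v for _, v in edges}
    tails = {u for u, _ in edges}
    added = [(SOURCE, n) for n in sorted(nodes - heads)]
    added += [(n, SINK) for n in sorted(nodes - tails)]
    return list(edges) + added


def topological_order(edges):
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    return list(TopologicalSorter(preds).static_order())


def spc(edges):
    """Search path count: number of source-to-sink paths that traverse each edge."""
    order = topological_order(edges)
    down = dict.fromkeys(order, 0)  # number of paths from SOURCE to the node
    up = dict.fromkeys(order, 0)    # number of paths from the node to SINK
    down[SOURCE], up[SINK] = 1, 1
    for n in order:
        for u, v in edges:
            if u == n:
                down[v] += down[n]
    for n in reversed(order):
        for u, v in edges:
            if v == n:
                up[u] += up[n]
    return {(u, v): down[u] * up[v] for u, v in edges}


def combine(edge_w, node_w, alpha):
    """Assumed rule: alpha * scaled edge weight + (1 - alpha) * scaled head-node score."""
    top_e, top_n = max(edge_w.values()), max(node_w.values())
    return {
        (u, v): alpha * w / top_e + (1 - alpha) * node_w.get(v, 0.0) / top_n
        for (u, v), w in edge_w.items()
    }


def top_main_path(edges, weights):
    """Global search: the source-to-sink path with the largest total weight."""
    best = {SOURCE: (0.0, [SOURCE])}
    for n in topological_order(edges):
        score, path = best[n]
        for u, v in edges:
            if u == n:
                cand = score + weights[(u, v)]
                if v not in best or cand > best[v][0]:
                    best[v] = (cand, path + [v])
    return best[SINK][1][1:-1]  # drop the artificial source and sink


# Toy citation network and hypothetical node scores, for illustration only.
citations = [("A", "B"), ("A", "C"), ("X", "B"), ("B", "D"), ("C", "D")]
node_scores = {"A": 0.5, "X": 0.1, "B": 0.1, "C": 1.0, "D": 0.5}
edges = to_standard_form(citations)
w = spc(edges)
print(top_main_path(edges, combine(w, node_scores, alpha=1.0)))  # ['A', 'B', 'D']
print(top_main_path(edges, combine(w, node_scores, alpha=0.5)))  # ['A', 'C', 'D']
```

With the edge weights alone (alpha = 1.0) the path runs through B, the node with more traversals, whereas giving the node scores equal say (alpha = 0.5) redirects it through the higher-scoring node C.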

Several conclusions can be drawn from our case studies. (1) Although both edge and node integrated modes are involved in our framework, similar developmental trajectories are obtained through these two modes. (2) The intermediacy seems to follow a very different idea from the conventional centralities and traversal counting approaches, which makes the intermediacy and its edge weighting counterparts an appealing component of our integrated framework. (3) Since the two edge/node weighting approaches with the weakest correlation are combined in our framework, the discovered trajectory encodes richer implications than those from main path analysis and intermediacy, especially on the education dataset. (4) Our framework expands the scope of application of the intermediacy and is able to benefit from citation-based traversal counts and intermediacy simultaneously. Meanwhile, our framework scales very well to a large citation network.

However, our work is still subject to limitations. There is no rule of thumb on how to determine the trade-off hyper-parameters and the number of top main paths. In the near future, we will design an indicator based on the network structure of the developmental trajectory, so that these parameters can be tuned according to this indicator or to prior knowledge from the users. In addition, a scientific verification of our methodology remains to be investigated in our future work.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Natural Science Foundation of China under grant numbers 72074014 and 72004012.

ORCID iD

Xin An

References

1. Garfield, Sher, Torpie RJ. The use of citation data in writing the history of science. Philadelphia, PA: Institute for Scientific Information (ISI), 1964.

2. Hummon, Doreian. Connectivity in a citation network: the development of DNA theory. Soc Networks 1989; 11(1): 39–63.

3. Leydesdorff. Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. J Am Soc Inf Sci Tec 2007; 58(9): 1303–1319.

4. Tan, Liu, Mao, et al. AceMap: a novel approach towards displaying relationship among academic literatures. In: Proceedings of the 25th international conference on World Wide Web, Montréal, QC, Canada, 11–15 April 2016, pp. 437–442. New York: ACM.

5. Park, Magee CL. Tracing technological development trajectories: a genetic knowledge persistence-based main path approach. PLoS One 2017; 12(1): e0170895.

6. Liu, MHC. A few notes on main path analysis. Scientometrics 2019; 119(1): 379–391.

7. Hao, et al. Review on emerging research topics with key-route main path analysis. Scientometrics 2020; 122(1): 607–624.

8. Batagelj. Efficient algorithms for citation network analysis, vol. 41. Ljubljana: Department of Theoretical Computer Science, Institute of Mathematics, Physics and Mechanics, University of Ljubljana, 2003, pp. 1–27.

9. Liu, LY. An integrated approach for main path analysis: development of the Hirsch index as an example. J Am Soc Inf Sci Tec 2012; 63(3): 528–542.

10. Šubelj, Waltman, Traag, et al. Intermediacy of publications. R Soc Open Sci 2020; 7(1): 190207.

11. Davison BD. Topical locality in the Web: experiments and observations. In: Proceedings of the 23rd annual international conference on research and development in information retrieval, Athens, 24–28 July 2000, pp. 272–279. New York: ACM.

12. Chen, Zhu, et al. A semantic main path analysis method to identify multiple developmental trajectories. J Informetr 2022; 16: 101281.

13. Liu, Zhai, et al. Overlapping thematic structures extraction with mixed-membership stochastic blockmodel. Scientometrics 2018; 117(1): 61–84.

14. Korf RE. Artificial intelligence search algorithms. In: Atallah (ed.) Handbook of algorithms and theory of computation. Boca Raton, FL: CRC Press, 1998, pp. 1–20.

15. Kuan CH. Regarding weight assignment algorithms of main path analysis and the conversion of arc weights to node weights. Scientometrics 2020; 124(1): 775–782.

16. Jiang, Zhu, Chen. Main path analysis on cyclic citation networks. J Assoc Inf Sci Tech 2020; 71(5): 578–595.

17. Sun, et al. Important citations identification by exploiting generative model into discriminative model. J Inf Sci. Epub ahead of print 7 February 2021. DOI: 10.1177/0165551521991034.

18. Sun. Important citations identification with semi-supervised classification model. Scientometrics. Epub ahead of print 20 January 2022. DOI: 10.1007/s11192-021-04212-6.

19. Hao. Research on emerging technology detection based on multi-source heterogeneous information. Master’s Thesis, Beijing University of Technology, Beijing, China, 2021.

20. Liu, Kuan CH. A new approach for main path analysis: decay in knowledge diffusion. J Am Soc Inf Sci Tec 2016; 67(2): 465–476.

21. Ball MO. Complexity of network reliability computations. Networks 1980; 10(2): 153–165.

22. Hagen NT. Harmonic allocation of authorship credit: source-level correction of bibliometric bias assures accurate publication and citation analysis. PLoS One 2008; 3(12): e4021.

23. Hao, Yang, et al. A topic models based framework for detecting and forecasting emerging technologies. Technol Forecast Soc 2021; 162: 120366.

24. Eppstein. Finding the k shortest paths. SIAM J Sci Comput 1999; 28(2): 652–673.

25. Verspagen. Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research. Adv Complex Syst 2007; 10(1): 93–115.

26. Hao, et al. Types of DOI errors of cited references in Web of Science with a cleaning method. Scientometrics 2019; 120(3): 1427–1437.

27. Laakso. Study of the Nordic SSH Journal Publishing Landscape: a report for the Nordic publications committee for humanities and social science periodicals (NOP-HS), 2021, https://www.aka.fi/globalassets/noshs/study-of-the-nordic-ssh-journal-publishing-landscape.pdf

28. Martinelli. An emerging paradigm or just another trajectory? Understanding the nature of technological changes using engineering heuristics in the telecommunications switching industry. Res Policy 2012; 41(2): 414–429.

29. Hiltunen. The future sign and its three dimensions. Futures 2008; 40: 247–260.

30. Thorleuchter, Van den Poel. Idea mining for web-based weak signal detection. Futures 2015; 66: 25–34.

31. Ansoff HI. Managing strategic surprise by response to weak signals. Calif Manage Rev 1975; 18(2): 21–33.

32. Koivisto, Kulmala, Gotcheva. Weak signals and damage scenarios – systematics to identify weak signals and their sources related to mass transport attacks. Technol Forecast Soc 2016; 104: 180–190.

33. Lee, Park JY. Identification of future signal based on the quantitative and qualitative text mining: a case study on ethical issues in artificial intelligence. Qual Quant 2018; 52(2): 653–667.

34. Kim, Ahn, Jung WS. Horizon scanning in policy research database with a probabilistic topic model. Technol Forecast Soc 2019; 146: 588–594.

35. Holopainen, Toivonen. Weak signals: Ansoff today. Futures 2012; 44(3): 198–205.

36. Thomas PS. Environmental scanning: the state of the art. Long Range Plann 1980; 13(1): 20–28.

37. Mendonça, Cunha MPE, Kaivo-oja, et al. Wild cards, weak signals and organisational improvisation. Futures 2004; 36(2): 201–218.

38. Kuosa. Futures signals sense-making framework (FSSF): a start-up tool to analyse and categorise weak signals, wild cards, drivers, trends and other types of information. Futures 2010; 42(1): 42–48.

39. Yoon. Detecting weak signals for long-term business opportunities using text mining of Web news. Expert Syst Appl 2012; 39(16): 12543–12550.

40. Thorleuchter, Van den Poel. Weak signal identification with semantic web mining. Expert Syst Appl 2013; 40(12): 4978–4985.

41. Thorleuchter, Scheja, Van den Poel. Semantic weak signal tracing. Expert Syst Appl 2014; 41(11): 5009–5016.

42. Griol-Barres, Milla, Millet. Improving strategic decision making by the detection of weak signals in heterogeneous documents by text mining techniques. AI Commun 2019; 32(5–6): 347–360.

43. Cwik, Swierszcz, Mitkow, et al. Detecting Igor H. Ansoff’s weak signals: interpretation aspects in relation to threat signals. In: Proceedings of the 33rd international conference on Business Information Management Association, Granada, 10–11 April 2019, pp. 2063–2073. Pennsylvania: IBIMA.

44. Garcia-Nunes, Da Silva AEA. Using a conceptual system for weak signals classification to detect threats and opportunities from web. Futures 2019; 107: 1–16.

45.

van Veen

Ortt

Badke-Schaub

PG.

Compensating for perceptual filters in weak signal assessments. Futures 2019; 108: 1–11.

46.

Kim

Lee

Novelty-focused weak signal detection in futuristic data: assessing the rarity and paradigm unrelatedness of signals. Technol Forecast Soc 2017; 120: 59–76.

47.

Lillis

Hogan

Dilemmas of diversification: problems associated with vocational education in developing countries. Comp Educ 1983; 19(1): 89–107.

48.

Psacharopoulos

Loxley

Curriculum diversification in Colombia and Tanzania: an evaluation. Baltimore, MD: Johns Hopkins University Press, 1985.

49.

Klees

SJ.

Planning and policy analysis in education: what can economics tell us?

Comp Educ Rev 1986; 30(4): 574–607.

50.

Shulman

LS.

Knowledge and teaching: foundations of the new reform. Harvard Educ Rev 1987; 57: 1–22.

51.

Adams

Extending the educational planning discourse: conceptual and paradigmatic explorations. Comp Educ Rev 1988; 32: 400–415.

52.

Borko

Livingston

Cognition and improvisation: differences in mathematics instruction by expert and novice teachers. Am Educ Res J 1989; 26(4): 473–498.

53.

Psacharopoulos

Comparative education: from theory to practice, or are you A: \neo.* or B:\*.ist?

Comp Educ Rev 1990; 34: 369–380.

54.

Sabers

Cushing

Berliner

DC.

Differences among teachers in a task characterized by simultaneity, multidimensional, and immediacy. Am Educ Res J 1991; 28(1): 63–88.

55.

Kagan

DM.

Professional growth among preservice and beginning teachers. Rev Educ Res 1992; 62: 129–169.

56.

Welch

AR.

Class, culture and the state in comparative education: problems, perspectives and prospects. Comp Educ 1993; 29(1): 7–28.

57.

Cowen

Last past the post: comparative education, modernity and perhaps post-modernity. Comp Educ 1996; 32(2): 151–170.

58.

Leat

Rolling the stone uphill: teacher development and the implementation of thinking skills programmes. Oxford Rev Educ 1999; 25(3): 387–403.

59.

Cowen

Comparing futures or comparing pasts?

Comp Educ 2000; 36(3): 333–342.

60.

Kazamias

AM.

Re-inventing the historical in comparative education: reflections on a protean episteme by a contemporary player. Comp Educ 2001; 37(4): 439–449.

61.

Welch

AR.

Globalisation, post-modernity and the state: comparative education facing the third millennium. Comp Educ 2001; 37(4): 475–492.

62.

Novoa

Yariv-Mashal

Comparative research in education: a mode of governance or a historical journey?

Comp Educ 2003; 39(4): 423–438.

63.

Davies

Issitt

Reflections on citizenship education in Australia, Canada and England. Comp Educ 2005; 41(4): 389–410.

64.

Suárez

DF.

Rewriting citizenship? Civic education in Costa Rica and Argentina. Comp Educ 2008; 44(4): 485–503.

65.

Carney

Negotiating policy in an age of globalization: exploring educational ‘policyscapes’ in Denmark, Nepal, and China. Comp Educ Rev 2009; 53(1): 63–88.

66.

Grek

Lawn

Lingard

et al. National policy brokering and the construction of the European education space in England, Sweden, Finland and Scotland. Comp Educ 2009; 45(1): 5–21.

67.

Lingard

Rawolle

New scalar politics: implications for education policy. Comp Educ 2011; 47(4): 489–502.

68.

Moon

Koo

JW.

Global citizenship and human rights: a longitudinal analysis of social studies and ethics textbooks in the Republic of Korea. Comp Educ Rev 2011; 55(4): 574–599.

69.

Rappleye

Imoto

Horiguchi

Towards ‘thick description’ of educational transfer: understanding a Japanese institution’s ‘import’ of European language policy. Comp Educ 2011; 47(4): 411–432.

70.

Carney

Rappleye

Silova

Between faith and science: world culture theory and comparative education. Comp Educ Rev 2012; 56(3): 366–393.

71.

Sellar

Lingard

Looking East: Shanghai, PISA 2009 and the reconstitution of reference societies in the global education policy field. Comp Educ 2013; 49(4): 464–485.

72.

Crossley

Global league tables, big data and the international transfer of educational research modalities. Comp Educ 2014; 50(1): 15–26.

73.

Cowen

Ways of knowing, outcomes and ‘comparative education’: be careful what you pray for. Comp Educ 2014; 50(3): 282–301.

74.

Schweisfurth

Among the comparativists: ethnographic observations. Comp Educ 2014; 50(1): 102–111.

75.

Auld

Morris

PISA, policy and persuasion: translating complex conditions into education ‘best practice’. Comp Educ 2016; 52(2): 202–229.

76.

Komatsu

Rappleye

A new global policy regime founded on invalid statistics? Hanushek, Woessmann, PISA, and economic growth. Comp Educ 2017; 53(2): 166–191.

77.

Elliott

Stankov

Lee

et al. What did PISA and TIMSS ever do for us? The potential of large scale datasets for understanding and improving educational practice. Comp Educ 2019; 55: 133–155.

78.

Grey

Morris

PISA: multiple ‘truths’ and mediatised global governance. Comp Educ 2018; 54(2): 109–131.

79.

Auld

Rappleye

Morris

PISA for development: how the OECD and World Bank shaped education governance post-2015. Comp Educ 2019; 55(2): 197–219.

80.

Dimmock

Connecting research and knowledge on educational leadership in the West and Asia: adopting a cross-cultural comparative perspective. Comp Educ 2020; 56(2): 257–277.

81.

Vickers

Critiquing coloniality, ‘epistemic violence’ and western hegemony in comparative education – the dangers of ahistoricism and positionality. Comp Educ 2020; 56(2): 165–189.

82.

Kim

Diasporic comparative education: an initial tribute to anxiety and hope. Comp Educ 2020; 56(1): 111–126.

83.

Cordella

Foggia

Sansone

et al. An improved algorithm for matching large graphs. In: Proceedings of the IAPR TC-15 workshop on graph-based representation in pattern recognition, Ischia, 23–25 May 2001, pp. 149–159. New York: Springer.
