Sage Journals: Discover world-class research

Abstract

Concept prerequisite relation refers to the learning order of concepts, which is useful in education. Concept prerequisite learning refers to using machine learning methods to infer prerequisite relation of a concept pair. The process of concept prerequisite learning requires large amounts of labeled data to train classifier. Usually, the labels of prerequisite relation are assigned by specialists. The specialist labelling method is costly. Thus, it is necessary to reduce labeling expense. An effective strategy is using active learning methods. In this paper, we propose a pool-based active learning framework for concept prerequisite learning named PACOL. It is a fact that concept u and concept v cannot be prerequisite of each other simultaneously. The idea of PACOL is to select the concept pair with the greatest deviation between the classifier’s prediction and the fact. Besides, PACOL can be used in two situations: when specialists assign three kinds of labels or two kinds of labels. In experiments, we constructed data sets for three subjects. Experimental results on both our constructed data sets and public data sets demonstrate that PACOL outperforms than existing active learning methods in all situations.

Keywords

Educational data mining prerequisite relation active learning Wikipedia

1 Introduction

The teaching theory of “mastering learning” is put forth by the esteemed educationalist, Benjamin Bloom [1]. The theory of mastering learning posits that when students learn two ordered concepts, they must have mastered the first concept before learning the second concept. For example, as depicted in Fig. 1, there are five concepts. Students need to master concept “Differentiation” and concept “Integration” before they learn concept “Differential equation”. In the theory of mastering learning, concept prerequisite relation refers to the learning order of concepts. Concept prerequisite relation plays a significant role in designing the course sequencing [2], constructing educational knowledge maps [3], and modeling student behavior [4], which is useful in the education domain.

Fig. 1

An example of concept prerequisite relation. The directed edge from vertex u to v indicates that students need to be proficient in concept u before they can begin to learn v.

A concept pair is an ordered pair that is made up of two distinct concepts. Concept pair <u, v> consists of concept u and concept v. We call concept pair <v, u> is reciprocal concept pair of concept pair <u, v>. There are twenty concept pairs between five concepts in Fig. 1, such as “<Function, Derivative> ”, “<Derivative, Function> ”, “<Function, Integration> ”, and other similar ordered pairs. For example, “<Function, Derivative> ” and “<Derivative, Function> ” are reciprocal concept pairs of each other.

Concept prerequisite learning focuses on predicting prerequisite relation between two concepts in one concept pair by using machine learning models. Previous works infer prerequisite relation from Wikipedia [5–7], text [8], MOOC [9–11], and scientific corpus [12].

The process of concept prerequisite learning requires large amounts of labeled data to train the classifier. Usually, the labels of prerequisite relation are assigned by specialists [5]. In some situations, specialists assign concept pair <u, v> with three kinds of labels: (1) u is prerequisite to v; (2) v is prerequisite to u; (3) there is no prerequisite relation between u and v. In the other situations, specialists label concept pair <u, v> with two kinds of labels: (1) u is prerequisite to v; (2) u is not prerequisite to v. Some examples of concept pairs and assigned labels are shown in Table 1.

Table 1

Some examples of concept pairs and assigned labels

No.	Concept u	Concept v	Concept pair <u, v>	The label when specialists assign three kinds of labels	The label when specialists assign two kinds of labels
1	Function	Derivative	<Function, Derivative>	u is prerequisite to v	u is prerequisite to v
2	Derivative	Function	<Derivative, Function>	v is prerequisite to u	u is not prerequisite to v
3	Derivative	Integration	<Derivative, Integration>	u is prerequisite to v	u is prerequisite to v
4	Integration	Derivative	<Integration, Derivative>	v is prerequisite to u	u is not prerequisite to v
...
19	Differentiation	Integration	<Differentiation, Integration>	There is no prerequisite relation between u and v	u is not prerequisite to v
20	Integration	Differentiation	<Integration, Differentiation>	There is no prerequisite relation between u and v	u is not prerequisite to v

The specialist labeling method is inefficient and expensive in the context of the vast amount of educational resources available, whether specialists assign three or two kinds of labels. Thus, it is important to reduce the expense of labeling. An effective way to reduce the labeling expense is to use active learning methods. Several previous works focus on introducing active learning methods into concept prerequisite learning. In these works, the instance in active learning refers to the concept pair. Liang compared the performance of several classical active learning methods in concept prerequisite learning when specialists assign two kinds of labels [13]. Hu proposed an active learning method used in the situation where specialists assign three kinds of labels named CPAL [14]. The idea of CPAL is selecting two concept pairs where the classifier disagrees with their labels. But for the concept pair where the label indicates no prerequisite relation, CPAL is unable to select it, which is because of the limitation of CPAL’s evaluation function.

In this paper, we propose Pool-based Active learning framework for COncept prerequisite Learning named PACOL. Figure 2 illustrates the processes of passive learning and PACOL. In passive learning, all unlabeled concept pairs are given to specialists to assign labels. Then all labeled concept pairs are used to train the machine learning model. In the process of PACOL, firstly, PACOL uses query strategy to select informative concept pairs from the pool of unlabeled concept pairs. Secondly, the selected concept pairs are given to specialists to assign labels. Then the selected concept pairs and their corresponding labels are added to the pool of labeled concept pairs. The labeled concept pairs are used to train the machine learning model. Then the selection process is repeated. Hence, PACOL can efficiently reduce the number of labeled concept pairs necessary for training.The following are contributions that the paper makes.

Fig. 2

The processes of passive learning and PACOL.

•Be designed for all situations of concept prerequisite learning. The process of concept prerequisite learning requires large amounts of labeled data to train the classifier. Usually, the labels of prerequisite relation are assigned by specialists. In some situations, specialists assign three kinds of labels. In the other situations, specialists assign two kinds of labels. It is necessary to reduce the labeling expense in both situations. Different from previous methods that can only be used in one situation, we propose a general framework for situations where specialists assign three kinds of labels or two kinds of labels. In other words, PACOL can be used in all situations to reduce the labeling expense in concept prerequisite learning.

•Utilizes the fact of concept prerequisite relation. It is a fact that concept u and concept v cannot be prerequisite of each other simultaneously. The idea of PACOL is to select the concept pair with the greatest deviation between the classifier’s prediction and the fact. The greater the deviation between the classifier’s predictions and the fact, the higher the likelihood of selecting the concept pairs.

•Performs well on constructed data sets and public data sets. There are several public data sets of concept prerequisite relation. But there are no data sets of primary math, secondary math, and data structure. In experiments, we have constructed data sets for three subjects. The code and data are available on our GitHub repository 1 . The experimental results, obtained from our constructed data sets and public data sets, consistently demonstrate that PACOL outperforms other active learning methods.

2 Related work

2.1 Concept prerequisite learning

With the development of information technology, the explosive growth of educational data is a significant challenge to educational institutions [15]. Concept prerequisite relations have a wide range of applications in education. Concept prerequisite relation is significant in large-scale web-based learning, where learners are confronted with a multitude of instructional materials. In addition to designing courses [16] and constructing educational knowledge maps [3], some researchers have combined concept prerequisite relations with knowledge tracing models [4]. And the combined models perform well in predicting students’ abilities. Chen combined student modeling with course information to jointly infer prerequisite relations and student ability [17]. Similarly, Carmona added concept prerequisite relations to existing student models to enhance their efficiency and accuracy [18]. Passalacqua presented a set of visualization methods to assist researchers in better comprehending the problem of concept prerequisite relations, with the ultimate goal of making a meaningful contribution to designing intelligent textbooks [19].

Concept prerequisite learning refers to using machine learning models to infer concept prerequisite relation. Wikipedia is a reliable, consistent, and accurate online encyclopedia containing thousands of concepts [20]. Some researchers extracted prerequisite relation from Wikipedia. Talukdar predicted the order of reading material by extracting features of Wikipedia pages, the authors held that before reading a page, one must first understand the page of prerequisite concepts [5]. Liang held that a concept can be expressed through its associated concepts, and they calculated the reference distance for predicting relations [6]. Sayyadiharikandeh proposed that in the click stream of Wikipedia pages, most of the clicks are directed from the concept to its prerequisite concepts [7].

Some researchers extracted prerequisite relation from courses, including MOOC and traditional courses. Pan extracted features using various types of materials such as text and video of MOOC courses, and they proposed a method to infer the learning order of concepts based on representation learning [9]. A transformer-based model was proposed by Xiao in order to predict relation of concepts, which can then be used to judge whether there is a watching order between two teaching videos [11]. Alzetta proposed a model to extract prerequisite relations based on deep learning techniques in a real-world educational setting [21]. And they tested the effect of generating learning modules for concepts. Yang conducted an analysis of large-scale educational data to predict the dependencies among concepts, and then courses were designed according to the identified prerequisite relations [22]. And experiments were performed in randomly selected schools to demonstrate the method’s effectiveness. Sun proposed a contextual-knowledge-aware approach to discover concept prerequisite relations based on sparse and unstructured educational data [23]. By using resource concept graph, Zhang proposed a method based on graph network to identify concept prerequisite relation [24].

In addition to Wikipedia data and course data, text and scientific corpus can also be utilized to infer prerequisite relations. Liu mined concept prerequisite relations from the text based on two features: local properties of learning dependencies and distributional asymmetry of concepts [8]. Li used graph neural networks to infer concept prerequisite relations [10]. Gordon inferred prerequisite relations from scientific corpus to generate reading lists for students, which can help students optimally learn technical material [12]. Jia extracted concept prerequisite relation by combining both concept representation with features of concept pairs [25].

2.2 Active learning

Generally, domain specialists assign the labels of prerequisite relation, which is a costly process [26]. Thus, it is essential to reduce labeling expense. A suggested solution to solve the problem is active learning. Active learning is a kind of semi-supervised learning method that concentrates on selecting the most informative instances to query labels [27, 28]. Pool-based sampling is a well-established technique in the field of active learning [29]. Several well-known pool-based active learning methods can be applied in not only binary classification but also multi-classification problems, such as Query by Committee (QBC), Diversity Sampling (DS), Querying Informative and Representative Examples (QUIRE), and Learning Active Learning (LAL). There are also several pool-based active learning methods that are particularly designed for multi-classification problem, such as Margin Sampling (MS), Multiclass-Level Uncertainty (MCLU), and Kernel Machine Committee (KMC).

QBC strategy [30] maintains a committee containing multiple machine learning models, each of which has been trained on the currently labeled data set and assigns candidate labels for unlabeled instances. The framework then selects the instance with the greatest divergence. DS strategy [31] selects the instance whose features differ most from the labeled instances, thus increasing the diversity of labeled instances. QUIRE strategy [32] uses the interval between the instance and the current classification surface as a risk function. In other words, the smaller the interval with the classification surface, the higher the risk of the instance. Then the instances near the classification surface are selected. LAL strategy trains a regressor for predicting candidate instances’ predicted error reduction [33]. Expected Model Change Maximization (EMCM) strategy selects instances that result in the biggest change to the present model. The model change is assessed by the changes of model parameters [34]. ALLSH strategy selects instances with the greatest discrepancy between their predictive likelihoods and their perturbations [35]. And ALLSH performs well on several natural language processing tasks.

MS strategy selects instances that are closest to the classification interface [36]. MCLU strategy chooses two instances that are the furthest away from the classification interface and uses the difference of their distances as the criterion [37]. KMC strategy combines discriminative sparse kernel machines and Bayesian model to select instances [38]. In KMC strategy, sparse structure and convex duality are used to optimize joint evaluation function. Liu proposed an SVM-based active learning method to select instances by using uncertainty in multi-class classification [39].

Some researchers introduced active learning into the domain of concept prerequisite learning. In these works, the instance in active learning work refers to the concept pair. Liang compared several classical active learning models’ performances in concept prerequisite learning, and he concluded that QBC performs best among classical active learning models in the situation where specialists assign two kinds of labels [13]. Liang proposed relational reasoning that can be introduced to classical active learning methods to improve them [40]. Hu proposed an active learning model used in the situation where specialists assign three kinds of labels [14]. But this model cannot select the instances where the label indicates no prerequisite relation.

3 Preliminary

General pool-based active learning methods divide data set $D$ into labeled data set $L$ and unlabeled data set $U$ , where $L \cap U = \emptyset$ and $L \cup U = D$ . Initial labeled data set $L$ contains some labeled instances that are randomly selected from $D$ . Firstly, a classifier is trained on $L$ . Subsequently, query strategy is used to choose an unlabeled instance x_s from $U$ in each iteration. Then the chosen instance is assigned to specialists to label. The currently chosen instance along with the queried label is then added to $L$ . The classifier is retrained on updated $L$ . Finally, the classifier’s performance is evaluated using the holdout test data.

Query strategies that are used in distinct pool-based active learning methods follow a greedy strategy: $x_{s}^{*} = \underset{x_{s} \in U}{argmax} f (x_{s}, U),$ (1) where $f (x_{s}, U) \in ℝ$ is the evaluation function. $f (x_{s}, U)$ is utilized to assess the risks associated with choosing x_s from $U$ . The evaluation functions used in different pool-based active learning methods are distinct.

Let Concepts = {1, 2, . . . , C} be C concepts in the course, <u, v> is a concept pair. $D = {(x_{ij}, y_{ij}) | i \in Concepts, j \in Concepts, i \neq j}$ represents the data set, where x_uv represents the feature of <u, v>, $y_{uv} \in Y$ represents the label of <u, v>. There is a classifier Δ: for <u, v>, Δ (u, v) → y_uv. Our objective is to design an efficient evaluation function for assigning instances’ priorities. Priorities are used to choose the instance that provides the most valuable information for training model Δ.

The kinds of concept prerequisite labels given by specialists can only be three or two. There are five concepts in Fig. 1, and there are twenty concept pairs between five concepts. Table 2 shows some examples of concept pairs.

Table 2

Some examples of concept pairs

No.	Concept u	Concept v	Concept pair <u, v>	$y_{uv}^{Tri}$	$y_{uv}^{Bin}$
1	Function	Derivative	“<Function, Derivative> ”	1	1
2	Derivative	Function	“<Derivative, Function> ”	-1	-1
3	Derivative	Integration	“<Derivative, Integration> ”	1	1
4	Integration	Derivative	“<Integration, Derivative> ”	-1	-1
...
19	Differentiation	Integration	“<Differentiation, Integration> ”	0	-1
20	Integration	Differentiation	“<Integration, Differentiation> ”	0	-1

In some situations, specialists label concept pair <u, v> with three kinds of labels. Concept prerequisite learning can be considered as a triple classification problem in this situation, and Δ^Tri is a triple classifier. Let $Y^{Tri} = {1, 0, - 1}$ . $y_{uv}^{Tri} = 1$ means u is prerequisite to v, $y_{uv}^{Tri} = 0$ means there is no prerequisite relation between two concepts, $y_{uv}^{Tri} = - 1$ means v is prerequisite to u. As shown in Table 2, when specialists assign concept pair <u, v> with three kinds of labels, the label of concept pair “<Function, Derivative> ” assigned by specialists is 1, which means that concept “Function” is prerequisite to concept “Derivative”. The label of concept pair “<Integration, Differentiation> ” assigned by specialists is 0, which means that there is no prerequisite relation between concept “Differentiation” and concept “Integration”. The label of concept pair “<Integration, Derivative> ” assigned by specialists is -1, which means that concept “Derivative” is prerequisite to concept “Integration”.

In the other situations, specialists label concept pair <u, v> with two kinds of labels. In this situation, concept prerequisite learning can be considered as a binary classification problem, and Δ^Bin is a binary classifier. Let $Y^{Bin} = {1, - 1}$ . $y_{uv}^{Bin} = 1$ means u is prerequisite to v, $y_{uv}^{Bin} = - 1$ means u is not prerequisite to v. For example, when specialists assign two kinds of labels, as shown in Table 2, the label of concept pair “<Function, Derivative> ” assigned by specialists is 1, which means that concept “Function” is prerequisite to concept “Derivative”. The label of concept pair “<Integration, Differentiation> ” assigned by specialists is -1, which means that concept “Integration” is not prerequisite to concept “Differentiation”. The label of concept pair “<Integration, Derivative> ” assigned by specialists also equals -1, which means that concept “Integration” is not prerequisite to concept “Derivative”.

Relational reasoning was proposed by Liang [40], which can be introduced into any standard active learning method to improve its performance. Concept prerequisite relation is seen as a strict partial order, and data sets are updated through closure operations. When updating $L$ with a new instance $< u, v > \in U$ whose label y_uv is obtained by querying, the process involves calculating $\bar{L}'$ , i.e., the closure of $L \cup {< u, v >}$ . Then, $L$ is set to $\bar{L}'$ , and $U$ is set to $D ∖ \bar{L}'$ .

4 Methodology

4.1 Overview of PACOL

In some situations, specialists assign three kinds of labels. In the other situations, specialists assign two kinds of labels. In both situations, it is a fact that concept u and concept v cannot be prerequisite of each other simultaneously. PACOL can be used in both situations. The following intuitive understanding serves as the foundation of PACOL. The greater the deviation between the classifier’s predictions and the fact, the higher the likelihood of selecting the concept pairs. PACOL concentrates on one concept pair and its reciprocal concept pair. The following is PACOL’s query strategy: $\begin{matrix} \begin{matrix} (< u^{*}, v^{*} >, < v^{*}, u^{*} >) \\ = \underset{< u^{*}, v^{*} > \in U, < v^{*}, u^{*} > \in U}{argmax} G (x_{uv}, x_{vu}) \end{matrix} \end{matrix}$ (2) where G (x_uv, x_vu) is the evaluation function we proposed, which is utilized to evaluate the risks of choosing <u^*, v^*> and <v^*, u^*> from $U$ .

Algorithm 1 pseudo code of PACOL

Require: initial labeled data set $L$ , unlabeled data set $U$

1: while $U \neq \emptyset$ and model did not reach stop condition do

2: Train classifier Δ using labeled data set $L$ ;

3: For each $< u^{*}, v^{*} > \in U, < v^{*}, u^{*} > \in U$ , calculate evaluation function value G (x_uv, x_vu)

4: Select <u^*, v^*> , < v^*, u^* > whose G (x_u^*v^*, x_v^*u^*) is the maximum;

5: Query the label y_<u^*,v^*> of <u^*, v^*> and y_<v^*,u^*> of <v^*, u^*>;

6: $L \leftarrow L \cup {< u^{*}, v^{*} >, < {v^{*}, u}^{*} >}$ ;

7: $U \leftarrow D ∖ L$ ;

8: end while

ENSURE: The trained classifier Δ.

We depict PACOL’s pseudo code in Algorithm 1. Figure 3 shows the flowchart of PACOL. The input of PACOL is the initial labeled data set $L$ and unlabeled data set $U$ . The first step involves training the classifier Δ using the labeled data set $L$ . The second step involves calculating the evaluation function value for each concept pair $< u^{*}, v^{*} > \in U, < v^{*}, u^{*} > \in U$ . In the situation where specialists assign three kinds of labels and two kinds of labels, PACOL employs different evaluation functions accordingly. The third step involves selecting the concept pair with the maximum evaluation function value. In the fourth step, the selected concept pairs are given to specialists to assign labels. The fifth step involves updating the labeled dataset $L$ and the unlabeled dataset $U$ . Then the judgment statement is executed to decide whether to continue the loop. If the output of the judgment statement is true, the process is repeated. If the output is false, the trained classifier Δ is outputted. Then we analyze the time and space complexity of PACOL. Let N represent the number of repeats. In each repeat, PACOL performs five operations: training the classifier, calculating the evaluation function value, selecting concept pairs, querying labels, and updating data sets. The time complexity of every operation is O (1). Consequently, the overall time complexity of PACOL is O (N). Assuming $U$ contains M instances. In the step of calculating evaluation function value, PACOL needs to store all concept pairs along with their corresponding evaluation function values. Therefore, the space complexity of PACOL is O (M).

Fig. 3

Flowchart of PACOL.

4.2 Evaluation function when specialists assign three kinds of labels

When specialists assign three kinds of labels, concept prerequisite learning can be seen as a triple classification problem. In this situation, $Y^{Tri} = {1, 0, - 1}$ .

As shown in Table 3, when specialists assign three kinds of labels to concept pairs, it is a fact that concept u and concept v cannot be prerequisite of each other simultaneously: $y_{uv}^{Tri} = {\begin{matrix} 1 & if y_{vu}^{Tri} = - 1 \\ 0 & if y_{vu}^{Tri} = 0 \\ - 1 & if y_{vu}^{Tri} = 1 \end{matrix},$ (3) which could be described simply, $y_{uv}^{Tri} = {- y}_{vu}^{Tri} .$ (4)

Table 3

Some examples of concept pair and its reciprocal concept pair when specialists assign three kinds of labels

No.	Concept pair <u, v>	$y_{uv}^{Tri}$	Concept pair <v, u>	$y_{vu}^{Tri}$
1	“<Function, Derivative> ”	1	“<Derivative, Function> ”	-1
2	“<Derivative, Function> ”	-1	“<Function, Derivative> ”	1
3	“<Derivative, Integration> ”	1	“<Integration, Derivative> ”	-1
4	“<Integration, Derivative> ”	-1	“<Derivative, Integration> ”	1
...
19	“<Differentiation, Integration> ”	0	“<Integration, Differentiation> ”	0
20	“<Integration, Differentiation> ”	0	“<Differentiation, Integration> ”	0

Based on feature x_uv, the prediction of the model Δ^Tri for the concept pair <u, v> is denoted as Δ^Tri (x_uv). Based on feature x_vu, the prediction of the model Δ^Tri for the concept pair <v, u> is denoted as Δ^Tri (x_vu).

Let $P_{u \to v}^{Tri} = P (Δ^{Tri} (x_{uv}) = 1 | x_{uv}, L)$ and $P_{v \to u}^{Tri} = P (Δ^{Tri} (x_{vu}) = 1 | x_{vu}, L)$ . In ideal conditions, when Δ^Tri predicts whether concept u is prerequisite to concept v based on x_uv, and whether concept v is prerequisite to concept u based on feature x_vu, the difference between $P_{u \to v}^{Tri}$ and $P_{v \to u}^{Tri}$ should be as large as possible. The smaller the difference is, the more likely the concept pair <u, v> and the concept pair <v, u> are to be selected.

$G_{\to}^{Tri} (x_{uv}, x_{vu})$ represents the divergence between the predictions of Δ^Tri and the fact, when Δ^Tri predicts whether the label equals 1, that is $G_{\to}^{Tri} (x_{uv}, x_{vu}) = 1 - | P_{u \to v}^{Tri} - P_{v \to u}^{Tri} | .$ (5)

Let $P_{u \to v}^{Tri} = P (Δ^{Tri} (x_{uv}) = 0 | x_{uv}, L)$ and $P_{v \to u}^{Tri} = P (Δ^{Tri} (x_{vu}) = 0 | x_{vu}, L)$ . In reality, based on feature x_uv and feature x_vu, when Δ^Tri predicts whether labels equal 0, the difference between $P_{u \to v}^{Tri}$ and $P_{v \to u}^{Tri}$ should be as small as possible. In other words, the bigger the difference is, the more likely the concept pair <u, v> and the concept pair <v, u> are to be selected.

$G_{\to}^{Tri} (x_{uv}, x_{vu})$ represents the divergence between the predictions of Δ^Tri and the fact, when Δ^Tri predicts whether the label equals 0, that is $G_{\to}^{Tri} (x_{uv}, x_{vu}) = | P_{u \to v}^{Tri} - P_{v \to u}^{Tri} | .$ (6)

Let $P_{u \leftarrow v}^{Tri} = P (Δ^{Tri} (x_{uv}) = - 1 | x_{uv}, L)$ and $P_{v \leftarrow u}^{Tri} = P (Δ^{Tri} (x_{vu}) = - 1 | x_{vu}, L)$ . Under ideal conditions, based on feature x_uv and x_vu, when Δ^Tri predicts whether concept v is prerequisite to concept u based on x_uv, and whether concept u is prerequisite to concept v based on feature x_vu, the difference between $P_{u \leftarrow v}^{Tri}$ and $P_{v \leftarrow u}^{Tri}$ should be as large as possible. The smaller the difference is, the more likely the concept pair <u, v> and the concept pair <v, u> are to be selected.

$G_{\leftarrow}^{Tri} (x_{uv}, x_{vu})$ represents the divergence between the predictions of Δ^Tri and the fact, when Δ^Tri predicts whether the label equals -1, that is $G_{\leftarrow}^{Tri} (x_{uv}, x_{vu}) = 1 - | P_{u \leftarrow v}^{Tri} - P_{v \leftarrow u}^{Tri} | .$ (7)

We derive the evaluation function G^Tri (x_uv, x_vu), $\begin{matrix} \begin{matrix} G^{Tri} (x_{uv}, x_{vu}) = G_{\to}^{Tri} (x_{uv}, x_{vu}) \\ + G_{\to}^{Tri} (x_{uv}, x_{vu}) \\ + G_{\leftarrow}^{Tri} (x_{uv}, x_{vu}), \end{matrix} \end{matrix}$ (8) which is used when specialists assign three kinds of labels.

4.3 Evaluation function when specialists assign two kinds of labels

When specialists assign two kinds of labels, concept prerequisite learning can be viewed as a binary classification problem. In this situation, $Y^{Bin} = {1, - 1}$ .

As shown in Table 4, when specialists assign two kinds of labels to concept pairs, there is a fact: $y_{uv}^{Bin} = - 1 when y_{vu}^{Bin} = 1,$ (9) but not vice versa. In other words, $y_{uv}^{Bin} = 1$ and $y_{vu}^{Bin} = 1$ cannot be valid simultaneously, $y_{uv}^{Bin} = - 1$ and $y_{vu}^{Bin} = - 1$ can be valid simultaneously.

Table 4

Some examples of concept pair and reciprocal concept pairs when specialists assign two kinds of labels

No.	Concept pair <u, v>	$y_{uv}^{Bin}$	Concept pair <v, u>	$y_{vu}^{Bin}$
1	“<Function, Derivative> ”	1	“<Derivative, Function> ”	-1
2	“<Derivative, Function> ”	-1	“<Function, Derivative> ”	1
3	“<Derivative, Integration> ”	1	“<Integration, Derivative> ”	-1
4	“<Integration, Derivative> ”	-1	“<Derivative, Integration> ”	1
...
19	“<Differentiation, Integration> ”	-1	“<Integration, Differentiation> ”	-1
20	“<Integration, Differentiation> ”	-1	“<Differentiation, Integration> ”	-1

Based on feature x_uv, the prediction of the model Δ^Bin for the concept pair <u, v> is denoted as Δ^Bin (x_uv). Based on feature x_vu, the prediction of the model Δ^Bin for the concept pair <v, u> is denoted as Δ^Bin (x_vu).

Let $P_{u \to v}^{Bin} = P (Δ^{Bin} (x_{uv}) = 1 | x_{uv}, L)$ and $P_{v \to u}^{Bin} = P (Δ^{Bin} (x_{vu}) = 1 | x_{vu}, L)$ . In ideal conditions, when the binary classifier Δ^Bin predicts whether concept u is prerequisite to concept v and whether concept v is prerequisite to concept u, the difference between $P_{u \to v}^{Bin}$ and $P_{v \to u}^{Bin}$ should be as large as possible. The smaller the difference is, the more likely the concept pair <u, v> and the concept pair <v, u> are to be selected.

$G_{\to}^{Bin} (x_{uv}, x_{vu})$ represents the divergence between the predictions of Δ^Bin and the fact, when Δ^Bin predicts whether the label equals 1, that is $G_{\to}^{Bin} (x_{uv}, x_{vu}) = 1 - | P_{u \to v}^{Bin} - P_{v \to u}^{Bin} | .$ (10)

We derive the evaluation function G^Bin (x_uv, x_vu), $G^{Bin} (x_{uv}, x_{vu}) = G_{\to}^{Bin} (x_{uv}, x_{vu}),$ (11) which is used in binary classification problem.

5 Experiments

We conduct experiments to answer the following questions.

Question 1: Can active learning methods greatly reduce labeling expense in both triple classification problem and binary classification problem?

Question 2: Does PACOL offer a greater reduction in labeling expense than other methods?

Question 3: Does PACOL outperform other active learning methods, regardless of whether or not relational reasoning is introduced?

5.1 Data sets

There are two public data sets named MOOCCube [9] and LectureBank [10]. MOOCCube and LectureBank are widely used in concept prerequisite learning [11, 23–25], which involves utilizing machine learning methods to infer the prerequisite relation between concept pairs. MOOCCube contains concepts collected from computer science courses and math courses, respectively. For convenience, we call these two data sets MCS and MMT in this paper. There are 231 concepts collected from computer science courses in MCS. MCS contains various concepts such as “operating system”, “upload”, and “octal”. There are 221 mathematics concepts in MMT, such as “wiener process” and “chain rule”. LectureBank contains 208 concepts collected from network courses, such as “metropolitan area network”, and “hypertext transfer protocol server”. For convenience, we call this data set LNW in this paper.

We constructed data sets named EduRelation from three education subjects: primary math, secondary math, and data structure. For convenience, we abbreviate data sets as EPM, ESM, and EDS in this paper. The rationale behind constructing three new data sets is as follows. While MOOCCube contains 221 mathematics concepts, the majority of them, such as “wiener process” and “chain rule,” are excessively complex. Therefore, our endeavor is to construct data sets for primary and secondary math courses. Furthermore, data structure is a fundamental course in computer departments. The learning order of concepts in the data structure course is important. For instance, students need to master concept "subtree” before they learn concept “left subtree". Hence, we aim to construct a data set specifically for the data structure course.

Firstly, we obtained educational concepts from textbooks and generated concept pairs. Then we assigned concept pairs to domain specialists for annotating and we got the final labels by majority voting. Finally, in order to ensure the correctness of the labels, we solicited the assistance of other specialists and asked them to review the entire data set.

Table 5 presents a summary of the original data sets’ details. There are a large number of pages with defined formats that can give comprehensive explanations of concepts in Wikipedia. Following Liang, for each concept pair, we extracted 17 text-based features and 15 graph-based features from Wikipedia [13]. And these features were used to train the classifier.

Table 5
Description of the original data sets

Data set # of Concepts # of Concept pairs # of Prerequisite relations

EPM 138 18906 868

ESM 260 67340 1824

EDS 176 30800 901

MCS 231 53130 501

MMT 221 48620 519

LNW 208 43056 921

Data set	# of Concepts	# of Concept pairs	# of Prerequisite relations
EPM	138	18906	868
ESM	260	67340	1824
EDS	176	30800	901
MCS	231	53130	501
MMT	221	48620	519
LNW	208	43056	921

5.2 Experiments when specialists assign three kinds of labels

As shown in Table 6, we generated data sets suitable for the binary classification problem from the original data sets, and we named them EPM-Tri, ESM-Tri, EDS-Tri, MCS-Tri, MMT-Tri, LNW-Tri. For example, EPM-Tri is constructed according to EPM, there are 18906 concept pairs <u, v> in EPM-Tri. Among these concept pairs, the number of concept pairs <u, v> that u is prerequisite to v equals 868, and the number of concept pairs <u, v> that v is prerequisite to u also equals 868, and there are 17170 concept pairs <u, v> that there is no prerequisite relation between u and v.

Table 6
Description of constructed data sets for binary classification problem

Data set # of Concept Pairs(<u, v>) # of <u, v> # of <u, v> # of <u, v>

u → v u ↮ v u ← v

EPM-Tri 18906 868 17170 868

ESM-Tri 67340 1824 63692 1824

EDS-Tri 30800 901 28998 901

MCS-Tri 53130 501 52128 501

MMT-Tri 48620 519 47582 519

LNW-Tri 43056 921 41214 921

Data set	# of Concept Pairs(<u, v>)	# of <u, v>	# of <u, v>	# of <u, v>
EPM-Tri	18906	868	17170	868
ESM-Tri	67340	1824	63692	1824
EDS-Tri	30800	901	28998	901
MCS-Tri	53130	501	52128	501
MMT-Tri	48620	519	47582	519
LNW-Tri	43056	921	41214	921

We chose common machine learning methods as the classifier, such as Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), and Random Forest (RF). In our experiments, we employed One-Versus-the-Rest(OVR) algorithm [41] for each classifier to solve triple classification problem. Synthetic minority over-sampling technique (SMOTE) [42] and random undersampling were used to solve the data imbalance problem.

5-fold cross validation was applied on all data sets. Micro precision, micro recall, micro F1-score, and micro AUC of ROC were reported as evaluation metrics. The results are shown in Table 7. Among all six data sets, all of the models perform well in ESM-Tri, which is probably because the proportion of prerequisite relations in ESM-Tri is higher than the other five data sets.

Table 7

Comparison results of triple classifiers

Metric	Data set	NB	RF	SVM	LR	GBDT
micro	EPM-Tri	0.906	0.937	0.916	0.912	0.925
precision	ESM-Tri	0.922	0.949	0.937	0.939	0.956
	EDS-Tri	0.882	0.918	0.894	0.891	0.905
	MCS-Tri	0.904	0.921	0.923	0.932	0.912
	MMT-Tri	0.812	0.891	0.803	0.822	0.842
	LNW-Tri	0.925	0.941	0.952	0.927	0.946
micro	EPM-Tri	0.889	0.915	0.894	0.897	0.924
recall	ESM-Tri	0.917	0.939	0.933	0.949	0.944
	EDS-Tri	0.907	0.919	0.928	0.911	0.925
	MCS-Tri	0.892	0.942	0.939	0.923	0.930
	MMT-Tri	0.756	0.868	0.808	0.878	0.801
	LNW-Tri	0.916	0.939	0.907	0.949	0.915
micro	EPM-Tri	0.897	0.925	0.904	0.904	0.924
F1	ESM-Tri	0.919	0.943	0.934	0.943	0.949
	EDS-Tri	0.894	0.918	0.910	0.900	0.919
	MCS-Tri	0.897	0.930	0.931	0.928	0.927
	MMT-Tri	0.783	0.879	0.805	0.849	0.820
	LNW-Tri	0.921	0.940	0.928	0.938	0.930
micro	EPM-Tri	0.717	0.852	0.728	0.718	0.882
AUC	ESM-Tri	0.715	0.893	0.761	0.767	0.805
of ROC	EDS-Tri	0.698	0.865	0.721	0.729	0.842
	MCS-Tri	0.672	0.801	0.558	0.625	0.792
	MMT-Tri	0.710	0.884	0.566	0.643	0.845
	LNW-Tri	0.695	0.823	0.627	0.658	0.796

As demonstrated in Table 7, NB has a poor performance in all evaluation metrics, because the features cannot meet the premise of strong independence. LR and SVM show similar performance, particularly in terms of micro F1-score. RF performs better than the other four methods. This might be because of that RF employs random feature selection, whereas some methods rely on linear combination of features. In the subsequent experiments, RF was chosen as the triple classifier, and micro AUC of ROC was used as the evaluation metric.

Standard active learning

We conducted comparison experiments on all data sets. The data was divided into the training set $D_{train}$ and the test set $D_{test}$ . $D_{train}$ contains 80% instances and $D_{test}$ contains 20% instances. We then created the initial labeled data set $L$ by randomly selecting 10% of instances from $D_{train}$ , and the rest $U = D_{train} - L$ were viewed as the unlabeled data set. Based on query strategy, one concept pair and its reciprocal concept pair were selected from $U$ in each iteration. Then $L$ were updated by adding selected concept pairs. The model was retrained on the updated $L$ . Taking into account the influence of initial labeled data set, we ran each active learning method 200 times by generating initial labeled data set randomly.

Our comparative strategies are as follows.

•Random: Random strategy selects instances from $U$ randomly.

•LAL: LAL strategy selects instances by using a trained regressor.

•QBC: QBC strategy selects instance that is more difficult to be distinguished by committees’ voting. We use query-by-bagging [43] to construct committees.

•ALLSH: ALLSH strategy selects the instances whose predictive likelihoods diverge the most from their perturbations.

•CPAL: CPAL strategy selects instances whose labels are not agreed by the classifier. But instances where the label indicates no prerequisite cannot be selected because of the limitation of the evaluation function.

•MCLU: MCLU strategy selects the two instances farthest from the classification interface.

•KMC: KMC strategy selects instances by using discriminative sparse kernel machines and Bayesian model.

LAL, QBC, and ALLSH can be utilized in both binary classification problems and multi-classification problems. And in this section, they are specifically applied to the triple classification problem. Figure 4 shows the experimental results.

Fig. 4

Comparison of standard active learning methods in the triple classification problem.

Active learning after introducing relational reasoning

Liang introduced relational reasoning to enhance commonly used query strategies [40]. Liang improved standard active learning methods by incorporating relational reasoning in the process of selecting valuable instances and expanding the training set.

Following Liang, we introduced relational reasoning to PACOL and other comparative models. And we named them PACOL-R, Random-R, LAL-R, QBC-R, ALLSH-R, CPAL-R, MCLU-R, and KMC-R respectively. Then we conducted comparative experiments on all six data sets. Figure 5 shows the performance of active learning methods after introducing relational reasoning.

Fig. 5

Comparison of methods after introducing relational reasoning in the triple classification problem.

5.3 Experiments when specialists assign two kinds of labels

As shown in Table 8, we generated data sets suitable for the binary classification problem from the original data sets, and we named them EPM-Bin, ESM-Bin, EDS-Bin, MCS-Bin, MMT-Bin, LNW-Bin.

Table 8
Description of constructed data sets for binary classification problem

Data set #Concept #Concept Pairs #Concept Pairs

Pairs(<u, v>) u → v u ↛ v

EPM-Bin 18906 868 18038

ESM-Bin 67340 1824 65516

EDS-Bin 30800 901 29899

MCS-Bin 53130 501 52629

MMT-Bin 48620 519 48101

LNW-Bin 43056 921 42135

Data set	#Concept	#Concept Pairs	#Concept Pairs
EPM-Bin	18906	868	18038
ESM-Bin	67340	1824	65516
EDS-Bin	30800	901	29899
MCS-Bin	53130	501	52629
MMT-Bin	48620	519	48101
LNW-Bin	43056	921	42135

For example, EPM-Bin is constructed according to EPM, there are 18906 concept pairs <u, v> in EPM-Bin. Among these concept pairs, the number of concept pairs <u, v> that u is prerequisite to v equals 868, and the number of concept pairs <u, v> that u is not prerequisite to v equals 18038.

Similar to the conduction in the triple classification problem, we chose SVM, NB, LR, GBDT, and RF as the classifier. SMOTE was used for data augmentation and random undersampling was used to solve the data imbalance problem. 5-fold cross validation was applied in all data sets. Then average precision, recall, F1-score, and AUC of ROC were reported.

Table 9 shows that the classification results vary depending on methods used. Similar to the performance of classifiers in the triple classification problem, RF and GBDT perform well on all evaluation metrics. In the following experiments, we chose GBDT as the binary classifier.

Table 9

Comparison results of binary classifiers

Metric	Data set	NB	RF	SVM	LR	GBDT
precision	EPM-Bin	0.852	0.903	0.882	0.886	0.904
	ESM-Bin	0.887	0.907	0.894	0.892	0.912
	EDS-Bin	0.856	0.933	0.874	0.869	0.88
	MCS-Bin	0.834	0.910	0.887	0.884	0.905
	MMT-Bin	0.826	0.889	0.795	0.880	0.931
	LNW-Bin	0.869	0.895	0.881	0.902	0.922
recall	EPM-Bin	0.792	0.845	0.823	0.837	0.831
	ESM-Bin	0.802	0.827	0.834	0.825	0.832
	EDS-Bin	0.768	0.838	0.829	0.841	0.836
	MCS-Bin	0.830	0.908	0.892	0.870	0.915
	MMT-Bin	0.818	0.906	0.807	0.915	0.909
	LNW-Bin	0.866	0.916	0.866	0.899	0.914
F1	EPM-Bin	0.861	0.892	0.867	0.876	0.887
	ESM-Bin	0.876	0.914	0.907	0.905	0.917
	EDS-Bin	0.841	0.904	0.882	0.889	0.903
	MCS-Bin	0.832	0.909	0.889	0.877	0.910
	MMT-Bin	0.822	0.897	0.801	0.897	0.920
	LNW-Bin	0.868	0.906	0.874	0.900	0.917
AUC of ROC	EPM-Bin	0.803	0.865	0.853	0.837	0.881
	ESM-Bin	0.851	0.879	0.866	0.845	0.902
	EDS-Bin	0.803	0.874	0.831	0.845	0.897
	MCS-Bin	0.681	0.859	0.609	0.769	0.857
	MMT-Bin	0.662	0.789	0.675	0.729	0.787
	LNW-Bin	0.720	0.849	0.758	0.765	0.853

Standard active learning

Similar to the experiments in the triple classification problem, the data was divided into the training set $D_{train}$ and the test set $D_{test}$ . We selected 10% of instances from $D_{train}$ as initial labeled data set $L$ , and the rest as $U$ . Taking into consideration the influence of initial labeled data set, each active learning method was conducted 100 times by randomly generating initial $L$ .

We use Random, LAL, QBC, ALLSH, DS, and QUIRE as comparative strategies. These methods can be used in both binary classification problems and multi-classification problems. And in this section, they are used for the binary classification problem. The descriptions of DS and QUIRE are as follows.

•DS: DS strategy selects unlabeled instances whose feature gap between the labeled instances is the largest, thus incorporating diversity of the labeled instances in active learning.

•QUIRE: QUIRE strategy selects the instance with the least distance from the classification surface formed by the labeled instances now.

Figure 6 shows the experimental results in the binary classification problem. The performance of various active learning methods on distinct data sets varies dramatically. However, PACOL generally outperforms the other methods.

Fig. 6

Comparison of standard active learning methods in the binary classification problem.

Active learning after introducing relational reasoning

Similar to the conduction when specialists assign three kinds of labels, we added relational reasoning to PACOL and other comparative methods. We named them PACOL-R, Random-R, LAL-R, QBC-R, ALLSH-R, DS-R, and QUIRE-R, respectively. Figure 7 demonstrates the experimental results of active learning methods after introducing relational reasoning.

Fig. 7

Comparison of methods after introducing relational reasoning in the binary classification problem.

5.4 Analysis

We conduct experiments to answer questions mentioned at the beginning of this chapter.

Answer 1: As we can see from all figures in this chapter, when no unlabeled instances have been queried, the performances of Random strategy and other active learning methods are same. As the number of queries rises, other active learning methods exhibit a more rapid improvement in the classifier’s performance compared to the Random strategy. This holds true for both triple classification problem and binary classification problem. The classifier trained using concept pairs selected by other active learning methods performs better than the classifier using Random strategy, when the number of labeled instances is fixed. This implies that active learning methods can reduce labeling expense in concept prerequisite learning.

Answer 2: As shown in Figs. 4 and 6, when the number of labeled instances is fixed, the performance of the classifier that has been trained using data selected by PACOL is better than the performance of the classifier that has been trained using data selected by other active learning methods. In other words, when the labeling expense is fixed, instances selected by PACOL are more informative compared to instances selected by other active learning methods. As shown in Figs. 4 and 6, when all active learning methods perform equally, PACOL requires a smaller number of labeled instances compared to other active learning methods. To put it another way, PACOL can reduce more labeling expenses than other methods.

Answer 3: As shown in Figs. 4 and 6, PACOL performs better than other methods in standard situation. Besides, as we can see from Figs. 5 and 7, after introducing relational reasoning into PACOL and other comparative methods, PACOL still outperforms other methods. In other words, PACOL performs better than other active learning methods, whether or not relational reasoning is introduced into active learning.

Then we analyze every figure in this chapter. As shown in Fig. 4, when the classifier is trained on initial labeled data set, the performances of all active learning methods are same. PACOL performs better than other models virtually all of the time when the number of labeled instances is fixed. This is due to the fact that PACOL selects the concept pair with the greatest deviation between the classifier’s prediction and the fact. So the selected concept pair contains more information. Then the selected concept pairs are used to train a well-performed classifier. Unlike other active learning methods that treat all instances equally, PACOL can obtain a well-performed model at the early stage. The results of experiments demonstrate that PACOL excels at selecting the most informative instances for concept prerequisite learning.

In addition, for the other models besides PACOL, CPAL performs better in general. The idea of CPAL is selecting two concept pairs where the classifier disagrees on their labels. However, CPAL is unable to select instances with label 0, which is due to its limitation. CPAL performs better than other active learning methods. This is probably due to the fact that there are many instances with label 0 in the data sets, which are less important than other instances. LAL strategy, ALLSH strategy, and QBC strategy perform worse than MCLU strategy and KMC strategy. This is most likely due to the fact that the former two strategies are designed for both binary classification and multi-classification problems, while the latter two strategies are designed particularly for multi-classification problems. ALLSH strategy outperforms both the LAL and QBC strategies in the triple classification problem. This is likely attributed to the effectiveness of ALLSH in handling imbalanced data sets.

Figure 5 shows that PACOL-R still outperforms the other models after the introduction of relational reasoning. This proves that PACOL performs better than the other models in triple classification problem, regardless of whether relational reasoning is introduced or not.

As Fig. 6 shows, PACOL performs better than other models most of the time when the number of labeled instances is fixed. After a certain proportion of instances have been labeled, the performances of active learning models reach a stability stage. This might be due to the abundance of similar instances in the data sets, which offer limited information. Compared to other active learning methods, the performance of PACOL stabilizes significantly faster, which proves that PACOL can select useful instances more rapidly than other active learning methods. In general, QBC strategy and ALLSH strategy outperform the other models in the binary classification problem. QBC strategy demonstrates the advantages of using an ensemble method. Additionally, ALLSH strategy showcases the effectiveness of utilizing local sensitivity. Figures 4 and 6 demonstrate the effectiveness of PACOL in standard active learning.

As shown in Fig. 7, PACOL-R still outperforms the other methods after the introduction of relational reasoning. In other words, PACOL performs better than the other active learning methods in binary classification problem whether introducing relational reasoning or not. Figure 5 and 7 demonstrate the effectiveness of PACOL after introducing relational reasoning.

Discussion: In the situation where specialists assign three kinds of labels, Figs. 4 and 5 demonstrate the effectiveness of PACOL. The former shows the performance in standard active learning, and the latter shows the performance after introducing relational reasoning. Figures 6 and 7 demonstrate the effectiveness of PACOL in the situation where specialists assign two kinds of labels. Similarly, the former shows standard active learning and the latter shows active learning after introducing relational reasoning. Regardless of whether the specialists assign three or two kinds of labels, PACOL outperforms several existing active learning methods. But we cannot compare the performance of PACOL in the situations where specialists assign three and two kinds of labels, because we are unaware of the difference of labeling expenses between the two situations.

6 Conclusion

A novel pool-based active learning framework for concept prerequisite learning named PACOL is proposed in this paper. PACOL can be used in situations where the kinds of labels given by the specialists are three or two. In other words, PACOL is designed for using in both triple classification problem and binary classification problem in concept prerequisite learning. PACOL utilizes the fact of concept prerequisite relations to perform computational analysis on one concept pair and its reciprocal concept pair. Experimental results on our data sets and other public data sets reveal that PACOL outperforms several existing active learning methods in both problems, regardless of whether or not relational reasoning is introduced.

Future work can be explored in the following potential aspects. Firstly, we will try to extract prerequisite relations from courses by using natural language processing approaches. Secondly, we will try to construct the educational knowledge map based on concept relations. Thirdly, we will introduce the prerequisite relation into the process of exercise recommendation.

Footnotes

Acknowledgments

This work is funded by Youth Innovation Promotion Association of Chinese Academy of Sciences (CAS).

10% of the data from the constructed data sets are provided for review. The data and code can be obtained from .

References

Bloom

B.S.

, All our children learning, MCGRAW-HILL, 1981.

Ramirez-Noriega

Alan

, Juarez-Ramirez

Reyes

, Jimenez

Samantha

, Martinez-Ramirez

Yobani

and Perez

J. Francisco Figueroa

, Determination of the course sequencing to intelligent tutoring systems using an ontology and Wikipedia, Journal of Intelligent & Fuzzy Systems 34(5) (2018), 3177–3185.

Wang

Shuting

, Ororbia

Alexander

, Wu

Zhaohui

, Williams

Kyle

, Liang

Chen

, Pursel

Bart

and Giles

C. Lee

, Using Prerequisites to Extract Concept Maps from Textbooks. In CIKM. pages 317–326. ACM, 2016.

Chen

Penghe

, Lu

, Zheng

Vincent W.

and Pian

Yang

, Prerequisite-Driven Deep Knowledge Tracing. In 2018 ICDM, pages 39–48, Singapore, November 2018. IEEE.

Talukdar

Partha

and Cohen

William

, Crowdsourced Comprehension: Predicting prerequisite structure inWikipedia. In Proceedings of the SeventhWorkshop on Building Educational Applications Using NLP, pages 307–315. ACL, June 2012.

Liang

Chen

, Wu

Zhaohui

, Huang

Wenyi

and Giles

C. Lee

, Measuring Prerequisite Relations Among Concepts. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1668–1674, Lisbon, Portugal, ACL (2015).

Sayyadiharikandeh

Mohsen

, Gordon

Jonathan

, Ambite

Jose-Luis

and Lerman

Kristina

, Finding prerequisite relations using the wikipedia clickstream. In Companion Proceedings of The 2019 World Wide Web Conference, WWW’19, pages 1240–1247, New York, NY, USA, May ACM (2019).

Liu

Jun

, Jiang

, Wu

Zhaohui

, Zheng

Qinghua

and Qian

Yanan

, Mining learning-dependency between knowledge units from text, The VLDB Journal 20(3) (2011), 335–345.

Pan

Liangming

, Li

Chengjiang

, Li

Juanzi

and Tang

Jie

, Prerequisite relation learning for Concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the ACL, pages 1447–1456, ACL 2017.

10.

Irene

, Fabbri

Alexander R.

, Tung

Robert R.

and Radev

Dragomir R.

, What should i learn first: Introducing lecturebank for nlp education and Prerequisite Chain Learning, Proceedings of the AAAI Conference on Artificial Intelligence 33(01) (2019), 6674–6681.

11.

Xiao

Kui

, Bai

Youheng

and Zhang

Yan

, Extracting precedence relations between video lectures in moocs. In Proceedings of the 2022 International Conference on Multimedia Retrieval, pages 608–614, Newark NJ USA, June 2022. ACM.

12.

Gordon

Jonathan

, Zhu

Linhong

, Galstyan

Aram

, Natarajan

Prem

and Burns

Gully

, Modeling concept dependencies in a scientific corpus. In Proceedings of the 54th Annual Meeting of the ACL, pages 866–875, Berlin, Germany, August 2016 ACL.

13.

Liang

Chen

, Ye

Jianbo

, Wang

Shuting

, Pursel

Bart

and Giles

C. Lee

, Investigating active learning for concept prerequisite learning, Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018), 7913–7919.

14.

Xinying

, He

and Sun

Guangzhong

, Active Learning for concept prerequisite learning in Wikipedia. In 2021 13th International Conference on Machine Learning and Computing, ICMLC 2021, pages 582–587, New York, NY, USA, February 2021. ACM.

15.

Tuo

Meimei

and Yang

Wenzhong

, Review of entity relation extraction, Journal of Intelligent & Fuzzy Systems, Preprint(Preprint): 1–15, January 2023.

16.

Bai

Youheng

, Zhang

Yan

, Xiao

Kui

, Lou

Yuanyuan

and Sun

Kai

, A bert-based approach for extracting prerequisite relations among wikipedia concepts, Mathematical Problems in Engineering 2021 (2021), e3510402.

17.

Chen

Yetian

, Gonzalez-Brenes

Jose P

and Tian

Jin

, Joint discovery of skill prerequisite graphs and student models, International Educational Data Mining Society (2016), 46–53.

18.

Carmona

Cristina

, Millan

Eva

, Cruz

Jose-Luis Perez-de-la

, Trella

Monica

and Conejo

Ricardo

, Introducing prerequisite relations in a multi-layered bayesian student model. International Conference on user Modeling, pages 347–356. Springer, 2005.

19.

Passalacqua

Samuele

, Koceva

Frosina

, Alzetta

Chiara

, Torre

Ilaria

and Adorni

Giovanni

, Visualisation Analysis for Exploring Prerequisite Relations in Textbooks. In iTextbooks@ AIED, pages 18–29, 2019.

20.

Ayala-Gomez

Frederick

, Daroczy

Balint

, Benczur

Andras

, Mathioudakis

Michael

and Gionis

Aristides

, Global citation recommendation using knowledge graphs, Journal of Intelligent & Fuzzy Systems 34(5) (2018), 3089–3100.

21.

Alzetta

Chiara

, Miaschi

Alessio

, Adorni

Giovanni

, Dell’Orletta

Felice

and Koceva

Frosina

, Passalacqua

Samuele

and Torre

Ilaria

. Prerequisite or not prerequisite? that’s the problem! an nlp-based approach for concept prerequisite learning. In CLiCit, page 8, 2019.

22.

Yang

Yiming

, Liu

Hanxiao

, Carbonell

Jaime

and Ma

Wanli

, Concept graph learning from educational data. In WSDM, pages 159–168. ACM, 2015.

23.

Sun

Hao

, Li

Yuntao

and Zhang

Yan

, ConLearn: Contextual knowledge-aware Concept Prerequisite Relation Learning with Graph Neural Network. In 2022 SDM, pages 118–126, 2022.

24.

Zhang

Juntao

, Lin

Nanzhou

, Zhang

Xuelong

, Song

Wei

, Yang

Xiandi

and Peng

Zhiyong

, Learning Concept Prerequisite Relations from Educational Data via Multi-Head Attention Variational Graph Auto-Encoders. In SDM, pages 1377–1385, 2022.

25.

Jia

Chenghao

, Shen

Yongliang

, Tang

Yechun

, Sun

and Lu

Weiming

, Heterogeneous graph neural networks for concept prerequisite relation learning in educational data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2036–2047, Online, 2021. ACL.

26.

Bhatt

Arpita Jadhav

, Gupta

Chetna

and Mittal

Sangeeta

, iABC-AL: Active learning-based privacy leaks threat detection for iOS applications, Journal of King Saud University – Computer and Information Sciences 33(7) (2021), 769–786.

27.

Quanbao

, Wei

Fajie

and Zhou

Shenghan

, Early warning systems for multi-variety and small batch manufacturing based on active learning, Journal of Intelligent & Fuzzy Systems 33(5) (2017), 2945–2952.

28.

Zhao

Jianhua

, Liu

Ning

and Malov

, Safe semi-supervised classification algorithm combined with active learning sampling strategy, Journal of Intelligent & Fuzzy Systems 35(4) (2018), 4001–4010.

29.

Lewis

David D

, A sequential algorithm for training text classifiers: Corrigendum and additional data. In Acm Sigir Forum, volume 29, pages 13–19. ACM New York, NY, USA, 1995.

30.

Burbidge

Robert

, Rowland

Jem J.

and King

Ross D.

, Active Learning for regression based on Query by Committee. In Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne and Xin Yao, editors, Intelligent Data Engineering and Automated Learning – IDEAL Lecture Notes in Computer Science, pages 209–218, Berlin, Heidelberg, 2007. Springer.

31.

Yong Cheng

, Active learning based on diversity maximization, Applied Mechanics and Materials 347–350 (2013), 2548–2552.

32.

Huang

Sheng-Jun

, Jin

Rong

and Zhou

Zhi-Hua

, Active learning by querying informative and representative examples, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(10) (2014), 1936–1949.

33.

Konyushkova

Ksenia

, Sznitman

Raphael

and Fua

Pascal

, Learning Active Learning from Data, Advances in Neural Information Processing Systems 30 (2017).

34.

Cai

Wenbin

, Zhang

and Zhou

Jun

, Maximizing Expected Model Change for Active Learning in Regression. In 2013 ICDM, pages 51–60. IEEE Computer Society, December 2013.

35.

Zhang

Shujian

, Gong

Chengyue

, Liu

Xingchao

, He

Pengcheng

, Chen

Weizhu

and Zhou

Mingyuan

, Allsh: Active learning guided by local sensitivity and hardness. In Findings of the ACL: NAACL 2022, pages 1328–1342, 2022.

36.

Campbell

I.C.G.

, Cristianini

and Smola

, Query Learning with Large Margin Classifiers, Proc. 17th Int Conference on Machine Learning. 1 (2000), 111–118.

37.

Demir

Begum

, Persello

Claudio

and Bruzzone

Lorenzo

, Batch-mode active-learning methods for the interactive classification of remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 49(3) (2011), 1014–1031.

38.

Shi

Weishi

and Yu

, Integrating bayesian and discriminative sparse kernel machines for multi-class active learning. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.

39.

Liu

Dongjiang

and Liu

Yanbi

, An active learning algorithm for multi-class classification, Pattern Analysis and Applications 22(3) (2019), 1051–1063.

40.

Liang

Chen

, Ye

Jianbo

, Zhao

Han

, Pursel

Bart

and Giles

C. Lee

, Active learning of strict partial orders: A case study on concept prerequisite relations. In 12th International Conference on Educational Data Mining, EDM 2019, pages 348–353. International Educational Data Mining Society, 2019.

41.

Wei

, Gao

Xiaorong

and Gao

Shangkai

, One-versus-therest(OVR) Algorithm: An extension of common spatial patterns (CSP) Algorithm to multi-class case. In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pages 2387–2390, 2005.

42.

Chawla

Nitesh V.

, Bowyer

Kevin W.

, Hall

Lawrence O.

and Kegelmeyer

W. Philip

, SMOTE: Synthetic minority oversampling technique, Journal of Artificial Intelligence Research 16(1) (2002), 321–357.

43.

Abe

Naoki

and Mamitsuka

Hiroshi

, Query learning strategies using boosting and bagging, Proceedings of the 25th International Conference on Machine Learning. 388 (1998), 1–9.

Pool-based active learning framework for concept prerequisite learning

Abstract

Keywords

1 Introduction

2.1 Concept prerequisite learning

2.2 Active learning

3 Preliminary

4.1 Overview of PACOL

5.1 Data sets

Table 5 Description of the original data sets Data set # of Concepts # of Concept pairs # of Prerequisite relations EPM 138 18906 868 ESM 260 67340 1824 EDS 176 30800 901 MCS 231 53130 501 MMT 221 48620 519 LNW 208 43056 921

6 Conclusion

Footnotes

Acknowledgments

References

Table 5
Description of the original data sets

Data set # of Concepts # of Concept pairs # of Prerequisite relations

EPM 138 18906 868

ESM 260 67340 1824

EDS 176 30800 901

MCS 231 53130 501

MMT 221 48620 519

LNW 208 43056 921