Sage Journals: Discover world-class research

Abstract

In recent years, there has been a growing trend in utilizing deep learning techniques to solve various NP-hard combinatorial optimization problems, mostly using deep neural networks to generate the solutions directly. In this work, we address a famous combinatorial optimization problem on graphs, the graph coloring problem (GCP), and propose novel ways that train and utilize deep node embeddings to facilitate the problem’s solving. Specifically, we propose to use Transformer to learn the correlation between nodes in graphs. The Transformer learns the node embeddings (feature vectors) such that nodes that might be in the same color in (near-)optimal solutions have close embeddings. To generate the labels, we use a typical GCP heuristic called Tabucol to solve each small training instance multiple times. In this way, the labels are generated more efficiently and robustly as compared to using an exact solver. We then apply the learned embeddings to guide several construction and searching algorithms for the GCP, including Tabucol. Empirical results show that all the algorithms could be improved by utilizing the learned node embeddings, and our methods generalize well to graphs on much larger scales than the training graphs.

Keywords

deep node embeddings combinatorial optimization graph coloring problem

1. Introduction

The graph coloring problem (GCP) is a well-known NP-hard combinatorial optimization problem. It was featured in the second DIMACS implementation challenge¹ as one of the most representative NP-hard problems, alongside the maximum clique and satisfiability problems. Given an undirected graph, the GCP aims to color all nodes with the minimum number of colors, constraining that any two adjacent nodes are in different colors. Graph coloring methods can be applied to study other models such as independent sets (Paschos, 2001), cliques (Li et al., 2018), and relaxation cliques (Zhou et al., 2021), and can also be applied to many real-world applications such as timetabling (de Werra, 1985), allocation problems (Chow & Hennessy, 1990; Wang et al., 2014), and human subjects (Kearns et al., 2006).

Recently, the rise of deep learning has provided some new perspectives for solving combinatorial optimization problems, resulting in many innovative and distinctive methods (Bai et al., 2021; Ireland & Montana, 2022; Kool et al., 2019; Nazari et al., 2018; Paulus et al., 2021). Graph neural networks (GNNs) are a kind of popular network structure for combinatorial optimization problems over graphs (Awasthi et al., 2022; Chen et al., 2024; Hudson et al., 2022; Khalil et al., 2017; Prates et al., 2019), including the GCP (Lemos et al., 2019; Li et al., 2022; Schuetz et al., 2022; Wang et al., 2023). These methods for GCP all use GNNs for node embedding. That is, using the trained GNNs to calculate the embeddings, i.e., feature vectors, of nodes in the graph. Then, they further use the obtained embeddings to try to solve the GCP directly. More details about the related studies are referred to in Section 2.

In this paper, we argue that using deep neural networks (DNNs) to solve the complex NP-hard GCP directly is very difficult. The power of deep learning lies in fitting the learned experiences from a large amount of labeled data, while traditional algorithms are powerful in constructing and searching with well-designed logic. We believe that compared to using DNNs to directly solve the problem, using experiences learned from DNNs to guide traditional algorithms is more appropriate and effective.

We suggest some new ways to use deep node embedding to boost traditional GCP algorithms. We propose to use the encoder of Transformer (Vaswani et al., 2017), a very popular and powerful deep learning model, to learn a low-dimensional representation (i.e. embedding) for each node in a graph, making nodes that should be in the same color (i.e. might be in the same color in optimal or near-optimal solutions) have close embeddings. The embeddings are then used to guide and boost traditional algorithms rather than solve the problem directly, which can make the model concentrate on learning the correlation between nodes in graphs. Note that general GNNs are usually homogeneous, assigning similar embeddings to connected nodes. However, the GCP has heterogeneity, meaning that connected nodes should be in different colors. This is also our reason for selecting Transformer, not GNNs, and the experiments also demonstrate the advantages of Transformer over GNNs.

Our Transformer encoder model is trained in a supervised manner. For each training graph, we generate a label to distinguish whether two nodes should be in the same color. A straightforward method to distinguish whether two nodes should be in the same color is to see whether they are in the same color in the optimal solution of the GCP instance. However, such a method has some disadvantages. On the one hand, a GCP instance might have diverse optimal solutions. Two nodes that are in the same color in an optimal solution might be in different colors in other optimal solutions. On the other hand, obtaining the optimal solution for each training instance is expensive. To address these issues, we propose to use a famous heuristic GCP algorithm, Tabucol (Hertz & de Werra, 1987), to generate 100 diverse optimal or near-optimal solutions for each training graph and use the 100 solutions as the labels for model training. Intuitively, two nodes that are assigned to the same color in most of the 100 solutions are more likely to be in the same color in an optimal solution.

To evaluate the performance of the trained model, i.e. the embeddings learned, we apply the embeddings to improve two kinds of GCP heuristics. One is the construction heuristics (Graf, 2021; Tabakhi, 2016), which constructs a solution by sequentially coloring each node. We select the classical Greedy heuristic (Welsh & Powell, 1967) that is widely used in graph coloring tasks and the Fastcolor algorithm (Lin et al., 2017) that can efficiently color massive graphs. The other is a searching heuristic (Alexiadis & Refanidis, 2016; Hussain et al., 2016; Niu et al., 2018), which maintains one or more solutions and explores the solution space to find better solutions. We select the famous and classical Tabucol algorithm (Hertz & de Werra, 1987) because it is the algorithm to generate the training labels in our methods and, more importantly, the most recent effective searching heuristics (Goudet et al., 2022; Moalic & Gondran, 2018; Zhou et al., 2018) use Tabucol as their local search components. Experiments on GCP instances with up to 4,000 nodes show that both the construction and searching heuristics could be improved under the guidance of the embeddings. Note that the number of nodes in the training graphs ranges from 100 to 200, indicating that our model exhibits good generality.

The main contributions of this work are as follows.

We suggest novel ways of using deep learning to solve the NP-hard GCP. We make the model concentrate on the correlation between elements, i.e. the nodes, in the problem. The model aims to learn the embeddings of the nodes, leading nodes that should be in the same color to close embeddings.

We propose to use a heuristic rather than exact solvers to calculate multiple times to generate labels for model training, which can avoid spending a lot of time running exact solvers and can handle the issue that an instance might have multiple optimal solutions.

We suggest new ways to apply the learned node embeddings to guide some construction and searching GCP heuristics, including Tabucol, the algorithm for generating labels. Results show that they can all be improved by applying the node embeddings, indicating the excellent performance of our methods.

Our training model has excellent generalization capability that can be generalized to instances with much larger scales (e.g., instances with 4,000 nodes) than the training instances with 100 to 200 nodes.

2. Related Work

This section reviews studies that use deep learning to solve the GCP, which can be divided into supervised and unsupervised learning. The GNN-GCP algorithm (Lemos et al., 2019) is trained in a supervised manner. It regards the GCP as a decision problem and uses the embeddings to predict whether a graph can be colored by a given number of colors. Such a prediction is not precise, and thus the reference value of their results is uncertain. Moreover, they generate the labels by calling an exact GCP solver to solve more than 60,000 training instances, which is very time-consuming. Another supervised learning model for the GCP is incorporated in the DLMCOL algorithm (Goudet et al., 2022). To the best of our knowledge, DLMCOL (Goudet et al., 2022) is the only algorithm that combines deep learning with traditional algorithms to solve the GCP in the literature. However, the model in DLMCOL is only used to judge whether a solution might be improved to be a better solution by local search. Thus, deep learning might play a minor role in DLMCOL.

Unsupervised learning models usually regard the GCP as a node classification task and use different GNN models to handle the task, such as graph discrimination network (Li et al., 2022), physics-inspired GNN (Schuetz et al., 2022), and GraphSAGE (Hamilton et al., 2017). These methods investigate the potential of GNNs for solving the GCP, resulting in some novel frameworks. However, their models are trained in an unsupervised manner with only the rule that connected nodes should be assigned to different colors, which we believe is not enough for the models to solve the problem. In other words, their models only know how to avoid conflict without knowing how to minimize the coloring number. Moreover, these unsupervised learning models cannot well handle the problems in the GCP that there might be many equivalent (symmetric) node pairs in each graph and multiple optimal solutions for each graph, making the models inaccurate when facing these kinds of graphs.

Compared to the existing deep learning-based methods, our method assigns a more suitable task for the deep learning model, making it concentrate on the correlation between components of the problem, and the combination of the deep learning model and traditional algorithm results in more practical and effective solving methods.

3. Methodology

This section first presents the overall framework of our methods, then introduces the components used in the framework, including how to generate the training data, the Transformer encoder structure, the loss function, and how to apply the learned embeddings to improve various GCP heuristics.

3.1. General Framework

The framework of our proposed methods is summarized in Figure 1, where Figure 1(a) illustrates the training scheme and Figure 1(b) shows how to use the trained model.

Figure 1.

Overview of the proposed framework.

As shown in Figure 1(a), during the training process, the Transformer encoder receives an initial embedding matrix $X \in R^{n \times d}$ that contains the initial feature vectors with dimensions equal to $d$ of all the $n$ nodes in the training instance, i.e. the graph, and outputs an embedding matrix $f_{θ} (X) \in R^{n \times d}$ that contains the final feature vectors of the nodes. The Tabucol algorithm calculates the instance 100 times with a time limit of 10 seconds for each calculation and outputs a symmetric co-coloring matrix $M \in {\frac{0}{100}, \frac{1}{100}, \dots, \frac{100}{100}}^{n \times n}$ as a label for the training, where $M_{i j}$ indicates the frequency of nodes $i$ and $j$ in the same color in the 100 solving results and can also be regarded as an evaluation of the probability of coloring nodes $i$ and $j$ the same color in the optimal solutions. We use the co-coloring matrix $M$ and the final embedding matrix $f_{θ} (X)$ to calculate the loss function.

As shown in Figure 1(b), we use the trained embedding network to obtain the node embeddings of a testing graph and use the embeddings to do downstream tasks, such as guiding or assisting the heuristic GCP algorithms (e.g., the Greedy, Fastcolor, and Tabucol algorithms in our experiments).

3.2. Training Data Generation

This subsection introduces the methods of generating the training instances and their initial embeddings.

3.2.1. Training Instances

Considering that the GCP instances might have various densities and how to learn the features of instances with different densities might be different, we randomly generate 10,000 graphs for each density $p = [0.1, 0.5, 0.9]$ by connecting any two nodes with probability $p$ as the training instances. The number of nodes in each graph is randomly sampled from 100 to 200. We respectively train an embedding network for each density $p$ based on the 10,000 corresponding training instances. Note that when calculating the embeddings of a testing instance, we select the network that the density of its training instances is the closest to the density of the testing instance.

3.2.2. Generating the Initial Embeddings

The initial embedding is the only information and representation of each graph that inputs the encoder network. In other words, the encoder network can only distinguish the graphs by analyzing their initial embeddings. Therefore, we believe that the initial embedding of a graph should contain the information that is important for the GCP task, such as the structural information and the vertex degree information.

For each training graph, to embed the structural information, we use the DeepWalk (Perozzi et al., 2014) algorithm, a classic graph embedding method that preserves the topology information of nodes in the graph, to generate a feature vector $z_{i} \in R^{d / 2}$ for each node $i$ . To embed the degree information, we randomly generate a feature vector $y_{j} \in R^{d / 2}$ by sampling from the standard normal distribution, as (Ying et al., 2021) does, to represent the degree value $j$ . Finally, the initial embedding of node $i$ with degree $d e g (i)$ concatenates these two components, i.e. $x_{i} = CONCAT (z_{i}, y_{d e g (i)}) \in R^{d}$ , to represent the graph more accurate.

3.3. Transformer Encoder Structure

Due to the prevailing success of the Transformer in natural language processing and computer vision and its powerful encoding capability, we select the Transformer (Vaswani et al., 2017) as the embedding network in our model. Since the goal of our model is to learn the embeddings of nodes, we only need the encoder structure of the Transformer.

For an input graph $G = (V, E)$ , we define $h_{i}^{0} = x_{i}$ as the initial embedding (see Section 3.2 for details) of each node $i \in V$ , and feed as the input to the Transformer encoder. The embeddings are then updated by $L$ Transformer encoder layers defined as follows:

\begin{aligned} {\hat{h}}_{i}^{(l)} & = h_{i}^{(l - 1)} + MHA (LN (h_{i}^{(l - 1)})), \\ h_{i}^{(l)} & = {\hat{h}}_{i}^{(l)} + FFN (LN ({\hat{h}}_{i}^{(l)})), \end{aligned}

(1)

where

l \in {1, \dots, L}

indicates the

l

th encoder layer. The building blocks of each encoder layer contain a multi-head attention (MHA) (Vaswani et al., 2017) and a feed-forward network (FFN) (Vaswani et al., 2017). The layer normalization (LN) Ba, Kiros, and Hinton (2016) is applied before each block. The FFN consists of two linear layers with a GELU non-linearity Hendrycks and Gimpel (2016). The input of the MHA is a sequence

H = LN (h^{(l - 1)}) \in R^{n \times d^{'}}

, where

d^{'}

is the dimension of the hidden layers. The MHA calculates as:

\begin{aligned} MHA (H) & = CONCAT (h e a d_{1}, \dots, h e a d_{m}) \cdot W_{O}, \\ h e a d_{i} & = softmax (\frac{K_{i}^{T} Q_{i}}{\sqrt{d_{k}}}) \cdot V_{i}, i \in {1, \dots, m}, \end{aligned}

(2)

where

Q_{i}

K_{i}

V_{i}

can be calculated by multiplying the trainable parameter matrices

W_{i}^{Q} \in R^{d^{'} \times d_{q}}

W_{i}^{K} \in R^{d^{'} \times d_{k}}

W_{i}^{V} \in R^{d^{'} \times d_{v}}

and

H

, i.e.

Q_{i} = W_{i}^{Q} H

K_{i} = W_{i}^{K} H

V_{i} = W_{i}^{V} H

, and the projection

W_{O} \in R^{m d_{v} \times d^{'}}

is also a trainable parameter matrix.

3.4. Loss Function

Before introducing the loss function in our model, we first provide some necessary definitions. Let $G = (V, E)$ denote a training graph, where $V = {1, \dots, n}$ is the set of $n$ nodes, and $E$ is the set of edges, $h^{L}$ is the encoded embeddings of $V$ outputted by the Transformer encoder, and $M$ is the co-coloring matrix of $G$ calculated by Tabucol. Given an input graph, the Transformer encoder actually aims to obtain an embedding for each node such that nodes that should be in the same color have close embeddings, and nodes that should be in different colors have distinct embeddings. We define $l (v_{1}, v_{2}) = ‖ v_{1} - v_{2} ‖_{2}^{2}$ as the $L_{2}$ norm of vectors $v_{1}$ and $v_{2}$ , and use $l (h_{i}^{L}, h_{j}^{L})$ to represent the distance between the embeddings of nodes $i$ and $j$ , i.e. $h_{i}^{L}$ and $h_{j}^{L}$ .

Intuitively, we wish $l (h_{i}^{L}, h_{j}^{L})$ be small (resp. large) if $M_{i j}$ is close to 1 (resp. 0). Thus, a straightforward approach to design the loss function is to use $M_{i j} - \bar{M}$ as a weight of $l (h_{i}^{L}, h_{j}^{L})$ , where $\bar{M} = \frac{1}{n (n - 1) / 2 - | E |} \sum_{1 \leq i < j \leq n} M_{i j}$ is the average co-coloring value (note that when calculating $\bar{M}$ , we ignore the pair of nodes that are connected, because they must be in different colors). However, such a simple design of the loss function may lead the network to learn an incorrect correlation between the pair of nodes with an ambiguous correlation, i.e. the pair of nodes with a weight close to 0, because for such a node pair, the value of $l (h_{i}^{L}, h_{j}^{L})$ has almost no impact on the loss function. Therefore, the loss function should not only consider the pairs of nodes with clear correlation but also distinguish the pairs of nodes with ambiguous correlation from the others.

To address this issue, we set two bounds, $B_{l a r g e}$ and $B_{s m a l l}$ , to help the network distinguish the pair of nodes. Specifically, for nodes $i$ and $j$ , if $M_{i j} \geq B_{l a r g e}$ (resp. $M_{i j} \leq B_{s m a l l}$ ), we consider that nodes $i$ and $j$ should be (resp. not be) in the same color, otherwise, the correlation of the colors of nodes $i$ and $j$ are ambiguous. Then, for each node $i \in V$ , we define $S_{i} \subset V$ as the set of nodes that should be in the same color with $i$ , i.e. $S_{i} = {j ∣ j \in V, j \neq i, M_{i j} \geq B_{l a r g e}}$ , $D_{i} \subset V$ as the set of nodes that should be in different colors with $i$ , i.e. $D_{i} = {j ∣ j \in V, j \neq i, M_{i j} \leq B_{s m a l l}}$ , and $A_{i} \subset V$ as the set of nodes with ambiguous correlation to $i$ , i.e. $A_{i} = {j ∣ j \in V, j \neq i, B_{s m a l l} < M_{i j} < B_{l a r g e}}$ .

Based on the above definitions, we design two terms in the loss function. The first considers that the distance between $h_{i}^{L}$ and the embeddings of nodes in $S_{i}$ should be smaller than the distance between $h_{i}^{L}$ and the embeddings of nodes in $A_{i}$ . We design the loss function by adopting triplet metric learning (Tian et al., 2020; Yang et al., 2022), which is commonly used for machine learning algorithms where an input as the anchor sample is compared to its positive and negative samples. In other words, for embedding $h_{i}^{L}$ , the embeddings of nodes in $S_{i}$ can be regarded as its positive samples, and the embeddings of nodes in $A_{i}$ can be regarded as its negative samples.

Then, the triple loss function $L_{i}^{S A}$ to distinguish the nodes that should be in the same colors as node $i$ (i.e. nodes in $S_{i}$ ) and the nodes with ambiguous correlation to node $i$ (i.e. nodes in $A_{i}$ ) can be calculated as follows:

L_{i}^{S A} = max {\frac{\sum_{j \in S_{i}} M_{i j} l (h_{i}^{L}, h_{j}^{L})}{\sum_{j \in S_{i}} M_{i j}} - \frac{\sum_{j \in A_{i}} l (h_{i}^{L}, h_{j}^{L})}{| A_{i} |} + α_{i}, 0},

(3)

where

α_{i}

is a margin between the distance of positive and negative pairs. We set

α_{i} = \sum_{j \in V, j \neq i} l (h_{i}^{0}, h_{j}^{0}) / (n - 1)

to be the average distance between the initial embedding of node

i

h_{i}^{0}

, and the initial embeddings of the other nodes. Moreover, in equation (3), we use the co-coloring to weight the distances between

h_{i}^{L}

and the embeddings of nodes in

S_{i}

to make the pairs of nodes that are more likely to be in the same color have larger weights. We do not weight the distances between

h_{i}^{L}

and the embeddings of nodes in

A_{i}

since the correlation between these nodes and node

i

is ambiguous.

The second term in the loss function considers that the distance between $h_{i}^{L}$ and the embeddings of nodes in $D_{i}$ should be larger than the distance between $h_{i}^{L}$ and the embeddings of nodes in $A_{i}$ . Thus similarly, we design another triple loss function $L_{i}^{A D}$ as follows:

L_{i}^{A D} = max {\frac{\sum_{j \in A_{i}} l (h_{i}^{L}, h_{j}^{L})}{| A_{i} |} - \frac{\sum_{j \in D_{i}} {\hat{M}}_{i j} l (h_{i}^{L}, h_{j}^{L})}{\sum_{j \in D_{i}} {\hat{M}}_{i j}} + α_{i}, 0},

(4)

where

{\hat{M}}_{i j} = 1 - M_{i j}

In the end, the entire loss function $L$ that combines $L_{i}^{S A}$ and $L_{i}^{A D}$ can be calculated as follows:

L = \sum_{i \in V} (L_{i}^{S A} + L_{i}^{A D}) .

(5)

The final loss function in equation (5) can help the model distinguish the pairs of nodes with either clear or ambiguous correlation. The loss function can prevent the model from learning an incorrect correlation between the pair of nodes with an ambiguous correlation. Moreover, the loss function uses triplet metric learning to make the model more robust.

3.5. Applying to Various GCP Heuristics

Given a testing graph, our trained model can generate the embeddings for the nodes. The second problem we need to handle is how to use the node embeddings to improve various GCP algorithms. We propose to use the $k$ -nearest neighbor ( $k$ NN) method to help select the nodes that are more likely to be in the same color. Specifically, for each node $i$ , there are $n - 1 - d e g (i)$ nodes that are non-connected with $i$ , where $n$ is the number of nodes in the graph and $d e g (i)$ is the degree of node $i$ . We define set $N_{i}$ as a node set that contains $m i n {k, n - 1 - d e g (i)}$ nodes with the closest embeddings to the embedding of $i$ , $h_{i}^{L}$ . Intuitively, nodes in $N_{i}$ are more likely to be in the same color as node $i$ .

For evaluation, we select some representative GCP heuristics, including construction algorithms, the classical Greedy algorithm (Welsh & Powell, 1967) and Fastcolor (Lin et al., 2017), as well as the Tabucol (Hertz & de Werra, 1987) local search algorithm. In the following, we first introduce the process of these heuristic baselines and then introduce how to use the embeddings to boost the algorithms.

3.5.1. Greedy and Fastcolor

The coloring process of the Greedy and Fastcolor algorithms is similar. Both of them sequentially color a node in each step. The coloring orders of the nodes in the two algorithms are determined according to their degrees and saturation degrees, respectively. Each node is colored with the minimum possible color, i.e. the color with the minimum index. Since there exists randomness in finding the cliques and the reduction process in Fastcolor, it restarts the entire procedure once finishing coloring the graph to find better results until the cut-off time is reached.

To apply the embeddings to a construction GCP algorithm, we do not change the rule of ordering the coloring nodes but use the $k$ NN node set $N_{i}$ of node $i$ to suggest a color. When coloring node $i$ , we first try the color that most nodes in $N_{i}$ are in. If this fails, then we color node $i$ according to the original coloring rule of the construction heuristic.

3.5.2. Tabucol

In each round, Tabucol tries to color the graph with $C$ colors. The value of $C$ is initialized to the greedy coloring result. If Tabucol succeeds in coloring the graph with $C$ colors, it will try to use $C - 1$ colors in the next round until the cut-off time is reached. In Tabucol, a move $m o v e (v, c)$ is an operator to change the color of node $v$ to color $c$ , and a score $s c o r e (v, c)$ is defined to be the reduction of the number of conflict edges (i.e. edges that the two endpoints are in the same color) by executing $m o v e (v, c)$ . In each iteration, Tabucol selects all the candidate moves that are not in the tabu list and have the highest score. Usually, there are a few to hundreds of candidate moves selected by Tabucol in each iteration. Finally, Tabucol randomly selects a candidate move to execute in each iteration.

To apply the embeddings to Tabucol, we increase the probability of selecting a move $m o v e (v, c)$ if there are nodes in $N_{v}$ in color $c$ . Specifically, suppose there are $t$ nodes in $N_{v}$ in color $c$ , the probability of selecting $m o v e (v, c)$ will be increased by $2 t / | N_{v} |$ .

In summary, the embeddings learned by our model are used to guide the GCP heuristics without changing their original frameworks and procedures. As a result, the heuristics can utilize the information of the learned embeddings to assign appropriate colors to the nodes. In particular, the embeddings can improve Tabucol, indicating that our model, which is trained by Tabucol, can surprisingly enhance the teacher. Moreover, we offer some ways of utilizing the learned embeddings. Actually, one can design appropriate methods to utilize the embeddings according to the tasks and situations.

4. Experiments

This section presents experiments to evaluate the performance of our model and performs ablation studies to evaluate the efficacy of the components in our model.

4.1. Experimental Setup

4.1.1. Testing

We test our model by using the output embeddings to guide some baseline GCP heuristics, including Greedy (Welsh & Powell, 1967), Fastcolor (Lin et al., 2017), and Tabucol² (Hertz & de Werra, 1987) (see details in Section 3.5). The experiments were run on a server using an Intel $^{®}$ Xeon $^{®}$ Silver 4210 2.20 GHz CPU, running Ubuntu 18.04 Linux operation system.

The computational results contain two groups. The first presents a comparison with the state-of-the-art deep learning based algorithms, GDN (Li et al., 2022), PI-SAGE (Schuetz et al., 2022), and GNN-1N (Wang et al., 2023), all of them show better performance than the GNN-GCP algorithm (Lemos et al., 2019). The second group compares the heuristic baselines and their improved versions with the embeddings. The testing instances for this group consist of the 38 most difficult DIMACS coloring challenge benchmarks³ for the GCP, which are the union of the GCP instances used in the best-performing GCP heuristics (Goudet et al., 2021; Moalic & Gondran, 2018). Their number of nodes ranges from 125 to 4,000, and densities range from 0.03 to 0.97. They are also widely used in some other GCP heuristics (Goudet et al., 2021, 2022; Zhou et al., 2018). The testing instances for this group contain some common and hard instances with up to 19,717 nodes used in the original paper (see Table 1).

Table 1.
Comparison Between Tabucol+Embed and Tabucol with Deep Learning Based Algorithms.

Tabucol+Embed Tabucol

Instances $| V |$ $| E |$ $C$ Best Average Best Average GDN PI-SAGE GNN-1N

queen7_7 49 476 7 0 0 0 0 9 0 0

queen8_8 64 728 8 2 2 2 2 2 – –

queen8_8 64 728 9 0 0 0 0 – 1 1

queen9_9 81 1056 9 3 3 3 3 6 – –

queen9_9 96 1368 10 0 0 0 0 – 1 1

queen11_11 121 1980 11 6 7.1 7 7.3 21 17 13

queen13_13 169 3328 13 9 10.2 9 10.2 33 26 15

Pubmed 19717 44338 8 0 0 0 0 21 17 –

				Tabucol+Embed	Tabucol
queen7_7	49	476	7	0	0	0	0	9	0	0
queen8_8	64	728	8	2	2	2	2	2	–	–
queen8_8	64	728	9	0	0	0	0	–	1	1
queen9_9	81	1056	9	3	3	3	3	6	–	–
queen9_9	96	1368	10	0	0	0	0	–	1	1
queen11_11	121	1980	11	6	7.1	7	7.3	21	17	13
queen13_13	169	3328	13	9	10.2	9	10.2	33	26	15
Pubmed	19717	44338	8	0	0	0	0	21	17	–

Note. The results are expressed by the number of conflict edges. Equal best results appear in italic, and unique best results appear in bold. GNN = Graph neural network.

4.1.2. Parameter Settings

Our network has $L = 2$ Transformer encoder layers. The dimensions of the embeddings and the hidden layers are set to $d = d^{'} = 128$ . The dimensions, $d_{q}, d_{k}, d_{v}$ , are all set to 16. The number of heads $m$ is set to 8. We use Adam (Kingma & Ba, 2015) to train the model for 200 epochs in a mini-batch manner. The batch size is set to 25, and the learning rate is set to $1 \times 10^{4}$ .

As described in Section 3.5, we use the $k$ NN method to help select the nodes that are more likely to be in the same color. The value of $k$ is set to 17. The ablation study on the value of $k$ is presented in Section 4.3. Moreover, to determine the bounds $B_{l a r g e}$ and $B_{s m a l l}$ that are important for calculating the loss function, we analyze the distribution of co-coloring values of the non-connected node pairs in some sampled training instances with different densities, as shown in Figure 2, where the co-coloring values are ordered in descending order. We can observe that, for graphs with densities of 0.1 and 0.5, some large co-coloring values are clustered over 80, and for graphs with a density of 0.9, the large values are clustered over 95. Thus, according to the distributions in Figure 2, we empirically set $B_{l a r g e}$ to 80 for the training instances with densities of 0.1 and 0.5 and 95 for the training instances with a density of 0.9 and set $B_{s m a l l}$ to 10 for all the training instances.

Figure 2.

Distribution of the co-coloring values in sampled training graphs with various densities.

4.2. Computational Results

We denote the algorithms that use the embeddings to improve Greedy, Fastcolor, and Tabucol as Greedy+Embed, Fastcolor+Embed, and Tabucol+Embed, respectively. In the following, we first compare Greedy+Embed with Greedy, Fastcolor+Embed with Fastcolor, and Tabucol+Embed with Tabucol. For each instance, Greedy(+Embed) calculates once, and the other four algorithms with randomness in their searching process calculate 10 times with different seeds. Due to the difference between our machine and the machine used in Lin et al. (2017), we set the cut-off time to 90 seconds for Fastcolor(+Embed). We set the cut-off time to 5,000 seconds for Tabucol(+Embed).

The comparison results of Greedy vs. Greedy+Embed, Fastcolor vs. Fastcolor+Embed, and Tabucol vs. Tabucol+Embed are shown in Table 2. Note that for each pair of algorithms, equal best results appear in italic, and unique best results appear in bold. In these tables, columns $| V |$ and $| E |$ indicate the number of nodes and edges in the graph, respectively, columns Best and Average indicate the best and average coloring results obtained by the algorithms in 10 runs, row Win indicates the number of instances the algorithm obtained unique best results.

Table 2.
Comparison Between the Baselines and their Improved Algorithms Guided by the Embeddings.

Fastcolor Fastcolor+Embed Tabucol Tabucol+Embed

Instances $| V |$ $| E |$ BKS Greedy Greedy+Embed Best Average Best Average Best Average Best Average

DSJC125.1 125 736 5 7 7 5 5.0 5 5.0 5 5.0 5 5.0

DSJC125.5 125 3891 17 23 23 19 19.0 19 19.0 17 17.0 17 17.0

DSJC125.9 125 6961 44 57 56 46 46.0 46 46.0 44 44.0 44 44.0

DSJC250.1 250 3218 8 11 12 9 9.0 9 9.0 8 8.0 8 8.0

DSJC250.5 250 15668 28 39 38 35 35.0 34 34.7 28 28.1 28 28.2

DSJC250.9 250 27897 72 92 92 82 83.1 81 82.6 72 72.0 72 72.0

DSJC500.1 500 12458 12 18 17 15 15.0 14 14.9 12 12.0 12 12.0

DSJC500.5 500 62624 47 67 67 63 63.0 63 63.0 50 50.0 50 50.0

DSJC500.9 500 112437 126 167 164 155 157.1 155 155.9 127 127.0 127 127.0

DSJC1000.1 1000 49629 20 28 28 25 25.0 25 25.0 20 20.6 20 20.6

DSJC1000.5 1000 249826 82 120 119 113 113.9 113 113.6 90 90.1 90 90.0

DSJC1000.9 1000 449449 222 306 305 287 287.0 288 288.0 226 226.4 226 226.8

DSJR500.1 500 3555 12 13 13 12 12.0 12 12.0 12 12.0 12 12.0

DSJR500.5 500 58862 122 138 137 122 122.6 122 122.3 129 131.7 128 131.8

DSJR500.1c 500 121275 85 94 94 85 86.5 86 86.8 86 90.3 85 90.0

flat300_26_0 300 21633 26 46 44 39 39.2 39 39.1 26 26.0 26 26.0

flat300_28_0 300 21695 28 44 43 39 39.1 39 39.1 31 31.9 31 31.8

flat1000_50_0 1000 245000 50 119 118 112 112.1 111 112.0 50 50.0 50 50.0

flat1000_60_0 1000 245830 60 118 119 112 112.8 112 112.1 60 60.0 60 60.0

flat1000_76_0 1000 246708 81 118 118 112 113.0 111 112.4 88 88.9 89 89.0

latin_square_10 900 307350 97 153 148 118 119.9 118 120.3 99 99.8 98 99.7

le450_15a 450 8168 15 19 18 16 16.0 15 15.8 15 15.0 15 15.0

le450_15b 450 8169 15 19 18 15 15.0 15 15.0 15 15.0 15 15.0

le450_15c 450 16680 15 25 25 22 22.1 22 22.1 15 15.4 15 15.1

le450_15d 450 16750 15 26 26 22 22.6 22 22.8 15 15.8 15 15.5

le450_25a 450 8260 25 25 25 25 25.0 25 25.0 25 25.0 25 25.0

le450_25b 450 8263 25 25 25 25 25.0 25 25.0 25 25.0 25 25.0

le450_25c 450 17343 25 31 31 27 27.0 27 27.0 26 26.0 26 26.0

le450_25d 450 17425 25 31 31 27 27.0 27 27.0 26 26.0 26 26.0

r125.1 125 209 5 5 5 5 5.0 5 5.0 5 5.0 5 5.0

r125.5 125 3838 36 40 39 36 36.0 36 36.0 36 36.0 36 36.0

r250.1 250 867 8 8 8 8 8.0 8 8.0 8 8.0 8 8.0

r250.5 250 14849 65 73 72 65 65.0 65 65.0 67 67.6 66 67.7

r1000.1 1000 14378 20 24 24 20 20.0 20 20.0 20 20.0 20 20.0

r1000.5 1000 238267 234 269 270 236 236.6 236 236.7 249 251.2 248 249.3

r1000.1c 1000 485090 98 111 111 103 104.2 103 104.3 98 98.0 98 99.3

C2000.5 2000 999836 145 216 217 206 206.0 205 205.0 164 164.5 164 164.6

C4000.5 4000 4000268 259 391 389 377 377.8 375 375.0 304 304.9 304 304.8

Win – – – 4 16 2 6 8 13 1 7 5 8

						Fastcolor	Fastcolor+Embed	Tabucol	Tabucol+Embed
DSJC125.1	125	736	5	7	7	5	5.0	5	5.0	5	5.0	5	5.0
DSJC125.5	125	3891	17	23	23	19	19.0	19	19.0	17	17.0	17	17.0
DSJC125.9	125	6961	44	57	56	46	46.0	46	46.0	44	44.0	44	44.0
DSJC250.1	250	3218	8	11	12	9	9.0	9	9.0	8	8.0	8	8.0
DSJC250.5	250	15668	28	39	38	35	35.0	34	34.7	28	28.1	28	28.2
DSJC250.9	250	27897	72	92	92	82	83.1	81	82.6	72	72.0	72	72.0
DSJC500.1	500	12458	12	18	17	15	15.0	14	14.9	12	12.0	12	12.0
DSJC500.5	500	62624	47	67	67	63	63.0	63	63.0	50	50.0	50	50.0
DSJC500.9	500	112437	126	167	164	155	157.1	155	155.9	127	127.0	127	127.0
DSJC1000.1	1000	49629	20	28	28	25	25.0	25	25.0	20	20.6	20	20.6
DSJC1000.5	1000	249826	82	120	119	113	113.9	113	113.6	90	90.1	90	90.0
DSJC1000.9	1000	449449	222	306	305	287	287.0	288	288.0	226	226.4	226	226.8
DSJR500.1	500	3555	12	13	13	12	12.0	12	12.0	12	12.0	12	12.0
DSJR500.5	500	58862	122	138	137	122	122.6	122	122.3	129	131.7	128	131.8
DSJR500.1c	500	121275	85	94	94	85	86.5	86	86.8	86	90.3	85	90.0
flat300_26_0	300	21633	26	46	44	39	39.2	39	39.1	26	26.0	26	26.0
flat300_28_0	300	21695	28	44	43	39	39.1	39	39.1	31	31.9	31	31.8
flat1000_50_0	1000	245000	50	119	118	112	112.1	111	112.0	50	50.0	50	50.0
flat1000_60_0	1000	245830	60	118	119	112	112.8	112	112.1	60	60.0	60	60.0
flat1000_76_0	1000	246708	81	118	118	112	113.0	111	112.4	88	88.9	89	89.0
latin_square_10	900	307350	97	153	148	118	119.9	118	120.3	99	99.8	98	99.7
le450_15a	450	8168	15	19	18	16	16.0	15	15.8	15	15.0	15	15.0
le450_15b	450	8169	15	19	18	15	15.0	15	15.0	15	15.0	15	15.0
le450_15c	450	16680	15	25	25	22	22.1	22	22.1	15	15.4	15	15.1
le450_15d	450	16750	15	26	26	22	22.6	22	22.8	15	15.8	15	15.5
le450_25a	450	8260	25	25	25	25	25.0	25	25.0	25	25.0	25	25.0
le450_25b	450	8263	25	25	25	25	25.0	25	25.0	25	25.0	25	25.0
le450_25c	450	17343	25	31	31	27	27.0	27	27.0	26	26.0	26	26.0
le450_25d	450	17425	25	31	31	27	27.0	27	27.0	26	26.0	26	26.0
r125.1	125	209	5	5	5	5	5.0	5	5.0	5	5.0	5	5.0
r125.5	125	3838	36	40	39	36	36.0	36	36.0	36	36.0	36	36.0
r250.1	250	867	8	8	8	8	8.0	8	8.0	8	8.0	8	8.0
r250.5	250	14849	65	73	72	65	65.0	65	65.0	67	67.6	66	67.7
r1000.1	1000	14378	20	24	24	20	20.0	20	20.0	20	20.0	20	20.0
r1000.5	1000	238267	234	269	270	236	236.6	236	236.7	249	251.2	248	249.3
r1000.1c	1000	485090	98	111	111	103	104.2	103	104.3	98	98.0	98	99.3
C2000.5	2000	999836	145	216	217	206	206.0	205	205.0	164	164.5	164	164.6
C4000.5	4000	4000268	259	391	389	377	377.8	375	375.0	304	304.9	304	304.8
Win	–	–	–	4	16	2	6	8	13	1	7	5	8

Note. For each pair of algorithms, equal best results appear in italic, and unique best results appear in bold.

From the results, we can observe that Greedy+Embed yields better (resp. worse) results than Greedy in solving 16 (resp. 4) instances. The best solutions of Fastcolor+Embed are better (resp. worse) than Fastcolor in solving 8 (resp. 2) instances. The best solutions of Tabucol+Embed are better (resp. worse) than Tabucol in solving 5 (resp. 1) instances. The average solutions of Fastcolor+Embed and Tabucol+Embed are also better than the baselines. The results indicate that both construction and searching GCP heuristics can be improved with the guidance of the embeddings. In particular, the embeddings can even boost Tabucol, helping it escape from local optima and find better results, indicating that the “student” can surprisingly improve the “teacher,” which is a really exciting result. Moreover, the embeddings work well in solving instances with much larger scales than the training instances, indicating the excellent generalization capability of our model.

Then, we compare Tabucol+Embed and Tabucol under a time limit of 5 minutes with deep learning based algorithms, GDN, PI-SAGE, and GNN-1N (their results are from the literature). The results are shown in Table 1, where column $C$ indicates the number of colors the algorithms tried, and the results are expressed by the number of conflict edges. We can observe that Tabucol+Embed significantly outperforms GDN, PI-SAGE, and GNN-1N, and also obtains better results than Tabucol in solving instance queen11_11. Moreover, although these end-to-end approaches (i.e. GDN, PI-SAGE, and GNN-1N that use deep learning model to generate the solutions directly) can construct solutions in seconds (as the results in Li et al. 2022), their performance downgrade a lot compared to searching-based approaches, indicating again that combining deep learning with traditional heuristics is more practical than end-to-end approaches.

4.3. Ablation Study

Since the Greedy algorithm is fast and deterministic, we perform ablation studies based on this algorithm. We first analyze the performance of Greedy+Embed with different $k$ values of the $k$ NN method. The parameter domain of $k$ is set to $[1, 60]$ , and Greedy+Embed shows the best performance with $k = 17$ . We present the average results of Greedy+Embed in solving the 38 testing instances with several $k$ values near 17, i.e. $k \in [13, 23]$ , in Figure 3.

Figure 3.

Average coloring results in 38 tested graph coloring problem (GCP) instances of Greedy+Embed with different $k$ values.

Then, we perform four ablation studies to analyze the components in our model in Figure 4. The first study aims to analyze the method of generating the initial embeddings by comparing Greedy+Embed and Greedy with two variants, Greedy-NoDegree and Greedy-Rand. Greedy-NoDegree only uses the features generated by Perozzi et al. (2014) as the initial embeddings without concatenating the degree features. Greedy-Rand randomly generates the initial embeddings. We use Greedy+Embed, Greedy-NoDegree, and Greedy-Rand to calculate the 38 testing instances with $k \in [1, 60]$ and order the 60 average results of each algorithm in ascending order. These results and the average result of Greedy are shown in Figure 4(a). We can observe that Greedy+Embed shows better performance than Greedy-NoDegree and significantly better performance than Greedy-Rand, indicating that adding the degree features into the initial embeddings is reasonable and using DeepWalk to make the initial embeddings contain the information of structural information of the graph is effective.

Figure 4.

Ablation studies by comparing Greedy+Embed, Greedy, and their variants. The results are expressed by the average coloring results in 38 tested graph coloring problem (GCP) instances.

We use the same method to perform the other three groups of ablation studies. The second study aims to analyze the design of the loss function and also involves two variants, Greedy-NoTL and Greedy-NoBounds. Greedy-NoTL does not use triplet metric learning but only uses the co-coloring values to weight the distances of the node pairs with clear correlation, i.e. ${(i, j) ∣ j \in A_{i} \cup D_{i}}$ to calculate the loss function. Greedy-NoBounds does not use the bounds $B_{l a r g e}$ and $B_{s m a l l}$ to help the network distinguish the pair of nodes but uses the co-coloring values to weight the distances of the node pairs and uses the triplet metric learning method to calculate the loss function. The comparison results of Greedy+Embed, Greedy-NoTL, Greedy-NoBounds, and Greedy are shown in Figure 4(b). The results show that Greedy+Embed significantly outperforms Greedy-NoTL and Greedy-NoBounds, indicating that our design of the loss function is reasonable and effective.

The third study is to analyze the embedding network we choose, i.e. the Transformer encoder layer. We use the graph convolutional network (GCN) (Kipf & Welling, 2017) and graph attention network (GAT) (Velickovic et al., 2018) as the embedding network and obtain two variants, Greedy-GCN and Greedy-GAT. The comparison results of Greedy+Embed, Greedy-GAT, Greedy-GCN, and Greedy are shown in Figure 4(c). The results show that Greedy+Embed significantly outperforms Greedy-GAT and Greedy-GCN, which might be owing to the conflict between the homophily of GCN and GAT and the heterophily of the GCP. This is also the reason we did not choose GCN and GAT but the Transformer model.

The last study is to analyze the influence of the densities of training graphs on the performance of the model in solving the testing instances. We denote Greedy-0.1, Greedy-0.5, and Greedy-0.9 as variants of Greedy+Embed that only use the model trained on graphs with densities of 0.1, 0.5, and 0.9, respectively, to solve the testing instances. The comparison results of Greedy+Embed, Greedy-0.1, Greedy-0.5, Greedy-0.9, and Greedy are shown in Figure 4(d). We can observe that Greedy+Embed outperforms the three variants, and Greedy-0.9 shows the worst performance among the three variants. This is because the number of instances with densities close to 0.9 is the smallest among the 38 testing instances. The results indicate that training the model based on instances with different densities is effective.

Moreover, the results of Greedy+Embed and Greedy in the four groups of ablation studies show that with most $k$ values, Greedy+Embed can obtain better results than Greedy, indicating the good robustness of our methods.

5. Conclusion

In this work, we propose to incorporate supervised deep learning methods to solve the NP-hard GCP. The GCP itself is very difficult, especially for DNNs. This paper provides some new perspectives for using DNNs to help solve the problem and obtains some empirical improvements.

We propose to focus the deep learning model on learning the correlation between nodes in each graph. The model aims to learn the node embeddings, making nodes that should be in the same color have close embeddings. Then, we propose to use a heuristic, Tabucol, to solve each training instance multiple times to obtain a co-coloring matrix that indicates the correlation between nodes. The co-coloring matrix is used as a label to train the model to distinguish what kind of node pairs should be in the same color. We use triplet metric learning to improve the robustness of our model. Finally, we suggest using the embeddings to guide and boost the GCP heuristics by applying the $k$ -Nearest Neighbor method to suggest colors for nodes. Experiments show that our method can improve both the construction and local search algorithms and also shows superiority over other deep learning-based methods.

In future work, we will continue to investigate the potential of deep node embeddings in solving other combinatorial optimization problems.

Footnotes

Funding

The authors received the following financial support for the research, authorship, and/or publication of this article: This work is supported by National Natural Science Foundation of China (62076105) and Microsoft Research Asia (100338928).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

Alexiadis

Refanidis

(2016). Optimizing individual activity personal plans through local search. AI Communications, 29(1), 185–203.

Awasthi

Das

Gollapudi

(2022). Beyond gnns: An efficient architecture for graph problems. In Proceedings of the 36th AAAI conference on artificial intelligence, (pp. 6019–6027).

L. J.

Kiros

J. R.

Hinton

G. E

(2016). Layer normalization. CoRR, abs/1607.06450.

Bai

Sun

Wang

(2021). GLSearch: Maximum common subgraph detection via learning to search. In Proceedings of the 38th international conference on machine learning, volume 139, (pp. 588–598).

Chen

Liu

(2024). Learn to solve dominating set problem with GNN and reinforcement learning. Applied Mathematics and Computation, 474, 128717.

Chow

F. C.

Hennessy

J. L.

(1990). The priority-based coloring approach to register allocation. ACM Transactions on Programming Languages and Systems, 12(4), 501–536.

de Werra

(1985). An introduction to timetabling. European Journal of Operational Research, 19, 151–162.

Goudet

Duval

Hao

(2021). Population-based gradient descent weight learning for graph coloring problems. Knowledge-Based Systems, 212, 106581.

Goudet

Grelier

Hao

J.-K.

(2022). A deep learning guided memetic framework for graph coloring problems. Knowledge-Based Systems, 258, 109986.

10.

Graf

(2021). Preemptive stacker crane problem: Extending tree-based properties and construction heuristics. European Journal of Operational Research, 292(2), 532–547.

11.

Hamilton

W. L.

Ying

Leskovec

(2017). Inductive representation learning on large graphs. In Advances in neural Information processing systems 30: Annual conference on neural information processing systems, (pp. 1024–1034).

12.

Hendrycks

Gimpel

(2016). Gaussian error linear units (gelus). CoRR, abs/1606.08415.

13.

Hertz

de Werra

(1987). Using tabu search techniques for graph coloring. Computing, 39(4), 345–351.

14.

Hudson

Malencia

Prorok

(2022). Graph neural network guided local search for the traveling salesperson problem. In Proceedings of the 10th international conference on learning representations.

15.

Hussain

Ahmad

Qadri

M. Y.

Qadri

N. N.

Ahmed

(2016). Ant colony optimization for multicore re-configurable architecture. AI Communications, 29(5), 595–606.

16.

Ireland

Montana

(2022). LeNSE: Learning to navigate subgraph embeddings for large-scale combinatorial optimisation. In Proceedings of the 39th international conference on machine learning, volume 162, (pp. 9622–9638).

17.

Kearns

Suri

Montfort

(2006). An experimental study of the coloring problem on human subject networks. Science (New York, N.Y.), 313(5788), 824–827.

18.

Khalil

E. B.

Dai

Zhang

Dilkina

Song

(2017). Learning combinatorial optimization algorithms over graphs. In Advances in neural information processing systems 30: Annual conference on neural information processing systems, (pp. 6348–6358).

19.

Kingma

D. P.

(2015). ADAM: A method for stochastic optimization. In Proceedings of the 3rd international conference on learning representations.

20.

Kipf

T. N.

Welling

(2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th international conference on learning representations.

21.

Kool

van Hoof

Welling

(2019). Attention, learn to solve routing problems! In Proceedings of the 7th international conference on learning representations.

22.

Lemos

Prates

M. O. R.

Avelar

P. H. C.

Lamb

L. C.

(2019). Graph colouring meets deep learning: Effective graph neural network models for combinatorial problems. In Proceedings of the 31st IEEE international conference on tools with artificial intelligence, (pp. 879–885).

23.

Fang

Jiang

(2018). Incremental upper bound for the maximum clique problem. INFORMS Journal on Computing, 30(1), 137–153.

24.

Chan

S. O.

Pan

D. Z.

(2022). Rethinking graph neural networks for the graph coloring problem. CoRR, abs/2208.06975.

25.

Lin

Cai

Luo

(2017). A reduction based method for coloring very large graphs. In Proceedings of the 26th international joint conference on artificial intelligence, (pp. 517–523).

26.

Moalic

Gondran

(2018). Variations on memetic algorithms for graph coloring problems. Journal of Heuristics, 24(1), 1–24.

27.

Nazari

Oroojlooy

Snyder

L. V.

Takác

(2018). Reinforcement learning for solving the vehicle routing problem. In Advances in neural information processing systems 31: Annual conference on neural information processing systems, (pp. 9861–9871).

28.

Niu

Liu

Lü

(2018). New stochastic local search approaches for computing preferred extensions of abstract argumentation. AI Communications, 31(4), 369–382.

29.

Paschos

V. T.

(2001). On-line independent set by coloring vertices. Operational Research, 1(3), 213–224.

30.

Paulus

Rolınek

Musil

Amos

Martius

(2021). CombOptNet: Fit the right NP-Hard problem by learning integer programming constraints. In Proceedings of the 38th international conference on machine learning, volume 139, (pp. 8443–8453).

31.

Perozzi

Al-Rfou

Skiena

(2014). DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, (pp. 701–710).

32.

Prates

M. O. R.

Avelar

P. H. C.

Lemos

Lamb

L. C.

Vardi

M. Y.

(2019). Learning to solve np-complete problems: A graph neural network for decision TSP. In Proceedings of the 33rd AAAI conference on artificial intelligence, (pp. 4731–4738).

33.

Schuetz

M. J. A.

Brubaker

J. K.

Zhu

Katzgraber

H. G.

(2022). Graph coloring with physics-inspired graph neural networks. CoRR, abs/2202.01606.

34.

Tabakhi

A. M.

(2016). Pseudo-tree construction heuristics for dcops with variable communication times. In D. Schuurmans & M. P. Wellman (Eds.), Proceedings of the 30th AAAI conference on artificial intelligence, (pp. 4238–4239).

35.

Tian

Laguna

A. B.

Balntas

Mikolajczyk

(2020). HyNet: Learning local descriptor with hybrid similarity measure and triplet loss. In Advances in neural information processing systems 33: Annual conference on neural information processing systems.

36.

Vaswani

Shazeer

Parmar

Uszkoreit

Jones

Gomez

A. N.

Kaiser

Polosukhin

(2017). Attention is all you need. In Advances in neural information processing systems 30: Annual conference on neural information processing systems, pages 5998–6008.

37.

Velickovic

Cucurull

Casanova

Romero

Liò

Bengio

(2018). Graph attention networks. In Proceedings of the 6th international conference on learning representations.

38.

Wang

Xue

Yang

(2014). Acyclic orientation graph coloring for software-managed memory allocation. Science China Information Sciences, 57(9), 1–18.

39.

Wang

Yan

Jin

(2023). A graph neural network with negative message passing for graph coloring. CoRR, abs/2301.11164.

40.

Welsh

D. J.

Powell

M. B.

(1967). An upper bound for the chromatic number of a graph and its application to timetabling problems. The Computer Journal, 10(1), 85–86.

41.

Yang

Wang

(2022). Robust textual embedding against word-level adversarial attacks. In Proceedings of the 38th conference on uncertainty in artificial intelligence, volume 180, (pp. 2214–2224).

42.

Ying

Cai

Luo

Zheng

Shen

Liu

T.-Y.

(2021). Do transformers really perform badly for graph representation? In Advances in neural information processing systems 34: Annual conference on neural information processing systems, (pp. 28877–28888).

43.

Zhou

Duval

Hao

(2018). Improving probability learning based local search for graph coloring. Applied Soft Computing, 65, 542–553.

44.

Zhou

Xiao

(2021). Improving maximum

k

-plex solver via second-order reduction and graph color bounding. In Proceedings of the 35th AAAI conference on artificial intelligence, (pp. 12453–12460).

				Tabucol+Embed		Tabucol
Instances	$\| V \|$	$\| E \|$	$C$	Best	Average	Best	Average	GDN	PI-SAGE	GNN-1N
queen7_7	49	476	7	0	0	0	0	9	0	0
queen8_8	64	728	8	2	2	2	2	2	–	–
queen8_8	64	728	9	0	0	0	0	–	1	1
queen9_9	81	1056	9	3	3	3	3	6	–	–
queen9_9	96	1368	10	0	0	0	0	–	1	1
queen11_11	121	1980	11	6	7.1	7	7.3	21	17	13
queen13_13	169	3328	13	9	10.2	9	10.2	33	26	15
Pubmed	19717	44338	8	0	0	0	0	21	17	–

Deep Node Embeddings for the Graph Coloring Problem

Abstract

Keywords

1. Introduction

2. Related Work

3. Methodology

3.1. General Framework

3.2.1. Training Instances

3.2.2. Generating the Initial Embeddings

3.3. Transformer Encoder Structure

3.5.1. Greedy and Fastcolor

3.5.2. Tabucol

4. Experiments

4.1. Experimental Setup

4.1.1. Testing

Footnotes

Funding

Declaration of Conflicting Interests

Notes

References