We define a distance function on propositional formulas in CNF as a measure of non-isomorphism of formulas: the larger the distance between two formulas is, the further they are from being isomorphic. This distance induces a metric on isomorphism classes of formulas. We show how this distance can be used for solving, namely for per-instance algorithm selection where there is a “portfolio” of solvers and there is a “meta-solver” that chooses a solver from the portfolio for a given input formula.
Two propositional formulas in CNF are isomorphic if one of them can be obtained from the other by renaming variables and flipping literals. It is natural to view isomorphism as “similarity” of formulas, but this similarity measure is all or nothing: any two formulas are either similar or not. How can we modify this measure to make it gradual? In this paper, we define a distance function on formulas such that the smaller the distance between two formulas is, the closer they are to being isomorphic. This distance function is a pseudometric on formulas and, thereby, it induces a metric on isomorphism classes of formulas.
Application to Solving. Before we describe the distance function and its properties, we briefly note how this distance can be used for solving; more details appear later in this section, and a formal description is given in Sections 5 and 6. We consider the following variant of per-instance algorithm selection [11]. There are a “dataset” D of formulas and a “portfolio” P of algorithms such that for every formula ϕ ∈ D, at least one algorithm in P performs fast on ϕ. There is a “meta-algorithm” S that takes a formula ψ as input and tries to select an algorithm from P that performs fast on ψ. How does S work?
Under certain conditions on D and P (we specify these conditions in Section 6), if an algorithm A from P performs fast on a formula ϕ, then A performs fast on any formula close to ϕ. This suggests the following behavior of S: find a formula ϕ ∈ D closest to ψ and select an algorithm A from P that performs fast on ϕ. If the distance between ψ and ϕ is small enough, then A performs fast on ψ as well. Otherwise (D contains no formula close to ψ), the meta-algorithm S gives up and returns “can’t select”.
Distance Function. Precise definitions are given in Sections 2 and 3; here we give the main idea. The distance between two formulas is defined by combining isomorphism of formulas and symmetric difference of sets. Let ϕ1 and ϕ2 be propositional formulas in CNF (that is, ϕ1 and ϕ2 are sets of clauses). The distance between them, denoted δ(ϕ1, ϕ2), is the minimum cardinality of the symmetric difference ϕ1′ △ ϕ2′, where the minimum is taken over all formulas ϕ1′ and ϕ2′ isomorphic to ϕ1 and ϕ2 respectively. This definition has two equivalent forms:
δ(ϕ1, ϕ2) is the minimum number of clauses that can be removed from ϕ1 and ϕ2 so that the remaining subformulas are isomorphic to each other;
δ(ϕ1, ϕ2) is the minimum number of clauses that can be added to ϕ1 and ϕ2 so that the resulting superformulas are isomorphic to each other.
By definition, δ is preserved under isomorphism: if ϕ1, ϕ1′ and ϕ2, ϕ2′ are pairs of isomorphic formulas, then δ(ϕ1, ϕ2) = δ(ϕ1′, ϕ2′). In Section 3, we show that δ is a pseudometric on formulas. This pseudometric induces a metric on isomorphism classes of formulas.
Note that this approach to defining a distance function is essentially a replica of the approach used by Gromov in the definition of the Gromov–Hausdorff distance on metric spaces [5,10]. The Gromov–Hausdorff distance between metric spaces X and Y is the minimum Hausdorff distance between the images of X and Y under distance-preserving maps f and g from these spaces to a metric space M, where the minimum is taken over all M, f, and g.
Computing the Distance. In Section 4 we address the problem of computing the distance function δ. Although δ is NP-hard to compute in general, it is easier to determine whether this distance is “small”. Namely, for every integer d, the problem of verifying the equality δ(ϕ1, ϕ2) = d is Turing reducible to the graph isomorphism problem. The latter can be solved in quasi-polynomial time [4] and, in practice, it is often solved efficiently using Nauty tools [14].
What distance is computed in the per-instance algorithm selection approach described above? The meta-algorithm determines whether the dataset contains a formula lying within a small distance from the input formula ψ. Assuming that a “small distance” means a constant distance, this can be done by a polynomial-time oracle algorithm with an oracle for graph isomorphism.
Per-Instance Algorithm Selection. Section 6 shows how the distance δ can be used for per-instance algorithm selection; see the brief sketch above. The meta-algorithm S defined in this section is called the fixed-distance selector. Taking a formula ψ and an integer d as input, S checks whether the dataset D contains a formula ϕ such that δ(ψ, ϕ) ≤ d. If so, S selects an algorithm A that performs fast on ϕ and returns it. Otherwise, S says “can’t select”. We formulate conditions on D and P that guarantee fast performance of the selected algorithm A on the input formula ψ.
The key point of this approach is that every algorithm has the following property:
If A performs fast on a formula ϕ, then A performs fast on any formula close to ϕ.
To formalize this property, we define a class of parameters of formulas that we call parameters of constant variation (Section 5). By a parameter we mean any computable function from the set of formulas to the non-negative integers. A parameter π is a parameter of constant variation if the following holds:
π is preserved under isomorphism;
for every formula ϕ, a slight variation of ϕ can cause only a slight change of π(ϕ): the addition of one clause to ϕ can change π(ϕ) by only a constant value that does not depend on ϕ.
The class of constant-variation parameters includes incidence treewidth, maximum deficiency, and many other parameters π such that the satisfiability problem parameterized by π is fixed-parameter tractable [15]. This class can be characterized in terms of the distance δ. Namely, π is a parameter of constant variation if and only if |π(ϕ1) − π(ϕ2)| ≤ c · δ(ϕ1, ϕ2) for some number c and all formulas ϕ1 and ϕ2. This characterization connects the running time of parameterized algorithms from the portfolio P with the distance δ.
Extension to CSP Instances. In Section 7 we briefly discuss an extension of the distance function δ from CNF formulas to instances of the constraint satisfaction problem.
Formulas and Isomorphism
Propositional Formulas in CNF. A literal is a propositional variable or its negation; each of them is the complement of the other. The complement of a literal a is denoted by ā. A clause is a nonempty finite set of literals that contains no pair of complements. A propositional formula in Conjunctive Normal Form, called a formula or CNF formula for short, is a finite set of clauses.
Let ϕ be a formula. We write |ϕ| to denote the number of clauses of ϕ and var(ϕ) to denote the set of variables appearing in ϕ (with or without negation). The width of a clause C is the number of literals in C. If ψ is a formula such that ϕ ⊆ ψ, we call ϕ a subformula of ψ.
An assignment for a formula ϕ is a function α from var(ϕ) to {true, false}. Let α be such an assignment and x ∈ var(ϕ). We say that x is true under α if α(x) = true; otherwise x is false under α. Symmetrically, we say that x̄ is true under α if α(x) = false; otherwise x̄ is false under α. A clause C is thought of as the disjunction of its literals; we say that α satisfies C if at least one literal in C is true under α. A formula is thought of as the conjunction of its clauses; if α satisfies all clauses of ϕ, then we say that α satisfies ϕ and call α a satisfying assignment for ϕ. The empty formula, denoted by ⊤, is considered to be satisfied by any assignment. The number of satisfying assignments for ϕ is denoted by #ϕ [9].
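Under one common encoding (an assumption of this sketch, not fixed by the paper): a literal is a nonzero integer, x and −x being complements; a clause is a frozenset of literals; a formula is a frozenset of clauses. The definitions above then translate directly:

```python
from itertools import product

def variables(phi):
    """var(phi): the set of variables appearing in phi."""
    return {abs(lit) for clause in phi for lit in clause}

def satisfies(alpha, phi):
    """Does assignment alpha (a dict variable -> bool) satisfy every clause of phi?"""
    return all(
        any(alpha[abs(lit)] == (lit > 0) for lit in clause)
        for clause in phi
    )

def count_models(phi):
    """#phi: the number of satisfying assignments, by exhaustive enumeration."""
    vs = sorted(variables(phi))
    return sum(
        satisfies(dict(zip(vs, bits)), phi)
        for bits in product([False, True], repeat=len(vs))
    )

# (x1 or x2) and (not x1 or x2): satisfied exactly when x2 is true
phi = frozenset({frozenset({1, 2}), frozenset({-1, 2})})
```

Note that the empty formula ⊤ correctly gets one satisfying (empty) assignment under this enumeration.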
We write SAT to denote the satisfiability problem for propositional formulas in CNF: given a formula, does it have a satisfying assignment? The counting version of SAT, denoted by #SAT, is the following function problem: given a formula ϕ, find #ϕ.
Isomorphisms. Informally, two formulas are isomorphic if one of them can be obtained from the other by renaming variables and flipping literals. A more formal definition is given below.
Let ϕ1 and ϕ2 be formulas. Let lit(ϕ1) and lit(ϕ2) be the corresponding sets of literals: lit(ϕi) consists of the variables of ϕi and their negations. Let f be a bijection from lit(ϕ1) to lit(ϕ2). We call f an isomorphism from ϕ1 to ϕ2 if the following holds:
f and its inverse preserve the complement relation: for every literal a ∈ lit(ϕ1), the literal f(ā) is the complement of f(a);
f and its inverse preserve the clauses: for every clause C ⊆ lit(ϕ1), we have C ∈ ϕ1 if and only if {f(a) : a ∈ C} ∈ ϕ2.
Formulas ϕ1 and ϕ2 are isomorphic, written ϕ1 ≅ ϕ2, if there is an isomorphism from ϕ1 to ϕ2. For every formula ϕ, we write [ϕ] to denote the isomorphism class of ϕ, i.e., the set of all formulas isomorphic to ϕ. Thus, [ϕ] can be viewed as ϕ up to isomorphism. Note that isomorphism of formulas is sometimes defined using renamings only [3], yet in this paper we adhere to the more general version.
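With the same integer-literal encoding as before (an assumption for illustration), checking that a given literal bijection witnesses an isomorphism amounts to verifying the two conditions above:

```python
def is_isomorphism(f, phi1, phi2):
    """Check that the literal bijection f (a dict literal -> literal)
    is an isomorphism from phi1 to phi2 as defined above."""
    # lit(phi1): literals of phi1 together with their complements
    lits1 = {l for clause in phi1 for lit in clause for l in (lit, -lit)}
    # f must be a bijection defined exactly on lit(phi1)
    if set(f) != lits1 or len(set(f.values())) != len(f):
        return False
    # f preserves the complement relation (-x encodes the complement of x)
    if any(f[-a] != -f[a] for a in lits1):
        return False
    # f maps the clauses of phi1 exactly onto the clauses of phi2
    return {frozenset(f[lit] for lit in clause) for clause in phi1} == phi2

# rename x1 to x2 with a flip, and x2 to x1
phi1 = frozenset({frozenset({1, 2}), frozenset({-1, 2})})
phi2 = frozenset({frozenset({-2, 1}), frozenset({2, 1})})
f = {1: -2, -1: 2, 2: 1, -2: -1}
```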
Distance Function on Formulas
Let X be a set. A function d from X × X to the non-negative reals is a distance function if d(x, x) = 0 and d(x, y) = d(y, x) for all x, y ∈ X. Consider the following two conditions on d: for all x, y, z ∈ X,
if d(x, y) = 0 then x = y;
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
The distance function d is a metric on X if d satisfies both (1) and (2). If d satisfies (2), but not necessarily (1), then d is a pseudometric on X.
To define a distance function on formulas, we recall the notation for the symmetric difference operation. The symmetric difference of sets A and B is denoted by A △ B: A △ B = (A ∖ B) ∪ (B ∖ A).
Definition 1. Let δ be the following function that maps pairs of CNF formulas to nonnegative integers: δ(ϕ1, ϕ2) = min |ϕ1′ △ ϕ2′|, where the minimum is taken over all formulas ϕ1′ and ϕ2′ such that ϕ1′ ≅ ϕ1 and ϕ2′ ≅ ϕ2.
Proposition 1. The distance function δ is a pseudometric.
Proof: Consider the distance function d△ on finite sets defined as follows: d△(A, B) = |A △ B| for all finite sets A and B. It is well known that d△ is a metric. Indeed, we have A △ C ⊆ (A △ B) ∪ (B △ C) and, hence, the triangle inequality for d△ holds. Now the triangle inequality for δ follows from the triangle inequality for d△. □
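The containment A △ C ⊆ (A △ B) ∪ (B △ C) behind this proof can be checked exhaustively on a small universe (Python's `^` is set symmetric difference):

```python
from itertools import product

# Exhaustively check |A ^ C| <= |A ^ B| + |B ^ C| for all subsets
# A, B, C of a small universe
universe = [0, 1, 2]
subsets = [frozenset(s for s, keep in zip(universe, bits) if keep)
           for bits in product([0, 1], repeat=len(universe))]
violations = sum(
    len(A ^ C) > len(A ^ B) + len(B ^ C)
    for A in subsets for B in subsets for C in subsets
)
```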
Suprema and Infima. We say that a formula ϕ is embedded into a formula ψ if ϕ is isomorphic to some subformula of ψ. A formula ψ is called a supremum of formulas ϕ1 and ϕ2 if the following holds:
both ϕ1 and ϕ2 are embedded into ψ;
for every formula ψ′ such that both ϕ1 and ϕ2 are embedded into ψ′, we have |ψ| ≤ |ψ′|.
Similarly, we say that a formula 𝜃 is an infimum of formulas ϕ1 and ϕ2 if
𝜃 is embedded into both ϕ1 and ϕ2;
for every formula 𝜃′ such that 𝜃′ is embedded into both ϕ1 and ϕ2, we have |𝜃′| ≤ |𝜃|.
Proposition 2. Let ψ and 𝜃 be respectively a supremum and an infimum of formulas ϕ1 and ϕ2. Then δ(ϕ1, ϕ2) = 2|ψ| − |ϕ1| − |ϕ2| (2) and δ(ϕ1, ϕ2) = |ϕ1| + |ϕ2| − 2|𝜃| (3).
Proof: This follows straightforwardly from the definition of δ and properties of the symmetric difference. □
Induced Metric on Isomorphism Classes. It follows from the definition of δ that if ϕ1 ≅ ϕ1′ and ϕ2 ≅ ϕ2′, then δ(ϕ1, ϕ2) = δ(ϕ1′, ϕ2′). Thus, δ induces a distance function on isomorphism classes of formulas: this function, denoted by Δ, is defined by Δ([ϕ1], [ϕ2]) = δ(ϕ1, ϕ2). Proposition 1 implies that Δ is a metric on the set of isomorphism classes.
Example. Let ϕ1 and ϕ2 be the following formulas: Consider the following subformula 𝜃 of ϕ1: The formula 𝜃 is isomorphic to the following subformula of ϕ2 (with the isomorphism mapping each literal of 𝜃 to the corresponding literal of that subformula). Thus, 𝜃 is embedded into both ϕ1 and ϕ2. It is easy to check that no formula with more than two clauses is embedded into both ϕ1 and ϕ2. Hence, 𝜃 is an infimum of ϕ1 and ϕ2. Using Proposition 2, namely equality (3), we can find the distance between ϕ1 and ϕ2:
The distance can also be found using a supremum. Consider the following formula ψ that contains ϕ1 as a subformula: The formula ϕ2 is embedded into ψ with a suitable isomorphism. Since ϕ1 and ϕ2 cannot be embedded into a formula with fewer than five clauses, ψ is a supremum of ϕ1 and ϕ2. By equality (2), we have
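On small formulas, δ can be computed mechanically via equality (3) by searching for the largest common embedded subformula. The brute-force sketch below (not the paper's algorithm) assumes the integer-literal encoding of clauses and tests isomorphism by trying every variable bijection combined with every pattern of flips, so it is usable only on tiny formulas:

```python
from itertools import combinations, permutations, product

def isomorphic(phi1, phi2):
    """Brute-force isomorphism test over all renamings and flips."""
    if len(phi1) != len(phi2):
        return False
    v1 = sorted({abs(l) for c in phi1 for l in c})
    v2 = sorted({abs(l) for c in phi2 for l in c})
    if len(v1) != len(v2):
        return False
    for perm in permutations(v2):
        for flips in product([1, -1], repeat=len(v1)):
            m = {x: s * y for x, y, s in zip(v1, perm, flips)}
            image = {frozenset(m[abs(l)] * (1 if l > 0 else -1) for l in c)
                     for c in phi1}
            if image == phi2:
                return True
    return False

def delta(phi1, phi2):
    """delta(phi1, phi2) = |phi1| + |phi2| - 2k, where k is the largest
    size of a common embedded subformula (equality (3))."""
    for k in range(min(len(phi1), len(phi2)), -1, -1):
        for s1 in combinations(phi1, k):
            for s2 in combinations(phi2, k):
                if isomorphic(frozenset(s1), frozenset(s2)):
                    return len(phi1) + len(phi2) - 2 * k
    # the loop always returns: the empty formula embeds into everything

# toy formulas: removing one clause from phi1 leaves a formula
# isomorphic to phi2, so delta(phi1, phi2) = 1
phi1 = frozenset({frozenset({1, 2}), frozenset({-1, 2})})
phi2 = frozenset({frozenset({1, 2})})
```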
Distance Computation
Given formulas ϕ1 and ϕ2, how hard is it to compute δ(ϕ1, ϕ2)? We show that the problem of computing the distance δ is at least as hard as the hardest problems in NP. On the positive side, this general problem can be restricted to a more tractable case: determine whether the distance is “small”, where “small” means a constant not depending on the input formulas.
To make these claims more precise and to prove them, we consider the following computational problems:
The embedding problem: given formulas ϕ1 and ϕ2, determine whether ϕ1 is embedded into ϕ2.
The distance verification problem: given formulas ϕ1, ϕ2, and an integer d, determine whether δ(ϕ1, ϕ2) = d.
The distance computation problem: given formulas ϕ1 and ϕ2, find the distance δ(ϕ1, ϕ2).
Proposition 3. The embedding problem is NP-complete.
Proof: It is obvious that the embedding problem is in NP: a subformula of ϕ2 together with an isomorphism from ϕ1 to it is a polynomial-size certificate. To prove the completeness, we reduce the Hamiltonian path problem to the embedding problem. Without loss of generality, we can assume that instances of the Hamiltonian path problem are graphs without isolated vertices.
In our reduction, every graph G without isolated vertices is represented by a formula denoted by F(G) and defined as follows. Let G be a graph on vertices v1, …, vn and with edges e1, …, em. The formula F(G) has n + m variables x1, …, xn, y1, …, ym, where each xi represents the vertex vi and each yj represents the edge ej. Using these variables, we form the m clauses of F(G): for each edge ej with endpoints vs and vt, there is a corresponding clause built from xs, xt, and yj. Note that the map G ↦ F(G) has the following property: for all graphs G and H without isolated vertices, H is isomorphic to a subgraph of G if and only if F(H) is embedded into F(G).
Now we define a reduction of the Hamiltonian path problem to the embedding problem as follows. Let G be a graph with n vertices (all of them non-isolated) and m edges. Let H be a Hamiltonian path on n vertices: a graph on vertices u1, …, un with an edge between ui and ui+1 for every 1 ≤ i ≤ n − 1. The reduction maps G to the pair (F(H), F(G)). It is easy to see that G has a Hamiltonian path if and only if H is isomorphic to a subgraph of G. Therefore, G has a Hamiltonian path if and only if F(H) is embedded into F(G). It remains to note that the pair (F(H), F(G)) can be computed from G in polynomial time. □
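The reduction can be sketched as follows. The exact per-edge clause is not reproduced here, so the code uses one natural guess, the clause consisting of the two endpoint variables and the edge variable; this shape is an assumption, and the paper's clause may differ in detail:

```python
def formula_of_graph(edges, n):
    """Build F(G) for a graph on vertices 1..n with the given edge list.
    Vertex i is represented by variable i; edge j by variable n + j.
    Assumed clause shape per edge (u, v): {x_u, x_v, y_j}."""
    return frozenset(
        frozenset({u, v, n + j})
        for j, (u, v) in enumerate(edges, start=1)
    )

def hamiltonian_path_instance(edges, n):
    """Map G to the pair (F(H), F(G)), where H is a path on n vertices."""
    path_edges = [(i, i + 1) for i in range(1, n)]
    return formula_of_graph(path_edges, n), formula_of_graph(edges, n)
```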
Proposition 4. The distance verification problem is NP-hard.
Proof: We show that the embedding problem is reducible to the distance verification problem. The reduction is a polynomial-time algorithm that takes an instance (ϕ1, ϕ2) of the embedding problem as input and outputs the instance (ϕ1, ϕ2, d) with d = |ϕ2| − |ϕ1|, i.e., the instance asking to verify the equality δ(ϕ1, ϕ2) = |ϕ2| − |ϕ1|. (4) If ϕ1 is embedded into ϕ2, then ϕ1 is an infimum of ϕ1 and ϕ2, so δ(ϕ1, ϕ2) = |ϕ1| + |ϕ2| − 2|ϕ1| = |ϕ2| − |ϕ1| by Proposition 2 and, hence, (4) holds.
Conversely, suppose that (4) holds. By Definition 1, δ(ϕ1, ϕ2) = |ϕ1′ △ ϕ2′| for some formulas ϕ1′ and ϕ2′ such that ϕ1′ ≅ ϕ1 and ϕ2′ ≅ ϕ2. Hence, using (4), we obtain |ϕ1′ △ ϕ2′| = |ϕ2′| − |ϕ1′|. By properties of the symmetric difference, the latter equality holds only if ϕ1′ is a subformula of ϕ2′. Therefore, ϕ1 is embedded into ϕ2, which completes the proof. □
Proposition 5. The distance computation problem is at least as hard as the hardest problems in NP.
Proof: There is a trivial polynomial-time Turing reduction of the distance verification problem (shown above to be NP-hard) to the distance computation problem. □
Proposition 6. The distance verification problem can be solved by an oracle algorithm A with the following properties:
the oracle of A answers questions of whether two given formulas are isomorphic;
there exists a polynomial p such that on every instance, the running time of A is at most p(s) · s^d, where s is the size of the instance and d is the integer given in the instance.
Proof: The distance verification problem can be solved using the equivalent definitions of δ given by Proposition 2. For instance, we can use equality (3), which expresses δ in terms of infima: δ(ϕ1, ϕ2) = |ϕ1| + |ϕ2| − 2|𝜃|, where 𝜃 is an infimum of ϕ1 and ϕ2. In fact, this equality shows that distance verification is equivalent to “infimum verification”: instead of verifying whether δ(ϕ1, ϕ2) = d, we can verify whether ϕ1 and ϕ2 have an infimum 𝜃 that contains k clauses, where d = |ϕ1| + |ϕ2| − 2k (if no such integer k exists, the answer is immediately negative). The algorithm A takes (ϕ1, ϕ2, d) as input and uses brute force to check whether such an infimum exists. Namely, A takes the following two steps.
First, A enumerates pairs (ϕ1′, ϕ2′), where ϕ1′ is a k-subset of ϕ1 and ϕ2′ is a k-subset of ϕ2. For each such pair, A asks the oracle whether ϕ1′ and ϕ2′ are isomorphic. If the oracle’s answer is positive, an infimum of cardinality at least k exists. The number of queries to the oracle is at most C(|ϕ1|, k) · C(|ϕ2|, k) ≤ s^d: choosing a k-subset amounts to choosing the clauses left out, and the two left-out sets have |ϕ1| − k and |ϕ2| − k clauses, with (|ϕ1| − k) + (|ϕ2| − k) = d. Second, in a similar way, A checks whether there exist a (k+1)-subset of ϕ1 and a (k+1)-subset of ϕ2 such that these subsets are isomorphic. If no such subsets exist, there is an infimum of cardinality exactly k. The total number of queries to the oracle in the two steps does not exceed 2s^d, and each query is processed in polynomial time. Therefore, the overall running time is at most p(s) · s^d for some polynomial p. □
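The two brute-force steps can be sketched as follows, with the oracle abstracted as a callable. The integer-literal encoding and the equality-based stand-in oracle used in the demo are assumptions for illustration, not the real isomorphism oracle:

```python
from itertools import combinations

def verify_distance(phi1, phi2, d, iso_oracle):
    """Decide whether delta(phi1, phi2) = d by checking that phi1 and phi2
    have an infimum with exactly k clauses, where d = |phi1| + |phi2| - 2k.
    iso_oracle(a, b) answers whether formulas a and b are isomorphic."""
    total = len(phi1) + len(phi2)
    if (total - d) % 2 or total - d < 0:
        return False  # no integer k with d = |phi1| + |phi2| - 2k
    k = (total - d) // 2
    if k > min(len(phi1), len(phi2)):
        return False

    def common_subformula_of_size(j):
        return any(
            iso_oracle(frozenset(a), frozenset(b))
            for a in combinations(phi1, j)
            for b in combinations(phi2, j)
        )

    # step 1: an infimum of cardinality at least k exists;
    # step 2: no common embedded subformula has k + 1 clauses
    return common_subformula_of_size(k) and not common_subformula_of_size(k + 1)

# demo: plain equality is a crude stand-in for the isomorphism oracle
same = lambda a, b: a == b
phi1 = frozenset({frozenset({1}), frozenset({2})})
phi2 = frozenset({frozenset({2}), frozenset({3})})
```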
The algorithm in Proposition 6 uses an oracle for deciding the formula isomorphism problem (given two formulas, are they isomorphic?). It is well known that this problem is polynomially equivalent to the graph isomorphism problem [3]. It is easy to modify the algorithm A so that its oracle for the formula isomorphism problem is replaced with an oracle for the graph isomorphism problem. Let A′ be the modified algorithm; the running time of A′ remains the same as that of A up to a polynomial factor.
The complexity class GI consists of problems that have a polynomial-time Turing reduction to the graph isomorphism problem; see, for example, [12]. It is known that the graph isomorphism problem can be solved in quasi-polynomial time [4] and, in practice, it is often solved efficiently using Nauty tools [14].
Proposition 7. For every integer d, the restriction of the distance verification problem to instances (ϕ1, ϕ2, d) with this fixed d is in GI.
Proof: For each such restriction, the algorithm A′ described above runs in polynomial time and is, therefore, a polynomial-time Turing reduction to the graph isomorphism problem. □
Parameters of Constant Variation
By a parameter of formulas we mean any computable function from the set of formulas to the non-negative integers. Natural examples of parameters are the number of clauses, the number of variables, and the number of satisfying assignments. In this section we define a class of parameters called parameters of constant variation. This type of parameter is needed for the next section, where we describe how the distance δ can be used for solving. We give only two examples of constant-variation parameters, namely incidence treewidth and maximum deficiency, but many parameters for which the correspondingly parameterized version of SAT is fixed-parameter tractable are parameters of constant variation.
Definition 2 (parameter of constant variation).
We say that a parameter π is a parameter of constant variation if it has the following two properties:
for all formulas ϕ1 and ϕ2, if ϕ1 ≅ ϕ2 then π(ϕ1) = π(ϕ2);
there exists a number c such that for every formula ϕ and every formula ϕ′ obtained from ϕ by adding one clause, we have |π(ϕ′) − π(ϕ)| ≤ c.
The number c is called a bound on the change.
Thus, π is a parameter of constant variation if it is preserved under isomorphism and the addition of one clause to ϕ can change π(ϕ) by only a constant value that does not depend on ϕ. The number of clauses in a formula is a trivial example of a parameter of constant variation. The number of variables, by contrast, is not a parameter of constant variation. Indeed, consider formulas ϕ and ϕ′ where ϕ′ is obtained from ϕ by adding one clause with arbitrarily many new variables. Then there is no constant upper bound on the difference between |var(ϕ)| and |var(ϕ′)|.
To further illustrate the notion of constant-variation parameters, we give two examples: incidence treewidth and maximum deficiency. Both parameters are well known in connection with fixed-parameter tractability of SAT.
Incidence Treewidth. For every formula ϕ, the incidence graph of ϕ is a bipartite graph G defined as follows. The set of vertices of G is partitioned into two subsets: one of them is the set of variables of ϕ and the other is the set of clauses of ϕ. There are only edges between variables and clauses: a variable x is connected by an edge with a clause C if and only if x appears in C (with or without negation). The treewidth of the incidence graph of ϕ is called the incidence treewidth of ϕ and is denoted by tw(ϕ).
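Under the same assumed encoding of formulas (clauses as frozensets of integer literals), the incidence graph can be built as a plain adjacency dictionary:

```python
def incidence_graph(phi):
    """Bipartite incidence graph of phi as an adjacency dict with
    variable nodes ('v', x) and clause nodes ('c', i); an edge connects
    them whenever variable x occurs, negated or not, in clause i."""
    adj = {}
    for i, clause in enumerate(sorted(map(sorted, phi))):
        c = ('c', i)
        adj.setdefault(c, set())
        for lit in clause:
            v = ('v', abs(lit))
            adj.setdefault(v, set())
            adj[c].add(v)
            adj[v].add(c)
    return adj

g = incidence_graph(frozenset({frozenset({1, 2}), frozenset({-1, 2})}))
```

Here both variables occur in both clauses, so every node of the graph has degree two.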
There are various algorithms that solve SAT and #SAT when an input formula is given along with a tree decomposition of its incidence graph. For example, the algorithm from [16] solves #SAT in time exponential in w and polynomial in l, where w is the width of the input tree decomposition and l is the size of the input. Also there are algorithms that build a tree decomposition of a given graph; one of them is given in [1]: it approximates a tree decomposition of minimum width. The combination of these two algorithms solves #SAT in time exponential in tw(ϕ) and polynomial in the size of ϕ.
Proposition 8. Incidence treewidth is a parameter of constant variation.
Proof: If two formulas are isomorphic, then their incidence graphs are isomorphic too and, therefore, incidence treewidth is preserved under isomorphism. We prove that incidence treewidth has the second property from Definition 2 with c = 1: for every formula ϕ and every formula ϕ′ obtained from ϕ by adding one clause, we have |tw(ϕ′) − tw(ϕ)| ≤ 1.
Consider a tree decomposition of the incidence graph of ϕ: a labeled tree T in which every vertex is labeled by a bag of vertices. We show how to extend it to a tree decomposition of the incidence graph of ϕ′. Let C be the “new” clause added to ϕ and let z1, …, zk be the “new” variables, i.e., the variables appearing in C and not occurring in ϕ. These “new” elements must be added to T. The variables are added in a simple way: (1) select any leaf of T, (2) connect it with k new vertices, and (3) assign the one-element bags {z1}, …, {zk} to the new vertices. Finally, we extend all bags (including the bags of the new vertices) by adding the clause C.
It is easy to see that the result of our extension, the labeled tree T′, is a tree decomposition of the incidence graph of ϕ′. Moreover, by our construction, we have w′ ≤ w + 1, where w and w′ are the widths of T and T′ respectively. Taking a tree decomposition T of width tw(ϕ) and using the inequality tw(ϕ) ≤ tw(ϕ′) (the incidence graph of ϕ is a subgraph of the incidence graph of ϕ′), we obtain |tw(ϕ′) − tw(ϕ)| ≤ 1, which completes the proof. □
Maximum Deficiency. The parameter of formulas called maximum deficiency was introduced in [8]; it relates satisfiability of a formula to matchings in its incidence graph.
Let ϕ be a formula and let M be a matching in its incidence graph. For each edge in M that connects a variable x and a clause C, we can assign x a truth value that makes C true. Therefore, if all clauses of ϕ are matched by M, then ϕ is satisfiable. The maximum deficiency of a formula ϕ, denoted by md(ϕ), is the number of clauses unmatched by a maximum matching in the incidence graph of ϕ (this number remains the same whatever maximum matching is taken). Since a maximum matching can be found in polynomial time, md(ϕ) is computable in polynomial time. Equivalently, due to Hall’s marriage theorem, maximum deficiency can be defined as follows: md(ϕ) = max (|ψ| − |var(ψ)|), where the maximum is taken over all subformulas ψ of ϕ. Note that md(ϕ) ≥ 0 for all ϕ because ⊤ is a subformula of every formula. The satisfiability problem parameterized by maximum deficiency is fixed-parameter tractable: the algorithms in [7,17] solve SAT in time exponential in md(ϕ) and polynomial in l, where l is the size of ϕ.
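As an illustration, the following sketch computes maximum deficiency directly from the matching definition, using Kuhn's augmenting-path algorithm on the clause–variable incidence graph (the frozenset-of-integer-literals encoding is an assumption):

```python
def maximum_deficiency(phi):
    """md(phi): clauses left unmatched by a maximum matching in the
    incidence graph, via Kuhn's augmenting-path algorithm (clauses on
    the left, variables on the right)."""
    clauses = [sorted({abs(l) for l in c}) for c in phi]
    match = {}  # variable -> index of the clause currently matched to it

    def try_augment(i, seen):
        # try to match clause i, rematching earlier clauses if needed
        for x in clauses[i]:
            if x in seen:
                continue
            seen.add(x)
            if x not in match or try_augment(match[x], seen):
                match[x] = i
                return True
        return False

    matched = sum(try_augment(i, set()) for i in range(len(clauses)))
    return len(clauses) - matched

# three clauses over two variables: one clause must stay unmatched
phi = frozenset({frozenset({1}), frozenset({-1, 2}), frozenset({-2})})
```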
Proposition 9. Maximum deficiency is a parameter of constant variation.
Proof: First, maximum deficiency is preserved under isomorphism. Second, it is obvious that the addition of one clause to a formula ϕ can increase md(ϕ) by at most one and, therefore, the second property from Definition 2 holds with c = 1. □
Characterization in Terms of Distance. The proposition below gives an equivalent definition of parameters of constant variation in terms of the distance δ.
Proposition 10. Let π be a parameter and let c be a number. The following two statements are equivalent:
π is a parameter of constant variation and c is a bound on the change;
for all formulas ϕ1 and ϕ2, |π(ϕ1) − π(ϕ2)| ≤ c · δ(ϕ1, ϕ2). (6)
Proof: First, suppose that (6) holds. Then ϕ1 ≅ ϕ2 implies δ(ϕ1, ϕ2) = 0 and, hence, π(ϕ1) = π(ϕ2), which means that π is preserved under isomorphism. If ϕ′ is obtained from ϕ by adding one clause, then δ(ϕ, ϕ′) ≤ 1 and, therefore, we have the second property from Definition 2: |π(ϕ′) − π(ϕ)| ≤ c.
Second, suppose that π is a parameter of constant variation and c is a bound on the change. Let ϕ1 and ϕ2 be arbitrary formulas. By the definition of δ and Proposition 2, there exist four formulas ϕ1′, ϕ2′, ψ, 𝜃 such that
ϕ1′ ≅ ϕ1 and ϕ2′ ≅ ϕ2;
𝜃 is a subformula of both ϕ1′ and ϕ2′;
both ϕ1′ and ϕ2′ are subformulas of ψ;
δ(ϕ1, ϕ2) = |ψ| − |𝜃|.
Thus, ψ is obtained from 𝜃 by the addition of δ(ϕ1, ϕ2) clauses. Since the addition of one clause can change π by at most c, we have |π(ϕ1′) − π(ϕ2′)| ≤ |π(ϕ1′) − π(𝜃)| + |π(𝜃) − π(ϕ2′)| ≤ c(|ϕ1′| − |𝜃|) + c(|ϕ2′| − |𝜃|) = c(|ψ| − |𝜃|), which is inequality (6). □
Note that inequality (6) has the same form as the inequality in the definition of Lipschitz functions (also called Lipschitz continuous functions or Lipschitz maps) [5]. Thus, loosely speaking, π is a parameter of constant variation if and only if π is a Lipschitz function.
Fixed-Distance Selector
There are many techniques for designing a “meta-algorithm” that solves SAT using a given “portfolio” of algorithms [11]. We show how the distance δ can be used in a variant of the per-instance algorithm selection technique. Namely, we describe an incomplete meta-algorithm, called a fixed-distance selector, that takes a formula ψ as input and tries to select an algorithm from the portfolio that performs fast on ψ. Then we formulate conditions on portfolios and prove that if the conditions are met, then the selected algorithm indeed performs efficiently on ψ.
Datasets and Portfolios. We consider algorithms that take formulas as inputs, for example, algorithms that solve SAT or #SAT. Let D be a finite set of formulas called a dataset. With every formula ϕ ∈ D, we associate a non-empty finite set of algorithms that perform fast on ϕ (we do not define here what “fast” means; this is done in the conditions formulated below). Note that the sets associated with different formulas may or may not overlap. Given a dataset D, the set of all algorithms associated with its formulas is called the portfolio corresponding to D. Thus, for every formula ϕ ∈ D, there is at least one algorithm in the portfolio that performs fast on ϕ. Likewise, every algorithm in the portfolio performs fast on at least one formula from the dataset.
Fixed-Distance Selection. We call the following simple meta-algorithm the fixed-distance selector and denote it by S. The algorithm uses a dataset D and the corresponding portfolio P. Let ψ be a formula and let d be a non-negative integer. Taking ψ and d as input, the selector S either outputs an algorithm or returns “can’t select”:
Find a formula ϕ ∈ D such that δ(ψ, ϕ) ≤ d and δ(ψ, ϕ) is the minimum over all formulas of D. If there is no such formula ϕ, return “can’t select”.
Select any algorithm A from the set associated with ϕ and return it.
We do not specify how to find the formulas of D closest to ψ and how to choose one of them. The straightforward way of finding ϕ is exhaustive search: for each d′ ∈ {0, 1, …, d}, enumerate all formulas ϕ ∈ D and verify whether δ(ψ, ϕ) = d′ (the distance verification problem). It follows from Proposition 6 that each verification can be done using a polynomial-time oracle algorithm with an oracle for graph isomorphism. Thus, this exhaustive search requires at most (d + 1) · N runs of the oracle algorithm, where N is the cardinality of the dataset.
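A minimal sketch of the selector loop, with the dataset, portfolio, and distance function supplied by the caller; the function name and the toy stand-in distance used in the demo are illustrative assumptions, not the paper's definitions:

```python
def fixed_distance_selector(psi, d, dataset, portfolio, delta):
    """Sketch of the selector S. `dataset` is a list of formulas,
    `portfolio` maps each dataset formula to a non-empty list of
    algorithms that perform fast on it, and `delta` is the distance."""
    best = None
    for phi in dataset:
        dist = delta(psi, phi)
        if dist <= d and (best is None or dist < best[0]):
            best = (dist, phi)
    if best is None:
        return "can't select"
    # select any algorithm associated with the closest formula
    return portfolio[best[1]][0]

# toy demo: a crude stand-in distance (difference in clause counts)
toy_delta = lambda a, b: abs(len(a) - len(b))
d1 = frozenset({frozenset({1})})
d2 = frozenset({frozenset({1}), frozenset({2})})
portfolio = {d1: ["solver_a"], d2: ["solver_b"]}
psi = frozenset({frozenset({3}), frozenset({4})})
```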
What is the role of the input number d, should it be small or large? Proposition 11 gives an upper bound on the running time of the selected algorithm A on ψ. This bound depends on d: the smaller d is, the faster A is. On the other hand, assuming that N is fixed, if d is too small, this increases the chance that S returns “can’t select”. Thus, a good tradeoff for choosing d is needed.
Algorithms Matching with Parameters. An efficient performance of an algorithm A on a given class of formulas is typically due to the fact that A makes use of some “hidden structure” of formulas of this class. Different algorithms exploit different hidden structures and, thereby, they are successful on different classes of formulas. Hidden structures can be utilized explicitly or implicitly. For example, the fixed-parameter tractable algorithms mentioned in Section 5 make explicit use of certain properties of the incidence graph: the existence of a tree decomposition of small width and a small maximum deficiency. The CDCL solvers in [2,13] can be viewed as examples of implicit use of community structure.
A hidden structure is often characterized by a parameter π such that the value π(ψ) determines how fast an algorithm A runs on ψ: the smaller π(ψ) is, the faster A runs on ψ. For most such parameters, the running time of A on ψ is exponential in π(ψ) and polynomial in the size of ψ [6,15]. We say that A matches with π if there exist a number b and a polynomial q such that for every formula ψ, the running time of A on ψ is at most b^π(ψ) · q(l), where l is the size of ψ.
Note two differences between an algorithm A matching with a parameter π and a fixed-parameter algorithm. First, the input to A is a formula, while the input to a fixed-parameter algorithm is a formula along with a value of the parameter. Second, the upper bound for A has the exponential factor b^π(ψ) instead of an arbitrary factor f(π(ψ)) as in the case of a fixed-parameter algorithm.
Conditions on Datasets and Portfolios. Let D be a dataset and P be the corresponding portfolio. Let a, b, and c be numbers and let q be a polynomial. We say that the tuple (a, b, c, q) is a bound for D and P if for every formula ϕ ∈ D and every algorithm A associated with ϕ, there exists a parameter π such that the following three conditions are met:
Condition 1. π(ϕ) ≤ a.
Condition 2. The algorithm A matches with π, and the corresponding base and polynomial are bounded from above by b and q respectively: for every formula ψ, the running time of A on ψ does not exceed b^π(ψ) · q(l), where l is the size of ψ.
Condition 3. The parameter π is a parameter of constant variation and the corresponding bound on the change is less than or equal to c.
Proposition 11. Let D be a dataset and P be the corresponding portfolio used by the fixed-distance selector S described above. Suppose that the tuple (a, b, c, q) is a bound for D and P. Given a formula ψ and an integer d as input, let A be the algorithm returned by S on this input. Then the running time of A on ψ is at most b^(a + cd) · q(l), where l is the size of ψ.
Proof: Let ϕ ∈ D be the formula chosen by S as a formula closest to the input formula ψ. By Conditions 2 and 3, there is a parameter π such that
π is a parameter of constant variation with a bound on the change of at most c;
A runs on ψ in time at most b^π(ψ) · q(l).
By Proposition 10 we have π(ψ) ≤ π(ϕ) + c · δ(ψ, ϕ) ≤ π(ϕ) + cd. By Condition 1 we have π(ϕ) ≤ a. Therefore, π(ψ) ≤ a + cd and the running time of A on ψ is at most b^(a + cd) · q(l). □
Thus, A performs efficiently on ψ: assuming that d in the bound is fixed, there exists a polynomial q′ such that the running time of A on ψ does not exceed q′(l).
Concluding Remarks: Possible Extensions
Isomorphism of Formulas. There are several equivalent definitions of the distance δ (Proposition 2), and all of them use isomorphism of formulas as the central notion. On the other hand, these definitions do not require exactly our notion of isomorphism: two formulas are isomorphic if one of them can be obtained from the other by renaming variables and flipping literals. For example, we could define the distance based on renaming only, or flipping only, or another equivalence relation on formulas. In fact, we could define the distance relative to any equivalence relation on formulas: if I is such an equivalence relation, then δ_I denotes the corresponding distance function on formulas. This distance could be of interest for solving if, first, the satisfiability status is preserved under I and, second, the problem of I-equivalence testing is “easier” than the problem of satisfiability testing (just as the graph isomorphism problem is supposedly easier than the satisfiability problem).
Extension to CSP Instances. The main definitions and results of this paper can be extended from formulas in CNF to instances of the constraint satisfaction problem (CSP instances). The adjustments are obvious: clauses are replaced with constraints and isomorphism of formulas is replaced with isomorphism of CSP instances.
There are several variants of isomorphism of CSP instances; we describe the most common one. Let V be a finite set of variables that take values from a finite domain D. A constraint of arity k is a pair (s, R), where s is a k-ary tuple of variables from V and R is a k-ary relation on D. A CSP instance is a triple (V, D, C), where C is a finite set of constraints over V and D. Two CSP instances (V1, D1, C1) and (V2, D2, C2) are isomorphic if there are bijections f : V1 → V2 and g : D1 → D2 such that the set of constraints C2 is obtained from C1 by replacing each variable v and each value a with their images f(v) and g(a) respectively.
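Given candidate bijections f and g, checking that they witness such an isomorphism is mechanical. The sketch below assumes instances encoded as triples (V, D, C) with each constraint a pair of a variable tuple and a relation given as a set of value tuples:

```python
def csp_isomorphic_via(f, g, inst1, inst2):
    """Check that the variable bijection f and the value bijection g
    witness an isomorphism between the two CSP instances."""
    V1, D1, C1 = inst1
    V2, D2, C2 = inst2
    # f and g must be bijections between the variable sets and domains
    if sorted(f) != sorted(V1) or sorted(f.values()) != sorted(V2):
        return False
    if sorted(g) != sorted(D1) or sorted(g.values()) != sorted(D2):
        return False
    # replacing each variable and value by its image must yield C2
    image = {
        (tuple(f[v] for v in scope),
         frozenset(tuple(g[a] for a in row) for row in rel))
        for scope, rel in C1
    }
    return image == {(scope, frozenset(rel)) for scope, rel in C2}

# a binary "not equal"-style constraint, renamed on both levels
inst1 = (['x', 'y'], [0, 1], [(('x', 'y'), {(0, 1), (1, 0)})])
inst2 = (['u', 'w'], ['a', 'b'], [(('u', 'w'), {('a', 'b'), ('b', 'a')})])
f = {'x': 'u', 'y': 'w'}
g = {0: 'a', 1: 'b'}
```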
Using this variant of isomorphism, the definitions and propositions of Section 3 are easily carried over from formulas in CNF to CSP instances. Thus, the distance between two CSP instances is the minimum number of constraints that can be removed from them so that the remaining sub-instances are isomorphic to each other. Or, equivalently, this distance is the minimum number of constraints that can be added to them so that the resulting “super-instances” are isomorphic to each other. The definitions and propositions regarding parameters of constant variation, their characterization in terms of the distance, and fixed-distance selection are carried over to CSP instances as well.
Acknowledgement
We thank the anonymous reviewers for their targeted comments that helped us significantly improve the manuscript.
References
1.
Amir, E., Approximation algorithms for treewidth, Algorithmica 56(4) (2010), 448–479. doi:10.1007/s00453-008-9180-4.
2.
Ansótegui, C., Bonet, M.L., Giráldez-Cru, J. and Levy, J., Structure features for SAT instances classification, Journal of Applied Logic 23 (2017), 27–39. doi:10.1016/j.jal.2016.11.004.
3.
Ausiello, G., Cristiano, F., Fantozzi, P. and Laura, L., Syntactic isomorphism of CNF Boolean formulas is graph isomorphism complete, in: Proceedings of the 21st Italian Conference on Theoretical Computer Science, CEUR Workshop Proceedings, Vol. 2756, CEUR-WS.org, 2020, pp. 190–201.
4.
Babai, L., Graph isomorphism in quasipolynomial time [extended abstract], in: Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, ACM, 2016, pp. 684–697.
5.
Burago, D., Burago, Y. and Ivanov, S., A Course in Metric Geometry, Graduate Studies in Mathematics, Vol. 33, American Mathematical Society, 2001.
6.
Cygan, M., Fomin, F.V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M. and Saurabh, S., Parameterized Algorithms, Springer, 2015.
7.
Fleischner, H., Kullmann, O. and Szeider, S., Polynomial-time recognition of minimal unsatisfiable formulas with fixed clause-variable difference, Theoretical Computer Science 289(1) (2002), 503–516. doi:10.1016/S0304-3975(01)00337-1.
8.
Franco, J.V. and Gelder, A.V., A perspective on certain polynomial-time solvable classes of satisfiability, Discrete Applied Mathematics 125(2–3) (2003), 177–214. doi:10.1016/S0166-218X(01)00358-4.
9.
Gomes, C.P., Sabharwal, A. and Selman, B., Model counting, in: Handbook of Satisfiability, 2nd edn, Frontiers in Artificial Intelligence and Applications, Vol. 336, IOS Press, 2021, pp. 993–1014, Chapter 25.
10.
Gromov, M., Metric Structures for Riemannian and Non-Riemannian Spaces, Progress in Mathematics, Vol. 152, Birkhäuser, 1999, based on the 1981 French original.
11.
Hoos, H.H., Hutter, F. and Leyton-Brown, K., Automated configuration and selection of SAT solvers, in: Handbook of Satisfiability, 2nd edn, Frontiers in Artificial Intelligence and Applications, Vol. 336, IOS Press, 2021, pp. 481–507, Chapter 12.
12.
Köbler, J., Schöning, U. and Torán, J., The Graph Isomorphism Problem: Its Structural Complexity, Progress in Theoretical Computer Science, Birkhäuser/Springer, 1993.
13.
Li, C., Chung, J., Mukherjee, S., Vinyals, M., Fleming, N., Kolokolova, A., Mu, A. and Ganesh, V., On the hierarchical community structure of practical Boolean formulas, in: Proceedings of the 24th International Conference on Theory and Applications of Satisfiability Testing, SAT 2021, Lecture Notes in Computer Science, Vol. 12831, Springer, 2021, pp. 359–376. doi:10.1007/978-3-030-80223-3_25.
14.
McKay, B.D. and Piperno, A., Practical graph isomorphism, II, Journal of Symbolic Computation 60 (2014), 94–112. doi:10.1016/j.jsc.2013.09.003.
15.
Samer, M. and Szeider, S., Fixed-parameter tractability, in: Handbook of Satisfiability, 2nd edn, Frontiers in Artificial Intelligence and Applications, Vol. 336, IOS Press, 2021, pp. 693–736, Chapter 17.
16.
Slivovsky, F. and Szeider, S., A faster algorithm for propositional model counting parameterized by incidence treewidth, in: Proceedings of the 23rd International Conference on Theory and Applications of Satisfiability Testing, SAT 2020, Lecture Notes in Computer Science, Vol. 12178, Springer, 2020, pp. 267–276. doi:10.1007/978-3-030-51825-7_19.
17.
Szeider, S., Minimal unsatisfiable formulas with bounded clause-variable difference are fixed-parameter tractable, Journal of Computer and System Sciences 69(4) (2004), 656–674. doi:10.1016/j.jcss.2004.04.009.