Sage Journals: Discover world-class research

Abstract

We propose an end-to-end approach for Answer Set Programming (ASP) and linear algebraically compute stable models satisfying given constraints. The idea is to implement Lin-Zhao’s theorem together with constraints directly in vector spaces as numerical minimization of a cost function constructed from a matricized normal logic program, loop formulas in Lin-Zhao’s theorem and constraints, thereby no use of symbolic ASP or SAT solvers involved in our approach. We also propose precomputation that shrinks the program size and heuristics for loop formulas to reduce computational difficulty. We empirically test our approach with programming examples including the three-coloring and Hamiltonian cycle problems.

Keywords

Answer Set Programming end-to-end ASP vector space cost minimization loop formula supported model stable model

1. Introduction

Computing stable model semantics (Gelfond & Lifshcitz, 1988) lies at the heart of Answer Set Programming (ASP) (Lifschitz, 2008; Marek & Truszczyński, 1999; Niemelä, 1999) and there have been a variety of approaches proposed so far. Early approaches such as smodels (Niemelä & Simons, 1997) used backtracking. Then the concept of loop formula was introduced and approaches that use a Boolean satisfiability problem (SAT) solver to compute stable models based on Lin-Zhao’s theorem (Lin & Zhao, 2004) were proposed. They include ASSAT (Lin & Zhao, 2004) and cmodels (Lierler, 2005) for example. Later more elaborated approaches such as clasp (Gebser et al., 2007, 2012) based on conflict-driven no good learning have been developed. While these symbolic approaches continue to predominate in ASP, there has been another trend towards differentiable methods. For example, differentiable ASP/SAT (Nickles, 2018) computes stable models by an ASP solver that utilizes derivatives of a cost function. More recently, NeurASP (Yang et al., 2020) and SLASH (Skryagin et al., 2022, 2023) combined deep learning and ASP. In their approaches, deep learning is not used in an end-to-end way to compute stable models, but used as a component to compute and learn probabilities represented by special atoms interfacing to ASP. Takemura and Inoue (2024) proposed a neurosymbolic learning pipeline that leverages differentiable computation of supported models. Although their method does not specifically address stable model computation, it bypasses the need for a symbolic solver and illustrates how differentiable computation facilitates integration with deep learning. A step towards end-to-end computation was taken by Aspis et al. (2020) and Takemura and Inoue (2022). They formulated the computation of supported models, a super class of stable models, entirely as fixedpoint computation in vector spaces, and obtain supported models represented by binary vectors. However, there still remains a gap between computing supported models and computing stable models.

In this article, we propose an end-to-end approach for ASP and compute stable models satisfying given constraints in vector spaces. The idea is simple; we implement Lin-Zhao’s theorem (Lin & Zhao, 2004) together with constraints directly in vector spaces as a cost minimization problem, thereby no use of symbolic ASP or SAT solvers involved. Since our approach is numerical and relies solely on vector and matrix operations, future work could explore the potential benefits of parallel computing technologies such as many-core CPUs and GPUs.

Technically, Lin-Zhao’s theorem (Lin & Zhao, 2004) states that a stable model of a ground normal logic program coincides with a supported model which satisfies “loop formulas” associated with the program. Loop formulas are propositional formulas indicating how to get out of infinite loops of top-down rule invocation. We formulate finding such a model as root finding in a vector space of a non-negative cost function represented in terms of the matricized program and loop formulas. The problem is that in whatever approach we may take, symbolic or non-symbolic, computing supported models is NP-hard (for example graph coloring is solved by computing supported models) and there can be exponentially many loop formulas to be satisfied (Lifschitz & Razborov, 2006). We reduce this computational difficulty in two ways. One is precomputation that removes atoms from the search space which are known to be false in any stable model and yields a smaller program. The other is to heuristically choose loop formulas to be satisfied. The latter would mean allowing non-stable model computation, and in our continuous approach, we modify the cost function to be affected only by these chosen loop formulas. The intuition behind this heuristic is that the modified cost function would assign higher cost to models that do not satisfy the chosen loop formulas, thus driving the search process away from them.

Our end-to-end computing framework differs from those by Aspis et al. (2020) and Takemura and Inoue (2022) in that they basically compute supported models and the computing process itself has no mechanism such as loop formulas to exclude non-stable models. In addition, any propositional normal logic program is acceptable in our framework, since we impose no restrictions on the syntax of programs like the multiple definition (MD) condition (Aspis et al., 2020) or the singly defined (SD) condition (Takemura & Inoue, 2022). More importantly, we incorporate the use of constraints, that is, rules with an empty head, which make ASP smooth and practical.

Hence, our contributions include as follows:

–
a proposal of end-to-end computing of stable models in vector spaces for propositional normal logic programs;
–
augmentation of the above by constraints;
–
introduction of precomputation and heuristics to reduce computational difficulty of stable model computation.
We add that since our primary purpose in this article is to establish theoretical feasibility of end-to-end ASP computing in vector spaces, programming examples are small and implementation is of preliminary nature. Furthermore, the main search algorithm we propose in this article is incomplete, in the sense that it does not guarantee reaching a global minimum if it exists, nor it cannot conclusively prove that no solution exists.

In what follows, after preliminaries in Section 2, we formulate the computation of supported models in vector spaces in Section 3 and that of stable models in Section 4. We then show programming examples in Section 5 including ASP programs for the three-coloring problem and the Hamiltonian cycle problem. We there compare performance of precomputation and loop formula heuristics. Section 6 contains related work and Section 7 is the conclusion.
2. Preliminaries

In this article, we mean by a program a propositional normal logic program $P$ which is a finite set of rules of the form $a \leftarrow G$ , where $a$ is an atom called the head, $G$ is a conjunction of literals called the body of the rule, respectively. We equate propositional variables with atoms. A literal is an atom (positive literal) or its negation (negative literal). The logical connective $\neg$ in this article denotes negation as failure. We suppose $P$ is written in a given set of atoms $A$ but usually assume $A$ = atom( $P$ ), that is, the set of atoms occurring in $P$ . We use $G^{+}$ and $G^{-}$ to denote the conjunction of positive and negative literals in $G$ , respectively. $G$ may be empty. The empty conjunction is always true. We call $a \leftarrow G$ rule for $a$ . A rule with an empty head is called a constraint. Let $a \leftarrow G_{1}, \dots, a \leftarrow G_{m}$ be rules for $a$ in $P$ . When $m > 0$ , put $iff (a) = a \Leftrightarrow G_{1} \lor \dots \lor G_{m}$ . When $m = 0$ , that is, there is no rule for $a$ , put $iff (a) = a \Leftrightarrow ⊥$ , where $⊥$ is a special symbol representing the empty disjunction which is always false. We call $iff (a)$ the completed rule for $a$ . The completion of $P$ , comp( $P$ ), is defined as comp( $P$ ) = ${iff (a) ∣ atom a occurs in P}$ . For a finite set $S$ , we denote the number of elements in $S$ by $| S |$ . So $| P |$ is the number of rules in the program $P$ .

An interpretation (assignment) $I$ over a set of atoms $A$ is a mapping which determines the truth value of each atom $a \in A$ . Then the truth value of a formula $F$ is inductively defined by $I$ , and if $F$ becomes true evaluated by $I$ , we say $I$ satisfies $F$ , $F$ is true in $I$ , or $I$ is a model of $F$ and write $I ⊨ F$ . This notation is extended to a set $F = {F_{1}, \dots, F_{u}}$ by considering $F$ as a conjunction $F_{1} \land \dots \land F_{u}$ . For convenience, we always equate $I$ with ${a \in A ∣ I ⊨ a}$ , that is, the set of atoms true in $I$ . When $I$ satisfies all rules in the program $P$ , that is, $I ⊨ P$ , $I$ is said to be a model of $P$ . If no rule body contains negative literals, $P$ is said to be a definite program. In that case, $P$ always has the least model (in the sense of set inclusion) ${a \in A ∣ P ⊢ a}$ , that is, the set of atoms provable from $P$ .

A model $I$ of comp( $P$ ) is a supported model of $P$ (Apt et al., 1988; Marek & Subrahmanian, 1992). When $P$ is a definite program, there is at least one supported model, and its least model is also a supported model. In general, there can be multiple supported models for both definite and non-definite programs $P$ . Stable models are a subclass of supported models. They are defined as follows. Given a program $P$ and a model $I$ , remove all rules from $P$ whose body contains a negative literal false in $I$ , then remove all negative literals from the remaining rules. The resulting program, $P^{I}$ , is called the Gelfond-Lifschitz (GL) reduct of $P$ by $I$ or just the reduct of $P$ by $I$ . It is a definite program and has the least model. If this least model is identical to $I$ , $I$ is called a stable model of $P$ (Gelfond & Lifshcitz, 1988). $P$ may have zero or multiple stable models, as in the case of supported models. Since the existence of a stable model is NP-complete (Marek & Truszczyński, 1999) and so is a supported model, their computation is expected to be hard. Supported models and stable models of a propositional normal logic program coincide when the program is tight (no infinite call chain through positive goals) (Erdem & Lifschitz, 2003; Fages, 1994).

Let $F = d_{1} \lor \dots \lor d_{h}$ be a Boolean formula in $n$ variables (atoms) in disjunctive normal form (DNF), where each $d_{i}$ ( $1 \leq i \leq h$ ) is a conjunction of literals and called a disjunct of $F$ . When $F$ has no disjunct, $F$ is false.

A walk in a directed graph is a sequence $v_{1} \to v_{2} \to \dots \to v_{u}$ ( $u \geq 1$ ) of vertices representing the corresponding non-zero sequence of edges $(v_{1}, v_{2}), \dots, (v_{u - 1}, v_{u})$ . When $v_{u} = v_{1}$ , it is said to be closed. A cycle is a closed walk $v_{1} \to v_{2} \to \dots \to v_{u} \to v_{1}$ , where ${v_{1}, \dots, v_{u}}$ are all distinct. A Hamiltonian cycle (HC) is a cycle which visits every vertex exactly once. A path is a walk with no vertex repeated. A directed subgraph is called strongly connected if there are paths from $v_{1}$ to $v_{2}$ and from $v_{2}$ to $v_{1}$ for any pair of distinct vertices $v_{1}$ and $v_{2}$ . This “strongly connected” relation induces an equivalence relation over the set of vertices and an induced equivalence class is called a strongly connected component (SCC).

The positive dependency graph $pdg (P)$ for a program $P$ is a directed graph where vertices are atoms occurring in $P$ and there is an edge $(a, b)$ from atom $a$ to atom $b$ if and only if (iff) there is a rule $a \leftarrow G$ in $P$ such that $b$ is a positive literal in $G$ . $P$ is said to be tight (Erdem & Lifschitz, 2003; Fages, 1994)¹ when $pdg (P)$ is acyclic, that is, has no cycle. A loop $S = {a_{1}, \dots, a_{u}}$ $(u > 0)$ in $P$ is a set of atoms where for any pair of atoms $a_{1}$ and $a_{2}$ in $S$ ( $a_{1} = a_{2}$ allowed), there is a path in $pdg (P)$ from $a_{1}$ to $a_{2}$ and also from $a_{2}$ to $a_{1}$ . A singleton loop $S = {a}$ is induced by a self-referencing rule of the form $a \leftarrow a \land G$ where $G$ is possibly empty, that is, a self-loop $a \leftarrow a$ . A support rule for $a$ relative to a loop $S$ is a rule $a \leftarrow G$ such that $G^{+} \cap S = \emptyset$ . Given a loop $L = {a_{1}, \dots, a_{u}}$ and its external support rules ${a_{1} \leftarrow G_{11}, \dots, a_{1} \leftarrow G_{1 n}, \dots, a_{u} \leftarrow G_{u 1}, \dots, a_{u} \leftarrow G_{u n}}$ , the (conjunctive) loop formula is the following implication: $(a_{1} \land \dots \land a_{u}) \to (G_{11} \lor \dots \lor G_{1 n} \lor \dots \lor G_{u 1} \lor \dots \lor G_{u n})$ .

We denote vectors by bold lower case letters such as $a$ , where $a (i)$ represents the $i$ -th element of $a$ . Vectors are column vectors by default. We use $(a ∙ b)$ to stand for the inner product (dot product) of vectors $a$ and $b$ of the same dimension. $‖ a ‖_{1}$ and $‖ a ‖_{2}$ , respectively, denote the 1-norm and 2-norm of $a$ , where $‖ a ‖_{1} = \sum | a (i) |$ and $‖ a ‖_{2} = \sqrt{\sum a {(i)}^{2}}$ . We use $1$ to denote an all-ones vector of appropriate dimension. An interpretation $I$ over a set $A = {a_{1}, \dots, a_{n}}$ of $n$ ordered atoms is equated with an $n$ -dimensional binary vector $s_{I} \in {0, 1}^{n}$ such that $s_{I} (i) = 1$ if $a_{i}$ is true in $I$ and $s_{I} (i) = 0$ otherwise ( $1 \leq i \leq n$ ). $s_{I}$ is called the vectorized $I$ .

Bold upper case letters such as $A$ stand for a matrix. We use $A (i, j)$ to denote the $i, j$ -th element of $A$ , $A (i, :)$ the $i$ -th row of $A$ and $A (:, j)$ the $j$ -th column of $A$ , respectively. We often consider one dimensional matrices as (row or column) vectors. $‖ A ‖_{F}$ denotes the Frobenius norm of $A$ , that is, $\sqrt{\sum_{i, j} A (i, j)^{2}}$ . Let $A, B \in R^{m \times n}$ be $m \times n$ matrices. $A ⊙ B$ denotes their Hadamard product, that is, $(A ⊙ B) (i, j) = A (i, j) B (i, j)$ for $i, j (1 \leq i \leq m, 1 \leq j \leq n)$ . $[A; B]$ designates the $2 m \times n$ matrix of $A$ stacked onto $B$ . We implicitly assume that all dimensions of vectors and matrices in various expressions are compatible. We introduce a piece-wise linear function ${min}_{1} (x) = min (x, 1)$ that returns the lesser of 1 and $x$ as an activation function which is related to the popular activation function $ReLU (x) = max (x, 0)$ by $1 - {min}_{1} (x) = ReLU (1 - x)$ . ${min}_{1} (A)$ denotes the result of component-wise application of ${min}_{1} (x)$ to matrix $A$ . We also introduce thresholding notation. Suppose $θ$ is a real number and $a$ an $n$ -dimensional vector. Then $[a \leq θ]$ denotes a binary vector obtained by thresholding $a$ at $θ$ where for $i (1 \leq i \leq n)$ , $[a \leq θ] (i) = 1$ if $a (i) \leq θ$ and $[a \leq θ] (i) = 0$ otherwise. $[a \geq θ]$ is treated similarly. We extend thresholding to matrices. Thus $[A \leq 1]$ means a matrix such that $[A \leq 1] (i, j) = 1$ if $A (i, j) \leq 1$ and $[A \leq 1] (i, j) = 0$ otherwise. For convenience, we generalize bit inversion to an $n$ -dimensional vector $a$ and use an expression $1 - a$ to denote the $n$ -dimensional vector such that $(1 - a) (i) = 1 - a (i)$ for $i (1 \leq i \leq n)$ . $1 - A$ is treated similarly.

3. Computing Supported Models in Vector Spaces

In this section, we formulate the semantics of supported models in vector spaces and show how to compute it by cost minimization.

3.1. Matricized Programs

Definition 1 Matricized program

A program $P$ that has $m$ rules in $n$ atoms is numerically encoded as a pair $P = (C, D)$ of binary matrices $C \in {0, 1}^{m \times 2 n}$ and $D \in {0, 1}^{n \times m}$ , which we call a matricized program $P$ .

$C$ represents rule bodies in $P$ . Suppose atoms are ordered like $A = {a_{1}, \dots, a_{n}}$ and similarly rules are ordered like ${r_{1} : a_{i_{1}} \leftarrow G_{1}, \dots, r_{m} : a_{i_{m}} \leftarrow G_{m}}$ . Then the $j$ -th row $C (j, :)$ ( $1 \leq j \leq m$ ) encodes the $j$ -th conjunction $G_{j}$ of the $j$ -th rule $a_{i_{j}} \leftarrow G_{j}$ . Write $G_{j} = a_{i_{1}} \land \dots \land a_{i_{p}} \land \neg a_{i_{p + 1}} \land \dots \land \neg a_{i_{p + q}}$ ( $1 \leq p, q \leq n$ ). Then an element of $C (j, :)$ is zero except for $C (j, i_{1}) = \dots = C (j, i_{p}) = C (j, n + i_{p + 1}) = \dots = C (j, n + i_{p + q}) = 1$ . $D$ combines these conjunctions as a disjunction (DNF) for each atom in $A$ . If the $i$ -th atom $a_{i} \in A$ ( $1 \leq i \leq n$ ) has rules ${a_{i} \leftarrow G_{j_{1}}, \dots, a_{i} \leftarrow G_{j_{s}}}$ in $P$ , we put $D (i, j_{1}) = \dots = D (i, j_{s}) = 1$ to represent a disjunction $G_{j_{1}} \lor \dots \lor G_{j_{s}}$ which is the right hand side of the completed rule for $a_{i}$ : $iff (a_{i}) = a_{i} \Leftrightarrow G_{j_{1}} \lor \dots \lor G_{j_{s}}$ . If $a_{i}$ has no rule, we put $D (i, j) = 0$ for all $j$ ( $1 \leq j \leq m$ ). Thus the matricized $P = (C, D)$ can represent the completed program $comp (P)$ .

For concreteness, we explain by an example below.

Example 1 Encoding a program

Suppose we are given a program $P_{0}$ below containing three rules ${r_{1}, r_{2}, r_{3}}$ in a set of atoms $A_{0} = {p, q, r}$ .

\begin{aligned} P_{0} & = \begin{array}{ll} {\begin{cases} p & \leftarrow & q \land \neg r & : rule r_{1} for p \\ p & \leftarrow & \neg q & : rule r_{2} for p \\ q & : rule r_{3} for q \end{cases} \end{array} \end{aligned}

(1)Assuming atoms are ordered as

p, q, r

and correspondingly so are the rules

{r_{1}, r_{2}, r_{3}}

as in (1), we encode

P_{0}

as a pair of matrices

P_{0} = (C_{0}, D_{0})

. Here

C_{0}

represents conjunctions (the bodies of

{r_{1}, r_{2}, r_{3}}

) and

D_{0}

their disjunctions so that they jointly represent

P_{0}

\begin{aligned} C_{0} & = \begin{array}{ll} \begin{matrix} p & q & r & \neg p & \neg q & \neg r \end{matrix} \\ [\begin{matrix} 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}] & \begin{matrix} : r_{1} has the body q \land \neg r \\ : r_{2} has the body \neg q \\ : r_{3} has the empty body \end{matrix} \end{array} \end{aligned}

(2)

\begin{aligned} D_{0} & = \begin{array}{ll} \begin{matrix} r_{1} & r_{2} & r_{3} \end{matrix} \\ [\begin{matrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{matrix}] & \begin{matrix} : p has two rules r_{1} and r_{2} \\ : q has one rule r_{3} \\ : r has no rule \end{matrix} \end{array} \end{aligned}

(3)

As can be seen,

C_{0}

represents conjunctions in

P_{0}

in such a way that

C_{0} (1, :)

, for example, represents the conjunction

q \land \neg r

of the first rule in

P_{0}

by setting

C_{0} (1, 2) = C_{0} (1, 6) = 1

, and so on.

D_{0}

represents the disjunctions of rule bodies. So

D_{0} (1, 1) = D_{0} (1, 2) = 1

means the first atom

p

{p, q, r}

has two rules, the first rule

r_{1}

and the second rule

r_{2}

, representing a disjunction

(q \land \neg r) \lor \neg q

for

p

3.2. Evaluation of Formulas and the Reduct of a Program in Vector Spaces

Here we explain how the propositional formulas and the reduct of a program are evaluated by a model in vector spaces. Let $I$ be a model over a set $A$ of atoms. Recall that $I$ is equated with a subset of $A$ . We inductively define the relation “a formula $F$ is true in $I$ ,” $I ⊨ F$ in notation, as follows. For an atom $a$ , $I ⊨ a$ iff $a \in I$ . For a compound formula $F$ , $I ⊨ \neg F$ iff $I ⊭ F$ . When $F$ is a disjunction $F_{1} \lor \dots \lor F_{u}$ ( $u \geq 0$ ), $I ⊨ F$ iff there is some $i$ ( $1 \leq i \leq u$ ) s.t. $I ⊨ F_{i}$ . So the empty disjunction ( $u = 0$ ) is always false. We consider a conjunction $F_{1} \land \dots \land F_{u}$ as a syntax sugar for $\neg (\neg F_{1} \lor \dots \lor \neg F_{u})$ using De Morgan’s law. Consequently, the empty conjunction is always true. Let $P$ be a program having $m$ ordered rules in $n$ ordered atoms as before and $G = a_{i_{1}} \land \dots \land a_{i_{p}} \land \neg a_{i_{p + 1}} \land \dots \land \neg a_{i_{p + q}}$ the body of a rule $a \leftarrow G$ in $P$ . By definition, $I ⊨ G$ ( $G$ is true in $I$ ) iff ${a_{i_{1}}, \dots, a_{i_{p}}} \subseteq I$ and ${a_{i_{p + 1}}, \dots, a_{i_{p + q}}} \cap I = \emptyset$ . Also let $iff (a_{i}) = a_{i} \Leftrightarrow G_{j_{1}} \lor \dots \lor G_{j_{s}}$ be the completed rule for an atom $a_{i}$ in $P$ . We see $I ⊨ iff (a_{i})$ iff $(a_{i} \in I iff I ⊨ G_{j_{1}} \lor \dots \lor G_{j_{s}})$ .

Now we isomorphically embed the above symbolic evaluation to the one in vector spaces. Let $I$ be a model over ordered atoms $A = {a_{1}, \dots, a_{n}}$ . We first vectorize $I$ as a binary column vector $s_{I}$ such that $s_{I} (i) = 1$ if $a_{i} \in I$ and $s_{I} (i) = 0$ ( $1 \leq i \leq n$ ) otherwise, and introduce the dualized $s_{I}$ written as $s_{I}^{δ}$ by $s_{I}^{δ} = [s_{I}; (1 - s_{I})]$ . $s_{I}^{δ}$ is a vertical concatenation of $s_{I}$ and the bit inversion of $s_{I}$ .

Consider a matricized program $P = (C, D)$ $(C \in {0, 1}^{m \times 2 n}$ , $D \in {0, 1}^{n \times m})$ and its $j$ -th rule $r_{j}$ having a body $G_{j}$ represented by $C (j, :)$ . Compute $C (j, :) s_{I}^{δ}$ which is the number of true literals in $I$ in $G_{j}$ and compare it with the number of literals $| C (j, :) |_{1}$ ² in $G_{j}$ . When $| C (j, :) |_{1} = C (j, :) s_{I}^{δ}$ holds, all literals in $G_{j}$ are true in $I$ and hence the body $G_{j}$ is true in $I$ . In this way, we can algebraically compute the truth value of each rule body, but since we consider a conjunction as a negated disjunction, we instead compute $C (j, :) (1 - s_{I}^{δ})$ which is the number of false literals in $G_{j}$ . If this number is non-zero, $G_{j}$ have at least one literal false in $I$ , and hence $G_{j}$ is false in $I$ . The converse is also true. The existence of a false literal in $G_{j}$ is thus computed by ${min}_{1} (C (j, :) (1 - s_{I}^{δ}))$ which is $1$ if there is a false literal, and $0$ otherwise. Consequently, $1 - {min}_{1} (C (j, :) (1 - s_{I}^{δ})) = 1$ if there is no false literal in $G_{j}$ and vice versa. In other words, $1 - {min}_{1} (C (j, :) (1 - s_{I}^{δ}))$ computes $I ⊨ G_{j}$ .

Now let ${a_{i} \leftarrow G_{j_{1}}, \dots, a_{i} \leftarrow G_{j_{s}}}$ be an enumeration of rules for $a_{i} \in A$ and $G_{j_{1}} \lor \dots \lor G_{j_{s}}$ the disjunction of the rule bodies. $d_{i} = \sum_{t = 1}^{s} (1 - {min}_{1} (C (j_{t}, :) (1 - s_{I}^{δ})))$ is the number of rule bodies in ${G_{j_{1}}, \dots, G_{j_{s}}}$ that are true in $I$ . Noting $D (i, j) = 1$ if $j \in {j_{1}, \dots, j_{s}}$ and $D (i, j) = 0$ otherwise by construction of $D$ in $P = (C, D)$ , we replace the summation $\sum_{t = 1}^{s}$ by matrix multiplication and obtain $d_{i} = D (i, :) (1 - {min}_{1} (C (1 - s_{I}^{δ})))$ . Introduce a column vector $d_{I} = D (1 - {min}_{1} (C (1 - s_{I}^{δ})))$ . We have $d_{I} (i) = d_{i}$ = the number of rules for $a_{i}$ whose bodies are true in $I$ ( $1 \leq i \leq n$ ).

In the case of $P_{0} = (C_{0}, D_{0})$ in (1) having three rules ${r_{1}, r_{2}, r_{3}}$ , take a model $I_{0} = {p, q}$ over the ordered atom set $A_{0} = {p, q, r}$ , where $p$ and $q$ are true in $I_{0}$ but $r$ is false in $I_{0}$ . Then we have $s_{I_{0}} = [1 1 0]^{T}$ , $s_{I_{0}}^{δ} = [1 1 0 0 0 1]^{T}$ , $1 - s_{I_{0}}^{δ} = [0 0 1 1 1 0]^{T}$ , and finally, $C_{0} (1 - s_{I_{0}}^{δ}) = [0 1 0]^{T}$ . The last equation says that the rule bodies of $r_{1}$ , $r_{2}$ , and $r_{3}$ have, respectively, zero, one, and zero literal false in $I_{0}$ . Hence ${min}_{1} (C_{0} (1 - s_{I_{0}}^{δ})) = [0 1 0]^{T}$ indicates that only the second rule body is false and the other two bodies are true in $I_{0}$ . So its bit inversion $1 - {min}_{1} (C_{0} (1 - s_{I_{0}}^{δ})) = [1 0 1]^{T}$ indicates that the second rule body is false in $I_{0}$ , while others are true in $I_{0}$ . Thus by combining these truth values in terms of disjunctions $D_{0}$ , we obtain $d_{I_{0}} = D_{0} (1 - {min}_{1} (C_{0} (1 - s_{I_{0}}^{δ}))) = [1 1 0]^{T}$ . The elements in $d_{I_{0}} = [1 1 0]^{T}$ denote for each atom $a \in A_{0}$ the number of rules for $a$ whose body is true in $I_{0}$ . For example, $d_{I_{0}} (1) = 1$ means that the first atom $p$ in $A_{0}$ has one rule ( $p \leftarrow q \land \neg r$ ) whose body ( $q \land \neg r$ ) is true in $I_{0}$ . Likewise, $d_{I_{0}} (2) = 1$ means that the second atom $q$ has one rule ( $q \leftarrow$ ) whose body (empty) is true in $I_{0}$ . $d_{I_{0}} (3) = 0$ indicates that the third atom $r$ has no such rule. Therefore, ${min}_{1} (d_{I_{0}}) = [1 1 0]^{T}$ denotes the truth values of the right hand sides of the completed rules ${iff (p), iff (q), iff (r)}$ evaluated by $I_{0}$ .

Proposition 1
Let $P = (C, D)$ be a matricized program $P$ in a set of atoms $A$ and $s_{I}$ a vectorized model $I$ over $A$ . Put $d_{I} = D (1 - {min}_{1} (C (1 - s_{I}^{δ})))$ . It holds that
$I ⊨ comp (P) iff ‖ s_{I} - \min_{1} (d_{I}) ‖_{2} = 0.$
(4)
Proof.
Put $n = | A |$ . Suppose $I ⊨ comp (P)$ and write $iff (a_{i})$ , the completed rule for an atom $a_{i} \in A$ ( $1 \leq i \leq n$ ), as $iff (a_{i}) = a_{i} \Leftrightarrow G_{j_{1}} \lor \dots \lor G_{j_{s}}$ ( $s \geq 0$ ). We have $I ⊨ iff (a_{i})$ . So if $s_{I} (i) = 1$ , $I ⊨ a_{i}$ , and hence $I ⊨ G_{j_{1}} \lor \dots \lor G_{j_{s}}$ , giving $d_{i} \geq 1$ because $d_{i}$ is the number of rule bodies in ${G_{j_{1}}, \dots, G_{j_{s}}}$ that are true in $I$ . So ${min}_{1} (d_{i}) = 1$ holds. Otherwise, if $s_{I} (i) = 0$ , we have $I ⊭ a_{i}$ and $I ⊭ G_{j_{1}} \lor \dots \lor G_{j_{s}}$ . Consequently, none of the rule bodies are true in $I$ and we have $d_{i} = {min}_{1} (d_{i}) = 0$ . Putting the two together, we have $s_{I} (i) = d_{i}$ . Since $i$ is arbitrary, we conclude $s_{I} = \min_{1} (d_{I})$ , or equivalently $‖ s_{I} - \min_{1} (d_{I}) ‖_{2} = 0$ . The converse is similarly proved.

Proposition 1 says that whether $I$ is a supported model of the program $P$ or not is determined by computing $s_{I} - {min}_{1} (d_{I})$ in vector spaces whose complexity is $O (m n)$ where $m$ is the number of rules in $P$ , $n$ that of atoms occurring in $P$ . In the case of $P_{0} = (C_{0}, D_{0})$ with $s_{I_{0}} = [1 1 0]^{T}$ as in the aforementioned example, since $s_{I_{0}} = {min}_{1} (d_{I_{0}})$ holds, it follows from Proposition 1 that $I_{0}$ is a supported model of $P_{0}$ .

We next show how $P^{I}$ , the reduct of $P$ by $I$ , is dealt with in vector spaces. We assume $P$ has $m$ rules ${r_{1}, \dots, r_{m}}$ with a set $A = {a_{1}, \dots, a_{n}}$ of $n$ ordered atoms as before. We first show the evaluation of the reduct of the matricized program $P = (C, D)$ by a vectorized model $s_{I}$ . Write $C \in {0, 1}^{m \times 2 n}$ as $C = [C^{p o s} C^{n e g}]$ , where $C^{p o s} \in {0, 1}^{m \times n}$ (resp. $C^{n e g} \in {0, 1}^{m \times n}$ ) is the left half (resp. the right half) of $C$ representing the positive literals (resp. negative literals) of each rule body in $C$ . Compute $M^{n e g} = 1 - {min}_{1} (C^{n e g} s_{I})$ . It is an $m \times 1$ matrix (treated as a column vector here) such that $M^{n e g} (j) = 0$ if the body of $r_{j}$ contains a negative literal false in $I$ and $M^{n e g} (j) = 1$ otherwise ( $1 \leq j \leq m$ ). Let $r_{j}^{+}$ be a rule $r_{j}$ with negative literals in the body deleted. We see that $P^{I} = {r_{j}^{+} ∣ M^{n e g} (j) = 1, 1 \leq j \leq m}$ and $P^{I}$ is syntactically represented by $(C^{p o s}, D^{I})$ , where $D^{I} = D$ with columns $D (:, j)$ replaced by the zero column vector if $M^{n e g} (j) = 0$ ( $1 \leq j \leq m$ ). $D^{I} (i, :)$ denotes a rule set ${r_{j}^{+} ∣ D^{I} (i, j) = 1, 1 \leq j \leq m}$ in $P^{I}$ for $a_{i} \in A$ . We call $P^{I} = (C^{p o s}, D^{I})$ the matricized reduct of $P$ by $I$ .

The matricized reduct $P^{I} = (C^{p o s}, D^{I})$ is evaluated in vector spaces as follows. Compute $M^{p o s} = M^{n e g} ⊙ (1 - {min}_{1} (C^{p o s} (1 - s_{I})))$ . $M^{p o s}$ denotes the truth values of rule bodies in $P^{I}$ evaluated by $I$ . Thus $M^{p o s} (j) = 1$ ( $1 \leq j \leq m$ ) if $r_{j}^{+}$ is contained in $P^{I}$ and its body is true in $I$ . Otherwise $M^{p o s} (j) = 0$ and $r_{j}^{+}$ is not contained in $P^{I}$ or the body of $r_{j}^{+}$ is false in $I$ . Introduce $d_{I}^{+} = D M^{p o s}$ . $d_{I}^{+} (i)$ ( $1 \leq i \leq n$ ) is the number of rules in $P^{I}$ for the $i$ -th atom $a_{i}$ in $A$ whose bodies are true in $I$ .
Proposition 2
Let $P = (C, D)$ be a matricized program $P$ in a set $A = {a_{1}, \dots, a_{n}}$ of $n$ ordered atoms and $I$ a model over $A$ . Write $C = [C^{p o s} C^{n e g}]$ as above. Let $s_{I}$ be the vectorized model $I$ . Compute $M^{n e g} = 1 - \min_{1} (C^{n e g} s_{I})$ , $M^{p o s} = M^{n e g} ⊙ (1 - \min_{1} (C^{p o s} (1 - s_{I})))$ , and $d_{I}^{+} = D M^{p o s}$ . Also compute $d_{I} = D (1 - \min_{1} (C (1 - s_{I}^{δ})))$ . Then, $I ⊨ comp (P)$ , $‖ s_{I} - \min_{1} (d_{I}) ‖_{2} = 0$ , $‖ s_{I} - \min_{1} (d_{I}^{+}) ‖_{2} = 0$ , and $I ⊨ comp (P^{I})$ are all equivalent (proof in Appendix A.1).

From the viewpoint of end-to-end ASP, Proposition 2 means that we can obtain a supported model $I$ as a binary solution $s_{I}$ of the equation $s_{I} - \min_{1} (d_{I}) = 0$ derived from $P$ or $s_{I} - \min_{1} (d_{I}^{+}) = 0$ derived from the reduct $P^{I}$ . Either equation is possible and gives the same result but their computation will be different. This is because the former equation $s_{I} - \min_{1} (d_{I})$ is piecewise linear w.r.t. $s_{I}$ , whereas the latter $s_{I} - \min_{1} (d_{I}^{+})$ is piecewise quadratic w.r.t. $s_{I}$ .
Example 2 Evaluation of a reduct

Now look at $P_{0} = {r_{1}, r_{2}, r_{3}}$ in (1) and a model $I_{0} = {p, q}$ again. $\begin{array}{lcl} P_{0}^{I_{0}} & = & \begin{array}{ll} {\begin{cases} p & \leftarrow & q \\ q & \leftarrow \end{cases} \end{array} \end{array}$ is the reduct of $P_{0}$ by $I_{0}$ . $P_{0}^{I_{0}}$ has the least model ${p, q}$ that coincides with $I_{0}$ . So $I_{0}$ is a stable model of $P_{0}$ . To simulate the reduction process of $P_{0}$ in vector spaces, let $P_{0} = (C_{0}, D_{0})$ be the matricized $P_{0}$ . We first decompose $C_{0}$ in (2) as $C_{0} = [C_{0}^{p o s} C_{0}^{n e g}]$ , where $C_{0}^{p o s}$ is the positive part and $C_{0}^{n e g}$ the negative part of $C_{0}$ . They are

\begin{aligned} C_{0}^{p o s} = [\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}] and C_{0}^{n e g} = [\begin{matrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}] . \end{aligned}

Let $s_{I_{0}} = [1 1 0]^{T}$ be the vectorized $I_{0}$ . We first compute $M_{0}^{n e g} = 1 - {min}_{1} (C_{0}^{n e g} s_{I_{0}})$ to determine rules to be removed. Since $M_{0}^{n e g} = [1 0 1]^{T}$ , the second rule $r_{2}$ , indicated by $M_{0}^{n e g} (2) = 0$ , is removed from $P_{0}$ , giving $P_{0}^{I_{0}} = {r_{1}^{+}, r_{3}^{+}}$ . Using $M_{0}^{n e g}$ and $D_{0}$ shown in (3), we then compute $M_{0}^{p o s} = M_{0}^{n e g} ⊙ (1 - {min}_{1} (C_{0}^{p o s} (1 - s_{I_{0}}))) = [1 0 1]^{T}$ and $d_{I_{0}}^{+} = D_{0} M_{0}^{p o s} = [1 1 0]^{T}$ . The elements of $d_{I_{0}}^{+}$ denote the number of rule bodies in $P_{0}^{I_{0}}$ that are true in $I_{0}$ for each atom. Thus, since $s_{I_{0}} = {min}_{1} (d_{I_{0}}^{+}) (= [1 1 0]^{T})$ holds, $I_{0}$ is a supported model of $P_{0}$ by Proposition 2.

3.3. Cost Minimization for Supported Models

Having linear algebraically reformulated several concepts in logic programming, we tackle the problem computing supported models in vector spaces. Although there already exist approaches for this problem, we tackle it without assuming any condition on programs while allowing constraints. Aspis et al. formulated the problem as solving a non-linear equation containing a sigmoid function (Aspis et al., 2020). They encode normal logic programs differently from ours based on Sakama’s encoding (Sakama et al., 2017) and impose the MD condition on programs which is rather restrictive. No support is provided for constraints in their approach. Later Takemura and Inoue proposed another approach (Takemura & Inoue, 2022) which encodes a program in terms of a single matrix and evaluates conjunctions by the number of true literals. They compute supported models by minimizing a non-negative function, not solving an equation like (Aspis et al., 2020). Their programs are, however, restricted to those satisfying the SD condition and constraints are not considered.

Here we introduce an end-to-end way of computing supported models in vector spaces through cost minimization of a new cost function based on the evaluation of disjunction. We impose no syntactic restriction on programs and allow constraints. We believe that these two features make our end-to-end ASP approach more feasible.

We can base our supported model computation either on Proposition 1 or on Proposition 2. In the latter case, we have to compute GL reduction which requires complicated computation compared to the former case. So for the sake of simplicity, we explain the former. Then our task in vector spaces is to find a binary vector $s_{I}$ representing a supported model $I$ of a matricized program $P = (C, D)$ that satisfies $‖ s_{I} - \min_{1} (d_{I}) ‖_{2} = 0$ , where $d_{I} = D (1 - {min}_{1} (C (1 - s_{I}^{δ})))$ . For this task, we relax $s_{I} \in {0, 1}^{n}$ to $s \in R^{n}$ and introduce a non-negative cost function $L_{S U}$ :

L_{S U} = 0.5 \cdot (‖ s - \min_{1} (d) ‖_{2}^{2} + ℓ_{2} \cdot ‖ s ⊙ (1 - s) ‖_{2}^{2}), where ℓ_{2} > 0 and d = D (1 - {min}_{1} (C (1 - s^{δ}))) .

(5)

Proposition 3

Let $L_{S U}$ be defined from a program $P = (C, D)$ as above.

$L_{S U} = 0$ iff $s$ is a binary vector representing a supported model of $P$ .

Proof.

Apparently if $L_{S U} = 0$ , we have $‖ s - \min_{1} (d) ‖_{2}^{2} = 0$ and $‖ s ⊙ (1 - s) ‖_{2}^{2} = 0$ . The second equation means $s$ is binary ( $x (1 - x) = 0 \Leftrightarrow x \in {0, 1}$ ), and the first equation means this binary $s$ is a vector representing a supported model of $P$ by Proposition 1. The converse is obvious.

$L_{S U}$ is piecewise differentiable and we can obtain a supported model of $P$ as a root $s$ of $L_{S U}$ by minimizing $L_{S U}$ to zero using Newton’s method. The Jacobian $J_{a_{S U}}$ required for Newton’s method is derived as follows. We assume $P$ is written in $n$ ordered atoms ${a_{1}, \dots, a_{n}}$ and $s = [u_{1}, \dots, u_{n}]^{T}$ represents their continuous truth values, where $s (p) = s_{p} \in R$ is the continuous truth value for atom $a_{p}$ ( $1 \leq p \leq n$ ). For the convenience of derivation, we introduce the dot product $(A ∙ B) = \sum_{i, j} A (i, j) B (i, j)$ of matrices $A$ and $B$ and a one-hot vector $I_{p}$ which is a zero vector except for the $p$ -th element and $I_{p} (p) = 1$ . We note $(A ∙ (B ⊙ C)) = ((B ⊙ A) ∙ C)$ and $(A ∙ (B C)) = ((B^{T} A) ∙ C) = ((A C^{T}) ∙ B)$ hold (see Appendix B.1 for details).

Let $P = (C, D)$ be the matricized program and write $C = [C^{p o s} C^{n e g}]$ . Introduce $N$ , $M$ , $d$ , $E$ , $F$ , and compute $L_{S U}$ by

\begin{aligned} \begin{array}{lclll} N & = & C (1 - s^{δ}) = C^{p o s} (1 - s) + C^{n e g} s & : (continuous) counts of false literals in the rule bodies \\ M & = & 1 - \min_{1} (N) & : (continuous) truth values of the rule bodies \\ d & = & D M & : (continuous) counts of true disjuncts for each atom \\ E & = & \min_{1} (d) - s & : error between the estimated truth values of atoms and s \\ F & = & s ⊙ (1 - s) & : (continuous) 0 iif s is binary \\ L_{s q} & = & (E ∙ E) \\ L_{n r m} & = & (F ∙ F) \\ L_{S U} & = & 0.5 \cdot (L_{s q} + ℓ_{2} \cdot L_{n r m}) . \end{array} \end{aligned}

(6)

We then compute the Jacobian $J_{a_{S U}}$ of $L_{S U}$ as follows (full derivation in Appendix B.2):

\begin{aligned} J_{a_{S U}} & = (\frac{\partial L_{s q}}{\partial s}) + ℓ_{2} \cdot (\frac{\partial L_{n r m}}{\partial s}) \\ = (C^{p o s} - C^{n e g})^{T} ([N \leq 1] ⊙ (D^{T} ([d \leq 1] ⊙ E))) - E + ℓ_{2} \cdot ((1 - 2 s) ⊙ F), \end{aligned}

(7)

\begin{aligned} where N = C (1 - s^{δ}), d = D (1 - \min_{1} (N)), E = \min_{1} (d) - s, and F = s ⊙ (1 - s) . \end{aligned}

Note that since

s

is a vector, the Jacobian in this case is also a vector.

3.4. Adding Constraints

A rule which has no head like $\leftarrow a \land \neg b$ is called a constraint. We oftentimes need supported models which satisfy constraints. Since constraints are just rules without a head, we encode constraints as rule bodies in a program using a binary matrix $\hat{C} = [{\hat{C}}^{p o s} {\hat{C}}^{n e g}]$ . We call $\hat{C}$ constraint matrix. We introduce $N_{\hat{c}}$ , a non-negative function $L_{\hat{c}}$ of $s$ and $L_{\hat{c}}$ ’s Jacobian $J_{a_{\hat{c}}}$ as follows (derivation in Appendix B.3):

\begin{aligned} N_{\hat{c}} & = \hat{C} (1 - s^{δ}) = {\hat{C}}^{p o s} (1 - s) + {\hat{C}}^{n e g} s : number of literals falsified by s \\ L_{\hat{c}} & = (1 ∙ (1 - \min_{1} (N_{\hat{c}}))), where 1 is an all-ones vector : counts of violated constraints \end{aligned}

(8)

\begin{aligned} J_{a_{\hat{c}}} & = ({\hat{C}}^{p o s} - {\hat{C}}^{n e g})^{T} [N_{\hat{c}} \leq 1] : the Jacobian of L_{\hat{c}} \end{aligned}

(9)

The meaning of $N_{\hat{c}}$ and $L_{\hat{c}}$ is clear when $s$ is binary. Note that any binary $s$ is considered as a model over a set $A = {a_{1}, \dots, a_{n}}$ of $n$ ordered atoms in an obvious way. Suppose $k$ constraints are given to be satisfied. Then $\hat{C}$ is a $k \times 2 n$ binary matrix and $N_{\hat{c}}$ is a $k \times 1$ matrix. $N_{\hat{c}} (i)$ ( $1 \leq i \leq k$ ) is the number of literals falsified by $s$ in a conjunction $G_{i}$ of the $i$ -th constraint $\leftarrow G_{i}$ . So $N_{\hat{c}} (i) = 0$ , or equivalently $1 - \min_{1} (N_{\hat{c}} (i)) = 1$ implies $G_{i}$ has no false literal, that is, $s ⊨ G_{i}$ , and vice versa. Hence $L_{\hat{c}} = \sum_{i = 1}^{k} (1 - \min_{1} (N_{\hat{c}} (i))) = (1 ∙ (1 - \min_{1} (N_{\hat{c}})))$ equals the number of violated constraints. Consequently, when $s$ is binary, we can say that $L_{\hat{c}} = 0$ iff all constraints are satisfied by $s$ .

When $s$ is not binary but just a real vector $s \in R^{n}$ , $N_{\hat{c}}$ and $L_{\hat{c}}$ are thought to be a continuous approximation to their binary counterparts. Since $L_{\hat{c}}$ is a piecewise differentiable non-negative function of $s$ , the approximation error can be minimized to zero by Newton’s method using $J_{a_{\hat{c}}}$ in (9).

3.5. An Algorithm for Computing Supported Models With Constraints

Here we present a minimization algorithm for computing supported models of the matricized program $P = (C, D)$ which satisfy constraints represented by a constraint matrix $\hat{C}$ . We first combine $L_{S U}$ and $L_{\hat{c}}$ into $L_{S U + \hat{c}}$ using $ℓ_{3} > 0$ .

\begin{aligned} L_{S U + \hat{c}} & = L_{S U} + ℓ_{3} \cdot L_{\hat{c}} \end{aligned}

(10)

\begin{aligned} = 0.5 \cdot (‖ s - \min_{1} (d) ‖_{2}^{2} + ℓ_{2} \cdot ‖ s ⊙ (1 - s) ‖_{2}^{2}) + ℓ_{3} \cdot (1 ∙ (1 - \min_{1} (\hat{C} (1 - s^{δ})))) ℓ_{2} > 0, ℓ_{3} > 0, \\ where d = D (1 - {min}_{1} (C (1 - s^{δ}))), \\ J_{a_{S U + \hat{c}}} & = J_{a_{S U}} + ℓ_{3} \cdot J_{a_{\hat{c}}} . \end{aligned}

(11)

The next proposition is immediate from Proposition 3.

Proposition 4

$L_{S U + \hat{c}} = 0$ iff $s$ represents a supported model of $P$ satisfying a constraint matrix $\hat{C}$ .

We compute $L_{S U}$ in $L_{S U + \hat{c}}$ by (6) and $L_{\hat{c}}$ by (8), and their Jacobians $J_{a_{S U}}$ and $J_{a_{\hat{c}}}$ by (7) and by (9), respectively. We minimize the non-negative $L_{S U + \hat{c}}$ to zero by Newton’s method using Algorithm 1. It finds a solution $s_{*}$ of $L_{S U + \hat{c}} = 0$ which represents a supported model of $P$ satisfying constraint matrix $\hat{C}$ . The updating formula is derived from the first-order Taylor expansion of $L_{S U + \hat{c}}$ and by solving $L_{S U + \hat{c}} + (J_{a_{S U * c}} ∙ (s_{new} - s)) = 0$ w.r.t. $s_{new}$ . The updating formula with a learning rate $α > 0$ is thus defined as follows:

s_{new} = s - α (\frac{L_{S U + \hat{c}}}{(J_{a_{S U + \hat{c}}} ∙ J_{a_{S U + \hat{c}}})}) J_{a_{S U + \hat{c}}} .

(12)

Algorithm 1 is a double-loop algorithm where the inner $j$ -loop updates $s \in R^{n}$ repeatedly to minimize $L_{S U + \hat{c}}$ while thresholding $s$ into a binary solution candidate $s_{*} \in {0, 1}^{n}$ for $L_{S U + \hat{c}} = 0$ . The outer $i$ -loop is for retry when the inner loop fails to find a solution. The initialization at line 3 is carried out by sampling $s (i) \sim N (0, 1) + 0.5$ ( $1 \leq i \leq n$ ) where $N (0, 1)$ is the standard normal distribution. Lines 6 to 8 collectively perform thresholding of $s$ into a binary $s_{*}$ . As the inner loop repeats, $L_{S U + \hat{c}}$ becomes smaller and smaller and so do $L_{s q}$ and $L_{n r m}$ in $L_{S U}$ . $L_{s q}$ being small means $s$ is close to a supported model of $P$ while $L_{n r m}$ being small means each element of $s$ is close to ${0, 1}$ . So binarization $s_{*} = [s \geq θ]$ with an appropriate threshold $θ$ ³ has a good chance of yielding a binary $s_{*}$ representing a supported model of $P$ satisfying constraints represented by $\hat{C}$ . It may happen that the inner loop fails to find a solution. In such a case, we retry another $j$ -loop with perturbated $s$ at line 12. There $s$ is perturbated by $s \leftarrow 0.5 (s + Δ + 0.5)$ where $Δ \sim N (0, 1)$ before the next $j$ -loop.

4. Computing Stable Models in Vector Spaces

4.1. Loop Formulas and Stable Models

Let $P = (C, D)$ be a matricized program in a set of atoms $A = {a_{1}, \dots, a_{n}}$ having $m$ rules ${a_{i_{1}} \leftarrow G_{1}, \dots, a_{i_{m}} \leftarrow G_{m}}$ , where $C \in {0, 1}^{m \times 2 n}$ and $D \in {0, 1}^{n \times m}$ . We assume atoms and rules are ordered as indicated.

Computing a supported model of $P$ is equivalent to computing any binary fixedpoint $s \in {0, 1}^{n}$ such that $s = \min_{1} (D (1 - {min}_{1} (C (1 - s^{δ}))))$ in vector spaces and in this sense, it is conceptually simple (though NP-hard). Contrastingly since stable models are a proper subclass of supported models, if one wishes to obtain precisely stable models through fixedpoint computation, the exclusion of non-stable models is necessary. Lin-Zhao’s theorem (Lin & Zhao, 2004) states that $I$ is a stable model of $P$ iff $I$ is a supported model of $P$ and satisfies a set of formulas called loop formulas associated with $P$ .

Let $S = {h_{1}, \dots, h_{p}} \subseteq A$ be a loop in $P$ . Recall that $S$ is a set of atoms which are strongly connected in the positive dependency graph of $P$ .⁴ A support rule for $h$ with respect to $S$ is a rule $h \leftarrow H$ such that $H^{+} \cap S = \emptyset$ . $H$ is called a support body for $S$ . Introduce a (conjunctive) loop formula for $S$ by

L F (S) = (h_{1} \land \dots \land h_{p}) \to (H_{1} \lor \dots \lor H_{q}), where {H_{1}, \dots, H_{q}} are support bodies for S .

(13)Then define loop formulas associated with

P

L F (P) = {L F (S) ∣ S is a loop in P}

, which is treated as the conjunction of its elements. We note that in the original form (Lin & Zhao, 2004), the antecedent of

L F (S)

is a disjunction

(h_{1} \lor \dots \lor h_{p})

. Later it was shown that the disjunctive and conjunctive loop formulas are equivalent (Ferraris et al., 2006), and we choose to use the conjunctive form of

L F (S)

as it is easier to satisfy using our method. We evaluate

L F (P)

by a real vector

s \in R^{n}

. Introduce an external support matrix

E_{s u p} \in {0, 1}^{n \times m}

E_{s u p} (i, j) = 1

if there is a support rule

a_{i} \leftarrow G_{j}

for

a_{i} \in A

, else

E_{s u p} (i, j) = 0

(

1 \leq i \leq n, 1 \leq j \leq m

). Suppose there are

w

loops

{S_{1}, \dots, S_{w}}

P

. Introduce a loop matrix

L_{o o p} \in {0, 1}^{w \times m}

such that

L_{o o p} (v, j) = 1

if the

v

-th loop

S_{v}

has

G_{j}

as a support body for

S_{v}

, else

L_{o o p} (v, j) = 0

(

1 \leq v \leq w

Example 3 Encoding loop formulas

Suppose we are given a program $P_{L 0}$ :

\begin{aligned} P_{L 0} & = \begin{array}{ll} {\begin{cases} p & \leftarrow & q \land \neg r & : rule r_{1} for p \\ p & \leftarrow & \neg s & : rule r_{2} for p \\ q & \leftarrow & p & : rule r_{3} for q \\ r & \leftarrow & r & : rule r_{4} for r \end{cases} \end{array} \end{aligned}

(14)

This program contains two loops: $S_{1} = {p, q}$ and $S_{2} = {r}$ . In this case, only $S_{1}$ has an external support body $\neg s$ . Thus, the external support matrix $E_{s u p 0}$ and the loop matrix $L_{o o p 0}$ for this program are as follows:

\begin{aligned} E_{s u p 0} & = \begin{array}{ll} \begin{matrix} r_{1} & r_{2} & r_{3} & r_{4} \end{matrix} \\ [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] & \begin{matrix} : p has the body of r_{2} as the support body \\ : q has no external support \\ : r has no external support \\ : s is not part of any loops \end{matrix} \end{array} \end{aligned}

(15)

\begin{aligned} L_{o o p 0} & = \begin{array}{ll} \begin{matrix} r_{1} & r_{2} & r_{3} & r_{4} \end{matrix} \\ [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] & \begin{matrix} : S_{1} has the body of r_{2} as its support body \\ : S_{2} has no support bodies \end{matrix} \end{array} \end{aligned}

(16)

We then introduce a loss function $L_{L F}$ , which is a non-negative piecewise linear function of $s$ .

\begin{aligned} M & = 1 - \min_{1} (C (1 - s^{δ})) : (continuous) truth values by s of the rule bodies in P \\ S_{v} & = L_{o o p} (v, :) : represents the v -th loop in {S_{1}, \dots, S_{w}} \\ A_{v} & = S_{v} (1 - s) + S_{v} E_{s u p} M : (continuous) counts of true disjuncts by s of L F (S_{v}) \\ L_{L F} & = \sum_{v = 1}^{w} (1 - \min_{1} (A_{v})) \end{aligned}

(17)

Proposition 5

Let $L_{L F}$ be defined as above. When $s$ is a binary vector representing a model $I$ over $A$ , it holds that $L_{L F} = 0$ iff $s ⊨ L F (P)$ .

Proof.

Suppose $L_{L F} = 0$ and $s$ is binary. A summand $(1 - \min_{1} (A_{v}))$ in $L_{L F}$ (17) corresponds to the $v$ -th loop $S_{v} = {h_{1}, \dots, h_{p}}$ and is non-negative. Consider $L F (S_{v}) = (h_{1} \land \dots \land h_{p}) \to (H_{1} \lor \dots \lor H_{q})$ as a disjunction $\neg h_{1} \lor \dots \lor \neg h_{p} \lor H_{1} \lor \dots \lor H_{q}$ . Then $L_{L F} = 0$ implies $(1 - \min_{1} (A_{v})) = 0$ , or equivalently $A_{v} \geq 1$ . Consequently, as $s$ is binary, we have $S_{v} (1 - s) \geq 1$ or $S_{v} E_{s u p} M \geq 1$ . The former means $I ⊨ \neg h_{1} \lor \dots \lor \neg h_{p}$ . The latter, $S_{v} E_{s u p} M \geq 1$ , means $I ⊨ H_{1} \lor \dots \lor H_{q}$ . This is because the element $(E_{s u p} M) (i)$ is the number of support rules for $a_{i} \in A$ whose bodies are true in $I$ $s$ ( $1 \leq i \leq n$ ), and hence $S_{v} E_{s u p} M \geq 1$ means some support body $H_{r}$ ( $1 \leq r \leq q$ ) for $S_{v}$ is true in $I$ . So in either case $I ⊨ L F (S_{v})$ . Since $v$ is arbitrary, we have $I ⊨ L F (P)$ . The converse is straightforward and omitted.

The Jacobian $J_{a_{L F}}$ of $L_{L F}$ is computed as follows (derivation in Appendix B.4):

\begin{aligned} N & = C (1 - s^{δ}), \\ N_{v} & = S_{v} (1 - s), \\ M_{v} & = \min_{1} (N_{v}), \\ J_{a_{L F}} & = \frac{\partial L_{L F}}{\partial s} = \sum_{v = 1}^{w} - (\frac{\partial \min_{1} (A_{v})}{\partial s}) \\ = \sum_{v = 1}^{w} [A_{v} \leq 1] ([N_{v} \leq 1] S_{v}^{T} + (((S_{v} E_{s u p}) ⊙ [N \leq 1]^{T}) (C^{n e g} - C^{p o s})^{T}) . \end{aligned}

(18)Here

C = [C^{p o s} C^{n e g}]

and

S_{v}

A_{v}

, and

M

are computed by (17).

Now introduce a new cost function $L_{S U + \hat{c} + L F}$ by (19) that incorporates $L_{L F}$ and compute its Jacobian $J_{a_{S U + \hat{c} + L F}}$ by (11).

\begin{aligned} L_{S U + \hat{c} + L F} & = L_{S U + \hat{c}} + ℓ_{4} \cdot L_{L F}, where ℓ_{4} > 0, \end{aligned}

(19)

\begin{aligned} J_{a_{S U + \hat{c} + L F}} & = J_{a_{S U + \hat{c}}} + ℓ_{4} \cdot J_{a_{L F}} . \end{aligned}

(20)

By combining Propositions 4, 5, and Lin-Zhao’s theorem (Lin & Zhao, 2004), the following is obvious.

Proposition 6

$s$ is a stable model of $P$ satisfying constraints represented by $\hat{C}$ iff $s$ is a root of $L_{S U + \hat{c} + L F}$ .

We compute such $s$ by Newton’s method using Algorithm 1 with a modified update rule (12) such that $L_{S U + \hat{c}}$ and $J_{a_{S U + \hat{c}}}$ are replaced by $L_{S U + \hat{c} + L F}$ and $J_{a_{S U + \hat{c} + L F}}$ , respectively.

When a program $P$ is tight (Fages, 1994), for example, when rules have no positive literal in their bodies, $P$ has no loop and hence $L F$ is empty. In such a case, we directly minimize $L_{S U + \hat{c}}$ instead of using $L_{S U + \hat{c} + L F}$ with the empty $L F$ .

4.2. LF Heuristics

Minimizing $L_{S U + \hat{c} + L F}$ is a general way of computing stable models under constraints. It is applicable to any program and gives us a theoretical framework for computing stable models in an end-to-end way without depending on symbolic systems. However, there can be exponentially many loops and they make the computation of $L_{L F}$ (17) extremely difficult or practically impossible. To mitigate this seemingly insurmountable difficulty, we propose two heuristics which use a subset of loop formulas.

$L F_{m a x}$ :
The first heuristic is $L F_{m a x}$ . We consider only a set $L F_{m a x}$ of loop formulas associated with SCCs in the positive dependency graph $pdg (P) = (V, E)$ of a program $P$ . In the case of a singleton SCC ${a}$ , $a$ must have a self-loop in $pdg (P)$ . We compute SCCs in $O (| E | + | V |)$ time by Tarjan’s algorithm (Tarjan, 1972).
$L F_{m i n}$ :
In this heuristic, instead of SCCs (maximal strongly connected subgraphs), we choose minimal strongly connected subgraphs, that is, cycle graphs. Denote by $L F_{m i n}$ the set of loop formulas associated with cycle graphs in $pdg (P)$ . We use an enumeration algorithm described by Liu and Wang (2006) to enumerate cycles and construct $L F_{m i n}$ due to its simplicity.

We remark that although $L F_{m a x}$ and $L F_{m i n}$ can exclude some of non-stable models, they do not necessarily exclude all of non-stable models. However, the role of loop formulas in our framework is entirely different from the one in symbolic ASP. Namely, the role of LF in our framework is not to logically reject non-stable models but to guide the search process by their gradient information in the continuous search space. Hence, we expect, as actually observed in experiments in the next section, some loop formulas have the power of guiding the search process to a root of $L_{S U + \hat{c} + L F}$ .
4.3. Precomputation

We introduce here precomputation. The idea is to remove atoms from the search space which are false in every stable model. It downsizes the program and realizes faster model computation.

When a program $P$ in a set $A$ = atom( $P$ ) is given, we transform $P$ to a definite program $P^{+}$ by removing all negative literals from the rule bodies in $P$ . Since $P^{+} \supseteq P^{I}$ holds as a set of rules for any model $I$ , we have $L M (P^{+}) \supseteq L M (P^{I})$ , where $L M (P)$ denotes the least model of a definite program $P$ . When $I$ is a stable model, $L M (P^{I}) = I$ holds and we have $L M (P^{+}) \supseteq I$ . By taking the complements of both sides, we can say that if an atom $a$ is outside of $L M (P^{+})$ , that is, if $a$ is false in $L M (P^{+})$ , so is $a$ in any stable model $I$ of $P$ . Thus, by precomputing the least model $L M (P^{+})$ , we can remove a set of atoms $F_{P} = A ∖ L M (P^{+})$ from our consideration as they are known to be false in any stable model. We call $F_{P}$ stable false atoms. Of course, this precomputation needs additional computation of $L M (P^{+})$ but it can be done in linear time proportional to the size of $P^{+}$ , that is, the total number of occurrences of atoms in $P^{+}$ (Dowling & Gallier, 1984).⁵ Accordingly precomputing the least model $L M (P^{+})$ makes sense if the benefit of removing stable false atoms from the search space outweighs linear time computation for $L M (P^{+})$ , which is likely to happen when we deal with programs with positive literals in the rule bodies.

More concretely, given a program $P$ and a set of constraints $K$ , we can obtain downsized ones, $P^{'}$ and $K^{'}$ , as follows.

Step 1:
Compute the least model $L M (P^{+})$ and the set of stable false atoms $F_{P} = atom (P) ∖ L M (P^{+})$ .
Step 2:
Define
$\begin{aligned} G^{'} & = conjunction G with negative literals {\neg a \in G ∣ a \in F_{P}} removed \\ P^{'} & = {a \leftarrow G^{'} ∣ a \leftarrow G \in P, a \notin F_{P}, G^{+} \cap F_{P} = \emptyset}, where G^{+} = positive literals in G, \end{aligned}$
(21)
$\begin{aligned} K^{'} & = {\leftarrow G^{'} ∣\leftarrow G \in K, G^{+} \cap F_{P} = \emptyset} . \end{aligned}$
(22)

Proposition 7
Let $P^{'}$ and $K^{'}$ be, respectively, the program (21) and constraints (22). Also let $I^{'}$ be a model over atom( $P^{'}$ ). Expand $I^{'}$ to a model $I$ over atom( $P$ ) by assuming every atom in $F_{P}$ is false in $I$ . Then

$I^{'}$ is a stable model of $P^{'}$ satisfying constraints $K^{'}$ iff $I$ is a stable model of $P$ satisfying constraints $K$ .
Proof.
We prove first $I^{'}$ is a stable model of $P^{'}$ iff $I$ is a stable model of $P$ . To prove it, we prove $L M (P^{' I^{'}}) = L M (P^{I})$ as set.

Let $a \leftarrow G^{' +}$ be an arbitrary rule in $P^{' I^{'}}$ . Correspondingly there is a rule $a \leftarrow G^{'}$ in $P^{'}$ such that $I^{'} ⊨ G^{' -}$ . So there is a rule $a \leftarrow G$ in $P$ such that $G^{'} = G ∖ {\neg b ∣ b \in F_{P}}$ and $G^{+} \cap F_{P} = \emptyset$ . $I^{'} ⊨ G^{' -}$ implies $I ⊨ G^{-}$ by construction of $I$ from $I^{'}$ . So $a \leftarrow G^{+}$ is contained in $P^{I}$ , which means $a \leftarrow G^{' +}$ is contained in $P^{I}$ because $G^{' +} = G^{+}$ (recall that $G^{'} = G ∖ {\neg b ∣ b \in F_{P}}$ and $G^{'}$ and $G$ have the same set of positive literals). Thus since $a \leftarrow G^{' +}$ is an arbitrary rule, we conclude $P^{' I^{'}} \subseteq P^{I}$ , and hence $L M (P^{' I^{'}}) \subseteq L M (P^{I})$ .

Now consider $a \in L M (P^{I})$ . There is a selective linear definite clause (SLD) derivation for $\leftarrow a$ from $P^{I}$ . Let $b \leftarrow G^{+} \in P^{I}$ be a rule used in the derivation which is derived from the rule $b \leftarrow G \in P$ such that $I ⊨ G^{-}$ . Since $P^{I} \subseteq P^{+}$ , we have $L M (P^{I}) \subseteq L M (P^{+})$ and hence $L M (P^{I}) \cap F_{P} = \emptyset$ , that is, $L M (P^{I})$ contains no stable false atom. So $b \notin F_{P}$ and $G^{+} \cap F_{P} = \emptyset$ because every atom in the SLD derivation must belong in $L M (P^{I})$ . Accordingly $b \leftarrow G^{'} \in P^{'}$ . On the other hand, $I ⊨ G^{-}$ implies $I^{'} ⊨ G^{' -}$ . So $b \leftarrow G^{'}$ is in $P^{'}$ and $b \leftarrow G^{' +}$ is in $P^{' I^{'}}$ . Therefore $b \leftarrow G^{+}$ is in $P^{' I^{'}}$ because $G^{' +} = G^{+}$ . Thus every rule used in the derivation for $\leftarrow a$ from $P^{I}$ is also a rule contained in $P^{' I^{'}}$ , which means $a \in L M (P^{' I^{'}})$ . Since $a$ is arbitrary, it follows that $L M (P^{I}) \subseteq L M (P^{' I^{'}})$ . By putting $L M (P^{' I^{'}}) \subseteq L M (P^{I})$ and $L M (P^{I}) \subseteq L M (P^{' I^{'}})$ together, we conclude $L M (P^{I}) = L M (P^{' I^{'}})$ .

Then, if $I^{'}$ is a stable model of $P^{'}$ , we have $I^{'} = L M (P^{' I^{'}}) = L M (P^{I})$ as set. Since $I = I^{'}$ as set, we have $I = L M (P^{I})$ as set, which means $I$ is a stable model of $P$ . Likewise when $I$ is a stable model of $P$ , we have $I = L M (P^{I}) = L M (P^{' I^{'}})$ and $I = I^{'}$ as set. So $I^{'} = L M (P^{' I^{'}})$ as set and $I^{'}$ is a stable model of $P^{'}$ .

As for the constraints, consider a constraint $\leftarrow G^{'}$ in $K^{'}$ . We consider two cases, where $b \in F_{P}$ occurs positively and negatively in $G$ and $G^{'}$ . In the former case, the body remains the same between $G^{'}$ and $G$ , thus if $I^{'} ⊨ G^{'}$ then $I ⊨ G$ and vice versa. In the latter case, because $\neg b$ always evaluates to true, the negative occurrence $\neg b$ in the body of the constraint will not change the result of the conjunction $G$ . Thus, combining the two cases and since the constraint $\leftarrow G^{'}$ is arbitrary, we conclude that $I^{'} ⊨ K^{'}$ iff $I ⊨ K$ .
5. Programming Examples

In this section, we apply our ASP approach to examples as a proof of concept and examine the effectiveness of precomputation and heuristics. Since large scale computing is out of scope in this article, the program size is mostly small.⁶

5.1. The Three-Coloring Problem

We first deal with the three-coloring problem. Suppose we are given a graph $G_{1}$ . The task is to color the vertices of the graph blue, red, and green so that no two adjacent vertices have the same color like (b) as shown in Figure 1.

Figure 1.

Three-coloring problem: (a) graph $G_{1}$ and (b) three-coloring.

There are four nodes ${a, b, c, d}$ in the graph $G_{1}$ . We assign a set of three color atoms (Boolean variables) to each node to represent their color. For example, node $a$ is assigned three color atoms ${a_{1} (red), a_{2} (blue), a_{3} (green)}$ . We need to represent two facts by these atoms. –

Each node has a unique color chosen from ${r e d, b l u e, g r e e n}$ . So color atoms assigned to each node are in an XOR relation. We represent this fact by a tight program $P_{1}$ below containing three rules for each node.

\begin{aligned} P_{1} & = {\begin{cases} a_{1} \leftarrow \neg a_{2} \land \neg a_{3}, & a_{2} \leftarrow \neg a_{3} \land \neg a_{1}, & a_{3} \leftarrow \neg a_{1} \land \neg a_{2} \\ b_{1} \leftarrow \neg b_{2} \land \neg b_{3}, & b_{2} \leftarrow \neg b_{3} \land \neg b_{1}, & b_{3} \leftarrow \neg b_{1} \land \neg b_{2} \\ c_{1} \leftarrow \neg c_{2} \land \neg c_{3}, & c_{2} \leftarrow \neg c_{3} \land \neg c_{1}, & c_{3} \leftarrow \neg c_{1} \land \neg c_{2} \\ d_{1} \leftarrow \neg d_{2} \land \neg d_{3}, & d_{2} \leftarrow \neg d_{3} \land \neg d_{1}, & d_{3} \leftarrow \neg d_{1} \land \neg d_{2} \end{cases} \end{aligned}

(23)

–

Two nodes connected by an edge must have a different color. We represent this fact in terms of constraints.

\begin{aligned} K_{1} & = {\begin{cases} \leftarrow a_{1} \land b_{1}, & \leftarrow a_{2} \land b_{2}, & \leftarrow a_{3} \land b_{3} \\ \leftarrow a_{1} \land c_{1}, & \leftarrow a_{2} \land c_{2}, & \leftarrow a_{3} \land c_{3} \\ \leftarrow b_{1} \land c_{1}, & \leftarrow b_{2} \land c_{2}, & \leftarrow b_{3} \land c_{3} \\ \leftarrow b_{1} \land d_{1}, & \leftarrow b_{2} \land d_{2}, & \leftarrow b_{3} \land d_{3} \\ \leftarrow d_{1} \land c_{1}, & \leftarrow d_{2} \land c_{2}, & \leftarrow d_{3} \land c_{3} \end{cases} \end{aligned}

(24)

Assuming an ordering of atoms ${a_{1}, a_{2}, a_{3}, \dots, d_{1}, d_{2}, d_{3}}$ , the normal logic program $P_{1}$ shown in (23) is matricized to $P_{1} = (C_{1}, D_{1})$ , where $D_{1}$ is a $(12 \times 12)$ binary identity matrix (because there are 12 atoms and each atom has just one rule) and $C_{1}$ is a $(12 \times 24)$ binary matrix shown in (25). Constraints listed in (24) are a matricized to a ( $15 \times 12$ ) constraint matrix ${\hat{C}}_{K_{1}}$ (26). In (25) and (26), $a$ , for example, stands for a triple $(a_{1} a_{2} a_{3})$ and $\neg a$ for $(\neg a_{1} \neg a_{2} \neg a_{3})$ .

\begin{aligned} C_{1} & = \begin{array}{ll} \begin{matrix} a & b & c & d & \neg a & \neg b & \neg c & \neg d \end{matrix} \\ [\begin{matrix} H_{3} \\ H_{3} \\ H_{3} \\ H_{3} \end{matrix}], & where H_{3} = [\begin{matrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{matrix}] . \end{array} \end{aligned}

(25)

\begin{aligned} {\hat{C}}_{K_{1}} & = \begin{array}{ll} \begin{matrix} a & b & c & d \end{matrix} \\ [\begin{matrix} E_{3} & E_{3} \\ E_{3} & E_{3} \\ E_{3} & E_{3} \\ E_{3} & E_{3} \\ E_{3} & E_{3} \end{matrix}], & where E_{3} = [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}] . \end{array} \end{aligned}

(26)

We run Algorithm 1 on program $P_{1}$ with constraints $K_{1}$ to find a supported model (solution) of $P_{1}$ satisfying $K_{1}$ .⁷

To measure time to find a model, we conduct 10 trials⁸ of running Algorithm 1 with $m a x_t r y = 20$ , $m a x_i t r = 50$ , $ℓ_{2} = ℓ_{3} = 0.1$ and take the average. The result is 0.104 s (0.070)⁹ on average. Also to check the ability of finding different solutions, we perform 10 trials of Algorithm 1¹⁰ and count the number of different solutions in the returned solutions. #solutions in Table 1 is the average of 10 such measurements. Considering there are six solutions and the naive implementation, the number of different solutions found by the algorithm, which was 5.2 on average, seems rather high.

Table 1.

Time and the Number of Solutions.

Time(s)	# solutions
6.7 (0.7)	5.2 (0.9)

Next we check the scalability of our approach by a simple problem. We consider the three-coloring of a cycle graph like (a) in Figure 2. In general, given a cycle graph that has $n$ nodes, we encode its three-coloring problem as in the previous example by a matricized program $P_{2} = (C_{2}, D_{2})$ and a constraint matrix ${\hat{C}}_{K_{2}}$ , where $D_{2} (3 n \times 3 n)$ is an identity matrix and $C (3 n \times 6 n)$ and ${\hat{C}}_{K_{2}} (3 n \times 6 n)$ represent, respectively, rules and constraints. There are $2^{n} + 2 (- 1)^{n}$ solutions ( $n \geq 3$ ) in $2^{3 n}$ possible assignments for $3 n$ atoms.¹¹ So the problem will be exponentially difficult as $n$ goes up.

Figure 2.

Convergence and scalability: (a) cycle graph, (b) minimization of $L_{S U + \hat{c}}$ with retry, and (c) scalability.

The graph (b) in Figure 2 is an example of convergence curve of $L_{S U + \hat{c}}$ by Algorithm 1 with $n = 10$ , $m a x_t r y = 100$ , and $m a x_i t r = 50$ . The curve tells us that in the first cycle of $j$ -loop, the inner for loop of Algorithm 1, no solution is found after $m a x_i t r = 50$ iterations of update of continuous assignment vector $s$ . Then perturbation is given to $s$ which causes a small jump of $L_{S U + \hat{c}}$ at $i t r = 51$ and the second cycle of $j$ -loop starts and this time a solution is found after dozens of updates by thresholding $s$ to a binary vector $s_{*}$ .

The graph (c) in Figure 2 shows the scalability of computation time to find a solution up to $n =$ 10,000. We set $m a x_t r y = 100$ , $m a x_i t r = 2000$ and plot the average of 10 measurements of time to find a solution. The graph seems to indicate good linearity w.r.t. $n$ up to $n =$ 10,000.

5.2. The HC Problem, Precomputation, and Another Solution Constraint

A HC is a cycle in a graph that visits every vertex exactly once and the HC problem is to determine if an HC exists in a given graph. It is an NP-complete problem and has been used as a programming example since the early days of ASP. Initially, it is encoded by a non-tight program containing positive recursion (Niemelä, 1999). Later a way of encoding by a program that is not tight but tight on its completion models is proposed (Lin & Zhao, 2003). We here introduce yet another encoding by a tight ground program inspired by SAT encoding proposed by Zhou (2020), where Zhou showed that the problem is solvable by translating six conditions listed in Figure 3 into a SAT problem.¹²

Figure 3.

Conditions for Boolean satisfiability problem (SAT) encoding of a Hamiltonian cycle problem.

In what follows, we assume vertices are numbered from $1$ to $N$ = the number of vertices in a graph. We use $i \to j$ to denote an edge from vertex $i$ to vertex $j$ and $H_{i, j}$ to indicate there exists an edge from $i$ to $j$ in an HC. $U_{j, q}$ means vertex $j$ is visited at time $q$ ( $1 \leq j, q \leq N$ ) and $one_of (a_{1}, \dots, a_{k})$ means that one of ${a_{1}, \dots, a_{k}}$ is true. We translate these conditions into a program $P_{3} = {(1), (2), (3)}$ and constraints $K_{3} = {(4), (5), (6)}$ . To be more precise, the first condition $(1)$ is translated into a tight program just like a program $P_{1}$ (23). The conditions ${(2), (3)}$ constitute a tight definite program. Constraints $K_{2} = {(4), (5), (6)}$ are encoded as a set of implications of the form $\leftarrow L_{1} \land \dots \land L_{k}$ , where $L_{1}, \dots, L_{k}$ are literals. A set of $U_{j, q}$ atoms contained in a stable model of $P_{3}$ satisfying $K_{3}$ gives an HC.

We apply the above encoding to a simple HC problem for a graph $G_{2}$ in Figure 4.¹³ There are six vertices and six HCs.¹⁴ To solve this HC problem, we matricize $P_{3}$ and $K_{3}$ . There are 36 $H_{i, j}$ atoms ( $1 \leq i, j \leq 6$ ) and 36 $U_{j, q}$ atoms ( $1 \leq j, q \leq 6$ ). So there are 72 atoms in total. $P_{3} = {(1), (2), (3)}$ contains 197 rules in these 72 atoms and we translate $P_{3}$ into a pair of matrices $(C_{3}, D_{3})$ , where $D_{3}$ is a $72 \times 197$ binary matrix for disjunctions¹⁵ and $C_{3}$ is a $197 \times 144$ matrix for conjunctions (rule bodies). Likewise $K_{3} = {(4), (5), (6)}$ is translated into a constraint matrix ${\hat{C}}_{K_{3}}$ which is a $67 \times 144$ binary matrix. A more detailed description of the encoding is available in the appendix (Appendix C). Then our task is to find a root $s$ of $L_{S U + \hat{c}}$ (10) constructed from these $C_{3}$ , $D_{3}$ , and ${\hat{C}}_{K_{3}}$ in a 72 dimensional vector space by minimizing $L_{S U + \hat{c}}$ to zero.

Figure 4.

A Hamiltonian cycle (HC) problem: (a) graph $G_{2}$ and (b) time and the number of different solutions.

We apply precomputation in the previous section to $P_{3} = (C_{3}, D_{3})$ and ${\hat{C}}_{K_{3}}$ to reduce program size. It takes 2.3 ms and detects 32 false stable atoms. It outputs a precomputed program $P_{3}^{'} = (C_{3}^{'}, D_{3}^{'})$ and a constraint matrix ${\hat{C}}_{K_{3}}^{'}$ of size $D_{3}^{'} (40 \times 90)$ , $C_{3}^{'} (90 \times 80)$ , and ${\hat{C}}_{K_{3}}^{'} (52 \times 80)$ , respectively, which is $1 / 4$ or $1 / 2$ of the original size. So precomputation removes $45 %$ of atoms from the search space and returns much smaller matrices.

We run Algorithm 1 on $P_{3} = (C_{3}, D_{3})$ with ${\hat{C}}_{K_{3}}$ (no precomputation) and also on $P_{3}^{'} = (C_{3}^{'}, D_{3}^{'})$ with ${\hat{C}}_{K_{3}}^{'}$ (precomputation) using $m a x_t r y = 20$ , $m a x_i t r = 200$ , and $ℓ_{2} = ℓ_{3} = 0.1$ and measure time to find a solution, that is, stable model satisfying constraints. The result is shown in Table (b) in Figure 4 as time(s), where time(s) is an average of 10 trials. The figures in the table, 2.08 s versus 0.66 s,¹⁶ clearly demonstrate the usefulness of precomputation.

In addition to computation time, we examine the search power of different solutions in our approach by measuring the number of obtainable solutions. More concretely, we run Algorithm 1 seven times, and each time a stable model is obtained as a conjunction $L_{1} \land \dots \land L_{72}$ of literals, we add a new constraint $\leftarrow L_{1} \land \dots \land L_{72}$ to previous constraints, thereby forcibly computing a new stable model in next trial. We call such use of constraint another solution constraint. Since there are at most six solutions, the number of solutions obtained by seven trials is at most six. We repeat a batch of seven trials 10 times and take the average of the number of solutions obtained by each batch. The average is denoted as #solutions in Table (b) which indicates that 5.7 solutions out of 6, almost all solutions, are obtained by seven trials using another solution constraint.

Summing up, the figures in Table (b) exemplify the effectiveness of precomputation which significantly reduces computation time and returns a more variety of solutions when combined with another solution constraint.

5.3. LF Heuristics and Precomputation on Loopy Programs

So far we have been dealing with tight programs which have no loop and hence have no loop formulas. We here deal with non-tight programs containing loops and examine how LF heuristics, $L F_{m a x}$ and $L F_{m i n}$ , introduced in the previous section work. We use an artificial non-tight program $P_{4_n}$ (with no constraint) shown below that has exponentially many loops.

\begin{aligned} P_{4_n} & = {\begin{cases} a_{0} & \leftarrow & a_{1} \land \dots \land a_{n} \\ a_{0} & \leftarrow & \neg a_{n + 1} \\ \dots \\ a_{2 i - 1} & \leftarrow & a_{0} \lor a_{2 i} & for i : 1 \leq i \leq n / 2 \\ a_{2 i} & \leftarrow & a_{0} \lor a_{2 i - 1} & for i : 1 \leq i \leq n / 2 \\ \dots \\ a_{n + 1} & \leftarrow & a_{n + 1} \end{cases} \end{aligned}

We here consider an even

n

, then

P_{4_n}

program has

n + 2

atoms

{a_{0}, a_{1}, \dots, a_{n}, a_{n + 1}}

2^{n / 2} + 1

supported models and one stable model

{a_{0}, a_{1}, \dots, a_{n}}

. There are

n / 2 + 1

minimal loops

{a_{1}, a_{2}}, \dots, {a_{n - 1}, a_{n}}, {a_{n + 1}}

and a maximal loop

{a_{0}, a_{1}, \dots, a_{n}}

. The set of loop formulas for LF heuristics are computed as follows:

\begin{aligned} L F_{m a x} & = {(a_{0} \land a_{1} \land \dots \land a_{n}) \to \neg a_{n + 1}, a_{n + 1} \to⊥}, \\ L F_{m i n} & = {(a_{1} \land a_{2}) \to a_{0}, \dots, (a_{n - 1} \land a_{n}) \to a_{0}, a_{n + 1} \to⊥} . \end{aligned}

Note that although there are

2^{n / 2} + 1

supported models, there is only one stable model. So

L F_{m a x}

and

L F_{m i n}

are expected to exclude

2^{n / 2}

supported models.

After translating $P_{4_n}$ into a matricized program $P_{4_n} = (C_{4_n}, D_{4_n})$ , where $C_{4_n}$ is a $(2 n + 3) \times (2 n + 4)$ binary matrix and $D_{4_n}$ is a $(n + 2) \times (2 n + 3)$ binary matrix, respectively, we compute a stable model of $P_{4_n}$ for various $n$ by Algorithm 1 that minimizes $L_{S U + \hat{c} + L F}$ (19) with coefficient $ℓ_{3} = 0$ for the constraint term (because of no use of constraints) using Jacobian $J_{a_{S U + \hat{c} + L F}}$ (11).

Below is an example of the program $P_{4_n}$ , where $n = 4$ .

\begin{aligned} P_{4_4} & = \begin{array}{ll} {\begin{cases} a_{0} & \leftarrow & a_{1} \land a_{2} \land a_{3} \land a_{4} \\ a_{0} & \leftarrow & \neg a_{5} \\ a_{1} & \leftarrow & a_{0} & a_{1} \leftarrow a_{2} \\ a_{2} & \leftarrow & a_{0} & a_{2} \leftarrow a_{1} \\ a_{3} & \leftarrow & a_{0} & a_{3} \leftarrow a_{4} \\ a_{4} & \leftarrow & a_{0} & a_{4} \leftarrow a_{3} \\ a_{5} & \leftarrow & a_{5} \end{cases} \end{array} \end{aligned}

This program has three minimal loops

{a_{1}, a_{2}}, {a_{3}, a_{4}}, a_{5}

, and a maximal loop

{a_{0}, a_{1}, a_{2}, a_{3}, a_{4}}

. There are 11 rules and six atoms, so

C_{4_4}

is a

(11 \times 12)

binary matrix.

Since all supported models of $P_{4_n}$ except for one stable model are non-stable, even if $L F_{m a x}$ and $L F_{m i n}$ are used to guide the search process towards a stable model, Algorithm 1 is likely to return a non-stable model. We can avoid such a situation by the use of another solution constraint.

To verify it, we examine the pure effect of another solution constraint that guides the search process to compute a model different from previous ones. Without using $L F_{m a x}$ or $L F_{m i n}$ heuristics, we repeatedly run Algorithm 1 with/without another solution constraint for $10^{4}$ trials with $n = 4$ , $m a x_t r y = 20$ , $m a x_i t r = 50$ , $ℓ_{2} = ℓ_{3} = 0.1$ and measure time to find a stable model and count the number of trials until then. We repeat this experiment 10 times and take the average. The result is shown in Table 2.

Table 2.
The Effect of Another Solution Constraint.

Another solution constraint Time(s) # trials

Not used 11.46 (0.41) $10^{4} (0)$

Used 0.09 (0.13) 3.5 (1.6)

Another solution constraint	Time(s)	# trials
Not used	11.46 (0.41)	$10^{4} (0)$
Used	0.09 (0.13)	3.5 (1.6)

The figure $10^{4} (0)$ in Table 2 in the case of no use of another solution constraint means Algorithm 1 always exhausts $10^{4}$ trials without finding a stable model (due to implicit bias in Algorithm 1). When another solution constraint is used, however, it finds a stable model in 0.09 s after 3.5 trials on average. Thus Table 2 demonstrates the necessity and effectiveness of another solution constraint to efficiently explore the search space.

We next compare the effectiveness of LF heuristics and that of precomputation under another solution constraint. For $n = 10, \dots, 50$ , we repeatedly run Algorithm 1 using $L_{S U + \hat{c} + L F}$ with $m a x_t r y = 10, m a x_i t r = 100$ on matricized $P_{4_n} = (C_{4_n}, D_{4_n})$ (and no constraint matrix) to compute supported (stable) models. Coefficients in $L_{S U + \hat{c} + L F}$ are set to $ℓ_{2} = 0.1, ℓ_{3} = 0, and ℓ_{4} = 1$ . To be more precise, for each $n$ and each case of $L F_{m a x}$ , $L F_{m i n}$ , precomputation (without ${L F_{m a x}, L F_{m i n}}$ ) and no ${L F_{m a x}, L F_{m i n}, precomputation}$ , we run Algorithm 1 at most 100 trials to measure time to find a stable model and count the number of supported models computed till then. We repeat this computation 10 times and take the average and obtain graphs in Figure 5.

Figure 5.

The effect of loop formula (LF) heuristics and precomputation on program $P_{4_n}$ : (a) time to find a stable model and (b) #computed models.

In Figure 5, no_LF means no use of ${L F_{m a x}, L F_{m i n}}$ heuristics. Also no_LF_pre means no_LF is applied to precomputed $P_{4_n}$ .¹⁷

We can see from graph (a) in Figure 5 that computation time is $L F_{m i n} > L F_{m a x} > no\_LF > no\_LF\_pre$ . This means that using LF heuristics is not necessarily a good policy. They might cause extra computation to reach the same model. Concerning the number of non-stable models computed redundantly, graph (b) in Figure 5 tells us that $L F_{m i n}$ allows computing redundant non-stable models but the rest, $L F_{m a x}$ , no_LF, and no_LF_pre, return a stable model without computing redundant non-stable models. This shows first that $L F_{m a x}$ works correctly to suppress the computation of non-stable models and second that the $L F_{m i n}$ heuristics works adversely, that is, guiding the search process away from the stable model. This somewhat unexpected result indicates the need of (empirical) choice of LF heuristics.

Finally, to examine the effectiveness of precomputation more precisely, we apply precomputation to a more complex program $P_{5_n k}$ . It is a modification of $P_{4_n}$ by adding self-loops of $k$ atoms as illustrated by (a) in Figure 6. The addition of self-loop causes the choice of $a_{n + j}$ ( $1 \leq j \leq k$ ) being true or being false in the search process. $P_{5_n k}$ has $(2^{n / 2} - 1) (2^{k} - 1) + 1$ supported models but has just one stable model ${a_{0}, a_{1}, \dots, a_{n}}$ .

Figure 6.

Precomputation applied to program $P_{5_n k}$ : (a) a non-tight program $P_{5_n k}$ and (b) scalability of precomputation w.r.t. $P_{5_n k}$ .

We compute a stable model by running Algorithm 1 on precomputed $P_{5_n k}$ without using LF heuristics up to $n = k = 5000$ . When precomputation is applied to $P_{5_n k}$ , where $n = k = 5000$ , it detects $5000$ false stable atoms and downsizes the matrices in $P_{5_n k} = (C_{5_n k}, D_{5_n k})$ from $D_{5_n k} (10001 \times 15002)$ to $D_{5_n k}^{'} (5001 \times 10002)$ and from $C_{5_n k} (15002 \times 20002)$ to $C_{5_n k}^{'} (10002 \times 10002)$ . Thus precomputed $P_{5_n k}^{'} = (C_{5_n k}^{'}, D_{5_n k}^{'})$ is downsized to 1/3 of the original $P_{5_n k}$ .

We run Algorithm 1 on $P_{5_n k}^{'}$ with $ℓ_{2} = ℓ_{3} = 0.1$ and $m a x_t r y = 10$ , $m a x_i t r = 100$ at most 100 trials to measure time to find a stable model ten times for each $n = 1000, \dots, 5000$ and take the average. At the same time, we also run clingo (version 5.6.2) on $P_{5_n k}$ and similarly measure time. Graph (b) in Figure 6 is the result. It shows that as far as computing a stable model of $P_{5_n k}$ is concerned, our approach comes close to clingo. However, this is due to a very specific situation that precomputation removes all false atoms ${a_{n + 1}, \dots, a_{n + k}}$ in the stable model of $P_{5_n k}$ and Algorithm 1 run on the precomputed $P_{5_n k}^{'} = (C_{5_n k}^{'}, D_{5_n k}^{'})$ detects the stable model only by thresholding $s$ before starting any update of $s$ . So what graph (b) really suggests seems to be the importance of optimization of a program like precomputation, which is to be developed further in our approach.

6. Related Work

The most closely related work is Aspis et al. (2020) and Takemura and Inoue (2022). As mentioned in Section 1, our approach differs from them in three points: (1) theoretically, the exclusion of non-stable models by loop formulas, (2) syntactically, no restriction on acceptable programs, and (3) practically, incorporation of constraints. Concerning performance, they happen to use the same $N$ -negative loops program which consists of $N$ copies (alphabetic variants) of a program ${a \leftarrow \neg b, b \leftarrow \neg a}$ . According to Aspis et al. (2020), the success rate w.r.t. $N$ of returning a supported model goes from one initially to almost zero at $N = 64$ by Aspis et al. (2020), while it keeps one till $N = 100$ by Takemura and Inoue (2022). We tested the same program with $m a x_t r y = 20$ , $m a x_i t r = 100$ , and observed that the success rate keeps one till $N =$ 10,000.

Although our approach is non-probabilistic, that is, purely linear algebraic, there are probabilistic differentiable approaches for ASP. Differentiable ASP/SAT (Nickles, 2018) iteratively samples a stable model by an ASP solver a la ASSAT (Lin & Zhao, 2004). The solver decides the next decision literal based on the derivatives of a cost function which is the MSE between the target probabilities and predicted probabilities computed from the sampled stable models via parameters associated with “parameter atoms” in a program.

NeurASP (Yang et al., 2020) uses an ASP solver to obtain stable models including “neural atoms” for a program. They are associated with probabilities learned by deep learning and the likelihood of an observation (a set of ASP constraints) is computed from them. The whole learning is carried out by backpropagating the likelihood to neural atoms to parameters in a neural network.

Similarly to NeurASP, SLASH (Skryagin et al., 2023, 2022) uses an ASP solver to compute stable models for a program containing “neural probabilistic predicates.” Their probabilities are dealt with by neural networks and probabilistic circuits. The latter makes it possible to compute a joint distribution of the class category and data. Both NeurASP and SLASH are examples of symbolic ASP solver-based neuro-symbolic systems, where they include a neural frontend to process the perception part of the problem, and a symbolic backend which typically is the ASP solver. Therefore, the neural frontend does not need to be involved in the computational details and problems associated with computing stable models (Section 4).

Independently of ASP solver-based approaches mentioned above, Sato and Kojima proposed a differentiable approach to sampling supported models of (non-propositional) probabilistic normal logic programs (Sato & Kojima, 2019, 2020). They encode programs by matrices and formulate the problem of sampling supported models as repeatedly computing a fixedpoint of some differentiable equations. They solve the equations in vector spaces by minimizing a non-negative cost function defined by Frobenius norm. More recently, Takemura and Inoue (2024) proposed a neuro-symbolic learning pipeline for distant supervision tasks, which leverages differentiable computation of supported models. Similarly to this work, they encode normal logic programs into matrices and define a differentiable loss function which is based on the supported model semantics.

As for the non-differentiable linear-algebraic approaches to logic programming, Nguyen et al. adopted matrix encoding for propositional normal logic programs based on Sakama et al. (2017) and proposed to compute stable models in vector spaces by a generate-and-test approach using sparse representation (Nguyen et al., 2022).

6.1. Connection to Neural Network Computation

At this point, it is quite interesting to see the connection of our approach to neural network computation. In (6), we compute $M$ and $d = D M$ . We point out that the computation of this $d$ is nothing but the output of a forward process of a single layer ReLU network from an input vector $s$ . Consider the computation of $M = (1 - {min}_{1} (C (1 - s^{δ})))$ . We rewrite this using $1 - \min (x, 1) = ReLU (1 - x)$ to

\begin{aligned} M & = 1 - {min}_{1} (C^{p o s} (1 - s) + C^{n e g} s) \\ = ReLU (W s + b), \\ where C = [C^{p o s} C^{n e g}], W = C^{p o s} - C^{n e g}, b = 1 - C^{p o s} 1 . \end{aligned}

M

is the output of a ReLU network having a weight matrix

W = C^{p o s} - C^{n e g}

and a bias vector

b = 1 - C^{p o s} 1

. Then

\min_{1} (d) = \min_{1} (D M) = \min_{1} (D \cdot ReLU (W s + b))

is the output of a ReLU network with a single hidden layer and a linear output layer represented by

D

having

\min_{1} (\cdot)

as activation function. Also when we compute a supported model

s

, we minimize

L_{S U + \hat{c}}

(6) which contains an MSE error term

L_{s q} = ‖ \min_{1} (d) - s ‖^{2}

using

J_{a_{S U + \hat{c}}}

(11). This is precisely back propagation from learning data

s

Thus we may say that our approach is an integration of ASP semantics and neural computation and provides a neuro-symbolic (Sarker et al., 2021) way of ASP computation. Nonetheless, there is a big difference. In standard neural network architecture, a weight matrix $W$ and a bias vector $b$ are independent. In our setting, they are interdependent and they faithfully reflect the logical structure of a program.

7. Conclusion

We proposed an end-to-end approach for computing stable models satisfying given constraints. We matricized a program and constraints and formulated stable model computation as a minimization problem in vector spaces of a non-negative cost function. We obtain a stable model satisfying constraints as a root the cost function by Newton’s method.

By incorporating all loop formula constraints introduced in Lin-Zhao’s theorem (Lin & Zhao, 2004) into the cost function to be minimized, we can prevent redundant computation of non-stable models, at the cost of processing exponentially many loop formulas. Hence, we introduced precomputation which downsizes a program while preserving stable model semantics and also two heuristics that selectively use loop formulas. Then we confirmed the effectiveness of our approach including precomputation and loop formula heuristics by simple examples.

Future work could focus on improving the integration of neural networks with this proposed end-to-end approach to tackle neuro-symbolic benchmark tasks that require both perception and reasoning. We also aim to improve the optimization techniques, such as precomputation, to enhance efficiency and scalability.

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by JSPS KAKENHI Grant Numbers JP21H04905, JP25K03190 and JST CREST Grant Number JPMJCR22D3.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Taisuke Sato

Akihiro Takemura

Katsumi Inoue

Notes

Appendix A. Proofs

Appendix B. Derivations

Appendix C. Encoding the Hamiltonian Cycle Problem

This section describes the encoding and program used in solving the Hamiltonian cycle problem (Section 5.2).

Firstly, looking at the graph (Figure 4a), it is evident that there are six vertices. We use an atom H i , j to indicate there exists an edge from vertices i to j in an HC. Then there are 36 atoms { H 1 , 1 , H 1 , 2 , … , H 1 , 6 , H 2 , 1 , … , H 6 , 1 , … , H 6 , 6 } . We also use an atom U j , q to indicate that the vertex j is visited at time q . Then there are 36 atoms { U 1 , 1 , U 1 , 2 , … , U 1 , 6 , U 2 , 1 , … , U 6 , 1 , … , U 6 , 6 } . Thus, in total, there are 72 atoms consisting of H i , j and U j , q .

References

Apt

K. R.

Blair

H. A.

Walker

(1988). Towards a theory of declarative knowledge. In J. Minker (Ed.), Foundations of deductive databases and logic programming (pp. 89–148). Morgan Kaufmann.

Aspis

Broda

Russo

Lobo

(2020). Stable and supported semantics in continuous vector spaces. In D. Calvanese, E. Erdem, & M. Thielscher (Eds.), Proceedings of the 17th international conference on principles of knowledge representation and reasoning, KR 2020, Rhodes, Greece, September 12–18, 2020 (pp. 59–68).

Dowling

W. F.

Gallier

J. H.

(1984). Linear-time algorithms for testing the satisfiability of propositional Horn formula. Journal of Logic Programming, 3, 267–284.

Erdem

Lifschitz

(2003). Tight logic programs. Theory and Practice of Logic Programming (TPLP), 3(4–5), 499–518.

Fages

(1994). Consistency of Clark’s completion and existence of stable models. Journal of Methods of Logic in Computer Science, 1, 51–60.

Ferraris

Lee

Lifschitz

(2006). A generalization of the Lin-Zhao theorem. Annals of Mathematics and Artificial Intelligence, 47, 79–101.

Gebser

Kaufmann

Neumann

Schaub

(2007). clasp: A conflict-driven answer set solver. In LPNMR. Lecture Notes in Computer Science, Vol. 4483 (pp. 260–265). Springer.

Gebser

Kaufmann

Schaub

(2012). Conflict-driven answer set solving: From theory to practice. Artificial Intelligence, 187, 52–89.

Gelfond

Lifshcitz

(1988). The stable model semantics for logic programming. In ICLP/SLP 1988, pp. 1070–1080. MIT Press.

10.

Lierler

(2005). CMODELS: SAT-based disjunctive answer set solver. In Proceedings of the 8th international conference on logic programming and nonmonotonic reasoning, LPNMR’05 (pp. 447–451). Springer-Verlag.

11.

Lifschitz

(2008). What is answer set programming? In Proceedings of the 23rd national conference on artificial intelligence, Vol. 3, AAAI’08 (pp. 1594–1597). AAAI Press.

12.

Lifschitz

Razborov

(2006). Why are there so many loop formulas? ACM Transactions on Computational Logic, 7(2), 261–268.

13.

Lin

Zhao

(2003). On tight logic programs and yet another translation from normal logic programs to propositional logic. In Proceedings of the 18th international joint conference on artificial intelligence (IJCAI’03) (pp. 853–858). Morgan Kaufmann Publishers Inc.

14.

Lin

Zhao

(2004). ASSAT: Computing answer sets of a logic program by SAT solvers. Artificial Intelligence, 157(1), 115–137.

15.

Liu

Wang

(2006). A new way to enumerate cycles in graph. In Advanced int’l conference on telecommunications and int’l conference on internet and web applications and services (AICT-ICIW’06) (pp. 57–59). IEEE Computer Society.

16.

Marek

V. W.

Truszczyński

(1999). Stable models and an alternative logic programming paradigm. In K. R. Apt, V. W. Marek, M. Truszczynski, & D. S. Warren (Eds.), The logic programming paradigm: A 25-year perspective (pp. 375–398). Springer Berlin Heidelberg. doi:10.1007/978-3-642-60085-2_17

17.

Marek

Subrahmanian

V. S.

(1992). The relationship between stable, supported, default and autoepistemic semantics for general logic programs. Theoretical Computer Science, 103(2), 365–386.

18.

Nguyen

T. Q.

Inoue

Sakama

(2022). Enhancing linear algebraic computation of logic programs using sparse representation. New Generation Computers, 40(1), 225–254. doi:10.1007/s00354-021-00142-2

19.

Nickles

(2018). Differentiable SAT/ASP. In Proceedings of the 5th international workshop on probabilistic logic programming, PLP 2018 (pp. 62–74). CEUR-WS.org.

20.

Niemelä

(1999). Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence, 25(3–4), 241–273.

21.

Niemelä

Simons

(1997). Smodels—An implementation of the stable model and well-founded semantics for normal logic programs. In J. Dix, U. Furbach, & A. Nerode (Eds.), Logic programming and nonmonotonic reasoning (pp. 420–429). Springer Berlin Heidelberg.

22.

Sakama

Inoue

Sato

(2017). Linear algebraic characterization of logic programs. In Proceedings of the 10th international conference on knowledge science, engineering and management (KSEM2017), LNAI 10412 (pp. 520–533). Springer-Verlag.

23.

Sarker

M. K.

Zhou

Eberhart

Hitzler

(2021). Neuro-symbolic artificial intelligence: Current trends. The European Journal on Artificial Intelligence , 34(3), 197–209. doi:https://doi.org/10.3233/AIC-210084

24.

Sato

Kojima

(2019). Logical inference as cost minimization in vector spaces. In Proceedings of the fourth international workshop on declarative learning based programming (DeLBP’19). https://delbp.github.io/

25.

Sato

Kojima

(2020). In A. E. F. Seghrouchni & D. Sarne (Eds.), Artificial intelligence IJCAI 2019 international workshops, revised selected best papers. Lecture Notes in Computer Science, Vol. 12158 (pp. 239–252). Springer. doi:10.1007/978-3-030-56150-5

26.

Skryagin

Ochs

Dhami

D. S.

Kersting

(2023). Scalable neural-probabilistic answer set programming. Journal of Artificial Intelligence Research, 78, 579–617.

27.

Skryagin

Stammer

Ochs

Dhami

D. S.

Kersting

(2022). Neural-probabilistic answer set programming. In Proceedings of the 19th international conference on principles of knowledge representation and reasoning (pp. 463–473). IJCAI Organization.

28.

Takemura

Inoue

(2022). Gradient-based supported model computation in vector spaces. In G. Gottlob, D. Inclezan, & M. Maratea (Eds.), Logic programming and nonmonotonic reasoning - 16th international conference, LPNMR 2022, Genova, Italy, September 5–9, 2022, Proceedings (pp. 336–349). Lecture Notes in Computer Science, Vol. 13416. Springer.

29.

Takemura

Inoue

(2024). Differentiable logic programming for distant supervision. In U. Endriss, F. S. Melo, K. Bach, A. J. Bugarín Diz, J. M. Alonso-Moral, S. Barro, & F. Heintz (Eds.), ECAI 2024 - 27th European conference on artificial intelligence, Santiago de Compostela, Spain – Including 13th conference on prestigious applications of intelligent systems (PAIS 2024). Frontiers in Artificial Intelligence and Applications (Vol. 392, pp. 1301–1308). IOS Press. doi:10.3233/FAIA240628.

30.

Tarjan

R. E.

(1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2), 146–160.

31.

West

D. B.

(2001). Introduction to graph theory. Prentice Hall.

32.

Yang

Ishay

Lee

(2020). NeurASP: Embracing neural networks into answer set programming. In C. Bessiere (Ed.), Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI-20, Yokohama, Japan (pp. 1755–1762). IJCAI Organization.

33.

Zhou

N.-F.

(2020). In pursuit of an efficient SAT encoding for the Hamiltonian cycle problem. In Proceeding of the principles and practice of constraint programming: 26th international conference (CP 2020) (pp. 585–602). Springer-Verlag.

Towards End-to-End ASP Computation

Abstract

Keywords

1. Introduction

3. Computing Supported Models in Vector Spaces

3.1. Matricized Programs

Definition 1 Matricized program

Example 1 Encoding a program

3.3. Cost Minimization for Supported Models

4.1. Loop Formulas and Stable Models

5.1. The Three-Coloring Problem

Table 2. The Effect of Another Solution Constraint. Another solution constraint Time(s) # trials Not used 11.46 (0.41) 10 4 ( 0 ) Used 0.09 (0.13) 3.5 (1.6)

6.1. Connection to Neural Network Computation

7. Conclusion

Footnotes

Funding

Declaration of Conflicting Interests

ORCID iDs

Notes

Appendix A. Proofs

Appendix B. Derivations

Appendix C. Encoding the Hamiltonian Cycle Problem

References

Table 2.
The Effect of Another Solution Constraint.

Another solution constraint Time(s) # trials

Not used 11.46 (0.41) $10^{4} (0)$

Used 0.09 (0.13) 3.5 (1.6)