Abstract
The definition and general methods of construction of non-statistical association measures on different domains are discussed. An association measure is a function of two variables defined on a set X with an involutive operation and satisfying properties similar to those of Pearson’s correlation coefficient. Such a measure can be used to analyze possible positive and negative relationships between variables. Methods of constructing association measures from similarity measures and pseudo-difference operations associated with t-conorms are discussed. Examples of association measures on different domains are considered.
Introduction
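Association measures are widely used in data analysis. Different association and correlation measures have been introduced in statistics, data mining, fuzzy set theory, etc. [1, 17] for different types of data. Pearson’s correlation coefficient [12]

$$\mathrm{corr}(x, y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
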
is the most popular association measure used for the analysis of possible relationships between variables. Many association measures similar to the correlation coefficient have been proposed, but the interesting problem is not only to introduce a new association measure for some type of data but also to analyze the class of functions similar to the correlation coefficient and to propose methods for generating them. In [1], a measure of correlation between fuzzy membership functions was proposed that satisfies a set of properties similar to those of Pearson’s correlation coefficient. In [6], another set of properties similar to those of Pearson’s correlation coefficient was considered; it defines the time series shape association measures. In [7], general methods of construction of such association measures were proposed, and the sample Pearson’s correlation coefficient was obtained as a particular case of the general approach. In [8], the methods proposed in [7] were extended to the general case of functions A : X × X → [-1, 1] defined on a set X with an involutive operation N (called a reflection) and satisfying properties similar to those of Pearson’s correlation coefficient. The methods of construction of such measures [8, 9] use similarity measures and pseudo-difference operations associated with t-conorms [2, 15]. In [9], the problems that arise in the definition of the general class of functions similar to Pearson’s correlation coefficient were discussed. These problems have different causes. First, the properties corr(x, x) = 1 and corr(x, –x) = –1 of the function (1) contradict each other for the n-tuple x = (0, …, 0), for which x = –x. In general, a similar problem appears for the fixed points of the reflection operation N used in the definition of the association measure A.
Second, the function (1) is not defined for constant n-tuples x = (x1, …, xn) = (s, …, s), where s is a real value, because the denominator of (1) equals 0. Similarly, an association measure may not be definable on the whole set X. Such elements of X can be excluded from the domain of the association measure, or the function should be additionally defined on them. Third, depending on the domain X, properties specific to this domain may be considered in addition to the general properties of association measures. See, for example, the definition of the time series shape association measure [6, 7].
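The first two problems can be observed numerically. The following is a minimal sketch of the sample correlation coefficient (1) in plain Python; the names `pearson` and `neg`, and the convention of returning `None` when the denominator vanishes, are assumptions made for this illustration only.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length tuples.

    Returns None when the denominator is zero (x or y is constant);
    this convention is an assumption of the sketch.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        return None
    return sxy / denom

def neg(x):
    """Reflection N(x) = -x applied elementwise."""
    return tuple(-a for a in x)

x = (1.0, 3.0, 2.0, 5.0)
zero = (0.0, 0.0, 0.0, 0.0)
const = (4.0, 4.0, 4.0, 4.0)

r_self = pearson(x, x)        # close to 1
r_neg = pearson(x, neg(x))    # close to -1
r_zero = pearson(zero, zero)  # None: zero is constant and a fixed point of N
r_const = pearson(const, x)   # None: constant tuple, zero denominator
```

For the zero tuple, x and –x coincide, so no single value could equal both 1 and –1; the sketch sidesteps this only because the value is undefined there.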
The current paper tries to avoid these problems in two ways. The first is to consider explicitly association measures defined on a subset V of X where these problems disappear. The second is to define the association measure on the whole set X and to adjust some of the properties required of the association measure so that they no longer contradict each other.
The current paper also gives proofs of some general results that were stated without proof in the author's previous papers. Some related details can also be found in [10].
The paper is structured as follows. Section 2 discusses the definitions and properties of association measures defined on sets with an involutive operation. As an example, a simple association measure on the set of real values is introduced. Section 3 considers the basic definitions and properties of the operations of fuzzy logic used in the following sections. Section 4 considers the general methods of construction of association measures and gives the proofs of the related theoretical results. Section 5 considers an example of an association measure constructed by the proposed methods. The conclusions are given in the last section.
Let X be a set and |X| > 1. A function N : X → X is called an involution on X if N(N(x)) = x for all x ∈ X.
An involution N is called a reflection on X if it is not the identity function, i.e. N(x) ≠ x for some x ∈ X. An element x ∈ X such that

$$N(x) = x$$

is called a fixed point of N in X.
The fixed points will be denoted by x_FP, hence:

$$N(x_{FP}) = x_{FP}.$$

Denote by FP(N, X) the set of all fixed points of N in X. This set can be empty.
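For a finite set X these notions can be checked directly. The sketch below is an illustration only; the helper names `is_involution`, `is_reflection` and `fixed_points`, and the two example sets, are hypothetical and not taken from the cited papers.

```python
def is_involution(N, X):
    """True if N maps X into X and N(N(x)) == x for every x in X."""
    return all(N(x) in X and N(N(x)) == x for x in X)

def is_reflection(N, X):
    """An involution on X that is not the identity function."""
    return is_involution(N, X) and any(N(x) != x for x in X)

def fixed_points(N, X):
    """The set FP(N, X) of all x in X with N(x) == x."""
    return {x for x in X if N(x) == x}

# Negation on {-3, ..., 3}: a reflection with the single fixed point 0.
X_int = range(-3, 4)
negate = lambda x: -x

# Bit flip on {0, 1}: a reflection with no fixed points.
X_bits = [0, 1]
flip = lambda x: 1 - x

fp_int = fixed_points(negate, X_int)
fp_bits = fixed_points(flip, X_bits)
```

The second example shows a case in which FP(N, X) is empty.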
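Let N be a reflection on X and let V be a nonempty subset of X that is closed under N and contains no fixed points of N. A function A : V × V → [-1, 1] satisfying for all x, y ∈ V the properties

$$A(x, y) = A(y, x) \quad \text{(symmetry)}, \tag{5}$$

$$A(x, x) = 1 \quad \text{(reflexivity)}, \tag{6}$$

$$A(x, N(y)) = -A(x, y) \quad \text{(inverse relationship)}, \tag{7}$$

is called an association measure on V.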
Consider the following properties of association measures:
A function A : X × X → [-1, 1] is called an association measure of type 1 on X if Equations (5) and (7) are fulfilled for all x, y ∈ X and Equation (6) is fulfilled for all x ∉ FP(N, X); it is called an association measure of type 2 on X if Equations (5) and (6) are fulfilled for all x, y ∈ X and Equation (7) is fulfilled for all x ∈ X and all y ∉ FP(N, X).
Note that Proposition 3 implies A(x_FP, x_FP) = 0. Although some papers require (6) to hold for all x ∈ X, association measures of type 2 will not be considered in this paper. The property (12) of association measures of type 1 seems more reasonable; see [10].
In [10], an association measure of type 1 on [0,1] related to a strong negation N was considered.
A strong negation is a reflection on [0,1] with a unique fixed point, denoted by c. In [10], the class of c-separable association measures of type 1 was considered; these satisfy for all x, y ∈ [0, 1] the properties:
Such association measures can be used to analyze associations between the truth or probability values of plausible statements P and Q. For example, the association between them is negative when one statement has a high plausibility value and the other has a low one.
Let X be the set of real values, X = ℝ.
From Proposition 3 we obtain the following properties of association measures on ℝ:
Similarly to the c-separable association measures on [0,1], we introduce the following definition.
0-separable association measures have a simple interpretation: x and y are positively associated if they have the same sign and negatively associated if they have opposite signs. Based on these considerations, the following simplest association measure on the set of real values can be proposed.
is the 0-separable association measure of type 1 on the set of real values.
The proof is straightforward.
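One function that matches this interpretation is the product of signs, A(x, y) = sign(x)·sign(y), with the reflection N(x) = −x. Whether this is exactly the formula of the proposition is an assumption; the sketch below only checks on a few sample values that such a function behaves as a 0-separable association measure of type 1. The names `sign` and `assoc_sign` are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def assoc_sign(x, y):
    """Assumed candidate measure: product of the signs of x and y."""
    return sign(x) * sign(y)

samples = [-2.5, -1.0, 0.0, 0.5, 3.0]

# Symmetry, Equation (5).
symmetric = all(assoc_sign(x, y) == assoc_sign(y, x)
                for x in samples for y in samples)

# Inverse relationship, Equation (7), with N(y) = -y.
inverse = all(assoc_sign(x, -y) == -assoc_sign(x, y)
              for x in samples for y in samples)

# Reflexivity, Equation (6), required only outside the fixed point 0.
reflexive = all(assoc_sign(x, x) == 1 for x in samples if x != 0)

# Value at the fixed point of N.
at_fixed_point = assoc_sign(0.0, 0.0)
```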
Association measures on the set of time series are considered in [6, 7]. A time series of length n (n > 1) is a sequence (n-tuple) of real values x = (x1, …, xn). Consider the reflection operation N(x) = −x = (−x1, …, −xn) on the set X of all time series of length n. Suppose p, q are real values and p ≠ 0. Define x + y = (x1 + y1, …, xn + yn) and py + q = (py1 + q, …, pyn + q). Denote by q(n) the constant time series of length n with all elements equal to q. The n-tuple x_FP = 0(n) is the unique fixed point of N. We write x = const if x = q(n) for some q, and x ≠ const if xi ≠ xj for some i ≠ j from {1, …, n}. Denote by X_C the set of all constant time series from X.
is called a shape association measure on V. If x ∈ V implies px ∈ V for all p > 0 and A satisfies on V the property:
then A is called a scale invariant association measure.
The next section considers the basic properties of some operations of fuzzy logic that will be used later in the construction of association measures.
Consider the basic properties of the operations of fuzzy logic used in the following sections [2–5, 18].
From the definition of t-conorm it follows for all a ∈ [0, 1]:
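It is clear that a t-conorm S has no nilpotent elements if and only if for all x, y ∈ [0, 1] the following holds:
Consider the basic t-conorms, namely the maximum, the probabilistic sum and the Łukasiewicz t-conorm:

$$S_M(x, y) = \max(x, y), \qquad S_P(x, y) = x + y - xy, \qquad S_L(x, y) = \min(1, x + y).$$

The maximum and the probabilistic sum have no nilpotent elements, but the Łukasiewicz t-conorm does.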
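The three basic t-conorms can be written down and probed on a grid. The sketch below uses their standard textbook forms; the reading of "no nilpotent elements" as "S(x, y) < 1 whenever x < 1 and y < 1" is an assumption of this illustration, and the helper name `reaches_one_below_one` is hypothetical.

```python
def s_max(x, y):
    """Maximum t-conorm S_M."""
    return max(x, y)

def s_prob(x, y):
    """Probabilistic sum S_P."""
    return x + y - x * y

def s_luk(x, y):
    """Lukasiewicz t-conorm S_L."""
    return min(1.0, x + y)

grid = [i / 10 for i in range(11)]

def boundary_ok(S, tol=1e-12):
    """Check S(a, 0) = a and S(a, 1) = 1 on the grid."""
    return all(abs(S(a, 0.0) - a) < tol and abs(S(a, 1.0) - 1.0) < tol
               for a in grid)

def reaches_one_below_one(S):
    """True if S(x, y) reaches 1 for some grid points x < 1, y < 1."""
    return any(S(x, y) >= 1.0 for x in grid for y in grid
               if x < 1 and y < 1)

nilpotent_max = reaches_one_below_one(s_max)
nilpotent_prob = reaches_one_below_one(s_prob)
nilpotent_luk = reaches_one_below_one(s_luk)   # e.g. S_L(0.5, 0.5) = 1
```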
(i) The is defined for all a, b ∈ [0, 1] as follows:
(ii) The pseudo-difference associated to S is defined for all a, b ∈ [0, 1] as follows:
Pseudo-difference associated to S has the following properties (see [10, 15] for details):
1) , if b = 0 or
2) For any a, b ∈ [0, 1] it is fulfilled:
3) If t-conorm S is continuous at the point 0 in both arguments then the following is fulfilled for all a, b ∈ [0, 1]:
The following pseudo-differences are associated to the basic t-conorms S_M, S_P and S_L:
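As a hedged numerical illustration, the sketch below assumes the usual definition of the difference induced by a t-conorm, inf{c ∈ [0, 1] : S(b, c) ≥ a}, and of the pseudo-difference as this value for a > b, its negative with swapped arguments for a < b, and 0 for a = b. Under that assumption the closed forms `pd_max`, `pd_prob` and `pd_luk` are derived here and compared with a grid approximation; all function names are hypothetical and the formulas are not quoted from [10, 15].

```python
def s_max(x, y):
    return max(x, y)

def s_prob(x, y):
    return x + y - x * y

def s_luk(x, y):
    return min(1.0, x + y)

def s_difference(S, a, b, steps=10000):
    """Grid approximation of inf{c in [0,1] : S(b, c) >= a} (assumed form)."""
    for i in range(steps + 1):
        c = i / steps
        if S(b, c) >= a - 1e-12:
            return c
    return 1.0

def pseudo_difference(S, a, b, steps=10000):
    """Signed difference built from s_difference (assumed form)."""
    if a > b:
        return s_difference(S, a, b, steps)
    if a < b:
        return -s_difference(S, b, a, steps)
    return 0.0

def pd_max(a, b):
    """Closed form derived for S_M under the assumption above."""
    if a > b:
        return a
    if a < b:
        return -b
    return 0.0

def pd_prob(a, b):
    """Closed form derived for S_P under the assumption above."""
    if a == b:
        return 0.0
    return (a - b) / (1.0 - min(a, b))

def pd_luk(a, b):
    """Closed form derived for S_L under the assumption above."""
    return a - b

pairs = [(0.7, 0.2), (0.2, 0.7), (0.5, 0.5), (1.0, 0.3), (0.3, 0.0)]
cases = [(s_max, pd_max), (s_prob, pd_prob), (s_luk, pd_luk)]

# Largest gap between the grid approximation and the closed forms.
max_gap = max(abs(pseudo_difference(S, a, b) - pd(a, b))
              for S, pd in cases for a, b in pairs)
```

The gap is bounded by the grid step, so the closed forms and the assumed definition agree on these sample pairs up to that resolution.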
The function φ in Equation (41) is called a generator of N.
has the generator φ (x) = x and the fixed point c = 0.5.
has the generator φ(x) = x^p and the fixed point c = 0.5^{1/p}.
It has the generator:
This simple strong negation connects the fixed point (c, c) with the points (0, 1) and (1, 0) by line segments. It can be used to construct strong negations with any fixed point c ∈ ]0, 1[.
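The three kinds of strong negation mentioned above can be sketched as follows. The sketch assumes that Equation (41) has the standard generator form N(x) = φ⁻¹(1 − φ(x)), and it derives the piecewise linear negation from the geometric description (line segments through (0, 1), (c, c) and (1, 0)); the function names and the values p = 2 and c = 0.3 are illustrative choices.

```python
def negation_from_generator(phi, phi_inv):
    """Strong negation N(x) = phi_inv(1 - phi(x)) (assumed form of (41))."""
    return lambda x: phi_inv(1.0 - phi(x))

def piecewise_linear_negation(c):
    """Negation whose graph joins (0, 1), (c, c) and (1, 0) by segments."""
    def N(x):
        if x <= c:
            return 1.0 - x * (1.0 - c) / c
        return c * (1.0 - x) / (1.0 - c)
    return N

def fixed_point(N, tol=1e-12):
    """Bisection for N(c) = c, using that N decreases from 1 to 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if N(mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 2.0
standard = negation_from_generator(lambda x: x, lambda u: u)
power = negation_from_generator(lambda x: x ** p, lambda u: u ** (1.0 / p))
linear = piecewise_linear_negation(0.3)

grid = [i / 20 for i in range(21)]

def is_involutive(N, tol=1e-9):
    """Check N(N(x)) = x on the grid, up to rounding."""
    return all(abs(N(N(x)) - x) < tol for x in grid)

c_standard = fixed_point(standard)   # about 0.5
c_power = fixed_point(power)         # about 0.5 ** (1 / p)
c_linear = fixed_point(linear)       # about 0.3
```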
Consider the methods of constructing association measures from a similarity measure and the pseudo-difference operation associated with a t-conorm, and prove the related results [8, 9].
For a strict reflexive similarity measure SIM on X with reflection N, the property of weak similarity of reflections is fulfilled:
if and only if it satisfies the permutation of reflections property:
is an association measure on V.
Let us prove Equation (5). For y = x, Equation (5) is fulfilled trivially.
For y = N(x), from the involutivity of N we have x = N(y), and from Equation (53) we obtain: A_{SIM,S}(x, y) = A_{SIM,S}(x, N(x)) = −1 and A_{SIM,S}(y, x) = A_{SIM,S}(y, N(y)) = −1, hence A_{SIM,S}(x, y) = A_{SIM,S}(y, x).
Suppose y ≠ x and y ≠ N(x); then from the involutivity of N it follows that N(y) ≠ x. From Equation (54) and the symmetry and permutation of reflections properties of SIM we obtain:
Let us prove Equation (7). For y = x, Equation (7) follows from Equations (53) and (52): A_{SIM,S}(x, N(y)) = A_{SIM,S}(x, N(x)) = −1 = −A_{SIM,S}(x, x) = −A_{SIM,S}(x, y).
For y = N(x), Equation (7) follows from the involutivity of N and Equations (52) and (53): A_{SIM,S}(x, N(y)) = A_{SIM,S}(x, N(N(x))) = A_{SIM,S}(x, x) = 1 = −A_{SIM,S}(x, N(x)) = −A_{SIM,S}(x, y).
If y ≠ x and y ≠ N(x), then Equation (7) follows from Equation (54), the involutivity of N and Equation (36):
is an association measure on V if one of the following is fulfilled:
The reflexivity of A_{SIM,S} follows from the reflexivity and weak similarity of reflections properties of SIM and from (35), which requires the fulfilment of (56) or (57):
The inverse relationship property of A_{SIM,S} follows from the involutivity of N and Equation (55):
Examples of association measures on different domains constructed by the methods discussed in the previous section can be found in [7–10]. Similarity measures satisfying the conditions of Theorems 2 and 3 can be obtained from distance measures used together with some data transformation [7], from generators of strong negations [3, 10], etc. For example, suppose φ, ψ : [0, 1] → [0, 1] are automorphisms of [0,1] and φ defines by (41) a strong negation N on [0,1]. Then the function
is a similarity measure on [0,1] that can be used to construct association measures on [0,1] related to the strong negations (42)-(44) (see [10] for details). Below is an example of the simplest association measure on [0,1] related to the standard negation (42) [10]:
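As a hedged illustration of the general construction (not necessarily the measure given in [10]), the sketch below assumes the standard negation N(x) = 1 − x, the similarity SIM(x, y) = 1 − |x − y|, the pseudo-difference associated with the maximum t-conorm in the form "a if a > b, −b if a < b, 0 otherwise", and the construction A(x, y) = SIM(x, y) ⊖ SIM(x, N(y)). All of these choices and the names `sim`, `pd_max` and `assoc` are assumptions of the example.

```python
def N(x):
    """Standard negation on [0, 1]."""
    return 1.0 - x

def sim(x, y):
    """Assumed similarity measure on [0, 1]."""
    return 1.0 - abs(x - y)

def pd_max(a, b):
    """Assumed pseudo-difference associated with the maximum t-conorm."""
    if a > b:
        return a
    if a < b:
        return -b
    return 0.0

def assoc(x, y):
    """Assumed construction: SIM(x, y) pseudo-minus SIM(x, N(y))."""
    return pd_max(sim(x, y), sim(x, N(y)))

# Dyadic grid, so that all floating-point operations below are exact.
grid = [i / 8 for i in range(9)]

symmetric = all(assoc(x, y) == assoc(y, x) for x in grid for y in grid)
inverse = all(assoc(x, N(y)) == -assoc(x, y) for x in grid for y in grid)
reflexive = all(assoc(x, x) == 1.0 for x in grid if x != 0.5)
at_fixed_point = assoc(0.5, 0.5)

same_side = assoc(0.875, 0.75)       # both above c = 0.5
opposite_side = assoc(0.875, 0.125)  # here y = N(x)
```

On this grid the resulting function is symmetric, satisfies the inverse relationship, equals 1 on the diagonal except at the fixed point c = 0.5, and is positive exactly when x and y lie on the same side of c, which is the behaviour expected of a c-separable association measure of type 1.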
The paper gives definitions of association measures generalizing Pearson’s correlation coefficient and proposes general methods of construction of such measures. Proofs of the main results are provided. A simple association measure on the set of real numbers is introduced. The considered methods of generating association measures can be used to construct association measures on different domains.
Acknowledgments
This work was partially supported by the project 20151589 of Instituto Politécnico Nacional, Mexico.
