Sage Journals: Discover world-class research

Abstract

This article presents a comparison of multivariate normal mean vectors under covariance positive definite matrices. We introduce an improved parametric bootstrap (IPB) approach for addressing the multivariate Behrens-Fisher problem, specifically focusing on cases with unequal covariance matrices. Additionally, we evaluate the performance of the IPB test by comparing it with three existing tests: the parametric bootstrap (PB) test, the generalized variable (GV) test, and the Johansen test. Through Monte Carlo simulation, our results demonstrate that both the IPB test and the PB test exhibit superior control over Type I error rates compared to the GV and Johansen tests. Notably, the IPB test outperforms the PB test in terms of controlling Type I error rates. Consequently, our study concludes that the IPB test represents a preferred statistical method for testing the equality of mean vectors in the multivariate Behrens-Fisher problem.

Keywords

Parametric bootstrap Type I error rates Behrens-Fisher problem unequal covariance

1. Introduction

Multivariate analysis of variance (MANOVA) is used for comparing the mean vectors with multivariate normal populations. The observed vectors are $Y_{ij}$ where $i=1,2,\ldots,k$ denotes the sample and $j=1,2,\ldots,n_{i}$ denotes the observation within the sample. Each observation vector $Y_{ij}$ is a $p$ -multiple multivariate normal vectors with a mean vector $\mu_{i}$ and equal common covariance matrix $\sum$ . The model for MANOVA can be written as

$\displaystyle Y_{ij}=\mu_{i}+\varepsilon_{ij},\quad(i=1,2,\ldots,k;j=1,2,% \ldots,n_{i}),$ (1)

where $\mu_{i}$ is the parameter mean vector and $\varepsilon_{ij}$ represents error terms that are independently drawn from a multivariate normal distribution with mean vector 0 and equal common covariance matrix $\sum$ . Therefore, $E(Y_{ij})=\mu_{i}$ and $\textit{Var}(Y_{ij})=\sum$ . The hypotheses for testing the equality of the mean vectors in MANOVA are

$\displaystyle H_{0}:\mu_{1}=\mu_{2}=\ldots=\mu_{k}\text{ versus }H_{1}:\mu_{i}% \neq\mu_{j}\text{ for some }i\neq j.$ (2)

In general, a test statistic for testing the equality of the mean vectors in Eq. (2) is the Hotelling $T^{2}$ test (Zhang & Xu, 2009), which is a test statistic based on assumptions. Most tests available in the literature for testing the equality of the mean vectors for the hypotheses in Eq. (2) based on equal common covariance matrices $\sum$ have been presented by many researchers, such as Lawley (1938), Hotelling (1951), Wilks’ (1932) and Bartlett (1939).

Sometime, the assumption of the equal common covariance matrix in MANOVA is violated, which is the Behrens-Fisher problem (unequal common covariance matric $\sum_{i}$ ). With an unequal common covariance matric $\sum_{i}$ , the aforementioned statistics show that it cannot control the Type I error rate, which is reflected in the performance of the test statistics. Therefore, many researchers have presented test statistics for testing the equality of the mean vectors for the hypotheses in Eq. (1) based on an unequal common covariance matrix.

A natural test statistic for testing the equality of mean vectors in Eq. (2) is given by

$\displaystyle T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,\tilde{S}% _{k}})=\sum\limits_{i=1}^{k}{({\bar{Y}_{i}-\hat{\mu}_{0}^{*}})}^{\prime}W_{i}(% {\bar{Y}_{i}-\hat{\mu}_{0}^{*}})=\sum\limits_{i=1}^{k}{{\bar{Y}}^{\prime}_{i}W% \bar{Y}_{i}}-\left({\sum\limits_{i=1}^{k}{W_{i}\bar{Y}_{i}}}\right)^{\prime}W^% {-1}\left({\sum\limits_{i=1}^{k}{W_{i}\bar{Y}_{i}}}\right)$ (3)

Where $\bar{Y}_{i}=\sum\limits_{j=1}^{n_{i}}{Y_{ij}}$ , $W=\sum\limits_{i=1}^{k}{W_{i}}$ and $W_{i}=\tilde{S}^{-1}$ with $\tilde{S}_{i}=\frac{1}{n_{i}}S_{i}$ and $S_{i}=\frac{1}{n_{i}-1}\sum\limits_{j=1}^{n_{i}}{({Y_{ij}-\bar{Y}_{i}})}({Y_{% ij}-\bar{Y}_{i}})^{\prime}$ . The null hypothesis $H_{0}$ in 2 is rejected when $T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,\tilde{S}_{k}})>\chi_{p% (k-1)}^{2}$ .

Johansen (1980) developed the test statistic in Eq. (3) for testing the equality of the mean vectors for the hypotheses in Eq. (2), based on an unequal common covariance matrix which is

$\displaystyle J_{OH}=\frac{T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},% \ldots,\tilde{S}_{k}})}{c}$ (4)

where $c$ is

$\displaystyle c=p(k-1)+2A-\frac{6A}{p(k-1)+2}$ (5)

and

$\displaystyle A=\frac{\sum\limits_{i=1}^{k}{tr({I-W^{-1}W_{i}})^{2}+[{tr({I-W^% {-1}W_{i}})}]}^{2}}{2({n_{i}-1})}$ (6)

The Johansen test is distributed as an $F$ -distribution with degrees of freedom $f_{1}=p(k-1)$ and $f_{2}={p(k-1)[{p(k-1)+2}]}/{(3A)}$ . Therefore, Johansen’s test rejects the null hypothesis $H_{0}$ in Eq. (2) when $J_{OH}>F_{f_{1},f_{2},1-\alpha}$ , where $F_{m,r;q}$ denotes the $q$ th quantile of an $F$ distribution with degrees of freedom $m$ and $r$ .

Gamage et al. (2004) used the concept of a generalized $p$ -value supported by Tsui and Weerahandi and Tsui (1989) for developing the generalized variable (GV) test. The GV test is described as follows. Let $(\tilde{y}_{i},\tilde{s}_{i})$ be an observed value of $({\bar{Y}_{i},\tilde{S}_{i}})$ , and let

$\displaystyle R_{i}^{*}=[\tilde{s}_{i}^{{-1}/2}{{\tilde{\Sigma}}}_{i}\tilde{s}% _{i}^{{-1}/2}]^{{-1}/2}[\tilde{s}_{i}^{{-1}/2}\tilde{S}_{i}\tilde{s}_{i}^{{-1}% /2}][\tilde{s}_{i}^{{-1}/2}{{\tilde{\Sigma}}}_{i}\tilde{s}_{i}^{{-1}/2}]^{{-1}% /2}\quad,i=1,\ldots,k$ (7)

where $\tilde{s}_{i}$ and $R_{i}^{*}$ are independent, and $\tilde{s}_{i}^{{-1}/2}\tilde{S}_{i}\tilde{s}_{i}^{{-1}/2}\sim W_{p}({n_{i}-1,% \tilde{s}_{i}^{{-1}/2}\tilde{\Sigma}_{i}\tilde{s}_{i}^{{-1}/2}})$ , $R_{i}^{*}\sim W_{p}(n_{i}-1,1/(n_{i}-1)\linebreak I_{p}),i=1,\ldots,k$ .

The GV test is presented as

$\displaystyle G=\frac{T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,% \tilde{S}_{k}})}{T({{{\tilde{y}}}_{1},\ldots,{{\tilde{y}}}_{k};\tilde{s}_{1}^{% {-1}/2}R_{1}^{*-1}\tilde{s}_{1}^{{-1}/2},\ldots,\tilde{s}_{k}^{{-1}/2}R_{k}^{*% -1}\tilde{s}_{k}^{{-1}/2}})}$ (8)

since $T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,\tilde{S}_{k}})$ is distributed as chi-squared with degree of freedom $p(k-1)$ . The generalized $p$ -value is given by

$\displaystyle P_{\chi_{p(k-1)}^{2},R_{1}^{*},\ldots,R_{k}^{*}}\left({\frac{T({% \bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,\tilde{S}_{k}})}{T({{{% \tilde{y}}}_{1},\ldots,{{\tilde{y}}}_{k};\tilde{s}_{1}^{{-1}/2}R_{1}^{*-1}% \tilde{s}_{1}^{{-1}/2},\ldots,\tilde{s}_{k}^{{-1}/2}R_{k}^{*-1}\tilde{s}_{k}^{% {-1}/2}})}>1}\right)$ (9)

where the GV test rejects the null hypothesis in Eq. (2) when the GV test in Eq. (9) is less than a given nominal level $\alpha$ .

Krishnamoorthy and Yu (2010) proposed the parametric bootstrap (PB) test by improving the sample mean $\bar{Y}_{i}$ and sample variance $\tilde{S}_{i}$ by using the bootstrap approach in which the PB pivotal quantity can be obtained as follows. Let $\bar{Y}_{Bi}\sim N_{p}({{{0}},\tilde{s}_{i}})$ and $\tilde{S}_{i}\sim W_{p}({n_{i}-1,(1/(n_{i}-1))\tilde{s}_{i}})$ . The PB pivotal quantity can be written as

$\displaystyle T({\bar{Y}_{B1},\ldots,\bar{Y}_{Bk};\tilde{S}_{B1},\ldots,\tilde% {S}_{Bk}})=\sum\limits_{i=1}^{k}{({\bar{Y}_{Bi}-\hat{\mu}_{B}^{*}})}^{\prime}% \tilde{S}^{-1}_{Bi}({\bar{Y}_{Bi}-\hat{\mu}_{B}^{*}})$ (10)

where $\hat{\mu}_{B}^{*}=\left({\sum\limits_{i=1}^{k}{\tilde{S}_{Bi}^{-1}}}\right)^{-% 1}\sum\limits_{i=1}^{k}{\tilde{S}_{Bi}^{-1}\bar{Y}_{Bi}}$ or equivalently,

$\displaystyle T({\bar{Y}_{B1},\ldots,\bar{Y}_{Bk};\tilde{S}_{B1},\ldots,\tilde% {S}_{Bk}})=\sum\limits_{i=1}^{k}{{\bar{Y}}^{\prime}_{Bi}\tilde{S}^{-1}_{Bi}% \bar{Y}_{Bi}-}\left({\sum\limits_{i=1}^{k}{{\bar{Y}}^{\prime}_{Bi}}\tilde{S}^{% -1}}\right)\left({\sum\limits_{i=1}^{k}{\tilde{S}^{-1}_{Bi}}}\right)^{-1}\left% ({\sum\limits_{i=1}^{k}{\tilde{S}^{-1}_{Bi}\bar{Y}_{Bi}}}\right)$ (11)

The PB $p$ -value is defined as

$\displaystyle P({T({\bar{Y}_{B1},\ldots,\bar{Y}_{Bk};\tilde{S}_{B1},\ldots,% \tilde{S}_{Bk}})>T_{0}}),$ (12)

where $T_{0}$ is the observed value in Eq. (3).

The PB pivotal quantity can be estimated using the Monte Carlo simulation as described below. Let $t_{i}$ be the Cholesky factor of $\tilde{s}_{i}$ , so that $\tilde{s}_{i}=t_{i}{t}^{\prime}_{i}$ , $i=1,\ldots,k$ . Then $\bar{Y}_{Bi}\sim t_{i}Z_{i}$ and $\tilde{S}^{-1}_{Bi}\sim{t_{i}V_{i}{t}^{\prime}_{i}}/{(n_{i}-1)}$ , where $Z_{i}$ and $V_{i}$ are independent with $Z_{i}\sim N_{p}({{{0}},I_{p}})$ and $V_{i}\sim W_{p}({n_{i}-1,I_{p}})$ . The PB test in Eq. (11) is distributed as

$\displaystyle T_{B}({{{Z}}_{i},{{V}}_{i}})=\sum\limits_{i=1}^{k}{f_{i}{{{Z}^{% \prime}}}_{i}{{V}}_{i}^{-1}{{Z}}_{i}-}\left({\sum\limits_{i=1}^{k}{f_{i}Z^{% \prime}_{i}{{V}}_{i}^{-1}t_{i}^{-1}}}\right)\left({\sum\limits_{i=1}^{k}{f_{i}% (t_{i}{{V}}_{i}{t}^{\prime}_{i})^{-1}}}\right)^{-1}\left({\sum\limits_{i=1}^{k% }{f_{i}t_{i}^{\prime-1}{{V}}_{i}^{-1}{{Z}}_{i}}}\right),$ (13)

where $f_{i}=n_{i}-1$ . Hence the $p$ -value of the PB test is

$\displaystyle P({T_{B}({{{Z}}_{i},{{V}}_{i}})>T_{0}}).$ (14)

The PB test rejects the null hypothesis in Eq. (2) when the $p$ -value of the PB test in Eq. (14) is less than the nominal level $\alpha$ . According to the results presented by Krishnamoorthy and Yu (2010) the PB test could control type I error rates better than the Johansen test, and the GV test performed very satisfactorily when the sample sizes are small.

The Bootstrap approach is a technical statistic for reducing the error of the test statistics for hypothesis testing, where it uses the process of the resampling technique. The Bootstrap approach is employed to reduce the error for estimating the test statistics supported by Jiang and Simon. Hence, this research uses the Bootstrap approach for developing the PB test for testing the mean vector $\mu_{i}$ with multivariate normal populations under unequal variance, where the Bootstrap approach is used to improve the $p$ -value of the PB test, and it is called the Improved Parametric Bootstrap (IPB) test.

The objective of this research is to compare the performance of the Johansen test, PB test, GV test and IPB test for testing the equality of the vector means in MANOVA when the unequal common covariance matrix is based on Type I error rates, which reflect the performance of the test statistic. The paper is organized as follows. The IPB test in MANOVA is described in Section 2. Section 3 show the results of simulation of the performance of the tests based on Type I error rates.and Section 4 presents illustrative examples. Finally, Section 5 contains conclusions.

2. Methodology

We proposed IPB test for comparing mean vectors under a Behrens–Fisher problem as follows:

Step 1: Compute the test statistic in Eq. (3) as

$\displaystyle T({\bar{Y}_{1},\ldots,\bar{Y}_{k};\tilde{S}_{1},\ldots,\tilde{S}% _{k}})=\sum\limits_{i=1}^{k}{{\bar{Y}}^{\prime}_{i}W\bar{Y}_{i}}-\left({\sum% \limits_{i=1}^{k}{W_{i}\bar{Y}_{i}}}\right)^{\prime}W^{-1}\left({\sum\limits_{% i=1}^{k}{W_{i}\bar{Y}_{i}}}\right).$ (15)

Step 2: Calculate PB tests in Eq. (13) with m values.

Step 3: From step 2, obtain the PB test of m values which is

$\displaystyle{{X}}=(PB_{1},\ldots,PB_{i-1},PB_{i},PB_{i+1},\ldots,PB_{m}).$ (16)

Step 4: From step 3, the sample of the Bootstrap approach is resampled in Eq. (16). The sample of the Bootstrap approach is

$\displaystyle{{X}}^{\ast}=(PB_{1}^{\ast},\ldots,PB_{i-1}^{\ast},PB_{i}^{\ast},% PB_{i+1}^{\ast},\ldots,PB_{m}^{\ast}).$ (17)

Step 5: Compare the values in ${{X}}^{\ast}$ with the test statistic in step 1 when the values in ${{X}}^{\ast}$ are more than the test statistic in step 1, set $Q_{j}=1$ .

Step 6: From step 5, we obtain the $p$ -value as

$\displaystyle\textit{p-value}_{i}=\frac{\sum\limits_{j=1}^{m}{Q_{j}}}{m}.$ (18)

Step 7: Repeat the steps 4, 5 and 6 n times. We obtain the $p$ -value of IPB is

$\displaystyle\textit{p-value}_{\textit{IPB}}=\frac{\sum\limits_{i=1}^{n}{% \textit{p-value}_{i}}}{n}$ (19)

where IPB test rejects the null hypothesis in Eq. (2) when the $p$ -value in Eq. (19) is less than the nominal level $\alpha$ .

3. Results

For simulation studies, we used Monte Carlo simulation with the R statistical package. Monte Carlo simulation is presented for calculating Type I error rates of the Johansen test, GV test, PB test and IPB test. We denote, without loss of generality, that ${{\mu}}=(\mu_{1},\mu_{2},\ldots,\mu_{k})$ to be a vector of zeroes to examine Type I error rates of the Johansen test, GV test, PB test and IPB test. For comparing the population mean vectors, we can assume $\sum_{1}=I_{p}$ , $\sum_{2}=\text{diag}(\lambda_{1},\ldots,\lambda_{p})$ and other matrices are arbitrarily positive definite. For $k=$ 3 and 5 under various values of ${{n}}_{i}$ in our simulation studies, we take the covariance matrices to range from $-1<\rho<1$ .

For Type I error rates of the Johansen test, GV test, PB test and IPB test, the sample mean ${{\bar{y}}}_{i}$ and the sample covariance matrix ${{\tilde{s}}}_{i}$ based on the $i$ th sample are generated independently as ${{\bar{y}}}_{i}\sim N_{p}({{{0}},\frac{1}{n_{i}}{{\Sigma}}_{i}})$ and ${{\tilde{s}}}_{i}\sim W_{p}({n_{i}-1,\frac{1}{n_{i}-1}{{\tilde{\Sigma}}}_{i}})$ with ${{\tilde{\Sigma}}}_{i}=\frac{1}{n_{i}}{{\Sigma}}_{i}$ . We used 10,000 observed vectors $({{\bar{y}}}_{1},\ldots,\bar{y}_{k},\tilde{s}_{1},\ldots,\tilde{s}_{k})$ to compute the observed value in Eq. (3). To calculate the Type I error rates of the Johansen test, they are determined by the proportions of Johansen test that exceed the critical value. For estimating Type I error rates of the GV test, PB test and IPB test, we used 10,000 observed vectors $({{{\bar{y}}}_{1}{{,\ldots,\bar{y}}}_{k}{{,\tilde{s}}}_{1}{{,\ldots,\tilde{s}}% }_{k}})$ to compute the observed value in Eq. (3). We use 10,000 runs to estimate the $p$ -value of the GV test, PB test and IPB test. Finally, the Type I error rates are estimated by the proportion of the 10,000 $p$ -values that are less than the nominal level 0.05. The values of the Type I error rates are shown in Tables 1–3.

Table 1
Monte Carlo estimates of Type I error rates for comparing bivariate normal mean vectors

$k=$ 3, $p=$ 2, $\sum_{1}=I_{2}$ , $\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2})$ , $\sum_{3}=\left({{\begin{array}[]{*{20}c}1&{\rho_{3}}\\ {\rho_{3}}&1\\ \end{array}}}\right)$
$(n_{1},n_{2},n_{3})$	$(\lambda_{1},\lambda_{2},\rho_{3})$	PB	IPB	Johansen	GV
(7, 7, 7)	(1, 1, 0)	0.044	0.046	0.057	0.054
	(1, 0.9, 0.1)	0.048	0.047	0.054	0.052
	(1, 0.5, 0.2)	0.043	0.049	0.058	0.047
	(1, 0.1, 0.3)	0.047	0.045	0.068	0.061
	(0.2, 0.6, 0.5)	0.056	0.053	0.067	0.073
	(0.9, 0.9, 0.6)	0.049	0.049	0.055	0.056
	(0.7, 0.8, $-$ 0.2)	0.046	0.045	0.057	0.058
(7, 10, 20)	(1, 1, 0)	0.052	0.051	0.060	0.095
	(1, 0.9, 0.1)	0.054	0.051	0.059	0.072
	(1, 0.5, 0.2)	0.052	0.051	0.062	0.089
	(1, 0.1, 0.3)	0.054	0.049	0.070	0.086
	(0.2, 0.6, 0.5)	0.053	0.051	0.067	0.078
	(0.9, 0.9, 0.6)	0.054	0.052	0.061	0.096
	(0.7, 0.8, $-$ 0.2)	0.053	0.051	0.064	0.079
(10, 10, 40)	(1, 1, 0)	0.058	0.051	0.055	0.100
	(1, 0.9, 0.1)	0.049	0.048	0.055	0.090
	(1, 0.5, 0.2)	0.043	0.042	0.054	0.093
	(1, 0.1, 0.3)	0.052	0.049	0.055	0.096
	(0.2, 0.6, 0.5)	0.052	0.049	0.054	0.110
	(0.9, 0.9, 0.6)	0.057	0.051	0.055	0.111
	(0.7, 0.8, $-$ 0.2)	0.045	0.043	0.055	0.099
(25, 20, 20)	(1, 1, 0)	0.049	0.051	0.049	0.043
	(1, 0.9, 0.1)	0.050	0.048	0.050	0.059
	(1, 0.5, 0.2)	0.049	0.046	0.051	0.048
	(1, 0.1, 0.3)	0.052	0.049	0.049	0.054
	(0.2, 0.6, 0.5)	0.053	0.049	0.052	0.054
	(0.9, 0.9, 0.6)	0.050	0.048	0.053	0.059
	(0.7, 0.8, $-$ 0.2)	0.050	0.048	0.050	0.058
$k=$ 5, $p=$ 2, $\sum_{1}=I_{2}$ , $\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2})$ , $\sum_{i}=\left({{\begin{array}[]{*{20}c}1&{\rho_{i}}\\ {\rho_{i}}&1\\ \end{array}}}\right)$ , $i=3,4,5$
$(n_{1},\ldots,n_{5})$	$(\lambda_{1},\lambda_{2},\rho_{3},\rho_{4},\rho_{5})$	PB	IPB	Johansen	GV
(7, 7, 7, 7, 7)	(1, 1, 1, 0.5, 0.5, 0.5)	0.050	0.049	0.071	0.104
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.050	0.048	0.072	0.114
	(0.1, 0.4, 0.7, 0, 0, 0)	0.048	0.048	0.072	0.113
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.048	0.048	0.074	0.123
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.051	0.050	0.076	0.124
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.053	0.051	0.072	0.118
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.052	0.050	0.077	0.133
(12, 12, 12, 12, 12)	(1, 1, 1, 0.5, 0.5, 0.5)	0.050	0.045	0.055	0.075
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.053	0.046	0.056	0.078
	(0.1, 0.4, 0.7, 0, 0, 0)	0.052	0.045	0.056	0.085
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.051	0.046	0.056	0.083
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.050	0.047	0.057	0.086
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.050	0.046	0.056	0.082
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.048	0.048	0.057	0.084
(20, 20, 20, 20, 20)	(1, 1, 1, 0.5, 0.5, 0.5)	0.054	0.049	0.053	0.054
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.051	0.045	0.051	0.057
	(0.1, 0.4, 0.7, 0, 0, 0)	0.052	0.046	0.052	0.065
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.047	0.047	0.052	0.061
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.051	0.049	0.052	0.067
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.053	0.050	0.052	0.054
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.048	0.047	0.053	0.065

In Table 1, for $k=$ 3 and $p=$ 2, in the case of the balanced sample sizes, the four tests control Type I error rates quite well. Meanwhile, in the case of the unbalanced sample sizes ${n}=({7,10,20})$ , the PB test and IBP test could control Type I error rates more satisfactorily than the Johansen test and GV test. Again, observe that the GV test is extremely poor in the case of unbalanced sample sizes ${{n}}=({10,10,40})$ , but the PB test, IPB test and Johansen test are quite good in this case. For $k=$ 5 and $p=$ 2, in the case of the balanced sample sizes, the GV test is poor at controlling Type I error rates, however the PB test and IBP test are very good in the case of balanced sample sizes. In unbalanced sample sizes ${{n}}=({15,20,10,32,7})$ , the PB test and IPB test could control Type I error rates more than the Johansen test and GV test. Considering the PB test and IPB test it was found that the Type I error rates of the IPB test do not exceed the nominal level 0.05 better than for the PB test in case $k=$ 5 and $p=$ 2.

Table 2

Monte Carlo estimates of Type I error rates for comparing trivariate normal mean vectors

$k=$ 3, $p=$ 3, $\sum_{1}=I_{3}$ , $\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2},\lambda_{3})$ , $\sum_{3}=\left({{\begin{array}[]{*{20}c}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\\ \end{array}}}\right)$
$(n_{1},n_{2},n_{3})$	$(\lambda_{1},\lambda_{2},\lambda_{3},\rho)$	PB	IPB	Johansen	GV
(7, 7, 7)	(1, 1, 1, 0)	0.039	0.041	0.079	0.077
	(1, 1, 0.1, 0.1)	0.043	0.042	0.090	0.081
	(1, 0.1, 0.1, 0.5)	0.049	0.047	0.105	0.096
	(0.2, 0.6, 0.9, $-$ 0.3)	0.048	0.048	0.089	0.086
	(0.6, 0.6, 0.6, 0)	0.045	0.046	0.086	0.076
	(0.3, 0.9, 0.1, $-$ 0.1)	0.045	0.048	0.102	0.081
	(0.8, 0.5, 0.5, 0.1)	0.047	0.046	0.087	0.079
(7, 10, 20)	(1, 1, 1, 0)	0.054	0.050	0.080	0.101
	(1, 1, 0.1, 0.1)	0.058	0.051	0.091	0.114
	(1, 0.1, 0.1, 0.5)	0.064	0.058	0.111	0.117
	(0.2, 0.6, 0.9, $-$ 0.3)	0.061	0.053	0.094	0.108
	(0.6, 0.6, 0.6, 0)	0.052	0.045	0.086	0.109
	(0.3, 0.9, 0.1, $-$ 0.1)	0.053	0.048	0.098	0.111
	(0.8, 0.5, 0.5, 0.1)	0.052	0.049	0.087	0.108
(10, 10, 10)	(1, 1, 1, 0)	0.045	0.041	0.057	0.063
	(1, 1, 0.1, 0.1)	0.058	0.049	0.062	0.069
	(1, 0.1, 0.1, 0.5)	0.055	0.047	0.064	0.069
	(0.2, 0.6, 0.9, $-$ 0.3)	0.057	0.052	0.065	0.070
	(0.6, 0.6, 0.6, 0)	0.056	0.053	0.068	0.072
	(0.3, 0.9, 0.1, $-$ 0.1)	0.053	0.046	0.064	0.073
	(0.8, 0.5, 0.5, 0.1)	0.052	0.049	0.060	0.072
(10, 10, 40)	(1, 1, 1, 0)	0.060	0.052	0.063	0.131
	(1, 1, 0.1, 0.1)	0.056	0.051	0.064	0.133
	(1, 0.1, 0.1, 0.5)	0.053	0.049	0.065	0.136
	(0.2, 0.6, 0.9, $-$ 0.3)	0.054	0.053	0.063	0.123
	(0.6, 0.6, 0.6, 0)	0.052	0.048	0.062	0.121
	(0.3, 0.9, 0.1, $-$ 0.1)	0.053	0.051	0.061	0.126
	(0.8, 0.5, 0.5, 0.1)	0.052	0.052	0.063	0.131
(25, 20, 20)	(1, 1, 1, 0)	0.056	0.052	0.050	0.057
	(1, 1, 0.1, 0.1)	0.052	0.049	0.053	0.054
	(1, 0.1, 0.1, 0.5)	0.047	0.047	0.055	0.063
	(0.2, 0.6, 0.9, $-$ 0.3)	0.054	0.053	0.051	0.057
	(0.6, 0.6, 0.6, 0)	0.052	0.052	0.052	0.045
	(0.3, 0.9, 0.1, $-$ 0.1)	0.059	0.055	0.052	0.053
	(0.8, 0.5, 0.5, 0.1)	0.048	0.042	0.051	0.042
$k=$ 5, $p=$ 3, $\sum_{1}=I_{3}$ , $\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2},\lambda_{3})$ , $\sum_{3}=\left({{\begin{array}[]{*{20}c}1&{\rho_{i}}&{\rho_{i}}\\ {\rho_{i}}&1&{\rho_{i}}\\ {\rho_{i}}&{\rho_{i}}&1\\ \end{array}}}\right)$ , $i=3,4,5$
$(n_{1},n_{2},n_{3},n_{4},n_{5})$	$(\lambda_{1},\lambda_{2},\lambda_{3},\rho_{3},\rho_{4},\rho_{5})$	PB	IPB	Johansen	GV
(7, 7, 7, 7, 7)	(1, 1, 1, 0.5, 0.5, 0.5)	0.047	0.045	0.112	0.208
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.048	0.044	0.172	0.211
	(0.1, 0.4, 0.7, 0, 0, 0)	0.050	0.046	0.132	0.222
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.051	0.046	0.145	0.232
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.049	0.045	0.134	0.225
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.048	0.043	0.135	0.226
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.049	0.046	0.130	0.221
(12, 12, 12, 12, 12)	(1, 1, 1, 0.5, 0.5, 0.5)	0.047	0.045	0.073	0.133
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.051	0.050	0.083	0.143
	(0.1, 0.4, 0.7, 0, 0, 0)	0.049	0.046	0.074	0.124
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.050	0.048	0.081	0.134
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.051	0.049	0.082	0.121
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.050	0.050	0.067	0.125
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.050	0.050	0.073	0.141
(20, 20, 20, 20, 20)	(1, 1, 1, 0.5, 0.5, 0.5)	0.047	0.045	0.061	0.086
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.052	0.049	0.065	0.105

Table 2, continued
$k=$ 5, $p=$ 3, $\sum_{1}=I_{3}$ , $\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2},\lambda_{3})$ , $\sum_{3}=\left({{\begin{array}[]{*{20}c}1&{\rho_{i}}&{\rho_{i}}\\ {\rho_{i}}&1&{\rho_{i}}\\ {\rho_{i}}&{\rho_{i}}&1\\ \end{array}}}\right)$ , $i=3,4,5$
$(n_{1},n_{2},n_{3},n_{4},n_{5})$	$(\lambda_{1},\lambda_{2},\lambda_{3},\rho_{3},\rho_{4},\rho_{5})$	PB	IPB	Johansen	GV
(20, 20, 20, 20, 20)	(0.1, 0.4, 0.7, 0, 0, 0)	0.048	0.048	0.065	0.076
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.051	0.049	0.058	0.089
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.051	0.050	0.061	0.088
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.051	0.052	0.062	0.089
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.050	0.051	0.063	0.089
(15, 20, 10, 32, 7)	(1, 1, 1, 0.5, 0.5, 0.5)	0.049	0.045	0.092	0.158
	(0.1, 0.1, 0.1, 0.3, 0.3, 0.3)	0.051	0.049	0.126	0.234
	(0.1, 0.4, 0.7, 0, 0, 0)	0.050	0.051	0.116	0.156
	(0.1, 0.3, 0.9, 0.1, 0.4, 0.9)	0.050	0.050	0.086	0.165
	(0.1, 0.2, 0.3, $-$ 0.1, 0.1, 0.9)	0.052	0.050	0.101	0.145
	(0.4, 0.4, 0.5, $-$ 0.3, 0.4, 0.3)	0.051	0.051	0.111	0.153
	(0.9, 0.9, 0.9, $-$ 0.4, 0.6, 0.9)	0.050	0.051	0.076	0.152

In Table 2, the results indicate that the Type I error rates of the PB test and IPB test, in the case of balanced sample sizes and unbalanced sample sizes, control Type I error rates very well under $k=$ 5 and $p=$ 3, but some Type I error rates of the Johansen test and GV test exceed 0.1. However, the four tests control Type I error rates satisfactorily in the case of the unbalanced sample sizes ${{n}}=({25,20,20})$ , $k=$ 5 and $p=$ 3. As observed for balanced and unbalanced sample sizes, for $k=$ 5 and $p=$ 3, the results show that the PB test and IPB test control quite well, but the Type I error rates of the Johansen test and GV test exceed the nominal level 0.05. Among the four tests, both the PB test and the IPB test are the best test statistics in terms of Type I error rates, however the IPB test inhibited Type I error rates quite well.

Table 3

Monte Carlo estimates of power of the test for four tests

	${{\mu}}_{1}={{0}},(\lambda_{1},\lambda_{2},\lambda_{3},\rho)=(1,0.1,0.1,0.5),k% =3,p=3,\sum_{1}=I_{3}$
$\sum_{2}=\text{diag}(\lambda_{1},\lambda_{2},\lambda_{3})$ , $\sum_{3}=\left({{\begin{array}[]{*{20}c}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\\ \end{array}}}\right)$	${{\mu}}_{{2}}={{0}}$ , ${{\mu}}_{3}={{0}}$	${{\mu}}_{{2}}={{0}}$ , ${{\mu}}_{3}=0.2$	${{\mu}}_{{2}}={{0}}$ ,
${{\mu}}_{3}=0.5$	${{\mu}}_{{2}}={{0}}$ , ${{\mu}}_{3}=0.7$	${{\mu}}_{{2}}=0.5$ , ${{\mu}}_{3}=1$	${{\mu}}_{{2}}=1.5$ , ${{\mu}}_{3}=1$
${{n}}=({7,7,7})$
PB	0.049	0.252	0.355	0.524	0.742	1
IPB	0.047	0.284	0.405	0.611	0.802	1
Johansen	0.105	0.304	0.444	0.702	0.823	1
GV	0.096	0.301	0.424	0.700	0.820	1
${{n}}=({7,10,20})$
PB	0.058	0.342	0.355	0.644	0.901	1
IPB	0.051	0.374	0.405	0.721	0.912	1
Johansen	0.091	0.424	0.444	0.742	0.914	1
GV	0.114	0.421	0.424	0.743	0.917	1
${{n}}=({20,20,20})$
PB	0.049	0.352	0.412	0.727	0.812	1
IPB	0.047	0.374	0.423	0.742	0.825	1
Johansen	0.051	0.367	0.434	0.745	0.827	1
GV	0.091	0.364	0.435	0.747	0.824	1

In Table 3, we present the power of four tests for $k=$ 3. Upon examination, we find that the following tests as PB test, IPB test, PB test, Johansen test, and GV test demonstrate significant power in the case of sample sizes $n=$ (7, 7, 7), $n=$ (7, 10, 20) and $n=$ (20, 20, 20). Notably, the power of the IPB test in Table 3 surpasses that of the PB test, especially when sample sizes are equal or unequal under unequal common covariance matric.

4. Illustrative examples

We used a data set for explaining the four tests by Bryan and Jorge (2016). The data set consisted of fives samples of 30 skulls for each sample, where there are the early predynastic period (circa 4000 BC), the late predynastic period (circa 3300 BC), the 12th and 13th dynasties (circa 1850 BC), the ptolemaic period (circa 200 BC), and the Roman period (circa AD 150). For each time period, 30 skulls are measured with four variables, being maximal breadth, basibregmatic height, basialveolar length and nasal height. We conducted using a situation by Krishnamoorthy and Yu (2010), which is $(n_{1},n_{2},n_{3},n_{4})=(15,15,15,15)$ , the number of groups is $k=$ 4, and the number of variables is $p=$ 4. Hence, the hypothesis testing is

$\displaystyle H_{0}:\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}\text{ versus }H_{1}:\mu_{i% }\neq\mu_{j}\text{ for some }i\neq j.$

Where $\mu_{1}=\left({{\begin{array}[]{*{20}c}{\mu_{11}}\\ {\mu_{12}}\\ {\mu_{13}}\\ {\mu_{14}}\\ \end{array}}}\right)$ , $\mu_{2}=\left({{\begin{array}[]{*{20}c}{\mu_{21}}\\ {\mu_{22}}\\ {\mu_{23}}\\ {\mu_{24}}\\ \end{array}}}\right)$ , $\mu_{3}=\left({{\begin{array}[]{*{20}c}{\mu_{31}}\\ {\mu_{32}}\\ {\mu_{33}}\\ {\mu_{34}}\\ \end{array}}}\right)$ and $\mu_{4}=\left({{\begin{array}[]{*{20}c}{\mu_{41}}\\ {\mu_{42}}\\ {\mu_{43}}\\ {\mu_{44}}\\ \end{array}}}\right)$ . The Johansen’s test was calculated by Krishnamoorthy and Yu (2010) where the $p$ -value is computed as 0.0304. Therefore, we will calculate the GV test, PB test and IPB test as follows.

4.1 Generalized variable (GV) test

In the first step, we calculate statistics as ${{\tilde{y}}}_{i}$ and $\tilde{s}_{i}$ . After that the GV test in Eq. (9) has 10,000 values generated, and then the p value of the GV test is estimated by the proportion of these 10,000 generated values that are greater than 1. We have the $p$ value of the GV test as 0.001, which is the null hypothesis in (20). Hence, the Egyptian skulls have statistically significant changes over those four periods.

4.2 Parametric bootstrap (PB) test

The Cholesky factors $t_{i}$ are calculated where $\tilde{s}_{i}=t_{i}{t}^{\prime}_{i}(i=1,2,3,4)$ . The Cholesky factors $t_{i}$ are as follows.

$\displaystyle t_{1}=\left({{\begin{array}[]{*{20}c}{1.542}&0&0&0\\ {0.320}&{1.330}&0&0\\ {0.264}&{-0.374}&{1.465}&0\\ {0.486}&{0.004}&{-0.129}&{0.534}\\ \end{array}}}\right),t_{2}=\left({{\begin{array}[]{*{20}c}{1.433}&0&0&0\\ {-0.226}&{1.361}&0&0\\ {0.216}&{0.309}&{0.717}&0\\ {0.050}&{0.561}&{0.229}&{0.607}\\ \end{array}}}\right)$ $\displaystyle t_{3}=\left({{\begin{array}[]{*{20}c}{1.053}&0&0&0\\ {-0.167}&{1.305}&0&0\\ {-0.128}&{0.005}&{1.299}&0\\ {-0.048}&{0.159}&{0.185}&{0.794}\\ \end{array}}}\right),t_{4}=\left({{\begin{array}[]{*{20}c}{0.871}&0&0&0\\ {0.036}&{1.206}&0&0\\ {-0.062}&{0.428}&{1.314}&0\\ {0.173}&{0.441}&{0.219}&{0.678}\\ \end{array}}}\right)$

when the Cholesky factors $t_{i}(i=1,2,3,4)$ are calculated. After that the $p$ -value of the PB test is estimated with a simulation consisting of 10,000 runs, then the $p$ -value of the PB test is equal to 0.044. Therefore, we reject the null hypothesis in (20). Hence, the Egyptian skulls have statistically significant changes over those four periods.

4.3 Parametric bootstrap (PB) test

The PB test is estimated from a simulation consisting of 10,000 runs. We estimated the $p$ value with the bootstrap approach consisting of 10,000 times. We have the sample bootstrap approach as ${{X}}^{\ast}=(PB_{1}^{\ast},\ldots,PB_{i-1}^{\ast},PB_{i}^{\ast},\linebreak PB% _{i+1}^{\ast},\ldots,PB_{10000}^{\ast})$ . After that the values in ${{X}}^{\ast}$ are compared with the test statistic in Eq. (15), when the values in ${{X}}^{\ast}$ are more than the test statistic in Eq. (15), set $Q_{j}=1$ . The $p$ -value is $\textit{p-value}_{i}=\frac{\sum\limits_{j=1}^{10000}{Q_{j}}}{10000}$ . The $p$ -value repeated with 10,000 times as $\textit{p-value}_{\textit{IPB}}=\frac{\sum\limits_{i=1}^{10000}{\textit{p-% value}_{i}}}{10,000}$ , then PB $p$ -value is obtained as 0.041. Therefore, we reject the null hypothesis in Eq. (20). Hence, the Egyptian skulls have statistically significant changes over those four periods.

5. Conclusion

The Hotelling T² test for MANOVA is a test statistic based on equal common covariance matrices $\sum$ , and it could not control Type I error rates under the Behrens-Fisher problem (unequal common covariance matric $\sum_{i})$ . In this research, we propose the IBP test and compare it with three tests (PB test, GV test and Johansen’s test) based on Type I error rates. For $p=$ 2, the PB test and IPB test control Type I error rates very well, which is better than the GV test and Johansen’s test at the nominal level 0.05 under the Behrens-Fisher problem. When $p=$ 3, the PB test and IPB test control Type I error rates quite well when the nominal level is 0.05. For $k=$ 10, the PB test and IPB test control Type I error rates quite well at the nominal level 0.05 under the Behrens-Fisher problem. Moreover, power of the four tests appear that IPB test, Johansen test, and GV test are very well power of the test. From the results of the Type I error rates and power of the tests, we suggest that the IBF test be used as an alternative approach for comparing the mean vectors with multivariate normal populations under the Behrens-Fisher problem in MANOVA, since it can control Type I error rates and power of the test very well.

References

Bartlett

M.S.

(1939). A note on tests of significance in multivariate analysis. Mathematical Proceedings of the Cambridge Philosophical Society, 35, 180-185.

Bryan

F.J.

, & Jorge

A.N.A.

(2016). Multivariate Statistical Methods. Taylor & Francis 4th Edition, 4-5.

Gamage

Mathew

, & Weerahandi

(2004). Generalized p-values and generalized confidence regions for the multivariate Behrens-Fisher problem and MANOVA. Journal of Multivariate Analysis, 88, 177-189.

Hotelling

(1951). A generalized T test and measure of multivariate dispersion, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, 23-41.

Jiang

, & Simon

(2007). A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Statistics in Medicine, 26, 5320-5334.

Johansen

(1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika, 67, 85-92.

Krishnamoorthy

, & Yu

(2010). A parametric bootstrap solution to the MANOVA under heteroscedasticity. Journal of Statistical Computation and Simulation, 80, 873-887.

Lawley

D.N.

(1938). A generalization of Fisher’s z-test. Biometrika, 30, 180-187.

Pillai

K.C.S.

(1995). Some new test criteria in multivariate analysis. Annals of Mathematical Statistics, 26, 117-121.

10.

Tsui

, & Weerahandi

(1989). Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. Journal American Statistical Association, 84, 602-607.

11.

Willks

S.S.

(1932). Certain generalizations in the analysis of variance. Biometrika, 24, 471-494.

12.

Zhang

, & Xu

(2009). On the k-sample Behrens-Fisher problem for high-dimensional data. Science in China, Series A: Mathematics, 52, 1285-1304.