This paper proposes a new asymmetric V-shaped distribution for fitting continuous data. Several statistical properties of the new distribution are investigated, including the mean, the median, the variance, the survival function, and the hazard function. We also show how to generate the proposed asymmetric V-shaped distribution from two random variables that have uniform distributions. Three examples illustrate the advantages of the asymmetric V-shaped distribution on simulated and real-life data sets.
A data distribution allows us to represent a variable in a compact and simple form. Given a data set, a well-fitting distribution can yield a suitable model for classifying and predicting data. Consequently, data distribution modeling has been extensively studied and applied in numerous disciplines, including physics (Afify et al., 2021), economics (Basheer, 2019; Ramos et al., 2019; Vovan et al., 2019), and image processing (VoVan & NguyenTrang, 2018).
A distribution can be classified as unimodal, bimodal, or multimodal (Galtung, 1967). A unimodal distribution has only a single mode: the value of the density function rises to its maximum at the mode and then decreases. The normal distribution is the most well-known unimodal distribution. Other examples include the Student's t distribution, the Poisson distribution, and the Cauchy distribution. Because of their analytical advantages, most modeling methods are based on well-known unimodal distributions. Nevertheless, in several practical problems, a single mode does not accurately describe a data set (Braga et al., 2018; Cheng et al., 2017; Ong et al., 2016). Therefore, bimodal distributions need to be studied.
A bimodal distribution is a distribution with two different modes. The larger mode is called the major mode and the other is called the minor mode. The Beta distribution can be either unimodal or bimodal, depending on its shape parameters (Jeffreys, 1946; Mazucheli & Menezes, 2019). The Gaussian mixture distribution, which was first proposed by Karl Pearson, is another multimodal distribution (Heggeseth et al., 2018; Rahmani et al., 2020). This is a compound distribution composed of several Gaussian distributions. With two components, the Gaussian mixture distribution becomes a bimodal distribution. In addition to the Gaussian mixture distribution, several compound distributions proposed in the literature can be considered multimodal distributions (Ahmad et al., 2022; Chesneau et al., 2019; Moreau, 2021; Para et al., 2020; Punzo et al., 2018; Vasconcelos et al., 2020). Some less familiar bimodal distributions are the U-shaped and the V-shaped distributions. In a U-shaped distribution, the probability density function is greatest at the extremes of the data and decreases rapidly to a minimum in the middle of the distribution. Similarly, the probability density function of a V-shaped distribution is high at the two ends of the data and decreases towards the middle.
To our knowledge, the V-shaped distribution has rarely been studied in the literature and has only been presented for symmetric data (Mauris, 2009; Rinne, 2014). However, there is evidence of an asymmetric V-shaped distribution in some problems. For example, in the financial market, the probability that a trader takes his/her profit increases as the magnitude of the profit increases. Additionally, the positive side has a steeper slope than the negative side (An, 2016; Ben-David & Hirshleifer, 2012). This phenomenon produces an asymmetric V-shaped distribution for the obtained profit. As a result, this paper proposes a new asymmetric V-shaped distribution. Because “asymmetric” is a more general concept than “symmetric”, the proposed distribution is expected to have a higher level of generality. Thus, it can be a more flexible and suitable distribution than the existing symmetric V-shaped distributions. In addition, this distribution provides a new method for researchers in various fields to model a data set. Based on the proposed probability density function, we further present the cumulative distribution function and other statistical properties, such as the mean, median, variance, survival function, and hazard function.
The rest of the paper is presented as follows. Some related bimodal distributions are presented in Section 2. Section 3 presents the new asymmetric V-shaped probability density and cumulative distribution functions. Statistical parameters such as mean, median, variance, survival function, and hazard function, are also presented in this section. The numerical examples are presented in Section 4 and the conclusion is provided in Section 5.
Related works
Beta distribution
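Suppose that $X$ is a random variable on the interval $[0, 1]$ and has the Beta distribution, $X \sim \mathrm{Beta}(\alpha, \beta)$. The probability density function of $X$ is defined as Eq. (1).
$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1, \tag{1}$$
where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$ and $\alpha, \beta > 0$.
The standard Beta distribution can be extended to the interval $[a, b]$ using the generalized Beta distribution, where $a$ and $b$ stand for the minimum and the maximum values of $X$. The Beta distribution is also flexible. For example, $\mathrm{Beta}(\alpha, \beta)$ approaches a Bernoulli distribution as $\alpha, \beta \to 0$, reduces to the Uniform distribution when $\alpha = \beta = 1$, and is a bimodal U-shaped distribution when $\alpha, \beta < 1$. Nevertheless, no report has indicated parameters for which the Beta distribution approaches a V-shaped distribution.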
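As a quick numerical illustration of this flexibility, the following Python sketch evaluates the standard Beta density for two textbook parameter settings. The function name `beta_pdf`, the evaluation grid, and the specific shape values (0.5 and 1) are illustrative choices and are not taken from the paper.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of the standard Beta(alpha, beta) distribution on (0, 1)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


grid = [0.05, 0.5, 0.95]
# Both shape parameters below 1: density is high near the ends, low in the middle.
u_shape = [beta_pdf(x, 0.5, 0.5) for x in grid]
# Beta(1, 1) coincides with the Uniform(0, 1) density.
flat = [beta_pdf(x, 1.0, 1.0) for x in grid]

print("Beta(0.5, 0.5):", [round(v, 3) for v in u_shape])
print("Beta(1, 1):    ", [round(v, 3) for v in flat])
```

The U-shaped case has curved flanks, which is the qualitative difference from the straight-line flanks of a V-shaped density.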
Gaussian mixture distribution
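A Gaussian mixture distribution can represent a bimodal distribution using Eq. (2).
$$f(x) = \sum_{i=1}^{k} w_i \, \phi_i(x), \tag{2}$$
where $\phi_i(x)$ is the probability density function of the $i$-th normal distribution, $w_i$ is the $i$-th mixing parameter, and $\sum_{i=1}^{k} w_i = 1$. Suppose that $k = 2$, $w_1 = p$, and $w_2 = 1 - p$. We have the following Eq. (3), which is a simplified equation to represent a bimodal distribution.
$$f(x) = p \, \phi_1(x) + (1 - p) \, \phi_2(x). \tag{3}$$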
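The two-component case can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the weight, means, and standard deviations are hypothetical values, and `count_modes` is a simple grid-based helper introduced here only to show when the mixture is bimodal.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, w, mu1, sigma1, mu2, sigma2):
    """Two-component Gaussian mixture with mixing weights w and 1 - w."""
    return w * normal_pdf(x, mu1, sigma1) + (1 - w) * normal_pdf(x, mu2, sigma2)


def count_modes(pdf, lo, hi, n=2000):
    """Count strict local maxima of pdf on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [pdf(x) for x in xs]
    return sum(1 for i in range(1, n) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical parameters: well-separated components give two modes.
modes = count_modes(lambda x: mixture_pdf(x, 0.6, -2.0, 1.0, 2.0, 1.0), -6.0, 6.0)
print("number of modes:", modes)
```

With the larger weight on the first component, its peak is the major mode and the other peak is the minor mode.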
Arcsine distribution
A random variable $X$ on the interval $(a, b)$ has the arcsine distribution, or U-shaped distribution, if its probability density function is given by:
$$f(x) = \frac{1}{\pi \sqrt{(x - a)(b - x)}}, \quad a < x < b. \tag{4}$$
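For reference, the standard arcsine density and its cumulative distribution function on a general interval can be evaluated as in the sketch below. The endpoint names `a` and `b` and their default values are assumptions made for this illustration.

```python
import math


def arcsine_pdf(x, a=0.0, b=1.0):
    """Arcsine density on the open interval (a, b)."""
    return 1.0 / (math.pi * math.sqrt((x - a) * (b - x)))


def arcsine_cdf(x, a=0.0, b=1.0):
    """Arcsine cumulative distribution function on [a, b]."""
    return 2.0 / math.pi * math.asin(math.sqrt((x - a) / (b - a)))


# The density is lowest at the midpoint and grows without bound near the ends.
for x in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(x, round(arcsine_pdf(x), 3), round(arcsine_cdf(x), 3))
```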
Symmetric V-shaped distribution
The probability density function of a symmetric V-shaped distribution is given by
where is a random variable on the interval . It can be seen from Eq. (5) that always decreases from to the range’s middle point and increases after that. As a result, it always produces a symmetric but not an asymmetric distribution.
The proposed asymmetric V-shaped distribution
Probability density function and cumulative distribution function
A random variable on the interval has an asymmetric V-shaped distribution, , if its probability density function is given by Eq. (6):
where gets an arbitrary value in the range . It can be seen from Eq. (6) that linearly decreases from to and then linearly increases. Because is not always the midpoint but it could be any value in the range , is not always a symmetric but it could be an asymmetric function. Therefore, the proposed probability density function in Eq. (6) has a higher generality compared to Eq. (5).
Figure 1. Probability density functions (a) and cumulative distribution functions (b) of the asymmetric V-shaped distribution for various parameter values.
It can be easily shown that and . Also, by integrating , the cumulative distribution function can be shown as:
The probability density functions and the cumulative distribution functions for , , and various values of are shown in Fig. 1a and b. It can be seen from Fig. 1 that the proposed distribution is quite flexible due to the parameter . For example, when , the density function is asymmetric; when , the density function becomes symmetric; when or , the density function is simplified to be a linear function ascending/descending from to .
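The behaviour described above can be reproduced with a short Python sketch. It assumes a density proportional to |x - c| on [a, b], normalized to integrate to one. This form matches the verbal description (linear decrease to a minimum, then linear increase, with the minimum allowed anywhere in the range), but the exact formula, the normalization, and the symbol names `a`, `b`, and `c` are assumptions of this sketch rather than a transcription of Eqs. (6) and (7).

```python
def av_pdf(x, a, b, c):
    """Assumed density: proportional to |x - c| on [a, b], zero elsewhere."""
    if x < a or x > b:
        return 0.0
    scale = (c - a) ** 2 + (b - c) ** 2  # twice the area under |x - c| on [a, b]
    return 2.0 * abs(x - c) / scale


def av_cdf(x, a, b, c):
    """Integral of the assumed density from a to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    scale = (c - a) ** 2 + (b - c) ** 2
    left = (c - a) ** 2
    if x <= c:
        return (left - (c - x) ** 2) / scale
    return (left + (x - c) ** 2) / scale


a, b = 0.0, 1.0
for c in (0.0, 0.3, 0.5, 1.0):  # c = a or c = b gives a single straight line
    values = [round(av_pdf(x, a, b, c), 3) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print("c =", c, values)
```

Under this assumption, placing the minimum at the midpoint gives a symmetric density, and any other interior position gives an asymmetric one.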
Statistical properties of the proposed asymmetric V-shaped distribution
Theorem 3.1.
Let have an asymmetric V-shaped distribution, , , and stand for the population mean, median and variance. We have the following statistical properties:
Proof.
1. We have,
and
Summing Eqs. (11) and (12) and simplifying, we obtain Eq. (8).
2. Let be the median of .
For , assuming that , using the definition of the median, we have:
The above inequality shows that the assumption of is false.
Now, assuming that , using the definition of the median, we have:
Because , the above equality is equivalent to . or
For , it can be easily proved that the assumption of is false.
Now, let , using the definition of the median, we have:
It can be seen from Eqs (20) and (21) that is the cumulative distribution function of an asymmetric V-shaped distribution with .
∎
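The mean, median, and variance can also be checked numerically. The sketch below does not use the closed-form expressions of the theorem. Instead it integrates an assumed density, proportional to |x - c| on [a, b], with the midpoint rule and finds the median by bisection on the corresponding cumulative distribution function. The density form, the symbols `a`, `b`, `c`, and the helper names are assumptions of this illustration.

```python
def av_pdf(x, a, b, c):
    """Assumed density: proportional to |x - c| on [a, b]."""
    if x < a or x > b:
        return 0.0
    return 2.0 * abs(x - c) / ((c - a) ** 2 + (b - c) ** 2)


def av_cdf(x, a, b, c):
    """Integral of the assumed density from a to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    scale = (c - a) ** 2 + (b - c) ** 2
    left = (c - a) ** 2
    if x <= c:
        return (left - (c - x) ** 2) / scale
    return (left + (x - c) ** 2) / scale


def av_moment(k, a, b, c, n=20000):
    """Midpoint-rule approximation of the k-th raw moment E[X^k]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += x ** k * av_pdf(x, a, b, c) * h
    return total


def av_median(a, b, c, iters=100):
    """Bisection on the CDF to find the point where it reaches 0.5."""
    lo, hi = a, b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if av_cdf(mid, a, b, c) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b, c = 0.0, 1.0, 0.3  # hypothetical parameter values
mean = av_moment(1, a, b, c)
variance = av_moment(2, a, b, c) - mean ** 2
median = av_median(a, b, c)
print("mean:", mean, "median:", median, "variance:", variance)
```

Comparing these numerical values with the closed-form results is a simple way to catch algebra slips in the derivation.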
Survival function and hazard function
The survival function gives the probability that an individual survives beyond a given time. The hazard function, or failure rate function, gives the instantaneous rate of failure at a given time, conditional on survival up to that time. Mathematically, the survival function is given by $S(x) = 1 - F(x)$ and the hazard function is given by $h(x) = f(x)/S(x)$, where $f$ and $F$ denote the probability density function and the cumulative distribution function. Figure 2 illustrates the survival and hazard functions for various parameter choices of the asymmetric V-shaped distribution.
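The standard definitions, survival as one minus the cumulative distribution function and hazard as density divided by survival, can be sketched in Python as follows. The density used here is an assumed form, proportional to |x - c| on [a, b], and the symbols `a`, `b`, `c` and the function names are illustrative rather than taken from the paper.

```python
import math


def av_pdf(x, a, b, c):
    """Assumed density: proportional to |x - c| on [a, b]."""
    if x < a or x > b:
        return 0.0
    return 2.0 * abs(x - c) / ((c - a) ** 2 + (b - c) ** 2)


def av_cdf(x, a, b, c):
    """Integral of the assumed density from a to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    scale = (c - a) ** 2 + (b - c) ** 2
    left = (c - a) ** 2
    if x <= c:
        return (left - (c - x) ** 2) / scale
    return (left + (x - c) ** 2) / scale


def av_survival(x, a, b, c):
    """Survival function: probability of exceeding x."""
    return 1.0 - av_cdf(x, a, b, c)


def av_hazard(x, a, b, c):
    """Hazard function: density divided by survival (infinite once survival is 0)."""
    s = av_survival(x, a, b, c)
    if s <= 0.0:
        return math.inf
    return av_pdf(x, a, b, c) / s


a, b, c = 0.0, 1.0, 0.3  # hypothetical parameter values
for x in (0.0, 0.15, 0.3, 0.6, 0.9):
    print(x, round(av_survival(x, a, b, c), 4), round(av_hazard(x, a, b, c), 4))
```

Under this assumed density the hazard drops to zero at the minimum of the V and then grows without bound as x approaches the upper endpoint.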
Figure 2. The survival and hazard functions of the asymmetric V-shaped distribution for various parameter values.
Figure 3. The histogram of the simulated data and the estimated asymmetric V-shaped distribution.
Numerical example
In this section, three examples are presented to illustrate the asymmetric V-shaped distribution. Example 1 generalizes Theorem 3.3 to arbitrary independent and identically distributed (i.i.d.) uniform random variables. Example 2 indicates the existence of the asymmetric V-shaped distribution when we establish the major error distribution of two measurements or two statistical/machine learning models. An application of the proposed distribution to analyzing stock market behavior is presented in Example 3. In each example, we compare the proposed distribution with the Gaussian distribution, the Beta distribution, the Gaussian mixture distribution, and the Arcsine distribution. The maximum likelihood estimates for each distribution are obtained using the quasi-Newton method. The fitted distributions are compared using the log-likelihood function.
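The comparison procedure can be sketched as follows. This is a simplified illustration under several assumptions: the density is taken to be proportional to |x - c| on a known interval [a, b]; the data are drawn by inverting the corresponding cumulative distribution function; and the quasi-Newton optimizer mentioned above is replaced by a plain grid search over the single parameter `c`. All function names and parameter values are hypothetical.

```python
import math
import random


def av_loglik(data, a, b, c):
    """Log-likelihood under the assumed density 2|x - c| / scale on [a, b]."""
    scale = (c - a) ** 2 + (b - c) ** 2
    total = 0.0
    for x in data:
        dist = abs(x - c)
        if dist == 0.0:  # density is zero exactly at c
            return -math.inf
        total += math.log(2.0 * dist / scale)
    return total


def av_sample(rng, a, b, c):
    """Inverse-CDF draw from the assumed density."""
    scale = (c - a) ** 2 + (b - c) ** 2
    left = (c - a) ** 2
    target = rng.random() * scale
    if target <= left:
        return c - math.sqrt(left - target)
    return c + math.sqrt(target - left)


def fit_c(data, a, b, steps=100):
    """Grid search for c (a stand-in for the quasi-Newton method)."""
    candidates = [a + (b - a) * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda c: av_loglik(data, a, b, c))


def normal_loglik(data):
    """Maximized log-likelihood of a normal distribution."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


rng = random.Random(0)
a, b, true_c = 0.0, 1.0, 0.3  # hypothetical settings
data = [av_sample(rng, a, b, true_c) for _ in range(3000)]
c_hat = fit_c(data, a, b)
print("estimated c:", c_hat)
print("log-likelihood (asymmetric V):", av_loglik(data, a, b, c_hat))
print("log-likelihood (normal):      ", normal_loglik(data))
```

A grid search is used here only because the model has a single bounded parameter; a gradient-based optimizer would be the natural choice for the multi-parameter competitors.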
Example 1
Let have uniform distributions on , , and create a random variable by . In this case, is expected to follow the asymmetric V-shaped distribution, with . Based on Theorem 3.1, we can find the mean , the median , and the variance as follows.
To determine the most suitable distribution, we first simulate 10000 pairs of and then calculate the corresponding . The maximum likelihood method is then applied to estimate the parameters of the Gaussian distribution, the Beta distribution, the Gaussian mixture distribution, the Arcsine distribution, and the asymmetric V-shaped distribution. Figure 3 depicts the histogram of the simulated data and the estimated asymmetric V-shaped distribution. The quantitative results in terms of log-likelihood values are summarized in Table 1. It can be seen from Table 1 that the most suitable distribution is the asymmetric V-shaped distribution, with .
Table 1. The log-likelihood value of comparative distributions

| Distribution | Log-likelihood value |
| --- | --- |
| Normal | 17661.9 |
| Beta | 13111.2 |
| Mixture Gaussian | 13489.3 |
| Arcsine | 13402.2 |
| Asymmetric V-shaped | 11892.0 |
We further conduct similar experiments with where is a positive number. The results also indicate that the asymmetric V-shaped distribution is the most suitable distribution for those cases. Based on Theorem 3.3 and this example, it can be inferred that if then has the asymmetric V-shaped distribution .
Example 2
Let be the actual data value, be the predicted value of Model 1, and be the predicted value of Model 2. Let and be the corresponding errors of the two models, respectively. The major error of the two models is defined as . In other words, the major error of the two models is whichever of the two errors has the larger absolute value.
Figure 4. Illustration of the major error.
Figure 4 illustrates the major error. In Fig. 4, the blue line stands for the actual data , the red points stand for the predicted values of Model 1 or , the yellow points stand for the predicted values of Model 2 or , and the blue circle points stand for . It can be seen from Fig. 4 that the major error can be either or , depending on which has the larger absolute value. The difference between Fig. 4a and b is that Fig. 4a stands for unbiased estimators (Case 1) when , whereas Fig. 4b stands for biased estimators (Case 2) . We next consider the major error distribution in the two cases above. Similar to Example 1, we first simulate 10000 pairs of and then calculate the corresponding for each case. The relationships between and for the two cases are given in Fig. 5. In both cases, the surface area decreases as goes to zero, which creates a V-shaped distribution, as we will see later. Furthermore, for Case 1 (unbiased estimators), the area of the surface where is equal to the area of the surface where . For Case 2 (biased estimators), the area of the lower surface is smaller than the area of the upper surface, leading to a strictly asymmetric property. Based on the above discussion, can potentially follow , and for the two cases. According to Theorem 3.1, for Case 1 and for Case 2.
Figure 5. The relationship between and for the two cases.
Table 2. The log-likelihood value of comparative distributions

| Distribution | Log-likelihood value (Case 1) | Log-likelihood value (Case 2) |
| --- | --- | --- |
| Normal | 10724.1 | 13322.3 |
| Beta | 6180.0 | 10262.0 |
| Mixture Gaussian | 6566.0 | 10411.6 |
| Arcsine | 19960.7 | 19722.9 |
| Asymmetric V-shaped | 4941.1 | 9395.6 |
Similar to Example 1, the maximum likelihood method is applied to estimate the parameters of the comparative distributions. Some visual results for the case of biased estimators are shown in Fig. 6, whereas the quantitative results in terms of log-likelihood values are summarized in Table 2.
As shown in Fig. 6 and Table 2, the asymmetric V-shaped distribution is the most suitable distribution for both cases, even though it does not fit perfectly in the case of biased estimators. We further conduct more experiments on various cases of and . The results show that if and are i.i.d. uniform random variables, , then the major error has an asymmetric V-shaped distribution, . This indicates the potential of the asymmetric V-shaped distribution for modeling the major error distribution.
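The major error construction is easy to simulate. The sketch below draws pairs of uniform errors, keeps the one with the larger absolute value, and prints a coarse histogram. The error intervals ([-1, 1] for the unbiased case and [-0.5, 1.5] for the biased case), the seed, and the helper names are hypothetical choices made for this illustration and are not the settings used in the paper.

```python
import random


def major_error(e1, e2):
    """Return whichever of the two errors has the larger absolute value."""
    return e1 if abs(e1) >= abs(e2) else e2


def simulate_major_errors(rng, low, high, n):
    """Draw n pairs of i.i.d. Uniform(low, high) errors and keep the major one."""
    return [
        major_error(rng.uniform(low, high), rng.uniform(low, high))
        for _ in range(n)
    ]


def histogram(values, low, high, bins):
    """Equal-width bin counts on [low, high]."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(1)
unbiased = simulate_major_errors(rng, -1.0, 1.0, 10000)  # zero-mean errors
biased = simulate_major_errors(rng, -0.5, 1.5, 10000)    # errors shifted upward

print("unbiased:", histogram(unbiased, -1.0, 1.0, 10))
print("biased:  ", histogram(biased, -0.5, 1.5, 10))
```

In both cases the bin counts are smallest near zero and grow towards the ends of the range; shifting the error interval makes the two sides unequal.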
Example 3
In this example, data from the Vietnam stock market are analyzed to demonstrate the applicability of the asymmetric V-shaped distribution to modeling fluctuations under certain conditions. To this end, the daily prices () and volumes () of specific equities are collected. The period under investigation is from 01/01/2021 to 18/07/2023. We then calculate the absolute percentage change () by .
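A minimal Python sketch of this preprocessing step is given below. It assumes the usual definition of the absolute percentage change, |P_t - P_{t-1}| / P_{t-1}, which may differ in detail from the formula used in the paper. The price and volume series, the threshold, and the function names are hypothetical.

```python
def abs_pct_change(prices):
    """Absolute relative change between consecutive prices (assumed definition)."""
    return [abs(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def filter_by_volume(changes, volumes, min_volume):
    """Keep the change on day t only if that day's volume exceeds min_volume."""
    # changes[i] belongs to day i + 1, so it is paired with volumes[i + 1].
    return [r for r, v in zip(changes, volumes[1:]) if v > min_volume]


prices = [100.0, 105.0, 99.75, 99.75]  # hypothetical daily closing prices
volumes = [1000, 5000, 800, 6000]      # hypothetical daily volumes
changes = abs_pct_change(prices)
high_volume_changes = filter_by_volume(changes, volumes, min_volume=4000)

print("all changes:        ", changes)
print("high-volume changes:", high_volume_changes)
```

The filtered series is the kind of input to which the candidate distributions would then be fitted by maximum likelihood.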
Figure 7a illustrates the distribution of when examining the stock price of the Vietnam Dairy Products Joint Stock Company (VNM). It can be seen that the frequency of is concentrated between 0% and 1%, then drops non-linearly over the remainder of the range under consideration. This is typical behavior of stock price fluctuation that has been reported in previous studies (Yang & Zhang, 2019; Zebrowska-Suchodolska et al., 2021). However, this behavior does not hold when examining the fluctuations under different volume conditions. As shown in Fig. 7b, when , appears to follow an asymmetric V-shaped distribution. Specifically, the frequency of reaches its highest values in the range (0, 0.01), decreases linearly in the range (0.01, 0.065), and then increases abruptly in the range (0.065, 0.07). Using the maximum likelihood estimator, can be calculated. Similar results are found for other individual stocks, such as the Asia Commercial Bank (ACB), the Vietnam Joint Stock Commercial Bank for Industry and Trade (CTG), and the Vietnam Prosperity Joint Stock Commercial Bank (VPB). In particular,
for ACB;
for CTG;
for VPB.
Table 3. The log-likelihood value of comparative distributions for some stocks

| Distribution | ACB | CTG | VNM | VPB |
| --- | --- | --- | --- | --- |
| Normal | 232.2 | 315.5 | 256.6 | 177.4 |
| Beta | 253.4 | 348.4 | 279.7 | 195.2 |
| Mixture Gaussian | 253.1 | 335.1 | 275.9 | 191.9 |
| Arcsine | 244.1 | 337.9 | 268.9 | 191.3 |
| Asymmetric V-shaped | 259.3 | 355.8 | 287.5 | 196.4 |
Figure 6. Some distributions fitted using the maximum likelihood estimator.
Figure 7. The histograms of the absolute percentage change.
The main results are summarized in Table 3, which shows that the asymmetric V-shaped distribution is the most appropriate one compared to the other distributions. According to the above results, the distribution of follows the asymmetric V-shaped distribution under certain conditions of . The value of the , which leads to the asymmetric V-shaped distribution of , also varies depending on the liquidity characteristics of each stock. One explanation is that transaction liquidity has an effect on price movement (Dong & Guo, 2022). Generally, higher volume leads to higher volatility. Specifically, as increases, the frequency of in the range (0, 0.01) decreases, while the frequency of in the range (0, 0.07) increases. Therefore, the nonlinear behavior of the frequency is replaced by linear behavior. As exceeds a specific threshold, e.g. 0.067 for VNM, herding behavior leads to a sharp increase in the frequency of . Additionally, the daily price range for shares is strictly limited to 7% in the Vietnam stock market, so is not the case. These are the reasons why the absolute percentage change follows an asymmetric V-shaped distribution under a high value of volume.
In summary, the results obtained in this section have demonstrated the benefits of applying the asymmetric V-shaped distribution to specific numerical and real-world examples. Specifically, the proposed distribution can be used to represent the distribution of stock price fluctuations under a given volume condition. From our perspective, modeling stock price behaviors with the asymmetric V-shaped distribution is interesting and requires further investigation in future research.
Conclusion
This paper has proposed a new asymmetric V-shaped distribution. Some statistical properties of the new distribution have also been clarified. The numerical examples on simulated and actual data sets have been presented to illustrate the benefits of the proposed distribution in certain situations. In particular, the proposed distribution can represent the distribution of stock price fluctuations under conditions of high volume. This stock price behavior still requires further investigation. Additionally, related multivariate statistics and Bayesian statistics could be studied in the future.
Acknowledgments
The authors wish to thank Dr. Stan Lipovetsky, the Co-Editor-in-Chief of MASA, and anonymous referees for their valuable support during the peer-review process.
References
1. Afify, A. Z., Suzuki, A. K., Zhang, C., & Nassar, M. (2021). On three-parameter exponential distribution: Properties, Bayesian and non-Bayesian estimation based on complete and censored samples. Communications in Statistics-Simulation and Computation, 50(11), 3799-3819.
2. Ahmad, Z., Mahmoudi, E., & Alizadeh, M. (2022). Modelling insurance losses using a new beta power transformed family of distributions. Communications in Statistics-Simulation and Computation, 51(8), 4470-4491.
3. An, L. (2016). Asset pricing when traders sell extreme winners and losers. The Review of Financial Studies, 29(3), 823-861.
4. Basheer, A. M. (2019). Alpha power inverse Weibull distribution with reliability application. Journal of Taibah University for Science, 13(1), 423-432.
5. Ben-David, I., & Hirshleifer, D. (2012). Are investors really reluctant to realize their losses? Trading responses to past returns and the disposition effect. The Review of Financial Studies, 25(8), 2485-2532.
6. Braga, A. d. S., Cordeiro, G. M., & Ortega, E. M. (2018). A new skew-bimodal distribution with applications. Communications in Statistics-Theory and Methods, 47(12), 2950-2968.
7. Cheng, C., Wang, Z., Xiao, P., Xu, Z., Jiao, P., Dong, G., & Wei, G. (2017). Spatio-temporal dynamics of NDVI and its response to climate factors in the Heihe River Basin, China. In IOP Conference Series: Earth and Environmental Science, Vol. 82, p. 012045.
8. Chesneau, C., Bakouch, H. S., & Hussain, T. (2019). A new class of probability distributions via cosine and sine functions with applications. Communications in Statistics-Simulation and Computation, 48(8), 2287-2300.
9. Dong, H., & Guo, X. (2022). Option price predictability, splines, and expanded rationality. Model Assisted Statistics and Applications, 17(4), 285-297.
10. Galtung, J. (1967). Theory and methods of social research. Universitetsforlaget.
11. Heggeseth, B. C., Jewell, N. P., et al. (2018). How Gaussian mixture models might miss detecting factors that impact growth patterns. The Annals of Applied Statistics, 12(1), 222-245.
12. Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 186(1007), 453-461.
13. Mauris, G. (2009). Transformation of bimodal probability distributions into possibility distributions. IEEE Transactions on Instrumentation and Measurement, 59(1), 39-47.
14. Mazucheli, J., & Menezes, A. F. B. (2019). L-moments and maximum likelihood estimation for the complementary beta distribution with applications on temperature extremes. Journal of Data Science, 17(2), 391-406.
15. Moreau, V. H. (2021). Using the Weibull distribution to model COVID-19 epidemic data. Model Assisted Statistics and Applications, 16(1), 5-14.
16. Ong, J. H., Burnham, D., Stevens, C. J., & Escudero, P. (2016). Naive learners show cross-domain transfer after distributional learning: The case of lexical and musical pitch. Frontiers in Psychology, 7, 1189.
17. Para, B. A., Jan, T. R., & Bakouch, H. S. (2020). Poisson Xgamma distribution: A discrete model for count data analysis. Model Assisted Statistics and Applications, 15(2), 139-151.
18. Punzo, A., Bagnato, L., & Maruotti, A. (2018). Compound unimodal distributions for insurance losses. Insurance: Mathematics and Economics, 81, 95-107.
19. Rahmani, D., Niranjan, M., Fay, D., Takeda, A., & Brodzki, J. (2020). Estimation of Gaussian mixture models via tensor moments with application to online learning. Pattern Recognition Letters, 131, 285-292.
20. Ramos, P. L., Louzada, F., Shimizu, T. K., & Luiz, A. O. (2019). The inverse weighted Lindley distribution: Properties, estimation and an application on a failure time data. Communications in Statistics-Theory and Methods, 48(10), 2372-2389.
21. Rinne, H. (2014). The hazard rate: Theory and inference. Justus-Liebig-Universität Giessen: Giessen, Germany, 149-151.
22. Vasconcelos, J. M., Cintra, R. J., Nascimento, A. D., & Rêgo, L. C. (2020). The compound truncated Poisson Cauchy model: A descriptor for multimodal data. Journal of Computational and Applied Mathematics, 112887.
23. VoVan, T., & NguyenTrang, T. (2018). Similar coefficient for cluster of probability density functions. Communications in Statistics-Theory and Methods, 47(8), 1792-1811.
24. Vovan, T., Tranphuoc, L., & Chengoc, H. (2019). Classifying two populations by Bayesian method and applications. Communications in Mathematics and Statistics, 7(2), 141-161.
25. Yang, X., & Zhang, H. (2019). Extreme absolute strength of stocks and performance of momentum strategies. Journal of Financial Markets, 44, 71-90.
26. Zebrowska-Suchodolska, D., Karpio, A., & Kompa, K. (2021). COVID-19 pandemic: Stock markets situation in European ex-communist countries, 24(3), 1106-1128.