Abstract
In this article, we present the command
An extension to inflated CUB models is discussed. We also present a subcommand,
1 Motivation
Several estimation commands, such as
In this article, we add to this literature by presenting a command implementing the class of combination of uniform and binomial (CUB) models for ordinal data (Piccolo and Simone 2019a, b). Uniform and binomial are the two distributions used to jointly model the feeling and uncertainty of the response process via a mixture specification. Beyond this baseline definition, this new paradigm for ordinal data modeling (Piccolo 2003; D’Elia and Piccolo 2005) includes a richer class of models. These models have been shown to be of interest to a broad audience of applied scholars because of their versatile and multifaceted range of applications (Balirano and Corduas 2008; Arboretti Giancristofaro, Bordignon, and Carrozzo 2014; Capecchi and Piccolo 2016; Fin et al. 2017; Capecchi, Simone, and Ghiselli 2019) and their flexibility to perform more complex analyses (Cappelli, Simone, and Di Iorio 2019; Simone, Cappelli, and Di Iorio 2019; Simone, Tutz, and Iannario 2020; Manisera and Zuccolotto 2014; Bonnini et al. 2012; D’Elia 2008). From the methodological point of view, see Piccolo, Simone, and Iannario (2019) for a comparative analysis with cumulative models.
The innovative aspect of the CUB paradigm is the modeling of uncertainty arising from the ensemble of individuals or from framing effects surrounding the evaluation on rating scales. This component is meant to convey indecision, fuzziness, and the heterogeneity of responses (Di Nardo and Simone 2019), yielding a twofold interpretation of response patterns. Uncertainty blurs the assessment of respondents’ sentiments toward the trait being investigated (preference, satisfaction, and so on). Thus, the CUB paradigm involves a mixture of the least informative distribution, the uniform over the discrete support, and an adequate model for feeling, in order to analyze the heterogeneity and the location of the responses, respectively. Linking estimable uncertainty and feeling parameters to subjects’ covariates adds further value. This feature allows the derivation of interpretable response profiles useful for understanding and predicting response behavior.
This approach to the analysis of the rating process can be extended to account for other response phenomena, such as overdispersion and an inflated frequency in a given category, by modifying the baseline distributions in the model specification. Frequency inflation occurs when one category is a refuge or shelter option for the response choice because of its peculiar wording, because of response styles, or to avoid the cognitive burden of a more precise choice. Hence, we refer to this as a shelter effect. We present an example in the illustrative case study in section 4.
CUB models have proven to be parsimonious yet valuable in research and applications in social and behavioral studies, particularly in terms of their effective visualization features. In addition to providing a response distribution for each covariate profile, estimated feeling and uncertainty measures can be represented as points in the parameter space. Hereafter, we call this representation
Currently, the cub (Iannario, Piccolo, and Simone 2020) and fastcub (Simone 2020) libraries are available for the R environment and for the GRETL community (Simone, Di Iorio, and Lucchetti 2019) as open-source software. Both include an implementation of the expectation-maximization algorithm (McLachlan and Krishnan 2008) for maximum likelihood inference (Piccolo 2006). For Stata users, no related tool is available. To fill this gap, we present the commands
This article is intended to provide a concise yet comprehensive introduction to CUB models, illustrating their applications and interpretation. Section 2 briefly reviews the methodological background; see Piccolo and Simone (2019a, b) and the references therein for an overview and a comprehensive description of the state of the art in methodology and applications. Section 3 sets out the syntax of the
2 CUB model specification
For a sample of size n, let Ri be the ordinal rating response provided to a given item (of a questionnaire) by the ith subject, i = 1,…, n. Assume that the response is collected on a Likert-type scale with m ordered categories, with m > 3 required for identifiability (Iannario 2010). For convenience, categories are coded as the first m integers to convey their position along the scale. The CUB paradigm prescribes that the rating process arises from the combination of two main components: feeling, addressing the perception of the item being investigated (attraction, satisfaction, agreement, and so on); and uncertainty, conveying the fuzzy elements of the response. The CUB model for the rating response mechanism is then specified as a two-component mixture of these components. In the baseline definition, uncertainty is modeled by a discrete uniform distribution over the first m integers, which contributes to model parsimony, while feeling is modeled by a shifted binomial distribution with parameter ξi ∈ (0, 1):
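$$b_r(\xi_i) = \binom{m-1}{r-1}(1-\xi_i)^{r-1}\,\xi_i^{m-r}, \qquad r = 1,\dots,m.$$

Thus, the CUB model for Ri is the mixture

$$\Pr(R_i = r \mid \boldsymbol{x}_i, \boldsymbol{w}_i) = \pi_i\, b_r(\xi_i) + (1-\pi_i)\,\frac{1}{m}, \qquad r = 1,\dots,m. \tag{1}$$

Here the uncertainty parameter πi = π(xi, β) and the feeling parameter ξi = ξ(wi, γ) may be linked to the subjects’ covariates xi and wi through logit links,

$$\operatorname{logit}(\pi_i) = \boldsymbol{x}_i\boldsymbol{\beta}, \qquad \operatorname{logit}(\xi_i) = \boldsymbol{w}_i\boldsymbol{\gamma}, \tag{2}$$

so that (1 − πi) measures uncertainty and (1 − ξi) measures feeling.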
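The mixture in (1) can be made concrete with a short computation. The following is a minimal Python sketch, not the implementation used by cub; the function names (`inv_logit`, `shifted_binomial`, `cub_pmf`, `cub_pmf_covariates`) are hypothetical, and covariates are assumed to enter through the logit links in (2).

```python
import math


def inv_logit(eta):
    """Inverse logit: maps a linear predictor to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def shifted_binomial(r, m, xi):
    """Shifted binomial probability b_r(xi) = C(m-1, r-1) (1-xi)^(r-1) xi^(m-r)."""
    return math.comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)


def cub_pmf(m, pi, xi):
    """CUB probabilities Pr(R = r) = pi * b_r(xi) + (1 - pi) / m for r = 1..m."""
    return [pi * shifted_binomial(r, m, xi) + (1 - pi) / m for r in range(1, m + 1)]


def cub_pmf_covariates(m, x, beta, w, gamma):
    """CUB probabilities for one subject whose pi and xi follow logit links.

    x, beta: covariate values and coefficients for logit(pi) (include a 1 for the constant).
    w, gamma: covariate values and coefficients for logit(xi).
    """
    pi = inv_logit(sum(a * b for a, b in zip(x, beta)))
    xi = inv_logit(sum(a * b for a, b in zip(w, gamma)))
    return cub_pmf(m, pi, xi)


if __name__ == "__main__":
    # A small xi (high feeling 1 - xi) shifts probability toward the upper categories.
    for r, p in enumerate(cub_pmf(7, 0.8, 0.2), start=1):
        print(r, round(p, 4))
```

Setting π = 0 returns the uniform distribution and π = 1 the pure shifted binomial, which is a quick sanity check on any implementation.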
Model selection for the best covariate specification can be performed by fitting several models and choosing the one that attains the lowest value of the Akaike information criterion (Akaike 1974) or the Bayesian information criterion (Schwarz 1978). As with all mixture models, variable selection should be based on a crossed search for the best covariate specification for both the feeling parameter ξi and the uncertainty parameter πi. This could be pursued via best-subset search algorithms, as described in Simone (2021).
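For reference, the two criteria are AIC = −2ℓ + 2k and BIC = −2ℓ + k log n, where ℓ is the maximized log likelihood, k the number of estimated parameters, and n the sample size. The Python sketch below compares candidate specifications; `best_model` is a hypothetical helper, and the model names and log-likelihood values in the usage example are made up for illustration, not results from this article.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 * loglik + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 * loglik + k * log(n)."""
    return -2.0 * loglik + k * math.log(n)


def best_model(models, n, criterion="bic"):
    """Return the name of the model with the lowest criterion value.

    models: dict mapping a model name to (maximized log likelihood, number of parameters).
    """
    def score(name):
        loglik, k = models[name]
        return aic(loglik, k) if criterion == "aic" else bic(loglik, k, n)

    return min(models, key=score)


if __name__ == "__main__":
    # Hypothetical fits: CUB(0,0) has 2 parameters; each numeric covariate adds one coefficient.
    candidates = {
        "CUB(0,0)": (-1500.0, 2),
        "feeling ~ d": (-1490.0, 3),
        "feeling ~ d, uncertainty ~ d": (-1489.5, 4),
    }
    print(best_model(candidates, n=1000, criterion="aic"))
    print(best_model(candidates, n=1000, criterion="bic"))
```

Because BIC penalizes each extra parameter by log n rather than 2, it tends to favor sparser covariate specifications than AIC in samples of this size.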
A focal point of this class of models is that covariate specification is not compulsory. A CUB model could describe a given rating distribution in terms of a global measure of feeling ξ and uncertainty π, in which case one refers to the model as CUB(0, 0). In this case, (1 − π) is a (normalized) measure of the uncertainty implied by the model in terms of the overall heterogeneity of the distribution (Capecchi and Piccolo 2017). Thus, CUB models allow characterization of different rating responses in terms of only two parameters (π, ξ), ranging in (0, 1]×[0, 1] and leading to effective visualization tools. Indeed, for different items or response profiles, obtained by conditioning (1) on selected values of covariates, estimated uncertainty and feeling parameters identify a point in the parameter space, yielding a scatterplot that gives a unified picture of the data at hand. This can be visualized with the
A further remark on interpretation is worthwhile. According to the common motivation for mixture models (McLachlan and Peel 2000), CUB models would imply two clusters of respondents such that, in the more uncertain group, people select an ordinal score at random. Although this is a possible interpretation, when no covariates are specified, the CUB parameterization of the response variable directly on its support is intended as a synthesis of the overall distribution in terms of location and heterogeneity. When covariates are specified, the CUB parameterization instead provides a method to relate individuals’ levels of uncertainty, interpretable as subjective indecision, and of feeling to subjects’ characteristics.
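When no covariates are specified, maximum likelihood estimates of (π, ξ) can be obtained with the expectation-maximization algorithm mentioned in section 1. The Python sketch below implements the standard E- and M-steps for a CUB(0, 0) model from a vector of category counts; it is an illustration under these assumptions, not the code used by cub, and `cub00_em` is a hypothetical name. The M-step for ξ follows from maximizing the weighted binomial log likelihood, giving ξ = (m − R̄)/(m − 1), where R̄ is the mean rating weighted by the posterior probabilities of the feeling component.

```python
import math


def shifted_binomial(r, m, xi):
    """b_r(xi) = C(m-1, r-1) (1-xi)^(r-1) xi^(m-r)."""
    return math.comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)


def cub00_em(freq, pi=0.5, xi=0.5, tol=1e-8, max_iter=10_000):
    """EM estimates (pi, xi, loglik) of a CUB(0,0) model from counts freq[r-1], r = 1..m."""
    m, n = len(freq), sum(freq)
    cats = range(1, m + 1)
    loglik_old = -math.inf
    for _ in range(max_iter):
        b = [shifted_binomial(r, m, xi) for r in cats]
        p = [pi * br + (1 - pi) / m for br in b]
        loglik = sum(f * math.log(pr) for f, pr in zip(freq, p) if f > 0)
        if loglik - loglik_old < tol:  # EM never decreases the log likelihood
            break
        loglik_old = loglik
        # E-step: posterior probability that a rating r comes from the feeling component
        tau = [pi * br / pr for br, pr in zip(b, p)]
        # M-step: closed-form updates for pi and xi
        weights = [f * t for f, t in zip(freq, tau)]
        w_sum = sum(weights)
        pi = w_sum / n
        r_bar = sum(r * wt for r, wt in zip(cats, weights)) / w_sum
        xi = (m - r_bar) / (m - 1)
    return pi, xi, loglik


if __name__ == "__main__":
    # Synthetic counts proportional to a CUB(0,0) distribution with pi = 0.7, xi = 0.3.
    true_p = [0.7 * shifted_binomial(r, 7, 0.3) + 0.3 / 7 for r in range(1, 8)]
    freq = [round(1_000_000 * p) for p in true_p]
    print(cub00_em(freq))  # estimates should be close to pi = 0.7, xi = 0.3
```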
2.1 Inflated CUB models
One of the major advantages of the CUB paradigm is the ease of extending the model to encompass other circumstances that may affect the rating response process. One typical scenario concerns the inflated frequency of a category that has a peculiar meaning or role for the respondents. Inflated CUB models include a so-called shelter effect (Corduas, Iannario, and Piccolo 2009; Iannario 2012) located at a known category s ∈ {1,…, m}. This category is chosen more often than the standard CUB mixture accounts for. To fit this circumstance, the CUB model is extended with the introduction of a degenerate distribution D_r^{(s)}, equal to 1 if r = s and 0 otherwise:

$$\Pr(R_i = r) = \delta\, D_r^{(s)} + (1-\delta)\left[\pi_i\, b_r(\xi_i) + (1-\pi_i)\,\frac{1}{m}\right], \qquad r = 1,\dots,m,$$

where the term in brackets is the CUB mixture in (1) and δ ∈ [0, 1) is the shelter parameter.
The estimable parameter vector is then
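The inflated model can be sketched in the same way: the CUB mixture is rescaled by (1 − δ), and the extra mass δ is placed on the shelter category s. The Python function below is a hypothetical helper written for illustration only, assuming the parameterization with a degenerate component described above.

```python
import math


def cub_shelter_pmf(m, pi, xi, delta, s):
    """Pr(R = r) = delta * D_r + (1 - delta) * [pi * b_r(xi) + (1 - pi) / m], r = 1..m.

    D_r is the degenerate distribution at the shelter category s (1 if r == s, else 0).
    """
    if not 1 <= s <= m:
        raise ValueError("the shelter category s must lie in 1..m")
    probs = []
    for r in range(1, m + 1):
        b_r = math.comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        cub_r = pi * b_r + (1 - pi) / m
        shelter_r = 1.0 if r == s else 0.0
        probs.append(delta * shelter_r + (1 - delta) * cub_r)
    return probs


if __name__ == "__main__":
    # Hypothetical parameters with a shelter at category 5 of a 7-point scale.
    print([round(p, 4) for p in cub_shelter_pmf(7, 0.8, 0.3, 0.1, 5)])
```

With δ = 0 the function reduces to the baseline CUB probabilities, so a test of δ is a test of the shelter effect against the standard model.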
3 The cub and scattercub commands
3.1 Syntax for cub
The model fit by
3.2 Options for cub
3.3 Syntax for scattercub
3.4 Options for scattercub
4 CUB models at work
This section is meant to help Stata users become familiar with the
4.1 Application
Hereafter, we consider the data collected in 2002, consisting of 2,179 observations. The remaining 7 variables correspond to subjects’ covariates (for instance, the dichotomous variable
As a first step, we show how to simultaneously visualize the ordinal variables included in

CUB models without covariates for satisfaction items in
From figure 1, we observe that the highest feeling has been expressed for the willingness of the staff, whereas the lowest corresponds to the scheduled office hours. Because the latter item is affected by the highest uncertainty, it deserves further investigation. Thus, we focus on the item
We first show how to estimate the parameters of a CUB(0, 0) model for the
To provide a uniform output format for fitted CUB models, with or without covariate effects, we also report the logit transformation (2) of the feeling and uncertainty parameters when no covariate is specified. In this circumstance, the “constants” denoted as
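Because the constants are on the logit scale, the parameters themselves are recovered with the inverse logit, and an approximate standard error follows from the delta method, since the derivative of the inverse logit at c is p(1 − p). The following Python sketch is illustrative only; `from_logit` is a hypothetical helper, and the input values in the usage lines are made up rather than taken from the cub output.

```python
import math


def from_logit(const, se_const):
    """Back-transform a logit-scale constant to (0, 1), with a delta-method standard error."""
    p = 1.0 / (1.0 + math.exp(-const))
    se_p = p * (1.0 - p) * se_const  # |d p / d const| = p * (1 - p)
    return p, se_p


if __name__ == "__main__":
    # Hypothetical constants for logit(pi) and logit(xi).
    pi, se_pi = from_logit(1.2, 0.15)
    xi, se_xi = from_logit(-0.8, 0.05)
    print(f"pi = {pi:.3f} (se {se_pi:.3f}); uncertainty 1 - pi = {1 - pi:.3f}")
    print(f"xi = {xi:.3f} (se {se_xi:.3f}); feeling 1 - xi = {1 - xi:.3f}")
```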
By default, the output tables report the
The

Plot of the observed versus fitted probabilities for variable
It follows that the model does not adequately fit the responses observed for categories 5, 6, and 7. Specifically, a moderate inflation in the frequency of the fifth category seems unaccounted for by the model. Thus, we test for a possible shelter effect at category s = 5 by calling
Both the improvement in the log likelihood and the significance of parameter δ indicate that category 5 is perceived as a shelter for the assessment of satisfaction with office hours. Notice that the first panel also reports the logit transform for
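The improvement in the log likelihood can also be summarized with a likelihood-ratio statistic, LR = 2(ℓ_shelter − ℓ_CUB), compared with a χ² distribution with one degree of freedom. Because δ = 0 lies on the boundary of the parameter space, this reference distribution is generally regarded as conservative. The Python sketch below is an illustration under these assumptions; it does not reproduce the test reported by cub, and the log likelihoods in the usage lines are hypothetical.

```python
import math


def lr_test_one_df(loglik_restricted, loglik_full):
    """Likelihood-ratio statistic and its chi-square(1) p-value."""
    lr = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # Survival function of chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


if __name__ == "__main__":
    # Hypothetical log likelihoods of a CUB model without and with shelter at s = 5.
    lr, p = lr_test_one_df(-3500.0, -3493.0)
    print(f"LR = {lr:.2f}, p-value = {p:.4g}")
```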
The fit improvement entailed by the specification of the shelter effect can be additionally inspected with a graphical comparison between observed frequencies and estimated probabilities (see figure 3).
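A compact numerical companion to such plots is the normalized dissimilarity index, Diss = ½ Σ_r |f_r − p_r|, where f_r are the observed relative frequencies and p_r the fitted probabilities. It lies in [0, 1] and can be read as the smallest share of responses that would have to move to another category for a perfect fit. The Python helper below is a sketch with a hypothetical name, and the counts and probabilities in the usage lines are made up.

```python
def dissimilarity(counts, fitted_probs):
    """Normalized dissimilarity index 0.5 * sum_r |f_r - p_r|, with f_r = counts[r] / n."""
    if len(counts) != len(fitted_probs):
        raise ValueError("counts and fitted probabilities must cover the same categories")
    n = sum(counts)
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, fitted_probs))


if __name__ == "__main__":
    # Hypothetical counts on a 7-point scale and two sets of fitted probabilities.
    counts = [20, 40, 80, 150, 260, 270, 180]
    without_shelter = [0.02, 0.04, 0.08, 0.16, 0.22, 0.27, 0.21]
    with_shelter = [0.02, 0.03, 0.07, 0.14, 0.25, 0.28, 0.21]
    print(round(dissimilarity(counts, without_shelter), 4))
    print(round(dissimilarity(counts, with_shelter), 4))
```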

Plot of the observed versus fitted probabilities for variable
Next, to enrich the interpretation of results, we introduce some covariates in the model to identify the main determinants of satisfaction for office hours in terms of students’ characteristics. As a first example, we test if the model components can be explained by the dichotomous covariate
Because the regression coefficient for logit(ξi) is negative, it follows that regular users have a higher feeling 1 − ξi than occasional users, whereas there is no statistically significant difference in terms of heterogeneity between these groups. Notice that the covariate specification in CUB models can be the same or different for the uncertainty and feeling parameters. In this case, one could fit a CUB model by including the
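The sign argument can be checked with a small calculation. With a dummy covariate d (1 for regular users) entering logit(ξi) = γ0 + γ1 d, a negative γ1 lowers ξi and therefore raises the feeling 1 − ξi for the group with d = 1. The Python sketch below uses a hypothetical helper and made-up coefficient values, not the estimates reported by cub.

```python
import math


def inv_logit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def feeling_by_group(gamma0, gamma1):
    """Feeling 1 - xi for d = 0 and d = 1 when logit(xi) = gamma0 + gamma1 * d."""
    feeling_d0 = 1.0 - inv_logit(gamma0)
    feeling_d1 = 1.0 - inv_logit(gamma0 + gamma1)
    return feeling_d0, feeling_d1


if __name__ == "__main__":
    # Hypothetical coefficients: a negative gamma1 implies higher feeling when d = 1.
    occasional, regular = feeling_by_group(-0.5, -0.4)
    print(f"feeling, occasional users: {occasional:.3f}")
    print(f"feeling, regular users:    {regular:.3f}")
```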
If covariates are specified in the model, then the
It can be insightful to compare the estimated probability distributions for the two groups of respondents (regular and nonregular users in the case under examination). This can be achieved with the following commands, which return a matrix and a plot comparing observed relative frequencies and fitted probabilities for the two groups (see figure 4).

Plot of the observed frequencies and fitted probabilities for variable
According to the fitted models, relevant differences between the two profiles appear only in categories 4, 5, and 7; specifically, nonregular users are more likely than regular ones to choose the lower categories 4 and 5. Conversely, regular users are more likely than nonregular ones to choose the highest grade, R = 7. In particular, figure 4 indicates that the inflation in category 5 is mainly due to regular users, whereas, to further improve the fit, inflation in the last category should instead be accounted for in the ratings assigned by nonregular users. This could be assessed by fitting the following models and comparing classical goodness-of-fit statistics; the results are displayed in figure 5.

Separate fit of CUB models with shelter for ratings on
As an example of a more complex covariate specification, we show how to check for possible age effects. We consider the deviation from the mean of the logarithmic transform of age (covariate
After the
The output of the estimation procedure is given below and indicates that regular users have a higher feeling than occasional users and that younger users have lower feeling and higher uncertainty than older ones. In addition, responses provided by women are more heterogeneous than those provided by men.
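The age covariate used in this model is the deviation of log age from its sample mean. A minimal Python sketch of that transformation follows (the function name is hypothetical). With this centering, a value of 0 corresponds to the geometric mean age, so the constants in the links refer to a subject of that age with the other covariates at zero.

```python
import math


def centered_log(values):
    """Deviations of log(values) from their sample mean (values must be positive)."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


if __name__ == "__main__":
    ages = [19, 20, 21, 23, 27]  # hypothetical ages
    print([round(z, 3) for z in centered_log(ages)])
```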
Because no plot is directly provided as output for complex covariate specifications, the results of the CUB model estimation with significant covariates on feeling and uncertainty parameters may be represented, for instance, as in figure 6, obtained with the following commands:

Plot of CUB model with covariates (4): age and gender effect for uncertainty, age and frequency effect for feeling
Figure 6 is meant to display how feeling and uncertainty vary together with the continuous variable
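Curves like those in figure 6 can be traced by evaluating the logit links over a grid of ages for a fixed covariate profile. The Python sketch below assumes, for illustration, logit(πi) = β0 + β1 zi and logit(ξi) = γ0 + γ1 zi, where zi is the centered log age and the remaining covariates (gender, frequency of use) are absorbed into the constants for the profile of interest. The helper name and all coefficient values are hypothetical; the signs are only chosen to mimic the pattern described in the text.

```python
import math


def inv_logit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def profile_over_age(ages, mean_log_age, beta, gamma):
    """Return (age, 1 - pi, 1 - xi) for each age in a grid.

    beta = (beta0, beta_age) for logit(pi); gamma = (gamma0, gamma_age) for logit(xi);
    the age covariate is z = log(age) - mean_log_age.
    """
    rows = []
    for age in ages:
        z = math.log(age) - mean_log_age
        pi = inv_logit(beta[0] + beta[1] * z)
        xi = inv_logit(gamma[0] + gamma[1] * z)
        rows.append((age, 1.0 - pi, 1.0 - xi))
    return rows


if __name__ == "__main__":
    # Hypothetical coefficients and sample mean of log age.
    grid = profile_over_age(range(19, 31, 2), math.log(22.0), beta=(1.0, 1.5), gamma=(-0.6, -0.8))
    for age, uncertainty, feeling in grid:
        print(age, round(uncertainty, 3), round(feeling, 3))
```

The returned rows can be passed to any plotting tool to draw uncertainty and feeling against age.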
As discussed in section 2, the class of CUB mixture models includes a specific extension to fit the so-called shelter effect, arising in the presence of an inflated category. For illustrative purposes, we show how to perform the analysis of a possible shelter effect in the previous model using category 5 as the shelter choice for
The signs of the regression coefficients for the selected covariates of both the uncertainty and feeling parameters confirm, overall, the interpretations derived from inspection of figure 6. In addition, the significance test for parameter δ suggests that accounting for inflation in category 5 improves the fit even after controlling for the selected covariates.
5 Categories with zero frequencies
The
The estimates are updated to account for the extra categories with zero frequency. The total number of categories considered in the procedure (including those with zero frequency) is reported in the last line of the first panel of the output table (not reported here for brevity).
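The key point is that the length m of the rating scale must be declared rather than inferred from the observed values: tabulating only the observed values silently drops categories with zero frequency and changes m. A minimal Python sketch of a tabulation over the full scale 1, …, m follows; `frequencies_on_scale` is a hypothetical helper, not part of cub, and the ratings are made up.

```python
from collections import Counter


def frequencies_on_scale(responses, m):
    """Counts for categories 1..m, keeping categories that were never observed (count 0)."""
    counts = Counter(responses)
    outside = sorted(r for r in counts if not 1 <= r <= m)
    if outside:
        raise ValueError(f"responses outside the scale 1..{m}: {outside}")
    return [counts.get(r, 0) for r in range(1, m + 1)]


if __name__ == "__main__":
    ratings = [4, 5, 5, 6, 6, 6, 7, 7]  # hypothetical ratings on a 1..7 scale
    print(sorted(Counter(ratings)))          # [4, 5, 6, 7]: observed categories only
    print(frequencies_on_scale(ratings, 7))  # [0, 0, 0, 1, 2, 3, 2]: full scale
```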

Scatterplot to display the effect of misspecification of the length of the original rating scale in case there are categories with zero observed frequencies
To discuss the case of zero-frequency categories not at the extremes of the scale, we consider for illustrative purposes the ratings of satisfaction with the willingness of the staff of the orientation office, and we artificially set the frequencies of the second and third categories to 0 by shifting those responses to 1. Figure 8 shows the graphical output of the code below; in this case too, the shelter effect is tested at s = 7.
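The artificial manipulation described above (moving the responses in categories 2 and 3 to category 1) and the check for zero-frequency categories inside the scale can be sketched in Python as follows. Both helpers are hypothetical and only illustrate the data preparation; the estimation itself is done by cub.

```python
from collections import Counter


def shift_to_category(responses, from_categories=(2, 3), to_category=1):
    """Recode every response in from_categories to to_category."""
    source = set(from_categories)
    return [to_category if r in source else r for r in responses]


def interior_zero_categories(responses, m):
    """Categories strictly between 1 and m that have zero observed frequency."""
    counts = Counter(responses)
    return [r for r in range(2, m) if counts[r] == 0]


if __name__ == "__main__":
    ratings = [1, 2, 3, 3, 4, 5, 6, 6, 7, 7]  # hypothetical ratings on a 1..7 scale
    shifted = shift_to_category(ratings)
    print(shifted)                               # [1, 1, 1, 1, 4, 5, 6, 6, 7, 7]
    print(interior_zero_categories(shifted, 7))  # [2, 3]
```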

Observed and estimated distributions restricted to observed categories of satisfaction ratings for willingness of the staff (without and with shelter at s = 7)
Accordingly, the output reports the uncertainty and feeling estimates (given below only for the fitted CUB model with shelter at s = 7).
6 Conclusions
Compared with more consolidated approaches derived mainly from cumulative models (McCullagh 1980), which are the leading pathway to analyzing ordinal data, the CUB paradigm offers wider possibilities from both the interpretative and the graphical points of view. In addition, an important consequence is that CUB models are not constrained to include covariates as explanatory tools in order to yield consistent models for data fitting, prediction, and classification. This opportunity allows the introduction of more flexible methods to manage and compare rating responses.
In this framework, the
Improvements of
Beyond methodological extensions, CUB modeling is also under active development for applications; in this respect, we cite original marketing research in the field of food preferences and sensory analysis: Piccolo and D’Elia (2008), Iannario et al. (2012), Corduas, Cinquanta, and Ievoli (2013), Capecchi et al. (2016), Mauracher, Procidano, and Sacchi (2016), and Contini et al. (2016). We also cite recent perspectives and applications (Hwang, Sohn, and Oh 2015; Low 2017; Finch and Hernández Finch 2020; Hu, Zhou, and Sharma 2020; Xu and Zhang 2021), which provide evidence of increasing international interest in the CUB paradigm.
7 Programs and supplemental materials
Supplemental Material, sj-zip-1-stj-10.1177_1536867X221083927 for Fitting mixture models for feeling and uncertainty for rating data analysis by Giovanni Cerulli, Rosaria Simone, Francesca Di Iorio, Domenico Piccolo and Christopher F. Baum in The Stata Journal
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
Notes
References
