Sage Journals: Discover world-class research

Abstract

In this work, we describe the new command zonotope, which, by resorting to a geometry-based approach, provides a measure of productivity that fully accounts for the existing heterogeneity across firms within the same industry. The method we propose also enables assessment of the extent of multidimensional heterogeneity with applications to fields beyond that of production analysis. Finally, we detail the functioning of the software to perform the related empirical analysis, and we discuss the main computational issues encountered in its development.

Keywords

st0662 zonotope heterogeneity measures production analysis multivariate Gini coefficient

1 Introduction

In this work, we present a command, zonotope, that allows measurement of the existing heterogeneity among multidimensional units that can also be nested in groups at different levels of aggregation. Here we document the application of the methodology in the context of production analysis in economics; however, the possible range of applications is much wider. Within the proposed field, the method also provides a measure of productivity (both at the aggregate and at the individual levels) and its change over time that allows us to relax many of the standard assumptions of the theory, which have been falsified by recent work on actual data.

Traditionally, empirical analysis in economics has suffered from the scarcity of disaggregated sources of data (that is, at the level of individual, household, enterprise, etc.) such that, in the analysis of behaviors at the micro level, much was left to theoretical analysis. This situation oftentimes required heroically simplifying assumptions on the behavior of agents, the tradeoffs they were facing, and the absence of any path dependency, among other things. Nowadays, the ever-growing availability of disaggregated data on business firms has revealed a much richer picture than that previously conjectured based on theories alone or on aggregate industry-level data.

Firms are different along most of the dimensions typically taken into consideration by economic analyses. To provide a brief account of what is at stake, consider that even firms within the same, narrowly defined industry display very different levels of productivity, in terms of both labor productivity and total factor productivity.¹ Relative input intensities also differ markedly (Dosi and Grazzi 2006), even when input prices are relatively similar. At least as relevant, such heterogeneities are persistent over time; that is, if there is any selection at work, its effects take much longer to show (Dosi 2007). The ubiquitous presence of such heterogeneity has been vividly expressed by Griliches and Mairesse (1999): “We […] thought that one could reduce heterogeneity by going down from general mixtures as ‘total manufacturing’ to something more coherent, such as ‘petroleum refining’ or ‘the manufacture of cement’. But something like Mandelbrot’s fractal phenomenon seems to be at work here also: the observed variability-heterogeneity does not really decline as we cut our data finer and finer. There is a sense in which different bakeries are just as much different from each other as the steel industry is from the machinery industry.”

The evidence recalled above poses several challenges to the standard theory of production, to the related empirical applications based on the notion of a “representative” firm or an industry production function and, of course, to the estimation of such a production function itself. The observed combinations of inputs chosen by firms are quite dispersed, hardly displaying any regularity resembling a conventional isoquant. Further, although output increases in both inputs, as expected, it does so in a nonmonotonic way (Dosi and Grazzi 2006). In addition, empirical observations challenge the degree to which firms are assumed to substitute one input for another.

Given all that, how can one obtain measures of productivity, at both the firm and the industry levels, that do not require assumptions the data do not meet and that, on the contrary, take into account such pervasive heterogeneity within the industry? In this article, we present a new command, zonotope, that enables measurement of productivity in the presence of such firm-level heterogeneity. zonotope also allows for measuring the degree of heterogeneity of the firms making up the industry and assessing its variation over time. This latter feature can be applied in a wide range of fields beyond production analysis. For example, based on the inclusion of zonotopes and path polytopes, Andreoli and Zoli (2014) frame dissimilarity comparisons of sets of group distributions, which can be applied to multigroup comparisons of segregation, discrimination, and mobility, as well as to inequality evaluations.

To illustrate heterogeneity and its relevance, we provide some empirical evidence in figure 1 that focuses on the labor productivity distribution, where labor productivity is defined as the ratio of deflated turnover to the number of employees. The details on these two variables can be found in section 4.1.²

Figure 1.

Empirical distribution of (log) labor productivity in a given 3-digit sector and two nested 4-digit sectors

In figure 1(a), the productivity distribution of a given 3-digit sector (the solid line) is widely dispersed, revealing a huge productivity gap between the most productive firms and the least productive ones. Because productivity is plotted on a log scale, which compresses differences, the heterogeneity in levels is even greater than it appears. This heterogeneity does not disappear when focusing on more similar firms (the firms within the same 4-digit industrial classification,³ that is, the dashed and dotted lines); we still observe markedly different productivity levels among firms. The persistence of heterogeneity indicates not only that heterogeneity holds when increasing the level of disaggregation but also that it holds over time.

As recalled above, not only do firms within the same industry display very different levels of efficiency, but also they employ very diverse production techniques. This is illustrated in figure 2.

Figure 2.

Contour plots of adopted techniques and output in 3- and 4-digit sectors

Figure 2(a) provides a representation of production activities of firms within a given 3-digit industry⁴ assuming the standard 2-input-1-output production, where the axes represent inputs (labor and capital are proxied by number of employees and fixed assets, respectively) and the contour line displays a constant level of output as proxied by turnover value. Thus, each firm within this industry, as one observation in our empirical data sample, can in principle be represented by one point in this contour plot.

In the input plane, we first plot all such points representing the firms’ labor and capital combinations from empirical data and then plot the isoquants indicating the possible combinations of labor and capital corresponding to the same output level. As observed, dots in the input plane are quite dispersed, while the corresponding isoquants do not display any regularity resembling a conventional production function.⁵ Furthermore, the output increases with the inputs in a nonmonotonic manner. To be specific, given a quantity of one input, different firms attain the same level of output with very different levels of the other input. Again, notice that this type of heterogeneity does not disappear when we increase the disaggregation of the industrial classification; similar phenomena can be observed in 4-digit-level industries, as reported in figure 2(b).

To sum up, this empirical evidence suggests that, within one industry, firms not only display very different levels of productivity but also adopt very heterogeneous production techniques. As a result, accounting for such evident differences among firms would greatly improve the measurement of productivity and of its change over time. In the next section, we turn to this task.

2 The zonotope approach to production analysis

In this section, we briefly outline the geometric approach to production analysis on which we rely for the proposed software packages. For a more detailed exposition, we refer the reader to Hildenbrand (1981) and Dosi et al. (2016).

The seminal work by Hildenbrand (1981) suggests an agnostic and data-oriented approach which, instead of estimating some aggregate production function, offers a representation of the empirical production possibility set of an industry in the short run based on actual microdata. In this setting, each firm (or, for that matter, each establishment) is represented as a point in the input–output space, and the production possibility set of any given industry is represented geometrically by the finite (Minkowski) sum of all the line segments linking the origin to the points representing the production units. This set is called a zonotope. Based on this zonotope framework, Dosi et al. (2016) show that by further exploiting the properties of zonotopes, it is possible to obtain rigorous measures of heterogeneity and productivity without imposing on the data a model like that implied by standard production functions.

Following Koopmans (1977), Hildenbrand (1981), and, more recently, Dosi et al. (2016), we denote the production activity of production unit i, which represents its actual technique of production, by the vector

a_{i} = (α_{i_{1}}, \dots, α_{i_{l}}, α_{i_{l + 1}}) \in ℝ_{+}^{l + 1} \tag{1}

which indicates that, during the current period, this production unit, at its best, can produce $α_{i_{l + 1}}$ units of output by means of $(α_{i_{1}}, \dots, α_{i_{l}})$ units of input. Then we can define the short-run production possibilities of an industry with N units during the current period by a finite family of production activity vectors $\{a_{i}\}_{1 \leq i \leq N}$. Notice that any vector $a_{i}$ from the collection $\{a_{i}\}_{1 \leq i \leq N}$ in $ℝ_{+}^{l + 1}$ can be associated with a line segment

[0, a_{i}] = \{ s_{i} a_{i} \mid s_{i} \in ℝ, 0 \leq s_{i} \leq 1 \}

Further, under the assumption that N ≥ l + 1, Hildenbrand defines the short-run total production set associated with the family $\{a_{i}\}_{1 \leq i \leq N}$ as the Minkowski sum

Y = \sum_{i = 1}^{N} [0, a_{i}]

of line segments generated by the production activities $\{a_{i}\}_{1 \leq i \leq N}$ and, more explicitly, defines the short-run feasible industry production function as the zonotope

Y = \{ y \in ℝ_{+}^{l + 1} \mid y = \sum_{i = 1}^{N} ϕ_{i} a_{i}, 0 \leq ϕ_{i} \leq 1 \}
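In the one-input, one-output case, the Minkowski sum of segments can be drawn explicitly. The Python sketch below is our own illustration and is not part of the zonotope command; the function names are hypothetical. It assumes nonnegative planar generators. It walks the boundary counterclockwise from the origin to the diagonal by adding the generators in increasing polar angle, then returns to the origin by subtracting them in the same order. The shoelace formula gives the area of the resulting polygon.

```python
import math

def zonotope_polygon_2d(generators):
    """Vertices (counterclockwise, starting at the origin) of the planar
    zonotope sum_i [0, a_i], assuming generators in the nonnegative quadrant."""
    # Zero vectors do not change the Minkowski sum, so drop them.
    gens = [(float(g[0]), float(g[1])) for g in generators
            if g[0] != 0 or g[1] != 0]
    # Boundary edges from the origin to the diagonal, in increasing angle.
    gens.sort(key=lambda g: math.atan2(g[1], g[0]))
    vertices = [(0.0, 0.0)]
    x, y = 0.0, 0.0
    for gx, gy in gens:            # lower chain: origin -> d = sum a_i
        x, y = x + gx, y + gy
        vertices.append((x, y))
    for gx, gy in gens[:-1]:       # upper chain: d -> origin (last step implicit)
        x, y = x - gx, y - gy
        vertices.append((x, y))
    return vertices

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its vertices."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

print(zonotope_polygon_2d([(4, 2), (2, 4)]))
```

For two generators, the polygon is the parallelogram with vertices (0, 0), (4, 2), (6, 6), and (2, 4). Collinear generators collapse the polygon onto a segment of zero area.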

Hildenbrand also defines his short-run efficient industry production function within the zonotope framework. Let us project the zonotope Y defined above onto its first l coordinates and denote this projection by D:

D = \{ u \in ℝ_{+}^{l} \mid \exists x \in ℝ_{+} \text{ s.t. } (u, x) \in Y \}

His production function $F : D \to ℝ_{+}$ is then defined as

F (u) = \max \{ x \in ℝ_{+} \mid (u, x) \in Y \}

This definition implies that, given the levels $u_{1}, \dots, u_{l}$ of inputs for the industry, F(u) is the maximum total output that could be achieved by allocating the amounts $u_{1}, \dots, u_{l}$ of inputs, without any restriction, over the individual production units within the industry in one of the most efficient ways. However, the frontier associated with this production function does not provide any information on the actual technological setup of the whole industry. Nor can this production function serve as the focal reference, from either a positive or a normative point of view (Hildenbrand 1981).
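With a single input (l = 1), the maximization that defines F(u) is a fractional allocation problem. It can be solved greedily: the input goes to units in decreasing order of output per unit of input, and each unit is used at most once (0 ≤ ϕ_i ≤ 1). The Python sketch below is our own illustration under this one-input assumption. The function name is hypothetical, and the code is not part of the zonotope command.

```python
def efficient_output(generators, u):
    """Hildenbrand's F(u) for one input (l = 1).

    generators: (input, output) pairs with nonnegative entries.
    u: total input to allocate, 0 <= u <= total input (the set D).
    """
    total_input = sum(x for x, _ in generators)
    if not 0 <= u <= total_input:
        raise ValueError("u must lie in D = [0, total input]")
    # Units producing without any input are always fully activated.
    output = sum(y for x, y in generators if x == 0)
    # Remaining units, most productive (highest output/input) first.
    ranked = sorted((g for g in generators if g[0] > 0),
                    key=lambda g: g[1] / g[0], reverse=True)
    remaining = u
    for x, y in ranked:
        phi = min(1.0, remaining / x)   # share phi_i of unit i's activity
        output += phi * y
        remaining -= phi * x
        if remaining <= 0:
            break
    return output

firms = [(4, 2), (2, 4)]
print([efficient_output(firms, u) for u in (0, 2, 4, 6)])
```

With these two firms, F(4) = 5. The more productive firm (2, 4) is used fully, and half of the activity of firm (4, 2) is added. F therefore traces the upper boundary of the planar zonotope, which in general differs from what the average firm achieves.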

Within the zonotope framework, Dosi et al. (2016) define the main diagonal of a zonotope Y as the diagonal joining the origin $O = (0, \dots, 0) \in Y \subset ℝ^{l + 1}$ with its opposite vertex in Y. They call this diagonal the production activity of the industry, because it expresses both the amount of inputs used and the output produced by the industry. To be specific, this diagonal, denoted by $d_{Y}$, is simply the sum of the individual production activities of the N production units in the industry, that is,

d_{Y} = (β_{1}, \dots, β_{l}, β_{l + 1}) = (\sum_{i = 1}^{N} α_{i_{1}}, \dots, \sum_{i = 1}^{N} α_{i_{l}}, \sum_{i = 1}^{N} α_{i_{l + 1}}) \in ℝ_{+}^{l + 1} \tag{2}

Obviously, if all firms in one industry were to use the same technique in a given year, all the vector-firms would lie on the same line. This is the case where only one technology is adopted and all the firms within the industry are homogeneous. In this case, the associated zonotope would degenerate to one with null volume, that is, coinciding with the diagonal $d_{Y}$. On the other hand, the maximal heterogeneity case occurs when one industry includes some firms with almost zero inputs but sufficient output and other firms with a large quantity of inputs but little output. In such a case, the generated zonotope almost becomes a parallelotope. Starting from this simple observation on the two extreme cases, it is possible to derive a rigorous measure of industry heterogeneity.

First, for indices $1 \leq i_{1} < \dots < i_{l + 1} \leq N$, let $A_{i_{1}, \dots, i_{l + 1}}$ be the $(l + 1) \times (l + 1)$ matrix whose rows are the generators $a_{i_{1}}, \dots, a_{i_{l + 1}}$, and let $Δ_{i_{1}, \dots, i_{l + 1}}$ be its determinant. The volume of the zonotope Y in $ℝ^{l + 1}$ can then be computed with the formula

Vol (Y) = \sum_{1 \leq i_{1} < \dots < i_{l + 1} \leq N} | Δ_{i_{1}, \dots, i_{l + 1}} | \tag{3}

where $| Δ_{i_{1}, \dots, i_{l + 1}} |$ is the absolute value of the determinant $Δ_{i_{1}, \dots, i_{l + 1}}$. However, the value of Vol(Y) depends both on the units in which inputs and output are measured and on the number of firms. To normalize the measure, the volume of the zonotope Y generated by the production activities $\{a_{i}\}_{1 \leq i \leq N}$ is divided by the volume of the parallelotope with diagonal $d_{Y} = \sum_{i = 1}^{N} a_{i}$. For a fixed main diagonal, this parallelotope is the zonotope with the largest volume. The ratio is defined as

G (Y) = \frac{Vol (Y)}{Vol (P_{Y})} \tag{4}

where $P_{Y}$ denotes the parallelotope with diagonal $d_{Y}$ and $Vol(P_{Y})$ its volume. The normalized volume, G(Y), is named the Gini volume.⁶
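Formulas (3) and (4) translate directly into code. The Python sketch below is our own illustration, not the implementation used by the zonotope command, and the helper names are hypothetical. It uses only the standard library and computes determinants by Gaussian elimination. Because it enumerates every (l + 1)-subset of generators, it is practical only for small N. The parallelotope volume Vol(P_Y) is taken as the product of the components of $d_{Y}$, that is, the volume of the box with that main diagonal.

```python
from itertools import combinations
import math

def det(rows):
    """Determinant of a square matrix (list of rows), Gaussian elimination
    with partial pivoting."""
    m = [[float(x) for x in r] for r in rows]
    n = len(m)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        if m[p][c] == 0.0:
            return 0.0                 # singular matrix
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result           # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return result

def zonotope_volume(gen):
    """Formula (3): sum of |det| over all (l+1)-subsets of the rows of gen."""
    dim = len(gen[0])
    return sum(abs(det([gen[i] for i in idx]))
               for idx in combinations(range(len(gen)), dim))

def gini_volume(gen):
    """Formula (4): Vol(Y) divided by the volume of the box with diagonal d_Y."""
    d = [sum(col) for col in zip(*gen)]   # d_Y, formula (2)
    return zonotope_volume(gen) / math.prod(d)

print(gini_volume([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
print(gini_volume([(1, 1, 1), (2, 2, 2), (3, 3, 3)]))
```

The two extreme cases discussed above can be checked directly. Three orthogonal generators span a box (a parallelotope) and give G(Y) = 1. Collinear generators give a degenerate zonotope and G(Y) = 0.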

Aside from the heterogeneity measure, Dosi et al. (2016) also suggest that the angle formed by the industry production activity vector $d_{Y}$ with the space generated by all inputs expresses the industry productivity, and that the tangent of this angle is an appropriate measure.⁷ To be specific, the measure of productivity P for a given industry including N firms in the current period is

P = tg {Θ_{l + 1} (d_{Y})} = \frac{\sum_{i = 1}^{N} α_{i_{l + 1}}}{\| \mathrm{pr}_{- (l + 1)} (d_{Y}) \|} \tag{5}

where, for any vector $v = (x_{1}, \dots, x_{k}) \in ℝ^{k}$ with $k = 2, 3, 4, \dots$, the projection map $\mathrm{pr}_{-j}(\cdot)$ is defined as

\begin{array}{l} \mathrm{pr}_{-j} (v) : ℝ^{k} \to ℝ^{k - 1} \\ (x_{1}, \dots, x_{k}) \mapsto (x_{1}, \dots, x_{j - 1}, x_{j + 1}, \dots, x_{k}) \end{array}

$Θ_{j}(v)$ represents the angle formed by the vector v and the subspace spanned by the coordinate axes retained in $\mathrm{pr}_{-j}(v)$, and $\|v\|$ represents the norm of the vector v. Similarly, given the industry input vector $w = \mathrm{pr}_{- (l + 1)} (d_{Y}) \in ℝ_{+}^{l}$, the angle $Θ_{i}(w)$ and its tangent measure the relative intensity of input i with respect to all the other inputs, for $i = 1, \dots, l$.
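The projection and the tangents in (5) reduce to a few lines of code. The Python sketch below is our own illustration; the function names are hypothetical. It uses 0-based indices, so the output sits at index l rather than l + 1. Because the projection drops one coordinate, the tangent of $Θ_{j}(v)$ equals the dropped component divided by the norm of the remaining ones.

```python
import math

def tangent_against(v, j):
    """tg Theta_j(v) = v_j / ||pr_{-j}(v)||, with j a 0-based index.
    Requires at least one other nonzero component."""
    rest = [x for k, x in enumerate(v) if k != j]   # pr_{-j}(v)
    return v[j] / math.hypot(*rest)

def industry_productivity(d):
    """P in (5): output (last component) against the whole input space."""
    return tangent_against(d, len(d) - 1)

def input_intensities(d):
    """tg Theta_i(w) for each input i, where w = pr_{-(l+1)}(d) drops the
    output; meaningful only with at least two inputs."""
    w = d[:-1]
    return [tangent_against(w, i) for i in range(len(w))]

# A single vector-firm with (employees, fixed assets, turnover):
print(industry_productivity((19, 2969.52, 29127.35)))
```

Applied to a single firm vector, the same formula gives the firm-level tangent. For the firm above, it yields about 9.80857.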

A few remarks are needed on the nature of the measure, its relation to similar techniques, and its scope of applicability. Notice that the volume of the zonotope, as well as the angles, are measures and not estimates. Further, although the productivity measure bears some similarity to data envelopment analysis in production (see, among others, Farrell [1957], Charnes, Cooper, and Rhodes [1978], and Simar and Zelenyuk [2011]), the two methods are clearly distinct: the zonotope approach considers all firms, not just those on the frontier. Both approaches are data driven and nonparametric, but data envelopment analysis focuses on constructing the efficient frontier of the industry by enveloping the data and on proxying the efficiency of each individual firm (for implementations in Stata, see Ji and Lee [2010] and Badunenko and Mozharovskyi [2016]). Empirically, the traditional deterministic approach faces some issues, which have been addressed by recent work. To be specific, Daraio and Simar (2007) deal with the sensitivity to measurement errors and outliers, and Cazals, Florens, and Simar (2002) and Daraio and Simar (2005) propose robust frontiers. In addition, there exists a command that deals with such issues; see Belotti et al. (2013).

As recalled above, the zonotope approach provides a general framework not necessarily limited to industry heterogeneity and productivity analysis. For example, Aruka (2017) argues that the zonotope framework provides a different view of the production set and discloses new possibilities for modeling international trade. In the latter field, he suggests a further generalization of the model so that it is possible to jointly assess more than three countries and three commodities, thus abandoning the limiting special case of two countries and two commodities. In a rather different domain, the software application we are presenting here can be used to compute measures of multivariate disparity similar to those outlined in Koshevoy and Mosler (1996, 1997).

In sum, the proposed framework allows the investigation of the production and productivity dynamics of firms while taking into account the existing heterogeneity across firms, which is high and persistent even within the same industry. On the one hand, the geometry-based approach enables us to relax many existing assumptions that have often been shown to lack empirical support. On the other hand, although it accommodates production spaces with several inputs and outputs, in two or three dimensions it also provides a visual illustration of the trends of the industry and of individual firms.

To reinforce the last point, we take advantage of the simple one-input, one-output setting, as depicted in figure 3.

Figure 3.

One-input, one-output examples of zonotopes $Y = \sum_{i = 1}^{2} [0, a_{i}^{t}] \subset ℝ^{2}$

There are two firms, represented by $a_{1}^{t}$ and $a_{2}^{t}$ as defined in (1). In figure 3(a), $a_{1}^{t} = (4, 2)$ indicates that firm 1 produces 2 units of output using 4 units of input. The industry production activity, the sum of the vector-firms as defined in (2), is represented by $d^{t} = (6, 6)$. In figure 3(b), the industry production activity is identical, but firm heterogeneity is greater. Comparing the two plots, the higher heterogeneity corresponds to the bigger parallelogram (the areas are 12 and 24, respectively), which suggests using the volume of the zonotope defined in (3) to measure heterogeneity. However, as discussed, a more refined measure is needed. A normalization [as in (4)] ensures that differences are not due to, for instance, different units of measure, which may well occur in real-world data. Specifically, the area is normalized by dividing it by the area of the corresponding square, that is, the square with diagonal $d^{t}$. This graphical illustration also provides the intuition behind the productivity measures for the industry [defined in (5)] and for the firm [defined in (6)] in $ℝ^{2}$. In this respect, figure 3(a) shows that the tangent of $Θ(a_{1}^{t})$ is indeed the ratio of firm 1’s output to its input, that is, labor productivity, which is one of the most popular productivity measures when only one input is considered.
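The figure 3 numbers can be checked by hand. Since $d^{t} = (6, 6)$ and $a_{1}^{t} = (4, 2)$, the second firm in figure 3(a) must be $a_{2}^{t} = d^{t} - a_{1}^{t} = (2, 4)$, and the square with diagonal $d^{t}$ has area $6 \cdot 6 = 36$. Applying (3) and (4) to figure 3(a) gives the first line below. The second line uses the area of 24 reported for figure 3(b), which has the same diagonal; there, $Y_{(b)}$ is our shorthand for the zonotope in figure 3(b).

Vol (Y) = \left| \det \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix} \right| = | 16 - 4 | = 12, \qquad G (Y) = \frac{12}{36} = \frac{1}{3}

G (Y_{(b)}) = \frac{24}{36} = \frac{2}{3}, \qquad tg {Θ (a_{1}^{t})} = \frac{2}{4} = 0.5

The normalized measure therefore ranks figure 3(b) as twice as heterogeneous as figure 3(a), and firm 1 produces half a unit of output per unit of input.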

We conclude the section with table 1, which summarizes the key concepts of the proposed zonotope method.

Table 1.

Key concepts of the proposed method

Key concept	Details
Individual production activity $a_{i}$	A vector in $ℝ_{+}^{l + 1}$. Intuitively, it represents the multiple-input, one-output production activity of one firm: its first l elements are inputs, while the (l + 1)th element is the output.
Zonotope Y	A description of the whole industry obtained by combining all firms. Formally, it is the Minkowski sum of the line segments generated by the firm production activities $\{a_{i}\}_{1 \leq i \leq N}$.
Gini G(Y)	The normalized volume of the zonotope, which measures industry heterogeneity: the bigger the index, the larger the differences across firms.
Productivity $P = tg \{Θ_{l + 1}(d_{Y})\}$	The tangent of the angle formed by the vector-firm (or vector-industry) and the input hyperplane. In a two-inputs, one-output case, the “steeper” the vector-firm (or vector-industry), the more productive the firm (or industry).

3 The zonotope command

In this section, we introduce the command zonotope.⁸

3.1 Syntax

The syntax of the command to compute the zonotope is

zonotope varlist [if] [in] [, verbose]

where the option verbose requests that the program print on screen all the computed quantities (see below). Instructions for installing the package under different operating systems are provided in section 8.

3.2 Description

The zonotope command requires a list of vector variables, where the last variable is the output and the other variables are the inputs of the relation we want to analyze. All the vector variables must have the same length. All the variables can be seen as the columns of a single matrix, gen (that is, the matrix of generators), whose ith row coincides with the ith generator $a_{i}$, as defined in (1). The number of columns of this matrix, l + 1, is the dimension of the space in which the zonotope lies. Because the zonotope command assumes that the generators are nonnegative, all entries of this matrix should be nonnegative.

3.2.1 Output and return value

The zonotope command returns two vectors, diagonal and tangents, and one matrix, gen (it contains the generators actually used, which is important when the if or in qualifier is used). In addition, it returns several scalar values, such as

nrow: the number of generators actually used (that is, N)

ncol: the number of variables (it coincides with l + 1 if the program has been called with l input variables and one output variable)

etMIN: the elapsed time (expressed in minutes)

S1, S2,…, S8: statistics that are detailed below

When the zonotope command is called with the verbose option, it shows all the returned variables (both vectors and scalars) on the screen.

All the scalars returned by zonotope can be accessed through r(). For example, the volume of the zonotope can be displayed with the command

display r(S1)

and the elapsed time can be displayed with

display r(etMIN)

3.2.2 Output vector: diagonal

The output vector diagonal contains $d_{Y}$, the geometric diagonal of the zonotope, which, according to (2), is the sum of the generators $a_{i}$ for $i = 1, \dots, N$. Specifically,

d_{Y} = (β_{1}, \dots, β_{l}, β_{l + 1}) = (\sum_{i = 1}^{N} α_{i_{1}}, \dots, \sum_{i = 1}^{N} α_{i_{l}}, \sum_{i = 1}^{N} α_{i_{l + 1}})

Clearly, it is an (l + 1)-dimensional (row) vector. It can be easily displayed on screen (or reused) using matrix list diagonal.

3.2.3 Output vector: tangents

The output vector r(tangents) contains the tangent of the angle formed by each generator and the input space. Thus, it is an N-dimensional (column) vector. It can be easily displayed on screen (or reused) using matrix list tangents.

3.2.4 Output matrix: gen

The output matrix gen contains the generators actually used, as filtered by if and in qualifiers. It can be easily displayed on screen (or reused) using matrix list gen.

We report below a brief description of the eight statistics mentioned above. Some of them have a clear economic interpretation because they directly correspond to the key concepts of the proposed method, while others are necessary intermediate steps toward the measure of interest.

Statistic S1: Volume

Given N generators $a_{i} \in ℝ_{+}^{l + 1}$, we can generate a zonotope, again denoted by Y, and compute its volume as

S 1 \equiv Vol (Y)

where Vol(·) follows (3). We report this volume as S1, which is printed on screen and also returned as an output scalar. S1 (Volume) provides only a preliminary idea of industry heterogeneity, but it represents an intermediate step to compute the more refined measure of the Gini index.

Statistic S2: Diagonal’s norm

The norm of the diagonal, $\|d_{Y}\|$, is computed as the square root of the sum of the squares of all the components of the diagonal, that is,

S 2 \equiv ‖ d_{Y} ‖ = \sqrt{\sum_{i = 1}^{l + 1} β_{i}^{2}}

The length of the diagonal of the zonotope represents the “size” of the industry; the longer this vector, the bigger the industry is.

Statistic S3: Sum of squared norms of all the generators

As the name indicates, we first compute, for each generator, the sum of the squares of its components, and then we sum over all generators.

S 3 = \sum_{i = 1}^{N} (\sum_{j = 1}^{l + 1} α_{i_{j}}^{2})

Statistic S4: Gini index

According to (4), this Gini index is computed as the ratio between the volume of the zonotope and the product of the components of the diagonal. We rewrite this industry heterogeneity measure as follows:

S 4 \equiv G (Y) = \frac{Vol (Y)}{Vol (P_{Y})} = \frac{S 1}{\prod_{j = 1}^{l + 1} β_{j}}

where $P_{Y}$ denotes the parallelotope with diagonal $d_{Y}$. The normalized volume of the zonotope, as already discussed, is a more refined measure of industry heterogeneity. The bigger this index, the more heterogeneous the industry.

Statistic S5: Tangent of angle formed by diagonal and input space

Given the diagonal vector $d_{Y}$, according to (5), we can further compute the tangent of the angle formed by the diagonal and the input space. We report this industry productivity measure as S5:

S 5 \equiv tg {Θ_{l + 1} (d_{Y})} = \frac{β_{l + 1}}{\sqrt{\sum_{j = 1}^{l} β_{j}^{2}}}

Intuitively, it provides our measure for industry productivity. In a two-inputs, one-output setting, the “steeper” the vector, the more productive is the industry.

Statistic S6: Cosine against output

S6 reports the cosine of the complementary angle of $Θ_{l + 1}(d_{Y})$, that is, of the angle formed by the diagonal and the output axis:

S 6 = \frac{β_{l + 1}}{\sqrt{\sum_{j = 1}^{l + 1} β_{j}^{2}}}

Because of the complementary angle relationship, S5 and S6 are connected as follows:

S 5 = \frac{S 6}{\sqrt{1 - {(S 6)}^{2}}}

Statistic S7: Cosine of diagonal projected on input plane with x axis

The angle formed by the x axis and the projection of the diagonal on the input plane measures the intensity of the first input relative to the other inputs. We report its cosine as

S 7 = \frac{β_{1}}{\sqrt{\sum_{j = 1}^{l} β_{j}^{2}}}

Statistic S8: Volume against cube of the norm of the diagonal

The last statistic is the ratio between the volume of the zonotope and the cube of the norm of the diagonal.

S 8 = \frac{Vol (Y)}{| | d_{Y} | |^{3}} = \frac{S 1}{{(S 2)}^{3}}

This definition of volume is restricted to the boundary edges of the cone of all possible vectors, and as such, it considers only the most diverse vectors. It attempts to measure the maximal diversity of the field (hence, it takes into account only the boundary vertices) by measuring how “wide” the cone is.

Other than these statistics, the command also provides the elapsed time, expressed in minutes.
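Apart from the volume, the statistics are simple functions of the generators and their diagonal. The Python sketch below is our own illustration, not the command's code. The function name and the dictionary keys are hypothetical. It takes the generators (rows are firms, last column is output) and a volume S1 supplied by the caller, for instance computed with formula (3), and recomputes S2 to S8 as defined above. S7 treats the first input as the x axis, as in the text.

```python
import math

def zonotope_statistics(gen, volume):
    """Recompute S1-S8 from the generators and a given zonotope volume."""
    d = [sum(col) for col in zip(*gen)]            # diagonal d_Y, formula (2)
    beta_in, beta_out = d[:-1], d[-1]
    norm_d = math.sqrt(sum(b * b for b in d))
    norm_in = math.sqrt(sum(b * b for b in beta_in))
    return {
        "S1": volume,
        "S2": norm_d,                              # size of the industry
        "S3": sum(x * x for row in gen for x in row),
        "S4": volume / math.prod(d),               # Gini volume, formula (4)
        "S5": beta_out / norm_in,                  # industry productivity, (5)
        "S6": beta_out / norm_d,                   # cosine against output
        "S7": beta_in[0] / norm_in,                # cosine with the x axis
        "S8": volume / norm_d ** 3,
    }

s = zonotope_statistics([(4, 2), (2, 4)], 12.0)
# Identity between S5 and S6 stated above:
print(s["S5"], s["S6"] / math.sqrt(1 - s["S6"] ** 2))
```

S7 depends only on the input part of the diagonal. Feeding the industry input vector (5531, 459015) from the working example in section 3.3 therefore reproduces the value of about 0.0120488 reported there.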

3.3 A working example of the zonotope command

In this section, we provide a step-by-step working example that shows the use of the zonotope command, with comments on the related output. The general framework, as before, is production analysis.

We start by loading the dataset into Stata and listing the first five rows:

The first column indicates the year, and the second reports the firm’s sector of main activity. Columns 3 to 6 report, respectively, number of employees, fixed assets, material cost, and turnover. The last column contains a fictitious firm ID. In the interest of replicability, we provide artificially generated data that display the same distributional properties as the “real” data used in other works, such as Dosi et al. (2016) and Dosi et al. (2021). Each row in the dataset represents a different firm in a given year and industry.

As usual in the literature, the number of employees and fixed assets are used as proxies for labor and capital inputs, respectively, and turnover is used for output. In this two-inputs, one-output setting, the zonotope command allows analysis of the industry in ℝ³. For example, focusing on a specific year and sector, one can type

We will go through the output step by step, commenting on the results relevant to the investigation.

The first part of the output, shown above, reports the version of the package, the number of dimensions of the current analysis (3 in the example), and the number of generators, which in this case are firms (160).

The second part of the output, shown above, reports the production activity of the whole industry, the diagonal of the zonotope, as defined by (2), which is the result of the aggregation of the 160 generator-firms. The total number of employees in this industry is 5,531, while the total fixed assets and turnover of this industry are equal to 459,015 and 2,846,360 thousand Euros, respectively.

For the sake of exposition, we report only the tangents of the first five firms of this example. As will be discussed in section 4.2, we extend the definition of industry productivity, (5), to individual firm productivity, as proxied by the angle that the vector-firm forms with the input plane. For example, firm 145725 produces 29,127.35 thousand Euros of output from 19 employees and 2,969.52 thousand Euros of fixed assets. According to (6), the productivity of firm 145725, as proxied by the tangent of this angle, is

\frac{29127.35}{\sqrt{19^{2} + {2969.52}^{2}}} = 9.80857

Recall that a higher value of the tangent of the angle suggests a higher productivity because the vector-firm is steeper; that is, the angle that the vector forms with the input plane is larger. Figure 3 provides a simple illustration of this, in a one-input, one-output setting.

As with many other efficiency indicators that use multiple inputs, the value of the measure taken in isolation is of limited use. It is much more relevant to compare firms within the same industry or to compare the same unit over time. This is the sort of comparison proposed in table 2 in section 4.1.

As shown above, the zonotope command then produces the eight statistics described in section 3.2. For example, S2 reports the length of the diagonal, which measures the size of the industry, accounting for two inputs and one output.

S4 reports the Gini index, which measures the heterogeneity of the sector, while S5 reports industry productivity as defined in (5). We strongly suggest using the measure of heterogeneity of a sector (or, in a different context, of the relevant aggregate) to make cross-sectional or intertemporal comparisons. This is how the measures of heterogeneity are used in Dosi et al. (2016) and Dosi et al. (2021). The same argument applies to S5, the tangent of the angle between the diagonal and the input plane, and to the firm-level tangents. For example, it is difficult to conclude whether one industry in a specific country is highly productive based only on the magnitude of S5. But if we have the S5 productivity levels for the same industry in two different countries, then it is possible to determine in which country that industry is more productive. Or if we have the S5 productivity level for one industry in one specific country over time, it is then possible to determine whether productivity in that industry has increased. In section 4.1, we provide a more detailed example regarding the dynamics of heterogeneity and productivity for a few selected industries over time.

The S7 statistic reports the cosine of the angle⁹ formed by the projection of the diagonal on the input plane with the x axis (which corresponds to the number of employees), that is, the angle formed by the industry input vector (5531, 459015) with the x axis. The value 0.0120488 indicates that this angle is almost π/2 (about 89.3 degrees), telling us that, in the units used here, the industry employs far more fixed assets than employees.

The last bit of output is displayed as follows:

Elapsed time (MIN): 0.001383

This is the computation time. As we explain in more detail in section 5, building the zonotope of a set of generators has a high computational complexity (exponential in the dimension). It can therefore be very time consuming, especially in high dimensions and with many generators.

It is possible to display (or reuse) the computed results as follows:

4 Empirical analysis

In this section, we provide three additional, more-advanced examples of using zonotope to perform economic investigations. In section 4.1, based on firm-level data, we compute the heterogeneity and productivity levels of industries as suggested by Dosi et al. (2016). In section 4.2, we compute firm-level measures of productivity. And in section 4.3, based on the income and expenditure data of Namibia, we compute the multivariate Gini coefficient as a measure for inequality as suggested by Mosler (1994) and Koshevoy and Mosler (1997).

4.1 Assessing industry heterogeneity

In this section, we conduct some empirical investigations on industry heterogeneity and productivity. We do this by using larger sets of data to replicate the standard situation faced during empirical analysis on firm-level data, where firms are grouped according to their sector of main activity, at different levels of aggregation.

For replication of all the analyses, we use the same data introduced in the previous section. For each firm in the selected industries, number of employees and fixed assets are chosen to proxy inputs, and turnover is chosen to proxy output. All values except the number of employees are assumed to be in thousands of euros and are deflated at the 4-digit NACE level, with 2010 as the base year, to allow intertemporal comparisons. We refer to the 4-digit NACE sectors simply as 0001, 0002, etc. When it is necessary to aggregate the 4-digit sectors into 3-digit sectors, we use 0091, 0092, and 009, as we did in figures 1 and 2.

4.1.1 The two-inputs, one-output case

The most standard setting for investigating production assumes two inputs and one output. Number of employees and fixed assets are the proxies for inputs, and turnover is the proxy for output. For each industry in a given year, the normalized zonotope volume and the tangent of the angle formed by the industry production vector with the input plane are computed as S4 and S5. They measure, respectively, the heterogeneity and the productivity level of that industry at that point in time. Columns (2) and (3) of table 2 report these two results for selected industries in 2006, while column (1) reports the number of firms in each industry that year. To get these three variables for a specific industry in a specific year, for example, sector 0001 in 2006, we run the following commands:

From the available results, we select

Table 2.

Computation results for Gini coefficient and productivity among selected sectors and years—ℝ³ case

Sector	Year 2006			Year 2009			Year 2012
Sector	Obs	Gini	tg	Obs	Gini	tg	Obs	Gini	tg
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)
0001	160	0.203	6.201	222	0.233	4.239	346	0.264	4.483
0002	115	0.126	6.617	134	0.119	6.780	162	0.138	9.292
0003	14	0.016	1.561	31	0.020	1.803	70	0.062	1.385
0004	386	0.153	3.014	431	0.190	2.077	603	0.200	2.937
0005	57	0.090	5.744	69	0.087	3.116	142	0.114	4.590
0006	142	0.155	4.078	168	0.170	2.660	212	0.254	5.859
0007	43	0.098	0.864	52	0.112	0.892	68	0.123	1.265
0008	47	0.091	3.402	53	0.101	2.292	69	0.122	3.107
0009	38	0.073	1.495	41	0.088	1.060	47	0.080	2.204
0010	18	0.084	1.276	26	0.092	1.048	40	0.201	1.995
0011	65	0.103	3.834	85	0.172	3.249	126	0.168	3.487
0012	177	0.204	3.102	189	0.155	2.153	259	0.164	3.153
0013	54	0.042	2.372	58	0.063	1.539	82	0.076	1.796
0014	225	0.137	3.376	245	0.125	2.218	368	0.126	2.755
0015	28	0.001	0.899	33	0.001	0.725	45	0.001	0.751
0016	45	0.026	1.288	44	0.022	0.912	62	0.042	0.875
0017	86	0.072	3.713	89	0.089	1.486	121	0.094	2.292

The chosen normalization strategy for the volume of the zonotope seems to be effective, because there is no apparent relation between the number of generator-firms and the Gini coefficients. For example, in 2006, there are 386 firms in sector 0004 and 160 firms in sector 0001. Given the normalization, the larger number of firms in sector 0004 need not translate into a larger Gini coefficient (the normalized volume of the zonotope). Indeed, based on our data sample, the Gini coefficient of sector 0001 is larger: 0.203 compared with 0.153 for sector 0004.

After taking into account the effect due to the number of firms, we notice that heterogeneity levels differ across industries. For example, in 2006, the Gini coefficients range from 0.001 for sector 0015 to 0.204 for sector 0012. Similarly, as indicated in column (3), productivity levels also differ across sectors, ranging from 0.864 for sector 0007 to 6.617 for sector 0002.

From columns (4) to (6) and from columns (7) to (9), we report similar results for the years 2009 and 2012, respectively. This allows us to explore the dynamics of industry heterogeneity and productivity over time. Most of the selected sectors share an upward trend in their heterogeneity levels. For example, the heterogeneity level of sector 0001 increases from 0.203 in 2006 to 0.233 in 2009, and again to 0.264 in 2012. As for productivity, most industries display a decrease from 2006 to 2009 and an increase from 2009 to 2012.

The same analysis can be performed in the case of four dimensions, for instance in the three-inputs, one-output case.¹⁰ In the interest of space, results are not reported here, but they can be obtained with the following commands:

4.2 A geometric approach to firm-level productivity

As indicated in (5), Dosi et al. (2016) propose as a measure for industry productivity the tangent of the angle formed by the industry production activity vector $d_{Y}$ with the space generated by all inputs. The approach can also be easily applied to individual firms. Assume firm i’s production activity follows (1). Then its productivity level can be measured by the tangent of $Θ_{l+1}(a_{i})$, that is, the angle formed by the firm production activity vector $a_{i}$ with the space generated by all inputs. One possible measure for the productivity of firm i, denoted by $p_{i}$, follows:

p_{i} = \operatorname{tg} Θ_{l+1}(a_{i}) = \frac{α_{i,l+1}}{\lVert \mathrm{pr}_{-(l+1)}(a_{i}) \rVert}, \quad i = 1, \dots, N \tag{6}
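Equation (6) translates directly into code. The Python sketch below is illustrative only: the firm vectors are hypothetical, and it is not the plugin's implementation. Each firm's productivity is its output component divided by the Euclidean norm of its input components.

```python
import math


def firm_tangents(activities):
    """p_i = a_{i,l+1} / ||pr_{-(l+1)}(a_i)||: output over the Euclidean norm of the inputs."""
    return [a[-1] / math.hypot(*a[:-1]) for a in activities]


# Hypothetical firms: (employees, fixed assets, turnover).
firms = [(3.0, 4.0, 10.0), (6.0, 8.0, 5.0)]
print(firm_tangents(firms))  # [2.0, 0.5]
```

Scaling all of a firm's inputs and its output by the same factor leaves $p_{i}$ unchanged, because the measure depends only on the direction of $a_{i}$.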

The zonotope command computes this productivity for each firm and returns all of the values in the vector tangents. Based on the same firm-level data used above, we compute the 2006 productivities of 12 selected firms from industry sector 0010 (all of which report data in all years) and report them in the column “Year 2006” of table 3. This measure is also consistent with the previous literature, in that we still observe substantial heterogeneity in productivity at the firm level. To compute the productivity (6) for the 12 firms,¹¹ we run the following commands:

Among the available results, we focus on that referring to firm productivity:

We then compute productivity of these firms in 2009 and 2012 and report the results in the respective columns of table 3. Note that, in accordance with previous findings, firm-level productivity displays persistence, and hence the large within-industry dispersion does not vanish over time.

Table 3.

Productivity of selected firms from sector 0010—R³ case

Firm ID	Year 2006	Year 2009	Year 2012
1	1.347	2.470	4.353
2	6.915	6.704	13.996
3	0.399	0.178	0.292
4	5.519	4.470	8.272
5	0.852	0.603	0.997
6	5.660	3.336	5.394
7	11.707	14.092	14.949
8	23.457	10.902	4.380
9	13.302	19.282	3.519
10	4.768	5.417	11.143
11	0.470	0.622	0.762
12	0.962	1.090	1.393

4.3 An application to inequality

The zonotope command is not limited to measuring heterogeneity and productivity from firm-level data. In this section, we show that it is also helpful for computing the multivariate Gini coefficient, which measures inequality among a group of N units with l > 1 attributes.

4.3.1 Introduction to multivariate Gini coefficient

Following Mosler (1994) and Koshevoy and Mosler (1996), we focus on two multivariate Gini coefficients. To do this, we denote by $ω_{i}^{j}$ the quantity of attribute j owned by unit i, for i = 1,…, N and j = 1,…, l. Thus, we have vectors

w_{i} = (ω_{i}^{1}, \dots, ω_{i}^{j}, \dots, ω_{i}^{l}) \in ℝ_{+}^{l}

for i = 1,…, N. Correspondingly,

{\tilde{ω}}_{i}^{j} = \frac{ω_{i}^{j}}{\sum_{i = 1}^{N} ω_{i}^{j}}

indicates the share of the total amount of attribute j that is owned by unit i. The vectors ${\tilde{w}}_{i}$ follow as

{\tilde{w}}_{i} = ({\tilde{ω}}_{i}^{1}, \dots, {\tilde{ω}}_{i}^{j}, \dots, {\tilde{ω}}_{i}^{l}) = (\frac{ω_{i}^{1}}{\sum_{i = 1}^{N} ω_{i}^{1}}, \dots, \frac{ω_{i}^{j}}{\sum_{i = 1}^{N} ω_{i}^{j}}, \dots, \frac{ω_{i}^{l}}{\sum_{i = 1}^{N} ω_{i}^{l}}) \in ℝ_{+}^{l}

for i = 1,…, N.
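The normalization into shares is a column-wise division by totals. Below is a minimal Python sketch, using hypothetical data for three units and two attributes:

```python
def attribute_shares(data):
    """Turn an N x l table (rows = units, columns = attributes) into shares
    w~_i^j = w_i^j / sum_i w_i^j, so that every column sums to 1."""
    totals = [sum(column) for column in zip(*data)]
    return [[value / total for value, total in zip(row, totals)] for row in data]


data = [[1.0, 2.0], [3.0, 2.0], [4.0, 4.0]]  # hypothetical values
shares = attribute_shares(data)
print(shares)  # [[0.125, 0.25], [0.375, 0.25], [0.5, 0.5]]
```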

Mosler (1994) introduces a multivariate Gini index by using the volume of the Lorenz zonotope (LZ) as the Minkowski sum

LZ = \sum_{i = 1}^{N} \left[\mathbf{0}, \left(\tfrac{1}{N}, \tilde{w}_{i}\right)\right] \tag{7}

of line segments generated by $\{(1/N, \tilde{w}_{i})\}_{1 \leq i \leq N}$. The zonotope command provides this volume, that is, Vol(LZ). Note, however, that Koshevoy and Mosler (1996) point out that Vol(LZ) is 0 when two attributes are identically distributed among the N units or when one attribute is equally distributed among all units.¹² Hence, Koshevoy and Mosler (1997) suggest that, instead of the volume of the LZ, one should use the volume of the lift zonoid “expanded” by an l-dimensional cube in $ℝ^{l+1}$. They propose the multivariate volume–Gini index as

R_{v} = \frac{1}{2^{l} - 1} \sum_{s = 1}^{l} \sum_{1 \leq j_{1} < \dots < j_{s} \leq l} \mathrm{Vol}\left(Z^{j_{1}, \dots, j_{s}}\right) \tag{8}

where zonotopes are defined as the Minkowski sum

Z^{j_{1}, \dots, j_{s}} = \sum_{i = 1}^{N} \left[\mathbf{0}, \left(\tfrac{1}{N}, \tilde{w}_{i}^{j_{1}, \dots, j_{s}}\right)\right]

of line segments generated by the vectors $\{(1/N, \tilde{w}_{i}^{j_{1}, \dots, j_{s}})\}_{1 \leq i \leq N}$, where $\tilde{w}_{i}^{j_{1}, \dots, j_{s}} \in ℝ_{+}^{s}$ is obtained from the $j_{1}, \dots, j_{s}$ components of vector $\tilde{w}_{i}$.

To show the necessity of resorting to this further measure, we use a toy example detailed in appendix A. We focus on x and z, representing two attributes among the 10 units. Using the zonotope command, we can easily compute the volume of the corresponding LZ defined in (7) as 0.053. Now assume we have a third attribute that still coincides with z. We would then have a volume of LZ equal to 0. On the other hand, the multivariate volume–Gini index defined in (8) is able to provide a result of 0.144, even when there are identically distributed attributes.
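The behavior of Vol(LZ) can be reproduced with a brute-force computation. The Python sketch below relies on the standard identity that the volume of a zonotope $\sum_{i} [\mathbf{0}, g_{i}]$ in $ℝ^{d}$ equals the sum of $|\det|$ over all d-element subsets of its generators. It is not the algorithm used by the plugin, and the data are hypothetical (not the appendix A dataset). With a single attribute, Vol(LZ) reduces to the usual univariate Gini coefficient without small-sample correction, which serves as a check.

```python
from itertools import combinations


def det(matrix):
    """Determinant by cofactor expansion; adequate for the small dimensions used here."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** c * matrix[0][c] * det([row[:c] + row[c + 1:] for row in matrix[1:]])
        for c in range(len(matrix))
    )


def zonotope_volume(generators):
    """Volume of sum_i [0, g_i] in R^d as the sum of |det| over all d-subsets."""
    d = len(generators[0])
    return sum(abs(det(list(subset))) for subset in combinations(generators, d))


def lorenz_zonotope_volume(data):
    """Vol(LZ) for an N x l table, using the generators (1/N, w~_i)."""
    n = len(data)
    totals = [sum(column) for column in zip(*data)]
    generators = [[1.0 / n] + [v / t for v, t in zip(row, totals)] for row in data]
    return zonotope_volume(generators)


x = [1.0, 2.0, 3.0, 4.0]
print(lorenz_zonotope_volume([[v] for v in x]))  # univariate Gini of x, 0.25 up to rounding

# Two identically distributed attributes give a degenerate (zero-volume) LZ,
# whereas pairing x with a permutation of itself does not.
same = [[v, v] for v in x]
permuted = [[a, b] for a, b in zip(x, [2.0, 1.0, 4.0, 3.0])]
print(lorenz_zonotope_volume(same), lorenz_zonotope_volume(permuted))
```

The second pair of attributes has exactly the same marginal distributions as the first. Only their joint arrangement differs, yet Vol(LZ) moves from zero to a positive value. This illustrates how strongly Vol(LZ) reacts to the dependence between attributes.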

To get this volume–Gini index, we need to use the zonotope command to compute the volumes of $2^{l} - 1$ different zonotopes. Suppose we construct a sample of normalized data, that is, an N × (l + 1) matrix whose rows are $(1/N, \tilde{w}_{i})$, so that every column sums to 1. Each $\mathrm{Vol}(Z^{j_{1}, \dots, j_{s}})$ is then obtained by running the zonotope command on the first column together with the corresponding columns $j_{1}, \dots, j_{s}$ of this matrix and reading the result in S1. However, recall that the zonotope command also reports the normalized volume of the zonotope in S4, which facilitates the process.
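The procedure for $R_{v}$ amounts to a loop over all nonempty subsets of attributes. Below is a Python sketch with hypothetical data. It assumes a brute-force zonotope volume computed as the sum of $|\det|$ over all d-element subsets of generators, which is not the plugin's algorithm.

```python
from itertools import combinations


def det(matrix):
    """Determinant by cofactor expansion (fine for small dimensions)."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** c * matrix[0][c] * det([row[:c] + row[c + 1:] for row in matrix[1:]])
        for c in range(len(matrix))
    )


def zonotope_volume(generators):
    """Volume of sum_i [0, g_i] in R^d: sum of |det| over all d-subsets of generators."""
    d = len(generators[0])
    return sum(abs(det(list(subset))) for subset in combinations(generators, d))


def volume_gini(data):
    """R_v for an N x l table: average of Vol(Z^{j_1..j_s}) over the 2^l - 1 nonempty
    attribute subsets, each zonotope built from (1/N, shares of the chosen attributes)."""
    n, l = len(data), len(data[0])
    totals = [sum(column) for column in zip(*data)]
    shares = [[v / t for v, t in zip(row, totals)] for row in data]
    total = 0.0
    for s in range(1, l + 1):
        for columns in combinations(range(l), s):
            generators = [[1.0 / n] + [row[j] for j in columns] for row in shares]
            total += zonotope_volume(generators)
    return total / (2 ** l - 1)


# Hypothetical data: two identically distributed attributes.
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
print(volume_gini(data))  # about 1/6: two univariate terms of 0.25 plus a zero term, over 3
```

The two attributes coincide, so the full Lorenz zonotope is degenerate. Even so, $R_{v}$ remains positive, mirroring the toy example discussed in the text.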

4.3.2 Empirical application of multivariate Gini coefficient

We present an empirical exercise based on actual data from the Namibia Household Income and Expenditure Survey 2009/2010 (The Namibia Statistics Agency 2013).¹³ This dataset includes detailed information about income and expenditures of 9,643 households in Namibia from 2009 to 2010. For each household, we compute the income per capita (IPC) and expenditure per capita (EPC) based on the income, expenditure, and number of people in the household.¹⁴ Notice, in particular, that the survey reports only the main source of income for each household. As a result, we have two attributes of each household, IPC and EPC, either of which can be used for computing the traditional univariate Gini coefficient.

In many studies, the univariate Gini coefficient based on IPC is adopted as the measure of inequality. We report it at the bottom of column (1) of table 4. Correspondingly, the univariate Gini coefficient based on EPC is reported at the bottom of column (2). The univariate Gini coefficient indicates a higher inequality level based on expenditure (0.780) than on income (0.611). As discussed above, these two attributes can be taken into account simultaneously by computing the volume of the corresponding LZ and the multivariate volume–Gini index. We report them at the bottom of columns (3) and (4), respectively. The values of the univariate and multivariate Gini indices cannot be compared with each other, because they are constructed in different ways.¹⁵ However, it is meaningful to compare any one of these indices across groups or over time.
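The univariate computation can be spelled out as follows. The Python sketch below derives per capita values from household totals and sizes and applies the textbook Gini formula $G = \sum_{i}\sum_{j} |x_{i} - x_{j}| / (2N^{2}\bar{x})$. The households are hypothetical. The sketch does not claim to reproduce the exact conventions of inequal7 (for example, weighting or small-sample corrections).

```python
def per_capita(totals, household_sizes):
    """Per capita values: household total divided by the number of household members."""
    return [t / s for t, s in zip(totals, household_sizes)]


def gini(values):
    """Univariate Gini: sum_i sum_j |x_i - x_j| / (2 N^2 mean), no weights or corrections."""
    n = len(values)
    mean = sum(values) / n
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


income = [1200.0, 3000.0, 800.0, 6000.0]  # hypothetical household incomes
size = [4, 3, 2, 5]                       # hypothetical household sizes
ipc = per_capita(income, size)
print(ipc)                  # [300.0, 1000.0, 400.0, 1200.0]
print(round(gini(ipc), 4))  # 0.2845
```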

Table 4.

Various Gini coefficients among households with different income sources in Namibia from 2009 to 2010

Income source	Univariate Gini		Multivariate Gini
	IPC	EPC	Vol(LZ)	R_v	Obs
	(1)	(2)	(3)	(4)	(5)
Salaries / wages	0.558	0.700	0.155	0.471	4,953
Farming	0.656	0.866	0.206	0.576	2,014
Business activities	0.651	0.799	0.213	0.554	785
Employment pension	0.581	0.777	0.253	0.537	121
Cash remittances	0.465	0.645	0.127	0.413	282
State old pension	0.422	0.619	0.135	0.392	986
Other, specify	0.666	0.864	0.288	0.606	502
Total	0.611	0.780	0.190	0.527	9,643

We now conduct a cross-sectional analysis. Households are classified into seven groups according to their main source of income. We report the univariate and multivariate Gini coefficients in columns (1)–(4) of table 4 and the number of households per group in column (5). Notice that groups with different income sources display different inequality levels. According to the univariate Gini coefficient based on IPC, the least unequal group is the one relying on state old pension (0.422). This seems reasonable because state old pensions, compared with other sources, are expected to be more equally distributed among receivers.

The next thing to notice is the potential bias associated with Vol(LZ) as a measure of inequality, because it is sometimes inconsistent with the evidence stemming from the univariate Gini coefficients. For example, the group “Business activities” appears more unequal than “Employment pension” according to both univariate Gini coefficients and the volume–Gini index R_v, while Vol(LZ) indicates the opposite. This exemplifies one possible drawback of Vol(LZ) as a measure of inequality: the more correlated IPC and EPC are, the smaller the value of Vol(LZ).¹⁶ Thus, two highly correlated attributes that both point to a high level of inequality according to the univariate Gini coefficients may nevertheless point to low inequality when based on Vol(LZ). In this case, the two univariate Gini coefficients of group “Business activities” are larger than those of “Employment pension”. However, the correlation between IPC and EPC is also larger for the former (0.814 versus 0.652), so the Vol(LZ) of “Business activities” is smaller than that of “Employment pension”. The volume–Gini index is considered to be more robust than Vol(LZ) not only when there are identically distributed attributes but also when there are highly correlated attributes.

We provide the code to generate the fourth row of table 4. First, we generate the relevant variables, that is, IPC and EPC, from the raw data.

Then, we compute the univariate Gini coefficient based on IPC and EPC by using the community-contributed command inequal7 (Kerm 2001).

Finally, we compute Vol(LZ) and the volume–Gini index R_v based on IPC and EPC, using the zonotope package to compute the volumes of the different zonotopes. Additionally, we use the community-contributed commands tuples (Luchman, Klein, and Cox 2006), matsum() (Weesie 1997), and frmttable (Gallup 2012).

5 Analysis of zonotope computing time

Computing the volume of the zonotope given the list of its generators can be time consuming, because its computational complexity is O(N^l), where N is the number of generators and (l + 1) is the dimension of each generator (that is, its length). For a fixed dimension, the running time grows polynomially in N, with a degree l that rises with the dimension. For a fixed N, it grows exponentially in the number of dimensions (l + 1).
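The combinatorial growth can be made tangible with a simple count. The Python sketch below counts how many determinants a brute-force volume formula would need for N = 200 generators, as in table 5. That formula sums $|\det|$ over all $(l+1)$-element subsets of generators. The count describes only that naive approach. The plugin's own algorithm is not reproduced here, and its cost is the O(N^l) stated in the text.

```python
from math import comb


def brute_force_determinants(n_generators, dimension):
    """Number of dimension-element generator subsets enumerated by the naive
    formula Vol(Z) = sum of |det| over all such subsets."""
    return comb(n_generators, dimension)


for dimension in range(2, 7):  # (l + 1) = 2, ..., 6, as in table 5
    print(dimension, brute_force_determinants(200, dimension))
```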

Table 5 provides the elapsed times for a varying number of variables (from 2 to 6) and a fixed number of generators (N = 200). Conversely, table 6 provides the elapsed times for a varying number of generators (from 50 to 250) and a fixed number of variables [(l + 1) = 6]. We ran the experiments using Stata/IC on a four-core Intel Core i7 CPU with 32 GB of RAM, running Windows 7. The data from tables 5 and 6 are charted in figures 4 and 5, respectively. These figures clearly show how quickly the time needed to compute the volume of the zonotope grows. To check empirically that the time complexity is N^l, we have plotted in figure 6 the fifth root of the elapsed times when the number of variables (l + 1) is equal to 6. The fact that the graph is nearly a straight line is consistent with a dependency on the fifth power of N, as expected.
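The check behind figure 6 can be repeated from the timings in table 6. The Python sketch below takes the fifth root of each elapsed time, converted to minutes. As a complementary diagnostic not reported in the article, it also estimates by least squares the slope of log(time) on log(N). That slope can be set against the exponent l = 5 implied by the O(N^l) bound, although only approximate agreement should be expected from five timings.

```python
import math

# Table 6: (l + 1) = 6 variables; elapsed times converted to minutes.
generators = [50, 100, 150, 200, 250]
minutes = [0.126, 5.648, 54.736, 4 * 60 + 40.612, 16 * 60 + 16.389]

# Fifth roots, as plotted in figure 6.
fifth_roots = [t ** 0.2 for t in minutes]

# Least-squares slope of log(time) on log(N): an empirical exponent of N.
xs = [math.log(n) for n in generators]
ys = [math.log(t) for t in minutes]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)

print([round(r, 3) for r in fifth_roots])
print(round(slope, 2))
```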

Table 5.

Elapsed times for varying numbers of variables and 200 generators

Number of variables	Number of generators	Elapsed time
2	200	0.00017 min.
3	200	0.00157 min.
4	200	0.11477 min.
5	200	6.75457 min.
6	200	4h and 40.612 min.

Table 6.

Elapsed times for varying numbers of generators and 6 variables

Number of variables	Number of generators	Elapsed time
6	50	0.126 min.
6	100	5.648 min.
6	150	54.736 min.
6	200	4h and 40.612 min.
6	250	16h and 16.389 min.

Figure 4.

Elapsed time as a function of the number of variables, for 200 generators

Figure 5.

Elapsed time as a function of the number of generators, for 6 variables

Figure 6.

The fifth root of the elapsed time as a function of the number of generators, for 6 variables. The nearly straight line is consistent with a computational complexity of N^l, because (l + 1) = 6 in our case.

6 Conclusions and future work

The recent and increasing availability of disaggregated economic data challenges many of the standard theoretical assumptions, yet there is still an urgent need for adequate tools to deal with these rich and complex empirical observations.

The zonotope package provides a rigorous way to perform empirical analysis while taking advantage of some of these emerging properties. In this work, we proposed an application to production analysis in which, thanks to the proposed methodology, it is possible to relax most of the standard assumptions that do not find support in the data. Firms, and economic agents in general, are quite different from each other in many respects: size, productivity, propensity to innovate and export, etc. Such differences do not vanish over time; selection, if at work, takes longer to exert its effects. Even focusing on firms within the same narrowly defined industrial sector does not help in reducing heterogeneity. In this context, the zonotope package enables assessment of the level of intra-industry heterogeneity and measurement of the level of productivity, and its variation over time, without imposing strong assumptions on the actual observations.

The software package we introduced is ready for applications in other domains; the zonotope framework itself is already used in other fields, for example, still within economic analysis, to assess inequality.

8 Program and supplemental materials


Supplemental Material, sj-zip-1-stj-10.1177_1536867X221083854 for A toolbox for measuring heterogeneity and efficiency using zonotopes by Marco Cococcioni, Marco Grazzi, Le Li and Federico Ponchio in The Stata Journal

Footnotes

7 Acknowledgments

We gratefully acknowledge Gianluigi Tiesi (Netfarm s.r.l.) for support in creating the git repository and help in generating the Stata plugin. This project has received financial support from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 649186—ISIGrowth. This article was presented at the 2019 Italian Stata Users Group meeting in Florence. We thank the participants at the conference for their comments. Suggestions and comments by an anonymous referee were much appreciated. The usual caveat applies. Le Li is the corresponding author.

8 Program and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

To update the zonotope command to the latest version, type

To run the demo, execute the following do-file:

run "`c(sysdir_plus)'/z/zonotope_demo"

If the demo runs correctly, you are done.

However, it may fail, especially if you are using an operating system for which the plugin has not been generated yet.

In this case, the plugin must be generated from scratch, as explained next.

8.1 How to generate the zonotope plugin

You first need to download the C++ source code from the following git repository:

It contains everything needed to generate the Stata plugin for most platforms (see below).

To generate the plugin, the following things are required:

a C++ compiler

the CMake utility

(optional) the Git utility (useful for downloading the tree of the whole project or of specific subprojects)

The CMake utility is required to compile the plugin as a shared library. This shared library, named zonotope2.plugin or zonotope3.plugin (depending on the Stata version), is used by the ado-file associated with the zonotope command.

The ado-file is named zonotope.ado. It can be downloaded, together with the demo files, additional datasets, and the help file, from

The plugin generation has been tested on Windows 10 64-bit using Visual Studio 15, on Linux Ubuntu 18.04 64-bit using GCC, and on macOS Sierra 10.12.3 (16D32) 64-bit using the LLVM C++ compiler.

If you have trouble with the plugin, please open an issue on the associated GitHub repository. We will do our best to help you.

Notes

A A simple toy example

As a simple example, we enter the following commands:

We now have the following dataset loaded into Stata memory:

The dataset contains only 10 observations and 3 variables: x, y, and z.

We now enter the following command:

zonotope x y z if x > 3, verbose

This command builds the zonotope on input variables x and y, using variable z as the output variable. With the if qualifier, only the observations satisfying the condition x > 3 (that is, whose first input variable is greater than 3) are considered.

The following result will be displayed:

As is explained in greater detail in section 5, building the zonotope of a set of generators has an exponential complexity (that is, it is very time consuming, especially in a high dimension and when the number of generators is high). Therefore, we decided to implement it in C++, to reduce the run time as much as possible. In section 8, we provide step-by-step instructions for compiling the C++ source code to create the plugin (a binary file with extension .plugin). More precisely, our new command zonotope loads and then calls the C++ Stata plugin.

References

Andreoli and Zoli. 2014. Measuring dissimilarity. Working Paper 23/2014, University of Verona, Department of Economics.

Aruka. 2017. Some new perspectives on the inter-country analysis of the world production system. Evolutionary and Institutional Economics Review 14: 467–498. https://doi.org/10.1007/s40844-017-0085-2.

Badunenko and Mozharovskyi. 2016. Nonparametric frontier analysis using Stata. Stata Journal 16: 550–589. https://doi.org/10.1177/1536867X1601600302.

Baily, M. N., Hulten, and Campbell. 1992. Productivity dynamics in manufacturing establishments. Brookings Papers on Economic Activity: Microeconomics 1992: 187–267. https://doi.org/10.2307/2534764.

Baldwin, J. R., and Rafiquzzaman. 1995. Selection versus evolutionary adaptation: Learning and post-entry performance. International Journal of Industrial Organization 13: 501–522. https://doi.org/10.1016/0167-7187(95)00502-1.

Bartelsman, E. J., and Doms. 2000. Understanding productivity: Lessons from longitudinal microdata. Journal of Economic Literature 38: 569–594. https://doi.org/10.1257/jel.38.3.569.

Belotti, Daidone, Ilardi, and Atella. 2013. Stochastic frontier analysis using Stata. Stata Journal 13: 719–758. https://doi.org/10.1177/1536867X1301300404.

Cazals, J.-P. Florens, and Simar. 2002. Nonparametric frontier estimation: A robust approach. Journal of Econometrics 106: 1–25. https://doi.org/10.1016/S0304-4076(01)00080-X.

Charnes, W. W. Cooper, and Rhodes. 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2: 429–444. https://doi.org/10.1016/0377-2217(78)90138-8.

Daraio and Simar. 2005. Introducing environmental variables in nonparametric frontier models: A probabilistic approach. Journal of Productivity Analysis 24: 93–121. https://doi.org/10.1007/s11123-005-3042-8.

Daraio and Simar. 2007. Conditional nonparametric frontier models for convex and nonconvex technologies: A unifying approach. Journal of Productivity Analysis 28: 13–32. https://doi.org/10.1007/s11123-007-0049-3.

Disney, Haskel, and Heden. 2003. Entry, exit and establishment survival in UK manufacturing. Journal of Industrial Economics 51: 91–112. https://doi.org/10.1111/1467-6451.00193.

Dosi. 2007. Statistical regularities in the evolution of industries: A guide through some evidence and challenges for the theory. In Perspectives on Innovation, ed. Malerba and Brusoni, 153–186. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511618390.009.

Dosi and Grazzi. 2006. Technologies as problem-solving procedures and technologies as input–output relations: Some perspectives on the theory of production. Industrial and Corporate Change 15: 173–202. https://doi.org/10.1093/icc/dtj010.

Dosi, Grazzi, Marengo, and Settepanella. 2016. Production theory: Accounting for firm heterogeneity and technical change. Journal of Industrial Economics 64: 875–907. https://doi.org/10.1111/joie.12128.

Dosi, Grazzi, Marengo, and Settepanella. 2021. Productivity decomposition in heterogeneous industries. Journal of Industrial Economics 69: 615–652. https://doi.org/10.1111/joie.12252.

Eurostat. 2008. NACE Rev. 2 Statistical classification of economic activities in the European Community. Methodologies and working papers, Eurostat.

Farrell, M. J. 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A 120: 253–290. https://doi.org/10.2307/2343100.

Franciosi, Settepanella, and Terni. 2020. The robustness of the generalized Gini index. ArXiv Working Paper No. arXiv:2007.12924. https://arxiv.org/abs/2007.12924.

Gallup, J. L. 2012. A programmer’s command to build formatted statistical tables. Stata Journal 12: 655–673. https://doi.org/10.1177/1536867X1201200406.

Griliches and Mairesse. 1999. Production functions: The search for identification. In Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium, ed. S. Strøm, 169–203. Cambridge: Cambridge University Press. https://doi.org/10.1017/CCOL521633230.006.

Hildenbrand. 1981. Short-run production functions based on microdata. Econometrica 49: 1095–1125. https://doi.org/10.2307/1912746.

Ji, Y.-B., and Lee. 2010. Data envelopment analysis. Stata Journal 10: 267–280. https://doi.org/10.1177/1536867X1001000207.

Kerm, P. V. 2001. inequal7: Stata module to compute measures of inequality. Statistical Software Components S416401, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s416401.html.

Koopmans, T. C. 1977. Examples of production relations based on microdata. In The Microeconomic Foundations of Macroeconomics, ed. G. C. Harcourt, 144–178. London: Palgrave Macmillan. https://doi.org/10.1007/978-1-349-03236-5_6.

Koshevoy and Mosler. 1996. The Lorenz zonoid of a multivariate distribution. Journal of the American Statistical Association 91: 873–882. https://doi.org/10.1080/01621459.1996.10476955.

Koshevoy, G. A., and Mosler. 1997. Multivariate Gini indices. Journal of Multivariate Analysis 60: 252–276. https://doi.org/10.1006/jmva.1996.1655.

Luchman, J. N., Klein, and N. J. Cox. 2006. tuples: Stata module for selecting all possible tuples from a list. Statistical Software Components S456797, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s456797.html.

Mosler. 1994. Majorization in economic disparity measures. Linear Algebra and Its Applications 199: 91–114. https://doi.org/10.1016/0024-3795(94)90343-3.

Simar and Zelenyuk. 2011. Stochastic FDH/DEA estimators for frontier analysis. Journal of Productivity Analysis 36: 1–20. https://doi.org/10.1007/s11123-010-0170-6.

Syverson. 2011. What determines productivity? Journal of Economic Literature 49: 326–365. https://doi.org/10.1257/jel.49.2.326.

The Namibia Statistics Agency. 2013. Namibia Household Income and Expenditure Survey 2009/2010 (NAM_2009_HIES_v01_M_v01_A_PUF). https://nsa.org.na/microdata1/index.php/catalog/6/study-description.

United Nations. 2008. International Standard Industrial Classification of All Economic Activities Revision 4. Statistical Papers Series M No. 4/Rev.4, UN Department of Economic and Social Affairs.

Weesie. 1997. dm49: Some new matrix commands. Stata Technical Bulletin 39: 17–20. Reprinted in Stata Technical Bulletin Reprints. Vol. 7, pp. 43–48. College Station, TX: Stata Press.
