Software Updates

Abstract

dm0042_3: Speaking Stata: Distinct observations. N. J. Cox and G. M. Longton. Stata Journal 15: 899; 12: 352; 8: 557–568.

A sort() option has been added to specify that output be displayed sorted. The specification may be one of alpha, distinct, or total and may also include descending. alpha specifies that output be sorted alphabetically by variable name. distinct specifies that output be sorted on the number of distinct values. total specifies that output be sorted on the total number of nonmissing values. descending specifies that the chosen sort order be reversed.
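The three orderings can be mimicked outside Stata. The following Python sketch is only an illustration, not the distinct command's code. It assumes that missing values are represented by None or NaN, and it breaks ties by variable name, which is an assumption about behavior the update does not specify.

```python
import math


def summarize(data):
    """Return (name, total nonmissing, distinct nonmissing) per variable.

    Missing values are represented by None or float NaN here (an assumption;
    Stata's own missing-value codes are not modeled).
    """
    rows = []
    for name, values in data.items():
        present = [v for v in values
                   if v is not None
                   and not (isinstance(v, float) and math.isnan(v))]
        rows.append((name, len(present), len(set(present))))
    return rows


def sort_rows(rows, how="alpha", descending=False):
    """Mimic the alpha, distinct, and total orderings described above.

    Ties on counts are broken by variable name (assumed); descending reverses
    the whole ordering, tie-breaks included.
    """
    keys = {
        "alpha": lambda r: r[0],
        "total": lambda r: (r[1], r[0]),
        "distinct": lambda r: (r[2], r[0]),
    }
    return sorted(rows, key=keys[how], reverse=descending)


# Made-up data: price has one missing value.
data = {"price": [4099, 4749, 3799, None],
        "foreign": [0, 0, 1, 1],
        "make": ["AMC", "Buick", "Chev.", "Dodge"]}
rows = summarize(data)
print([r[0] for r in sort_rows(rows, "distinct", descending=True)])
```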

Some other small fixes have been made, notably in the use of colors in the display.

dm0085_2: Speaking Stata: A set of utilities for managing missing values. N. J. Cox. Stata Journal 17: 779; 15: 1174–1185.

Sorting has been extended for missings report: a sort() option now allows sorting by either the number of missing values or variable names, in either ascending (the default) or descending order. The previous sort option with no arguments, equivalent to sort(missings descending), remains supported so as not to break existing scripts.
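The parsing rule, including the backward-compatible bare sort, can be sketched as follows. This Python version is illustrative only: the keyword "names" for sorting on variable names and the acceptance of an explicit "ascending" are hypothetical spellings chosen for the sketch, and ties on missing counts are broken by name as an assumption.

```python
def parse_sort(spec):
    """Translate a sort specification into (key, descending).

    An empty string stands for the legacy bare `sort`, which the update says
    is equivalent to sort(missings descending). The keyword "names" for
    sorting on variable names is a hypothetical spelling for this sketch.
    """
    words = spec.split()
    if not words:                       # legacy bare `sort`
        return "missings", True
    key, modifiers = words[0], words[1:]
    if key not in ("missings", "names"):
        raise ValueError(f"unknown sort key: {key}")
    if any(m not in ("ascending", "descending") for m in modifiers):
        raise ValueError(f"unknown sort modifier in: {spec}")
    return key, "descending" in modifiers   # ascending is the default


def report_order(missing_counts, spec):
    """Return variable names in the order a missings-style report would list them."""
    key, descending = parse_sort(spec)
    if key == "missings":
        def sort_key(name):
            return (missing_counts[name], name)   # ties broken by name (assumed)
    else:
        def sort_key(name):
            return name
    return sorted(missing_counts, key=sort_key, reverse=descending)


counts = {"rep78": 5, "price": 0, "mpg": 2}   # made-up missing counts
print(report_order(counts, ""))
```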

The scope of the identify() option has been clarified in the help.

gr0072_1: Speaking Stata: Logarithmic binning and labeling. N. J. Cox. Stata Journal 18: 262–286.

The code for the niceloglabels command has been tweaked to fix two small, occasional puzzles. First, the command allows two syntaxes. If the syntax supplied was incorrect, the error message could be confusing because an attempt to specify one syntax might be misread as an attempt to specify the other. Parsing of user syntax has been made a little smarter. Second, precision problems could bite for values less than 1, so the command could fail to suggest a label for a value within the range specified or implied. This has been improved.
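The kind of precision problem involved is easy to reproduce with binary floating point. In this Python sketch, which is a general illustration and not the niceloglabels code, candidate labels m × 10^k are checked against a range; the mantissas 1 and 3 and the relative tolerance are assumptions. Without a tolerance, 3 × 10^−1 evaluates to 0.30000000000000004 and is wrongly excluded from a range ending at 0.3.

```python
import math


def log_labels(lo, hi, mantissas=(1, 3), tol=1e-9):
    """Candidate labels m * 10**k lying within [lo, hi], for 0 < lo <= hi.

    The endpoints are widened by a small relative tolerance so that values
    such as 3 * 10.0**-1 (= 0.30000000000000004) still count as inside a
    range that ends at 0.3. Setting tol=0 gives the naive comparison.
    """
    labels = []
    first = math.floor(math.log10(lo)) - 1
    last = math.ceil(math.log10(hi)) + 1
    for k in range(first, last + 1):
        for m in mantissas:
            v = m * 10.0 ** k
            if lo * (1 - tol) <= v <= hi * (1 + tol):
                labels.append(v)
    return labels


print(log_labels(0.03, 0.3))          # three labels: about 0.03, 0.1, and 0.3
print(log_labels(0.03, 0.3, tol=0))   # naive check drops the label at 0.3
```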

A new unitfraction option has been added so that labels for values less than 1 may appear as, say, 1/2 1/4 1/8 rather than 0.5 0.25 0.125 or as, say, 1/10 1/100 1/1000 rather than 0.1 0.01 0.001.
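The idea behind unitfraction amounts to recognizing values that equal 1/k for some integer k. This Python version is illustrative only; using Fraction.limit_denominator() to absorb floating-point error, and the cap on the denominator, are assumptions rather than the method niceloglabels uses.

```python
from fractions import Fraction


def unit_fraction_label(value, max_denominator=10**6):
    """Label a value below 1 as "1/k" when it is (numerically) a unit fraction.

    limit_denominator() finds the closest fraction with a bounded denominator,
    so 0.1, which is not exactly representable in binary, still maps to 1/10.
    Other values fall back to a compact decimal form.
    """
    frac = Fraction(value).limit_denominator(max_denominator)
    if 0 < value < 1 and frac.numerator == 1:
        return f"1/{frac.denominator}"
    return f"{value:g}"


print([unit_fraction_label(v) for v in (0.5, 0.25, 0.125)])
print([unit_fraction_label(v) for v in (0.1, 0.01, 0.001)])
```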

pr0041_3: Correlation with confidence, or Fisher’s z revisited. N. J. Cox. Stata Journal 17: 779; 10: 691; 8: 413–439.

This update corrects code for a bias correction used if (and only if) the fisher option is specified. For correlation r as an estimate of ρ, the option applies the approximation that z ≡ atanh r = atanh ρ + ρ/[2(n − 1)]. The last term, giving the bias on the z scale, was misstated as 2ρ/(n − 1) on pages 422 and 434 of the 2008 article and in the code previously published. Thus, revisiting the worked example with sample size n = 20 on page 424 reveals that the 95% confidence interval with a bias correction is (to 3 decimal places) [−0.833, −0.243], not [−0.817, −0.196]. Similarly, on page 427, the interval with bias correction is [−0.827, −0.223], not [−0.811, −0.176]. On page 431, now with n = 82, the interval with bias correction is [0.648, 0.837], not [0.639, 0.833].
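The corrected and previously published adjustments can be compared numerically. This Python sketch is a hedged reconstruction, not the published code. It assumes that the sample r is plugged in for the unknown ρ in the bias term, that the bias is subtracted on the z scale, and that the interval is built as z ± c/√(n − 3) before back-transforming with tanh. The inputs are hypothetical, not the article's data.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95, bias="corrected"):
    """Confidence interval for rho from Fisher's z with optional bias adjustment.

    bias="corrected" subtracts r / (2(n - 1)), the term given in the update;
    bias="old" subtracts the misstated 2r / (n - 1); bias=None skips it.
    The sample r stands in for the unknown rho (an assumption of this sketch).
    """
    z = math.atanh(r)
    if bias == "corrected":
        z -= r / (2 * (n - 1))
    elif bias == "old":
        z -= 2 * r / (n - 1)
    elif bias is not None:
        raise ValueError("bias must be 'corrected', 'old', or None")
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical inputs, not the article's data.
for label in ("corrected", "old"):
    lo, hi = fisher_ci(-0.6, 20, bias=label)
    print(f"{label:>9}: [{lo:.3f}, {hi:.3f}]")
```

Because the misstated term is four times the correct one, the previously published intervals were shifted too far toward zero. This matches the direction of every correction listed above.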

The update is an opportunity to correct statements in the article that might mislead. On page 416, it is stated that “Stata itself has no sinh() or cosh() function.” That was correct at the time of writing, but these functions were added in Stata 11 in 2009. The Doornik–Hansen test mentioned on page 428 was also added in Stata 11 as one output of mvtest normality.

st0509_1: Estimating receiver operative characteristic curves for time-dependent outcomes: The stroccurve package. M. Cattaneo, P. Malighetti, and D. Spinelli. Stata Journal 17: 1015–1023.

The stroccurve command included a coding error, which led to incorrect estimates of the area under the ROC curve.

st0516_2: Exploring marginal treatment effects: Flexible estimation using Stata. M. E. Andresen. Stata Journal 18: 489; 18: 118–158.

Following feedback from users, the mtefe command has been updated and bugs have been fixed. A bug was corrected that made the weights for the local average treatment effect wrong when the supplied weights did not have mean 1. Among the most important changes, the default bandwidth selection for semiparametric models now uses the rule-of-thumb bandwidth from lpoly when constructing Ỹ; see table 2 of the original article. The default bandwidth for the semiparametric estimates of k(u) remains 0.2 because the rule of thumb is not appropriate for estimating local derivatives.

Furthermore, the final equation on page 137 of the original article is incorrect: K₁(p) cannot be backed out directly from estimates of K(p) and K₀(p). Thus, mtefe now stores values of K(p), K₀(p), and K₁(p) when the savekp option is specified.

I thank Christina Gathmann, Li-Wei Chao, Rebecca Diamond, and many other users for useful bug reports and discussions that have improved mtefe.

st0564_1: kg_nchs: A command for Korn–Graubard confidence intervals and National Center for Health Statistics’ Data Presentation Standards for Proportions. B. W. Ward. Stata Journal 19: 510–522.

This updated version of the kg_nchs postestimation command adds a note at the bottom of the output stating that the numbers of strata and primary sampling units used to calculate the degrees of freedom are those from e(sample). If the strata and primary sampling unit counts for the subpopulation differ from those of e(sample), the degrees of freedom used may be larger than those specified by National Center for Health Statistics standards.
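In the usual design-based approach, the degrees of freedom equal the number of primary sampling units (PSUs) minus the number of strata, so counts taken over e(sample) and over a subpopulation can differ. This Python sketch illustrates that arithmetic with made-up data; it is not the kg_nchs code, and treating the subpopulation count as the one the NCHS standards call for is an assumption drawn from the note described above.

```python
def design_df(rows):
    """Design degrees of freedom: number of PSUs minus number of strata.

    rows holds (stratum, psu) pairs, one per observation; PSUs are taken to
    be nested within strata, so a PSU is identified by the pair.
    """
    pairs = set(rows)
    strata = {stratum for stratum, _ in pairs}
    return len(pairs) - len(strata)


# Made-up design: 3 strata with 2 PSUs each (observation-level rows).
full_sample = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
# A subpopulation observed only in strata 1 and 2, and in one PSU of stratum 2.
subpop = [(1, 1), (1, 2), (2, 1)]

print(design_df(full_sample), design_df(subpop))
```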

st0585_1: Simar and Wilson two-stage efficiency analysis for Stata. O. Badunenko and H. Tauchmann. Stata Journal 19: 950–988.

simarwilson has been updated to greatly reduce the command’s runtime. The reduction is achieved through the new community-contributed command ftruncreg (included with this update), which replaces the official Stata command truncreg within simarwilson. ftruncreg makes full use of Mata’s optimization routines and thus fits the truncated regression model much faster than truncreg. Because simarwilson bootstraps the truncated regression model, the computing time required is substantial, so the savings are significant. For the application in Badunenko and Tauchmann (2019), the average runtime is reduced from 91 to 15 seconds (algorithm 1) and from 103 to 22 seconds (algorithm 2).

As a further minor change from the earlier version, simarwilson now allows the cformat(), pformat(), and sformat() options.

The new ftruncreg command is similar to the official truncreg command; however, it does not allow weights or the constraints() or offset() option. If the user specifies any of these with simarwilson, the command automatically switches to the official Stata command truncreg. Specifying the new truncreg option switches to the official truncreg command manually, reproducing the behavior of the earlier version of simarwilson. Because ftruncreg always uses a modified Newton–Raphson algorithm for optimization, which is the best choice for fitting the truncated regression model, all specified maximize_options are ignored unless the official Stata truncreg command is used. The new version of simarwilson stores information about the program used to estimate the truncated regressions in the macro e(truncreg).
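The fallback rules can be summarized as a small decision function. This Python sketch only paraphrases the rules stated above and is not simarwilson’s code; the function and argument names are hypothetical.

```python
def choose_truncreg_program(weights=False, constraints=False, offset=False,
                            truncreg_option=False):
    """Return the program used for the truncated regressions.

    ftruncreg supports neither weights nor constraints() nor offset(), so any
    of these, or the truncreg option, selects the official truncreg.
    """
    if truncreg_option or weights or constraints or offset:
        return "truncreg"
    return "ftruncreg"


def honored_maximize_options(program, maximize_options):
    """maximize_options take effect only with the official truncreg."""
    return list(maximize_options) if program == "truncreg" else []


program = choose_truncreg_program(offset=True)
print(program, honored_maximize_options(program, ["iterate(50)"]))
```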

ftruncreg can also be used independently of simarwilson for fast estimation of the truncated regression model. Its syntax is the same as that of the official Stata truncreg command, subject to the restrictions on options described above. The truncreg options lrmodel, collinear, and coeflegend are likewise not allowed with ftruncreg.

References

Badunenko, O., and H. Tauchmann. 2019. Simar and Wilson two-stage efficiency analysis for Stata. Stata Journal 19: 950–988. https://doi.org/10.1177/1536867X19893640.