
Abstract

Data management tasks include manipulating variables, variable labels, and value labels. While Stata has versatile commands and functions to address the first task, managing variable and value labels is not as convenient. In this article, I introduce a new command, elabel, that enhances the capabilities of Stata’s label commands. I discuss these enhancements using various examples. I also demonstrate how to add new commands to elabel.

Keywords

dm0101, elabel, label, value labels, variable labels, data management

1 Introduction

Manipulating variables is convenient in Stata. We can use wildcard characters (see [U] 11.4 varname and varlists) to abbreviate variable names or refer to more than one variable at a time. The rename command (see [D] rename group) changes groups of variable names systematically. Moreover, we can apply arithmetic and relational expressions as well as many functions (see [D] generate and [U] 13 Functions and expressions) and transformation rules (see [D] recode) to create or change the contents of variables.

Unlike manipulating variables, managing variable and value labels is not as convenient. The label commands (see [D] label) do not support wildcard characters in value-label names, and there is no dedicated command for changing value-label names. Moreover, Stata’s expressions and functions do not readily apply to labels. For example, we cannot change a specific word in a label; we must define or redefine the complete label. Likewise, there is no convenient way to define or modify value labels other than specifying integer-to-text mappings one at a time.

To date, there are many community-contributed commands for manipulating variable and value labels that go beyond Stata’s native label commands (for example, Blumenberg 2012, 2016; Cox 2000; Jann 2007; Joly 2002; Klein 2011; Newson 2007, 2009, 2018; Nichols 2011; Weesie 2005a, b). Many of these commands are tailored to solve one specific problem. Notwithstanding their functionality, locating and using the appropriate command for a given problem is sometimes inconvenient because different authors share no conventions for command names and syntax.

In this article, I introduce another command for manipulating variable and value labels: elabel. The approach that I follow here differs somewhat from the approach of most existing community-contributed commands in this area. Instead of focusing on specific problems, I suggest an integrating approach for extending Stata’s built-in label commands. I argue that Stata’s label commands provide a natural starting point because they are already familiar to most Stata users. The general extensions that I propose here are also useful for implementing commands that address more specific problems.

The remainder of this article is structured as follows. I start with a brief technical overview of the elabel command in section 2 and show basic applied examples in section 3. In section 4, I further develop elabel’s underlying idea of general solutions to specific problems. In section 5, I demonstrate how to add new commands to elabel, assuming some familiarity with Stata’s programming features. I close with a brief summary and concluding remarks.

2 The elabel command

2.1 Syntax

elabel subcommand [ elblnamelist ] [ mappings ] [ iff eexp ] [ , options ]

2.2 Description

elabel manipulates variable labels and value labels. In what follows, I briefly describe its respective syntax elements.

subcommands

subcommands for elabel are the same as those for label, and we may generally type elabel wherever we type label. If elabel adds nothing to subcommand, it simply passes whatever we type through to Stata’s respective label command. elabel also provides additional subcommands, some of which I discuss in sections 4.3, 4.4, and 5.

elblnamelist

An elblnamelist is a list of value-label names, which may contain the wildcard characters *, ~, and ?; these characters have the same meaning as they have in varlist (see [U] 11.4 varname and varlists). The names in elblnamelist may also refer to value-label names indirectly; (varname) refers to the value-label name, if any, that is attached to varname in the current label language (see [D] label language). More than one variable name may be specified.

mappings

The mappings are subcommand specific and mirror those used with the respective label command. Typically, mappings alternate values (or variable names) with labels. The general form is

{ # | varname } "label" { # | varname } "label" …

With elabel, we can use parentheses to group the integer values (or variable names) and labels, respectively. The general form is

(numlist|=eexp|varlist) ("label" [… ]|=eexp)

where eexp is explained below. We further discuss mappings in section 4.

iff eexp

The iff (note the double f) qualifier is similar to Stata’s standard if qualifier; eexp, however, does not typically refer to observations in the dataset and does not typically contain variable names.¹

An eexp is a Stata expression that typically contains the characters # and @. The # character acts as a placeholder for (integer) values, while the @ character represents the text in value labels or variable labels.

When following the iff qualifier, eexp must evaluate to true (!=0) or false (==0), and it selects a subset of integer-to-text mappings from value labels. For example, iff (# < .) selects all nonmissing integers and associated text from a value label. Likewise, iff (@ == "Foreign") selects only the text Foreign and the associated integer value from a value label. Note that the @ character must not be enclosed in double quotes.

When eexp follows the equals sign in mappings, # is replaced with the integer values in a value label; likewise, @ is replaced with the corresponding text. Both characters may be combined, but the evaluated eexp must either be of type numeric or string.

3 Basic examples of the elabel command

3.1 Wildcard characters for value-label names

In a first example, suppose we are interested in the contents of value label origin in auto.dta. To view the contents with elabel, we use the list subcommand.
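A minimal sketch of this step, assuming that elabel is installed and that auto.dta is loaded from Stata's shipped example datasets with sysuse:

```stata
* load the example dataset shipped with Stata
sysuse auto, clear
* list the contents of value label origin
elabel list origin
```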

The output that we obtain looks familiar; it is the same output that we would have obtained with Stata’s standard label command. However, with Stata’s label command, we cannot use wildcard characters in value-label names. For example, typing

. label list ori~
value label ori not found
r(111);

results in an error message. With elabel, we still obtain the desired result:
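A sketch of the wildcard form, again assuming auto.dta and an installed elabel; the only value-label name in auto.dta that ori~ can match is origin:

```stata
sysuse auto, clear
* ori~ is expanded to the one matching value-label name, origin
elabel list ori~
```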

We could also use wildcard characters to conveniently refer to more than one value-label name at a time.

3.2 Specify value-label names indirectly

In the example above, we have typed the value-label name origin or substituted parts of that name with wildcard characters. Often, I do not readily remember the names of the value labels that are attached to variables. However, I do remember the variable names. Suppose we want to list the contents of the value label that is attached to variable foreign. With elabel, we do not need to look up the respective value-label name; we can simply enclose the variable name in parentheses, typing

and obtain the desired result. The experienced Stata user will recognize the syntax that encloses variable names in parentheses; there is a macro function, label (see [P] macro), with the same syntax. With elabel, we can use this syntax anywhere an elblnamelist is allowed.
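A minimal sketch of both uses of the parenthesized variable name, assuming auto.dta and an installed elabel. The first command is the elabel form described above; the second is Stata's extended macro function label. The local macro name txt is chosen here for illustration only.

```stata
sysuse auto, clear
* (foreign) resolves to the value label attached to variable foreign
elabel list (foreign)
* extended macro function: the text that foreign's value label maps to 1
local txt : label (foreign) 1
display "`txt'"
```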

3.3 Additional returned results

Although the output from elabel list looks exactly like the output that we get from Stata’s label list, there are differences behind the scenes. Let us look at the returned results.
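One way to inspect these results, sketched under the same assumptions (auto.dta loaded, elabel installed), is to run return list directly after the listing:

```stata
sysuse auto, clear
elabel list origin
* show everything that the previous command left behind in r()
return list
```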

elabel list returns all the scalars that label list would return.² Arguably more useful, elabel list also returns the value-label name, the integer values, and the associated text.³

3.4 Subsets of integer-to-text mappings

For brevity, ignore the motivational reason for now, and pretend that we wish to list only the integer-to-text mappings in origin for which the integer value is greater than 0. We do this by typing
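A sketch of such a command, assuming auto.dta and an installed elabel. The parentheses around the expression follow the form used in section 2.2.

```stata
sysuse auto, clear
* keep only the mappings whose integer value exceeds 0
elabel list origin iff (# > 0)
```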

Here the iff qualifier is similar to Stata’s if qualifier (see [U] 11.1.3 if exp), which is allowed with most commands for manipulating variables. The # character acts as a placeholder for the integer values in origin. We can also refer to the text that is mapped to integer values using the @ character. Say we wish to list only the integer-to-text mappings for which the text contains an uppercase D:
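A sketch of this selection, assuming auto.dta and an installed elabel. strpos() returns the position of the first match and 0 if there is none, so the expression is true exactly for texts that contain an uppercase D; in origin, that is only Domestic.

```stata
sysuse auto, clear
* @ stands for the label text; it is not enclosed in double quotes
elabel list origin iff (strpos(@, "D"))
```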

The strpos() function was an arbitrary choice; we could have chosen strmatch() instead (see [FN] String functions). In general, we can use any of Stata’s functions in the expression that follows the iff qualifier as long as it evaluates to true (!=0) or false (==0) for any value # and string @.

4 General solutions to specific problems

4.1 Define value labels with numeric-to-numeric mappings

Compared with many other community-contributed commands for manipulating variable and value labels, elabel follows a more general approach that I will illustrate with two examples from the labutil package (Cox 2000), available from the Statistical Software Components. Although most commands in the labutil package date back to the early 2000s, the package is still the most frequently downloaded bundle of commands for managing variable and value labels.

One specific problem that labutil solves is defining “labels for values which are base 10 logarithms containing the antilogged values” (Cox 2000). The command for solving this problem is lablog. An example is

. lablog logs, values(1/4)

label def logs 1 "10" 2 "100" 3 "1000" 4 "10000", modify

where the label define command that lablog creates is echoed. Following the value-label name, logs, we have specified the values() option, listing the (integer) values that we wish to associate with labels.

A second, more general problem that labutil addresses is defining “value labels using a mapping from numeric values to numeric labels” (see labmap.hlp in Cox [2000]). The respective command that solves this problem is labmap. As an example, we define a value label that maps minutes after midnight to hours.

Following the value-label name, time, we specify the values() option; we also include the first(), max(), step(), and postfix() options that specify the first (numeric) label, the maximum label, the steps between labels, and additional text, respectively.

When we compare the two examples, it appears as if the command that solves the more general problem, labmap, also has the more complex syntax. The more complex syntax is arguably both harder to remember and harder to understand. Moreover, note that the first problem, mapping values to their base 10 antilog, is actually a special case of the second problem, mapping numeric values to numeric labels. Yet the more general command, labmap, does not readily apply to the first problem.

With elabel, we approach both problems more generally: mapping (integer) values to an arbitrary function of themselves. Here is how we solve the two problems above with elabel.
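The two commands might look roughly as follows. This is a sketch under assumptions, not necessarily the article's exact code: the first line follows directly from the mappings syntax in section 2.2, while the numlist and the zero-padding expression in the second line are illustrative choices.

```stata
* values 1 to 4 mapped to their base 10 antilogs: 1 "10" 2 "100" ...
elabel define logs2 (1/4) (= strofreal(10^#))

* minutes after midnight mapped to hours such as "00:00", "01:00", ..., "23:00"
* (assumed numlist; cond() adds a leading zero for hours below 10)
elabel define time2 (0(60)1380) (= cond(# < 600, "0", "") + strofreal(#/60) + ":00")

label list logs2 time2
```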

Let us examine the code. Stata’s command for defining value labels is label define; because we wish to define value labels, we use the corresponding elabel command. Following the value-label names, logs2 and time2, respectively, we specify the integer-to-text mappings, grouping integer values and labels (see section 2.2). Inside the first pair of parentheses, we specify a numlist of integer values. Inside a second pair of parentheses, we specify the text to be mapped to these integer values. We specify the text with an expression that contains the # character, which acts as a placeholder for the values in the first pair of parentheses. Because the expression must evaluate to a string, we use Stata’s strofreal() function; for the second problem, we additionally use the cond() function (see [FN] String functions and [FN] Programming functions for more information on both).

Comparing the first elabel command with the respective lablog command, we see the latter clearly has the more convenient syntax. However, elabel’s syntax is arguably more explicit about what actually happens and might thus be easier to understand just by looking at the code; there are some peculiarities, but most of the syntax elements are already known to Stata users. Moreover, once we understand that code, it readily applies to related problems. Moving to the second example, we use the same command, elabel define, and we specify the integer-to-text mappings as before. All we change is the expression and functions to transform the integer values in the desired way. Admittedly, figuring out the appropriate expression is the hard part; given an appropriate expression, elabel define basically reduces to a convenient wrapper for foreach (see [P] foreach). Arguably, elabel is more convenient for systematically modifying existing value labels.

4.2 Modify value labels systematically

For our next example, suppose we have the following value label indicating the frequency of smoking:⁴

Suppose further that we wish to change the integer-to-text mappings so that never is mapped to 0, once a week or less is mapped to 1, and so on. If smoke was a variable, we could simply code

. replace smoke = smoke-1

to change the integer values.

Using elabel, we can do something similar with value labels.

. elabel define smoke (= #-1) (= @), replace

Let us inspect the code. Stata’s command for changing value labels is label define; because we wish to change a value label, we use the corresponding elabel command. Following the value-label name, smoke, we specify the integer-to-text mappings, grouping integer values and labels (see section 2.2). Inside the first pair of parentheses, we specify an expression for the integer values; here we subtract 1 from each integer value. We also specify an expression for the text inside a second pair of parentheses; here we simply copy the existing text. We are then left with five new integer-to-text mappings: 0 "never" 1 "once a week or less" … 4 "every day". Because we wish to replace an existing value label, we specify the option replace.⁵ Let us verify the result:
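The result can be checked by listing the label. As a self-contained sketch of the whole step, the same pattern can be tried on a toy label; the label name demo and its three categories are hypothetical and not taken from the article.

```stata
* hypothetical three-category label coded 1 to 3
label define demo 1 "low" 2 "medium" 3 "high"
* shift every integer value down by one and copy the text unchanged
elabel define demo (= #-1) (= @), replace
* demo should now be coded 0 to 2 with the same texts
label list demo
```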

4.3 Modify value labels systematically, continued

As mentioned in the introduction, Stata has convenient commands for manipulating variables, such as recode (see [D] recode). Let us stick with our example of value label smoke. If smoke was a variable and we wanted to reverse its coding, we could type

simultaneously defining an appropriate value label. Although recode allows us to define a new value label, it is arguably inconvenient to retype all labels when they already exist.

With elabel, we can specify transformation rules that are similar to those used with recode. Here is how elabel’s respective recode subcommand looks.
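A sketch of such a call, pieced together from the description that follows and the returned rules shown further below. The stand-in definition of smoke is an assumption made only so that the example runs on its own: the texts for values 2 and 3 are placeholders, not the article's wording.

```stata
* stand-in for value label smoke after section 4.2 (placeholder texts for 2 and 3)
capture label drop smoke
label define smoke 0 "never" 1 "once a week or less" 2 "placeholder 2" 3 "placeholder 3" 4 "every day"

* reverse the coding 0..4 to 4..0; define() names the new value label,
* dryrun only lists old and new mappings without defining anything
elabel recode smoke (0/4 = 4/0), define(smoke2) dryrun
```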

Compared with the recode command for variables, elabel’s recode subcommand is conveniently short because elabel allows a numlist on both sides of the equals sign.⁶ Also, we do not need to retype any labels; we merely change the integer values. The define() option requests that, instead of replacing value label smoke, a new value label, smoke2, be defined. However, because we have also specified the dryrun option, elabel did not define smoke2; instead, it has listed the original and transformed value labels so we can verify the result first. If we are satisfied, we can remove the dryrun option and define value label smoke2.

There is one more convenient feature: elabel recode returns the transformation rules in r() in a format that the recode command for variables will accept.

. return list

macros:

r(rules) : "(0=4) (1=3) (2=2) (3=1) (4=0)"

We could now pass these transformation rules to Stata’s recode command and modify any number of variables, accordingly.⁷
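A sketch of that hand-over in plain Stata. Because r() results are replaced by the next r-class command, it is safest to copy r(rules) into a local macro immediately after elabel recode. To keep the example runnable on its own, the rules are typed in literally here, and the variable smoke_w1 is hypothetical.

```stata
* hypothetical variable coded 0 to 4
clear
set obs 5
generate smoke_w1 = _n - 1

* in practice, right after elabel recode: local rules `r(rules)'
local rules "(0=4) (1=3) (2=2) (3=1) (4=0)"
recode smoke_w1 `rules'
```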

4.4 Changing value-label names

Stata’s label commands cannot readily change value-label names. In principle, changing a value label name requires three steps: first, copy the old value label using a new name; second, attach this new value label to all variables that previously had the old value label attached;⁸ third, drop the old value label from memory.
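With Stata's native commands only, the three steps can be sketched as follows for value label origin in auto.dta, where foreign is the only variable that has origin attached. The new name origin2 is chosen here for illustration.

```stata
sysuse auto, clear
* step 1: copy the old value label under the new name
label copy origin origin2
* step 2: attach the new label to every variable that used the old one
label values foreign origin2
* step 3: drop the old value label from memory
label drop origin
```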

Weesie (2005b) discusses the problem of renaming value labels and introduces the labelrename command to do this. labelrename resembles Stata’s old rename command for variables (see [D] rename) and changes the name of one value label at a time. Drawing on Weesie’s work but resembling Stata’s new rename command (see [D] rename group), elabel can change the names of groups of value labels.

I will demonstrate elabel’s rename subcommand with nlsw88.dta, which is shipped with Stata. All value-label names in this dataset end in lbl; here are two examples.

Suppose now we wanted to change all value-label names to instead end in VL. Here is how we do this with elabel rename.
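A sketch of the call, assuming that elabel rename accepts parenthesized old and new name patterns in the way Stata's rename command does for variables:

```stata
sysuse nlsw88, clear
* rename every value label ending in lbl so that it ends in VL instead
elabel rename (*lbl) (*VL)
* show the value-label names now in memory
label dir
```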

4.5 A final example of changing variable labels

Stata’s label commands also manage variable labels. We will continue where we left off in section 4.4. Suppose we want to change the label of variable collgrad in nlsw88.dta so that each word starts with an uppercase letter. With elabel, we change the current variable label in the same way in which we change value labels.
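A sketch of this change, assuming nlsw88.dta and an installed elabel. The subcommand variable mirrors label variable, and the two parenthesized groups follow the mappings syntax from section 2.2.

```stata
sysuse nlsw88, clear
* capitalize each word of collgrad's current variable label
elabel variable (collgrad) (= strproper(@))
describe collgrad
```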

Inside the first pair of parentheses, we specify the variables whose labels we want to change, collgrad.⁹ Inside the second pair of parentheses, we specify an expression combining the strproper() function (see [U] 13 Functions and expressions) with the @ character, which acts as a placeholder for the current variable label.¹⁰

5 Adding commands to elabel

In the examples above, we have seen how elabel enhances Stata’s label commands; elabel also comprises programming commands (and Mata functions) that are intended to assist with implementing new commands.

5.1 The problem: Combining value labels

As an example of a new command, we will draw on labvalcombine, which is part of the labutil package (Cox 2000). The labvalcombine command “combines two or more sets of value labels into one”.¹¹ Here is an example from the help file:

No corresponding elabel command does what labvalcombine does, and because we know about labvalcombine, there is little need for such a command. However, pretend that there was no labvalcombine command and that we wanted to add such a command to elabel. In the remainder of this section, I will demonstrate how to do this.

5.2 How to combine value labels

Our goal is to implement a new command, elabel combine, that essentially does what labvalcombine does. First, we need to figure out how to combine sets of value labels.¹² Using elabel’s copy subcommand makes this fairly easy. Assuming value label both does not yet exist, we will need two lines of code.

. elabel copy lbl1 both

. elabel copy lbl2 both, modify

The first line of code copies the contents of value label lbl1 to the new value label both. The second line copies the contents of value label lbl2 to the now existing value label both, modifying both’s contents. While we could have used Stata’s label copy command in the first line, we could not have used it in the second because Stata’s label copy command does not allow the option modify. Anyway, we now know how to combine two sets of value labels. For more than two labels, we would simply loop over the remaining value labels.
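For three value labels, such a loop could be sketched as follows. The toy labels lbl1, lbl2, and lbl3 are defined here only for illustration.

```stata
* toy value labels for illustration
label define lbl1 1 "one"
label define lbl2 2 "two"
label define lbl3 3 "three"

* the first copy creates both; the remaining labels are merged into it
elabel copy lbl1 both
foreach lbl in lbl2 lbl3 {
    elabel copy `lbl' both, modify
}
label list both
```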

5.3 Implementing elabel combine

After figuring out the code to combine sets of value labels, we are ready to implement our new command. Say we want the syntax to be

elabel combine elblnamelist , { define(newlblname) | replace }

That syntax diagram indicates that the caller must specify an elblnamelist and one of two options, which we discuss below.

To allow elabel to call our new subcommand, combine, we need to write a program and name it elabel_cmd_combine. Because we want to allow an elblnamelist, we will use elabel’s parse command, which resembles a rudimentary version of Stata’s syntax command (see [P] syntax) and which I will briefly explain below. We will also allow two options. Here is how our program starts.

Focusing on the elabel parse command, we first list the allowed syntax elements before the colon. Here we allow an elblnamelist and two options. Following the colon, we explicitly pass the contents of local macro `0' (see [P] macro) to elabel parse. To remind you, local macro `0' contains whatever the caller has typed (see [U] 18.4 Program arguments). After elabel parse has concluded, local macro `lblnamelist' will contain a list of value-label names that the caller has passed to elabel combine. Further, if the caller specifies the option define(), local macro `define' will contain the specified name. If the caller specifies the option replace, local macro `replace' will contain the word replace.

Let us agree on the following rules for options in elabel combine: if callers do not specify a value-label name for the combined values in define(), we will use the first value-label name that they mention in `lblnamelist'. The callers will have to specify replace if the value-label name for the combined values is already defined. Here is how we implement this.

The only command in the code that you cannot read about in Stata’s documentation is elabel confirm, which does the same thing as Stata’s confirm command (see [P] confirm): it confirms that `define' is a new, not yet defined value-label name and exits with the appropriate error message if it is not. We are done with the initial parsing. Let us implement the code for combining the value labels.
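Putting the pieces described in this section together, the complete program might look like the sketch below. Several details are assumptions rather than the article's code: the way the options are written in the elabel parse call, the exact form of the elabel confirm call, and the use of a temporary value-label name to assemble the result. The exact syntax of both commands should be checked against the elabel help files.

```stata
capture program drop elabel_cmd_combine
program elabel_cmd_combine
    * allowed syntax before the colon, caller input after it
    * (option specification is an assumption)
    elabel parse elblnamelist [ , DEFINE(name) REPLACE ] : `0'

    * default target: the first value-label name the caller mentioned
    if ("`define'" == "") {
        local define : word 1 of `lblnamelist'
    }
    * without replace, the target name must not be defined yet
    if ("`replace'" == "") {
        elabel confirm new lblname `define'
    }

    * assemble the combination under a temporary name
    tempname tmp
    gettoken first rest : lblnamelist
    label copy `first' `tmp'
    foreach lbl of local rest {
        elabel copy `lbl' `tmp', modify
    }
    * move the result to the target name and clean up
    label copy `tmp' `define', replace
    label drop `tmp'
end
```

Building the result under a temporary name means that the target may also be one of the labels being combined, which is the case when define() is omitted and replace is specified.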

In a last step, we need to store our program where Stata can find it. If we intend to use our command regularly, we would store the code as elabel_cmd_combine.ado somewhere along the ado-file path (see [P] sysdir).¹³ For our example here, it suffices to put the code into a do-file and execute it to define the program in memory.

. do elabel_cmd_combine.do
(output omitted)

Let us replicate the example of labvalcombine at the beginning of this section to verify that elabel combine works as expected. Before we do, let us remove all the value labels, except lbl1 and lbl2, which we want to combine. We note in passing that elabel has a keep subcommand that complements the label drop command.

. elabel keep lbl1 lbl2

Here is how we obtain the result of labvalcombine with elabel combine.
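Following the syntax diagram in section 5.3, the call can be sketched as below. It assumes that elabel_cmd_combine has been defined in memory and that lbl1 and lbl2 are the value labels kept above.

```stata
* combine lbl1 and lbl2 into the new value label both
elabel combine lbl1 lbl2, define(both)
* inspect the combined mappings
elabel list both
```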

6 Conclusion

A common task in data management is defining and manipulating variable and value labels. elabel facilitates this task by enhancing Stata’s label commands. The command supports wildcard characters in value-label names and indirectly refers to value labels via variable names. Further, elabel can select integer-to-text mappings from value labels and applies any of Stata’s expressions and functions to variable and value labels. Thus, elabel facilitates changing variable and value labels systematically. Stata programmers who intend to write their own commands for managing variable and value labels can easily include elabel’s features in their code.


Supplemental Material, dm0101 for Extensions to the label commands by Daniel Klein in The Stata Journal


7 Acknowledgments

I thank Nicholas J. Cox, Daniel Bela, and the participants at the 2019 German Stata Users Group meeting in Munich for suggestions and critical comments on the software. I also thank Tim Morris for valuable remarks regarding both the manuscript and the software.

8 Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

. net sj 19-4

. net install dm0101 (to install program files, if available)

. net get dm0101 (to install ancillary files, if available)


References

Blumenberg, J. N. 2012. valtovar: Stata module to rename value labels to match variable names. Statistical Software Components S457443, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s457443.html.

Blumenberg, J. N. 2016. trimlabs: Stata module to trim variable labels. Statistical Software Components S458148, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s458148.html.

Cox, N. J. 2000. labutil: Stata modules for managing value and variable labels. Statistical Software Components S402501, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s402501.html.

Jann. 2007. labelsof: Stata module to obtain a list of labeled values. Statistical Software Components S456834, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s456834.html.

Joly. 2002. varlab: Stata module to save and load variable labels. Statistical Software Components S425001, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s425001.html.

Klein. 2011. labutil2: Stata module to manage value and variable labels. Statistical Software Components S457320, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s457320.html.

Newson. 2007. lablist: Stata module to list value labels (if present) for one or more variables. Statistical Software Components S456855, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s456855.html.

Newson. 2009. varlabdef: Stata module to define a value label with values corresponding to variables. Statistical Software Components S457026, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s457026.html.

Newson. 2018. vallabdef: Stata module to define value labels from label name, value and label variables. Statistical Software Components S458451, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s458451.html.

Nichols. 2011. labmatch: Stata module to find observations by label values. Statistical Software Components S457263, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s457263.html.

Weesie. 2005a. Multilingual datasets. Stata Journal 5: 162–187.

Weesie. 2005b. Value label utilities: labeldup and labelrename. Stata Journal 5: 154–161.
