Abstract
I discuss three related problems about getting the last day of the month in a new variable. Commentary ranges from the specifics of date and other functions to some generalities on developing code. Modular arithmetic belongs in every Stata user’s coding toolbox.
1 Introduction
Given a monthly date variable in Stata, people sometimes want the last day of each month as a new daily date variable. This problem has been touched on in a previous tip (Samuels and Cox 2012), but the title of that tip may not make its relevance to this question sufficiently evident.
In this column, I will examine that problem and two related problems. I also attempt to distill some coding morals that lie behind the problems and their solutions. I assume you know enough about date variables to understand that monthly dates and daily dates are held in different ways. If you do not, reading
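For readers who think in other languages, the difference can be sketched in Python. Stata holds a monthly date as the number of months since January 1960, so 1960m1 is 0. It holds a daily date as the number of days since 1 January 1960, so 01jan1960 is 0. The helper names below are hypothetical and chosen only for illustration.

```python
from datetime import date

EPOCH = date(1960, 1, 1)  # day 0 for Stata daily dates

def monthly_date(year, month):
    """Hypothetical helper: months elapsed since January 1960, as Stata counts monthly dates."""
    return (year - 1960) * 12 + (month - 1)

def daily_date(d):
    """Hypothetical helper: days elapsed since 1 January 1960, as Stata counts daily dates."""
    return (d - EPOCH).days

print(monthly_date(1960, 1), monthly_date(1961, 2))                # 0 13
print(daily_date(date(1960, 1, 1)), daily_date(date(1960, 3, 1)))  # 0 60
```

The same integer therefore means different things under the two encodings. As a monthly date, 13 is February 1961. As a daily date, 13 is 14 January 1960.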
2 Have monthly dates; seek last daily date of each month
Scrutiny of
Let us make a sandbox dataset to work on:
Do you see what we did there? The sandbox has some months for 1960 (a leap year) and some for 1961 (not a leap year), so we can test our solutions on different months and the two cases for February. Just typing some data into the Data Editor may be quicker than thinking up the small trickery here with
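The same sandbox idea can be sketched in Python. For simplicity, this sketch takes all 24 months of 1960 and 1961 rather than a selection. The standard library's `calendar.monthrange()` provides an independent check on month lengths, including both cases for February.

```python
import calendar

# Sandbox: every month of 1960 (a leap year) and 1961 (not a leap year),
# stored as (year, month) pairs.
sandbox = [(year, month) for year in (1960, 1961) for month in range(1, 13)]

# calendar.monthrange() returns (weekday of day 1, number of days in the month).
feb_lengths = {year: calendar.monthrange(year, 2)[1] for year in (1960, 1961)}
print(len(sandbox), feb_lengths)  # 24 {1960: 29, 1961: 28}
```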
The first trick is this. The last day of the present month is the day before the first day of the next month. If someone mentioned that in conversation, you might wonder if they were being facetious in saying something so obvious, but this identity is the key to a one-line solution of the problem.
With our sandbox dataset, the next month is
There is no completely free lunch here. We may still need to use the help for functions to find the function
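The identity "last day of this month = first day of next month minus 1" can be mimicked outside Stata. This minimal Python sketch encodes months as Stata does, counting from January 1960, so "next month" is just `m + 1`. Integer division handles the rollover from December to January, so no special case is needed. `first_day()` is a hypothetical stand-in for what Stata's `dofm()` does, not a library function.

```python
from datetime import date, timedelta

def first_day(m):
    """Hypothetical stand-in for Stata's dofm(): first day of monthly date m (months since 1960m1)."""
    years, month0 = divmod(m, 12)
    return date(1960 + years, month0 + 1, 1)

def last_day(m):
    """Last day of month m: the day before the first day of month m + 1."""
    return first_day(m + 1) - timedelta(days=1)

for m in (1, 11, 13):  # February 1960, December 1960, February 1961
    print(m, last_day(m))
# 1 1960-02-29
# 11 1960-12-31
# 13 1961-02-28
```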
3 Have daily dates; seek last day of current month
A common extra twist is that we have some daily dates in a variable (perhaps meaningful dates, perhaps arbitrary dates) and also wish to have the last day of the corresponding month in a new variable. Given that problem, we just need to convert it to the previous problem, and then we are done. Let us add the 15th of each month to the sandbox. Now the inverse function
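Converting to the previous problem can be sketched in the same way. Map each daily date to its month, then apply the first-day-of-next-month rule. Here `month_of()` is a hypothetical stand-in for Stata's `mofd()` and `first_day()` for `dofm()`. The example dates are the 15th of a month, as in the sandbox.

```python
from datetime import date, timedelta

def month_of(d):
    """Hypothetical stand-in for Stata's mofd(): months since 1960m1 for the month containing d."""
    return (d.year - 1960) * 12 + (d.month - 1)

def first_day(m):
    """Hypothetical stand-in for Stata's dofm(): first day of monthly date m."""
    years, month0 = divmod(m, 12)
    return date(1960 + years, month0 + 1, 1)

def last_day_of(d):
    """Last day of the month containing daily date d."""
    return first_day(month_of(d) + 1) - timedelta(days=1)

print(last_day_of(date(1960, 2, 15)), last_day_of(date(1961, 2, 15)))
# 1960-02-29 1961-02-28
```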
4 But it must be a Friday (or some other day of the week)
Once a problem is solved, it often looks trivial, so let us try something harder.
Suppose there is an extra constraint—that we want the last Friday in a month. This will stand for all other such problems in which we insist on the last day identified in each month being a particular day of the week. In many fields, specific things happen on particular days of the week. You may know examples in your own field, whether it is economics, epidemiology, ecclesiology, eschatology, or something quite different. Even if you have never met this problem, keep reading for a little extra technique that will help if you do encounter it in the future.
If the code so far does yield a Friday as the last day in that month, then we are done for that month. If it is a Thursday, we need 6 days before; Wednesday, 5 days before; Tuesday, 4 days; Monday, 3 days; Sunday, 2 days; and Saturday, 1 day.
Writing out code for all seven cases would solve the problem, but we might hope to find something simpler. Done slowly or quickly, we need a function to find the day of the week. Scrutiny of
Across cultures, countries, and professions, there are many variations on which day of the week is considered first, or equivalently which day is last. Given any particular rule, we can still use
Let us do this slowly and then see if we can find a pattern we can exploit to simplify the code. If the last day of the month is Friday,
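The slow approach, with one rule per day of the week, might look like this in Python. The sketch assumes Stata's coding for `dow()`, in which 0 is Sunday and 6 is Saturday. Python's `isoweekday()` runs from 1 (Monday) to 7 (Sunday), so taking it modulo 7 gives the same coding. The dictionary simply transcribes the list of corrections given earlier.

```python
from datetime import date, timedelta

def stata_dow(d):
    """Hypothetical helper: day of week coded as Stata's dow() does, 0 = Sunday, ..., 6 = Saturday."""
    return d.isoweekday() % 7

# Days to step back from the last day of the month to reach a Friday,
# one entry per day of the week (Stata coding).
CORRECTION = {
    0: 2,  # Sunday
    1: 3,  # Monday
    2: 4,  # Tuesday
    3: 5,  # Wednesday
    4: 6,  # Thursday
    5: 0,  # Friday: already there
    6: 1,  # Saturday
}

def last_friday_slow(last):
    """Step back from the last day of the month to the last Friday, via the lookup table."""
    return last - timedelta(days=CORRECTION[stata_dow(last)])

print(last_friday_slow(date(1960, 1, 31)))  # 1960-01-29
```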
Now if you stare at the code, you should see first that seven lines can be reduced to three lines. If the last day is Friday, we are done; if it is a Saturday, we just subtract 1; otherwise, we can subtract the day of the week as given by
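The table shows that Sunday to Thursday, coded 0 to 4, need corrections of 2 to 6, that is, the day of the week plus 2. The following Python sketch implements that three-case reduction, again assuming Stata's 0 = Sunday coding, and checks it against all seven entries of the slow version.

```python
def correction_three(dow):
    """Days back to the last Friday, given dow coded 0 = Sunday, ..., 6 = Saturday."""
    if dow == 5:        # Friday: nothing to do
        return 0
    if dow == 6:        # Saturday: one day back
        return 1
    return dow + 2      # Sunday (0) to Thursday (4): 2 to 6 days back

seven_cases = {0: 2, 1: 3, 2: 4, 3: 5, 4: 6, 5: 0, 6: 1}
print(all(correction_three(d) == seven_cases[d] for d in range(7)))  # True
```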
It is good to be cautious about whether that reduction is correct, so we have tried it both ways. Shortly, we will check to see that we get the same result.
We can reduce the code all the way down to one line. The subtracted correction can be thought of as the combination of two rules into one using
Even cleaner—at least if you are comfortable with remainders—is a formulation using
We can also test that using the equivalent function in Mata:
If you are new to Mata, what you see is creation of a column vector with values descending from 6 to 0, followed by use of
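The remainder formulation collapses all seven cases into one expression: subtract `(dow - 5)` modulo 7, where 5 is Friday's code. For a divisor of 7, Python's `%` operator returns a result from 0 to 6 even when `dow - 5` is negative, which is exactly what the rotation needs. Because 2 and -5 differ by 7, `(dow + 2) % 7` is equivalent. In the spirit of the Mata check, this sketch runs the day codes from 6 down to 0.

```python
FRIDAY = 5  # Friday in Stata's dow() coding (0 = Sunday)

# Day codes 6, 5, ..., 0, like a descending column vector in Mata.
codes = list(range(6, -1, -1))
corrections = [(dow - FRIDAY) % 7 for dow in codes]
print(corrections)  # [1, 0, 6, 5, 4, 3, 2]
```

Read against the codes, Saturday (6) needs 1, Friday (5) needs 0, and Sunday (0) needs 2, as in the slow version.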
That last solution leads to an easy guess about the same solution for any day of the week to be the last reported date: just change 5 to the result from
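Generalized to any target day, a sketch might take the target's day-of-week code as a parameter. `last_weekday_on_or_before()` is a hypothetical helper name, and the day codes again follow Stata's 0 = Sunday convention.

```python
from datetime import date, timedelta

def stata_dow(d):
    """Hypothetical helper: day of week coded as Stata's dow() does, 0 = Sunday, ..., 6 = Saturday."""
    return d.isoweekday() % 7

def last_weekday_on_or_before(last, target):
    """Latest date on or before `last` whose Stata day-of-week code is `target`."""
    return last - timedelta(days=(stata_dow(last) - target) % 7)

month_end = date(1960, 2, 29)  # a Monday
print(last_weekday_on_or_before(month_end, 5))  # last Friday: 1960-02-26
print(last_weekday_on_or_before(month_end, 0))  # last Sunday: 1960-02-28
print(last_weekday_on_or_before(month_end, 1))  # last Monday: 1960-02-29
```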
Let us check that our different solutions match.
In this case, it is easy enough to scan the data to see that all is well. Programmatically, it is better to use [D]
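Outside Stata, the analogue of asserting rather than eyeballing is a loop with `assert` statements. This sketch restates the three corrections as Python and checks that they agree for every month from 1960 to 2059, a range chosen arbitrarily. It also checks that each result is a Friday in the right month. The function names are hypothetical.

```python
from datetime import date, timedelta

def stata_dow(d):
    """Day of week as Stata's dow() codes it: 0 = Sunday, ..., 6 = Saturday."""
    return d.isoweekday() % 7

def last_day(year, month):
    """Last day of a month: first day of the next month minus one day."""
    nxt = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return nxt - timedelta(days=1)

SEVEN_CASES = {0: 2, 1: 3, 2: 4, 3: 5, 4: 6, 5: 0, 6: 1}  # slow version

def correction_three(dow):
    """Three-case version: Friday 0, Saturday 1, otherwise dow + 2."""
    return 0 if dow == 5 else 1 if dow == 6 else dow + 2

checked = 0
for year in range(1960, 2060):
    for month in range(1, 13):
        last = last_day(year, month)
        dow = stata_dow(last)
        # The three formulations of the correction must agree ...
        assert SEVEN_CASES[dow] == correction_three(dow) == (dow - 5) % 7
        friday = last - timedelta(days=(dow - 5) % 7)
        # ... and must land on a Friday in the same month.
        assert stata_dow(friday) == 5 and friday.month == month
        checked += 1

print(checked, "months checked")  # 1200 months checked
```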
Other problems with weekly dates, including day of the week, were discussed in previous tips (Cox 2010, 2012a,b).
5 Merits of modular arithmetic
In the movie Peggy Sue Got Married (1986), the 43-year-old Peggy Sue (played by Kathleen Turner) wakes up to find herself back in her past, just before she left high school. Faced with an algebra test, she tells her teacher: “I happen to know that in the future I will not have the slightest use for algebra, and I speak from experience.” (You can find a video clip at https://www.youtube.com/watch?v=-3eKzmozvrI.)
Usually, I dislike jokes against mathematics, but this one always makes me chuckle when I remember it. The serious point for us: what is the algebra that we will find use for in our work? It certainly includes modular arithmetic. Cox (2007) gave the following as general references: Graham, Knuth, and Patashnik (1994); Knuth (1997); and Biggs (2002). To those, I will add Stewart (1975), Conway and Guy (1996), and Gardner (1997) at the light, entertaining, or introductory end, and Boute (1992), Leijen (2001), and Dershowitz and Reingold (2012) if this is all standard stuff and you want to go deeper or further.
In the earlier tip on uses of the modulus function (really, a remainder or residue function), the keywords were selections, sequences, and extractions. A keyword that deserves as much prominence, if not more, is rotations. In the third problem, the day of the week is returned as integers 0 to 6, and the last day of the month can be 0 to 6 days later than the last acceptable day, so the correction is a rotation of integers 0 to 6. More generally, whenever there is a rotation, it is likely that
Another example of rotation involving dates is wanting to plot seasonal data that are centered on Northern Hemisphere winters or Southern Hemisphere summers rather than on months of the conventional calendar year running from January to December (Cox 2006, 2015). If the response of interest is snow in Switzerland or sunshine in Sydney, the peak of interest will be around the turn of the calendar year, which would be better in the middle of your graph, not split between two ends. Given, say, a variable
Let us use Mata to think that through. Opening up with a
As before,
The trick here, as in many other problems, is that the remainder, which starts at 0, is literally one step away from what you want. Adding 1 then gives a rotation that maps 1 to 12 onto a new 1 to 12.
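The month rotation can be sketched in the same way. For illustration only, assume that the rotated year starts in July, so that December and January sit in the middle of the graph. The rule is then `(month - start) % 12 + 1`, where adding 1 turns the remainders 0 to 11 into positions 1 to 12. `rotate_month()` and its `start` parameter are hypothetical.

```python
def rotate_month(month, start=7):
    """Hypothetical helper: position of calendar month (1-12) in a year beginning at month `start`."""
    return (month - start) % 12 + 1

print([rotate_month(m) for m in range(1, 13)])
# [7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6]
```

So December becomes 6 and January 7, adjacent in the middle rather than split between the two ends.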
As in the previous section, there are solutions using other functions, such as
6 Counsel for coders
We can find simple morals in this tale that extend to many more problems.
Use sandbox datasets to find solutions.
People wanting this kind of calculation with dates often have large datasets, perhaps with many panels, irregularly spaced dates, and yet other complications. Set your real dataset aside and make up a simple dataset for which you can check solutions. I started with one based on the observation number. With dates, it may be as convenient or more convenient to use very recent dates so that you know the correct answer or can glance at an accessible calendar (say, on your phone or laptop) to check that the code is correct. Had I chosen example dates that are very recent as I write (April 2019), they would become less convenient over the time that this column may remain useful. The examples show that Mata too can be useful for play within sandboxes.
Know about functions and use the help to look for others. Be prepared to combine functions, typically by feeding the results of one function to another.
Stata has many functions. I would be surprised at any user outside StataCorp who had much need for more than a few of them. But it is worth occasionally scanning the help to find out about functions in your territory that might be useful. Alternatively, see Cox (2011) for a rapid survey of some personal favorites. Naturally, you might have seen quickly which functions to use in these problems.
Knowing several functions often means that you will know more than one way to solve a problem, which is always good news.
Experiment. Sometimes you may need to write out longer code before you can see how it can be shortened.
Solutions to the third problem using
7 Programs and supplemental materials
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
Supplemental Material, dm0100 for Speaking Stata: The last day of the month by Nicholas J. Cox in The Stata Journal
References
