Abstract

A simple way of adding a variable nonlinearly to a model is to transform that variable. Common transformations are adding a quadratic term or taking a logarithm, but other transformations are also possible, such as taking the cube root (Cox 2011) or adding splines (see [R]
Sometimes, a continuous variable consists of qualitatively different segments. A good example of such a variable is the number of hours a respondent usually works per week. In many countries, values below 40 represent respondents who work part-time, the value 40 represents respondents who work full-time, and values above 40 represent respondents who routinely work overtime. Using

Linear effect of hours worked per week
However, we might hypothesize that working “normal” hours makes it easier for companies to standardize the allocation of tasks to the workers. As a consequence, companies might be willing to pay a premium for working full-time. This means that working more hours may increase average hourly wage, but there is an extra “jump” at 40. To test that, we can add both the variable

Linear effect of hours worked per week with a jump at working full-time
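
To make the shape of this specification concrete, the following minimal sketch builds the regressors for a linear effect of hours plus an indicator that equals 1 only at exactly 40 hours. The function names (`spike_terms`, `predict`) and all coefficient values are hypothetical and chosen purely for illustration; they are not estimates from any data.

```python
def spike_terms(hours, fulltime_hours=40):
    """Regressors for one respondent: constant, hours, full-time indicator."""
    # The indicator is 1 only at exactly 40 hours, so its effect is a one-point spike.
    fulltime = 1 if hours == fulltime_hours else 0
    return [1, hours, fulltime]


def predict(coefs, terms):
    """Linear prediction: sum of coefficient times regressor."""
    return sum(b * x for b, x in zip(coefs, terms))


# Hypothetical coefficients: intercept, slope per hour, full-time premium.
coefs = [2.0, 0.01, 0.15]

for h in (39, 40, 41):
    print(h, round(predict(coefs, spike_terms(h)), 2))
```

Under these assumed values the prediction rises with hours, is lifted by the premium at 40, and returns to the underlying line at 41.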
Sometimes, overtime is paid at a higher rate. So we might expect that working more hours generally increases the average hourly wage, but that above 40 hours there is an extra jump that, unlike the full-time premium, does not disappear immediately but persists. To test that, we can introduce the variable

Linear effect of hours worked per week with a persistent jump for overtime
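
A persistent jump differs from a spike only in how the indicator is defined: it stays at 1 for every value above 40. The sketch below, with hypothetical function names and invented coefficients, compares the prediction with and without the overtime premium to show that the gap is constant above the cutoff.

```python
def jump_terms(hours, cutoff=40):
    """Regressors for one respondent: constant, hours, overtime indicator."""
    # The indicator stays 1 for every value above the cutoff, so the jump persists.
    overtime = 1 if hours > cutoff else 0
    return [1, hours, overtime]


def predict(coefs, terms):
    """Linear prediction: sum of coefficient times regressor."""
    return sum(b * x for b, x in zip(coefs, terms))


# Hypothetical coefficients: intercept, slope per hour, overtime premium.
coefs = [2.0, 0.01, 0.10]
# Same line with the overtime premium set to zero, for comparison.
linear_only = [2.0, 0.01, 0.0]

for h in (40, 41, 45, 50):
    terms = jump_terms(h)
    gap = predict(coefs, terms) - predict(linear_only, terms)
    print(h, round(gap, 2))
```

The slope is the same on both sides of 40; only the level of the line shifts.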
So far, we have ignored that not everybody is paid for overtime. For those who are paid for working overtime, overtime will increase their average hourly wage. However, for those who are not paid for overtime, overtime will decrease their average hourly wage, because the same pay is spread over more hours. We might expect that unpaid overtime happens in professions where people are intrinsically motivated (for example, academics), so they may work long hours, whereas paid overtime happens in occupations where people are less intrinsically motivated, in which case both the workers and the employers have an incentive to keep the amount of overtime within bounds. So we hypothesize that the group of respondents working small amounts of overtime mainly consists of people getting paid overtime, while the group of respondents working large amounts of overtime consists mainly of people who do not get (completely) paid for overtime. In that case, we would expect a sharp increase in average hourly wage at 41 hours per week but a decrease after that. This is implemented by including an interaction between the overtime indicator variable and the

Different linear effects of hours worked per week for respondents working overtime or not with a jump
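
The complete break can be sketched by adding the product of the overtime indicator and hours to the regressors, so that the slope itself changes above 40. Centering hours at 40 is an assumption made here for readability (it makes the indicator's coefficient the size of the jump at the cutoff); the function names and coefficient values are likewise hypothetical.

```python
def break_terms(hours, cutoff=40):
    """Regressors: constant, centered hours, overtime indicator, interaction."""
    overtime = 1 if hours > cutoff else 0
    centered = hours - cutoff  # assumption: hours centered at the cutoff
    return [1, centered, overtime, overtime * centered]


def predict(coefs, terms):
    """Linear prediction: sum of coefficient times regressor."""
    return sum(b * x for b, x in zip(coefs, terms))


# Hypothetical coefficients: level at 40 hours, slope up to 40,
# jump for overtime, change in slope for overtime.
coefs = [2.4, 0.01, 0.20, -0.03]

slope_without_overtime = coefs[1]
slope_with_overtime = coefs[1] + coefs[3]  # negative with these values

for h in (35, 40, 41, 45, 50):
    print(h, round(predict(coefs, break_terms(h)), 2))
```

With these invented values the prediction rises up to 40 hours, jumps at 41, and then declines, which is the pattern hypothesized in the text.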
By combining continuous and indicator variables, one can allow for nonlinearity by adding spikes, persistent jumps, or complete breaks to the regression line. This flexibility allows one to tailor the kind of nonlinearity in the model to the research question and what one knows about the variables involved with only a few parameters. Moreover, those parameters are easy to interpret.
