Abstract
In this commentary (which is by no means a review of chronotyping), I try to clarify the distinction between 2 different ways of assessing chronotype by questionnaires. The morningness/eveningness questionnaire (MEQ) determines daily preferences producing a score, and the Munich ChronoType Questionnaire aims to determine phase of entrainment producing a time. An understanding of the MEQ requires knowledge of its history. Although both have their respective validity, they mean very different things and should be used appropriately in studies with different aims and questions. Although they show correlations in population comparisons, they should not be amalgamated when reviewing the literature.
Typology has been a tricky business for some 2500 years—from classifying people by the color of their bile to defining their chronotype by daily preferences. Typology aims to qualitatively categorize rather than quantitatively measure. Over the years, this categorization has become quite complex, using questionnaires that probed many aspects of “the type in question.” This approach transitioned typology from sorting people into 2 categorical boxes (usually case/patient vs. control) to producing distributions of scores that positioned many different categories between extremes.
Chronotype as a Personality Trait
An example of the latter is the Morningness-Eveningness Questionnaire (MEQ) published by Horne and Östberg (1976). The MEQ is an extended translation of a Swedish questionnaire, first introduced by Oscar Öquist in his thesis (Charting Individual Daily Rhythms; Öquist, 1970), and was repeatedly modified over the years (see introduction in Horne and Östberg, 1976). According to his Swedish thesis, Öquist’s aim was to separate “morningness” from “eveningness.” His first validations probed how well his instrument distinguished these 2 opposed personality traits, which were first suggested three-quarters of a century earlier by Michael Vincent O’Shea (1900), a Wisconsin-based professor of education.
Interestingly, the father of morningness had strong convictions about teenagers: “The chief problem of parents and teachers in having youth keep reasonable hours arises in relation to the dance. In American life young persons have got into the habit of going late to their dances and staying until early morning hours. This practice, if persisted in, will work harm to body and character. No boy or girl in the teens should be up later than ten o’clock at night except on rare occasions” (O’Shea, 1920). It is quite remarkable that the difficulties teenagers have in following the timetable of grown-ups (Carskadon et al., 1993; Roenneberg et al., 2004) were already discussed 100 years ago, well before the advent of social media, which are now commonly blamed for teenage lateness. It is even more astonishing that what I once called the “disco argument” (Roenneberg, 2012) was already used back then, claiming that teenagers could easily fall asleep and get up early if only they did not go dancing.
A contemporary of O’Shea, Lewis M. Terman, educational psychologist at Stanford and the father of the IQ test, opposed O’Shea’s convictions: “The European custom of beginning school at 7 to 8 o’clock in the morning works great hardship, often causing the pupil to rush away to school in nervous haste and without breakfast. The American practice of beginning at 9 o’clock is far wiser, and should never be changed unless for very special reasons” (Terman and Hocking, 1913).
Initially, chronotyping wanted to separate 2 personality types (O’Shea, 1900). Later, Freeman and Hovland (1934) suggested 4 temporal types (based on daily performance profiles), but the famous Nathaniel Kleitman (1939) rejected this inflation, suggesting only 2 types, with intermediates being of “minor importance” (see also introduction in Horne and Östberg, 1976). Notably, all versions and variants of the MEQ produce a higher index (give you more points) for being early than for being late, thereby somehow being loyal to O’Shea’s convictions. It is important to appreciate the intellectual and moral environment that gave birth to the morningness concept, more than 70 years before Horne and Östberg published the English version of a questionnaire that is still being used today (for a recent review of chronotyping instruments, see Levandovski et al., 2013).
The 19 items of this psychological questionnaire probe different daily behaviors with the aim to classify people into the 2 categories—morningness or eveningness. The questionnaire introduced the concept of a “feeling best rhythm,” according to which subjects are asked to give their preferred bedtime, get-up time, or best time for physical exercise. Other aspects are appetite, alertness, tiredness and sleep inertia, or mental and physical performance. There are oddities in some of the original questions. For example, question 4 probed sleep inertia under “adequate environmental conditions” without specifying them. Even in the instrument’s current versions, question 9 probes how subjects would perform if asked by a male friend to engage in some physical exercise (“the best time for him is 7.0-8.0
That the MEQ aims to distinguish between only 2 personality types, morning and evening types, is emphasized by the introduction of “neither type” (in line with Kleitman’s views about intermediates), which people get assigned to when scoring around half-maximum points. In their discussion, Horne and Östberg (1976) state, “The intermediate group is probably made up of afternoon types and also a ‘both Morning and Evening type’. However the questionnaire was specifically designed to identify Morning and Evening types, and therefore at present the Intermediate group can only be defined as not being clearly within the parameters of either the Morning or Evening types.” Despite this “specific design” of the MEQ, it is routinely being used as if it represents a continuous quantitative trait with a more or less normal distribution.
Chronotype as Phase of Entrainment
Distinguishing between different personality types is a method of traditional personality psychology while biology-based disciplines (including medicine and biological psychology) aim to define phenotypes based on more objectively measurable traits. The success of circadian biology in discovering the molecular nuts and bolts of the circadian system is based on the exquisite description of numerous quantifiable phenotypes by our pioneers. An excellent historical overview can be found in 2 edited books: the Proceedings of the 1960 Cold Spring Harbor Conference – especially in Pittendrigh’s (1960) “Generalizations” – and the first edition of the Handbook of Behavioral Neurobiology on Biological Rhythms (Aschoff, 1981). One of these quantifiable phenotypes is “phase of entrainment” (Ψ; see Aschoff et al., 1965), which describes the difference between a given phase of a circadian rhythm (e.g., the trough of core body temperature or the midpoint of sleep) and that of the zeitgeber (e.g., dawn or mid-dark).
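As a minimal sketch of how such a phase difference can be expressed numerically, the snippet below computes the signed difference between a rhythm marker and a zeitgeber marker on a 24-h clock. The function name, the decimal-hour representation, and the sign convention (positive when the rhythm marker falls later than the zeitgeber marker) are assumptions made for illustration only; the opposite sign convention is also in use, so the convention should always be stated alongside the value.

```python
def phase_difference_h(rhythm_phase_h: float,
                       zeitgeber_phase_h: float,
                       period_h: float = 24.0) -> float:
    """Signed difference (rhythm minus zeitgeber), wrapped into
    the interval [-period/2, +period/2).

    Both phases are clock times in decimal hours. The sign convention
    is an illustrative assumption, not one prescribed by the text.
    """
    half = period_h / 2.0
    return (rhythm_phase_h - zeitgeber_phase_h + half) % period_h - half


# Hypothetical example: mid-sleep at 04:30, mid-dark at 00:30.
psi = phase_difference_h(4.5, 0.5)
print(f"phase difference: {psi:+.1f} h")
```

The wrap-around step matters because both markers live on a circular scale: a marker at 23:00 and a reference at 01:00 are 2 h apart, not 22 h.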
The Munich ChronoType Questionnaire (MCTQ; Roenneberg et al., 2003) was developed to assess chronotype based on phase of entrainment rather than on preferences. While the outcome of the MEQ is a score that represents morningness, the result of the MCTQ is a local time based on the midpoint of sleep on free days, corrected for oversleep (MSFsc; for analysis algorithms, see Supplementary Material in Roenneberg et al., 2012). Although the qualitative “daily” preferences and the quantitative “phase of entrainment” correlate (Zavada et al., 2005), their aims are quite different, which I will illustrate using body height as an example. A quantitative approach simply wants to find out how tall people are, either by asking or, even better, by measuring them, while a qualitative approach might develop a TSQ, a Tallness-Shortness Questionnaire. At first glance, this concept seems absurd, but as Martin Ralph pointed out (personal communication, Mendoza, Argentina, 2013), such an instrument serves an important purpose: it can be used to characterize how people compare their body height to others, how that makes them feel, and what their preferred height is (which they could come closer to by, for example, wearing high heels). All these psychological traits (opinions, wishes, aspirations, preferences, yearnings, etc.) would never be discovered by simply assessing or measuring body height.
Since the MEQ encourages people to compare themselves with others, it will be less sensitive to differences between groups or cultures. The answers to “What on Earth is chronotype?” compared to “What on Mars is chronotype?” would certainly be very different according to the MCTQ (since the phase of entrainment depends on the length of the entraining cycle, which is longer than 24 h on Mars). But the usage of the MEQ within a futuristic human colony on Mars could lead to quite similar results compared to the same colony on Earth. Whether this is advantageous or not depends on the aim of the study.
When to use What
It is essential to know what instrument to choose for a given research question. Although the preference to wear high heels surely correlates (negatively) with a person’s actual body height, one should not try to infer body height from this preference, and inversely one should not assume that body height reliably predicts this preference. When it comes to chronotyping, some researchers seem to be confused about the different concepts behind daily preferences and phase of entrainment. The MEQ is often used to answer questions that would benefit from the assessment of phase of entrainment rather than from determining daily preferences. I once read a poster describing results of a study that had used both the MEQ and the MCTQ. It referred to the corrected mid-sleep times (MSFsc) as actual morningness. Studies that investigate the influence of chronotype on some other human trait or factor should refer to and discuss the results of previous studies. But these considerations should clearly separate whether the MCTQ or the MEQ had been used since they use very different methods, have very different aims, and are thus not interchangeable. Such amalgamations and apple-orange comparisons have to be avoided at all costs! Morningness refers to daily preferences and MSFsc to phase of entrainment. One instrument produces a score (traditionally biased by moral convictions; see above) and the other a local time. ‘Score’ refers to the notes of a piece of music, or to how many points (goals, runs, etc.) someone achieves in a contest, or to how many correct answers someone achieves on a test or exam (www.merriam-webster.com). It is therefore surprising that it is also being used in conjunction with the MCTQ (Kantermann et al., 2015).
The differences in the resulting dimensions, score versus time, entail another difficulty. Although MEQ score and MCTQ time correlate, there is no way of testing how well one represents the other. A simple way of testing for the goodness of a mutual representation between A and B is to determine the slopes of both regressions, A = f(B) and B = f(A), and test how closely the slope of the bisecting line approaches 1. Mid-sleep times are a fairly good representation of dim-light melatonin onset (Burgess and Eastman, 2005; Crowley et al., 2014; Kantermann et al., 2015; Kitamura et al., 2014; Simpkin et al., 2014; Wright et al., 2013), but such a test cannot even be performed between the MEQ and some other objective measurement that produces a local time. Theoretically, an excellent correlation between the MEQ score and sleep timing (MSFsc) could result even if the entire range of points toward morningness (total of 71; between 16 and 86) translated to differences in MSFsc of less than an hour. In reality, of course, the MEQ score translates to a wider range of mid-sleep times. I will use 3 studies that applied both the MEQ and the MCTQ in different cultures to show the problem of attempting to make a conversion between the two. In a Brazilian study (Miguel et al., 2014), 10 points on the MEQ scale corresponded to 44 min of difference in mid-sleep times; in a Dutch study (Zavada et al., 2005), they corresponded to 51 min; and in a Japanese study, 64 min (Kitamura et al., 2014). Theoretically, predictors of chronotype should cover the 24 h of the day (which can easily be accommodated when quantifications use local time). The full range of MEQ scores, however, only covers about 7 h (and we cannot be sure whether the scores between 16 and 86 transform linearly onto time).
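The two-regression test described above can be sketched as follows. This is one possible reading of the procedure rather than a reference implementation: the function names are hypothetical, the regressions are plain least-squares fits, and the "bisecting line" is taken to be the line that halves the angle between the two regression lines when both are drawn in the same plane (B on the x-axis, A on the y-axis). The sketch assumes that A and B are expressed in the same units (for example, two measures of local time in hours) and that they are correlated; it is therefore not applicable to a score paired with a time.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def bisector_slope(a, b):
    """Slope of the line bisecting the regressions A = f(B) and B = f(A),
    both drawn with B on the x-axis and A on the y-axis."""
    slope_a_on_b = ols_slope(b, a)   # A = f(B)
    slope_b_on_a = ols_slope(a, b)   # B = f(A)
    angle_1 = math.atan(slope_a_on_b)
    # In the shared plane, B = s * A has direction (s, 1).
    angle_2 = math.atan2(1.0, slope_b_on_a)
    if angle_2 > math.pi / 2:        # keep the line's angle in (-pi/2, pi/2]
        angle_2 -= math.pi
    return math.tan((angle_1 + angle_2) / 2.0)


# Hypothetical data: two time measures (decimal hours) that agree closely.
marker_a = [3.1, 3.9, 5.2, 6.0, 7.1]
marker_b = [3.0, 4.0, 5.0, 6.0, 7.0]
print(round(bisector_slope(marker_a, marker_b), 2))
```

A bisector slope near 1 indicates that a 1-h difference in one measure corresponds to roughly a 1-h difference in the other. Two perfectly correlated measures can still yield a slope far from 1, which is why a high correlation alone does not establish mutual representation.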
In summary, the 2 instruments measure different things and should therefore not be converted or simply interchanged. The MEQ should be applied if the study is interested in differences in psychological traits, and the MCTQ should be applied whenever chronotype is meant to represent a circadian trait (phase of entrainment or internal time; for limitations of the MCTQ, see below).
Type Relativity
We are often asked what cutoffs should be used in the MSFsc distribution to distinguish extreme from intermediate chronotypes. This also holds for the classification of delayed or advanced sleep phase syndromes (DSPS, ASPS). MSFsc times are a continuous trait, and their distribution is not only age and sex specific but also population specific. Classifications into groups are therefore only meaningful (although arbitrary) within a given cultural/geographical population.
The MSFsc time of the earliest lark in a cohort of teenagers would be far from being an outlier in an elderly population, and that of the earliest elderly in Central Europe would find many companions in the general population of India (Roenneberg and Kumar, unpublished data). DSPS and ASPS are often compliance problems with health consequences (rather than health problems per se), because people are awake and sleep at times when the majority of the population is not. If we knew all the factors that determine chronotype (latitude, longitude, climate, time of year, degree of industrialization, position within time zone, rural vs. urban, age, sex, zeitgeber strength, and potentially more that we don’t know of yet) and if we also knew all the correction algorithms for these factors (most would be somehow linked to light exposure), we could normalize distributions. We might then find that the earliest bird in New York would still be the earliest bird in a village outside of Mumbai. But as long as we cannot perform these normalizations, we should be careful with compartmentalizing the MSFsc distribution into categories that separate patient from normal. The position of an individual within such a normalized distribution (relative to others) would give us a clearer insight into genetic contribution to this trait.
For statistical analyses of a given study, we often compare the early, intermediate, and late third of the MSFsc distribution (e.g., Juda et al., 2013a; Vetter et al., 2011; Vetter et al., 2012). In all of these cases, the analyses concerned homogeneous populations assessed in the same geographical region and at a similar time of year. In one study, we even used the thirds of the MSFsc distribution to adjust shift workers to different schedules based on their chronotype, again relying on the relative distribution of chronotypes (Vetter et al., 2015).
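A rank-based split into thirds of this kind could be sketched as below. The function name, the participant identifiers, and the tie handling are assumptions for illustration; MSFsc values are taken as decimal hours after midnight and are assumed not to wrap around the clock within the cohort. Because the labels depend only on rank within the sample, they are meaningful only relative to the population that was assessed.

```python
def split_into_thirds(msfsc_by_id):
    """Label each participant as 'early', 'intermediate', or 'late'
    according to the rank of their MSFsc within this cohort."""
    ranked = sorted(msfsc_by_id, key=msfsc_by_id.get)  # earliest first
    n = len(ranked)
    labels = {}
    for rank, participant in enumerate(ranked):
        if 3 * rank < n:
            labels[participant] = "early"
        elif 3 * rank < 2 * n:
            labels[participant] = "intermediate"
        else:
            labels[participant] = "late"
    return labels


# Hypothetical cohort (MSFsc in decimal hours, local time).
cohort = {"p1": 4.5, "p2": 2.5, "p3": 6.5, "p4": 3.0, "p5": 5.5, "p6": 4.0}
print(split_into_thirds(cohort))
```

The same MSFsc value can fall into different thirds in different cohorts, which is the relativity the text describes.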
Internal Time
Although MSFsc is a good surrogate for phase of entrainment (internal time), the MCTQ has several limitations. Its calculations are based on sleep timing, yet sleep timing is not only controlled by the circadian clock but also by a homeostatic component (Borbely, 1982; Daan et al., 1984). Although the correction of MSF to MSFsc takes sleep deprivation and recovery sleep into account, and thus considers homeostasis, the MCTQ does not provide information about the relative contributions of the 2 components to chronotype. In addition, we have known ever since the observations of internal desynchrony in the bunker that the sleep/wake cycle can be quite distinct from physiological rhythms (Aschoff, 1965). On the other hand, shift workers show remarkably stable MSFsc times regardless of prior shift schedules and despite big differences in both wake times and sleep durations (Juda, 2010; Juda et al., 2013a, 2013b). This finding suggests that the MCTQ does, to a large extent, assess internal time.
The chronotype calculations of the MCTQ rely both on structured work schedules (including those of shift workers: Juda et al., 2013b) and on sleep times on days when people can fall asleep and wake up by themselves (usually work-free days). The latter is necessary to estimate as closely as possible the participant’s circadian sleep window (as a surrogate for internal time), and the former is necessary to calculate the weighted average week sleep duration, used for determining MSFsc (for analysis algorithms, see Supplementary Material in Roenneberg et al., 2012). In its present form, the MCTQ cannot successfully chronotype people who do not meet these prerequisites.
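To make the roles of the two kinds of input concrete, the sketch below follows the commonly cited form of the correction: mid-sleep on free days is shifted earlier by half the difference between free-day sleep duration and the weighted weekly average, but only when sleep on free days is longer than on workdays. The function and parameter names are hypothetical, times are decimal hours of local time, and free-day sleep is assumed to end without an alarm clock. The cited Supplementary Material remains the authoritative description of the algorithm.

```python
def sleep_duration_h(onset_h, end_h):
    """Sleep duration in hours, allowing sleep to span midnight."""
    return (end_h - onset_h) % 24.0


def msf_sc(onset_free_h, end_free_h, onset_work_h, end_work_h,
           workdays_per_week):
    """Mid-sleep on free days, corrected for oversleep (illustrative)."""
    sd_free = sleep_duration_h(onset_free_h, end_free_h)
    sd_work = sleep_duration_h(onset_work_h, end_work_h)
    free_days = 7 - workdays_per_week
    # Weighted average sleep duration across the week.
    sd_week = (sd_work * workdays_per_week + sd_free * free_days) / 7.0
    msf = (onset_free_h + sd_free / 2.0) % 24.0
    if sd_free > sd_work:
        # Oversleep on free days: shift mid-sleep earlier by half the excess.
        msf = (msf - (sd_free - sd_week) / 2.0) % 24.0
    return msf


# Hypothetical participant: sleeps 01:00-10:00 on free days and
# 23:30-06:30 on 5 workdays per week.
print(round(msf_sc(1.0, 10.0, 23.5, 6.5, 5), 2))
```

Without a regular schedule there is no workday sleep duration to weight, and without unconstrained sleep on free days there is no usable mid-sleep time, which is why both prerequisites are needed.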
Since the MCTQ’s key questions demand answers in local time, participants must also have a clear notion of and access to local time, which excludes many patients and people living in preindustrial cultures. Although it is true that the MEQ is easier to use for these populations than the MCTQ, the morningness score does not, as discussed above, assess phase of entrainment. In the absence of a clock-time-free MCTQ, the internal time of these people can only be assessed by actimetry or by dim-light melatonin onsets. It should also be noted that the MCTQ should not be used to chronotype people suffering from severe sleep disturbances.
Whenever a study measures a biochemical, metabolic, neuronal, or cognitive function in humans across the 24-h day, the results should be analyzed not only according to local time but also in reference to sun time, to individual internal time, and to time awake. If one were, for example, to investigate the mistakes people make when typing a given text into a computer at different times of day, I would bet that (1) the number of mistakes would vary systematically over the course of 24 h and (2) these variations would be chronotype dependent. To find out the time when people are “having trouble typing,” I would certainly analyze the results on all 4 time axes: (1) local time, which is obviously available from the experimental protocol; (2) the respective sun time, which can be looked up or calculated; (3) time awake, which should be determined by asking participants when they woke up; and finally (4) internal time (expressed as hours since MSFsc), which can be readily assessed by the MCTQ.
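The four time axes can be derived from a single measurement time with a few subtractions, as in the sketch below. All names are hypothetical. The offset between local clock time and sun time is passed in as a ready-made number (it depends on longitude, time zone, daylight saving time, and date, and its calculation is outside this sketch), and every value is wrapped onto a 24-h scale.

```python
from dataclasses import dataclass


@dataclass
class TimeAxes:
    local_h: float     # clock time of the measurement
    sun_h: float       # measurement time expressed in sun time
    awake_h: float     # hours since the participant woke up
    internal_h: float  # hours since the participant's MSFsc


def time_axes(local_h, sun_offset_h, wake_h, msfsc_h):
    """Express one measurement time on all four axes (decimal hours).

    sun_offset_h is sun time minus local clock time, supplied by the caller.
    """
    return TimeAxes(
        local_h=local_h % 24.0,
        sun_h=(local_h + sun_offset_h) % 24.0,
        awake_h=(local_h - wake_h) % 24.0,
        internal_h=(local_h - msfsc_h) % 24.0,
    )


# Hypothetical typing test at 14:00 local time; sun time runs 30 min
# behind the clock, the participant woke at 07:00, and MSFsc is 04:30.
print(time_axes(14.0, -0.5, 7.0, 4.5))
```

Two participants tested at the same clock time can thus sit at quite different positions on the time-awake and internal-time axes, which is the reason for analyzing results on all four.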
Acknowledgements
I thank Céline Vetter, Lena Keller, Luisa Klaus Pilz, and Serge Daan for their very constructive comments on the manuscript.
Conflict of Interest Statement
The author is also one of the authors of the MCTQ. Otherwise the author has no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
