Abstract
Through an analysis of the U.S. Department of Education’s High School Longitudinal Study, this article offers models of educational attainment that assess the explanatory value of four complementary measures. On the 11th grade survey instrument, questions are offered on both educational aspirations and expectations, which is the first time since 1972 that both questions have been asked on a comparable longitudinal survey in the United States. The instrument then poses two questions that elicit probabilistic forecasts of future educational attainment, which have never been asked of high school students on such surveys. The empirical analysis in this article shows that the first two questions affirm the continuing relevance of the status socialization theory of educational attainment, and related empirical modeling that has been shaped by it, but the two new questions provide support for embracing more recent models that focus on belief formation and uncertainty.
Explanations of educational performance and attainment have proliferated in the social sciences, but scholarship on their implied mechanisms has not kept pace. Consider responses to the question “As things stand now, how far in school do you think you will get?” In the past five decades, this item has appeared on the questionnaires of many longitudinal surveys of adolescents and young adults, and for most researchers the responses are labeled educational expectations. Their predictive utility has been demonstrated in hundreds of published articles. Even so, no consensus exists on how to interpret the associations that educational expectations have with outcomes of interest, or whether alternative measures of future orientations might provide complementary, or superior, explanatory value.
A more complete assessment of mechanisms such as this one is now possible because of the fielding of the High School Longitudinal Study (HSLS) of the U.S. Department of Education (USDOE). On its 11th grade survey instrument, an educational-expectations question was embedded within a well-designed, four-question sequence, beginning with an idealistic educational-aspirations question not asked since the 1970s and ending with two new probabilistic forecasts. In this article, I analyze this sequence of four questions and how they predict patterns of high school graduation and college entry. I focus on the value added by the two new questions that elicit probabilistic forecasts.
In the remainder of this introduction, I fix attention by first detailing the four survey items. I then summarize relevant background literature to motivate an empirical analysis and conclude the introduction by specifying an analysis plan and research questions. After presenting the results, I discuss implications from the literature on survey cognition and argue for deeper engagement with models of educational attainment that focus on how forward-looking commitment is shaped by uncertainty.
Four Measures of Beliefs about Future Educational Attainment
I offer general information on the data source later, and in this section, I present only an introduction to the four measures from the survey instrument that serve as key predictors for the empirical analysis. Under the heading “Plans and Preparation for the Future,” the 11th grade HSLS student questionnaire offers two questions:
If there were no barriers, how far in school would you want to go?
As things stand now, how far in school do you think you will actually get?
The response options are ordered lists of ascending levels of education: 7 categories for the first question and a more detailed 12 categories for the second. Both questions offer an escape route as the last response option: “You don’t know.” These two questions yield measures that most researchers, as discussed later, would label educational aspirations and educational expectations. 1
Immediately after these questions, the HSLS offers two complementary questions, at least partly in response to interest from the research community in measures of future orientations that might be able to better encode uncertainty:
How sure are you that you will receive a high school diploma?
How sure are you that you will pursue a Bachelor’s degree?
The response options for these two questions are natural-language referents to an implicit scale of probability: “Very sure you will,” “You probably will,” “You probably won’t,” and “Very sure you won’t.” Accordingly, I refer to these two questions as “probabilistic forecasts.”
As survey items, all four questions are well designed. The expectations question benefits from the return of the traditional aspirations question, absent from such surveys since the 1970s (see the Supplement for its history). In particular, the pairing of “no barriers” in the aspirations question with “will actually get” in the expectations question establishes a strong contrast structure, which the literature on survey design implies will sharpen the measurement of educational expectations as realistic appraisals. 2
The two new probabilistic forecast questions that follow are opportunities for students to then signal the degree of their uncertainty about the likelihood of progressing through two highly salient transitions on the near horizon. This opportunity is particularly valuable for students who offer a “don’t know” in response to either of the prior two questions.
For the probabilistic forecast questions, all students are presented with a strong forced-choice design. If they are not “very sure” in one direction or the other for the referenced educational transition, they must choose either “probably will” or “probably won’t.” No satisficing middle point, pegged to an implicit probability of 0.5, is offered for either question. And no route to escape through a separate “don’t know” option is offered either.
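Because the scale omits a 0.5 midpoint, analysts who want a numeric version of these responses must impose an implied-probability coding. The sketch below is one hypothetical coding; the numeric values are illustrative midpoints of my own choosing, not anything specified by the HSLS, and the sketch simply encodes the monotone ordering of the four options.

```python
# Hypothetical implied-probability coding for the forced-choice scale.
# The numeric values are illustrative only; the HSLS does not assign them.
IMPLIED_P = {
    "Very sure you won't": 0.05,
    "You probably won't": 0.25,
    "You probably will": 0.75,
    "Very sure you will": 0.95,
}

# Ascending order of the response options as implied probabilities.
ordered = ["Very sure you won't", "You probably won't",
           "You probably will", "Very sure you will"]
vals = [IMPLIED_P[r] for r in ordered]
```

Note that no response maps to 0.5, mirroring the forced-choice design: a student who is unsure must still lean one way or the other.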
Taken together, and with the results from the empirical analysis to follow, I argue that this battery represents a substantial upgrading in our capacity to elicit from students their idealistic ambitions, their realistic appraisals, and their underlying uncertainty about actual future behavior. These responses can then improve the prediction of postsecondary behavior, suggesting lines of development for improved models of educational attainment.
Status Socialization, Alignment, and Prefigurative Commitment
The second of the four questions presented earlier yields a variable for empirical analysis, usually labeled educational expectations, that is among the most widely analyzed in the sociology of education. For some recent work that has examined its patterning with the same data source considered in this article, see, for example, Ahearn (2021), Carolan (2017), Renzulli and Barr (2017), Schneider, Kim, and Klager (2017), and Schneider and Saw (2016). 3 Although all of these articles offer insight into the patterning of educational expectations, none draws on the information yielded by either of the probabilistic forecasts that will be the focus of this article.
The appeal of the expectations measure as an object of analysis follows from the depth of the past literature, which has moved from an early focus on socialization processes toward perspectives that emphasize adaptation in recognition of structural constraint. As I detail in this section by revisiting both classic and contemporary arguments about the relevance of aspirations and expectations, the introduction of the two new probabilistic forecast questions was a natural next step in this long tradition of analysis.
Status Socialization
The status socialization model of educational attainment maintains that adolescents are the developed products of a stable socialization regime (see Spenner and Featherman 1978 and Haller 1982 for the full argument). 4 Individuals enter adolescence with varying levels of measurable cognitive skill and family socioeconomic standing, and each has a demonstrated record of academic performance from childhood. Taking in these inputs, parents, teachers, and peers form expectations for students’ transitions to adulthood. Students then conform to these expectations, adopting achievement motivations that set them on pathways toward alternative educational and occupational destinations.
The status socialization model remains a key touchstone in the sociology of education for two reasons. First, it provides a strong rationale for measuring educational expectations and then studying their connections both to family differences in socioeconomic status and to subsequent behavioral outcomes. Second, subsequent research has shown that it is clearly deficient as a complete model, being simultaneously too shallow and too rigid. As a result, it is usually invoked in the current literature to justify newer perspectives, ones that are pitched as more deeply penetrating, more expansive, or both.
The most trenchant critique was offered at the time that the status socialization perspective was in development, published first in English and in full form as Bourdieu (1973). In this account, students’ futures are so strongly determined by class conditions that it is an ideological trap to view their aspirations or expectations as meaningful components of a mechanism of upward mobility. Using the later semantics of Bourdieu’s work, the position is that aspirations and expectations are by-products of embodied dispositions that are themselves shaped by how the laws of competition in the education system map onto a class-differentiated distribution of cultural capital. As a result, status socialization dynamics, which purportedly involve the efficient construction of aspirations and expectations in response to both significant others and past academic performance, are better interpreted as epiphenomena that offer no genuine insight into the true causal model of actual educational attainment. Worse yet, for Bourdieu, attaching explanatory relevance to such variables promotes an ideology of fair competition that is inconsistent with the prevalence of class immobility.
This strong critique suggests that measures derived from the four questions presented earlier will align very closely with each other and with measures of socioeconomic position, precisely because all of them are expressions of the same class-structured process that determines educational attainment. In what follows, I instead show a variable pattern of responses across the four questions, each configuration of which predicts a different pattern of educational attainment, and none of which is explained away by measures of socioeconomic position, either at the family level or the school level.
Accordingly, more recent perspectives, which are consistent with such patterned variability, are more useful for developing interpretations to move the literature forward. I offer two related perspectives next, each of which motivates the following analysis.
Alignment
First, for the alignment model, the assumed coherence of the status socialization model is inaccurate for many adolescents (Schneider and Stevenson 1999; see also Schneider et al. 2017; see also Ahearn 2021 and Renzulli and Barr 2017). Rather than being socialized into a consistent set of educational and occupational plans, students are susceptible to other developmental processes that can cause them to develop inconsistent plans. The archetype of concern is a student who projects to enter a high-status job in adulthood but plans to embark after high school on an educational pathway that is unlikely to result in eventual entry into the occupation. This perspective is different from, but not necessarily inconsistent with, an older claim in the literature that many individuals lower both their educational and occupational expectations because they believe, correctly, that they are likely to encounter opportunity constraints (e.g., Kerckhoff 1976). The focus of the alignment model, instead, is on the prevalence of incoherence in the planning process itself, which generates bundles of plans for many students that, to an outside analyst, may seem misguided and irrational. 5
Commitment
Second, for the prefigurative and preparatory commitment model, the underlying focus is on the latent uncertainty that is generated by deficits and contradictions in available information (see Morgan 2005; see also Morgan et al. 2013a, 2013b). Some students see the paths that are open to them and can judge accurately the costs and benefits of pursuing each. These students can be characterized as subject to a stable regime of developmental socialization, and in fact they may project a hyper-rational set of future expectations, labeled by Morgan (1998) as rational fantasies.
But for many other students, the signals and cues are less clear. These students are often able to detail normative understandings of the value of alternative educational pathways, but they remain less certain in their relevance to them personally, based on the uncertainties in their underlying beliefs. Will they be able to make it through college? Will college give them the advantage in the labor market that others claim is a sure thing? Is a four-year degree really more valuable than a technical college alternative?
In these cases, plans are more variable and uncertain, even if still ambitious. And the uncertainty inherent in them lowers preparatory commitment toward the most demanding educational pathways that require such commitment for entry and subsequent success. This perspective grows out of a prior concern with how to weaken conceptually the assumed motivational power of educational expectations in light of emergent empirical patterns (e.g., Alexander and Cook 1979; see also Bozick et al. 2010).
In relation to the alignment model, the prefigurative commitment model focuses more on the underlying uncertainty that is present in students’ beliefs about postsecondary education and labor market opportunities, using the structure of rational actor models as a point of departure (e.g., Breen and Goldthorpe 1997; Manski and Wise 1983). From this perspective, situated uncertainty can generate misaligned bundles of plans, even if students are not systematically misinformed. Some students, for example, may expect to enter a high-status occupation in adulthood but only commit prefiguratively while in high school to everyday behavior that routes them to a suboptimal educational pathway. Still, to them, their specific educational plans are a plausible, if uncertain, step toward their envisioned high-status occupation.
Thus, at heart, some of the misalignment that is evident in the results produced in the alignment tradition, summarized in the foregoing, might be better seen as a type of temporal misalignment. Seemingly contradictory plans are, in many cases, still buttressed by underlying optimistic beliefs in the capacity to beat the odds through persistence, even if embarking on a nonstandard pathway that is, at least nominally, a flexible one. Still, preparatory commitment in the near term may be lower than required to ensure success in the long run (see Morgan et al. 2013b).
Analysis Plan and Entailed Research Questions
In this article, I focus on a key educational attainment transition that clearly differentiates life course patterns: entry into any type of college after high school graduation and, within this group, entry into a four-year college in the first enrollment window after high school graduation. 6 Overall, the goal of the analysis is to assess whether the two new probabilistic forecast measures introduced for the HSLS offer predictive value, and then, if so, what the implications for theories of educational attainment may be. 7
To set a baseline for the analysis and introduce the outcome measures, I first demonstrate that the aspirations and expectations elicited by the first two HSLS questions predict college entry patterns, just as the existing literature from the 1970s suggests that they should. 8 The core of the analysis then pursues answers to three questions:
Do students offer probabilistic forecasts that are reasonable?
Do students’ probabilistic forecasts predict patterns of college entry?
Does the predictive capacity of students’ probabilistic forecasts survive after adjustments are introduced for socioeconomic status, other background characteristics, and both aspirations and expectations?
I offer an analysis that provides affirmative answers to all three of these questions. And, in conclusion, I then argue that the four-question battery is a clear improvement over prior measurement strategies in the same survey program. I also discuss throughout why it is reasonable to regard the new probabilistic forecast questions as measures of latent uncertainty that can be used to improve models of educational attainment.
Data and Methods
The HSLS sampled first-year public and private high school students in fall 2009 and administered questionnaires to students, parents, teachers, counselors, and school administrators (see Ingels et al. 2011). Students and parents were first followed up in spring 2012, when most were in their junior years (see Ingels et al. 2014), and in later years to measure patterns of postsecondary activity. The timing of the first HSLS follow-up survey is notable. Unlike its predecessors, for which data collection was scheduled for 10th and 12th grades only, the HSLS offers measurement in the spring of the junior year, which is now thought to be a crucial moment in which preparatory college-going behaviors are more explicitly initiated. In contrast, 10th grade is too early for most students, and 12th grade is too late for many students.
For analysis, I use the panel sample from the first two waves, augmented with college entry and attainment measures collected in subsequent years. I use the base-year and first follow-up panel weight (W2W1STU), modified with an estimated ratio adjustment, using an auxiliary logit model for nonresponse to the items analyzed. In other words, the data distributor’s panel weight is used to adjust for nonresponse due to attrition between the first and second wave, and my ratio adjustment accounts for additional unit-specific missing data on key survey responses, including college entry patterns obtained for nonattriters in subsequent rounds of data collection.
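The ratio adjustment described above amounts to an inverse response-propensity reweighting. The following minimal numpy sketch illustrates the idea under stated assumptions: the variable names and the simulated data are hypothetical, not the HSLS files or the W2W1STU weight itself. An auxiliary logit is fit for item response, and responders' base weights are inflated by the inverse of their fitted propensities.

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Newton-Raphson MLE for a logistic regression; X includes an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-10 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

def ratio_adjust(base_weight, X, responded):
    """Inflate responders' weights by the inverse of their estimated
    response propensity; nonresponders receive zero weight."""
    beta = fit_logit(X, responded.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    return np.where(responded, base_weight / p_hat, 0.0), p_hat

# Illustrative simulated data (not the HSLS files).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
responded = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.8 + 0.5 * x)))
base_weight = rng.uniform(1.0, 3.0, size=n)
w, p_hat = ratio_adjust(base_weight, X, responded)
```

Because fitted propensities are strictly below 1, every responder's adjusted weight is at least as large as the base weight, so the responders' weights expand to represent the cases lost to item nonresponse.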
I report sandwich variance estimates that are heteroscedasticity consistent and further inflated by an adjustment for the clustering of students within the originally sampled schools. I used sequential equations to impute item-specific missing values for the background adjustment variables, relying for these imputations on the first-stage imputations performed by the data distributors. I provide other modeling details, such as school-level fixed-effect adjustments and inverse probability of treatment (IPT) weighting, in the following and in the accompanying Appendix as I present the relevant models.
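A minimal sketch of the cluster-robust sandwich estimator just described, assuming a weighted linear model with schools as clusters. This is the basic CR0 form without small-sample corrections, and the simulated clustered data are illustrative only, not the HSLS design.

```python
import numpy as np

def weighted_ols_cluster_se(X, y, w, cluster):
    """Weighted OLS with a cluster-robust (CR0) sandwich variance estimate:
    bread = (X'WX)^-1, meat = sum over clusters of score outer products."""
    Xw = X * w[:, None]
    bread = np.linalg.inv(X.T @ Xw)
    beta = bread @ (Xw.T @ y)
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        idx = cluster == g
        score = (X[idx] * (w[idx] * resid[idx])[:, None]).sum(axis=0)
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

# Illustrative simulation: students clustered in schools (not the HSLS data).
rng = np.random.default_rng(1)
G, m = 40, 25
cluster = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m)
y = 1.0 + 2.0 * x + rng.normal(size=G)[cluster] + rng.normal(size=G * m)
w = rng.uniform(0.5, 2.0, size=G * m)
X = np.column_stack([np.ones_like(x), x])
beta, se = weighted_ols_cluster_se(X, y, w, cluster)
```

Summing scores within clusters before taking outer products is what allows residuals for students in the same school to be arbitrarily correlated, which is the source of the inflation relative to the heteroscedasticity-only estimator.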
Results
Table 1 provides the HSLS pattern of educational attainment progression considered in this article, weighted to the full population of students who entered high school in fall 2009. As shown in the first row, 86.2 percent of students graduated high school on time in the spring or early summer of 2013. For college attendance, 76.1 percent of students entered some form of postsecondary education by 2016, and 40.6 percent entered four-year colleges in the first enrollment window after graduating from high school. Although sequentially ordered for many HSLS students, these three outcomes are defined and coded separately. For example, a small group of students who did not graduate from high school on time were still able to enter a four-year college immediately after graduating from high school.
Table 1. Educational Attainment Progression by 11th Grade Levels of Educational Aspirations and Expectations.
In subsequent panels, Table 1 then presents these same three outcomes, conditional on students’ aspirations and expectations. The first two panels show that levels of aspirations and expectations both predict educational attainment progression, but expectations are slightly more predictive than aspirations. Furthermore, the 9 percent and 10 percent of students who select “don’t know” for aspirations and expectations respectively (see the marginal distributions for each in Table A1 in the Appendix) have lower levels of subsequent attainment than those who aspire or expect to obtain bachelor’s degrees.
The last panel of Table 1 uses a 10-category cross-classification of aspirations and expectations to subclassify the sample. 9 Students whose expectations fall below their aspirations because of the “barriers” referenced in the aspirations question are less likely to enter college. For example, 90.0 percent of students who aspired and expected to obtain at least a bachelor’s degree subsequently enrolled in some form of postsecondary education. However, among those who aspired to obtain at least a bachelor’s degree but who also expected a lower level of educational attainment because of barriers, only 59.2 percent enrolled in some form of postsecondary education.
Altogether, Table 1 demonstrates that these two measures predict educational attainment progression as the classic literature from the 1970s suggests that they should, and these patterns hold for this more recent cohort of students, regardless of how their perceptions of their future behavior may have been shaped by an emergent college-for-all ethos. Aspirations and expectations can still be interpreted as two separate predictors.
Table 2 presents the same three outcome measures, but now conditional on the two probabilistic forecasts, to address the first and second research questions. 10 The results show that students can forecast their futures in a probabilistic fashion, and their answers predict their subsequent behavior. For example, among students who were very sure they would pursue a bachelor’s degree while in 11th grade, 91.8 percent enrolled in some form of postsecondary education. In addition, 61.4 percent of these students entered four-year colleges immediately after graduating from high school. These rates decline steadily with the probabilities implied by the remaining response options, such that among those who are very sure they will not pursue a bachelor’s degree in 11th grade, only 24.5 percent enrolled in any type of postsecondary education, with only 3.6 percent entering four-year colleges immediately. A similar pattern is present for the relationship between forecasts of high-school graduation and subsequent educational transitions.
Table 2. Educational Attainment Progression by 11th Grade Probabilistic Forecasts.
For the first and second research questions, it is therefore reasonable to conclude that these two survey items generate interpretable response patterns, suggesting that the “how sure” question structure, with response options based on “very,” “sure,” and “probably” semantics, works sufficiently well. Accordingly, the new probabilistic forecasts have clear face validity, suggesting that students are able to respond as the survey designers hoped they would.
For the third research question, alternative modeling possibilities are available. I proceed shortly to modeling the educational transitions directly, answering the third research question explicitly. But I also offer in Tables A3 and A4 an entropy-based analysis that shows that (1) the probabilistic forecasts contain information that is not already embedded in the measures of aspirations and expectations, and (2) the forecasts’ unique predictive capacity is located disproportionately in the “don’t know” categories and the categories for less than a bachelor’s degree for both educational aspirations and expectations. The implication of this patterning is that the forecasts are most uniquely predictive for students at the margin of a college enrollment decision.
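The logic of an entropy-based analysis can be illustrated with empirical conditional entropies: the forecasts add information exactly when conditioning on them, in addition to expectations, lowers the conditional entropy of the outcome. The sketch below uses simulated categorical data with hypothetical variable names, not the HSLS measures or the specific decomposition reported in Tables A3 and A4; by the chain rule, the reduction is always nonnegative.

```python
import numpy as np
from collections import Counter

def cond_entropy(y, given):
    """Empirical conditional entropy H(y | given) in bits; `given` is a
    tuple of equal-length discrete sequences."""
    keys = list(zip(*given))
    n = len(y)
    H = 0.0
    for key, cnt in Counter(keys).items():
        members = [y[i] for i in range(n) if keys[i] == key]
        probs = np.array(list(Counter(members).values()), dtype=float) / cnt
        H += (cnt / n) * float(-(probs * np.log2(probs)).sum())
    return H

# Illustrative simulated categories (not the HSLS variables).
rng = np.random.default_rng(2)
n = 3000
expect = rng.integers(0, 5, size=n)                          # expectations category
forecast = np.clip(expect + rng.integers(-1, 2, size=n), 0, 3)
entry = (rng.random(n) < 0.15 + 0.18 * forecast).astype(int)

h_exp = cond_entropy(list(entry), (list(expect),))
h_both = cond_entropy(list(entry), (list(expect), list(forecast)))
extra_info = h_exp - h_both   # information the forecast adds beyond expectations
```

A strictly positive `extra_info` is the entropy-based analogue of the regression finding to follow: the forecast predicts the outcome even after the expectations categories are held fixed.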
Table 3 provides four linear probability models for the two postsecondary education outcomes (see Table A5 for analogous models for the on-time high school graduation outcome, which provides further support for the general conclusions here). The first model, with coefficients multiplied by 100 to furnish a percentage metric, uses only the probabilistic forecast of the sureness of pursuing a bachelor’s degree. The reference category for the four categories is “very sure you will,” and thus the three coefficients presented in the table align exactly (subject to rounding error) with the percentage differences in the second and third columns of the second panel in Table 2.
Table 3. Linear Probability Estimates of the Differences in College Entry Patterns by 11th Grade Probabilistic Forecasts (Percentages).
Models 2 through 4 then add additional covariates. The first set, used for models 2 and 4, includes the five core variables for socioeconomic status (parents’ education and occupational prestige, as well as total yearly family income), an indicator variable for a two-category self-identified gender, and six indicator variables for a seven-category variable for self-identified race/ethnicity. The second set, used for models 3 and 4, includes 24 indicator variables for the 25 cells of the fully cross-classified variables for aspirations and expectations. When specified, each set of covariates reduces the net coefficients for the probabilistic forecasts, but even in model 4 the coefficients remain substantial and are many multiples of their standard errors.
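The exact correspondence noted for model 1, in which the dummy coefficients align with the percentage-point differences in Table 2, is an algebraic property of a saturated linear probability model. A minimal numpy sketch with simulated data (the entry rates below are hypothetical, not HSLS estimates) shows that each dummy coefficient equals that category's outcome rate minus the reference category's rate.

```python
import numpy as np

# Simulated forecast categories and a binary entry outcome; the entry
# rates below are hypothetical, not estimates from the HSLS.
rng = np.random.default_rng(3)
n = 2000
cat = rng.integers(0, 4, size=n)          # 0 = "very sure you will" (reference)
p_true = np.array([0.92, 0.70, 0.35, 0.10])
y = (rng.random(n) < p_true[cat]).astype(float)

# Linear probability model: intercept plus dummies for non-reference categories.
X = np.column_stack([np.ones(n)] + [(cat == k).astype(float) for k in (1, 2, 3)])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# In this saturated model, each dummy coefficient is exactly that group's
# outcome rate minus the reference group's rate; x100 gives percentage points.
gaps = 100.0 * beta[1:]
```

This is why the model 1 coefficients in Table 3 reproduce, subject only to rounding, the percentage differences in the second panel of Table 2: with no other covariates, the fitted values are simply the category means.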
For Table 4, I present the same four models as in Table 3 for the same two outcomes, but for these augmented models I also specify fixed-effect parameters for schools (see Table A6 for an analogous augmentation of the results in Table A5 for high school graduation). The HSLS sample is a two-stage sample, and the 15,586 students analyzed in this article are nested within 940 randomly sampled high schools, both public and private. The four models offered in Table 4 for each outcome fit 939 parameters to partial out all variation across the 940 schools, leaving only pooled within-school variation for the linear probability models.
Table 4. Fixed-Effect Linear Probability Estimates of the Differences in College Entry Patterns by 11th Grade Probabilistic Forecasts (Percentages).
Compared with those in Table 3, all of the coefficients in Table 4 are smaller, as expected. These declines are evidence that school-level variation is relevant. However, the remaining within-school associations between forecasts and outcomes are not eliminated, and thus the net associations in Table 3 cannot be attributed to unmodeled between-school differences, whether those are differences in school quality or average school differences in resources, such as family wealth or neighborhood safety. The results in Table 4 imply that the net associations in Table 3 are mostly produced by variation across individuals educated within the same schools. 11
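The school fixed-effect specification can equivalently be estimated by demeaning all variables within schools, which is convenient when fitting 939 dummy parameters is cumbersome. The sketch below, using simulated data and hypothetical variable names, verifies the Frisch-Waugh-Lovell equivalence between the dummy-variable and within-transformation estimators.

```python
import numpy as np

def demean_within(v, groups):
    """The within transformation: subtract each group's mean."""
    out = v.astype(float).copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= out[idx].mean()
    return out

# Illustrative simulation with school-level effects (not the HSLS data).
rng = np.random.default_rng(4)
n, G = 1200, 40
school = rng.integers(0, G, size=n)
x = rng.normal(size=n) + 0.5 * school / G        # covariate correlated with school
y = 0.3 * x + np.linspace(-1.0, 1.0, G)[school] + rng.normal(size=n)

# (1) Dummy-variable fixed effects: one indicator per school, no intercept.
D = np.column_stack([(school == g).astype(float) for g in range(G)])
b_dummy = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (2) Within transformation: demean x and y by school, then simple OLS.
xw, yw = demean_within(x, school), demean_within(y, school)
b_within = (xw @ yw) / (xw @ xw)
```

The two estimates coincide exactly, which makes the interpretation in the text concrete: the fixed-effect coefficients in Table 4 are driven solely by variation among students within the same school.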
How strong an interpretation can we give to the results in Tables 3 and 4? Adopting the descriptively specified semantics of the second and third research questions presented earlier, the answers to each are clearly “yes.” The results show that when net associations are obtained, using covariates for the key background variables of status attainment research (socioeconomic status, gender, and race/ethnicity) as well as its favored mechanistic predictors (aspirations and expectations), the probabilistic forecasts still have substantial predictive power. When the parameterization is expanded to purge all between-school variation, most of the predictive power of the forecasts remains.
Accordingly, the results in Tables 3 and 4 can be interpreted as support for the value of the probabilistic forecasts, insofar as I am able to demonstrate a robust association that cannot be accounted for by other widely deployed predictors of educational attainment. Nonetheless, the results should not be interpreted in a more structural sense. Although it is likely uncontroversial to assert that socioeconomic status, gender, and race/ethnicity are causally prior to the probabilistic forecasts, similarly uncontroversial orderings for the forward-looking beliefs measured by all four questions are, at this stage of research, still unavailable.
Conclusions
The 11th grade educational aspirations and expectations of HSLS students provide informative distributions that predict patterns of subsequent educational attainment. Students’ aspirations and expectations are neither so high nor so uniform in the “college-for-all” era that one must conclude that they lack validity as measures of students’ own idealistic preferences and realistic appraisals. Instead, the two measures complement each other, and the HSLS survey designers were wise to rehabilitate the aspirations question to reestablish the complementary value of each.
Even so, students’ 11th grade probabilistic forecasts of their future education, through reflection on two transitions on the near horizon, contain additional information that cannot be accounted for by either aspirations or expectations. The questions elicit responses that have clear face validity, implying that these comparatively new questions are not too cognitively demanding for adolescents to manage. More important still, the probabilistic forecasts appear to contain information on underlying uncertainty about constraints on future outcomes, or the implications of expected shortfalls in preparation, that are distinct from students’ reasons for lowering their expectations relative to their aspirations. Accordingly, the probabilistic forecasts predict patterns of college entry, net of adjustments for background characteristics, race and gender self-identifications, school differences, aspirations, and expectations.
These results suggest that we have more work to do on understanding how adolescents plan for their futures and whether they do so in a way that shapes whether they decide to pursue schoolwork in the near term, as in the literature on “lost talent.” 12 Contrary to what some discussions have asserted in the past, these variations cannot be modeled simply by measuring structural differences in opportunities for students who attend different types of schools or who identify with different gender, ethnic, and racial categories. If such strong structural determinism were valid, the net associations presented in this article for the four measures would greatly diminish when parameters are fit for groups of respondents defined by self-identified gender and race/ethnicity, and when school-level determinants are purged from the models with fixed effects. The associations do decline in line with some level of structural constraint on opportunity, but not strongly enough that we can forgo deeper investigation of the cognition involved as students reflect upon and then predict their futures for us.
Discussion
Although the empirical results just provided offer compelling evidence of the predictive utility of probabilistic forecasts, the results themselves do not suggest why this is the case. I conclude with two lines of argument, first applying evidence from the literature on survey cognition to extend the interpretations offered so far, and second discussing how the two new measures may be useful for elaborating a modeling approach that gives greater scope to uncertainty of beliefs and to how such uncertainty shapes current and future attainment-relevant behavior.
Interpretations Suggested by the Literature on Survey Cognition
The shift for the HSLS survey instrument was to move away from a single educational expectations measure to the well-conceived and well-ordered battery with all four questions analyzed in this article. I have made the case that the four-question battery performs well in an empirical analysis, insofar as the three additional questions enhance prediction. Having shown this result, the next natural question is: How should we think about what students are attempting to convey to us as researchers when they answer these questions?
Alongside advances in cognitive science, survey methodologists have developed a literature on the cognition of survey response (for a review, see Tourangeau and Bradburn 2010). Although little of this literature has been developed from a direct examination of questions just like the four analyzed for this article, the literature is rich enough that it suggests clear implications for how students can be assumed to approach and answer the four-question battery.
First, most students cannot reply to the questions by retrieving responses from memory as simple recall tasks. Consider a contrasting example. Earlier in the same questionnaire, the HSLS students are asked: “How many times did the following things happen to you during the last 6 months you were in school?” Items are then listed—“You were put on an out-of-school suspension or probation from school,” “You were arrested,” and so on—with response options of “Never,” “Once,” and “More than once.” The survey response literature suggests that students have precisely this type of information at their disposal, can retrieve it efficiently, and then can formulate answers by fitting their recalled information into one of the three response options. These recall questions may still generate incorrect answers, of course, because students may not be able to apply the 6-month restriction properly. But there is a clear mapping from salient information on past events that is stored in memory to the response options of the question.
The four HSLS questions are more cognitively challenging. To answer the first two, students must interpret “if there were no barriers,” “want to go,” “as things stand now,” and “will actually get.” Although it is possible that some students have answers to sufficiently similar questions already stored in memory, it is much more likely that they must form beliefs about barriers to educational attainment, both general and personal, as well as their own life-course goals, as they see them at the time. They likely have many relevant pieces of information that they could retrieve, as well as higher-order beliefs that are relevant, such as (1) subjective beliefs about the costs and benefits of alternative courses of future behavior and (2) what they think others believe they should and will actually do. To answer the questions, students must retrieve these types of information and beliefs, likely sampling from distributions of both in ways that may be context dependent, on the basis of prior questions and external factors relevant on the day of survey administration. They then must construct answers that fit the response categories offered. If they cannot do so, perhaps because their retrieved information and beliefs are too contradictory, they can select “don’t know.”
The third and fourth questions are more straightforward to comprehend, but they are not cognitively simple. Students likely use much of the same information retrieved for the prior two questions, but now also with knowledge of what they have just answered as aspirations and expectations. For each probabilistic forecast, they must construct answers on a scale that may or may not match how they think about the likelihood of their own futures unfolding. They may feel pressure to provide answers that are consistent with responses just given as aspirations and expectations, although (as discussed earlier) they do not have an escape response of “don’t know” for the probabilistic forecasts. That impossibility likely prompts some additional information and belief retrieval, so that, for example, students who answered “don’t know” when considering a full menu of attainment options just before can then construct an answer to report the likelihood of two specific courses of action. This task may be easy, such as for a student whose “don’t know” to the aspirations question is based on being unable to pick between “complete a bachelor’s degree” and “complete a master’s degree.” For other students, selecting between “probably” and “very sure” in one direction or another may require that they retrieve and evaluate even more specific beliefs about their future behavior.
Finally, these insights from the survey response literature describe a best-case scenario. The same literature also suggests that some students will instead offer what they think the researchers expect of them, perhaps invoking a “college for all” frame to construct generically lofty aspirations and expectations, never truly processing the “no barriers” and “actually get” wordings that ask for individualized assessment. And some students will satisfice, quickly offering plausible answers that they think will be regarded as sufficiently fit for purpose (i.e., getting them over a response-quality threshold that will speed them through the questionnaire, with little introspection about their actual life conditions and chances). This portion of the literature, nonetheless, has a silver lining. Very few students who are willing and engaged enough to complete the questionnaire will construct answers that are purposefully misleading. Any such responses would be the most cognitively demanding of all, necessitating the construction of reasonable responses first, after which alternative misleading ones would then be developed and reported with their own consistency.
In sum, the extant literature on survey response suggests why these questions generate information that is predictive. No single question, such as one for educational expectations, should be able to elicit the range of information that all four HSLS questions do because each new question prompts sequential augmentation of information retrieval. Overall, the responses offered can be thought of as a performative act in which students offer to us features of their projected futures that are collectively more subtle than simply reporting a fixed educational plan into which they have been socialized. As a result, the foregoing analysis shows that the responses constructed in this performative survey completion moment have predictive validity beyond what only an educational expectations measure can offer.
Some of the uncertainty about the response patterns across all four questions could be addressed in future research through cognitive interviewing, a technique from the survey methods literature in which respondents are asked in a structured way what they were thinking when responding to the survey questions under study. It is somewhat surprising that we do not have a research tradition of this type of scholarship in the sociology of education, especially given the large contingent of researchers who have generated insight by open-ended interviewing of students and their families. Major surveys could be improved by drawing on this expertise, both during development and after fielding surveys to better understand response patterns.
Implications for Models of Educational Attainment
Although more work is needed to pin down the specific cognition involved when answering the questions analyzed for this article, the results presented provide support for further developing models of educational attainment that include mechanisms for belief formation and uncertainty. The alignment and commitment models presented in the introduction seek to explain decision-making processes that generate alternative postsecondary education trajectories, as they emerge from the everyday behavior that unfolds as students contemplate their futures. The key relationship remains what status attainment researchers focused on long ago: how adolescents’ near-term behaviors while in secondary schooling, whether conceptualized as motivation or something else, generate a population-level distribution of educational attainment that later allocates students to roles in adult society.
The probabilistic forecasts elicited by the third and fourth questions might be reasonably interpreted as subjective uncertainty, tied to educational transitions on the near horizon, generated by underlying factors that are distinct from those that set students’ levels of idealized and expected future attainments. That has been the interpretation taken in this article. But this is a matter for further investigation as well. It may be that additional analysis will show that the responses are better interpreted as simple dispassionate forecasts of future behavior, not valid indicators of subjective uncertainty among students who recognize the limitations of their forecasting capacities. This interpretation, among others, requires further analysis so that the relevance of probabilistic forecasts to improved models of educational attainment can be established.
Supplemental Material
Supplemental material for “Back to Predicting Future Educational Attainment: Two New Measures and Their Validity” by Stephen L. Morgan in Socius is available online (sj-pdf-1-srd-10.1177_23780231251346283).
Footnotes
Appendix
Tables A1 and A2 provide the joint distributions for the variables that define the rows of Tables 1 and 2. Tables A3 and A4 provide a direct assessment of whether the probabilistic forecasts contain information not already captured by the questions for educational aspirations and expectations. For Table A3, cross-tabulations are offered to show the basic pattern for educational expectations, where sufficient probability mass of the forecasts is in the “off-diagonal” cells. For Table A4, a more formal examination is offered, using an index of the amount of information encoded in the probabilistic forecasts, the Shannon entropy metric from the literature on information theory:

H = -\sum_{k=1}^{4} p_k \log_4 p_k,

where p_k is the proportion of respondents who select response category k.
Because the probabilistic forecast questions have four possible values, I set the base of the logarithm as 4 to deliver entropy values that range from 0 to 1. With this setup, a response pattern that is a discrete uniform distribution (e.g., a probability of 0.25 for each of the four responses) yields the maximum entropy of 1. If, instead, all of the probability mass is associated with a single response (e.g., all respondents select “very sure you will”), then the variable yields the minimum entropy of 0.
The first row of Table A4 indicates that the question on receiving a high school diploma generates less information than the question on pursuing bachelor’s degrees, 0.341 versus 0.822. The reason for this difference is that much more of the probability mass is piled up in the “very sure” category for the question on receiving a high school diploma. As shown in Table A2, 86.2 percent of respondents are very sure they will receive a high school diploma. In contrast, for the question on pursuing a bachelor’s degree, the responses are more dispersed, with values of 45.7 percent, 35.7 percent, 14.2 percent, and 4.4 percent for “very sure you will” through “very sure you won’t.”
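For readers who wish to verify these figures, the base-4 entropy calculation can be sketched in a few lines of Python, using the response shares just reported (the function name `entropy_base4` is my own, not drawn from the article or its replication files):

```python
import math

def entropy_base4(probs):
    """Shannon entropy with logarithm base 4, so four response
    categories yield values on the [0, 1] scale used in Table A4."""
    return -sum(p * math.log(p, 4) for p in probs if p > 0)

# Response shares reported above for the bachelor's-degree forecast,
# ordered from "very sure you will" to "very sure you won't"
bachelors = [0.457, 0.357, 0.142, 0.044]

print(round(entropy_base4(bachelors), 3))       # 0.822, matching Table A4
print(entropy_base4([0.25, 0.25, 0.25, 0.25]))  # uniform responses: maximum entropy
print(entropy_base4([1.0, 0.0, 0.0, 0.0]))      # single response: minimum entropy
```

The heavily concentrated high school diploma distribution, with 86.2 percent of its mass in one category, yields a far lower value by the same calculation.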
Now, consider the remaining rows of Table A4, in which entropy is calculated for subpopulations of students. Consider the last panel in the table for some examples. All students who aspire and expect to obtain a bachelor’s degree or more have entropy values for the high school diploma forecast of only 0.125. Most of these students are very sure that they will receive high school diplomas and so their forecasts provide little additional information. But just to the right, for this same subgroup of students, the entropy value for forecasts of pursuing bachelor’s degrees is much higher at 0.501. Here, the information in the measure lies mostly in the differences between those who select “very sure you will” and “you probably will,” presumably with the latter expressing some uncertainty about potential barriers that are not yet fully in view.
Across these three panels, the entropy values are generally higher for lower levels of aspirations and expectations. And perhaps most importantly, the entropy values are considerable for the students in the “don’t know” categories for aspirations and expectations, suggesting that the forced-choice response options for the probabilistic forecast questions are able to extract additional information from these students.
Overall, the patterns in Table A4 provide evidence that the information captured by the probabilistic forecasts is substantial and nonredundant with respect to aspirations and expectations. The other results in the article show that this additional information predicts educational attainment.
Tables A5 and A6 provide results analogous to those in Tables 3 and 4, switching to on-time high school graduation as the educational transition of focus, along with the probabilistic forecast pegged to it. The same basic patterns are present as for college entry in Tables 3 and 4.
For Tables A7 and A8, I consider the relevance of compositional differences and how they may interact with variation in the strengths of the relationships between probabilistic forecasts and educational attainment. The results offered in Tables 3 and 4 were developed only with a traditional additive-effects regression strategy, yielding coefficients for probabilistic forecasts that are best interpreted as average group differences, subject to the conditional-variance weighting that is inherent in least squares minimization. Borrowing from modeling techniques most associated with the literature on causal inference, it is possible to offer estimates of average effects that permit a probing of the relevance of differences in net associations across the joint distribution of the covariates.
As a setup to this extended approach, I consider the reference group of “very sure you will” to be an implicit control group for each set of probabilistic forecasts. The students who select the other three response categories are then putative treatment groups. Their probabilistic forecasts fall below the sureness of the control group for reasons that are not directly measured.
With this mapping, we can then define six different treatment parameters. The first three are average treatment effects (ATEs) using the full distributions of the covariates: the three average differences in the outcomes for the full sample as if all individuals answered “you probably will,” “you probably won’t,” or “very sure you won’t,” rather than answering the base control response of “very sure you will.” The other three are the ATEs for the treated (ATTs), which are the three average differences restricted only to those who have covariate distributions equivalent to those who actually answered “you probably will,” “you probably won’t,” or “very sure you won’t.” The three ATTs may differ from their respective ATEs because the ATTs apply to specific groups of individuals who do not have joint distributions of covariates that match the joint distribution of the full sample.
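The weighting logic behind these parameters can be illustrated with a small synthetic simulation for a single treatment contrast. The covariate, propensity function, and effect size below are hypothetical stand-ins, not HSLS estimates; the sketch shows only how the two sets of weights target the ATE and the ATT:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic stand-ins: one covariate (e.g., a socioeconomic index), a binary
# "treatment" contrasting "you probably will" (t = 1) with the implicit control
# "very sure you will" (t = 0), and a binary outcome with a true effect of -0.25
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-(-0.5 + 0.8 * x)))   # propensity, treated as known here
t = rng.binomial(1, e)
y = rng.binomial(1, np.clip(0.7 - 0.25 * t + 0.1 * x, 0, 1))

# ATE weights: 1/e for the treatment group, 1/(1 - e) for the control group
w_ate = np.where(t == 1, 1 / e, 1 / (1 - e))
ate = (np.average(y[t == 1], weights=w_ate[t == 1])
       - np.average(y[t == 0], weights=w_ate[t == 0]))

# ATT weights: treated units keep weight 1; controls are reweighted by e/(1 - e)
# so that their covariate distribution matches that of the treated
w_att = np.where(t == 1, 1.0, e / (1 - e))
att = (np.average(y[t == 1], weights=w_att[t == 1])
       - np.average(y[t == 0], weights=w_att[t == 0]))
```

Because the simulated effect is constant across the covariate, the ATE and ATT estimates coincide here; in the article's application they can differ precisely because the treated groups' covariate distributions depart from that of the full sample.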
Table A7 presents inverse probability of treatment (IPT) weighting estimates of each set of three ATEs and ATTs. Model 2 in Table A7 uses the same socioeconomic status, gender, and race covariates as for model 2 in Table 3. Notice first that the IPT-based ATE estimates in Table A7 are close to the prior model 2 estimates in Table 3, suggesting that using the same covariates to generate weights for IPT estimates of the ATE does not produce estimates that differ much from using those same covariates in linear probability regression models to generate ATE estimates that are implicitly conditional variance weighted. Now, consider the very small differences between the ATE and ATT estimates in Table A7. These differences imply that the net associations between probabilistic forecasts and educational attainment outcomes vary little on the basis of whether the comparison is focused on the full sample or the types of individuals most likely to respond with one of the three lower levels of sureness. Altogether, model 2 suggests that with respect to socioeconomic status, gender, and racial self-identification, no evidence exists that a substantial nonlinear pattern of associations is present that casts doubt on what was already concluded on the basis of Table 3.
For additional IPT models, some constraints emerge. The sample size is too small to use the full 25-cell cross-classification of aspirations and expectations in order to estimate weights for the ATT target parameters. Too many zero cells are present in the cross-classifications for the smallest treatment group. However, the 10-category coarsened version of the cross-classification of aspirations and expectations, which was used earlier to save space for Tables 1 and A3, does have enough variation. It collapses the underlying cross-classification just enough to be usable for IPT weight construction. Model 5 is therefore roughly comparable with model 4 in Table 3 in its use of covariates to estimate effects. Substantively, model 5 in Table A7 offers ATE estimates in the same pattern and close to model 4 in Table 3. And, the ATT estimates for model 5 do not differ much from the ATE estimates for model 5.
Table A8 offers doubly robust variants of the models presented in Table A7. These additional models include supplemental linear adjustments for the covariates, in addition to using the covariates to generate the IPT weights. The results are very similar to those in Table A7, with the same general pattern of point estimates, suggesting that the IPT weights alone balance the data sufficiently well. The largest differences are for the “very sure you won’t” treatment group, which is the group where balance is hardest to achieve because it is the farthest from the multidimensional center of the joint distribution of covariates for the full sample. But even here the differences are still small, in the range of 4 to 7 percentage points.
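A doubly robust variant of this kind can be sketched by combining IPT weights with a supplemental linear covariate adjustment, estimated by weighted least squares. The setup below is again synthetic and hypothetical, not the article's actual specification:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Synthetic stand-ins: covariate, propensity, treatment indicator, and a
# binary outcome with a true treatment effect of -0.25
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-(-0.5 + 0.8 * x)))
t = rng.binomial(1, e)
y = rng.binomial(1, np.clip(0.7 - 0.25 * t + 0.1 * x, 0, 1))

# ATE-style IPT weights, as in the weighting-only models
w = np.where(t == 1, 1 / e, 1 / (1 - e))

# Doubly robust flavor: weighted least squares of the outcome on the treatment
# indicator plus a supplemental linear adjustment for the covariate
X = np.column_stack([np.ones(n), t, x])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
ate_dr = beta[1]  # close to the simulated effect of -0.25
```

The appeal of the doubly robust form is insurance: the estimate remains consistent if either the weights or the linear adjustment is correctly specified, which is why near-identical results across Tables A7 and A8 suggest the weights alone balance the data well.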
Taken together, the results in Tables A7 and A8 reinforce the results offered in the main body of the article. The probabilistic forecasts appear to have value because their net associations with educational attainment progression are robust. Compositional differences across the joint distributions of covariates have only small consequences for estimates of the predictive power of the probabilistic forecasts.
Finally, in response to the request of a reviewer, Table A9 presents the models in Table 4, showing the coefficients for the cross-classification of the five-category aspirations and expectations variables (i.e., the 25 cells of Table A1) that were suppressed in tables of the main body of the article to save space. With the most common category of a bachelor’s degree for both aspirations and expectations as the reference, the coefficients show the expected pattern: declining rates of college entry as both expectations and aspirations decline, as well as when “don’t know” responses are offered.
Acknowledgements
I thank Avery M. D. Davis for his research assistance.
1
The questions follow a seven-item battery for which students indicate how many of their “close friends” get good grades, have taken steps toward postsecondary education, and plan to attend several types of higher education. This prior battery, and other questions preceding it, set the context for the aspirations question and presumably cause respondents to begin to retrieve information that they see as relevant to their own goals and plans.
2
In addition, and as further explained in the Supplement, the word
3
For examples of other recent work that uses measures for educational expectations with alternative data sources, see, for example, Brand (2023), Jerrim (2014), Karlson (2015, 2019), Park, Wells, and Bills (2015), Salazar, Cebolla-Boado, and Radl (2020), and
.
4
For a review of the full research program, see Sewell et al. (2003). For additional information, including the debut of the first questions by the Educational Testing Service, see the Supplement. See also
, Chapter 2).
5
It is also notable that the alignment model was developed at a time when many researchers were questioning whether reported educational expectations since the 1990s were elevated inauthentically by a pervasive “college-for-all” ethos in schools. See Rosenbaum (2001), Reynolds et al. (2006), and
for detail on these arguments.
6
Because of the first probabilistic forecast, I also model on-time high school graduation. This is a secondary outcome of interest, and its patterning follows the same basic contours as college entry.
7
As noted later, the overall goal of the HSLS was to study post–high school trajectories. College graduation cannot be modeled in complete enough fashion for my purposes in this article, both because of sample attrition and because of the timing of the follow-up surveys. Moreover, the probabilistic forecasts that are the focus of the analysis apply only to high school graduation and college entry, not graduation from a college degree program.
8
Although only an affirmation of an existing perspective, this result is nonetheless valuable to offer, even without its value as a setup for subsequent analysis, because the HSLS (as explained in the Supplement) is the first longitudinal survey from the USDOE to ask both questions on the same survey instrument since the 1980s.
11
In the Appendix, I offer one additional set of results that supports construct validation. Models that use IPT weighting show that the results in Tables 3 and 4 do not hide consequential heterogeneity attributable to compositional differences across levels of forecasts. See Tables A7 and A8. For completeness, Table A9 provides the coefficients for aspirations and expectations for the models in Table 4.
12
The “lost talent” argument is common in the literature but not attributable to a particular author. The key idea is that although students are motivated by their idealistic educational aspirations, they may not engage as fully with schoolwork if their educational expectations are lower than their aspirations. The entailed reduction of effort decreases levels of educational achievement and attainment for these students, generating what the literature refers to as “lost talent.” The results of this article suggest that whatever occurs through the leveled expectations mechanism is only one causal pathway for belief-based shortfalls in everyday commitment, insofar as the results show that students who are generally uncertain of their futures may enact the same behavior, apart from whether their expectations are lower than their aspirations.
13
Two points about entropy and information theory are important to mention for those unfamiliar with this index. First, as a matter of practice, if
