Abstract
This article provides clear and practical guidelines for researchers seeking to use national secondary datasets to conduct evidence-based research. Drawing from our own experiences, we discuss a six-step research process model (Bryan et al., 2010, 2017) to help researchers navigate the use of these datasets. We present examples from the school counseling literature using these datasets and discuss the implications for evidence-based practice. Recognizing that quantitative research with secondary datasets has the potential to produce unintended harm to children with marginalized identities, we discuss the value of a QuantCrit lens in the research process. This article will increase researchers’ understanding of these datasets, demystifying the process so that they may use them more effectively to inform practice and policy.
Keywords
School counselors and other school professionals need evidence that informs counseling practice and drives decisions about educational and counseling interventions in schools. We propose that national secondary datasets represent a valuable source of data for researchers who wish to engage in evidence-based research (Bryan et al., 2010, 2017). Many such datasets exist, funded by agencies such as the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Institute of Education Sciences (IES). Many national datasets, including some from these federal agencies, can be found in data repositories such as the Inter-university Consortium for Political and Social Research (ICPSR; https://www.icpsr.umich.edu). Research conducted with national secondary datasets aligns with evidence-based educational and counseling research, which seeks to provide information about the factors that hinder or improve students’ academic, social-emotional, mental health, and college-career outcomes in the various settings in which students conduct their lives (Dimmitt & Zyromski, 2020, 2023; Zyromski & Dimmitt, 2022). Research using these national datasets allows researchers and practitioners to gain a better understanding of the experiences of students, families, and communities, and to identify potentially effective prevention and intervention programs and activities that they can further explore in subsequent research. Although we focus mostly on federal educational datasets in this article, the strategies and procedures we share may be applied to any national or international secondary dataset.
In particular, national secondary educational datasets, such as those funded and managed by the IES and its statistical arm, the National Center for Education Statistics (NCES), are well-suited for evidence-based research, especially in school counseling, because they provide a rich source of data on students, schools, teachers, parents and, more recently, school counselors. As a result, they allow researchers to test theories of change that help researchers and practitioners understand the effects of a variety of school- and family-related contextual and ecological factors (e.g., school connectedness; school bonding; family engagement; parent empowerment; student-counselor contact/interactions; counselor caseload; school counseling college-going culture; school counseling programs and services) on students’ academic, college-career, mental health, and social-emotional attitudes, behaviors, and outcomes. Further, studies with national secondary datasets allow researchers to identify opportunity gaps and malleable factors that may have implications concerning which practices, mindsets, and resources should be the focus of school counselors and educators in developing interventions to improve student outcomes. Lastly, the use of national datasets promotes interdisciplinary collaboration that may foster meaningful research and solutions, as this research is best done with a team of scholars who bring varied skills and perspectives to the team.
An extant body of research has demonstrated the utility of the NCES datasets (e.g., Early Childhood Longitudinal Study, ECLS; National Household Education Surveys, NHES; Education Longitudinal Study of 2002, ELS:2002; Middle Grades Longitudinal Study of 2017, MGLS:2017; High School Longitudinal Study of 2009, HSLS:09) in producing evidence-based school counseling (EBSC) research. For example, researchers have examined the relationships of school counselor caseloads or ratios (Goodman-Scott et al., 2018; Shi & Brown, 2020); school counselor contact and interactions with students (Bryan et al., 2011; Kim et al., 2024); referrals to school counselors (Bryan et al., 2012); and school counselor aspirations (Bryan et al., 2017) to students’ academic, career, and college-going outcomes. Further, these datasets have allowed researchers to test theoretical frameworks such as school counselors as social capital (Bryan et al., 2011; Cholewa et al., 2018); college counseling opportunity structures (Engberg & Gilbert, 2014; Rangel & Ballysingh, 2020); career aspirations (Edwin et al., 2019); school counseling college-going culture (Bryan et al., 2023a, 2023b); and parent empowerment (Kim et al., 2017a, 2017b, 2017c; Kim & Bryan, 2017, 2022). All of these studies have important implications for school counseling practice and policy. However, despite their benefits, national secondary datasets are still largely under-utilized in EBSC research (Bryan et al., 2010, 2017).
Recently, NCES started including measures of school counselor roles and practices, as well as student experiences with their school counselors, in its national surveys. It included the first survey of school counselors in its last national dataset, the High School Longitudinal Study of 2009 (HSLS:09), and the second survey in its forthcoming dataset, the High School and Beyond 2022 (HS&B:22). Evidence-based research with these school counseling data can inform school counselors and other school personnel about how they can change school counseling programs, as well as school systems and policies, by implementing practices and interventions to benefit students, particularly marginalized students. More recently, the EBSC framework has directed researchers’ focus to the need for research evidence that is anti-racist and/or equity focused, given the growing diversity of student populations in the U.S. (see Dimmitt et al., 2023).
Thoughtful research with these highly funded, highly valued datasets that addresses the effects of schools and school staff on student outcomes has significant impact and policy relevance. Indeed, these datasets are designed for and garner the attention of national and state policymakers and funding agencies. Despite the value of these datasets, relatively few researchers have produced EBSC research with them (e.g., Bryan et al., 2009, 2011, 2023a, 2023b; Kim et al., 2017a, 2017b, 2017c, 2022, 2024; Cholewa et al., 2018; Engberg & Gilbert, 2014; Goodman-Scott et al., 2018; Shi & Brown, 2020). In this article, our aim is to present information on applying national datasets in a practical, user-friendly manner that makes it easier for researchers to use these datasets for evidence-based research. Throughout the discussion, we incorporate insights that we have learned on our journey of conducting evidence-based research with national secondary datasets. We provide step-by-step guidelines using a six-step research process model (Bryan et al., 2010, 2017), which guided us as doctoral students and early career faculty in our research with these datasets. We start with a reflection on our researcher positionality in conducting research with national secondary datasets, followed by a discussion of the importance of integrating an equity-focused and anti-racist lens (i.e., QuantCrit or Quantitative Critical Race Theory; or CritQuant or Critical Quantitative; often used interchangeably) in evidence-based research with these national datasets so as not to harm marginalized students.
Our Stories, Our Journey, Our Positionality
We all received training in quantitative methodologies at Research 1 (R1) institutions in colleges and departments in which positivist and post-positivist paradigms of research appear to be more valued. As a result, we learned to use these datasets from a Eurocentric lens that elevates quantitative data and positivistic assumptions to examine, conceptualize, analyze, and interpret data. In recent years, however, our development as three women of color (one Black Caribbean and two Koreans) with international backgrounds has led us to embrace a more equity-focused, anti-racist perspective in conceptualizing, analyzing, interpreting, and reporting the data and results.
When conducting research with the use of national secondary datasets, we recognize the potential unintended harm to students with marginalized identities (Arellano, 2022). In the absence of a critical equity-focused lens, we risk conducting studies that perpetuate negative beliefs about minoritized students, place blame on them, and promote inequitable practices and policies based on biased interpretation of results (McGuinness & Wellborn, 2024). Therefore, in our commitment to promoting equity-focused EBSC research using national secondary datasets, we frame our discussion around the tenets of QuantCrit (Gillborn et al., 2018), highlighting the importance of confronting and challenging our own biases and interrogating our choices throughout the research process.
One important aspect of our growth has been challenging ourselves to learn about QuantCrit perspectives (Frisby, 2024; Garcia et al., 2018; Gillborn et al., 2018; Tabron & Thomas, 2023) on using secondary datasets so that we do not harm the very students we are trying to help. QuantCrit draws attention to the importance of using these national educational datasets so that researchers do not perpetuate systemic inequities and barriers (Garcia et al., 2018; Gillborn et al., 2018). Those conducting research without a critical or anti-racist lens may inadvertently harm marginalized students through their choices of theoretical frameworks, research questions, data analyses, and interpretations (Frisby, 2024; Tabron & Thomas, 2023). For example, we have begun to integrate positionality statements into our research articles to clarify and confront how our worldviews and inherent biases influence our research. As we discuss each aspect of the research process, we aim to be vulnerable about our new and growing knowledge of QuantCrit. We make efforts to write in an accessible and user-friendly manner so that beginning researchers or those who may not identify as quantitative researchers do not feel excluded from this type of endeavor, which is often the case with these datasets.
Integrating a QuantCrit, Equity-Focused Lens in Using National Datasets
QuantCrit is a rapidly evolving approach in quantitative research that advocates using data to promote social justice and equity, gaining prominence since the publication of the 2018 special issue in Race Ethnicity and Education (Arellano, 2022; Garcia et al., 2018). QuantCrit applies the principles of critical race theory (CRT; Crenshaw et al., 1995) to challenge traditional quantitative methods that often reinforce deficit-based perspectives (Castillo & Babb, 2024; Castillo & Gillborn, 2023). QuantCrit scholars assert that quantitative methods are well-suited to map out the broader systems shaping students’ daily lives and to reveal the structural barriers and inequities faced by students in marginalized groups (Gillborn et al., 2018; McGuinness & Wellborn, 2024). We introduce five tenets of QuantCrit, based on the core principles conceptualized by Gillborn et al. (2018).
The first tenet emphasizes the centrality of racism (Gillborn et al., 2018). QuantCrit recognizes that structural racism is deeply rooted in society and is not easily quantifiable (Crawford, 2019). It is important to acknowledge that race is a social construct reflecting power dynamics within society (Gillborn et al., 2018). Without a critical race-conscious perspective, quantitative researchers risk reinforcing and legitimizing the racial inequities pervasive in educational systems (Cruz et al., 2021; Garcia et al., 2018; Suzuki et al., 2021). QuantCrit advocates for critical awareness and an asset-based approach in framing research questions. For example, instead of asking, “Why are Black boys expelled from our schools at a higher rate than other boys?” researchers should be able to ask, “Why do schools expel disproportionate numbers of Black boys?” (Castillo & Gillborn, 2023).
The second tenet posits that numbers are not neutral (Gillborn et al., 2018). QuantCrit challenges the assumption that quantitative methods are inherently objective (D’Costa et al., 2024). In fact, many quantitative analyses reinforce racism in the educational system by using tools, models, and techniques that fail to account for everyday racism (Crawford, 2019). While statisticians often aim to show the relationship between independent and dependent variables without confounding factors, this approach can neglect the diverse experiences of students from different racial and ethnic groups. Given that socioeconomic status (SES) and prior achievement are intertwined with racism and do not operate independently from it, controlling for SES and/or prior achievement in research without critical awareness could inadvertently mean “controlling” for racist systems (Castillo & Gillborn, 2023). This could lead to researchers framing research findings as a deficit of the individual child, rather than a systemic issue. QuantCrit advocates for transparency in methodology, urging scholars to critically examine and address the biases that emerge from both the data collection and analysis processes (Cruz et al., 2021; Garcia et al., 2018).
The third tenet asserts that categories are neither “natural” nor given (Gillborn et al., 2018). Researchers must carefully examine the categories they construct for analysis, especially those related to race (Suzuki et al., 2021). As race is socially constructed and shaped by power dynamics, understanding this helps prevent inaccurate, oversimplified conclusions about race (Gillborn et al., 2018). Although quantitative data are not harmful on their own, problems arise when numbers are categorized without a critical lens that considers historical and contextual nuances, which can have serious harmful consequences for marginalized groups (D’Costa et al., 2024).
The fourth tenet asserts that data cannot “speak for itself” (Gillborn et al., 2018). All analysis, whether quantitative or qualitative, is influenced by the researchers’ beliefs about the main research problems and their theories about the processes they are investigating. This can result in multiple, often conflicting, interpretations (Castillo & Gillborn, 2023; Crawford et al., 2018). Throughout the research process, researchers should pay careful attention to intersectionality (e.g., race/ethnicity, gender, sexuality, class, dis/ability, etc.) and the lived experiences of people with marginalized identities (Gillborn et al., 2018). Even when interpreting statistically significant findings, researchers should avoid interpreting their findings in a vacuum, without considering the voices, contexts, and experiences of the individuals involved in their study (Castillo & Gillborn, 2023).
The fifth tenet emphasizes social justice and equity orientation (Gillborn et al., 2018). QuantCrit advocates for researchers to orient themselves toward achieving justice and equity in education (Castillo & Strunk, 2025). Researchers should use data not just for analysis, but as a tool to advocate for changes in schools and communities that will benefit marginalized students and families. QuantCrit challenges the notion that quantitative research is objective and value-free, and calls for researchers to examine and provide their positionality (Gillborn et al., 2018). Without critical reflexivity in quantitative work, data-driven findings can be weaponized, perpetuating inequities against marginalized students and communities (Fong & Irizarry, 2025). Below, as we discuss Bryan et al.'s (2010, 2017) six-step research process model, we attempt to integrate QuantCrit tenets along with practical tips to help simplify the use of these national secondary datasets.
The Six-Step Research Process for Using National Datasets with an Equity-Focused Anti-Racist Lens
Checklist for Navigating the Process of Using Large Secondary Data Sets.
Step 1: Gaining Access to the Dataset
Explore Existing Datasets and Data Repositories
The first step to discovering more about existing datasets is gaining access to them. Researchers can begin by reading articles on national secondary dataset analysis (e.g., Bryan et al., 2010, 2017). Bryan et al. (2017) provide an extensive list of many available national educational and mental health datasets. Another important source of information about these datasets is the websites of data repositories, which house datasets for sharing and analysis. The Inter-university Consortium for Political and Social Research (ICPSR) houses a variety of education, social science, and mental health datasets (https://www.icpsr.umich.edu/web/pages/ICPSR/index.html). Other popular data repositories include Harvard Dataverse, Data.gov, ResearchDataGov, and Google Dataset Search. Many university libraries and agencies provide a webpage on finding datasets and secondary data, with links to data repositories. For example, the NIH library provides a comprehensive webpage on finding datasets, data repositories, and data standards (https://www.nihlibrary.nih.gov/finding-datasets-data-repositories-and-data-standards).
Perhaps the most important repository for educational data is the National Center for Education Statistics (NCES). The NCES data website (https://nces.ed.gov/surveys/) stores a wide range of educational data from pre-K–12 schools, educational agencies, and postsecondary institutions. NCES is the agency within the U.S. Department of Education responsible for collecting and analyzing education data. Some of the datasets suitable for EBSC research include a range of elementary and secondary datasets such as the Education Longitudinal Study of 2002 (ELS:2002; https://nces.ed.gov/surveys/els2002/); the Early Childhood Longitudinal Study (ECLS; https://nces.ed.gov/ecls/); the Middle Grades Longitudinal Study of 2017 (MGLS:2017; https://nces.ed.gov/surveys/mgls/); and the High School Longitudinal Study of 2009 (HSLS:09; https://nces.ed.gov/surveys/hsls09/). NCES is currently collecting data and compiling some forthcoming datasets, including the Early Childhood Longitudinal Study, Kindergarten Class of 2023-24 (ECLS-K:2024; https://myecls.ed.gov/) and the High School and Beyond 2022 (HS&B:22; https://surveys.nces.ed.gov/hsb22), scheduled to be released in 2025. The HSLS:09 and HS&B:22 are the only datasets that include a survey of school counseling programs. In 2009, the HSLS:09 surveyed the heads of school counseling programs in over 900 schools, the first time this population was included.
Access Restricted Data by Applying for a License
Another important aspect of working with datasets is accessing restricted data and learning how to secure the data. With many national secondary datasets, only public use or non-restricted data are available for immediate use. These usually can be downloaded easily from the website or data repository where the data are housed. To use restricted data, which contain sensitive information, researchers typically need to apply for a restricted use data license. For example, some data in the NCES datasets are restricted (e.g., counselor caseload, school size). Researchers can apply online for a restricted use data license (https://nces.ed.gov/pubsearch/licenses.asp) at ResearchDataGov (https://www.researchdatagov.org), a web portal for discovering and requesting access to restricted data from federal statistical agencies and units.
To apply for a license, researchers often write a short proposal that describes their research strategy, specifying what they want to do with the data, including some of their research questions, and describing their security plan for the data. NCES guidelines on applying for a restricted use data license (https://nces.ed.gov/statprog/instruct_apply.asp) require researchers to designate a Principal Project Officer (usually the researcher/primary investigator); Senior Official (an authority responsible for research in the college/university, e.g., research dean or VP of research); and Systems Security Officer (usually a manager or official in the college/university IT department). Security of the data may involve using a computer that does not connect to the internet, as is the case with NCES. As an example, our deans approved a used desktop computer, which the IT department easily configured to prevent connection to the internet. More recently, since COVID, some data repositories provide the option of remote access to restricted data, but this may involve a cost.
Invest Time in Training for Datasets Including a QuantCrit Perspective
We advise researchers to invest time in training, which is integral to understanding the datasets, how the data were collected, their sampling design and whom the data represent, and data analysis fundamentals such as using sample weights and controlling for sampling error in the complex samples that most datasets use. We encourage researchers to begin examining QuantCrit perspectives early in order to lay a just foundation for conducting the research. Begin by reading relevant articles on conducting quantitative research using a QuantCrit approach, especially with secondary datasets (e.g., Arellano, 2022; Frisby, 2024; Garcia et al., 2018; Gillborn et al., 2018; Tabron & Thomas, 2023). It is not enough to simply acquire the necessary technical knowledge without querying the paradigms behind these big data initiatives and datasets. Those committed to producing equity-focused research should recognize that it is important to explore how to use these datasets responsibly, gaining understanding of the ways they can be used to inappropriately analyze data about marginalized populations and to perpetuate inequities.
Technical training in datasets can be found through the NCES Data Institute offered by the Association for Institutional Research (AIR); National Assessment of Educational Progress (NAEP) Data Training workshops; and various courses offered by professional associations at their national conferences (e.g., American Educational Research Association). Valuable online training modules are available at the NCES DataLab Learning Center (https://nces.ed.gov/datalab/support/learning) through the Distance Learning Dataset Training (DLDT, https://nces.ed.gov/training/datauser/#/). The DLDT modules provide training on the characteristics of each of the datasets and how to use them, including the sampling weights. These modules introduce scholars to specific NCES datasets (e.g., HSLS:09), their design, and special considerations for data analysis to facilitate correct use and analysis of the specific dataset.
Step 2: Getting to Know the Data and Evaluating its Suitability for the Study
This step involves understanding what variables are in the dataset, what items were used to measure them, what measurement scale or response categories each variable comprises, and whether these variables can be used separately or combined to create additional variables of interest. From a QuantCrit perspective, researchers should seek to understand whose knowledge is represented in the dataset. This entails asking questions such as: How are marginalized groups represented? How are race, socioeconomic status, disability, and intersectional identities represented, operationalized, categorized, and/or constructed? Are racial/ethnic groups represented as monolithic (i.e., single categories)?
Explore the Technical Manuals, Codebooks, and Related Articles
Researchers should begin by reading the manuals and codebooks to understand the design of the study and the contents and layout of the data files. Information from the manuals and codebooks is typically found online. These sources describe which questions were asked of participants, whether the data include the variables of interest in a format that would answer the proposed research questions, and how these variables are structured (e.g., response codes, non-response, missing data). NCES has created a DataLab (https://nces.ed.gov/datalab/) with data tools, including an online codebook for most of its datasets (https://nces.ed.gov/datalab/onlinecodebook/). The DataLab allows researchers to explore the variables available in its public use datasets and conduct descriptive and some inferential analyses of the data. Another way we learned about these datasets was through reading articles that described studies that used them, especially studies related to school counseling and school counselor education. We found that immersion in studies with the dataset provided concrete ideas and important understanding of context.
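As a small illustration of checking a variable against its codebook entry, the following Python sketch tabulates the codes observed in a variable and attaches the documented labels. The codebook entry, the reserved negative codes, and the data values are all invented for this example; actual codes and labels must be taken from the dataset's own documentation.

```python
from collections import Counter

# Hypothetical codebook entry (code -> label); not taken from any real dataset.
codebook = {
    1: "Never",
    2: "Once or twice",
    3: "Three or more times",
    -9: "Missing",
    -7: "Legitimate skip",
}

def tabulate(values, labels):
    """Return (code, label, count) rows, flagging codes the codebook does not list."""
    counts = Counter(values)
    return [
        (code, labels.get(code, "UNDOCUMENTED CODE"), n)
        for code, n in sorted(counts.items())
    ]

# Invented responses for one survey item.
responses = [1, 2, 2, 3, -9, -7, 2, 5]

table = tabulate(responses, codebook)
for code, label, n in table:
    print(f"{code:>3}  {label:<22} n = {n}")
```

A table like this makes it easy to see how much of a variable consists of reserved codes (e.g., skips or non-response) before deciding whether it is suitable for the study.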
Create a Diverse Research Team
Creating and collaborating with a diverse research team brings multiple perspectives to crucial research tasks. These tasks may include: discussing and refining the proposed research questions; examining the relevance of the variables and whether they can actually measure the intended constructs; exploring or developing conceptual frameworks that explain the team’s theory of change; and preparing and analyzing the data or helping to locate methodologists who are familiar or comfortable with these datasets. We recommend that the research team comprise colleagues and students with diverse perspectives, backgrounds, and expertise. In addition, having a team member familiar with QuantCrit can be helpful. However, as we are discovering in our own journey, team members can push each other to learn and apply QuantCrit at each stage of the process.
Some researchers may believe that they cannot conduct studies with these datasets if they do not have significant statistical skills. This is a myth. If quantitative methodology and analytical skills are not in the team’s repertoire, the team should collaborate with a colleague with strong quantitative methodology and statistics skills; ideally, this person will have a background in school counselor education or at least education in general. If no one is available, research teams should be able to find a colleague in another discipline (e.g., statistics, survey methodology, sociology, health) across campus or at another university.
Develop a Strong Conceptual Framework
We have learned that it is crucial for members of the research team to spend time immersing themselves in the literature and developing their conceptual or theoretical framework. This framework helps to shape all important research decisions, including: how to phrase research questions; what continuous or categorical variables to choose; how to recode variables to create the categories or groups you wish to study; what data analyses to select; how to interpret or make sense of the results; and the subsequent practice, training, and policy implications. It is crucial that researchers examine their frameworks from a QuantCrit perspective, pushing themselves to challenge the conceptual frameworks they use to study marginalized groups and asking themselves whether the underlying assumptions promote stereotypes and deficit perspectives of marginalized groups. Some of the conceptual frameworks we have used or created for our studies include parent empowerment, social capital, school connectedness, school bonding, disproportionality in discipline, perceived discrimination, and college-going culture. We did not always scrutinize these frameworks for hidden assumptions or narratives about marginalized children and families. Our work has made us more aware of how many of the popular conceptual frameworks and models used in school counselor education and psychology perpetuate systemic biases and deficit views of marginalized children and families (Washington et al., 2023).
Reflect on Your Own Biases
In our earlier work with these datasets, we presented racial and ethnic groups as monolithic, failing to look at variation within ethnic subgroups. QuantCrit scholars have since challenged us to acknowledge and examine how our identities and experiences as three international scholars of color have influenced our research. Thus, QuantCrit has led us to practice reflexivity, so that at each stage of the research process we reflect on how our biases affect our conceptualizations of the study, interpretations of results, and recommendations.
Step 3: Preparing the Data for Analysis
Create Working and Backup Files
The research team needs to first create a working data file to use in conducting the analyses, as well as a backup data file that can be used to start over from scratch if necessary. The process involves extracting the relevant variables from the original dataset, making sure to identify and include the appropriate sample weights, clusters, strata, and identification variables needed to correct or account for the effects of the complex sample design.
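The extraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names (student_id, weight, stratum, psu, gpa, counselor_contact) and all values are placeholders invented for the example, and the actual names of the weight, strata, cluster, and identification variables must be taken from the dataset's documentation.

```python
import copy

# Hypothetical design variables; real names come from the dataset's codebook.
DESIGN_VARS = ["student_id", "weight", "stratum", "psu"]

def extract_working_file(rows, analysis_vars, design_vars=DESIGN_VARS):
    """Keep only the design variables plus the requested analysis variables."""
    keep = list(design_vars) + [v for v in analysis_vars if v not in design_vars]
    absent = [v for v in keep if rows and v not in rows[0]]
    if absent:
        raise KeyError(f"Variables not found in source data: {absent}")
    return [{v: row[v] for v in keep} for row in rows]

# Toy stand-in for the original data file (all values are invented).
original = [
    {"student_id": 1, "weight": 250.5, "stratum": 1, "psu": 1,
     "gpa": 3.1, "counselor_contact": 1, "unused": "x"},
    {"student_id": 2, "weight": 410.0, "stratum": 1, "psu": 2,
     "gpa": 2.7, "counselor_contact": 0, "unused": "y"},
]

working = extract_working_file(original, ["gpa", "counselor_contact"])
backup = copy.deepcopy(working)  # untouched copy to restart from if needed

working[0]["gpa"] = None   # an edit made to the working file ...
print(backup[0]["gpa"])    # ... leaves the backup unchanged
```

Raising an error when a requested variable is absent is a deliberate choice here: it prevents a working file from being created without the weights or design variables that later analyses depend on.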
Determine How to Handle Missing Data
Another important decision is how to handle missing data. Researchers need to evaluate whether missing data are random or non-random, then decide on suitable statistical techniques for handling missing data (e.g., listwise or pairwise deletion, multiple imputation, or maximum likelihood). Some controversy and debate surround the techniques and appropriateness of handling missing data. In cases where researchers decide not to replace missing data, they should have a credible reason for not doing so. Failing to address missing data may introduce bias and limit the generalizability of the findings. Further, missing data on students from marginalized groups can result in missing voices and perpetuate their invisibility in educational systems and policy change. A QuantCrit perspective emphasizes the importance of questioning whom the missing data represent. Missing data that appear to be random may systematically exclude some groups due to structural barriers that affect data collection (e.g., income, language, accessibility). Some groups may be over- or under-represented in the sample due to missing data. For example, analytic samples with the HSLS:09 typically have a larger proportion of students in the higher income quintile (see Bryan et al., 2023a). A QuantCrit perspective calls for transparency about the missing data and how they are handled. Examining how these missing data affect underrepresented groups will help researchers avoid reinforcing and legitimizing inequities in educational systems (Cruz et al., 2021).
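One simple way to begin asking whom the missing data represent is to compare missingness rates across groups. The Python sketch below is illustrative only: the variable names, the reserved missing code (-9), and the records are invented, and a descriptive comparison like this does not replace a formal analysis of the missing data mechanism.

```python
from collections import defaultdict

def missing_rate_by_group(rows, variable, group_var, missing_codes=(None,)):
    """Return the share of cases missing on `variable` within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_missing, n_total]
    for row in rows:
        tally = counts[row[group_var]]
        tally[1] += 1
        if row[variable] in missing_codes:
            tally[0] += 1
    return {group: n_miss / n_total for group, (n_miss, n_total) in counts.items()}

# Invented records; -9 stands in for a hypothetical reserved "missing" code.
records = [
    {"income_quintile": "lowest", "gpa": None},
    {"income_quintile": "lowest", "gpa": 2.9},
    {"income_quintile": "lowest", "gpa": -9},
    {"income_quintile": "lowest", "gpa": 3.2},
    {"income_quintile": "highest", "gpa": 3.5},
    {"income_quintile": "highest", "gpa": 3.8},
    {"income_quintile": "highest", "gpa": None},
    {"income_quintile": "highest", "gpa": 3.0},
]

rates = missing_rate_by_group(
    records, "gpa", "income_quintile", missing_codes=(None, -9)
)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} missing on gpa")
```

Large differences in these rates across groups suggest that dropping incomplete cases would change the composition of the analytic sample, which is worth reporting transparently.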
Create Composite Variables Using Appropriate Data Reduction Techniques
An important task that researchers face is creating composite variables (i.e., combining multiple items or variables together) to measure their constructs. Researchers should exercise caution when using composite variables created by the original secondary dataset, as those variables might differ from the construct the research team hopes to measure. For instance, the High School Longitudinal Study (2009-2016) created composite variables for school connectedness using five items. However, two items (i.e., “school is often a waste of time” and “getting good grades in school is important to you”) did not appear to represent the operational definitions of school connectedness recommended by the Centers for Disease Control and Prevention (CDC). Therefore, researchers may need to create new composite variables by combining items that represent the construct of interest. The conceptual or theoretical framework provides a rationale for what items to select for the composite variables. A number of data reduction methods may be used to combine items into groups of variables that are highly related to each other (e.g., exploratory factor analysis, principal components analysis, and confirmatory factor analysis). When only a few items exist, these may not be suitable for factor analysis, and researchers may use simple averaging to combine the items. It is crucial to note that the conceptual or theoretical framework and research literature should guide researchers in selecting the items most suitable for creating composite variables to measure their constructs.
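The simple averaging approach can be sketched as follows in Python. The item names, the 1 to 4 response scale, the rule requiring at least two answered items, and the data are all assumptions made for illustration; Cronbach's alpha is added here only as one common check of internal consistency and is not a procedure prescribed by the datasets themselves.

```python
from statistics import mean, variance

def composite_mean(row, items, min_valid=2):
    """Average the non-missing items; return None if too few were answered."""
    values = [row[i] for i in items if row[i] is not None]
    return mean(values) if len(values) >= min_valid else None

def cronbach_alpha(rows, items):
    """Cronbach's alpha computed on cases with complete data for all items."""
    complete = [
        [r[i] for i in items] for r in rows if all(r[i] is not None for i in items)
    ]
    k = len(items)
    item_variances = [variance(column) for column in zip(*complete)]
    total_variance = variance([sum(case) for case in complete])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical items chosen to match the team's definition of the construct.
ITEMS = ["belong", "safe", "proud"]

# Invented responses on a 1-4 agreement scale; None marks a missing answer.
students = [
    {"belong": 4, "safe": 4, "proud": 3},
    {"belong": 2, "safe": 1, "proud": 2},
    {"belong": 3, "safe": 3, "proud": None},
    {"belong": 1, "safe": 2, "proud": 1},
    {"belong": 4, "safe": 3, "proud": 4},
]

scores = [composite_mean(s, ITEMS) for s in students]
alpha = cronbach_alpha(students, ITEMS)
print(scores)
print(f"alpha = {alpha:.2f}")
```

Averaging the available items keeps a student who skipped one item in the sample, whereas summing would silently lower that student's score; which rule is appropriate remains a judgment the research team should document.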
Create Subgroup Categories Considering Intersectionality
QuantCrit posits that categories are neither “natural” nor given, highlighting the importance of analyzing intersectional identities to reveal inequities that aggregated categories might obscure (Gillborn et al., 2018). Researchers must pay close attention to creating socio-demographic categories or groups in their studies. National secondary datasets include a range of socio-demographic factors (e.g., immigration, English proficiency, first-generation status, family income, gender, race), ethnic groups (e.g., Mexican, Chicano, Cuban, Dominican, Puerto Rican, Central American, Honduran, South American), and contextual factors (e.g., public or private schools, urban or rural schools). These variables allow researchers to investigate how intersectional identities (e.g., Asian females in low SES categories, Black males in high SES categories) are associated with student outcomes. Indeed, a benefit of large national datasets is the ability to examine intersectionality, which can highlight hidden patterns not easily recognized in primary data. For instance, in one study we found that school counseling college-going culture and school connectedness work differently among Black males and Black females in relation to college outcomes (see Bryan et al., 2023b). Such nuanced findings may provide information on school counseling practices and policy recommendations to better serve and advocate for marginalized students.
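One way to build intersectional categories is to combine several demographic variables into a single label and then inspect how many cases fall into each cell. The sketch below assumes hypothetical, already-coded variables (`race`, `gender`, `ses`) and an illustrative minimum cell size; neither the coding scheme nor the threshold comes from a dataset manual.

```python
from collections import Counter

# Hypothetical student records with already-coded demographic variables.
students = [
    {"race": "Black", "gender": "male", "ses": "high"},
    {"race": "Black", "gender": "female", "ses": "low"},
    {"race": "Asian", "gender": "female", "ses": "low"},
    {"race": "Asian", "gender": "female", "ses": "low"},
    {"race": "Black", "gender": "male", "ses": "high"},
    {"race": "Black", "gender": "male", "ses": "low"},
]


def intersectional_label(student, keys=("race", "gender", "ses")):
    """Join the chosen demographic variables into one category label."""
    return " / ".join(student[key] for key in keys)


cell_counts = Counter(intersectional_label(s) for s in students)

# Illustrative threshold only: cells below it may be too small to analyze
# separately and should be reported transparently rather than silently merged.
MIN_CELL_SIZE = 2
small_cells = sorted(
    label for label, n in cell_counts.items() if n < MIN_CELL_SIZE
)

for label, n in sorted(cell_counts.items()):
    print(f"{label}: n = {n}")
print("small cells:", small_cells)
```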
Step 4: Conducting Appropriate Data Analyses
Correct for the Complex Design by Applying the Correct Weights, Strata, and Design Effects
After all this careful preparation, it is finally time to analyze the data! This step entails examining the chosen data using statistical methods that align with the research questions and the type of data (Bryan et al., 2017). Before going further, it is crucial to correct for the effects of the complex sample. Note that many samples in secondary data analysis are complex samples, meaning they do not meet the assumptions of simple random samples and may include any or all of the following features: stratification, clustering, or unequal sample weights. Analyses that do not account for these complex design effects tend to underestimate standard errors and inflate Type I error rates, which undermines the results. If the appropriate weights, strata, and clusters are not applied to the data, the findings will be inaccurate and artificial. Researchers can find ample information on the appropriate weights, strata, and design effects in the technical manuals (e.g., https://nces.ed.gov/surveys/hsls09/usermanuals.asp). Increasingly, statistical software packages include applications for complex samples (e.g., SPSS, SAS, Stata, Mplus, HLM, and Latent Gold). More details and references on using sampling weights and strata and cluster variables to control for complex sample design effects can be found in the chapter by Bryan et al. (2017).
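To illustrate why weights matter, the sketch below compares an unweighted and a weighted estimate and computes Kish's approximate design effect due to unequal weights alone. The outcome values and weights are hypothetical. The example deliberately ignores strata and clusters, so it is not a substitute for the complex-samples procedures in the software packages named above.

```python
# Hypothetical outcome (1 = enrolled in college) and sampling weights.
enrolled = [1, 0, 1, 1, 0, 1]
weights = [50.0, 200.0, 50.0, 50.0, 200.0, 50.0]


def weighted_mean(values, wts):
    """Weighted mean: each case counts in proportion to its sampling weight."""
    return sum(v * w for v, w in zip(values, wts)) / sum(wts)


def kish_design_effect(wts):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2, unequal weights only."""
    n = len(wts)
    return n * sum(w * w for w in wts) / sum(wts) ** 2


unweighted = sum(enrolled) / len(enrolled)
weighted = weighted_mean(enrolled, weights)
deff = kish_design_effect(weights)
effective_n = len(weights) / deff

print(f"unweighted rate: {unweighted:.3f}")
print(f"weighted rate:   {weighted:.3f}")
print(f"design effect:   {deff:.2f}, effective n: {effective_n:.1f}")
```

In this toy example the unweighted rate overstates enrollment because the cases with the largest weights did not enroll, and the effective sample size is smaller than the nominal one, which is one reason unadjusted standard errors come out too small.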
Compare Differences Between the Analytic Sample and the Original Sample
Another important step that researchers often neglect is comparing their analytic sample (i.e., the data on which they are running their analyses) to the original set of data. This involves asking questions such as how the distribution of SES, racial/ethnic minority students, or students with disabilities in your analytic sample compares to the proportions in the original dataset. Another question might be how these groups compare on key variables in the analytic sample versus the original dataset. As mentioned previously, an anomaly with the HSLS:09 is that once missing data are removed, analytic samples appear to have larger percentages of high-income minoritized students when compared to the original data.
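A comparison of this kind can be sketched by computing the distribution of a demographic variable in both samples and reporting the difference. The records below are hypothetical, the analytic sample is formed by listwise deletion purely for illustration, and the proportions are unweighted for simplicity; with a real complex sample the appropriate weights would apply here as well.

```python
from collections import Counter

# Hypothetical income quintiles (1 = lowest, 5 = highest); None = missing outcome.
original = [
    {"quintile": 1, "outcome": None},
    {"quintile": 1, "outcome": 0},
    {"quintile": 3, "outcome": 1},
    {"quintile": 3, "outcome": None},
    {"quintile": 5, "outcome": 1},
    {"quintile": 5, "outcome": 1},
]

# Analytic sample after listwise deletion of cases with a missing outcome.
analytic = [row for row in original if row["outcome"] is not None]


def proportions(rows, key):
    """Share of rows falling in each category of the given variable."""
    counts = Counter(row[key] for row in rows)
    return {category: counts[category] / len(rows) for category in counts}


def compare(original_rows, analytic_rows, key):
    """Analytic-sample proportion minus original-sample proportion, per category."""
    orig = proportions(original_rows, key)
    ana = proportions(analytic_rows, key)
    return {category: ana.get(category, 0.0) - orig[category] for category in sorted(orig)}


shift = compare(original, analytic, "quintile")
for category, diff in shift.items():
    print(f"quintile {category}: {diff:+.3f}")
```

A positive value means the group is over-represented in the analytic sample relative to the original data, which is the pattern worth reporting in the limitations.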
Disaggregate the Data to Tell the Full Story
National secondary datasets contain rich stories about students and their teachers, principals, school counselors, and other adults in schools. However, in order to find those stories, researchers must be willing to examine and describe these data. It is easy to jump directly to advanced statistical analyses because national secondary datasets are particularly well-suited for applying multivariate statistical methods, such as multiple linear and logistic regression, factor analysis, structural equation modeling, multilevel or hierarchical linear modeling (Lynch, 2012; Peugh, 2010; Snijders & Bosker, 2012), and latent class analysis (LCA; Collins & Lanza, 2013; Lanza & Cooper, 2016; Lanza & Rhoades, 2013).
In this step, we advise you to start with single-level and univariate analyses, such as descriptive and correlational analyses, instead of plunging into the more advanced inferential and multivariate analyses right away. From the QuantCrit perspective, it is imperative to pay careful attention to intersectionality in analyzing data and to tell the stories of the groups and subgroups within the data (Gillborn et al., 2018). Thus, it is crucial to start with the basics and build up from there, disaggregating the data to describe the sample and who it represents. It is also important to take as long as needed to gain insights about the outliers, subgroup differences, and both linear and nonlinear relationships between variables. This process takes time but often reveals important patterns and meanings that underlie the more complex results. Disaggregation of the data and descriptive analyses will help you determine whose knowledge is represented by the data and how different subgroups (e.g., ethnic, racial, and gender subgroups) are represented by the data.
When we examined intersectionality in our own studies, we uncovered some interesting patterns. For example, in a study of the college enrollment of Asian ethnic subgroups, we found very different levels and patterns of college enrollment among South Asian, Southeast Asian, East Asian, Chinese, and Filipino subgroups. In another study, we found that Asian students who attended rural schools had lower odds of attending 4-year colleges and higher odds of attending 2-year colleges. These differences were masked when we examined Asian Americans as one homogeneous ethnic group, which caused us to miss important patterns in college-going outcomes (e.g., Bryan et al., 2009, 2011).
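The masking effect described above can be shown with a toy example: an aggregate rate that looks unremarkable while the subgroup rates diverge sharply. The subgroup labels and outcomes are invented and do not reproduce results from the studies cited.

```python
from collections import defaultdict

# Hypothetical (subgroup, enrolled in a 4-year college) pairs.
rows = [
    ("Subgroup A", 1), ("Subgroup A", 1), ("Subgroup A", 1), ("Subgroup A", 0),
    ("Subgroup B", 0), ("Subgroup B", 0), ("Subgroup B", 1), ("Subgroup B", 0),
]


def overall_rate(data):
    """Enrollment rate when all subgroups are treated as one group."""
    return sum(outcome for _, outcome in data) / len(data)


def rates_by_subgroup(data):
    """Enrollment rate computed separately for each subgroup."""
    totals, hits = defaultdict(int), defaultdict(int)
    for subgroup, outcome in data:
        totals[subgroup] += 1
        hits[subgroup] += outcome
    return {group: hits[group] / totals[group] for group in totals}


aggregate = overall_rate(rows)
disaggregated = rates_by_subgroup(rows)
print(f"aggregate rate: {aggregate:.2f}")
for group, rate in sorted(disaggregated.items()):
    print(f"{group}: {rate:.2f}")
```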
Replicate your Studies
Given the importance of replication in making recommendations for educational policies and practices (Plucker & Makel, 2021), researchers should also consider replicating previous research. As many educational secondary datasets often collect similar data on participants over time, it is beneficial to replicate studies with other available datasets to test the consistency of the findings (Makel & Plucker, 2014). These replication efforts allow researchers to make stronger conclusions and provide more meaningful implications for policy, future research, and practice (Leppink, 2017). For example, Bryan et al. replicated their study on school counseling college-going culture and its effects on various college outcomes (see Bryan et al., 2023a) and found that this culture seems to apply better for White and Asian students than for Black and Hispanic students. This may be because items that capture systemic and structural factors that influence some marginalized groups (e.g., racism, discrimination) are not available in the dataset. Instead, we have had to use variables like school connectedness and school bonding as proxies to capture the effect of racism and discrimination on student relationships with teachers.
Step 5: Interpreting Results and Examining All Implications, Including Policy Implications
Link Findings to Systemic Inequities and Structural Barriers
National secondary datasets are powerful tools that provide researchers with the opportunity to connect their findings to relevant educational and mental health issues, as well as systemic barriers, in schools, families, communities, and governments. We find it particularly important to adopt a QuantCrit approach when linking findings to policy recommendations, making sure not to perpetuate systemic and structural biases and oppression. Researchers must be careful not to blame students and families, and to pay attention to structural barriers that may affect student outcomes. Further, researchers should challenge their narratives for deficit thinking regarding students from low-income backgrounds, students of color, students with disabilities, and other marginalized students.
Address Policy Implications of Results
Practice, training, and research implications are important and usually addressed. However, researchers often neglect policy implications. Writing the implications and recommendations section is especially important when using national secondary datasets, as policymakers use these findings to make important decisions about supporting students and schools. Policy implications from these datasets are fundamental to decision-making because of their increased generalizability to the larger population (compared to simple random samples), which allows policymakers to then draw conclusions about student, school staff, and family behaviors, trends, or patterns. Thus, researchers should think about translating findings to policymakers. From a QuantCrit perspective, it is important to consider the implications of these findings in the context of the structural and systemic barriers that impact marginalized groups. This means asking questions and speculating about the meaning of the results and discussing implications from an equity perspective.
A number of EBSC studies with datasets have highlighted important policy implications, an example being the consistent negative effect of large school counselor-student caseloads (Goodman-Scott et al., 2018; Shi & Brown, 2020; Woods & Domina, 2014) on various student outcomes. These findings were replicated in Bryan et al.’s (2021) study on school counseling college-going culture, indicating the importance of school counselor caseloads in promoting college-going outcomes. This study highlighted the need for policymakers to develop state or legislative policies or strategies that increase the number of school counselors, which will in turn increase student access to school counselors (Bryan et al., 2021). Indeed, the study suggested that the existing policy of having designated school counselors specifically for college counseling might be less effective than increasing the number of counselors in the schools (Bryan et al., 2021).
Step 6: Considering and Describing the Limitations of the Data
Identify and Acknowledge the Limitations
While recognizing the benefits of the national secondary datasets, it is important that researchers identify and acknowledge the limitations that these datasets present (e.g., missing data, underrepresentation or omission of some groups, lack of contextual variables that measure various forms of oppression) and how they may influence the application of the findings to various populations. Describing the limitations prevents readers from misinterpreting or misusing the findings. Several common limitations exist within national secondary datasets. First, it is especially important that researchers do not overgeneralize findings to populations not represented by the sample. Further, researchers often must use items as proxies for the constructs they wish to assess. In other words, because these national datasets were collected by others, researchers often need to select the items that best approximate constructs the items were not originally designed to assess. For example, we have used school connectedness as a proxy to capture students’ experience of discrimination. Additionally, using only one or two items to measure a variable may affect the reliability and construct validity of the variable you are attempting to measure. Due to these limitations, researchers should be cautious about overextending their conclusions and should be transparent about the boundaries of their study.
Another limitation researchers may overlook is that the variables used to assess student-adult relationships often do not capture the quality of the relationships between students and school staff or between students and their families. The quality of these relationships is especially important in schools. Finally, the QuantCrit lens has helped us to recognize that it is likely that institutional and systemic biases are reproduced by the institutions that create these datasets, underscoring the importance of replicating studies with primary data using qualitative and mixed-method designs that result in more in-depth analysis and insights.
Recommendations for Evidence-Based Research with National Datasets
Our aim is to make the process of navigating evidence-based research with national secondary datasets more manageable. We hope that this article will encourage early career scholars and doctoral students to explore and undertake this work. Many such datasets are available, and although we focused on educational datasets, the information we shared is applicable to most national secondary datasets. We conclude with six final recommendations that researchers, school counselor educators, and others may find helpful as they use these datasets:
1. Start simply.
2. Incorporate a QuantCrit and equity lens throughout the entire research process.
3. Explore the funding sources available to support studies using these datasets.
4. Prioritize research focused on student outcomes, such as student mental health and social-emotional outcomes, that inform educational policy changes.
5. Expose doctoral students and early career scholars to national secondary datasets and relevant training.
6. Advocate for continued development and expansion of school counselor surveys in these national datasets.
Below, we expand on each of these recommendations.
Conclusion
Given the benefits of using national datasets to advance EBSC, we discussed a six-step research process model (Bryan et al., 2010, 2017) and shared our own experiences as examples to demystify the research process. The use of national datasets allows researchers to have a better understanding of “the nature of the situation we hope to impact” and “how to best to create change in that situation” (Zyromski & Dimmitt, 2022, p. 2). To promote equity-focused anti-racism school counseling practice and research, we incorporated the tenets of QuantCrit in each of the six steps of the research process (Bryan et al., 2010, 2017). We have also provided principles to enhance researchers’ awareness of the potential unintended harm that quantitative research with national secondary datasets could cause if they do not use a QuantCrit perspective. We hope that our accessible and user-friendly guidelines empower beginning researchers or those who may not identify as quantitative researchers to have the courage to conduct research using national datasets.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
