Abstract
A rapidly growing number of algorithms are available to researchers who apply statistical or machine learning methods to answer social science research questions. The unique advantages and limitations of each algorithm are relatively well known, but it is not possible to know in advance which algorithm is best suited for the particular research question and the data set at hand. Typically, researchers end up choosing, in a largely arbitrary fashion, one or a handful of algorithms. In this article, we present the Super Learner—a powerful new approach to statistical learning that leverages a variety of data-adaptive methods, such as random forests and spline regression, and systematically chooses the one, or a weighted combination of many, that produces the best forecasts. We illustrate the use of the Super Learner by predicting violence among inmates from the 2005 Census of State and Federal Adult Correctional Facilities. Over the past 40 years, mass incarceration has drastically weakened prisons’ capacities to ensure inmate safety, yet we know little about the characteristics of prisons related to inmate victimization. We discuss the value of the Super Learner in social science research and the implications of our findings for understanding prison violence.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
