Abstract
This paper describes a successful attempt to replicate DeSimone [5], which investigates the effect of fraternity membership on binge drinking using both probit and interval regression models. We encountered software-related difficulties that hampered our replication effort, though ultimately we showed that DeSimone's published results are replicable. The difficulties stem from default settings in Stata, which correctly identifies the problem of complete separation, in which the maximum likelihood estimator does not exist, and drops the affected observations before estimation. Other statistical packages, such as R, do not recognize the complete separation problem and do not automatically remove these observations. This poses two obstacles for replication. First, without prior knowledge of which observations Stata drops from the probit regression, the samples used for the probit and interval regression models will be inconsistent, regardless of which software package is used. Second, a replication attempted in a program that does not automatically remove the necessary observations, such as the glm function in R, cannot exactly duplicate the results, because its sample is not identical to the one Stata used for estimation. Thus, without knowledge of which observations are dropped, precise replication would be impossible. We therefore advocate data/code archives as a means of facilitating accurate verification of published results. Once the correct sample is obtained, we are able to replicate the paper's results exactly. It should be noted that the results are qualitatively identical regardless of which sample or software package is used.
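The sample mismatch described above can be illustrated with a minimal Python sketch. It assumes binary (0/1) regressors and flags any dummy for which one of its values coincides with a single outcome, then lists the observations that would have to be dropped to reproduce the restricted sample. The function name, variable names, and toy data are hypothetical, and the check is a simplified stand-in, not Stata's actual procedure.

```python
def perfectly_predicted_rows(y, X, names):
    """Map each dummy name to the row indices where one of its levels
    (0 or 1) is observed together with only a single outcome value."""
    flagged = {}
    for j, name in enumerate(names):
        for level in (0, 1):
            # Rows in which dummy j takes this level.
            idx = [i for i, row in enumerate(X) if row[j] == level]
            outcomes = {y[i] for i in idx}
            # Require that the dummy actually varies in the sample.
            varies = 0 < len(idx) < len(y)
            if varies and len(outcomes) == 1:
                flagged.setdefault(name, []).extend(idx)
    return flagged


# Hypothetical toy data: y is the binary outcome, columns of X are dummies d and e.
y = [0, 1, 0, 1, 1, 1]
X = [[0, 1], [0, 0], [0, 0], [0, 1], [1, 1], [1, 0]]
names = ["d", "e"]

flagged = perfectly_predicted_rows(y, X, names)

# Rows that remain once the perfectly predicted observations are removed.
dropped = {i for rows in flagged.values() for i in rows}
kept = [i for i in range(len(y)) if i not in dropped]

print(flagged)  # {'d': [4, 5]}
print(kept)     # [0, 1, 2, 3]
```

Archiving a list like `kept` alongside the data would let a replicator build the same estimation sample in any package, for both the probit and the interval regression.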
