Abstract
Data collection from online platforms such as Amazon’s Mechanical Turk (MTurk) has become popular in clinical research. However, there are concerns about the representativeness and quality of these data for clinical studies. The present work explores these issues in the specific case of major depression. Analyses of two large data sets gathered from MTurk (Sample 1: N = 2,692; Sample 2: N = 2,354) revealed two major findings. First, failing to screen out inattentive and fake respondents artificially and significantly inflates the rates of major depression (by 18.5%–27.5%). Second, even after the data sets are cleaned, depression rates in MTurk remain 1.6 to 3.6 times higher than general population estimates. Approximately half of this difference can be attributed to differences in composition between MTurk samples and the general population (i.e., sociodemographics, health, and physical activity lifestyle). Several explanations for the remaining half are proposed, and practical data-quality tools are provided.
