Abstract
Organizational scientists must capitalize on the big data revolution to better understand the nomothetic, idiographic, multilevel, and/or dynamic processes that make up today’s workplace. Simultaneously, researchers must collect high-quality data and be careful, diligent, and deliberate during data wrangling and data analysis so that all results can be replicated and all inferences are appropriate. Unfortunately, big data create many uncommon challenges during data acquisition and data wrangling that must be considered and overcome to fulfill the promise and potential of big data. Specifically, during acquisition, organizational scientists must become familiar with concepts like web scraping and databases and determine how to divide big data files into manageable chunks for cleaning and analysis, all while taking care not to violate data usage rules and regulations. Likewise, once the data are acquired, to wrangle them effectively so that they are ready for analysis, researchers must be able to handle multiple file formats and data encoding standards, use a variety of software to visualize and diagnose data structure, and be adept at using functions and algorithms to determine variable structure and evaluate records and variables for missing or erroneous information. The current article provides a concise definition of big data and addresses each of these novel challenges and concepts related to big data acquisition and wrangling, specifically focusing on providing guidance and recommendations. Finally, a detailed big data example, team development using play-by-play basketball data, is provided. Each step of the process of scraping the data from the web as well as wrangling the multilevel big data into tidy data form is discussed, accompanied by a supplemental
