Abstract
The social sciences are at a remarkable confluence of events. Advances in computing have made it feasible to analyze data at the scale of the population of the world. How can we combine the depth of inquiry in the social sciences with the scale and robustness of statistics and computer science? Can we decompose complex questions in the social sciences into simpler, more robustly testable hypotheses? We discuss these questions and the role of machine learning in the social sciences.
The social sciences are at a remarkable confluence of events. Advances in computing have made it feasible to analyze data at a scale far beyond the reach of earlier researchers. The web has become the largest observatory for the social sciences. At the same time, decades—even centuries—of theories of human behavior form the backdrop for these recent capabilities. Traditionally, these theories have relied on detailed surveys and experiments of modest scale. The depth of these studies stands in contrast to the statistically robust but simple results of large data analysis. We must rethink the questions and methodologies we study in light of the data capture methods and data now available. There is a further subtlety here. Beyond some facets of rational behavior in microeconomics (more on this below), we lack a language for codifying human behavior. What if scientists could codify human behavior and compose behavioral elements, in the way that the elementary laws of physics can be composed to produce complex missions such as sending humans to the moon? In this article, we pose two questions that we believe are central to the nascent discipline of computational social sciences; as we examine how these questions could be attacked, we in turn unearth several subsidiary questions that remain open.
How can we combine the depth of inquiry in the social sciences with the scale and robustness of statistics and computer science? The celebrated “six degrees” experiment of Travers and Milgram (1969) relied on a few hundred subjects in a relatively uncontrolled experiment—not quite the rigor with which statistical hypotheses are tested in the exact sciences. On the other hand, Granovetter’s (1995) classic work on weak ties in job-finding highlights an important feature of his (still relatively small-scale) methods: he was able to conduct interviews that elicited far more nuance from his subjects than, say, the observation of millions of clicks on a web page. By observing millions of users, we can precisely predict what fraction of users will click on that page; but we will have little understanding of why they click. Can we address questions in the social sciences by interpolating between the depth of field studies and the rigor of large-scale data analysis? Can we decompose complex questions in the social sciences into simpler, more robustly testable hypotheses? Consider a complex question such as: do memes become popular mainly through social diffusion between acquaintances? Could we decompose this into a testable hypothesis where we isolate the effects of diffusion from other sources of influence (mass media, formal education, or advertising), along with another postulate on the structure of networks that sustain efficient diffusion? Being able to do so would mean that researchers could work from a toolkit of robust hypotheses—building blocks from which more complex social theories could be assembled.
It is instructive to consider the burgeoning interaction between microeconomics and computation. Microeconomic theory helps us understand behavior in markets—under the onerous assumption that participants in a market are rational. Few would argue that most human activities are rational, in the technical sense that game theory accurately models these activities. That said, a significant swath of human economic activity—practiced over many millennia—is captured by microeconomic theory. Crucially, the propositions of microeconomics, resting as they do on the formalism of mathematics, are composable: complex theories of market behavior are derived from simpler assertions.[1] Moreover, the predicted behavior is replicable and reliably observable. The mathematical foundation of microeconomics yields an additional benefit: interoperation with computation, which is also built on mathematics. This means, for instance, that an auction reserve price whose existence is established by microeconomic theory can be computed efficiently using an algorithm. It means that parameters in an incentive scheme can be estimated to within a prescribed error margin using a large-scale experiment. Perhaps more crucially, we can quantify the benefit from better estimates (and thereby the impact of a better understanding of behavior on an economic objective). This has led to a rich cross-fertilization of ideas from microeconomics with those in statistics and computer science, leading to new academic subfields and programs, conferences, and coursework. A side effect is that we have something of a self-fulfilling prophecy: as the notion of rational behavior in markets is made technically and computationally precise, it becomes feasible for much market activity to be conducted by software robots that trade on behalf of humans (examples include stock and commodities markets and advertising exchanges).
In the process, market behavior is increasingly driven by rational (software) agents and conforms to the clean abstractions underlying microeconomic theory.
Given this successful interaction of microeconomics with computation, might we aspire to tackle the simplest questions about behavior that cannot be modeled as rational? Consider two sample questions of this genre: (1) we wish to understand whether people shown certain symbols online (e.g. a red cross) are more likely to donate to charitable causes following exposure; (2) we seek to quantify the correlation between people purchasing a gadget and the number of their friends who recently purchased that gadget. The former is difficult to do right (we would have to carefully randomize away a plethora of confounding variables), but—with the right cautions and caveats—within reach. The methodology would carefully establish randomized control and treatment groups and measure the differential outcomes. Of course, we require the ability to show selected individuals the red cross symbol (say, when they log into their email) and give others a “placebo” (a graphic not related to charities, but visually similar). Already, this methodology is fraught with challenges. First, we would have to be careful in our randomization: individuals who go online infrequently (or those with slow internet connections, or who historically donate, or who meet a host of other conditions) must not be disproportionately represented in either the control or the treatment group. Second, the placebo symbol should not itself alter individuals’ charitable instincts (an advertisement asking whether you have saved enough for retirement, say, might well do so). Third, the trial should not overlap a period when donation behavior varies endogenously (e.g. around Christmas or the end of a tax year). Such controlled experiments have historically been used to measure, for instance, the effect of advertising on sales and the effect of page layout on users’ propensity to return to a web page.
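The control-and-treatment methodology sketched above can be made concrete in a few lines of code. The following Python sketch is purely illustrative: the group sizes, base donation rate, and treatment effect are invented numbers, and a real trial would add the covariate checks discussed above (login frequency, donation history, seasonality). It randomizes a synthetic population into control and treatment groups and measures the differential donation rate.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def run_trial(n=100_000, base_rate=0.020, lift=0.004):
    """Simulate a randomized trial: the treatment group sees the symbol,
    the control group sees a visually similar placebo. All rates invented."""
    donations = {"control": 0, "treatment": 0}
    counts = {"control": 0, "treatment": 0}
    for _ in range(n):
        # Random assignment is what guards against confounds such as
        # login frequency or prior donation history.
        group = random.choice(["control", "treatment"])
        rate = base_rate + (lift if group == "treatment" else 0.0)
        counts[group] += 1
        donations[group] += random.random() < rate
    return {g: donations[g] / counts[g] for g in counts}

rates = run_trial()
effect = rates["treatment"] - rates["control"]
print(f"control={rates['control']:.4f} treatment={rates['treatment']:.4f} "
      f"estimated effect={effect:.4f}")
```

Even in this idealized setting the estimated effect fluctuates around the true lift; the sample size governs how precisely the differential outcome can be resolved.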
We thus have a growing set of instances where questions about behavior can feasibly be attacked by careful large-scale methods; but much more can be done, and this remains a promising but nascent area.
What about the second question above, on influencing gadget purchases? Classically, one might address this question by surveying gadget buyers, asking how many of their friends had recently purchased the gadget, what factors they considered important in their choice, and so on. In the process, the survey elicits additional information on buyers, such as where they live—information that is often sacrificed in a pithy aphorism such as “friends influence gadget purchases”. For instance, the extent (and existence) of influence may depend on what item is being purchased. How does large data analysis alter this landscape? Here we run into difficulties beyond those in our earlier example of charitable donations. First, it is much harder to cleanly randomize control and treatment groups: we need to find sample populations for each group with similar propensities to purchase gadgets, with similar sets of friends, and with similar distributions of past purchase behavior. Even if we were to get this far, other complications arise: we must ensure that gadget purchases are indeed attributable to the influence of friends and not to other influences such as advertising campaigns and newspaper reviews. Another key issue is stationarity: the influence of friends may be very different when a gadget is new versus when it is three months old. Thus, a simple statement of correlation (“if you learn that three of your friends purchased it, you are 74% likely to purchase it within a week”) may not hold over time. Stationarity (roughly: people continue to behave in the future as they have been observed to in the past) is key to modeling behavior from observations of the past, yet it is often unjustified in practice. Thus, there are questions in our purview that will not succumb to routine “clinical trial” experiments. Both the design of the experiment and failures of stationarity make it difficult to precisely identify causality.
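When randomization is out of reach, observational studies often fall back on matching: pair each “treated” individual (here, someone several of whose friends bought the gadget) with the most similar untreated individual, then compare outcomes. The sketch below is a toy illustration with entirely invented records and a made-up distance function; it conveys the idea, not a production matching estimator, and it does nothing to address the stationarity problem.

```python
# Illustrative nearest-neighbour matching on observed covariates.
# Every field and number below is invented for the sketch.
people = [
    # (income_k, past_purchases, friends_bought, bought_gadget)
    (40, 1, True,  True),
    (42, 1, False, False),
    (75, 4, True,  True),
    (70, 4, False, True),
    (55, 2, True,  False),
    (52, 2, False, False),
]

def distance(a, b):
    """Similarity on covariates only (income and past purchases);
    the weighting of the two covariates is arbitrary."""
    return abs(a[0] - b[0]) + 10 * abs(a[1] - b[1])

treated = [p for p in people if p[2]]
control = [p for p in people if not p[2]]

diffs = []
for t in treated:
    m = min(control, key=lambda c: distance(t, c))  # closest untreated match
    diffs.append(int(t[3]) - int(m[3]))

att = sum(diffs) / len(diffs)  # naive "effect of treatment on the treated"
print(f"matched estimate of friends' influence: {att:+.2f}")
```

The estimate is only as good as the covariates we match on: any unobserved factor (an advertising campaign, a glowing review) that drives both friendship patterns and purchases will silently contaminate it.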
The availability of large data affords yet another lens for viewing behavioral questions such as the gadget purchase scenario. Advances in machine learning—a discipline at the nexus of statistics and computer science—have made it feasible to solve many prediction problems simply by throwing a large number of historical examples at a machine-learning program. In our running example, we would give the program exemplars of gadget buyers, as well as non-buyers. For each exemplar, we would provide a number of features such as their sets of friends, their purchases (of gadgets, and indeed any other data about their purchases, ages, genders, addresses, etc.), perhaps even the weather on the date of purchase. Without making any attempt to understand how the behavior of friends influences one’s gadget purchases, the program optimizes its ability to predict whether a hitherto unseen individual will purchase a gadget in the future. The program may build itself a complex prediction formula (e.g. “divide the person’s annual income by 50,000, add three times the number of friends who live within 4.2 miles, and subtract the person’s age; if the result exceeds 3.9, predict that they will purchase”)[2] under the sole criterion that it fits the observed data well. It is possible to obtain extremely accurate results for such prediction tasks, with no ability to correlate the predicted variable (gadget purchases) with a specific feature (friends’ purchases); arguably, the formula adds nothing of generality to our understanding of purchase behavior beyond the specific scenario that gave rise to the data. For many problems addressed by machine learning, the state of the art is that prediction accuracy is bottlenecked by data volume more than it is by cleverer machine-learning algorithms—thus, having more data is in fact material. This example underscores why successful prediction (which large data analysis fares well at) may be easier than testing a theory of behavior.
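To make this concrete, the toy sketch below fits a logistic score to a handful of invented exemplars (the features, friend counts, and incomes are all hypothetical) by plain stochastic gradient descent. The fitted weights optimize predictive fit on the data; nothing in the procedure yields a causal account of why friends matter.

```python
import math

# Invented exemplars: (friends_who_bought, income_in_10k) -> bought gadget?
data = [
    ((0, 3), 0), ((1, 4), 0), ((0, 6), 0), ((1, 2), 0),
    ((3, 5), 1), ((4, 3), 1), ((5, 6), 1), ((3, 7), 1),
]

w = [0.0, 0.0]  # one weight per feature
b = 0.0         # bias term
lr = 0.1        # learning rate

def predict(x):
    """Logistic score in [0, 1]: the model's predicted purchase probability."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on the log-loss: the sole criterion
# is fitting the observed examples, not explaining them.
for _ in range(2000):
    for x, y in data:
        err = predict(x) - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

print("fitted weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("score for an unseen person (4 friends bought, income 50k):",
      round(predict((4, 5)), 2))
```

The program happily predicts for unseen individuals, but the weights are just numbers that fit this dataset; they transfer no principle such as “friends drive purchases” to any other setting.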
A bonus with machine learning, however, is that we retain elements of composability: we can break a larger task into subproblems and compose predictions, invoking the language of probability theory. This allows us a clear view of the role of machine learning in computational social science: it affords us predictability and composability, but almost never replicable understanding. By the latter we mean a simple principle (“friends are unconditionally the biggest influence on gadget purchases”) that we can transfer from one setting to another (“hence, friends are the biggest influence on choices in vacation travel”).
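This composability can be illustrated in miniature: decompose “will purchase” into subproblems whose predictions are combined with the ordinary rules of probability. The decomposition, the independence assumption, and every number below are hypothetical.

```python
# Hypothetical decomposition: purchase requires first becoming aware of
# the gadget, and awareness can arrive through either of two channels.
p_aware_via_feed = 0.30    # output of one predictor (invented number)
p_aware_via_friend = 0.20  # output of a second predictor (invented number)
p_buy_given_aware = 0.15   # output of a third predictor (invented number)

# Aware through at least one channel, assuming the channels are independent:
p_aware = 1 - (1 - p_aware_via_feed) * (1 - p_aware_via_friend)

# Chain rule: P(purchase) = P(aware) * P(buy | aware)
p_purchase = p_aware * p_buy_given_aware
print(f"P(aware) = {p_aware:.2f}, P(purchase) = {p_purchase:.3f}")
```

Each factor can be estimated by a separate machine-learned predictor and the pieces recombined, which is precisely the composability that machine learning retains even while it forgoes replicable understanding.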
Machine-learning methods have evolved to “solve” many problems that we associate with intelligence: recognizing written text, translating text from one language to another, and detecting fraudulent credit card transactions. In the practice of computer science, many of these problems are difficult enough to characterize that programmers can no longer specify a precise method for solving them;[3] instead, they train a machine-learning program with examples (say, of fraudulent and genuine card transactions) and let the program figure out the rest. In other words, many problems we once viewed as requiring “higher intelligence” are now solved routinely by this repetitive process of showing the program positive and negative examples. The program (unlike a human) has the capacity to speedily sift through billions of examples, repeatedly reinforcing its guesses with positive examples and penalizing them on negative examples. This still leaves open the possibility that some characteristics associated with human behavior—such as emotions—are too “irrational” to be modeled by machine learning. The emerging field of affective computing aims to extend machine-learning methods to recognize a person’s emotions by studying pictures of the subject, videos, and other signals: is the subject angry? pleased? confused? This opens up a counterpoint: can we train a machine to recognize beauty and aesthetics? It is within our reach to train a machine to recognize whether a picture was drawn by a two-year-old; more usefully, a machine can learn what is salient in a page layout. Can we train it to tell a great painting from a good one? Flipping the question around, can we generate a short summary of rules for songwriters aspiring to write hits (“an introduction in C major with a beat of at least 47 per second, followed by a slower passage in A minor, followed by a reprise”)?
Or is it the case that such pithy rules may explain the popularity of many songs, but truly great music springs from not subscribing to any small set of rules?[4] In summary, machine-learning methods are a valuable part of our toolkit for understanding behavior, but we do not yet understand the precise limits of their applicability.
We conclude by discussing an area that is intrinsically at the nexus of computing and the social sciences: that of synthetic social experiences. The first generation of digital experiences was a transcription of real-world experiences into the online world: we played games, we read the news, we exchanged messages, we shopped. Then came a second generation with the emergence of social networking sites such as Twitter, Facebook, and their ilk. These are not transcriptions of physical-world activities: people in the physical world were not constantly broadcasting a stream of real-time updates to all around them. Thus, sites such as Twitter and Weibo have generated new forms of communication that did not exist before. We have long recognized that human communication has many dimensions: is it long-running or ephemeral? is it transactional and binding, merely informative, or playful? But we did not envision new forms of communication, made possible by the online medium, that fed unmet dimensions of human need. Prior to the advent of Twitter, there was no corresponding activity in the physical world in which individuals emitted terse messages (sharing news, media, debates, and arguments) asynchronously. Facebook is described as “a social network,” but there is no simple physical-world activity that it replaces. Clearly, these experiences would be impossible without the underlying technology (computation, bandwidth, machine learning). But for social scientists, I believe there are two rich families of questions that arise. First, we need analysis that examines why individuals participate (and indeed, spend a significant fraction of their waking hours) in these experiences—experiences they did not know they needed, and that did not exist until recently. Why did the particular manifestations of Facebook and Twitter and Weibo burgeon, where many others failed?
The challenge is to cast these questions into ones that can be validated from aggregate data at the scale of a billion users of every background and culture. For instance, one might posit that Twitter flourishes due to a fundamental need for self-promotion in many humans; how would we confirm or deny this with the tools and data we have? But there is a second, arguably more interesting family of questions. If Twitter and Facebook represent synthetic experiences with no parallel in the physical world, what other needs remain unmet, to be addressed by future synthetic experiences? Could we, for instance, rethink human communication and interaction to pigeonhole the roles of email versus microblogging versus social streams, in a way that tells us where the gaps are—thus paving the way for discovering new synthetic experiences? This brings us back from analysis to synthesis: from the analysis of prior behavior to the generation of new behavior.
Conceivably, the above discourse will strike many social scientists as disturbing; it may provoke a reaction to forsake computational methods as naive, as simplistic exercises that merely form data digests bereft of understanding, as irrelevant across cultures, and so on. At least some of these criticisms can equally be leveled at many classical studies—digesting (small) data, failing to represent all cultures, and so forth. But the argument does not have to be between data scale and understanding; the challenge for computational social scientists is to find ways of combining both: using data to build understanding, and using this understanding to demand more data from careful experimentation. We have the observatory and at least some of the tools; rather than deny their existence or influence, it behooves us to put them to use to attack the exciting challenges before us. Inevitably, new tools invite new questions; this has been a constant in science through the ages. That said, we propose that the biggest contributions before us are not new algorithms or new social theories, but new methodologies for decomposing a hard question in the social sciences into a series of robust analyses that are replicable and composable. In doing so, we believe that we will make progress not only in solidifying our understanding of classical social theories but also in generating new ones that address synthetic experiences whose role in human society will continue to grow.
Footnotes
Acknowledgements
The author thanks Elizabeth Churchill, Alessandro Panconesi, and Duncan Watts for their insightful comments on an early draft of this article.
