Institutionalizing Big Data methods in social and political research

Abstract

We expect Big Data methods to contribute to research with results that are not inferior to those attained in other ways but possibly better, or hard or impossible to generate in other ways. Those who apply these methods may also aspire to augment the arsenal of research methods, offer surrogates for existing research designs, and re-orient research. Moreover, we can critically examine the institutional, societal and political effects of the Big Data methods and the conditions for the solid institutionalization of these methods in social and political research. To reach its primary objective, this article elaborates conclusions on how Big Data methods, not only by means of their ‘social life’ but also by their ‘political life’, may influence the institutionalization of social and political research. To reach its secondary objective, the article re-examines a study of budgetary legislation in 13 countries carried out by means of Big Data methods to draw conclusions concerning the augmentation of the arsenal of research methods, the surrogation of existing research designs, and the re-orientation of research.

Keywords

Digital social research, institutional analysis, neo-institutionalism, institutional change, latent trait scaling, topic modeling

Introduction

We expect Big Data methods to contribute to research with results that are not inferior to those attained in other ways but are possibly better, or hard or impossible to generate in other ways. Those who apply these methods may also aspire to use them to augment the available arsenal of research methods, offer surrogates for existing research designs, and re-orient research (Edwards et al., 2013). Moreover, we can critically examine the direct and indirect societal and political effects and conditions of the institutionalization of Big Data methods (Law and Ruppert, 2013).

The primary objective of this article is to elaborate conclusions on how Big Data methods, not only by means of their ‘social life’ (Law and Ruppert, 2013) but also their ‘political life’ (a notion that will be explained in the next section), influence the institutionalization of social research with special reference to political research. In support of this primary objective, the article also pursues a secondary objective: the re-examination of a study of budgetary legislation in 13 countries carried out by means of Big Data methods. The purpose of this re-examination is to elaborate conclusions on the augmentation of the arsenal of research methods, the surrogation of existing research designs, and the re-orientation of research.

The following two sections elaborate the theoretical rationale of this article and reproduce the research hypotheses of the study that the article re-examines. The subsequent section and Appendix 1 introduce the Big Data methods and the research material of the re-examined study. Utilizing the re-examined study as a context, this article examines in its next two sections two methods of Big Data analysis, comprising a method of unsupervised latent trait scaling and a method of topic modeling. The purpose of the last section is to summarize the contributions of this article.

The theoretical rationale of this article

In important respects this article leans on neo-institutional analysis, in which between seven and twelve present-day orientations have been distinguished (Lowndes and Roberts, 2013; Peters, 2010). Making its pick from among this multitude, the article follows authors starting from Berger and Luckmann but principally comprising John W Meyer and his colleagues and followers (see Powell and Colyvas, 2008). Berger and Luckmann (1991) understood institutionalization to take place by means of habituation and the resulting taken-for-grantedness (p. 72): ‘Institutionalization occurs whenever there is reciprocal typification of habitualized actions by types of actors. Put differently, any such typification is an institution.’

This article shares the assumption that not only human beings but also artifacts, and therefore also Big Data methods and Big Data itself, play active roles in research and other institutionalized action (generally see, for instance, Cecez-Kecmanovic et al., 2014; D’Adderio, 2011; Latour, 2005). The article fixes its attention on institutionalization related to Big Data methods in a way analogous although not identical to that of Law and Ruppert (2013), who elaborate the examination of the ‘social life’ of methods by means of characterizing what they call the relevant ‘patterned teleological arrangements’. Moreover, this article agrees with boyd and Crawford (2012) that mythologies of institutionalization in Big Data contexts need study, and therefore pays attention to the examination of ‘rationalized myths’ of Big Data, which, rather than fulfilling their common explicit promises to contribute to rationality, support institutional legitimation (Meyer and Rowan, 1977).

This article seeks to examine not only the habitually institutionalized social life of Big Data methods but also the ‘political life’ of disruptions and radical transformations related to these methods. The article shares the understanding of politics of Pocock (1975: 156) as the ‘art of dealing with the contingent event,… with pure, uncontrolled, and unlegitimated contingency’, and also fixes its attention on the role of performativity (Austin, 1975) that catalyzes the disruptions and transformations indicated. We find analogous arguments on performativity in political theory (Skinner, 2009), science and technology studies (D’Adderio, 2008, 2011; Latour, 2005), and organization research (D’Adderio, 2014; Deroy and Clegg, 2015; Maguire and Hardy, 2009). These authors locate disruptive, radical institutional change in situations in which contentious actors, finding that unprecedented opportunities have opened up, mobilize performatives by means of which they may successfully de-legitimate the incumbent actors and those performatives that have helped entitle the incumbents to their positions.

The rationale of this article also pushes it to examine chances to augment the arsenal of existing methods, offer surrogates for known research designs, and more generally re-orient research (see Edwards et al., 2013). A common motivation to use Big Data methods is to try to reap economies of scale in managing large datasets, data dredging, and implementing research designs. Moreover, Big Data methods may lead researchers to transcend boundaries between research fields and re-orient research in other ways (Ruppert, 2013).

The hypotheses proposed in the study that this article re-examines

Big Data methods can be used either in exploratory analysis (O’Neil and Schutt, 2013) or explanatory research, the latter being the case in the study that this article re-examines. The empirical subject matter of that study comprised a ubiquitous and commonplace practice, namely government budgeting. However, the objectives of the study that this article re-examines had pushed it away from entrenched research on budgetary governance (de Haan et al., 2013; Hallerberg et al., 2007) and from generic research on public sector reform adapted to examine government budgeting (see, for instance, Hyndman et al., 2013). Instead, the study examined historical legal traditions as possible influences on government budgeting.

The study that this article re-examines started with observations that despite indications that legal traditions are dead (Lindahl and Schadewitz, 2013; Pargendler, 2012), research acknowledging these traditions continues (Ma, 2012; Painter and Peters, 2010). Moreover, heterogeneity between national systems of government budgeting despite decades of global harmonization called for explanation (International Monetary Fund (IMF), 2013; Jones et al., 2013; Lienert, 2013; OECD, 2005; Wanna et al., 2010). The indicated study proposed hypotheses that legal system traditions indeed explain differences in government budgetary legislation according to entrenched historical, social, cultural, and political divisions in common law with British origins, Napoleonic civil law first codified in France, civil law of the German type, and Nordic law resembling German law in many respects (Glenn, 2010; Zweigert and Kötz, 1998). The study that this article re-examines approached differences between the legal systems taking the less common inroad of examining differences in legal language (Brake and Katzenstein, 2013; Kischel, 2009).

The first three hypotheses proposed in the study that this article re-examines were tested by means of a Big Data method of unsupervised latent trait scaling. The first two of these three hypotheses were:

Hypothesis 1: A grand division into common law legal systems and civil law legal systems explains differences in government budgetary legislation.

Hypothesis 2: Within countries of civil law traditions, finer divisions into Napoleonic, German and Nordic legal systems explain differences in government budgetary legislation.

The longer-term harmonization efforts of government budgeting indicate countries with common law traditions as the foremost present-day global ideal models (IMF, 2013; Wanna et al., 2010). A third hypothesis was proposed:

Hypothesis 3: The older the budgetary legislation is in a civil law country, the wider its differences from budgetary legislation in common law countries.

The last two hypotheses of the study that this article re-examines were tested by means of a Big Data method of topic modeling. These hypotheses were:

Hypothesis 4: Divisions into legal system traditions explain divisions into topics in government budgetary legislation.

Hypothesis 5: Divisions within government budgetary legislation according to different topics explain differences in the vocabularies of these topics in the legislation indicated.

Research methods and the research material in the study that this article re-examines

Methods

The foremost methods of the study that this article re-examines comprised a method of unsupervised latent trait scaling and a method of topic modeling. A brief background characterization follows, and Appendix 1 gives more details.

Latent trait scaling first appeared in social research of the political science variety in the shape of supervised latent trait scaling (Laver et al., 2002), which requires the researcher first to supply a ‘seed text’ and then to use machine learning algorithms to do the scaling. Later, unsupervised scaling evolved (Proksch and Slapin, 2008), using machine learning algorithms throughout the research process. Both scaling methods have been applied first and foremost to examine ideological polarization in politics, using such Big Data textual materials as parliamentary speeches, political statements of government ministers, or political party programs (Lowe and Benoit, 2013).

Topic modeling found its way into social research at about the same time as unsupervised latent trait scaling evolved within the political science variety of this research (Grimmer and Stewart, 2013). Rather than within political science research (Clark and Lauderdale, 2010), this method has found applications in sociological research (see, for instance, DiMaggio et al., 2013; Fligstein et al., 2014; Levy and Franklin, 2013).

Research material

The research material in the study that this article re-examines comprised legislation on federal or national government budgeting in the stable and highly developed Western democracies of Australia, Austria, Canada, Estonia, Finland, France, Iceland, Italy, the Netherlands, New Zealand, Sweden, Spain, and the United Kingdom (Table 1). It would have been both relevant and interesting to examine the United States, Germany, Switzerland, and Belgium, but these countries were omitted because of their complex federal structures. The corpus examined had 13 documents with a total of about 270,000 words.

Table 1.

The text material of the study that this article re-examines.

Country	Legal tradition	Titles of legal texts examined, in English	Latest upgrade	Words in English
Australia	Common law	Financial Management and Accountability Act 1997	2013	12,503
Austria	German	Federal Organic Budget Act 2013	2013	40,151
Canada	Common law	Financial Administration Act 1985	2013	37,994
Estonia	German	Law on State Budget 1999	NA	4,659
Finland	Nordic	1988 Budget Act, 1992 Budget Decree	2007	21,494
France	Napoleonic	The Organic Law on Budget Laws 2001	NA	8,946
Iceland	Nordic	The Government Financial Reporting Act 1997	NA	4,690
Italy	Napoleonic	Law of Accounting and Public Finance 2010	2013	23,306
The Netherlands	Napoleonic, German	Government Accounts Act 2001	2005	15,332
New Zealand	Common law	Public Finance Act 1989	2013	47,378
Spain	Napoleonic	General Budgetary Law 2001	NA	43,711
Sweden	Nordic	Budget Act 2011	2011	4,290
United Kingdom	Common law	Exchequer and Audit Departments Act 1866, Charter of Budget Responsibility 2011	2013	3,847

Notes: Column 4 gives the year of the latest legislative upgrade available in the research material of this article, and column 5 indicates the number of words in the legislation examined before the text pruning preceding the analysis proper in the study this article re-examines.

Most of the materials were derived from a World Bank (2013) website with legal texts in English or English translation. Two texts were derived from governmental websites, one in English translation (Austria, 2013) and the other in Italian (Italy, 2013), of which the latter was translated into English. For the UK and Finland, two separate legal documents were merged into a single text, applying a common and allowable practice of Big Data research.

Examining augmentation, surrogation, and re-orientation in research by means of Big Data methods

Examining unsupervised latent trait scaling

The method of unsupervised latent trait scaling used in the study that this article re-examines (for technical details see Appendix 1) delivers estimates and test results for what the developers of this method have named the fixed effects related to individual words (ψ, psi), the fixed effects related to each different document included in the data (α, alpha), the word weights (β, beta), and the positions taken in the documents (ω, omega). The notion of ‘fixed effect’ is generic to many varieties of quantitative analysis, indicating that a variable is treated for the technical purposes of quantitative analysis as if it were non-random. All that is specific to estimating fixed effects by means of latent trait scaling derives from the character of this method as a Big Data method of textual analysis. The fixed effects obtained by means of this method are typically uninteresting in themselves, and their calculation therefore plays only a subordinate technical role in enabling the calculation of the word weights and the positions taken in the documents. Moreover, the calculation of the fixed effects may substantially enhance the possibilities of displaying graphically the estimates of the word weights, the positions taken in the documents, or both.
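
To make the roles of the four parameter sets concrete, the following minimal Python sketch assumes the Poisson specification published for Wordfish (Proksch and Slapin, 2008), in which the expected count of word j in document i is exp(α_i + ψ_j + β_j·ω_i), with ψ denoting the word fixed effects. The function names and all numeric values are hypothetical and serve illustration only; they are neither the estimation code nor the estimates of the re-examined study, for which Appendix 1 gives the technical details.

```python
import math


def expected_count(alpha_i, psi_j, beta_j, omega_i):
    """Expected count of word j in document i under the assumed Poisson model."""
    return math.exp(alpha_i + psi_j + beta_j * omega_i)


def poisson_log_likelihood(counts, alpha, psi, beta, omega):
    """Log-likelihood of a document-by-word count matrix under the assumed model."""
    total = 0.0
    for i, row in enumerate(counts):
        for j, y in enumerate(row):
            lam = expected_count(alpha[i], psi[j], beta[j], omega[i])
            # log Poisson(y | lam) = y * log(lam) - lam - log(y!)
            total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


# Hypothetical toy values: two documents, two words.
alpha = [0.0, 0.0]   # document fixed effects (held equal here)
psi = [1.0, 1.0]     # word fixed effects (held equal here)
beta = [2.0, 0.0]    # word 0 discriminates between documents, word 1 does not
omega = [-1.0, 1.0]  # document positions

for i in range(2):
    rates = [expected_count(alpha[i], psi[j], beta[j], omega[i]) for j in range(2)]
    print(f"document {i}: expected counts {rates}")
```

In this toy setting the word with β = 0 has the same expected count in both documents, whereas the word with a large β is expected far more often in the document with the higher ω. This is the sense in which the word weights, rather than the fixed effects, carry the substantive information.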

Words best discriminating between the documents examined, and therefore between the countries from which these documents derive, were situated at the extreme ends of the β values, as is generally expected in applications of the unsupervised latent trait scaling method. This pinpoints an asset of this method: it emphasizes rare as opposed to frequent words in discriminating between text documents. This very characteristic also eliminates the confounding effects of the different lengths of the texts examined. However, the assets of the unsupervised latent trait scaling examined must be weighed against its less outstanding characteristics, which included a graphical output plot of no fewer than 2992 different words according to their weights (β), far too many either to present meaningfully or to subject to a sensible substantive interpretation. In their turn, the α (alpha) fixed effects, related to documents and the countries behind these documents, are of no interest for the model interpretation, but the dimension they represent usefully separates visually the positions taken in the documents, that is, the important ω (omega) values (Figure 1).

Figure 1.

Position estimates of countries in unsupervised latent trait scaling of the study that this article re-examines. Note: The x-axis indicates the estimates for each document’s (and country’s) position, ω (omega), and the y-axis indicates the fixed effects related to each document (and country), α (alpha).

As proposed in Hypothesis 1 of the study that this article re-examines, a grand division indeed prevails between countries with common law traditions and countries with civil law traditions. Hypothesis 2 could also be sustained, although the countries examined do not arrange themselves quite neatly into the Napoleonic, German, and Nordic variants of civil law. Certain countries sharing legal system traditions received similar ω (omega) values, such as Spain and Italy, or Sweden and Finland, but there were exceptions such as Iceland, which resembled Spain and Italy more than the two other Nordic countries of Sweden and Finland. Moreover, Austria, one of the two representatives of the German tradition in the analysis, received an ω (omega) value similar to those of Sweden and Finland. The Netherlands, with its Roman–Napoleonic–German legal characteristics (Glenn, 2010), comprised a unique case, as could be expected. The fact that budgetary legislation was old at the time of the investigation both in Estonia and Iceland suggested the acceptance of Hypothesis 3 on differences between civil law countries with older as opposed to more recently reformed budgetary legislation.

In the study that this article re-examines, statistical credibility intervals were calculated for the ω (omega) values of each document and, indirectly, for the country which each document represents (Table 2). According to the results, certain documents and countries were similar enough to receive overlapping credibility intervals with the intervals for other documents and countries. Australia and New Zealand did not differ statistically from each other, nor did Austria, Finland and Iceland, or Estonia and Spain. The other countries, Canada, Italy, France, the Netherlands, Sweden, and the United Kingdom, were empirically more unique.

Table 2.

Test results of unsupervised latent trait scaling in the study that this article re-examines.

Countries	Position taken in documents representing the country, ω	Lower bound of 95% credibility interval of ω	Upper bound of 95% credibility interval of ω
Australia	1.4401	1.4002	1.4600
Austria	−0.4418	−0.4493	−0.4043
Canada	1.8086	1.7689	1.8227
Estonia	−0.9689	−1.0135	−0.9469
Finland	−0.4544	−0.4684	−0.4097
France	−0.7192	−0.7476	−0.6848
Iceland	−0.5077	−0.5420	−0.4436
Italy	−1.0599	−1.1131	−1.0666
The Netherlands	0.0644	0.0412	0.1218
New Zealand	1.4748	1.4423	1.4895
Sweden	−0.1680	−0.0422	−0.1220
Spain	−0.9953	−1.0317	−0.9922
United Kingdom	0.4814	0.4300	0.5579
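
The reading of Table 2 in terms of overlapping credibility intervals can be sketched as follows. This is a minimal illustration in Python rather than the procedure of the re-examined study: the function name is hypothetical, only a subset of the rows of Table 2 is copied in, and treating interval overlap as 'no statistical difference' follows the informal criterion used in the text.

```python
from itertools import combinations


def intervals_overlap(a, b):
    """Return True when closed intervals a = (low, high) and b = (low, high) share a point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Lower and upper bounds of the 95% credibility intervals, copied from Table 2 (subset).
intervals = {
    "Australia": (1.4002, 1.4600),
    "New Zealand": (1.4423, 1.4895),
    "Canada": (1.7689, 1.8227),
    "Austria": (-0.4493, -0.4043),
    "Finland": (-0.4684, -0.4097),
    "Iceland": (-0.5420, -0.4436),
    "Estonia": (-1.0135, -0.9469),
    "Spain": (-1.0317, -0.9922),
}

# Collect every pair of countries whose intervals overlap.
overlapping_pairs = [
    (first, second)
    for first, second in combinations(intervals, 2)
    if intervals_overlap(intervals[first], intervals[second])
]

for first, second in overlapping_pairs:
    print(f"{first} and {second} overlap")
```

On these rows, the check reproduces the groupings reported in the text (Australia and New Zealand; Austria, Finland and Iceland; Estonia and Spain), while the interval for Canada overlaps with none of the others.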

Examining topic modeling

The topic modeling program library used in the study that this article re-examines would have set no technical limits to estimating high numbers of topics, but Hypotheses 4 and 5 and their background argumentation indicated a maximum of four topics that could be given a substantive interpretation. However, a German topic and a Nordic topic did not evolve separately in the four-topic estimation, which is understandable given the closeness of the two civil law varieties. Therefore, only a three-topic solution is presented (Table 3).

Table 3.

Characteristics of a three-topic model by words in the study that this article re-examines.

Words	Topic 1, Nordic and German law	Topic 2, Common law	Topic 3, Napoleonic law
1	‘shall’	‘minister’	‘article’
2	‘budget’	‘may’	‘state’
3	‘government’	‘crown’	‘shall’
4	‘federal’	‘act’	‘budget’
5	‘section’	‘corporation’	‘public’
6	‘finance’	‘section’	‘law’
7	‘management’	‘financial’	‘general’
8	‘act’	‘public’	‘finance’
9	‘minister’	‘must’	‘year’
10	‘year’	‘money’	‘accounts’
11	‘accounting’	‘person’	‘expenditure’
12	‘may’	‘treasury’	‘may’
13	‘financial’	‘report’	‘financial’
14	‘statement’	‘shall’	‘paragraph’
15	‘accounts’	‘department’	‘entities’
16	‘ministry’	‘appropriation’	‘referred’
17	‘expenditures’	‘year’	‘accounting’
18	‘expenditure’	‘board’	‘ministry’
19	‘fiscal’	‘parliament’	‘following’
20	‘state’	‘subsection’	‘treasury’
21	‘audit’	‘council’	‘provisions’
22	‘central’	‘fiscal’	‘revenue’
23	‘referred’	‘information’	‘said’
24	‘cash’	‘respect’	‘provided’
25	‘information’	‘regulations’	‘credits’
26	‘account’	‘amount’	‘auditing’
27	‘assets’	‘governor’	‘account’
28	‘line’	‘means’	‘report’
29	‘report’	‘made’	‘sector’
30	‘provisions’	‘general’	‘social’

Note: The table indicates words in a descending order of probability that these words belong to a given topic. Only 30 words receiving the highest posterior probabilities in each topic are included in this table. Referring to the article text related to this table, the italicized words comprise examples of content words of budgetary governance in the first topic, words referring to institutionalization in the second topic, and accountability words in the third topic.

The topic modeling results (Table 3) allowed Hypothesis 4 to be sustained, revealing three topics: a joint topic of ‘German and Nordic law’, a topic of ‘common law’, and a topic of ‘Napoleonic law’. As is commonplace in topic modeling, many words appear in two or all three topics. However, some of these words only illustrate that the corpus comprised legal texts (for instance, ‘law’, ‘must’, ‘shall’, and ‘may’), or indicated references and cross-references common in legal texts (for instance, ‘section’, ‘article’, or ‘paragraph’).
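
The overlap between the topic vocabularies can be checked directly against Table 3. The following Python sketch copies the 30 top-ranked words of each topic from the table and lists the words shared by all three topics; the variable names are hypothetical, and the sketch only inspects the published word lists rather than re-estimating the topic model.

```python
# Top 30 words per topic, copied from Table 3 in rank order.
topic_words = {
    "Nordic and German law": [
        "shall", "budget", "government", "federal", "section", "finance",
        "management", "act", "minister", "year", "accounting", "may",
        "financial", "statement", "accounts", "ministry", "expenditures",
        "expenditure", "fiscal", "state", "audit", "central", "referred",
        "cash", "information", "account", "assets", "line", "report",
        "provisions",
    ],
    "Common law": [
        "minister", "may", "crown", "act", "corporation", "section",
        "financial", "public", "must", "money", "person", "treasury",
        "report", "shall", "department", "appropriation", "year", "board",
        "parliament", "subsection", "council", "fiscal", "information",
        "respect", "regulations", "amount", "governor", "means", "made",
        "general",
    ],
    "Napoleonic law": [
        "article", "state", "shall", "budget", "public", "law", "general",
        "finance", "year", "accounts", "expenditure", "may", "financial",
        "paragraph", "entities", "referred", "accounting", "ministry",
        "following", "treasury", "provisions", "revenue", "said", "provided",
        "credits", "auditing", "account", "report", "sector", "social",
    ],
}

# Words that appear among the top 30 of every topic.
vocabularies = [set(words) for words in topic_words.values()]
shared_by_all = set.intersection(*vocabularies)

print(sorted(shared_by_all))
```

Words shared by all three lists are poor candidates for characterizing any single topic, which is why the interpretation concentrates on words that are ranked highly in one topic rather than in the others.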

Looking at words of substance that characterize certain topics rather than others, Hypothesis 5 on differences between the legal vocabularies of different legal systems could be sustained (Table 3). Words dealing with the contents of budgeting can be observed in the first topic of ‘German and Nordic law’, such as ‘budget’, ‘finance’, ‘financial’, ‘expenditure’ (both in the singular and the plural), ‘cash’, ‘assets’, and ‘provisions’. In the ‘common law’ topic, words were found to characterize the institutionalization of budgetary governance with special reference to the key actors: ‘minister’, ‘crown’, ‘corporation’, ‘treasury’, ‘board’, ‘parliament’, ‘council’, and ‘governor’. Finally, words characterizing accountability received emphasis in the third, ‘Napoleonic’ topic: ‘accounts’, ‘accounting’, ‘report’, and ‘auditing’. Unfortunately, the relationships between different regimes of budgetary governance and different textual frames of budgetary legislation were too under-researched to receive a proper elaboration in the study that this article re-examines, and have to await future study.

In the study that this article re-examines, the distribution of the three topics in the documents examined and, respectively, in the countries that these documents represent gave further support to Hypothesis 5 (Table 4). In the Austrian document, only the German-Nordic topic was present. This was almost the case in the Swedish and Finnish documents, whereas the weight of this same topic in the Icelandic document was about two-thirds. In Estonia, despite its predominantly German legal heritage (Glenn, 2010), more than two-thirds of the vocabulary examined represented the ‘Napoleonic law’ topic. Only future studies can examine the reasons for this aberration.

Table 4.

Distribution of the three topics by countries in the study that this article re-examines.

Countries	Topic 1, Nordic and German law	Topic 2, Common law	Topic 3, Napoleonic law
Australia	0.0000	1.0000	0.0000
Austria	1.0000	0.0000	0.0000
Canada	0.0000	1.0000	0.0000
Estonia	0.3078	0.0000	0.6921
Finland	0.9426	0.0000	0.0573
France	0.2109	0.0000	0.7890
Iceland	0.6364	0.0358	0.3278
Italy	0.0000	0.0000	1.0000
The Netherlands	0.8176	0.1258	0.0566
New Zealand	0.0000	1.0000	0.0000
Sweden	0.9450	0.0000	0.0550
Spain	0.0000	0.0000	1.0000
United Kingdom	0.2815	0.5522	0.1663

Note: The figures in the table indicate the distribution of topics in the document or documents representing each country according to the latent Dirichlet allocation (LDA) topic model that was estimated in the study that this article re-examines.

The countries in which only the second, ‘common law’ topic was present comprised Canada, New Zealand, and Australia, and this topic also predominated in the United Kingdom with a weight over half of the total. According to the estimation results, in Spain and Italy only the third, ‘Napoleonic law’ topic was present. France, in its turn, represented itself rather as a hybrid between the Napoleonic topic and the German-Nordic topic, possibly reflecting the historical influence of German legal traditions in northern France and the characteristics of the Code Napoleon as a hybrid between the law of southern France and northern France (Glenn, 2010).

In documents representing six countries (Australia, Austria, Canada, Italy, New Zealand, and Spain) only one topic evolved, which was almost the case in Finland and Sweden. In Estonia, France, and Iceland a dominant topic and a minor topic could be found, and in the United Kingdom and the Netherlands the study that this article re-examines revealed a dominant topic and two minor topics.
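
The reading of Table 4 summarized above can be reproduced with a short Python sketch. The topic weights are copied from Table 4; the function names are hypothetical, and a topic is counted as 'present' simply when its reported weight is above zero, which is an assumption made for this illustration.

```python
TOPICS = ("Nordic and German law", "Common law", "Napoleonic law")

# Topic weights per country, copied from Table 4 in the column order of TOPICS.
table4 = {
    "Australia": (0.0000, 1.0000, 0.0000),
    "Austria": (1.0000, 0.0000, 0.0000),
    "Canada": (0.0000, 1.0000, 0.0000),
    "Estonia": (0.3078, 0.0000, 0.6921),
    "Finland": (0.9426, 0.0000, 0.0573),
    "France": (0.2109, 0.0000, 0.7890),
    "Iceland": (0.6364, 0.0358, 0.3278),
    "Italy": (0.0000, 0.0000, 1.0000),
    "The Netherlands": (0.8176, 0.1258, 0.0566),
    "New Zealand": (0.0000, 1.0000, 0.0000),
    "Sweden": (0.9450, 0.0000, 0.0550),
    "Spain": (0.0000, 0.0000, 1.0000),
    "United Kingdom": (0.2815, 0.5522, 0.1663),
}


def dominant_topic(weights):
    """Name of the topic with the largest weight in a document."""
    best_index = max(range(len(weights)), key=lambda k: weights[k])
    return TOPICS[best_index]


def topics_present(weights):
    """Number of topics with a weight above zero (illustrative criterion)."""
    return sum(1 for weight in weights if weight > 0)


single_topic = sorted(c for c, w in table4.items() if topics_present(w) == 1)
print("Single-topic documents:", single_topic)

for country, weights in table4.items():
    print(f"{country}: dominant topic is {dominant_topic(weights)}")
```

Under this criterion, the single-topic documents are those of the six countries named above, and the dominant topic of the Estonian document is ‘Napoleonic law’ despite the country's predominantly German legal heritage.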

Baseline comparisons concerning the Big Data methods that this article examines

Baseline comparisons have been carried out before between traditional methods and the unsupervised scaling method that the study that this article re-examines utilized (see, for instance, Grün and Hornik, 2013; Lowe and Benoit, 2013). The performative struggles between researchers using traditional methods (see, for instance, Biernacki, 2014; Budge, 2013) on the one hand, and researchers using Big Data methods (DiMaggio et al., 2013; Laver et al., 2002; Proksch and Slapin, 2008) on the other, also offer lessons for baseline comparisons.

The baseline comparisons in this article receive support from long-time researcher familiarity with both quantitative research and semiotic and rhetorical textual analysis. During the preparation of an international refereed article that went into press in 2014, extending a three-country study (Hyndman et al., 2013) to cover a fourth country, the laborious material collection, the tedious and error-prone manual coding and dissatisfaction with the analysis of word frequencies made methods of Big Data analysis attractive (for the methods actually chosen, see Appendix 1). Given this modest baseline, the study that this article examines, once completed, represented a substantial improvement.

While preparing the study indicated above, the path led to the best-evolved Big Data methods to examine ideological polarization, namely methods of latent trait scaling. An additional interest to learn what unsupervised rather than supervised scaling would deliver made the Wordfish program (Proksch and Slapin, 2009) the choice. The study that this article re-examines was driven towards topic modeling by two additional forces. The former of these comprised researcher familiarity with the classical rhetorical examination of topics (Aristotle, 2006) and the resulting curiosity to learn about the performance of a Big Data method that steps forward as an inheritor of the classical tradition. The latter driving force was a technical interest in using two methods of Big Data analysis as baselines in respect to each other in examining the same research material.

Conclusions and discussion

Augmentation, surrogation, and re-orientation of research

Of the two types of Big Data methods that this article has examined, latent trait scaling has evolved with the explicit purpose of augmenting (Edwards et al., 2013) earlier methods. The authors of both supervised latent trait scaling (Laver et al., 2002) and unsupervised latent trait scaling (Proksch and Slapin, 2008) have been critical towards the conventional methods applied in a major international research project started in the mid-1980s to examine political preferences of parties since 1945, eventually in more than 50 countries (MARPOR, 2015). Although sensitizing MARPOR representatives towards vulnerability in their methods, the altercations have ended in stalemate (Budge, 2013; Volkens et al., 2013), and comparative examinations of the merits of the traditional and Big Data methods (see, for instance, Lowe and Benoit, 2013) have failed to resolve the fundamental disputes.

Big Data methods of topic modeling originate from within hybrids of computer science and statistics rather than from within social research. Topic modeling explicitly augments traditional latent variable modeling initiated with Karl R Pearson’s principal component analysis at the beginning of the 20th century and continued with such methods as LL Thurstone’s factor analysis in the 1930s (Blei, 2014; Grimmer and Stewart, 2013). However, researchers with classical humanist inclinations may want to critically consider how much topic modeling actually augments the 2500-year rhetorical tradition of examining topics or other procedures of classical humanist textual interpretation (see, for instance, Biernacki, 2014).

The results of this article suggest agreement with Edwards et al. (2013) that Big Data methods may not easily provide surrogates for conventional research designs of social research despite the fact that such advances are not ruled out (see, for instance, Hale et al., 2014; Nickerson and Rogers, 2014). In the study that this article has re-examined, the research design was cross-sectional and comparative by modest default rather than by explicit design. A possible future step forward would lead to longitudinal Big Data research of texts of budgetary legislation in a number of countries (for analogous examples of longitudinal Big Data analysis, see, for instance, DiMaggio et al., 2013; Grimmer, 2010; Proksch and Slapin, 2008).

Edwards et al. (2013) indicate that the re-orientation of research by means of Big Data methods may start from where their mere augmentation of conventional methods stops. The empirical results on government budgetary legislation in the study that this article has re-examined suggest that latent trait scaling has potential in empirical research over and above the study of the ideological dimensions of politics (Laver et al., 2002; Lowe and Benoit, 2013; Proksch and Slapin, 2008). More specifically, in the study indicated, a specific method of unsupervised latent trait scaling was used to discern latent traits of legal traditions in government budgetary legislation in 13 countries by means of examining the texts of this legislation. Big Data research has been evolving within legal research proper (see, for instance, Hildebrandt, 2012; Surden, 2014), but the study that this article has examined indicates re-orientation within political science research on government budgeting and legal policy-making rather than legal studies.

The foremost contribution of the topic modeling application in the study that this article has re-examined was confined to social research of the political science variety. This is possibly a contribution in itself, as topic modeling has been applied within political science research comparatively rarely thus far (but, see, for instance, Clark and Lauderdale, 2010), whereas it has been substantially more common within sociological research (DiMaggio et al., 2013; Fligstein et al., 2014; Levy and Franklin, 2013; more generally, see also Bail, 2014).

The social and political life of Big Data methods influencing the institutionalization of research

As indicated at the beginning, as its foremost objective this article examines the ‘social life’ and the ‘political life’ of Big Data methods as influences on the institutionalization of research. The social structures, the organization and the identity of Big Data research, and its researchers within the social and political sciences were still weak rather than entrenched in most countries and most research fields in the mid-2010s. At the same time, mainstreaming of Big Data methods was still advancing within statistics proper rather than having been fully accomplished (see, for instance, EMC, 2015). Genuine innovations in social and political research concerning Big Data methods have been relatively few thus far although not nonexistent (see, however, Grimmer and Stewart, 2013; Laver et al., 2002; Proksch and Slapin, 2008). However, the Big Data activism of distinguished social and political scientists (see, for instance, DiMaggio et al., 2013; Fligstein et al., 2014; Laver et al., 2002) may be changing the situation.

According to one scenario, social and political research utilizing Big Data methods will institutionalize itself, that is, it will become habitual and taken for granted. However, should this institutionalization take place, it would also generate its own ‘rationalized myths’ (Meyer and Rowan, 1977), marked by exaggeration of the merits and contributions of these methods and by formal rather than substantive commitment to them. We would then witness not only the enhancement of the rationality of the core analytic processes of research by means of Big Data methods, but also the strengthening of the external legitimation of research institutions by the same means.

For a social researcher in political science, particularly interesting characteristics of the political life of Big Data methods include the performative roles that these methods and their applications may play in disrupting the achieved institutionalization of research and, possibly, in catalyzing institutional transformations. This article has indicated two specific performative struggles. The first has revolved around a long-term international project examining ideological political polarization (MARPOR, 2015), and has been waged between those who defend traditional methods (Budge, 2013; Volkens et al., 2013) on the one hand, and those who apply Big Data methods (Laver et al., 2002; Proksch and Slapin, 2008) on the other. As indicated above, proposals of appeasement (Lowe and Benoit, 2013) have reaped little success so far. The second struggle, which to date has been milder, has been waged between those who apply Big Data methods including topic modeling (DiMaggio et al., 2013; Fligstein et al., 2014) on the one hand, and those who prioritize methods derived from classical humanistic traditions (Biernacki, 2014) on the other.

In the mid-2010s, time has generally been working to the advantage of those social and political researchers who apply or promote Big Data methods, as observations accumulate that public sector and private sector funding authorities in many countries have placed heavy emphasis upon Big Data research projects. Seen with the ‘vivid awareness of the relationship between personal experience and the wider society’ better known as the ‘sociological imagination’ (Mills, 1959: 3), these developments have lately become acute not only for numerous colleagues around the world but also for the research team of whose products this article is one. Observations of opportunities intermingle with smaller or larger successes in attracting external research funding and having article manuscripts accepted in refereed international journals, with variable success in mainstreaming Big Data research and related teaching in academia, and with disappointments when funding applications or article manuscripts are rejected. What else do the current characteristics of the ‘political life’ of Big Data methods and the researchers who apply them represent than politics in the very sense of Pocock (1975: 156) as the ‘art of dealing with the contingent event… with pure, uncontrolled, and unlegitimated contingency’?

Footnotes

Declaration of conflicting interests

The author declares that there is no conflict of interest.

Funding

This study was supported by Koneen Säätiö/Kone Foundation, Helsinki, Finland, project grant.

Appendix 1

References

Aristotle (2006) Art of Rhetoric, Reprint of 1926 edition. Cambridge: Harvard University Press.

Austin (1975 [1962]) How to do Things with Words Vol. 1, Oxford: Clarendon.

Austria (2013) Federal organic budget act of 2013. Available at: http://www.ris.bka.gv.at/Dokumente/Erv/ERV_2009_1_139/ERV_2009_1_139.pdf (accessed 15 October 2013).

Bail (2014) The cultural environment: Measuring culture with big data. Theory and Society 43(3–4): 465–482.

Berger and Luckmann (1991 [1966]) The Social Construction of Reality, Garden City, NY: Anchor Books.

Biernacki (2014) Humanist interpretation versus coding text samples. Qualitative Sociology 37(2): 173–188.

Blei (2012) Probabilistic topic models. Communications of the ACM 55(4): 77–84.

Blei (2014) Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Applications 1: 203–232.

Blei, Ng and Jordan (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 2: 993–1022.

boyd and Crawford (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

Brake and Katzenstein (2013) Lost in translation? Nonstate actors and the transnational movement in procedural law. International Organization 64(4): 725–757.

Budge I (2013) The ‘paradox of the manifestos’ – satisfied users, critical methodologists. Available at: https://manifesto-project.wzb.eu/down/papers/budge_paradox.pdf (accessed 31 January 2015).

Cecez-Kecmanovic, Galliers and Henfridsson (2014) The sociomateriality of information systems: Current status, future directions. MIS Quarterly 38(3): 809–830.

Clark and Lauderdale (2010) Locating Supreme Court opinion in doctrine space. American Journal of Political Science 20(3): 329–350.

D’Adderio (2008) The performativity of routines: Theorizing the influence of artefacts and distributed agencies on routines dynamics. Research Policy 37(5): 769–789.

D’Adderio (2011) Artifacts at the centre of routines: Performing the material turn in routines theory. Journal of Institutional Economics 7(2): 197–230.

D’Adderio L (2014) Performing modularity: Competing rules, performative struggles and the effect of organizational theories on the organization. Organization Studies. Available at: http://www.doi.org/10.1177/0170840614538962 (accessed 3 February 2015).

Deroy and Clegg (2015) Back in the USSR: Introducing recursive contingency into institutional theory. Organization Studies 36(1): 73–90.

de Haan, Jong-A-Pin and Mierau (2013) Do budgetary institutions mitigate the common pool problem? New empirical evidence for the EU. Public Choice 156(3–4): 423–441.

DiMaggio, Nag and Blei (2013) Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics 41(6): 570–606.

Edwards, Housley and Williams (2013) Digital social research, social media and the sociological imagination: Surrogacy, augmentation and re-orientation. International Journal of Social Research Methodology 16(3): 245–260.

EMC (2015) Data Science and Big Data Analytics: Discovering, Visualizing and Presenting Data, Indianapolis, IN: EMC Education Services.

Feinerer I (2013) tm – text mining. Available at: http://tm.r-forge.r-project.org (accessed 15 October 2013).

Fligstein N, Brundage JS and Schultz M (2014) Why the Federal Reserve failed to see the financial crisis of 2008: The role of ‘macroeconomics’ as sense-making and cultural frame. Available at: irle.berkeley.edu (accessed 31 January 2015).

Glenn (2010) Legal Traditions in the World: Sustainable Diversity in Law, 4th ed. Oxford: Oxford University Press.

Greer (2011) Reporting results to a skeptical audience: A case study on incorporating persuasive strategies in assessment reports. The American Review of Public Administration 41(5): 577–591.

Grimmer (2010) A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1): 1–35.

Grimmer and Stewart (2013) Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21(3): 267–297.

Grün B and Hornik K (2013) Topicmodel: topic models. Available at: http://cran.r-project.org/web/packages/topicmodels/index.html (accessed 15 November 2013).

Hale SA, John P, Margetts HZ, et al. (2014) Investigating political participation and social information using big data and a natural experiment. Available at: http://ssrn.com/abstract=2454570 (accessed 1 February 2015).

Hallerberg, Strauch and von Hagen (2007) The design of fiscal rules and forms of governance in European Union countries. European Journal of Political Economy 23(2): 338–359.

Hildebrandt (2012) The meaning and mining of legal texts. In: Berry (ed.) Understanding Digital Humanities, Houndmills: Palgrave Macmillan, pp. 145–160.

Hyndman, Liguori and Meyer (2013) The translation and sedimentation of accounting reforms: A comparison of the UK, Austrian and Italian experiences. Critical Perspectives on Accounting 25(4): 388–408.

International Monetary Fund (IMF) (2013) Public Financial Management and its Emerging Architecture, Washington, DC: IMF.

Italy (2013) La legge di contabilità e finanza pubblica. Law of accounting and public finance (in Italian). Available at: http://www.rgs.mef.gov.it/VERSIONE-I/Servizio-s/Note-brevi/La-legge-d/ (accessed 20 November 2013).

Jones, Lande and Lüder (2013) A comparison of budgeting and accounting reforms in the national governments of France, Germany, the UK and the US. Financial Accountability & Management 29(4): 419–441.

Kischel (2009) Legal cultures – legal languages. In: Olsen, Lotz and Stein (eds) Translation Issues in Language and Law, Basingstoke: Palgrave Macmillan, pp. 7–17.

Latour (2005) Reassembling the Social: An Introduction into Actor-network Theory, Oxford: Oxford University Press.

Laver, Benoit and Garry (2002) Extracting policy positions from political texts using words as data. American Political Science Review 97(2): 311–332.

Law and Ruppert (2013) The social life of methods: Devices. Journal of Cultural Economy 6(4): 1–12.

Levy KEC and Franklin (2013) Driving regulation: Using topic modeling to examine political contention in the U.S. trucking industry. Social Science Computer Review 32(2): 182–194.

Lienert (2013) The legal framework for public finances and budget systems. In: Allen, Hemming and Potter (eds) The International Handbook of Public Financial Management, Houndmills: Palgrave Macmillan, pp. 68–83.

Lindahl and Schadewitz (2013) Are legal families related to financial reporting quality? Abacus 49(2): 242–267.

Lowe W (2011) Austin: do things with words. R package version 0.2. Available at: https://r-forge.r-project.org/projects/austin/ (accessed 20 November 2013).

Lowe and Benoit (2013) Validating estimates of latent traits from textual data using human judgment as benchmark. Political Analysis 21(3): 298–313.

Lowndes and Roberts (2013) Why Institutions Matter: The New Institutionalism in Political Science, Houndmills: Palgrave Macmillan.

(2012) Legal tradition and antitrust effectiveness. Empirical Economics 43(3): 1263–1297.

Maguire and Hardy (2009) Discourse and deinstitutionalization. Academy of Management Journal 52(1): 148–178.

MARPOR (2015) Manifesto project database. Available at: https://manifesto-project.wzb.eu/information/information (accessed 31 January 2015).

Meyer and Rowan (1977) Institutional organizations: Formal structure as myth and ceremony. American Journal of Sociology 83(2): 340–363.

Mills (1959) The Sociological Imagination, London: Oxford University Press.

Nickerson and Rogers (2014) Political campaigns and big data. Journal of Economic Perspectives 28(2): 51–74.

OECD (2005) The legal framework for budget systems: An international comparison. OECD Journal on Budgeting 4(1) (entire special issue).

O’Neil and Schutt (2013) Doing Data Science: Straight Talk from the Front Line, Sebastopol, CA: O’Reilly.

Painter and Peters (2010) Tradition and Public Administration, Basingstoke: Palgrave Macmillan.

Pargendler (2012) The rise and decline of legal families. American Journal of Comparative Law 60(4): 1043–1074.

Peters (2011) Institutional Theory in Political Science: The ‘New Institutionalism’, 3rd ed. London: Continuum.

Pocock JGA (1975) The Machiavellian Moment: Florentine Political Thought and the Atlantic Republican Tradition, Princeton, NJ: Princeton University Press.

Powell and Colyvas (2008) Microfoundations of institutional theory. In: Greenwood, Oliver and Suddaby (eds) The Sage Handbook of Organizational Institutionalism, London: Sage, pp. 276–298.

Proksch S-O and Slapin (2008) A scaling model for estimating time-series party positions from texts. American Journal of Political Science 52(3): 705–722.

Proksch S-O and Slapin JB (2009) Wordfish manual version 1.3. Available at: http://www.wordfish.org (accessed 10 January 2013).

Ruppert (2013) Rethinking empirical social sciences. Dialogues in Human Geography 3(3): 268–273.

Skinner (2009) Visions of Politics, Vol. I, Regarding Method, Cambridge: Cambridge University Press.

Surden (2014) Machine learning and law. Washington Law Review 89(1): 87–115.

Volkens, Bara and Budge (2013) Mapping Policy Preferences from Texts: Statistics Solutions for Manifesto Analysts, Oxford: Oxford University Press.

Walter and Uhr (2013) Budget talk: Rhetorical constraints and contests. Australian Journal of Political Science 48(4): 431–444.

Wanna, Jensen and de Vries (2010) The Reality of Budgetary Reform in OECD Nations: Trajectories and Consequences, Basingstoke: Edward Elgar.

World Bank (2013) World Bank – IMF country budget law database. Available at: http://web.worldbank.org (accessed 10 October 2013).

Zweigert and Kötz (1998) An Introduction to Comparative Law, Oxford: Oxford University Press.