Sage Journals: Discover world-class research

Abstract

In biology—as in other scientific fields—there is a lively opposition between big and small science projects. In this commentary, I try to contextualize this opposition in the field of biomedicine, and I argue that, at least in this context, big science projects should come first.

Keywords

Big Data, biology, big science, small science, biomedicine, The Cancer Genome Atlas, scientific progress

In 2012, Bruce Alberts wrote an editorial in Science about the conflict between small and big science in biology. His reflections were prompted by the publication of 30 articles by the ENCODE Project Consortium in Nature. ENCODE, Alberts said, is an instance of the big-science style of research in biology, characterized by the production of Big Data. Since such projects require large-scale investment, and since we are living in times of very tight resources, Alberts posed two important questions: Does the existence of such large-scale projects undermine the survival of small-science biological projects? Which projects are more likely to promote the progress of biology?

Such questions are rhetorical if analyzed in light of Alberts’ implicit assumptions. First, big science projects in biology provide catalogues of biological objects. Second, progress in biology is measured by the understanding of the tiny details of biological phenomena and of the causal chains producing them. Since small molecular biology promotes exactly that kind of understanding, and since what remains to be understood in biology is above all tiny biochemical details, it follows that small science is still required and that “the grand challenges that remain in attaining a deep understanding of the chemistry of life will require going beyond detailed catalogs” (Alberts, 2012: 1583). Thus, Alberts argues for the superiority of small science over big science in biology.

In this contribution, I want to offer my two cents on the grand topic of ‘small science versus big science’ in biology by claiming that big science projects should prevail. However, this is not simply the reverse of Alberts’ verdict. The way I will motivate my claim shows that the dichotomy ‘big science versus small science’ he holds is misleading, because it does not consider the importance of context in the process of biological discovery. My claim is therefore deeply contextualized, in particular within the field of biomedical research. Drawing on examples of big science projects in biomedicine (e.g. The Cancer Genome Atlas (TCGA), HapMap or genome-wide association studies), I argue that the big-science style of research can, at least in biomedical research, better promote scientific progress as understood in this field and, derivatively, also progress in Alberts’ sense. Big science can potentially set the research agenda of small biology more efficiently and should therefore be placed at the top of a rationally reconstructed epistemic hierarchy that frames a particular division of labor. This claim deserves attention for a variety of reasons, but mainly because a remarkable fraction of the funds for basic research in biology is provided within the context of the biomedical agenda.

Progress in biology

Let’s start with Alberts’ claim that we still need small biology because we need to understand the tiny details of biological phenomena. Biologists usually express such understanding in terms of mechanisms of production of biological phenomena.

Alberts’ initial claim is that we need detailed mechanistic descriptions for the progress of molecular biology. However, he does not provide a sound motivation for this; he simply takes for granted that the aim of biology is to uncover the tiny details of causal production, to understand the ‘chemistry of life’ (sic). In other words, he thinks that we should move toward such detailed models for the sake of knowledge. As a philosopher I am sympathetic to this claim, but as a taxpayer contributing to basic research I would like to know more. Alberts’ claim resembles physicist Steven Weinberg’s last attempt (Kevles, 1997) to avert funding cuts for the Superconducting Super Collider. When it became clear that the payoffs of physics research (fostered mainly within the context of the Cold War) did not justify the 8 billion dollars needed for the Superconducting Super Collider, Weinberg played the card of ‘knowledge for the sake of knowledge’, saying that if the Superconducting Super Collider was killed, “you may as well say good-bye to (…) any hope in this country in our time of discovering a final theory of nature” (quoted in Kevles, 1997: 284). On similar premises, Alberts is saying that if we cut small science, then we may say ‘good-bye’ to the dream of discovering the secrets of the biological realm.

Yet the aim of biology (together with the strategies for achieving it) is not established in a vacuum, but should always be contextualized within specific research and public agendas. Research priorities are established on the basis of reasons that go beyond mere curiosity. Indeed, the very notion of ‘aim’, especially in the contemporary environment of bureaucratic and institutionalized research, depends strictly on a pre-existing agenda. Therefore, the right way to face the dichotomy between small and big science in biology is to ask which of the two approaches can better serve the motivations of a particular research agenda. In other words, why does small or big science matter in a given context?¹

Once we have identified a context, there are at least two ways of evaluating the claim that big science is ‘superior’ to small science, or vice versa. In a strong sense, it means that the kind of biological understanding needed in the chosen context could be achieved only through small science (or only through big science). In other words, it is aut small biology aut big biology (either one or the other). In a weaker sense, we need both, but within a given context it is small science, with its findings and practices, that sets the research agenda of big science (or vice versa). In this short article, I shall claim that in the context of biomedicine big biology is ‘epistemically superior’ to small biology in the weaker sense explained above.

Big science and biology

The first step towards a proper contextualization is the clarification of some concepts that are used in the debate as if they were clear and unambiguous.

First, ‘big science’, as the readers of this journal surely know, is not new. In physics, big science has existed for at least 70 years. Nor is big science new in biology: the Human Genome Project (HGP) was the first instance of big science in biology (Hilgartner, 2013).

Another issue to clarify is the ambiguous identification of ‘big science’ with ‘Big Data’. You can have Big Data without big science, as with social media platforms that use Big Data analytics but are not directly related to big science projects. Conversely, you can have big science without Big Data, as in the Manhattan Project.

Moreover, there is not just one kind of big science. Eddy (2013) lists different types of big science and claims that biology, as a big science, is more prone to what he calls a ‘big map’, defined as a “data resource—comprehensive, complete, closed ended—to be used by multiple groups, over a long time, for multiple purposes” (Eddy, 2013: R261). According to him, it is in the nature and history of biology to be oriented towards maps and large taxonomies rather than towards big experiments like the ones performed at the Large Hadron Collider in Geneva. Finally, it is part of the narratives of projects such as HGP, ENCODE, HapMap or TCGA that large-scale efforts should produce, among other things, voluminous data sets. This means that today all big science projects in biology are also Big Data projects.

The debate in the time of HGP

Another step towards a proper understanding of the problem ‘small science versus big science’ in biology is to note that a similar debate took place during the early phase of HGP. The challenges posed to ‘small molecular biology’ by HGP were strikingly similar to those posed by more recent big biological projects such as TCGA or ENCODE. Sometimes even the protagonists are the same (e.g. Robert Weinberg).

The reader may consult Hilgartner’s (2013) work on this matter, which documents in detail many of the fears that ‘small biologists’ had when facing HGP, especially concerning new forms of authorship, funding, and research strategies.

But despite these (and other) issues, it soon became clear that HGP neither reshaped the priorities of traditional molecular biology nor put small-lab biology in a position of epistemic inferiority; rather, it helped (and still helps) ‘ordinary biology’ (Hilgartner’s own words) by providing multiple resources.² No ‘ordinary biologist’ would deny how useful HGP is. Instead, the small-versus-big dichotomy somehow dissolved during the development of HGP, as it became clear that HGP itself was a sort of ancilla scientiae (a handmaiden of science).

Setting the controversy within a context

If big science projects in biology are Big Data projects and maps, then Alberts is right in saying that, qua ‘catalogues’, they do not advance biology. But his notion of ‘progress’ is surprisingly naïve. As the author of a famous cell biology textbook, he is genuinely interested in the mechanisms of biological phenomena. However, it is very difficult to place this interest within any specific research agenda other than ‘knowledge pursued for the sake of knowledge’. Now that we have clarified what big science in biology is, and how a similar debate was handled in the past, let’s try to understand how the dichotomy could be conceptualized within a specific context.

An important strand of research in molecular biology over the last 30 years has been motivated by the general background of Nixon’s ‘War on Cancer’, which triggered a flow of public money into molecular studies in biology. Even today, much of basic research in molecular biology is motivated by long-term future applications in the biomedical field. The small science that Alberts supports has brought substantial payoffs to the biomedical agenda, especially in the field of cancer studies (see for instance Weinberg, 2014).

Given the importance of biomedicine today, let’s try to contextualize the controversy in the biomedical field, especially in molecular oncology. Unsurprisingly, in cancer studies the dichotomy ‘small versus big science’ has provoked heated disagreements (e.g. Golub, 2010 and Weinberg, 2010) similar to those provoked by HGP. The question, then, is: is small science or big science the approach that can better serve the aims of biomedicine, and of cancer studies in particular?

An example of big science project in biomedicine

To explain my position on the opposition between small and big science in the context of biomedicine, I focus on the big science project TCGA.³

TCGA is a big science project in biomedicine organized as a consortium of several universities and hospitals. It was launched in 2005 by the National Cancer Institute and the National Human Genome Research Institute, both part of the National Institutes of Health, as a pilot project for a large-scale effort to map and characterize the molecular basis of several tumor types. Like HGP, it required a kind of ‘regime’ rather different from that of a small molecular biology laboratory. The consortium itself is organized around numerous centers,⁴ located throughout the USA, and cooperation among these units is essential to TCGA’s work. In a nutshell, TCGA’s scientists sequence the genomes of thousands of cancer samples and organize the data into a big map of somatic mutations and structural variations. The rationale for doing this is genuinely statistical and rooted in an evolutionary framework. To put it very simply, since mutations that influence the development of cancer (driver mutations) confer a growth advantage on cancer cells, they should be positively selected. If they are positively selected, then they should be detected more often than passenger mutations, which confer no such advantage. Therefore, the bigger the sample size, the more likely it is that mutations significant for the development of cancer will be detected (though the chance of detecting false positives increases too).
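The statistical rationale can be made concrete with a minimal Python sketch. This is not TCGA’s actual method: it assumes, purely for illustration, a single hypothetical background rate at which a gene picks up passenger mutations in any given sample, and uses an exact one-sided binomial test to ask how surprising an observed mutation frequency is under that rate alone. The function names, the 2% background rate, the 10% observed frequency and the Bonferroni correction over roughly 20,000 genes are all our assumptions; real driver-detection analyses rely on far richer background models.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p); each term is computed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def recurrence_p_value(mutated: int, samples: int, background_rate: float) -> float:
    """Hypothetical helper: one-sided p-value that at least `mutated` of `samples`
    carry a mutation in the gene if only the background (passenger) rate operates."""
    return binomial_upper_tail(mutated, samples, background_rate)


# Illustrative assumptions, not TCGA parameters.
BACKGROUND_RATE = 0.02        # hypothetical per-sample chance of a passenger hit in the gene
OBSERVED_FREQUENCY = 0.10     # gene found mutated in 10% of sequenced samples
GENES_TESTED = 20_000         # roughly the number of human protein-coding genes
ALPHA = 0.05 / GENES_TESTED   # Bonferroni-corrected significance threshold

for n in (10, 100, 1000):
    k = round(OBSERVED_FREQUENCY * n)
    p_value = recurrence_p_value(k, n, BACKGROUND_RATE)
    print(f"samples={n:5d} mutated={k:4d} p={p_value:.3g} significant={p_value < ALPHA}")
```

With the same observed frequency, the p-value shrinks as the number of sequenced samples grows, and under these assumptions only the 1,000-sample cohort clears the corrected threshold. The multiple-testing correction is one standard way of guarding against the false positives mentioned above.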

So far, the work of TCGA has corroborated most of the discoveries (i.e. cancer genes) made over the last 30 years by molecular oncology, typically a small science (as suggested by Weinberg, 2014). However, it has also led to the discovery of new genes and mutations and, thanks to the statistical power of its studies, has suggested that some processes never considered by small-scale molecular oncology, e.g. cell metabolism, are actually involved in cancer development.

The difference between TCGA and the small science of molecular oncology lies first in their discovery strategies. Clearly, TCGA operates with a ‘brute force’ approach: roughly, TCGA sequences as many samples as it can, and the ‘raw data’ are then analyzed by computer scientists. Small molecular oncology is more creative; to simplify a bit, you start with a general guess (generated on the basis of existing literature and preliminary data, but also of intuition) about the activities of certain entities, and you design smart experiments to develop that guess until you reach a sufficiently detailed description of the organization of, and interactions between, molecules. A representative of such an approach, as well as a supporter of small science in biology (Weinberg, 2010), is Robert Weinberg, a pioneer of molecular oncology. If you look at any of the articles published by his group, you will be impressed by the number of experiments performed and by the way initial guesses are developed into sophisticated and detailed mechanistic models.

Speaking of mechanistic details, there is also a substantial difference in the results achieved. Small molecular oncology elaborates detailed descriptions of biological mechanisms, i.e. the kind of biological understanding sought by Alberts. Consortia like TCGA aim to establish significant correlations, grounded in Big Data sets, between specific entities (genes, mutations, etc.) and biological phenomena (prostate cancer, breast cancer, etc.), with few connections to mechanistic descriptions.

The biomedical bottomless pit

A recent trend in biomedicine is to use knowledge from molecular biology not just to understand the genesis of certain diseases, but also to look for molecular targets for the development of new drugs. At least, this is how many grant proposals get through the application processes of funding institutions. It is not uncommon for a scientific article to end by saying that the way protein x interacts with protein y could potentially be a landmark discovery that might fuel the development of a drug in the future. So let us contextualize the dichotomy ‘small science versus big science’ within the role that basic research (ordinary molecular oncology versus projects such as TCGA) can play in the development of a new drug.

One problem that drug discovery must deal with is finding reliable molecular targets to prioritize at the start of drug development. This is the phase of drug discovery called ‘target identification’ (Hughes et al., 2011). ‘Molecular target’ here applies to a broad set of biological entities (e.g. proteins, genes, mutations). A good molecular target should, in the first instance, be relevant to the disease we want to cure. ‘Relevance’ can be assessed in several ways. For instance, a good target should either be overexpressed in diseased tissue, or its mutations should be correlated with the disease (Butcher, 2003). Therefore, the role of basic research in this context is to discover molecules that may be of interest for drug discovery. Individual groups may choose the direction in which to pursue target identification. For instance, a leading review (Lindsay, 2003) emphasized how target identification was (at least at the time) geared towards a molecular approach, which emphasizes “an understanding of the cellular mechanisms underlying disease phenotypes of interest” (p. 831). Alberts would probably support such an approach. The important point is that there may be different approaches to identifying promising targets (some emphasizing the understanding of tiny mechanisms, others not), but the claim that the role of basic research in biomedicine is precisely to provide such materials is not controversial, and any review of the early phase of drug discovery would confirm it. Hughes et al. (2011), in a recent but widely cited review of the early phases of drug discovery, make this assumption, first when they associate target identification with basic research, and then when they explicitly state that “[t]he initial research, often occurring in academia, generates data to develop a hypothesis that the inhibition or activation of protein or pathway will result in a therapeutic effect in a disease state. The outcome of this activity is the selection of a target which may require further validation prior to progression into the lead discovery phase in order to justify a drug discovery effort” (p. 1239).

This is the division of labor in the biomedical field. This is not to say that biomedicine is just the quest for molecular targets. Rather, basic research in biomedicine, pursued mainly by the kind of molecular biology endorsed by Alberts, has identifying targets as its main aim. What kind of evidence basic research should provide (whether of the kind sought by Alberts or of other types) to establish that a molecule is a target is a matter of debate. Once a target is identified, basic research has fulfilled its aim, and the target can be validated to ensure that the molecule is really implicated in the disease of interest (see again Hughes et al., 2011).

Now the dichotomy could be rephrased as follows: what is the best approach for identifying molecular targets? Imagine that we are looking for promising molecular targets for prostate cancer. Today, we have at least two options. First, we can run an extensive literature search in PubMed. We will find several interesting studies, such as those of the typical small-science lab that, on the basis of cancer samples from two or three patients, identifies genes of interest and elaborates detailed mechanisms of how these genes do what they do. These results have at least one important flaw: they are usually based on a very limited sample size. This means that, while one could derive from these results important tiny details of how a biological phenomenon is produced, the model itself is corroborated by very few data. The other option is to look at the TCGA database. Here we will find which genes have been found mutated across all the prostate cancer samples sequenced by TCGA, together with the frequency and kinds of mutations, the clinical information, etc. Most importantly, a platform such as TCGA tells you which genes seem to be implicated in prostate cancer at the population level.
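The sample-size worry can be illustrated with a short Python sketch. Assume, hypothetically, that a gene is found mutated in all three patients examined by a small lab, and in 150 of 500 patients in a large consortium data set. A Wilson score interval (a standard confidence interval for a proportion, chosen here by us rather than taken from any of the cited studies) shows how little three patients constrain the frequency of the mutation in the wider population. The cohorts and the function name are illustrative assumptions.

```python
import math


def wilson_interval(mutated: int, samples: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    p_hat = mutated / samples
    z2 = z * z
    denom = 1 + z2 / samples
    center = (p_hat + z2 / (2 * samples)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / samples + z2 / (4 * samples ** 2))
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical cohorts, chosen only for illustration.
cohorts = {
    "small lab: 3 of 3 patients mutated": (3, 3),
    "large consortium: 150 of 500 patients mutated": (150, 500),
}

for label, (mutated, samples) in cohorts.items():
    low, high = wilson_interval(mutated, samples)
    print(f"{label}: frequency between {low:.2f} and {high:.2f} (width {high - low:.2f})")
```

Under these assumptions, the three-patient interval runs from about 0.44 to 1.00, whereas the 500-patient interval is roughly 0.26 to 0.34: only the larger cohort pins down how common the mutation is, which is the population-level information that target prioritization needs.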

In this context of prioritizing molecular targets, is it more important to have detailed mechanistic models based on small samples that can hardly represent the actual population, or less detailed but statistically more robust models? If we define promising targets as those molecules that are more likely to be involved in a certain disease, then the statistical significance of findings becomes crucial, because a target is promising only if it offers the prospect of being crucial in as many cases of the same disease as possible. Robert Weinberg, one of the main supporters of Alberts’ ideas within biomedicine, regularly publishes detailed articles that emphasize the tiny details of interactions between molecules (see for instance Chaffer et al., 2013; Guo et al., 2012), but the results of his group are limited to a few cell lines, a single animal model or a few patients. The fact that his group depicts many molecular details is not a measure of the applicability of these results to a stratified population. On the contrary, as Levins taught us, there is a trade-off between precision and generality (Matthewson and Weisberg, 2008). To regard a target as a promising starting point for developing a drug for a specific population, you need evidence that the target plays some role in as many individuals of that population as possible. To put it simply, when you know that in a specific subpopulation some genes most of the time carry certain types of mutations, and that these mutations are absent from the healthy population (as in genome-wide association studies), then you have good preliminary reasons to investigate such mutations further.
If my argument is right, then in drug discovery Big Data are likely to make the difference in the identification of targets, because they provide exactly the type of data sets amenable to the meta-analyses that can yield insights at the population level. This is what practitioners have started to think. Projects such as TCGA “are providing a growing list of genes that are causally involved in cancer (…) GWASs are also contributing to the generation of lists of candidate genes that have a biologically and pathologically compelling role in cancer” (Patel et al., 2012: 35). Others say that “data mining of available biomedical data has led to a significant increase in target identification” (Hughes et al., 2011: 1240), or that target identification can potentially be fostered because “we are embracing an unprecedented omics era with the explosion of biological data and information” (Yang et al., 2009: 147). It seems that practitioners are turning to projects such as TCGA for target identification.

Given the flow of public money into the biomedical field and that field’s epistemic needs, big science projects such as TCGA could produce interesting results (i.e. identified targets) in a sort of ‘assembly-line’ way, and thus faster than small science labs. In other words, if the role of basic molecular research in biomedicine and drug development is to discover reliable molecular targets (and this is what the leading reviews in drug discovery say), then big science can achieve this aim faster and better than small science. This is not to say that big biology is currently identifying more promising targets. Rather, I mean to suggest that in principle it could be faster and more reliable than small biology in doing so. Moreover, big biology can be, again in principle, less biased than small biology. This is because, without the large amount of data generated by big consortia and the possibility of systematic exploration, biologists tend to set the starting point of the quest for targets with ad hoc criteria (Patel et al., 2012) or, as has been done extensively in the past, “researchers have tended to work on a handful of favored genes, often identified in the literature by academic groups, amenable to low-throughput analysis” (Butcher, 2003: 367–368). This is also why there is an effort to integrate databases that emphasize different aspects of the biomedical constitution of diseases (Garraway and Lander, 2013): the more relevant data about a condition in a population we have, the more systematically we can isolate those aspects that are likely to play an important role.

However, one might argue that to have a reliable molecular target (say, a gene) we also need a grasp of the mechanism that the molecule puts in place. Otherwise, we run the risk of having many spurious molecular targets. If this is the case, then we still need small science. This means that the kind of understanding favored by Alberts, though not the priority, is still required.

I accept this argument, but it changes my conclusion only slightly. In fact, this is what is done in the phase of target validation, and I am not claiming that the kind of small science sought by Alberts has no role here. Indeed, small biology is still required here. However, for the supporters of ‘small biology’ this is not good news: the remark has an interesting consequence for the independence of the kind of biology favored by Alberts (at least in the field of biomedicine).

The phase of validation takes place because we have solid reasons to think that a certain molecule is somehow implicated in the disease of interest. In other words, we are justified in focusing on certain targets. We might say that big consortia such as TCGA provide the justificatory reasons for doing what ‘small labs’ do. That is, the reasons for funding one small project (as a ‘target validation’ project, no longer as a basic research project) instead of another also depend on the solidity of the hypothesis proposed. Now let us ask what would be a more robust reason to investigate a mechanism. Are reasons based on statistically significant associations between genes and diseases, provided by projects such as TCGA, more robust than reasons based on habits, intuitions and PubMed literature searches done without any competence in text mining? Again, in this context big science should be prior to small science, providing molecules to prioritize for small science’s investigations and thereby centralizing research around its statistically significant findings. What I am proposing is that big science should be the main source of the preliminary hypotheses that small science will investigate, and this means that the viability of a project in small biology would depend entirely on the big biology infrastructure and its potential to bring promising targets to light from an immense sea of data.

Here also lies the difference between today’s quarrel over ‘big versus small’ biology in biomedicine and the same debate in the context of HGP. As I noted, HGP did not subjugate small biology, which remained largely independent. By contrast, projects such as TCGA, even without advancing our understanding of biological phenomena in Alberts’ sense, have the potential to establish the research agenda of ordinary biology. In this context, small labs will survive only by focusing on what big biology discovers.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present work has been funded in part by a fellowship of the European School of Molecular Medicine (SEMM) in Milan, and in part by a fellowship of the John Templeton Foundation within the project “Developing Virtues in the Practice of Science” hosted by the University of Notre Dame.

Notes

References

Alberts B (2012) The end of “small science”? Science 337: 1583.

Butcher (2003) Target discovery and validation in the post-genomic era. Neurochemical Research 28(2): 367–371.

Chaffer CL, Marjanovic ND, Lee T, et al. (2013) Poised chromatin at the ZEB1 promoter enables breast cancer cell plasticity and enhances tumorigenicity. Cell 154(1).

Eddy (2013) The ENCODE project: Missteps overshadowing a success. Current Biology 23(7): R259–R261.

Garraway LA and Lander ES (2013) Lessons from the cancer genome. Cell 153(1): 17–37. doi:10.1016/j.cell.2013.03.002.

Golub (2010) Counterpoint: Data first. Nature 464(7289): 679.

Guo, Keckesova, Liu Donaher, et al. (2012) Slug and Sox9 cooperatively determine the mammary stem cell state. Cell 148(5): 1015–1028.

Hilgartner (2013) Constituting large-scale biology: Building a regime of governance in the early years of the Human Genome Project. BioSocieties 8: 397–416.

Hughes, Rees, Kalindjian, et al. (2011) Principles of early drug discovery. British Journal of Pharmacology 162(6): 1239–1249.

Kevles (1997) Big science and big politics in the United States: Reflections on the death of the SSC and the life of the Human Genome Project. Historical Studies in the Physical and Biological Sciences 27(2): 269–297.

Lindsay (2003) Target discovery. Nature Reviews Drug Discovery 2(10): 831–838.

Matthewson and Weisberg (2008) The structure of tradeoffs in model building. Synthese 170(1): 169–190.

Patel, Halling-Brown, Tym, et al. (2012) Objective assessment of cancer genes for drug discovery. Nature Reviews Drug Discovery 12(1): 35–50.

Weinberg (2010) Point: Hypotheses first. Nature 464(7289): 678.

Weinberg (2014) Coming full circle-from endless complexity to simplicity and back again. Cell 157(1): 267–271.

Yang, Adelstein and Kassis (2009) Target discovery from data mining approaches. Drug Discovery Today 14(3–4): 147–154.