Abstract
In biology—as in other scientific fields—there is a lively opposition between big and small science projects. In this commentary, I try to contextualize this opposition in the field of biomedicine, and I argue that, at least in this context, big science projects should come first.
In 2012, Bruce Alberts wrote an editorial on the fate of ‘small science’ in biology, asking whether, in the era of large-scale projects, there is still a place for research carried out by small, independent laboratories, and whether such research can still drive progress in the field.
Such questions are rhetorical if analyzed in light of Alberts’ implicit assumptions. First, big science projects in biology provide catalogues of biological objects. Second, progress in biology is measured by the understanding of the tiny details of biological phenomena and of the causal chains producing them. Since small molecular biology promotes exactly that kind of understanding, and since what remains to be understood in biology consists mostly of such biochemical details, the conclusion seems forced: small science must remain the primary way of doing biology.
In this contribution, I want to put forth my two cents on the grand topic of ‘small science versus big science’ in biology, by claiming that big science projects, at least in the context of biomedicine, should come first.
Progress in biology
Let’s start with Alberts’ claim that we still need small biology because we need to understand the tiny details of biological phenomena. Biologists usually express such understanding in terms of the mechanisms that produce biological phenomena.
Alberts’ initial claim is that this kind of mechanistic understanding is the very aim of biology, and that small science is the way of working best suited to achieve it.
Yet, the aim of biology (with the strategies to achieve it) is not established in a vacuum: it is always set within a particular context of research, with its own practical and epistemic needs.
Once we have identified a context, there are at least two ways of evaluating the claim of ‘superiority’ of big over small science, or vice versa.
Big science and biology
The first step towards a proper contextualization is the clarification of some concepts that are used in the debate as if they were clear and unambiguous.
First, ‘big science’—as the readers of this journal surely know—is not new. In physics, big science has existed for at least 70 years. Nor is big science new in biology: the Human Genome Project (HGP) was the first instance of big science in the field (Hilgartner, 2013).
Another issue to clarify is the ambiguous identification of ‘big science’ with ‘Big Data’. You can have Big Data without big science, as in social media platforms that rely on Big Data analytics while not being big science projects. But you can also have big science without Big Data, as in the Manhattan Project.
Moreover, there is not just one kind of Big Science. Eddy (2013) lists different types of big science and claims that biology, as a big science, is most prone to what he calls the ‘big map’, defined as a “data resource—comprehensive, complete, closed ended—to be used by multiple groups, over a long time, for multiple purposes” (Eddy, 2013: R261). According to him, it is in the nature and the history of biology to be oriented towards maps and big taxonomies rather than towards big experiments like the ones carried out at the Large Hadron Collider in Geneva. Finally, it is part of the narratives of projects such as HGP, ENCODE, HapMap or TCGA that large-scale efforts should have, among other things, voluminous data sets as an outcome. This means that today big science projects in biology are, for the most part, both big maps and producers of Big Data.
The debate in the time of HGP
Another step towards a proper understanding of the problem of ‘small science versus big science’ in biology is to note that a similar debate took place during the early phase of HGP. The challenges posed to ‘small molecular biology’ by HGP were strikingly similar to the ones posed by more recent big biological projects such as TCGA or ENCODE. Sometimes even the protagonists are the same (e.g. Robert Weinberg).
The reader may have a look at Hilgartner’s (2013) work on the matter. He documents in detail many of the fears that ‘small biologists’ had in facing HGP, especially concerning new forms of authorship, funding, and research strategies.
But despite these (and other) issues, it soon became clear that HGP neither reshaped the priorities of traditional molecular biology nor put small-lab biology in a position of epistemic inferiority; rather, it has helped (and still helps) ‘ordinary biology’ (Hilgartner’s own words) by providing multiple resources.
No ‘ordinary biologist’ would deny how useful HGP is. But the dichotomy between small and big science somehow dissolved during the development of HGP, once it was realized that HGP itself was a sort of resource-providing enterprise at the service of ordinary biology rather than a competitor to it.
Setting the controversy within a context
If big science projects in biology are Big Data and they are maps, then Alberts is right in saying that, by themselves, they do not deliver the fine-grained mechanistic understanding he values. But whether this settles the priority of small over big science depends on the context in which biological research is carried out and on the aims it is supposed to serve.
An important strand of research in molecular biology over the last 30 years has been motivated within the general background of Nixon’s ‘War on Cancer’, which triggered the flow of public money to molecular studies in biology. Even today, much of basic research in molecular biology is motivated by long-term future applications in the biomedical field. The small science that Alberts supports has brought substantial payoffs to the biomedical agenda, especially in the field of cancer studies (see for instance Weinberg, 2014).
Given the importance of biomedicine today, let’s try to contextualize the controversy in the biomedical field, especially in molecular oncology. Unsurprisingly, in cancer studies the dichotomy ‘small versus big science’ has provoked heated disagreements (e.g. Golub, 2010; Weinberg, 2010) similar to the ones provoked by HGP. The question, then, is: which approach, small science or big science, better serves the aims of biomedicine (especially cancer studies)?
An example of a big science project in biomedicine
To explain my position on the opposition between small and big science in the context of biomedicine, I focus on the big science project of The Cancer Genome Atlas (TCGA).
TCGA is a big science project in biomedicine organized as a consortium of several universities and hospitals. It was launched in 2005 by the National Cancer Institute and the National Human Genome Research Institute, both part of the National Institutes of Health, as a pilot for a large-scale effort to map and characterize the molecular basis of several tumor types. Like HGP, it required a kind of ‘regime’ rather different from that of a small molecular biology laboratory. The consortium is organized around numerous centers located throughout the USA, and cooperation among these units is essential to what TCGA does. In a nutshell, TCGA’s scientists sequence the genomes of thousands of cancer samples and organize the data into a big map of somatic mutations and structural variations. The rationale for doing this is genuinely statistical and rooted in an evolutionary framework. To put it very simply, since mutations influencing the development of cancer confer a growth advantage to cancer cells, they should be positively selected; and if they are positively selected, they should be detected more often than passenger mutations. Therefore, the bigger the sample size, the more likely it is to detect mutations that are significant for the development of cancer (though a bigger sample also increases the chance of detecting false positives).
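To make this statistical rationale concrete, the following minimal sketch (in Python) flags genes whose recurrence across tumor samples exceeds what a uniform background mutation rate would predict. Everything in it is illustrative: the gene names, counts, background rate and threshold are invented, and real TCGA-associated analyses rely on far more sophisticated background models.

```python
# Illustrative sketch (not TCGA's actual pipeline): flag genes whose mutation
# recurrence across tumor samples exceeds what a uniform background mutation
# rate would predict. Gene names, counts and the background rate are invented.
from scipy.stats import binom

BACKGROUND_RATE = 0.01   # assumed chance that a gene is mutated by accident in one sample
N_SAMPLES = 500          # number of sequenced tumor samples in this toy cohort

# hypothetical recurrence counts: samples in which each gene carries a somatic mutation
observed = {"GENE_A": 60, "GENE_B": 9, "GENE_C": 4}

for gene, k in observed.items():
    # P(X >= k) under the background model: very small values suggest positive selection
    p_value = binom.sf(k - 1, N_SAMPLES, BACKGROUND_RATE)
    label = "candidate driver" if p_value < 1e-3 else "likely passenger"
    print(f"{gene}: mutated in {k}/{N_SAMPLES} samples, p = {p_value:.2e} -> {label}")
```

Even in this toy setting the logic of the argument is visible: with a handful of samples the counts for a genuine driver would be indistinguishable from background noise, whereas across hundreds of samples the signal of positive selection emerges, at the price, as noted above, of having to control for false positives across thousands of genes.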
The work of TCGA so far has corroborated most of the discoveries (i.e. cancer genes) made by molecular oncology over the last 30 years (a field that has typically been small science, as Weinberg (2014) suggests). However, it has also led to the discovery of new genes and mutations and, thanks to the statistical power of its studies, it has suggested that processes never considered by small-scale molecular oncology, such as cell metabolism, are actually involved in cancer development.
The difference between TCGA and the small science of molecular oncology is first of all a difference of scale: where a small laboratory characterizes a few samples in mechanistic depth, TCGA characterizes thousands of samples at the level of mutation frequencies and structural variations.
Speaking of mechanistic details, there is also the question of how much mechanistic detail is actually required for the purposes that biomedical research is meant to serve; I turn to this question in the next section.
The biomedical bottomless pit
A recent trend in biomedicine is to use knowledge of molecular biology not just to understand the genesis of certain diseases, but also to look for molecular targets for the development of new drugs. At least, this is how many grant proposals successfully pass application processes at funding institutions. It is not uncommon to end the story told in a scientific article by saying that the protein under study may turn out to be a promising target for therapeutic intervention.
One problem that drug discovery has to deal with is finding reliable molecular targets to prioritize before starting drug development. This is the phase of drug discovery called ‘target identification’ (Hughes et al., 2011). ‘Molecular target’ here applies to a broad set of biological entities (e.g. proteins, genes, mutations). A good molecular target should be, in the first instance, relevant to the disease we want to cure. ‘Relevance’ can be assessed in several ways: for instance, a good target should be either overexpressed in disease tissues, or its mutations should be correlated with the disease (Butcher, 2003). The role of basic research within this context is therefore to discover molecules that can be of interest for drug discovery. Individual groups may choose in which direction to develop the phase of ‘target identification’. For instance, a leading review (Lindsay, 2003) emphasized how target identification was (at least at that time) geared towards a molecular approach, which emphasizes “an understanding of the cellular mechanisms underlying disease phenotypes of interest” (p. 831). Alberts would probably support such an approach. The important point is that promising targets may be identified through different approaches (emphasizing the understanding of tiny mechanisms as well as other views), but it is not controversial that the role of basic research in biomedicine is exactly to provide such materials, and any review of the early phase of drug discovery would confirm it. Hughes et al. (2011), in a recent but widely cited review of the early phases of drug discovery, make this assumption first when they associate target identification with basic research, and then when they explicitly state that “[t]he initial research, often occurring in academia, generates data to develop a hypothesis that the inhibition or activation of protein or pathway will result in a therapeutic effect in a disease state. The outcome of this activity is the selection of a target which may require further validation prior to progression into the lead discovery phase in order to justify a drug discovery effort” (p. 1239). This is the division of labor in the biomedical field. This is not to say that biomedicine is reducible to drug discovery; the point is that, within drug discovery, basic research is expected to supply candidate targets.
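As a purely illustrative sketch of the two ‘relevance’ criteria just mentioned (overexpression in disease tissue and mutation-disease correlation), one could imagine a shortlisting step along the following lines; the gene names, numbers and thresholds are hypothetical and do not come from any of the cited reviews.

```python
# Minimal illustration of the two 'relevance' criteria discussed above
# (overexpression in disease tissue, recurrent mutation in patients).
# All gene names, numbers and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    fold_change: float      # expression in disease tissue / expression in healthy tissue
    mutation_freq: float    # fraction of patients carrying mutations in this gene

def is_relevant(c: Candidate,
                min_fold_change: float = 2.0,
                min_mutation_freq: float = 0.05) -> bool:
    """A candidate passes if it is overexpressed OR recurrently mutated."""
    return c.fold_change >= min_fold_change or c.mutation_freq >= min_mutation_freq

candidates = [
    Candidate("GENE_A", fold_change=3.5, mutation_freq=0.12),
    Candidate("GENE_B", fold_change=1.1, mutation_freq=0.01),
]

shortlist = [c.gene for c in candidates if is_relevant(c)]
print(shortlist)   # ['GENE_A']
```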
The dichotomy could now be rephrased as follows: what is the best approach for identifying molecular targets? Imagine that we look for promising molecular targets for prostate cancer. Today, we have at least two options. First, we can make an extensive literature search in PubMed. We will find several interesting studies of the typical small-science kind: on the basis of cancer samples coming from two or three patients, a laboratory identifies genes of interest and elaborates detailed mechanisms of how these genes do what they do. These results have at least one important flaw, namely that they are usually associated with a very limited sample size. This means that, while such results can depict important tiny details of how a biological phenomenon is produced, the model itself is corroborated by very few data. The other option is to look at the TCGA database. Here we will find which genes have been found mutated across all the prostate cancer samples sequenced by TCGA, the frequency and kinds of mutations, the clinical information, and so on. Most important, a platform such as TCGA will tell you how often a given gene is altered across hundreds of patients, and hence how likely it is to matter at the level of the patient population rather than in a handful of cases.
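The kind of summary one pulls from such a resource can be pictured with a toy, MAF-like table of somatic mutations; the samples, genes and cohort size below are invented and stand in for the thousands of prostate cancer cases actually available through TCGA.

```python
# Hypothetical mini-extract shaped like a TCGA mutation table; the real portal
# covers thousands of samples, but the per-gene summary is of this kind:
# how many samples carry mutations in a gene, of which types, and how often.
import pandas as pd

mutations = pd.DataFrame([
    {"sample": "PC-01", "gene": "GENE_A", "type": "missense"},
    {"sample": "PC-02", "gene": "GENE_A", "type": "nonsense"},
    {"sample": "PC-02", "gene": "GENE_B", "type": "missense"},
    {"sample": "PC-03", "gene": "GENE_A", "type": "missense"},
])
cohort_size = 3  # prostate cancer samples in this toy cohort

summary = (mutations.groupby("gene")
           .agg(mutated_samples=("sample", "nunique"),
                mutation_types=("type", lambda t: ", ".join(sorted(set(t)))))
           .assign(frequency=lambda df: df.mutated_samples / cohort_size))
print(summary)
```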
Is it more important, in this context of prioritization of molecular targets, to have detailed mechanistic models based on small samples that can hardly represent the actual population, or less detailed models that are statistically more robust? If we define promising targets as those molecules that are more likely to be involved in a certain disease, then the statistical significance of findings becomes crucial, because a target is promising only if it offers the prospect of being relevant in as many similar cases of the same disease as possible. Robert Weinberg, one of the main supporters of Alberts’ ideas within biomedicine, regularly publishes detailed articles that emphasize the tiny micro-details of interactions between molecules (see for instance Chaffer et al., 2013; Guo et al., 2012), but the results of his group are limited to a few cell lines, a single animal model or a few patients. The fact that his group depicts many molecular details is not a measure of the applicability of these results to a stratified population. On the contrary, as Levins taught us, there is a trade-off between precision and generality (Matthewson and Weisberg, 2008). In order to think that a target could serve as the starting point for the development of a drug for a specific population, you need evidence that such a target plays some role in as many individuals of that population as possible. To put it simply, when you know that in a specific subpopulation some genes carry certain types of mutations most of the time, and that those mutations are not present in a healthy population (as in genome-wide association studies), then you have good preliminary reasons to investigate such mutations further. If my argument is right, then in drug discovery Big Data are likely to make the difference for the identification of targets, because they provide exactly the type of data sets amenable to the meta-analyses that can deliver such population-level insights. This is exactly what practitioners have started to think. Projects such as TCGA “are providing a growing list of genes that are causally involved in cancer (…) GWASs are also contributing to the generation of lists of candidate genes that have a biologically and pathologically compelling role in cancer” (Patel et al., 2012: 35). Others say that “data mining of available biomedical data has led to a significant increase in target identification” (Hughes et al., 2011: 1240), or that target identification can potentially be fostered because “we are embracing an unprecedented omics era with the explosion of biological data and information” (Yang et al., 2009: 147). It seems that practitioners are moving towards projects such as TCGA for target identification.
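A toy calculation can illustrate why sample size, rather than mechanistic detail, carries the weight in this kind of prioritization. In the sketch below, with invented counts, a mutation is enriched among cases to the same degree in a three-patient study and in a three-hundred-patient study, yet only the latter provides population-level statistical evidence of association.

```python
# Invented counts contrasting a small-lab study with a TCGA/GWAS-scale study
# in which the mutation is enriched among cases to the same degree: only the
# large sample yields the population-level evidence discussed above.
from scipy.stats import fisher_exact

def association_p(mut_cases, cases, mut_controls, controls):
    """One-sided Fisher's exact test: is the mutation enriched among cases?"""
    table = [[mut_cases, cases - mut_cases],
             [mut_controls, controls - mut_controls]]
    _, p = fisher_exact(table, alternative="greater")
    return p

# ~40% of cases vs ~5% of controls carry the mutation in both scenarios
print(association_p(1, 3, 0, 3))        # tiny study: p = 0.5, no evidence
print(association_p(120, 300, 15, 300)) # large study: p << 0.001, strong evidence
```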
In the context of the flow of public money within the biomedical field and its epistemic needs, big science projects such as TCGA can produce interesting results (i.e. identify targets) in a sort of ‘assembly-line’ way, and thereby faster than small science labs. In other words, if the role of basic molecular research in biomedicine and drug development is to discover reliable molecular targets (and this is what the leading reviews in drug discovery say), then big science can achieve this aim faster and better than small science. This is not to say that big biology is superior in every context; the claim is restricted to the epistemic needs of target identification in biomedicine.
However, one might argue that to have a reliable molecular target (say, a gene) we also need a grasp of the mechanism in which the molecule is involved; otherwise, we run the risk of accumulating many spurious molecular targets. If this is the case, then we still need small science. This means that the kind of understanding favored by Alberts, though not the priority, is still required.
I buy this argument, but it changes my conclusion only slightly. In fact, this is what is done in the phase of ‘target validation’, which follows target identification in the drug discovery pipeline (Hughes et al., 2011).
The phase of validation takes place because we have solid reasons to think that a certain molecule is somehow implicated in the disease of interest; in other words, we are justified in focusing on certain targets. We might say that big consortia such as TCGA provide exactly these reasons: they supply the candidates whose mechanisms small laboratories are then justified in scrutinizing in detail. Big science identifies; small science validates.
Here also lies the difference between the ‘big versus small’ quarrel as it plays out in biomedicine and the earlier quarrel surrounding HGP: in the case of HGP the big project ended up serving as a resource for ordinary biology, whereas in biomedicine big science sets the order of work, coming first and handing over its candidates to small science for validation.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present work has been funded in part by a fellowship of the European School of Molecular Medicine (SEMM) in Milan, and in part by a fellowship of the John Templeton Foundation within the project “Developing Virtues in the Practice of Science” hosted by the University of Notre Dame.
