Abstract
This article provides an ethnographic account of how Big Data biology is produced, interpreted, debated, and translated in a Big Data-driven cancer clinical trial, entitled “Personalized OncoGenomics,” in Vancouver, Canada. We delve into epistemological differences between clinical judgment, pathological assessment, and bioinformatic analysis of cancer. To unpack these epistemological differences, we analyze a set of gazes required to produce Big Data biology in cancer care: clinical gaze, molecular gaze, and informational gaze. We are concerned with the interactions of these bodily gazes and their interdependence on each other to produce Big Data biology and translate it into clinical knowledge. To that end, our central research questions ask: How do medical practitioners and data scientists interact, contest, and collaborate to produce and translate Big Data into clinical knowledge? What counts as actionable and reliable data in cancer decision-making? How does the explicability or translatability of genomic Big Data come to redefine or contradict medical practice? The article contributes to current debates on whether Big Data engenders new questions and approaches to biology, or Big Data biology is merely an extension of early modern natural history and biology. This ethnographic account will highlight how genomic Big Data, which underpins the mechanism of personalized medicine, allows oncologists to understand and diagnose cancer in a different light, but it does not revolutionize or disrupt medical oncology on an institutional level. Rather, personalized medicine is interdependent on different styles of (medical) thought, gaze, and practice to be produced and made intelligible.
Introduction
Trish Keating is the “poster child” for the Personalized OncoGenomics (POG) experimental clinical genomics cancer trial, located in Vancouver, British Columbia. A retired film costume designer, she was diagnosed with stage-four colorectal cancer in 2010 which had metastasized to other organs in her body. Traditional cancer treatments including chemotherapy, radiation, and surgeries failed to treat her cancer. As her cancer grew more aggressive, Trish became palliative and prepared for her last few months on earth. Trish enrolled in POG as a last resort. POG used whole genome sequencing, machine learning, and other computational techniques to process and analyze biomedical Big Data to identify patterns and abnormalities in the tumor genome. Trish’s DNA and RNA sequencing data showed a curious biomarker, which is a protein responsible for the growth of her tumors. This specific protein can be easily inhibited by a high blood pressure medication. Within five weeks of starting on a low-cost high blood pressure drug her tumors went into remission. News articles and documentaries called Trish’s case both a miracle and a scientific breakthrough of genomic sciences in oncology care.
Doctors agreed with the media. They called it miraculous that a common blood-pressure drug can inhibit the growth of aggressive cancer tumors where standardized cancer treatments, including radiation and chemotherapy, failed (Fayerman, 2015). What does this “miracle” signify? Trish Keating’s case shows the rise of a computational approach to cancer, which uses computational analysis to understand cancer in terms of numbers, measurement levels, and statistical correlations. Her case exemplifies the ways in which Big Data can influence clinical decision-making. However, for Big Data to be applied to clinical decisions, it is helpful to explore how Big Data is collected, processed, and transformed from human flesh and translated into clinical knowledge.
Our research contributes to critical-interpretive perspectives of the use of Big Data in the healthcare domain (Stevens et al., 2018). It also contributes to current debates on whether Big Data engenders new questions and approaches to biology (Ratti, 2016; Stevens, 2013, 2016), or Big Data biology is merely an extension of early modern natural history and biology (Leonelli, 2014; Strasser, 2012; Strasser and Edwards, 2017). While these discussions center around the nature and value of Big Data biology and the production of Big Data in genomic laboratories, little has been written about how Big Data is produced and translated into clinical decision-making.
In this article, we examine the production and translation of Big Data into clinical genomics, which is a complex biomedical process that comprises two or more epistemological systems of knowledge about cancer. The article explores epistemological differences between bioinformatic analysis and pathology assessment of cancer. The former focuses on large-scale data points of gene expression produced from statistical analyses, and the latter is concerned with protein expression of tumors obtained from microscopic studies. We conducted an ethnography with a Big Data-driven cancer clinical trial, entitled “Personalized OncoGenomics”, located in Vancouver, Canada. POG is a cutting-edge collaborative project between radiology, pathology, molecular biology, bioinformatics, and oncology. It is an ideal site to observe and understand how genomic Big Data is produced, interpreted, contested, and actioned in a clinical setting. These questions drive our research: How do medical practitioners and data scientists interact, contest, and collaborate to produce and translate Big Data into clinical knowledge? What counts as actionable and reliable data in cancer decision-making? And how does the explicability or translatability of genomic Big Data come to redefine or contradict medical practice?
POG helps us understand how cancer and the body become multiple (Mol, 2002) through the different gazes and how these different practices come together to render meaningful clinical knowledge of life itself. We study radiologists, pathologists, bioinformaticians, and oncologists at POG to observe and understand how each makes sense of and enacts knowledge about cancer and the body. They each operate from their own perspectives and gazes: a clinical gaze, a molecular gaze, and an informational gaze (Foucault, 1963; Kay, 2000; Rose, 2007). We examine their individual characteristics, interactions, and interdependences as they produce Big Data biology and translate it into clinical knowledge. We argue that Big Data biology and clinical work depend on one another in the realm of oncogenomics and that the arrival of genomic Big Data does not automatically translate into clinically useful knowledge. This ethnographic study highlights how genomic Big Data allows oncologists to understand and diagnose cancer using data-driven methods. The genomic data is new, but it does not necessarily revolutionize or disrupt medical oncology on an institutional level. We observe a highly collegial and interactive collaborative network of professionals in which clinicians still act as the gatekeepers to medical action. Their medical judgment, expertise, and ethics hold a higher authority than data-driven approaches in clinical decision-making. Most importantly, our ethnographic data sheds light on what stakeholders felt was really at stake in the adoption of Big Data in the oncology setting: the life of the patient.
The “non-revolutionary” convergence of Big Data and biology in cancer genomics trials
Scholars in social studies of biomedical innovation have extensively investigated “bioclinical collectives” of multidisciplinary experts and disciplines that make up new ways of practicing medical care (Bourret, 2005; Cambrosio et al., 2014; Rabeharisoa and Bourret, 2009; Timmermans et al., 2017). “Bioclinical collectives” of cancer clinical trials attend to the ways in which different epistemologies collaborate, contest, negotiate, and co-construct knowledge of cancer. With the proliferation of molecular biology and high-throughput genomic technologies, oncology care has embraced a new form of “experimental care” in which the boundaries between research and care as well as biology and computing sciences have been blurred (Cambrosio et al., 2018; Keating and Cambrosio, 2011; Nelson et al., 2013). Such epistemological blurring has led to opportunities in understanding cancer in a new light. Genomic medicine gives rise to a new “sociotechnical regime” in oncology (Nelson et al., 2013: 407). While “actionable” genomic data influences clinical decisions, it also shifts “the types of evidence on which clinicians can act” (Nelson et al., 2013: 425). Big Data biology also engenders challenges in translating computational data into clinically actionable knowledge. The translatability of Big Data into clinical settings is an important topic, which is currently understudied in critical data studies. Many studies have focused on how bioinformaticians analyze and produce biological knowledge from Big Data (Stevens, 2011, 2013, 2016). Less attention has been paid to other medical practitioners, such as radiologists or pathologists who also play vital roles in the production of biological samples that are later sequenced into Big Data. This ethnography attends to a set of medical practices employed by these different cancer specialists to extract, process, and assess cancer tumor samples that are transformed into Big Data and then translated into clinical knowledge.
The ethnography sheds light on the multidirectional and multi-modal flows of heterogeneous materials, information, and knowledge (Crabu, 2016) in the re-inscription of human tumor flesh into genomic Big Data. It also contributes to the current debates on the impacts of Big Data in clinical practices.
Data-driven or data-centric approaches to medicine have been a prolific topic of discussion for many scholars in the history of science, communication, and philosophy of (big) data. Most of the literature about Big Data in health tends to focus on the ethics, ownership, and governance of data and databases (Ostherr et al., 2017; Prainsack, 2019; Sharon and Lucivero, 2019). Others focus on the economic impacts of Big Data and information in healthcare (Stevens, 2011; Vezyridis and Timmons, 2017). More importantly, an ongoing debate on Big Data biology centers around the differences between Big Data biology and traditional biological practices. Historian of science Bruno Strasser (2012) posits that Big Data science is deeply rooted in early modern natural history, and hence, data-driven science is not a novel invention. According to Strasser (2012), a fundamental difference between contemporary data-driven science and early modern natural history concerns the nature of the data. He argues that while the former relies on petabytes of digital data stored in computerized databases, the latter mainly comprises “analog images” and “scientific objects” stored in “closed boxes” and library archives (p. 86). A second difference, Strasser continues, stems from the proliferation of data analysts in contemporary data sciences, who analyze data sent to them by other stakeholders and do not necessarily collect or produce any data themselves, whereas in the early modern period naturalists such as Darwin or Cuvier tended to analyze the data that they themselves collected or produced. Therefore, Strasser concludes that the characteristics of contemporary data-driven science are not different from hypothesis-driven methods of early modern natural history. Historian and philosopher of science Sabina Leonelli (2014) also challenges the “revolutionary” discourse of Big Data.
She argues the mechanisms of Big Data are well-aligned with the core epistemologies of early modern biology in conducting “explanatory experimentation, sampling and the search for causal mechanisms” (p. 9). As such, Big Data does “make a difference” in biology, just in “non-revolutionary” ways.
Other scholars, however, argue Big Data can transform biology by engendering new questions and approaches (Chow-White and García-Sancho, 2012; Stevens, 2013, 2015, 2016). Historian of science Hallam Stevens (2016) examines the implications of text-search algorithms, a text-mining tool in Big Data research, in reconceptualizing genomics not as biological properties but as text-based information. To Stevens, biostatisticians or bioinformaticians are less concerned about the biological meanings of DNA than with finding “commonly occurring or overrepresented patterns” (p. 364) in a large dataset. Therefore, Big Data can enable new approaches to understanding human biology as large-scale text-based entities and change the ways in which the human genome is ontologized. However, conceptualizing DNA as text-based entities limits “the kinds of questions and answers that genome biologists pose and attempt to answer” (p. 353).
While much research has focused on the production, sharing and control of Big Data, or “data journeys” in laboratories and research institutions (Leonelli, 2016; Stevens, 2011), relatively little research has examined the ways in which Big Data is produced, negotiated, and translated into clinical knowledge. This article sets out to fill that gap by documenting the production, or re-inscription, of cancer tumors into Big Data, followed by three vignettes that depict the translatability, or the lack thereof, of Big Data into actionable clinical knowledge. We examine various medical practices required to produce and analyze Big Data within cancer research by looking at how human flesh and tumors become data and how data analysis is negotiated and translated into clinical care. This critical examination shows how actors produce cancer Big Data from traditional biological practices and work it into medical diagnosis, rather than transforming medical practices altogether. The ethnography in this article highlights a complex amalgam of clinical, molecular, and bioinformatic gazes of seeing, feeling, and understanding the cancer in terms of fleshy forms, molecular cells, and an information system of bits and bytes (Kay, 2000). Our analysis seeks to further advance the discussion on life as information by examining how the bioinformatician’s way of seeing the cancer as an information system can complement or contradict the clinician’s way of seeing and gazing at the cancer through medical intuition and ethics.
The bodily gazes
Scholars argue bodies have always been subjects under one or more forms of gaze, whether the clinical gaze, the male gaze, or the white gaze. The “clinical gaze” helps us understand the ways in which traditional diagnosis is established on the basis of anatomical pathology, which is an observable association between pathological structures and bodily syndromes (Foucault, 1973). As we enter a post-genomics era, individuals are increasingly subject to a gaze that renders bodies and their biology as numerical, measurable, and informational. We call this an “informational gaze”. The informational gaze is an extension of the “molecular gaze,” a concept developed by sociologist Nikolas Rose (2007). Rose conceptualizes the molecular gaze as a molecular style of thought about life itself in terms of variations in nucleotide sequences and molecular mechanisms of gene and protein expressions, which renders life molecular (Myers, 2015). When the ontology of life takes a molecular turn, biological components of the body including cells, tissues, DNA, RNA, or proteins can be fragmented, decomposed, stabilized, frozen, extracted, manipulated, commodified, and capitalized. Genomic technologies are embedded with these molecular practices of extracting, constructing, and sequencing DNA and RNA for comprehensive transcriptome and gene expression analysis. The results of these comprehensive analyses of our molecules are understood through large-scale informational data, computational codes, and probabilistic values. In other words, the molecular forms of life are made intelligible by processes of calculation, quantification, measurement, and informatization. Thus, in the postgenomic age, the body may still be understood in terms of its molecules, but these molecules have taken on the form of information, data, and numerical values. To this end, the body, along with the normal and the pathological, is seen as an informational system.
This is the core of the informational gaze. The informational gaze reflects a professional vision (Goodwin, 1994) of doctors and scientists who use Big Data analyses and statistical visualization to draw pathways and causation between tumors, mutations, and markers. Traditional biology views the body in terms of cells and molecules. Under the regime of Big Data biology, the body is understood in forms of codes, numbers, and information. Whole genome and transcriptome sequencing are common methods for producing this information. In an exchange with us, a leading genome scientist in Vancouver, Canada, shared the reason POG chose to do whole genome and transcriptome sequencing rather than just whole exome sequencing, which is cheaper and produces less data to analyze. The more data captured, the higher the chance of producing statistically significant probability values and minimizing margins of error:
I want the whole thing. Until you measure the genome, you don't know what you have missed and so I don't want to miss the answer and I don't think we need to miss the answer. We deal with complex information systems. That’s what the genome center does. We are experts in whole genome and transcriptomes. We can do transcriptomes. That's not the point. The more important point is, when you look at the observations we make in POG that are most likely to impact clinical course of action, the transcriptome is the dominant feature. Then aspects of the genome that support the transcription become very important. We need it all. We can't be fooling around with one percent of the genome. (Genome Scientist)
“Personalized Oncogenomics” cancer clinical trial
POG is a flagship research program at the Genome Science Center (GSC) in collaboration with British Columbia Cancer Agency (BCCA) and Vancouver Cancer Center (VCC). POG’s weekly board meeting takes place at the BCCA’s BC Cancer Research Centre (BCCRC) building, while the sequencing work happens at GSC. The GSC and BCCRC are two separate buildings located a few blocks away from each other. BCCRC is located across the street from VCC, where most cancer patients are treated and cancer specialists are located. The GSC is located in a typical eight-storey office building, framed with red brick and large glass windows, between Cambie and West 7th Streets in a non-descript block of low-rise office buildings mixed with residential services and businesses. It is an area of Vancouver that is home to Vancouver General Hospital and one of the largest clusters of medical services in North America, including the University of British Columbia Medical School’s teaching campus. GSC was founded in 1999 by Michael Smith, a Canadian Nobel Laureate in Chemistry. The GSC plays a twofold role: it is both a research department and a technology platform for sequencing and bioinformatics. The GSC pioneered next-generation sequencing technology in support of life sciences research of both cancer and infectious diseases. Today, the GSC is one of the most advanced genome research centers in Canada and a world-class facility for publications and high-throughput DNA sequencing, which has produced “more than 2.2 peta-bases (2.2 × 10^15) of DNA sequence, equivalent to the number of base-pairs in 660,000 human genomes” (BCGSC, 2019). In other words, GSC has produced a massive amount of human genome data.
Each patient at POG undergoes a tumor biopsy and has comprehensive DNA and RNA sequencing. Tumor sequencing can produce about 600 GB of raw data, which requires substantial computational power to analyze. That job goes to bioinformaticians, who perform analyses using computer algorithms, including machine learning and statistical analyses, to identify variants that may be cancer “drivers”. Then, oncologists use the results to identify any known drugs that can target the cancer drivers. Before the oncologist applies these data in their clinical decision, the case is discussed at POG’s weekly tumor board meeting, held every Thursday. The meeting brings together a group of multidisciplinary scientific and medical experts to discuss different patient cases.
We summarize the POG pipeline in Flowchart 1 (Figure 1) to provide an overview of the process of transforming a cancer tumor in its fleshy form into computational Big Data. The flowchart visualizes a production pipeline that starts with a biopsy done by an interventional radiologist at the VCC. POG draws blood samples from the patient to use as the germline genome to compare and contrast with the patient’s tumor samples, the somatic genome. The tumor samples are sent to a pathology department, also at the VCC, for quality assessment. Both the blood and the tumor samples are then transferred to the 5th floor of the GSC for DNA and RNA Library Core Construction. Next, the DNA and RNA libraries are sequenced in a HiSeq machine produced by Illumina on the 6th floor of the GSC building. The whole genome sequencing takes about three to four days to produce the raw genomic data. This dataset is then sent to a bioinformatics team for downstream quality control and analysis. The genomic analysis focuses on causal relationships between the genes and DNA and the causes of pathogenic variants that drive the growth of cancer cells. The end result is a genomic pathway that resembles a datafied map of what drives the cancer tumor. This datafied map illustrates an algorithmic pathway of cancer, encompassing expression levels and other numerical measurements, biomarkers, mutations, alleles, and proteins. In other words, we have reached a complete informational inscription of our DNA and RNA. We delve into the different practices, visions, and tensions among this multidisciplinary group of medical experts at POG precisely to highlight the interactions between the clinical gaze, the molecular gaze, and the informational gaze in the production and translation of genomic Big Data. Through these multiple gazes, we can also see multiple versions of cancer in the forms of flesh, of cells and molecules, and of Big Data.
In what follows, we describe the biomedical pipeline of POG that inscribes bodily tissues into Big Data. We will then discuss three vignettes highlighting the interactions between various practices in their efforts to translate Big Data into actionable and meaningful knowledge.

Figure 1. POG flowchart: Multiple gazes of the cancer.
The cancer multiple
Radiology (clinical gaze): Rawness of the flesh
In summer 2017, we started shadowing scientists at POG in their laboratory to observe the scientific processes of extracting DNA from human tumor and normal samples and constructing DNA libraries from these samples. Along with these observations, we also conducted 31 interviews with different medical actors involved with the POG pipeline of extracting, assessing, sequencing, and analyzing both normal and pathological samples of the patient. The first step of the pipeline is a tumor biopsy, a physical extraction of a tumor sample from the cancer located inside the body. The interventional radiology department, specialized in tumor biopsy, is located on the 4th floor of VCC. We had an interview appointment with radiologist Doctor Farred.
When we walked into Dr Farred’s office, he was diligently going through ultrasound and PET scan images of different cancer patients. He seemed highly skilled and disciplined. His eyes stayed fixed on the screen while he greeted one of the researchers on our team. He was comparing one image to another, identifying tumor locations, and making some kind of tacit decisions in his head regarding the quality of the tumors and how to extract them. At this moment, Dr Farred exerted a clinical gaze on the cancer through the image-guided technologies of ultrasound, PET scans, and MRI. Dr Farred obtained this clinical gaze through years of specialized training in studying medical images to diagnose cancer. To an interventional radiologist like Dr Farred, cancer remains a physical property projected through images.
Dr Farred also performed biopsies on patients, that is, extractions of cancer tumor samples. Tumor biopsy is a standard procedure in oncology. However, tumor biopsy for genomic sequencing requires higher levels of cellularity than standard biopsies in order for the sample to yield a higher content of DNA and RNA. As such, interventional radiologists, who are in charge of the biopsy process, typically need to use ultrasound to identify the most appropriate site of the lesion to extract. The POG biopsy also requires cells that are still active, and not necrotic, because these cells are later broken down and blended to release various enzymes. As such, a sample with a lot of necrotic tissue can damage the cellularity of the sample and fail to provide sufficient data for downstream analysis. POG biopsy samples strictly require cancer cells, and not normal cells, for sequencing and analysis. In the bodily milieu, normal and pathological cells are intricately layered with one another. To ensure the tumor samples met the standards for sequencing, a technician from POG who specialized in assessing and processing tumor tissues was present in the biopsy room. The technician provided feedback to Dr Farred on the quality of the samples based on their color and consistency level. For example, a good liver tumor sample is typically solid with a creamy consistency. Dr Farred explained how you could touch the sample with a gloved hand to feel whether it is harder, firmer, or less elastic than the surrounding normal tissue. As such, it was a collaborative feedback system between Dr Farred and the technician to assess the quality of the tumor sample using their bodily senses of sight and touch.
Cancer, in a raw medical sense, refers to a pathological state of the cell and the flesh. At the biopsy stage, cancer exhibits a physical state of color and consistency that can be seen and touched. However, actors assess, define, and negotiate the quality of the tumor based on a feedback loop comprising two professional visions (Goodwin, 1994) of both the radiologist and the technician. By using their touch and vision to assess the quality of the tumor, both the radiologist and the technician “develop a feeling for” the tumor. This “feeling for the tumor” requires a tacit form of knowledge to measure what counts as creamy and firm tissues, and what counts as scanty and greasy tissues. Developing a feeling for the tumor echoes “a feeling for error” in which scientists embody and materialize error to produce “stable and accurate” data (Garnett, 2016), or in this case, creamy and solid tissues. Hence, this visual feedback loop between the radiologist and the technician puts the tumor tissues, the object or the materiality of the disease, on display or on stage, open for multiple interpretations and discussion. This feedback communication between the radiologist and the technician illustrates that the clinical gaze of assessing tumors is not pure but mediated by different approaches to ensure the tumor samples can yield sufficient data for sequencing and Big Data analysis in the later steps of the POG pipeline. At this biopsy stage, cancer exhibits its pathological bodily properties that are socially defined, negotiated and performed or enacted through an amalgamation of different gazes between two medical actors.
Pathology (molecular gaze): Spatial context of cells and molecules under the microscope
The next step of the POG pipeline presents a point of entry into the molecular gaze in which cancer is decomposed and understood in terms of protein expression and translocation. After a roll of tissue was extracted from the patient’s body, Dr Farred gave that roll of sample to the technician to process. The technician then applied a layer of optimal cutting temperature (OCT) compound, a frozen chemical, in order to freeze the tissue sample and preserve its cellularity. Then, the OCT-tissue sample was transferred to the histopathology department at Vancouver Cancer Centre for sectioning and quality control. Histopathology is a medical specialty trained in studying diseases using microscopic examination of cells and tissues. The pathologist assesses the tumor content and the tumor cellularity by looking under a microscope.
In the microscopic view, the pathologist looks for spatial contexts of the cells through the expression, location, and nuclear translocation of proteins. The pathologist then produces a report describing the type of cancer, the size of the tumor, the invasiveness of the tumor, the grade of the cancer reflecting what the cancer cells look like in comparison with the normal cells, the mitotic rate (or the rate of cells dividing), the tumor margin, and the existence of cancer cells in lymph nodes, blood vessels or other organs. While the interventional radiologist defines cancer through their clinical gaze of tumors in the form of ultrasound images and creamy flesh, the pathologist exerts a molecular gaze through the microscopic view of the proteins embedded in the spatial contexts of cells and tissues making up the cancer. However, the molecular gaze of the pathologist is hardly only molecular. The pathologist focuses their gaze not only on the molecules but also on the cells along with their parts and their environment. A leading pathologist at POG explained that the pathological assessment of cancer could capture biological mechanisms of the cancer better than the knowledge and information provided from genomic sequencing. He argued that pathology could examine both the expression of proteins and spatial contexts of cells that make up the tumor, whereas he described bioinformatic analysis of cancer as “a soup of tumors, microenvironment and cells”. This comparison highlights one of the epistemological differences between pathology and bioinformatics, in which the former enacts their knowledge about cancer through studying tumors in relation to their microenvironment while the latter combines all these elements in the sequencing process and bioinformatic analysis to identify biomarkers and pathways that drive the cancer.
After the pathologist sections the biopsy samples, they transfer them to the GSC building located a few blocks away for DNA and RNA extraction of both the tumor and blood samples. The extraction process captures the total nucleic acid (tNA), which contains both DNA and RNA. Whole genome sequencing requires a library for both DNA and RNA. The DNA library is then used for genome sequencing, while the RNA library is for transcriptome analysis. Whole genome sequencing offers the complete order of the entire DNA sequence in one’s genome at a specific time. Transcriptome sequencing provides insights into altered expressions of genetic variants that underlie the growth of the cancer cells. The name of the step, “Library Core Construction,” reflects a liminal gaze between molecular and informational in which the DNA molecules are treated less as biological properties than as informational and textual objects that can be constructed into a library. In this step, DNA molecules are sheared, chopped up, end repaired, size selected, and appended with other tails for exponential amplification. These rigorous steps make us question the plasticity and manipulability of our cells and molecules. Up until this point, the patient body has only been rendered molecular. She is about to become fully informational.
Bioinformatics (informational gaze) – Informational production of cancer
Making sense of these DNA and RNA libraries constructed from the tumor and blood samples of the patient requires sequencing. The DNA and RNA libraries of the samples are transferred to the 6th floor of the GSC building for the sequencing process. For each patient, POG typically sequences one blood DNA and three tumor DNA samples. Sequencing these four samples requires reagents, chemicals produced by the biotech company Illumina. It costs roughly $10,000 CAD ($7,650 USD) to sequence the genome of one human being. These DNA samples then go through a polymerase chain reaction (PCR) process to rapidly amplify and extend into 500,000,000 clusters, each of which contains 1000 molecules (or strands of DNA). The PCR process is done in a cBOT machine and takes about 3.5 hours. After PCR, the samples are transferred to an Illumina HiSeq machine for sequencing, which contains two flow-cell holders and a set of sophisticated cameras. These cameras are set to take images that can signify locations and markers of each A, C, G, or T nucleotide. The final sequencing product is a large-scale file in a binary base call (BCL) format, containing actual base sequences and indexes that are uploaded to a secured server. The BCL file containing one human genome is about 600 GB in size. This dataset is then sent to bioinformatics and genomic analyst teams for downstream quality control and analysis. Through genome and transcriptome sequencing, DNA and RNA molecules are thus fragmented, amplified, clustered, cloned, and inscribed from biological specificities into informational data and images.
Once the sequencing data is uploaded, machine learning algorithms automatically pick up the file, identify cancer markers based on different probability values, and generate an automatic report of relevant biomarkers. Genome analysts then assess the report to construct a genomic pathway with detailed annotation of “somatic single nucleotide and copy number variants, indels, gene fusions, genome rearrangements, and dysregulated gene expression pattern” (Laskin et al., 2015: 7). The analyst generates three expression metrics based on “a fold change in gene expression of [the tumor] compared with a compendium of normal tissues, a percentile ranking of gene expression within similar tumor types, [drawing from The Cancer Genome Atlas [TCGA] database], and a within-sample expression rank of each gene” (p. 7). Hence, the genomic pathway analysis is primarily concerned with copy number changes, gene expression levels, and percentile values of various markers that can pinpoint abnormalities. The pathway diagram and bioinformatic report are then sent to the clinician. At this stage, the body/cancer is fully converted or inscribed into numerical biomarkers and understood in terms of percentile scores and levels of expression. The flesh, cells, tissues, and molecules no longer play an important role in bioinformatic analysis. Data points, scores, numbers, and percentages of the genes now become the main essence of the body. The molecular gaze has been rendered into the informational gaze.
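To make the three expression metrics more concrete, the following is a minimal sketch of how such quantities could be computed. It is not the POG pipeline. The function names, the use of a simple mean across normal tissues as the baseline, the "less than or equal" definition of percentile, and the placeholder gene labels are all our assumptions for illustration.

```python
from statistics import mean


def fold_change(tumor_expr, normal_exprs):
    """Tumor expression divided by the mean expression across normal tissues.

    Using the mean as the baseline is an assumption of this sketch.
    """
    return tumor_expr / mean(normal_exprs)


def percentile_rank(value, cohort):
    """Percentage of cohort values less than or equal to `value`."""
    return 100.0 * sum(v <= value for v in cohort) / len(cohort)


def within_sample_ranks(sample):
    """Rank genes within one sample; rank 1 is the most highly expressed."""
    ordered = sorted(sample, key=sample.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


if __name__ == "__main__":
    # Toy numbers with hypothetical gene labels, for illustration only.
    print(fold_change(8.0, [1.0, 2.0, 3.0]))
    print(percentile_rank(5, [1, 2, 5, 7]))
    print(within_sample_ranks({"GENE_A": 2.0, "GENE_B": 9.0, "GENE_C": 5.0}))
```

Each function returns a bare number or rank with no reference to tissue or morphology, which is the sense in which the body is "understood in terms of percentile scores and levels of expression."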
Thus far, we have “followed the samples” through the POG pipeline to reflect the interactions between the clinical, molecular, and informational gazes. In these gazes, the cancer takes on multiple forms, transitioning from creamy tissues into tNA into sheared DNA and, finally, into 600 GB of data. At the bioinformatic stage, cancer no longer constitutes one single disease, but is fractured into multiple mutations, biomarkers, and pathways measured and understood in terms of expression levels and percentile values. The process, however, does not end here. 600 GB of Big Data does not automatically equate to clinical knowledge. Rather, the translation of Big Data into clinical knowledge requires constant negotiation, contestation, and communication between different medical stakeholders at POG. In some POG cases, the bioinformatic analysis allows clinicians to change their diagnosis and medical decisions. In other cases, however, clinicians rely on their medical intuition and go against the genomic results because of the uncertainty they pose. In the next section, we explore a common challenge in Big Data studies regarding the translatability of data into practice. We delve into three vignettes that highlight the interactions between the three systems of knowledge, specifically between pathology and bioinformatics and between oncology and bioinformatics, in determining what constitutes clinically actionable knowledge.
How is genomic Big Data translated into actionable/meaningful knowledge?
In this section, we recount three vignettes that highlight the interactions between the various practices at POG in their efforts to translate genomic Big Data into actionable and meaningful knowledge. During the fieldwork, the term “actionable” alone conveyed many ambiguities and contradictions among practitioners at POG. To a clinician, actionable data referred to novel variants being identified and, in turn, leading to an alteration of clinical decisions. In the first vignette, we recount a successful translation of Big Data into a reclassification of a cancer diagnosis. During an interview, Harpreet, a PhD Candidate in Bioinformatics, shared how the Big Data and machine learning techniques employed in the bioinformatic pipeline can unveil new insights and, at times, challenge conventional methods in cancer diagnosis. In an article recently published by the bioinformatic team at POG, the team was able to employ supervised machine learning to revise the diagnosis of an unknown primary cancer from vulvar to breast cancer (Grewal et al., 2017). In the initial pathology assessment, the tumor of this patient with the unknown primary cancer was poorly differentiated. The pathologist suggested the patient might have vulvar cancer. The bioinformatic team used machine learning to compare the cancer against 27 different TCGA cancer types, including breast and gynecologic cancers. The result significantly correlated the cancer with breast cancer and led to a “reclassification of the tumor as a primary HER2+ mammary-like adenocarcinoma of the vulva” that most resembled the genomic profile of breast carcinomas (Grewal et al., 2017: 1). The POG team confirmed the genomic analysis with further immunohistology assessment. The oncologists then prescribed the patient therapies targeting HER2+ breast cancer.
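The general logic of comparing an unknown tumor against reference cancer types can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier based on Pearson correlation. This is our choice for illustration and should not be read as the method used by the POG team. The labels "type_A" and "type_B" and all expression values are hypothetical toy data, and a real comparison would involve many thousands of genes and 27 reference types.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def centroids(training):
    """Average the labeled profiles of each type into one reference profile."""
    return {
        label: [sum(column) / len(column) for column in zip(*profiles)]
        for label, profiles in training.items()
    }


def classify(profile, reference):
    """Return the best-correlated type and the score for every type."""
    scores = {label: pearson(profile, ref) for label, ref in reference.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical labeled profiles (three genes each), illustration only.
    training = {
        "type_A": [[9, 1, 2], [8, 2, 1]],
        "type_B": [[1, 9, 3], [2, 8, 2]],
    }
    label, scores = classify([7, 2, 2], centroids(training))
    print(label, scores)
```

Even in this toy form, the output is a label accompanied by correlation scores rather than a morphological description, which is the kind of evidence the bioinformatic team then brought back to pathology for confirmation.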
This first vignette shows how Big Data and machine learning can lead to the fragmentation of a single disease into multiple types based on biomarkers. This, in turn, can reclassify disease types and treatment. Vulvar cancer was reclassified into the same group as breast cancer, based on the shared HER2+ overexpression, and the patient’s treatment was altered to targeted therapies for breast cancer. According to Harpreet, machine learning algorithms in this case were able to overcome uncertainties in pathology assessment to pinpoint the exact tumor type of the unknown primary cancer and revise treatment therapies for the patient. As such, the genomic data in this vignette was clinically actionable, as it successfully identified notable variants that altered clinical diagnosis. This definition of “actionable” aligns with how most bioinformaticians characterize actionability. However, in addition to the identification of notable variants, actionability also “needs to be data-driven,” according to a leading bioinformatician at POG. By “data-driven,” the bioinformatician referred to a myriad of statistical methods ranging from Bayesian inference and probability to other computational algorithms such as hidden Markov models, as well as evidence collected from literature reviews and other clinical trials. As such, what constitutes “actionable” knowledge of cancer and the body in the informational gaze follows statistical logics of probability and inference that can advance beyond conventional methods of diagnosis and, in turn, reclassify disease and treatment.
In the second vignette, we show the perspective of the pathologist toward “actionable data.” While the bioinformatician is driven by data, information, and statistics, the pathologist is driven by molecular structures of the body. The pathologist’s visions and agendas for “actionable knowledge” differ from those of the bioinformatician, who is more driven by the validity and utility of the data. To a leading pathologist in pancreatic cancer, “actionable” meant “actually doing something” that was “beneficial to the patient” either by giving them a drug, re-defining the diagnosis, or overruling the morphology. He went on to share with us an interesting example that highlighted the limitations of bioinformatics:
Bioinformaticians are people who look at the raw data and distill the data based on the current knowledge with an understanding of biology. Pathologists are clinicians who are tasked with making a diagnosis. So, my day to day [work] is to look through the [micro]scope and make a diagnosis. I give you the example, I don’t know what the POG number is, but it is a case where a patient had a colorectal cancer and they sequenced it and the bioinformatician presented the data and told us that this is what it’s all about and then we point it out them that the material that they just sequenced is just stool. The point being here is that as a bioinformatician you can find anything in the data but if what you put in is not being adequately assessed then what's the point. So, there's a huge difference between pathology and bioinformatics. (Pathologist)
We are MDs and they aren’t. We are clinicians. We understand the impact of a diagnosis and what that means. Bioinformaticians at the end of the day are researchers and they are being given a set of data to analyze something. The example I gave you earlier is that they sequenced feces and they don't know. The whole point is, at the end of the day, everyone is dependent on pathology on the right tissue to go forward. (Pathologist)
The last vignette of the article highlights what is lost in translation, namely medical intuition, between actionable data and meaningful clinical knowledge. In one of the gastric cancer cases, a patient at POG exhibited an overexpression of c-MET, which made the patient “genomically eligible” for a c-MET inhibitor. The bioinformatic analysis advised the clinician in charge of her case to give the patient a c-MET inhibitor. However, the clinician did not agree with this genomic result, as they had “enough data” to believe otherwise:
There has been a randomized phase 3 trial stating that c-MET inhibitor is actually detrimental to patients, so we said no, we are not [going to] do that. But the analyst also points out, the trial is done on genomically unselective patients, it's not fair to do that. And I was like, “we have a major randomized phase 3 trial that shows inferiority. Ethically we can't do that.” So, I think sometimes we know some data that the analyst doesn’t. Or at least the interpretation of the data where we feel, although it's a suggestion, ethically we feel we have enough data to not give that treatment. (Oncologist)
Conclusion
Big Data has made deep and wide-reaching contributions to clinical medicine over the 20 years since the completion of the Human Genome Project. However, the path from discovery to application has been a slow and careful one. In this article, we describe and unpack the transformation of cancer tumors into Big Data and the translatability of Big Data into clinical knowledge. Our research engages with the critical data studies debate on the change versus continuity engendered by Big Data in the medical setting. We do not find that Big Data has a strong revolutionizing effect in cancer care. Instead, we find different knowledge groups carefully and productively interacting with and contesting each other in the knowledge-making processes that produce and translate Big Data into clinical decisions. POG’s production of Big Data from cancer tumors requires an extensive network of collaboration among different medical practices, from the biopsy to the imaging processes, in order to ensure that the samples are of sufficient quality for sequencing. As such, clinical genomics is not purely data-driven but interdependent on different styles of (medical) thought, gaze, and practice to be produced and made intelligible.
We also find, however, that data-driven approaches to understanding cancer engender a structural shift in the production of oncology knowledge by adding an extra layer of data-driven evidence. Doctors are now increasingly collaborating with a special kind of data scientist, the bioinformatician, to understand genomic data and translate it into their treatment decisions. This structural change, to a certain extent, has led to an epistemological shift in what counts as reliable or actionable knowledge in understanding and treating cancer. As the first vignette shows, Big Data and machine learning algorithms can help the bioclinical collectives at POG to overcome uncertainties in medical diagnosis. This generates an ontological shift in the nature of cancer and treatment itself. Cancer treatments are now understood less in terms of pathology-based assessments than in terms of biomarker-driven approaches that regroup and reclassify different cancers with shared biomarkers into the same basket of treatment. However, the entrance of Big Data does not equate to meaningful knowledge for many practitioners at POG. In the second vignette, the source of the Big Data was flawed from the outset, with the bioinformaticians sequencing and analyzing feces rather than cancer. In the third vignette, the knowledge generated from the data was not aligned with the ethical responsibilities or intuition of the doctor, regardless of how “data-driven” the therapeutic recommendation was. Hence, to many doctors, the data-driven recommendations that underpin precision medicine can contradict the medical ethics, experience, and intuition on which they rely to render a diagnosis.
This research also contributes to debates in critical Big Data and algorithm studies about power in knowledge production. Producing oncology knowledge or making clinical decisions creates interactions between different stakeholders and communities of practice that can cause conflict and challenges to hierarchy. However, this research observes a different problem that is equally important to understanding the nature of interactions and collaboration between multidisciplinary experts in clinical genomics. The ethnographic data shows how the different medical specialties operate from different gazes and how these multiple gazes “hang together” to produce meaningful clinical knowledge for cancer patients. Empirically, what is at stake in this collaborative network to produce and translate Big Data into clinical decisions is life itself, the life of the patient. Our observations and interactions as researchers and collaborators with this line of clinical genomics projects show less of a competition over hierarchical position and more of a collaboration and convergence of different stakeholders and fields to solve a common problem. Of course, there is a gatekeeper at the end of that collaboration, and that is the oncologist. Oncologists are the ones with the responsibility to choose a course of treatment. This decision can literally be one of life or death. While risk sensitivity varies among cancer doctors, they all share the goal of protecting the patient with treatment that is supported by an acceptable level of research evidence and practical consensus among their community.
This article also contributes a nuanced description of the ontological paradox of personalized medicine to science and technology studies and social studies of medicine. The discourse surrounding personalized medicine tends to focus on the benefits of personalizing or individualizing health care. Barbara Prainsack (2017) questions this very narrative, arguing that personalized medicine thrives on the discourse of empowering patients by individualizing their care. Yet, in reality, that empowerment works not in favor of the patient but of the new data economy, through the intense datafication of the patient body. As our article has shown, although Big Data is embedded with commodity value, the fact that Big Data in oncology care is still filtered through the clinical gaze of oncologists means that the effects of the data-commodification of precision medicine are tempered by medical intuition and ethics in medical settings. This paradox of personalized medicine is best captured by a response from one of the pathologists, who argued that whichever gaze one employs to understand cancer, the ultimate goal should not be “knowledge sake,” or the dominance of one specialty over another, but “[finding] the cure” to “kill the cancer” for the benefit of the most important stakeholder, the patient:
We're doing this because we want to kill the cancer (…) In some ways I don't care how we get there, whether we do it by protein expression or gene expression profiling, as long as we find the cure. That's what I want. I think the two of us, bioinformaticians and clinicians and pathologist should have that goal in mind. Some people might not have that goal in mind, like they couldn't care less if they cure it or not. They just want to know more about it right. I don't know who those people are, but I'm sure those exist right, like they just want to know it for knowledge sake.
But I think if you work with real life patient and you're making decisions for them, then you should have that goal of we want to make their lives better. I think when you lose that focus, then you'll start getting into things that might harm, then you would start to say, “oh well this is likely elevated so let's just try this inhibitor or whatever,” right, you start to lose focus on whether that is really in the best interest of the patient. (Pediatric pathologist)
Acknowledgments
The authors are indebted to the generosity of the team of medical professionals in the Personalized OncoGenomics Group (POG) for allowing us to observe their work in their clinical and ethics meetings, labs, tumor board, and other work sites. We are also grateful for the intellectual support of members of the GeNA lab at Simon Fraser University, in particular Pippa Adams. Special thanks to Alberto Lusoli for helping edit and to David Pham for careful transcriptions of the interviews. We are also thankful for the intellectual support of faculty and fellow students at Cornell, including Prof. Michael E Lynch, Prof. Andrew Willford, Prof. Grant Farred, Ranjit Singh, Chris Hesselbein, and Lisa Avron. Finally, we want to express our sincere gratitude to all three anonymous reviewers for their constructive comments and serious engagement with our article. The manuscript is comprised of original material that is not under review elsewhere, and the study on which the research is based has been subject to review by the Simon Fraser University Research Ethics Board and the Cornell University Institutional Review Board for Human Participant Research.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been funded by Genome Canada and Social Sciences and Humanities Research Council of Canada (SSHRC) Insight Grant 435-2017-0602 and Sage Summer Fellowship from Cornell.
