Abstract

Massively parallel sequencing helps create new knowledge on genes, variants and their association with disease phenotype. This important technological advancement simultaneously makes clinical decision making, using genomic information for cancer patients, more complex. Currently, identifying actionable pathogenic variants with diagnostic, prognostic, or predictive impact requires substantial manual effort. Objective: The purpose is to design a solution for clinical diagnostics of lymphoma, specifically for systematic variant filtering and interpretation. Methods: A scoping review and demonstrations from specialists serve as a basis for a blueprint of a solution for massively parallel sequencing-based genetic diagnostics. Results: The solution uses machine learning methods to facilitate decision making in the diagnostic process. A validation round of interviews with specialists consolidated the blueprint and anchored it across all relevant expert disciplines. The scoping review identified four components of variant filtering solutions: algorithms and Artificial Intelligence (AI) applications, software, bioinformatics pipelines and variant filtering strategies. The blueprint describes the input, the AI model and the interface for dynamic browsing. Conclusion: An AI-augmented system is designed for predicting pathogenic variants. While such a system can be used to classify identified variants, diagnosticians should still evaluate the classification’s accuracy, make corrections when necessary, and ultimately decide which variants are truly pathogenic.

Keywords

artificial intelligence, clinical decision making, machine learning, massively parallel sequencing, next-generation sequencing, variant analysis, variant filtering

Introduction

Technologies such as massively parallel sequencing, also known as next-generation sequencing (NGS), allow for faster and more efficient analysis of genomes, paving the way to new knowledge by providing a large amount of data with high accuracy, more quickly and at lower cost than conventional sequencing techniques such as Sanger sequencing.¹ While other methodologies used for targeted analysis, such as Polymerase Chain Reaction (PCR) or Quantitative PCR (qPCR), are available at lower costs, they can only detect a limited number of known variants. Because NGS is highly sensitive and has a large capacity for discovery,² its use has increased over time. Sequencing data undergoes extensive processing to identify clinically relevant variants. These variants are used for disease diagnosis, prognosis, therapeutic decisions, and follow-up of patients with cancer.

Lymphoma is a malignancy of the lymphatic system and comprises a highly heterogeneous group of diseases, with more than 80 subtypes defined in recent classifications,^3,4 which map to a similarly heterogeneous landscape of genetic aberrations, with hundreds of recurrently mutated genes in most entities.⁵ A select number of gene variants have been identified with diagnostic, prognostic and/or predictive impact.^6,7 Based on genetic alterations, improved subgrouping has been suggested in certain entities, paving the way for implementing precision medicine approaches in lymphoma. The value of NGS in lymphoma is currently being evaluated,⁸ which makes lymphoma a suitable domain for this study, although other domains, such as rare diseases, would have been possible alternatives. While the variant filtering process for NGS analyses has improved, some variants, especially structural variants, remain difficult to interpret, resulting in a bottleneck that requires manual work and time and remains error-prone.^9,10 Targeted Sequencing (TS) allows researchers to use gene panels to target specific regions of interest for analysis.¹¹ Given the large and diverse genomic landscape of its subtypes, lymphoma is a good candidate for targeted gene panels that identify the most relevant genes. The feasibility of NGS-based panel sequencing in lymphoma, where results are discussed at a molecular tumour board (MTB), has also been demonstrated.¹² The use of TS here is motivated by it currently being the default method in most laboratories for the detection of somatic variants in hematological malignancies, providing a list of confirmed and potentially relevant genetic variants with diagnostic, prognostic and predictive impact.¹³

An important goal of cancer care is to treat patients efficiently while minimising side effects that affect patients’ quality of life. To this aim, precision medicine will aid individual treatment decisions using targeted therapies based on genomic information and biomarkers.¹⁴

NGS technologies have transformed genomics, but this advancement has not come without challenges in interpretation and analysis.¹⁵ Reproducibility in results requires quality control assurance, standardized laboratory protocols for library preparation and sequencing, and effective data management and storage.^16–18 This has also raised ethical, legal and societal concerns regarding privacy and informed consent.

The focus here is on supporting clinical decision making in variant filtering and in the interpretation of the remaining variants. Variants can be filtered based on their quality parameters and annotations. Filtering retains variants that reach the thresholds for parameters such as coverage, read depth and variant allele frequency (VAF). This step usually relies on hard filtering with fixed thresholds. The variants are then compared against a large population database to remove common variants found in the general population. Next, the remaining variants are evaluated for their association with cancer using annotations from somatic variant databases such as COSMIC. Until recently, classifications of variants for their pathogenicity lacked standardization. Each laboratory would subjectively interpret their results, leading to discrepancies and inconsistencies between reports from different labs. Two important efforts in standardisation are the ACMG/AMP guidelines for clinical classification of variants^19,20 and the Belgian guidelines for biological classification of variants,¹³ which categorize variants in five classes: P (pathogenic), LP (likely pathogenic), VUS (variant of uncertain significance), LB (likely benign) and B (benign). The most important advance, however, is tiering, such as the ESMO Scale for Clinical Actionability of molecular Targets (ESCAT).²¹
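The filtering sequence described in this paragraph can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not a validated pipeline: the field names (`depth`, `vaf`, `pop_maf`, `cosmic_count`), the variant names and the threshold values are hypothetical, and a real workflow would read these values from an annotated VCF file.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    depth: int         # read depth at the variant position
    vaf: float         # variant allele frequency, between 0 and 1
    pop_maf: float     # minor allele frequency in a population database
    cosmic_count: int  # occurrences in a somatic variant database


# Illustrative fixed thresholds (assumed values, not recommendations).
MIN_DEPTH = 100
MIN_VAF = 0.05
MAX_POP_MAF = 0.01


def hard_filter(variants):
    """Quality filter, then population filter, then somatic annotation."""
    kept = []
    for v in variants:
        if v.depth < MIN_DEPTH or v.vaf < MIN_VAF:
            continue  # fails the fixed quality thresholds
        if v.pop_maf >= MAX_POP_MAF:
            continue  # common in the general population
        # Remaining variants are annotated, not removed, by somatic evidence.
        kept.append((v.name, v.cosmic_count > 0))
    return kept


variants = [
    Variant("var_a", depth=300, vaf=0.22, pop_maf=0.0001, cosmic_count=12),
    Variant("var_b", depth=40, vaf=0.30, pop_maf=0.0, cosmic_count=3),
    Variant("var_c", depth=250, vaf=0.48, pop_maf=0.12, cosmic_count=0),
    Variant("var_d", depth=180, vaf=0.049, pop_maf=0.0, cosmic_count=5),
    Variant("var_e", depth=150, vaf=0.10, pop_maf=0.0, cosmic_count=0),
]
retained = hard_filter(variants)
print(retained)
```

In this toy input, `var_d` is discarded although its VAF lies only marginally below the cut-off, which is the kind of borderline case that fixed thresholds cannot handle gracefully.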

Efforts have been made to apply AI to genomics, attempting to correlate mutations with clinical phenotypes for cancer patients.²² With the increasing amount of available data on genes and variants detected in different diseases, AI could become a novel tool for precision hemato-oncology by providing accurate and quick diagnosis and finding the best treatment tailored to each individual, arguably shifting the definition of what constitutes a “gold standard” in cancer care.^22,23 Literature databases and knowledgebases for precision oncology contain vast amounts of information on genes, diseases, biomarkers, and treatments. Exploring new and existing publications in these databases is crucial for applying the latest knowledge to research and patient care. However, this process comes at a significant cost: the time spent carefully extracting the needed information.²⁴ As early as 20 years ago, Natural Language Processing (NLP) was combined with bioinformatics as a means of improving sequence retrieval and sequence annotation,²⁵ and training generative AI on PubMed is currently an active research area.²⁶

There are several bioinformatics tools aimed at different filtering steps that can be employed but their respective performances vary widely.^27,28 Hard filtering, using fixed thresholds for filtering out variants, is commonly used but adjustable thresholds may be necessary for maintaining a balance between sensitivity and specificity of variant calls.²⁹ This process demands extensive manual review and depends on geneticists conducting multiple assessments of the variants to draw conclusions.³⁰ Machine learning could streamline variant filtering and reduce the manual workload of investigating and interpreting genetic variants. Currently, the manual efforts of geneticists, who are evaluating the variants, create a significant bottleneck in delivering timely genetic analyses to clinics. While researchers have developed specific tools for various filtering tasks, few studies have examined a comprehensive, AI-augmented system to support the entire variant analysis process.³¹ This gap motivates our current study.
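The trade-off between sensitivity and specificity under different fixed thresholds can be made concrete with a small sketch. The calls below are invented toy data, and the function name and cut-off values are assumptions made for illustration; the sketch only shows why a single hard VAF threshold may be too rigid.

```python
def sensitivity_specificity(calls, vaf_cutoff):
    """calls: list of (vaf, is_true_variant); a call is retained when vaf >= vaf_cutoff."""
    tp = sum(1 for vaf, truth in calls if truth and vaf >= vaf_cutoff)
    fn = sum(1 for vaf, truth in calls if truth and vaf < vaf_cutoff)
    tn = sum(1 for vaf, truth in calls if not truth and vaf < vaf_cutoff)
    fp = sum(1 for vaf, truth in calls if not truth and vaf >= vaf_cutoff)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy data: four true variants and four artefacts with their VAF values.
calls = [
    (0.30, True), (0.12, True), (0.06, True), (0.03, True),
    (0.04, False), (0.02, False), (0.015, False), (0.08, False),
]

# Sweep three candidate cut-offs and record both metrics for each.
sweep = {cutoff: sensitivity_specificity(calls, cutoff) for cutoff in (0.01, 0.05, 0.10)}
for cutoff, (sens, spec) in sweep.items():
    print(f"VAF >= {cutoff:.2f}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy input, raising the cut-off removes artefacts but also discards true low-VAF variants, so no single value is optimal for both metrics.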

Methods

This study was based on the Design Science Research Methodology approach (DSRM)³² as it aims to let health informatics research and insights guide the development of a new solution by following six steps, as shown in Figure 1. For this paper, we focused on the first three steps, namely Identifying the problem, Defining objectives and Designing the solution. To this effect, a scoping review was first conducted, in February 2023, according to PRISMA-ScR³³ with details elaborated in Online Resource 1. The finely detailed outcome of the first step of DSRM, in which the problem is identified, based on all published research found relevant, is mostly in Online Resource 2. To illustrate the approach, Filtering Strategies are considered in the main text, while the supplement holds Pipelines and software, Algorithms and Artificial Intelligence.

Figure 1.

In this study, the first two steps and the beginning of the third step of the DSRM were completed.³² For the first step, a scoping review was conducted, as well as a hands-on demonstration by two molecular geneticists, showing their current workflows. A blueprint was designed during the second step, containing the components and objectives of the proposed solution. The blueprint serves as an almost complete third step, given the validation round with expert interviews, but at the end of the third step, a prototype would also be developed and evaluated with users, left for future research.

The main findings from the scoping review were used to design a solution blueprint. To gather information on the potential and pitfalls of the proposed blueprint, semi-structured interviews with experts from different fields were conducted, all of whom were stakeholders (Online Resource 3-6).

Results

DSRM1: Problem identification

Filtering strategies

Reviewed papers on filtering strategies generally considered criteria that can be split into three categories, as described by Sukhai et al.³⁴ These consist of parameters for quality filtering as a first step, followed by labelling germline and somatic variants based on minor allele frequency (MAF) values, in different population variant databases (PVDs), as well as the evidence on their pathogenicity according to databases such as ClinVar, HGMD and COSMIC (Table 1). One preliminary study compared AI-aided analysis of NGS using the decision-support software Watson for Oncology, with the standard manual method of NGS analysis done by a bioinformatician.⁴⁴ The results indicated that the automated computing could increase the efficiency of detecting and interpreting variants, important for making timely decisions for patient care and therapy.

Table 1.

Filtering strategies and criteria resulting from the scoping review.

Filtering strategies	Threshold or source	Ref.
1. Quality filtering
• Remove synonymous variants (annotation filter)		^28,34–36
• Quality PASS flag (no strand bias)		³⁴
• Read depth	≥ 250	³⁴
	≥ 100	³⁷
	20x–30x for TS	²
	50 reads	^28,36
• VAF	≥ 5%	^34,35,37,38
	> 4%	³⁶
	> 1%	^28,39
	0.1%–0.2% for TS	²
• Not in FLAGS⁴⁰ genes		³⁵
2. Labelling germline (normal) variants
• MAF	≥ 1% in PVDs	⁴¹
• B/LB variants according to	ClinVar	^28,34,36,41
	HGMD	^34,41
	LOVD	^41,42
3. Labelling somatic (tumour) variants
• MAF < 1% in the PVDs	gnomAD	^2,37,39
	1000Genomes	^2,28,34,37
	dbSNP	^28,36
• Variants appearing in COSMIC database	Once	^2,28,34,37,41,43
	Twice	³⁴
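The labelling criteria in sections 2 and 3 of Table 1 can be expressed as a simple rule. The sketch below is one possible reading: the function name, the argument names and the way the criteria are combined (germline if either germline criterion holds, somatic only if both somatic criteria hold) are assumptions, since the reviewed papers apply these criteria in different combinations.

```python
def label_variant(pop_maf, clinvar_class, cosmic_count, min_cosmic=1):
    """Return 'germline', 'somatic' or 'unlabelled' for one variant.

    pop_maf:       minor allele frequency in a population variant database
    clinvar_class: ClinVar class such as 'B' or 'LB', or None if absent
    cosmic_count:  number of times the variant appears in COSMIC
    min_cosmic:    required COSMIC occurrences (once or twice in Table 1)
    """
    if pop_maf >= 0.01 or clinvar_class in {"B", "LB"}:
        return "germline"
    if pop_maf < 0.01 and cosmic_count >= min_cosmic:
        return "somatic"
    return "unlabelled"


labels = [
    label_variant(0.15, None, 0),                   # common in the population
    label_variant(0.0002, "LB", 4),                 # likely benign in ClinVar
    label_variant(0.0002, None, 1),                 # rare, seen once in COSMIC
    label_variant(0.0002, None, 1, min_cosmic=2),   # stricter COSMIC criterion
]
print(labels)
```

The last two calls differ only in `min_cosmic`, showing how the choice between the "once" and "twice" criteria changes the label of the same variant.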

DSRM2: Objectives

Drawing heavily on the literature findings (Table 2), the objective in the form of a blueprint is presented in Figure 2. The design starts with a pre-filtering step, which can be conducted even by existing tools. Most variant callers contain a pre-filtering option to retain only variants flagged as PASS, which pass the quality control parameters set by the caller.^29,43,48 Next, the filtering process is organised in three layers: input, AI model and interface for dynamic browsing. The main sources of the input for training the model are:

• a VCF file, containing a list of variants with labels confirmed by experts or in literature,

• public databases for somatic variants (COSMIC), population databases containing common variants (gnomAD), and catalogues of disease-associated variants (ClinVar, HGMD),

• an external NLP system for literature database mining.
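How the three input sources listed above might be merged into one feature set can be sketched as follows. All field names, default values and the example record are hypothetical and chosen only for illustration; the blueprint does not prescribe a specific feature set, and the literature hits stand in for the output of an external NLP system.

```python
def build_features(vcf_record, db_annotations, literature_hits):
    """Merge the three input sources into one flat feature dictionary."""
    return {
        # Source 1: per-variant values from the VCF file
        "vaf": vcf_record["vaf"],
        "depth": vcf_record["depth"],
        # Source 2: public database annotations (missing values default to 0)
        "gnomad_maf": db_annotations.get("gnomad_maf", 0.0),
        "cosmic_count": db_annotations.get("cosmic_count", 0),
        "clinvar_pathogenic": int(db_annotations.get("clinvar") in {"P", "LP"}),
        # Source 3: number of supporting publications found by literature mining
        "n_publications": len(literature_hits),
    }


features = build_features(
    vcf_record={"vaf": 0.31, "depth": 420},
    db_annotations={"gnomad_maf": 0.00002, "cosmic_count": 7, "clinvar": "LP"},
    literature_hits=["placeholder_publication_1", "placeholder_publication_2"],
)
print(features)
```

Treating a missing database entry as zero is itself a modelling assumption; an explicit missing-value indicator may be preferable in practice.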

Table 2.

Literature findings that served to design the blueprint.

Findings	References
Quality pre-filtering, retaining PASS variants	^34,36
Label germline variants against PVDs	^28,34,36,41
Label somatic variants against COSMIC	^2,28,34,37,41,43
Using adaptive cut-offs for values of VAF, MAF, read counts, etc.	^29,39,45
AI as an alternative to hard filtering	^46–50
Non-linear learning models	^30,46,51–55
SHAP and feature importance for AI models	^47,51
Mining the literature and NLP	^12,42,56
Interactive browser with dynamic filtering	^45,57
Internal database of identified variants	³⁷
MTB integration	^12,42,58

Figure 2.

Blueprint of an AI-augmented system for variant analysis.

These three sources of input were used and explored to construct a combined set of features with which to build a discriminative AI model. The model should be able to find patterns in the data and avoid the need for ad hoc fixed thresholds and user-defined criteria. First, feature engineering (i.e., feature selection or feature construction) is important for optimising the results of machine learning models, and using only the most informative features or combinations of features is important for the scalability of the model. Deep learning models are an exception, as they do not require this feature engineering step. Second, machine learning methods successfully and commonly used in the relevant literature include Random Forest and XGBoost. However, other algorithms could and should be experimented with and benchmarked. For internal validation, 5- or 10-fold cross-validation is common practice, as is reserving 10%–20% of the data as a holdout.⁵⁹ The AI model’s output labels variants as “Pathogenic” (including P and LP), “Normal” (including VUS, LB, and B, none of which are clinically actionable), or “Artifacts.” This classification adapts to clinicians’ preference for a maximally distilled view. For research purposes, the classification task would use six classes: P, LP, VUS, LB, B, and artefact. The output labels are accompanied by a confidence score from zero to 100, providing a more complete view of the classification decision. Finally, an interface for dynamic browsing is necessary to visualise the results of the filtering. Geneticists should be able to browse variants by class type and confidence level and select the ones that need closer investigation. For example, a normal variant with a very low confidence score could be selected to show the features associated with it, such as VAF, MAF and annotation, and, if necessary, to visualise it in an IGV browser to determine whether it is a true variant or an artefact. 
The system should allow manual re-labelling and saving all labels in an internal database. At the end, a report of confirmed pathogenic variants can be generated or, alternatively, data could be sent to or shared with an MTB portal, as appropriate. Experts interviewed about the blueprint solution expressed a positive attitude towards it. They provided knowledge based on their backgrounds, highlighted both challenges and opportunities, suggested involving other stakeholders, and expressed a desire for integrating multimodality into the solution, as shown in Figure 3. The attitude and knowledge elicitation categories are the most important and are described in most detail in Online Resource 6.
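The relation between the six research classes, the three clinical labels and the confidence score can be sketched as follows. This is an illustration under assumptions: the blueprint does not specify how the confidence score is derived, so the sketch simply sums hypothetical per-class probabilities, and the review threshold of 70 and the variant names are arbitrary.

```python
# Mapping from the six research classes to the three clinical labels.
COLLAPSE = {
    "P": "Pathogenic", "LP": "Pathogenic",
    "VUS": "Normal", "LB": "Normal", "B": "Normal",
    "artefact": "Artifacts",
}


def collapse(probabilities):
    """Sum six-class probabilities per clinical label; return (label, confidence 0-100)."""
    grouped = {}
    for cls, p in probabilities.items():
        label = COLLAPSE[cls]
        grouped[label] = grouped.get(label, 0.0) + p
    best = max(grouped, key=grouped.get)
    return best, round(100 * grouped[best])


def needs_review(predictions, max_confidence=70):
    """Names of variants whose confidence falls below the review threshold."""
    return [name for name, (_, conf) in predictions.items() if conf < max_confidence]


# Hypothetical model output for two variants.
model_output = {
    "var_a": {"P": 0.70, "LP": 0.20, "VUS": 0.05, "LB": 0.02, "B": 0.02, "artefact": 0.01},
    "var_b": {"P": 0.05, "LP": 0.10, "VUS": 0.35, "LB": 0.10, "B": 0.05, "artefact": 0.35},
}
predictions = {name: collapse(probs) for name, probs in model_output.items()}
review = needs_review(predictions)
print(predictions, review)
```

Here `var_b` is labelled “Normal” with low confidence and is therefore flagged for closer inspection, mirroring the browsing example in the text.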

Figure 3.

Validation round of the blueprint with expert interviews identified five main categories and additional subcategories.

Discussion

Current challenges in variant analysis described in literature include false positives, false negatives and the presence of artefacts, each of which may result in clinical misinterpretations,⁴³ or even incomplete or missed diagnoses.³⁹ Even though several filtering tools and pipelines have been developed, the manual inspection of each candidate variant remains common practice and is time- and labour-intensive, as well as prone to error.⁴³ Furthermore, the variant filtering process lacks a gold standard and filtering results vary for the same sample over different platforms, different variant calling tools,⁶⁰ and even between two runs on the same platform.²⁸ Different strategies for variant annotation and filtering also lead to differences in the variants detected and reported.³⁸ In short, machine learning methods need to learn to distinguish between true variants and artefacts. Such methods can also help with interpretation of the variants using information from several layers of evidence and databases. Generative AI can be used to enable a summary of findings to be preserved alongside the supportive information from existing literature in PubMed.²⁶

Despite many considerations for the future of clinical genomics, the most widely used sequencing method in cancer diagnostics today is TS, which is well suited to clinical use since it can benefit from highly sensitive variant calling tools focused on the regions of interest.⁵⁷ Additionally, using small- to medium-sized panels avoids introducing large amounts of technical artefacts related to sequencing, alignment or analysis.¹³ Another benefit is that TS, if it includes unique molecular identifiers and error correction, allows for the identification of variants at low VAF (as low as 0.1%–0.2%) with greater confidence.² Therefore, for the design of the blueprint, the focus stayed on TS and the workflows associated with it.

A system powered by machine learning methods for the filtering process has promise: eliminating hard filtering techniques would allow the model to make decisions based on patterns observed in the training data and to interpret a variant by combining knowledge from different features at the same time. Most tools can already support interpretation of a variant in light of what is known about the phenotype of a particular patient. For example, consider a patient with diffuse large B-cell lymphoma (DLBCL) and a variant in the BCL6 gene, which is typically expressed in germinal centre B-cells. If histopathological examination shows aberrant expression patterns, such as BCL6 expression outside the germinal centres, this could indicate a functional impact of the variant. If a patient with a BCL6 variant shows significantly increased BCL-6 protein expression in the lymphoma cells compared to typical DLBCL cases, this might suggest that the variant leads to overexpression or increased stability of the BCL-6 protein. Currently, geneticists interpret the potential biological impact of most variants they find (P, LP, VUS) without knowing their true effect, especially when multiple variants are present in a single biopsy sample from tumour tissue. More functional multiomics research is needed before geneticists can draw conclusions about the actual impact on patient outcomes. As such, AI is expected to play a crucial role once multiomics data become available.

Specific to lymphoma, researchers still generally lack, with some exceptions, comprehensive knowledge regarding the prognostic and treatment-decisive impact of genetic alterations. As lymphoma is so heterogeneous, this knowledge will likely need to be collected and validated in large prospective clinical trials and real-world data sets. Another promising avenue of research that could constitute a potentially clinically useful addition to guide prognosis and treatment will be longitudinal genetic analyses of circulating tumour DNA during the course of lymphoma.⁶¹

An issue with AI models is that their performance is tested by comparing it to the ground truth, which represents expert knowledge. Studies comparing AI to clinician performance have often indicated that AI might perform as well as, or better than, clinicians.⁶² However, if an AI model were to perform better than expert knowledge, this would not be reflected in the results, as any deviation from the human ground truth is considered worse performance.⁶³ It is also a known and common occurrence for clinicians to disagree on a given case.⁶⁴ Given the many inconsistencies in NGS results and reporting, even with existing standards and guidelines, it is necessary to improve the consistency and replicability of NGS results. Automating parts of the process and implementing AI solutions can contribute to this improvement. When using databases to annotate and prioritise variants, the results will depend on the quality of the chosen databases.⁶⁵ COSMIC was chosen as the most comprehensive database of somatic cancer mutations, a choice also supported by the scoping review, while gnomAD was chosen as the population database of common variants, leaving out the dbSNP database as it contains a number of pathogenic cancer variants.^41,66

Automatic variant analysis with interpretation by geneticists will not only reduce the time of the analysis itself but also reduce the amount of genetic data to be used for further clinical interpretation. This has implications for health informatics regarding standardisation issues when integrating NGS data with patient data, as well as implications for data visualisation, human-computer interaction and decision support when implementing variant analysis results into the clinical work process. Relevant applications include decision support for the single (hemato)oncologist at the point of care, use in MTBs to discuss single-patient cases,¹² or support of research across multiple patients and settings with clinically enriched genetic data.⁶⁷ More extensive visualisation of the integrated data in patient portals needs to be further researched.

The strengths of this study relate to decreasing the need for manual review of all filtering results from an NGS analysis. The blueprinted solution can reduce the rates of false positives and false negatives, preventing misclassification. The AI tool can also perform genetic interpretation in seconds rather than hours, allowing for efficient scaling and for the interrater variability to approach zero. Comparative studies have also shown that AI tools can achieve high concordance rates with expert panels in variant classification. Challenges include interpretability of AI predictions, securing high-quality data for training, and the elaborate procedures for regulatory approval. Clinical interpretations are not explicitly dealt with, and so whether or not the analysis is linked to diagnostic, prognostic and/or predictive impact is not considered. Further development and use of the pipeline that the blueprint prescribes will take this into account. It is important to note that decision power remains with the geneticists, as they are the ones reviewing the output and examining the results more closely.⁶⁸ Automating decisions entirely could bring sequencing and its results outside the scope of prior informed consent. In contrast to published literature describing experiments, algorithms and pipelines that improve the filtering process mainly through hard filtering, a systemic view has been provided that seeks to integrate all sub-processes of variant filtering. An important addition can be incorporation into an MTB, where a multidisciplinary team consisting of oncologists, pathologists, geneticists, hematologists and bioinformaticians gathers to discuss findings for especially difficult and complex patient cases.¹² Not only will this solution benefit the most complex patient cases, but it will also bring targeted cancer care closer to every lymphoma patient by improving the efficiency and efficacy of conducting NGS research. 
While this study was shaped by the context of lymphoma, it is not exclusive to it. The proposed solution can be generalised to other heterogeneous cancers, including both solid and hematological malignancies.

This study has several limitations. The results of the scoping review were formulated by a single reviewer, possibly introducing bias in what was considered important to include. No librarians were employed for a more thorough search that would involve more sources than only the three main databases used in this study. Although older publications might have been interesting to examine, most of the papers included in this study mentioned the lack of a gold standard which suggests that the present choice had no severe consequences regarding missed articles because of the publication year delimiter. Snowballing did allow for finding articles that were not found by the search strategy.

Limitations of the proposed solution are mainly related to feasibility. Considering the various data sources included in the input layer, their integration and the best feature compositions will be subject to further research and experiments. Another limitation is data availability. Building such a system could prove challenging, as the “small n, large p” problem is a recurrent issue in genomic data. Therefore, improving data sharing⁶⁹ between laboratories and publishing their results on genomic variants will be necessary for the proposed solution. Investigations into the link from genetic variants to phenotype were limited to improving the filtering process of NGS results. Data sharing and data availability will be even more important when linking variants to phenotype. In the case of lymphoma, an AI-powered solution would require sufficient representation of each subtype to reach good predictive performance. Separating P and LP variants from the rest is desirable, as they will be important for further interpretation and represent clinically actionable variants. For the blueprint to be realised at the clinic, ethics must be considered. A principle of central importance from a legal and ethical perspective is that of autonomy, which in turn is guaranteed by securing prior informed consent. Patients have the right to be informed of the ways in which data, the use of data, and data sharing could potentially impact their right to privacy. In cases where the participant will not receive the benefits of the research on the data, it is considered particularly important in a research setting to provide an understanding of the risks and benefits of participation, so that the participant may give informed and voluntary consent. 
The complexity of the technology used, the potentially unexpected results it could generate, and the difficulty of mapping the exact process steps when these are performed by means of machine learning are potential sources of complication that need to be explored when drafting ethical guidelines.

Taking into consideration these practical limitations, it is important to identify consent for genomic analysis within consent-in-practice, and develop new forms of consensual approaches. To achieve it, one has to actively work with a reorientation of the bioethical discussion towards the under-explored area of patients’ understandings of the communicative process that is consent-in-action, instead of insisting on a strict legalistic perspective that focuses primarily on the interpretation of consent forms.

Conclusion

The results presented should not be interpreted as the final word or a definitive solution, as the entire field of variant analysis and interpretation is under rapid methodological development.^70,71 This makes blueprints and process flow charts tentative, in anticipation of clinical trials. With that in mind, this study was able to propose a design of an AI-augmented system for variant filtering and analysis, which would reduce the costly and error-prone manual labour associated with this process. There are many possibilities for further research.

Supplemental Material

Supplemental Material for Smart variant filtering - A blueprint solution for massively parallel sequencing-based variant analysis by Orlinda Brahimllari, Sandra Eloranta, Patrik Georgii-Hemming, Zahra Haider, Sabine Koch, Aleksandra Krstic, Frantzeska Papadopoulou Skarp, Richard Rosenquist, Karin E. Smedby, Fulya Taylan, Birna Thorvaldsdottir, Valtteri Wirta, Tove Wästerlid, Magnus Boman in Health Informatics Journal.

Footnotes

Acknowledgment

The authors thank Noura Ezaz-Nikpay for comments on an earlier draft.

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: RR received honoraria from Abbvie, AstraZeneca, Janssen, Illumina and Roche.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Vetenskapsrådet; 2021-04610.

Data availability statement

No datasets were created or stored within this study.

Supplemental Material

Supplemental material for this article is available online. Auxiliary tables and figures have been deposited as online resources.

References

Hussen

Abdullah

Salihi

, et al. The emerging roles of NGS in clinical oncology and personalized medicine. Pathol Res Pract 2022; 230: 153760.

Bewicke-Copley

Arjun Kumar

Palladino

, et al. Applications and analysis of targeted genomic sequencing in cancer studies. Comput Struct Biotechnol J 2019; 17: 1348–1359.

Alaggio

Amador

Anagnostopoulos

, et al. The 5th edition of the world health organization classification of haematolymphoid tumours: lymphoid neoplasms. Leukemia 2022; 36(7): 1720–1748.

Campo

Jaffe

Cook

, et al. The international consensus classification of mature lymphoid neoplasms: a report from the clinical advisory committee. Blood, The Journal of the American Society of Hematology 2022; 140(11): 1229–1253.

De Leval

Alizadeh

Bergsagel

, et al. Genomic profiling for clinical decision making in lymphoid neoplasms. Blood, The Journal of the American Society of Hematology 2022; 140(21): 2193–2227.

Bühler

Martin-Subero

Pan-Hammarström

, et al. Towards precision medicine in lymphoid malignancies. J Intern Med 2022; 292(2): 221–242.

Mansouri

Thorvaldsdottir

Laidou

, et al. Precision diagnostics in lymphomas – recent developments and future directions. Semin Cancer Biol 2022; 84: 170–183, Precision Medicine in Cancer.

Smedby

Wästerlid

Tham

, et al. The biolymph study–implementing precision medicine approaches in lymphoma diagnostics, treatment and follow-up: feasibility and first results. Acta Oncol 2023; 62(6): 560–564.

Shao

Tian

, et al. Analysis of error profiles in deep next-generation sequencing data. Genome Biol 2019; 20: 50.

10.

Tamborero

Dienstmann

Rachid

, et al. The molecular tumor board portal supports clinical decisions and automated reporting for precision oncology. Nat Can (Ott) 2022; 3(2): 251–261.

11. Grada, Weinbrecht. Next-generation sequencing: methodology and application. J Invest Dermatol 2013; 133(8): e11.

12. Rodríguez Ruiz, Abd Own, Ekström Smedby, et al. Data-driven support to decision-making in molecular tumor boards for lymphoma: a design science approach. Front Oncol 2022; 12: 984021.

13. Froyen, Le Mercier, Lierman, et al. Standardization of somatic variant classifications in solid and haematological tumors by a two-level approach of biological and clinical classes: an initiative of the Belgian ComPerMed expert panel. Cancers 2019; 11(12): 2030.

14. Lassen, Makaroff, Stenzinger, et al. Precision oncology: a clinical and patient perspective. Future Oncol 2021; 17(30): 3995–4009.

15. Bacher, Shumilov, Flach, et al. Challenges in the introduction of next-generation sequencing (NGS) for diagnostics of myeloid malignancies into clinical routine use. Blood Cancer J 2018; 8(11): 113.

16. Anderson, Schrijver. Next generation DNA sequencing and the future of genomic medicine. Genes 2010; 1(1): 38–69.

17. Gullapalli, Desai, Santana-Santos, et al. Next generation sequencing in clinical medicine: challenges and lessons for pathology and biomedical informatics. J Pathol Inf 2012; 3(1): 40.

18. Samariya, Aryal, et al. Detection and explanation of anomalies in healthcare data. Health Inf Sci Syst 2023; 11(1): 20.

19. Li, Datto, Duncavage, et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer: a joint consensus recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn 2017; 19(1): 4–23.

20. Richards, Aziz, Bale, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015; 17(5): 405–424.

21. Mateo, Chakravarty, Dienstmann, et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO Scale for Clinical Actionability of molecular Targets (ESCAT). Ann Oncol 2018; 29(9): 1895–1902.

22. Bibault, Burgun, Fournier, et al. Chapter 18 - Artificial intelligence in oncology. In: Artificial Intelligence in Medicine. Academic Press, 2021, pp. 361–381.

23. Derbal. Can artificial intelligence improve cancer treatments? Health Inf J 2022; 28(2): 14604582221102314.

24. Harmston, Filsell, Stumpf. What the papers say: text mining for genomics and systems biology. Hum Genomics 2010; 5: 17–29.

25. Yandell, Majoros. Genomics and natural language processing. Nat Rev Genet 2002; 3(8): 601–610.

26. Jin, Yang, Chen, et al. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics 2024; 40(2): btae075. arXiv:2304.09667.

27. Niroula, Vihinen. How good are pathogenicity predictors in detecting benign variants? PLoS Comput Biol 2019; 15(2): e1006481.

28. Sandmann, Karimi, de Graaf, et al. Appreci8: a pipeline for precise variant calling integrating 8 tools. Bioinformatics 2018; 34(24): 4205–4212.

29. Schneider, Smith, Rossi, et al. Validation of a customized bioinformatics pipeline for a clinical next-generation sequencing test targeting solid tumor-associated variants. J Mol Diagn 2018; 20(3): 355–365.

30. Zhang, Wang, Zhou, et al. VariFAST: a variant filter by automated scoring based on tagged-signatures. BMC Bioinf 2019; 20(22): 713.

31. Rissanen. Translational health technology and system schemes: enhancing the dynamics of health informatics. Health Inf Sci Syst 2020; 8(1): 39.

32. Peffers, Tuunanen, Rothenberger, et al. A design science research methodology for information systems research. J Manag Inf Syst 2007; 24(3): 45–77.

33. Tricco, Lillie, Zarin, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018; 169(7): 467–473.

34. Sukhai, Misyura, Thomas, et al. Somatic tumor variant filtration strategies to optimize tumor-only molecular profiling using targeted next-generation sequencing panels. J Mol Diagn 2019; 21(2): 261–273.

35. Ulgen, Can, Bilguvar, et al. Sequential filtering for clinically relevant variants as a method for clinical interpretation of whole exome sequencing findings in glioma. BMC Med Genom 2021; 14(1): 54.

36. Zhong, Wagner, Kurt, et al. Multi-laboratory proficiency testing of clinical cancer genomic profiling by next-generation sequencing. Pathol Res Pract 2018; 214(7): 957–963.

37. Sun, Thorson, Murray. Annotation of variant data from high-throughput DNA sequencing from tumor specimens: filtering strategies to identify driver mutations. Methods Mol Biol 2019; 1908: 49–60.

38. Schejbel, Novotny, Breinholt, et al. Improved variant detection in clinical myeloid NGS testing by supplementing a commercial myeloid NGS assay with custom or extended data filtering and accessory fragment analysis. Mol Diagn Ther 2021; 25(2): 251–266.

39. Najafi, Caspar, Meienberg, et al. Variant filtering, digenic variants, and other challenges in clinical sequencing: a lesson from fibrillinopathies. Clin Genet 2020; 97(2): 235–245.

40. Shyr, Tarailo-Graovac, Gottlieb, et al. FLAGS, frequently mutated genes in public exomes. BMC Med Genom 2014; 7: 64.

41. Gao, Zhang. Comprehensive elaboration of database resources utilized in next-generation sequencing-based tumor somatic mutation detection. Biochim Biophys Acta Rev Cancer 2019; 1872(1): 122–137.

42. Hamamoto, Koyama, Kouno, et al. Introducing AI to the molecular tumor board: one direction toward the establishment of precision medicine using large-scale cancer clinical and biological information. Exp Hematol Oncol 2022; 11(1): 82.

43. Kim, Lee, et al. FIREVAT: finding reliable variants without artifacts in human cancer samples using etiologically relevant mutational signatures. Genome Med 2019; 11(1): 81.

44. Chen, Yan, Xie, et al. Comparative analysis of target gene exon sequencing by cognitive technology using a next generation sequencing platform in patients with lung cancer. Mol Clin Oncol 2021; 14(2): 36.

45. Astrinaki, Kanterakis, Latsoudis, et al. Zazz: variant annotation and exploration of next generation sequencing variants. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), Athens, GR, 28 October 2019. IEEE, pp. 856–860.

46. Anzar, Sverchkova, Stratford, et al. NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer. BMC Med Genom 2019; 12(1): 63.

47. Harkins, Chang, Patel, et al. Remaining challenges in predicting patient outcomes for diffuse large B-cell lymphoma. Expert Rev Hematol 2019; 12(11): 959–973.

48. Asada, Kaneko, Takasawa, et al. Integrated analysis of whole genome and epigenome data using machine learning technology: toward the establishment of precision oncology. Front Oncol 2021; 11: 666937.

49. Dotolo, Esposito, Roma, et al. Bioinformatics: from NGS data to biological complexity in variant detection and oncological clinical practice. Biomedicines 2022; 10(9): 2074.

50. Dlamini, Francies, Hull, et al. Artificial intelligence (AI) and big data in cancer and precision oncology. Comput Struct Biotechnol J 2020; 18: 2300–2311.

51. Favalli, Tini, Bonetti, et al. Machine learning-based reclassification of germline variants of unknown significance: the RENOVO algorithm. Am J Hum Genet 2021; 108(4): 682–695.

52. Ainscough, Barnell, Ronning, et al. A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data. Nat Genet 2018; 50(12): 1735–1743.

53. Ravasio, Ritelli, Legati, et al. GARFIELD-NGS: genomic vARiants FIltering by dEep Learning moDels in NGS. Bioinformatics 2018; 34(17): 3038–3040.

54. Vadapalli, Abdelhalim, Zeeshan, et al. Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Briefings Bioinf 2022; 23(5): bbac191.

55. Ren, Kong, Zhou, et al. FVC as an adaptive and accurate method for filtering variants from popular NGS analysis pipelines. Commun Biol 2022; 5(1): 975–979.

56. Zeng, Shufean. Molecular-based precision oncology clinical decision making augmented by artificial intelligence. Emerg Top Life Sci 2021; 5(6): 757–764.

57. Wünsch, Banck, Müller-Tidow, et al. AMLVaran: a software approach to implement variant analysis of targeted NGS sequencing data in an oncological care setting. BMC Med Genom 2020; 13(1): 17.

58. Kurz, Perera-Bel, Höltermann, et al. Identifying actionable variants in cancer - the dual web and batch processing tool MTB-report. Stud Health Technol Inf 2022; 296: 73–80.

59. Eloranta, Boman. Predictive models for clinical decision making: deep dives in practical machine learning. J Intern Med 2022; 292(2): 278–295.

60. Vilov, Heinig. DeepSom: a CNN-based approach to somatic variant calling in WGS samples without a matched normal. Bioinformatics 2023; 39(1): btac828.

61. Roschewski, Rossi, Kurtz, et al. Circulating tumor DNA in lymphoma: principles and future directions. Blood Cancer Discov 2022; 3(1): 5–15.

62. Nagendran, Chen, Lovejoy, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689.

63. Boman. Human-curated validation of machine learning algorithms for health data. Digital Society 2023; 2(3): 46.

64. Adamson, Welch. Machine learning and the cancer-diagnosis problem – no gold standard. N Engl J Med 2019; 381(24): 2285–2287.

65. Lenassi, Carvalho, Thormann, et al. EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders. J Med Genet 2023; 60: 810.

66. Vossen, Verhagen, Grenman, et al. Role of variant allele fraction and rare SNP filtering to improve cellular DNA repair endpoint association. PLoS One 2018; 13(11): e0206632.

67. Gruendner, Wolf, Tögel, et al. Integrating genomics and clinical data for statistical analysis by using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): system design and implementation. J Med Internet Res 2020; 22(10): e19879.

68. Naik, Hameed, Shetty, et al. Legal and ethical consideration in artificial intelligence in healthcare: who takes responsibility? Front Surg 2022; 9: 862322.

69. Vassilakopoulou, Aanestad. Communal data work: data sharing and re-use in clinical genetics. Health Inf J 2019; 25(3): 511–525.

70. Horak, Griffith, Danos, et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet Med 2022; 24(5): 986–998.

71. Jäger. Bioinformatics workflows for clinical applications in precision oncology. Semin Cancer Biol 2022; 84: 103–112, Academic Press.

