Sage Journals: Discover world-class research

Abstract

Background:

Accurate documentation of tumours presents significant opportunities for advancing cancer research and improving patient care, yet it also poses challenges for healthcare management.

Objective:

This study aimed to assess the effectiveness and resource implications of source data verification (SDV) in enhancing the quality of tumour documentation data, focusing on accuracy, completeness and correctness.

Method:

Using tumour documentation data from a large German University Hospital, an SDV was conducted by an external audit group (group RE), comparing the data initially documented by the centre’s tumour documentalists (group TD) to available source documents for the years 2016–2020. The analysis set comprised 240 cases, with exemplary data fields strategically selected across various organ entities and other tumour features. Identified errors were cross-validated by a third group (group CO).

Results:

Visualisations depicted error frequencies by diagnosis year and organ entity. Potential errors were identified, providing feedback to the tumour documentation unit. However, uncertainties in error identification raised questions about the efficacy of SDV.

Conclusion:

While effective in identifying errors, SDV faced challenges due to ambiguous source data and potential bias from external auditors, as well as being deemed uneconomical. The study suggests SDVs suitability for small sample validation but questions its scalability for large datasets.

Implications for health information management:

Alternative methods, such as data exchange interfaces to subsystems or plausibility checks, are recommended for enhancing data quality. This study emphasises the need to explore alternatives for improving data quality in tumour documentation.

Keywords

data analysis data quality health information management quality indicators source data verification tumour documentation

Introduction

In medical research and clinical practice, accurate and reliable documentation of tumour-related data can serve as a prerequisite for a better understanding of cancer and improved patient care (Kourou et al., 2021). As medical data are increasingly converted to digital formats, data quality challenges become more relevant (Raghupathi and Raghupathi, 2014). Ensuring the integrity, completeness and accuracy of tumour documentation data is not only essential for research and clinical decision-making but also has the potential to significantly impact patient outcomes and treatments (Zarour et al., 2021). In this context, source data verification (SDV) has emerged as a potential strategy to improve data quality by validating different data quality indicators of source data against their documented representation (Nonnemacher and Stausberg, 2006).

In many countries, such as the United States (based on the Surveillance, Epidemiology, and End Results programme (Cronin et al., 2014), the United Kingdom (based on the Nation Cancer Registration and Analysis Service) (Henson et al., 2020) or Germany (based on the Oncological Base Dataset) (Basisdatensatz, 2021), standardised tumour documentation has been established, driven by certification requirements or legal mandates.

Tumour documentation data are derived from routinely collected information. The utilisation of routinely collected data carries a risk highlighted by the “garbage-in, garbage-out” phenomenon, which posits that the quality of output is contingent upon the quality of input within any given system (Kilkenny and Robinson, 2018). Additionally, transcription errors represent another potential source of inaccuracies in data entries, manifesting as errors that occur during the manual transcribing of information from unstructured sources into structured tumour documentation systems (Feng et al., 2020). Theoretically, the obligation to document should ensure the absence of data gaps; however practically, regulations in terms of data quality are often missing. While there is an awareness of the importance of high-quality data, the establishment of a structured and sustainable data quality measurement and enhancement process to ensure reliable data for health research, audits, certifications and broader quality assurance efforts is notably lacking (Jacke et al., 2012).

Borner et al. (2022) addressed this problem by conducting a study about a data quality analysis with a SDV for 2014 and 2015 at the Ludwig Maximilian University (LMU) Hospital, based on Comprehensive Cancer Centre (CCC) (Comprehensive Cancer Centre Munich, 2023) tumour documentation data. The term “SDV” entails the examination of collected data against original documents, such as laboratory reports, to ensure accuracy and reliability (Andersen et al., 2015).

The results in Borner et al.’s (2022) experiment were largely positive and indicated that SDV might serve as an effective tool for detecting errors in tumour documentation. However, retrospectively, a potential methodological problem was identified in which the local tumour documentalists themselves carried out the SDV within their own context. Consequently, it is possible that systematic errors that were considered correct within the local team of tumour documentalists might have been assessed as incorrect by an external auditor.

Motivated by this, the current analysis team embarked on a reassessment of SDV for the years 2016–2020, on the LMU-Hospital’s tumour documentation data focusing on selected data fields and utilising an external audit group not being part of the team of tumour documentalists, for the verification process. The aim was to re-evaluate the effectiveness and resource implications of SDV as a means of enhancing data quality in tumour documentation. The key areas of concentration included the data quality indicators accuracy, completeness and correctness, as well as the cost-effectiveness of the method.

Method

Research design

The methodological framework employed in this study encompasses a SDV as its primary approach and quantifying the data quality indicators of completeness, correctness and accuracy within a subset of data extracted on 9 February 2021 from the LMU Hospital’s CREDOS system (Cancer Retrieval Evaluation and Documentation System), a local clinical cancer registry that is not publicly accessible (Nasseh et al., 2020; Voigt et al., 2010).

While the local tumour documentation system was relatively extensive, encompassing over 50,000 cases in the year 2022, and comprising more than 100 tables and 2000 data fields, a deliberate reduction in data scope to a sample (hereinafter the ‘analysis set’) was undertaken for the purposes of evaluation (1 February 2021). This reduction led to a final dataset of 240 cases. It is essential to underscore that the cases exclusively pertained to primary cases, as defined by Onkozert (2021), a certification-based designation best described as cases that received their primary treatment at one of the local cancer centres. In our context, these centres were affiliated with the CCC Munich. The selection of cases for inclusion in this analysis set was conducted with an emphasis on an equitable distribution across the years of diagnosis 2016 to 2020.

Additionally, the dataset was stratified based on the following organ entities, with equal representation for each: breast, colon, gynaecological, head and neck, lung, neurological, pancreas and prostate tumours. Furthermore, an additional selection criterion was applied, leading to the inclusion of a total of six cases per year and organ entity based on their initial therapy, where two cases received either surgery, radiation therapy or systemic therapy. This selection criterion was adopted to ensure a focus on representative data fields in the domains of diagnosis, therapy and progression, as shown in Box 1. All cases with the given data contents were exported into a study database.

Box 1.

Description of selected fields. Each field belongs to one of the three categories: first assessment; therapy; progression.

First assessment
Diagnosis	Diagnosis code according to ICD-10-GM. International Statistical Classification of Diseases and Related Health Problems (ICD-10, 2019)
Histology	Histology code according to the ICD-O-3 catalogue (morphology code) (Brierley et al., 2017)
Date of diagnosis	Date of diagnosis
Grading	Grading according to the TNM classification
T	Size or direct extent of the primary tumour according to the TNM classification (Brierley et al., 2017)
N	Degree of spread to regional lymph nodes according to the TNM classification (Brierley et al., 2017)
M	Presence of distant metastases according to the TNM classification (Brierley et al., 2017)
R-classification	Overall assessment of the resection margin of the cancer, including possible distant metastases (Brierley et al., 2017)
Therapy
Date of therapy (start)	Starting date of a therapy
Date therapy (end)	Ending date of a therapy
Operation: OPS-code	Description of all performed procedures or operations according to the OPS systematic (Bundesinstitut für Arzneimittel und Medizinprodukte, 2023)
Systemic therapy: Substance	Active substances in the given systemic therapy protocol.
Radiation: Total dose	Total dose during radiation
Radiation: Area of radiation	Area of radiation
Progression
Overall assessment	Overall assessment of the disease (e.g. progression or remission)
Tumour status	Changes in the primary tumour
Status of metastases	Indication whether a metastasis is newly progressing
Lymph nodes	Overall assessment of the regional lymph nodes

OPS: operations and procedures; TNM: Tumour, Node, Metastasis.

The execution of this study was entrusted to two groups, group RE (re-evaluation) and group CO (control), both freshly affiliated with the CCC in terms of the study, albeit external to the tumour documentation team, but experienced in the field of medicine. This strategic choice was deliberate, serving the purpose of ensuring an absence of bias associated with team internal regulations governing tumour documentation.

Group RE was assigned the responsibility of re-documenting the provided sample in the absence of access to the preceding tumour documentation. Group RE was given a blank version of the research database, only containing patient identification numbers and Supplemental Material about the year of diagnosis, entity and the required documentation in terms of therapeutic interventions, encompassing surgical procedures, systemic therapy or radiation treatments. Like the tumour documentalists on the internal team, group RE then got access to all source data, such as doctors’ notes and radiation therapy reports. This allowed group RE to fill in the missing parts of the study database. The initial data collection process was straightforward, focusing only on the fields selected for the study. For therapy-related fields, only those associated with the first therapy were documented. In this process, it was revealed that the operations and procedures (OPS) code consistently exhibited full correctness, as the information was semi-automatically imported from a SAP i.s.h.med-based surgery subsystem (Hoeksma, 2009). Consequently, contrary to the initial plan, this data field was excluded from the analysis.

Regarding the patient’s progress, it was initially decided to document only the fields of the most recent progress in relation to the export date of the sample. However, this approach was abandoned during the data collection phase (a decision we discuss in the subsequent discussion section of this article).

Prior to data collection, a copy of the database containing exports from the tumour documentation system was created and isolated from group RE. Later, this enabled a computational comparison between the original data collection from the tumour documentation conducted by group TD (tumour documentalists) and the independent data collection by group RE. During data reconciliation, three types of errors were distinguished:

Completeness errors: Documentation for a data field was present for either group RE or group TD but not both.

Correctness errors: Documentation was present in both datasets but exhibited significant discrepancies (e.g. ICD-10 diagnosis codes C34 and C55).

Accuracy errors: Documentation was available in both datasets, but the discrepancy was considered an inaccuracy based on analysis specific rules (e.g. ICD-10 diagnosis C34.9 instead of C34.3). A listing of our accuracy rules can be found in Box 2.

Box 2.

Description of accuracy errors and how it was applied.

Data field	Accuracy rule
ICD-10	The first three characters match but the sub-classification differs (e.g. “C50.9” vs “C50.1”). Correctness error: The first three characters do not match.
Date of diagnosis	14 days tolerance
Date of therapy	14 days tolerance
Date of therapy end	14 days tolerance

A pitfall observed in Borner et al.’s (2022) work was the attribution of errors solely to the detriment of the tumour documentarians (group TD). In reality, it is plausible that errors could have been made by group RE as well. While our study regarded concurrence in errors as genuine agreements, all cases of disagreement were re-evaluated by a third individual (group CO), who also had access to source data and previous documentation but without the knowledge of its origin (group RE or group TD). Group CO thus adjudicated responsibility for errors. There were scenarios where group CO did not concur with the findings of either group RE or group TD. In these instances, the results from group CO had to be neglected due to ambiguity. The complete workflow is illustrated in Figure 1.

Figure 1.

SDV process conducted by group RE and group CO.

The following error classes were defined for the final evaluation (see Box 3). Using these error classes, the analysis results were visualised using Python v. 3.8.8 (Python, 2021). If group RE committed an error, it was annotated as ACC_RE, COR_RE, and COM_RE. For group TD as ACC_TD, COR_TD, COM_TD. In the event that group CO determined the documentations of both groups to be incorrect, the labels ACC_CO, COR_CO, and COM_CO were applied.

Box 3.

Error categorisation for accuracy, correctness and completeness in documentation for groups RE, TD and CO, including uncertainty (U) for group CO.

Group	Accuracy	Correctness	Completeness	Uncertainty
Group RE	ACC_RE	COR_RE	COM_RE
Group TD	ACC_TD	COR_TD	COM_TD
Group CO	ACC_CO	COR_CO	COM_CO	U

RE: re-evaluation; CO: control; TD: tumour documentalists.

Ethics approval

Formal ethics approval from a Human Research Ethics Committee (HREC) was considered unnecessary as the data were retrospective, de-identified and aggregated for publication.

Results

Figure 2 portrays the relative frequency of errors uniquely made in the original tumour documentation, focusing on the distribution of errors related to correctness, completeness and accuracy within the individual audit fields. This chart essentially represents the result of the SDV process, evaluated by group RE and validated by group CO. Errors or uncertainties of both groups TD and RE according to group CO, due to ambiguity, were excluded from this graph. It should be noted again that the audit fields in the domain of progressions were not used due to methodological difficulties (see ‘Discussion’ section).

Figure 2.

Individual error rates of group TD.

Figure 3(a and b) (first assessment) and Figure 4(a and b) (therapies), on the other hand, compares the distribution of the earlier introduced error classes also concerning uncertainties or errors made by both group TD and RE (according to group CO). In these graphs column TD again displays the errors attributed to group TD, while column RE shows the errors attributed to group RE by group CO. The last column represents the fields where neither group RE nor group TD, according to group CO, were deemed correct, or cases in which group CO was uncertain (degree of ambiguity). ‘

Figure 3.

(a and b) Potential error classes of fields related to first assessment grouped by group RE, group TD or ambiguity (CO) with graph A showing relative error rates and graph B showing absolute numbers.

Figure 4.

(a and b) Potential error classes of fields related to therapies grouped by group RE, group TD or ambiguity (CO) with graph A showing relative error rates and graph B showing absolute numbers.

The errors, categorised by type (correctness, completeness and accuracy), uniquely made by Group TD were also analysed over the years, providing insights into potential trends in data quality (see Figure 5). Figure 6 provides insights into the level of accuracy in date fields, as these are often suspected to exhibit accuracy errors, rather than errors related to correctness or completeness. A substantial proportion of discrepancies remain within a narrow margin from the original date (14 days), a range that is unlikely to have a significant impact on research outcomes and is therefore classified as an accuracy error. Most deviations beyond this threshold are typically within approximately 50 days. However, a small number of extreme outliers are evident, representing clear and significant errors that may warrant closer examination.

Figure 5.

Potential error rates over the years of group TD.

Figure 6.

Accuracy error assessment (date fields): The blue region (14 days) delineated signifies the interval where date errors are exclusively evaluated as accuracy errors and do not carry the classification of correctness errors.

Next, the performances of group RE and group TD were re-evaluated (excluding those errors made by both according to group CO). Heat maps were generated to represent error frequencies, stratified by entity (Figure 7) and by diagnosis year (Figure 8). The entities/organs were anonymised in the statistics to prevent inferences about individual documentalists. While the majority of combinations of entity and data field (Figure 7), as well as year of diagnosis and data field (Figure 8), are entirely error-free, certain areas emerge as potential sources of concern, with darker colours indicating more critical issues. Moreover, it becomes apparent that group RE (positioned at the top of each figure) and group TD (positioned at the bottom of each figure) exhibit issues in distinct and different areas.

Figure 7.

Potential errors by group RE (a) and group TD (b) per anonymised organ entity (A–H) versus individual data fields.

Figure 8.

Potential errors by group RE (a) and group TD (b) per diagnosis year versus individual data fields.

Regarding the whole set of analyses fields (according to group CO), group RE achieved a total (total amount of errors/total amount of fields) correctness rate of 96.4%, while group TD achieved 97.4%. Completeness rates were 98.7% for group RE and 95.8% for group TD, respectively, with accuracy rates of 96.7% for group RE and 97.4% for group TD.

In the overall process, group RE invested 300 hours examining the source data and re-documenting the relevant data items to set up the analysis database. The examination of our complete set of fields per case was projected to demand around 60 minutes for both group RE and CO. The TD group required approximately 30 minutes for rectifying individual errors (feedback).

Discussion

High-quality health data, such as tumour documentation, play a crucial role in tasks like clinical decision-making, research advancements or external certifications. However, uncertainties, inconsistencies or errors can undermine trust in its reliability, underscoring the importance of robust mechanisms to ensure transparency and data quality (Lighterness et al., 2024; Zarour et al., 2021). This study sought to address these concerns by evaluating the practical utility of SDV within the context of tumour documentation. By comparing the performance of internal tumour documentalists with that of an external audit group, we aimed to not only assess the reliability of tumour documentation but also to explore the potential biases introduced by different methodological setups. With this new approach, we believe our findings provide a more realistic perspective on the strengths and limitations of SDV in this specific setting. The results revealed that the quantification of correctness and completeness (group TD) appears to be lower compared to Borner et al.’s (2022) analysis conducted a few years ago (correctness 97.3% vs 99.2%, completeness 95.8% vs 99.0%) even though, a portion of correctness errors in the current study has been classified as accuracy errors (accuracy: 97.4%).

However, we posit that the observed discrepancy may not necessarily indicate a deterioration in local data quality. Rather, it might suggest that the decrease of data quality has been affected by the new experimental setup. An aspect is that the evaluation team in the previous study (Borner et al., 2022) were integral members of the internal team of tumour documentalists. As such, they controlled their own data through SDV, potentially leading to an underestimation of errors simultaneous with an overestimation of data quality due to internal rules and regulations. Furthermore, it is essential to consider that the analysed data fields between the two studies varied. In our current study, certain fields, such as the OPS code, were excluded from the analysis due to its inherent nature of being semi-automatically integrated, leading to full correctness and completeness in regard to transcription errors.

While bias introduced through internal verification has been successfully eliminated in the current setup, a novel concern has emerged compared to the experiment conducted by Borner et al. (2022). The team of documentalists possesses a higher level of training within their respective field compared to the group RE. Group RE included a dentist with a completed degree, a family history of cancer and relevant medical experience, which fostered a strong interest in oncology. While possessing general knowledge in the medical domain, the participant lacked formal training in tumour documentation. To address this, introductory training related to the fields under investigation was provided, and additional support was offered by the CCC staff, including experienced documentalists, when challenges in documentation arose. While external auditors help mitigate bias, it would have been advantageous if the auditors themselves were also external tumour documentalists. Ideally, the experimental design would have facilitated an exchange of tumour documentalists between multiple university hospitals or cancer registries, ensuring that all auditors possessed comparable levels of training and expertise in tumour documentation. However, this optimal setup was not feasible due to funding constraints. It is important to note, though, that even among regular tumour documentalists, there is typically no standardised national training programme, and professionals in this field often come from diverse educational and professional backgrounds.

Nonetheless, the educational background of the auditors could be perceived as the main limitation of this study. However, it is crucial to note that the findings of group RE underwent validation by a separate group, group CO, which also possesses medical expertise (Master’s students specialising in Medical Information Management and Nutrition) and received similar support from the CCC staff when needed. What emerged from this arrangement was that group CO, in fact, identified a comparable number of errors in the disagreements of TD and RE. Additionally, it should be noted that group CO also, at times, encountered difficulties in discerning who was at fault. Due to this ambiguity, when measuring the error rates of group TD, we mainly focused on errors where group RE and group CO agreed. Still, these problems display that even in those cases, a certain degree of ambiguity remains and it is crucial to emphasise that the errors identified may not be definitive errors; instead, they may serve as indications or markers of potential errors. This outcome raises questions regarding the efficacy of SDV as a standard tool for improving data quality for tumour documentation.

SDV proves efficient when information is strictly confined to specific source documents ensuring the absence of ambiguity. However, tumour documentation faces a difficulty: the source of documented data lacks traceability (as the source itself is usually not documented), and varying sources for the same data field are present at different time points. This issue is being addressed in many studies where source documentation is often available, detailing which information originates from which source (Bargaje, 2011). However, this is usually not the case in tumour documentation. For example, information about the diagnosis code may originate from both medical reports or pathology findings. It might even be possible that, at the time of documentation, certain information may not have been available. Logically, the individuals documenting later in time (in this case, group RE documented after group TD) possess more information/source documents since the tumour documentation process aims to record information as promptly as possible, usually within 3 months. As described in Giusti et al. (2023) there is usually a trade-off between timeliness and both completeness and validity in data quality.

In our study, the issue of ambivalent source data has been particularly pronounced in the monitoring of disease progression and the identification of the original source document reached a level where the evaluation became devoid of meaningful interpretation. Ultimately, this led us to exclude the examination of progress fields from the methodology.

If we focus solely on potential errors within the context of group TD, we see a decline in completeness in the year 2020 (refer to Figure 5). Upon examining the heatmaps, it becomes evident that especially the fields related to “Tumour, Node, Metastasis” (TNM) are the cause, as illustrated in Figure 8. While subject to interpretation, we posit that this phenomenon may be attributed to site specific internal changes in the documentation of TNM. Based on our own experiences, TNM appears to be one of the most inconsistently documented fields in Germany, lacking semantic clarity on how TNM should be documented. Therefore, the error may also stem from the fact that group RE and group CO, despite operating independently, interpreted the documentation of TNM differently than group TD, suggesting that these are potential rather than definitive errors.

Furthermore, it is noteworthy that certain issues manifest only in specific organ entities, indicating a non-uniform occurrence of the phenomenon. For instance, in Entity A (refer to Figure 7(b)), we observe a 33% error rate in accuracy, implying that group RE was able to provide a more accurate diagnosis than group TD in these cases. This discrepancy could be attributed to the timeliness of the source documents, indicating that group RE and CO had access to more recent and precise data, or it may signify an individual problem specific to a particular tumour documentalist. Another example is the recurring issue of poor completeness in TNM, particularly notable in organ E.

A distinctive aspect of our study was the nuanced categorisation of deviations not solely as correctness errors but also differentiated as accuracy errors based on our own defined criteria (see Box 2). This methodology carries medical significance, particularly in instances where subtle variations in factors like diagnosis dates have negligible effects on calculations such as survival curves. Considering the roughly equal prevalence of correctness and accuracy errors in our results, we advocate for the adoption of this refined distinction in other potential SDV projects.

Despite the challenges encountered, as common after an SDV, we can report potential errors as feedback, or at least as error indicators to the tumour documentation unit. Feedback mechanisms play a crucial role in identifying and correcting errors, ensuring continuous improvement in both planning and execution phases (Oyebode, 2013). The feedback would allow the TD team to rectify potential errors. However, since these are only potential errors, the tumour documentation unit would essentially also need to undertake unnecessary work, including re-examining valid cases.

Ultimately, the data quality is certain to benefit from such a measure; however, there is a need to scrutinise the time investment and economic viability associated with it. In our assessment, the benefits of regular SDV interventions in the realm of tumour documentation, aimed at rectifying errors, do not justify the expended effort. In our study, the time investment required for the SDV process was systematically recorded without any predefined assumptions regarding its expected duration. Although we only verified selected fields rather than entire cases, we found that this still required nearly full review of the respective cases, as the necessary information was often distributed across multiple sources. Even for this targeted sample-based approach, the effort was substantial. Identifying errors in specific fields thus often resembled the workload of complete re-documentation. This level of effort makes widespread implementation of SDV economically and practically unfeasible.

However, it appears to be suitable for uncovering individual systematic problems, such as (in our study) incompleteness in TNM staging by specific staff members, which could be re-trained accordingly after receiving feedback. For more sensitive and wide spread error detection and quality improvement, we suggest exploring alternative approaches. An alternative method for enhancing data quality appears to be the direct integration of subsystems (instead of using source documents), as exemplified in the current scenario with the surgery system. Pathology and pharmacy systems, in particular, offer potential for this approach. However, it is technically demanding, albeit effective in preventing transcription errors entirely. Other options could involve implementing plausibility checks, standardisation of tumour documentation education, or refining the semantics of data standards.

Finally, it is important to emphasise that this work is not intended to cast a negative light on tumour documentation. Rather, the errors observed in group RE highlight the difficulty in achieving completely error-free documentation. Additionally, due to the case of unclear source documentation, it may not always be precisely clear what constitutes an error. Therefore, the profession should be valued accordingly.

Conclusion

In conclusion, our findings suggest that, based on the employed methodology, SDV is most suitable for validating small samples and identifying systemic errors in tumour documentation data. The time investment required for a thorough examination of the entire dataset is not economically justifiable and does not yield a favourable cost–benefit outcome. The obstacle of data quality, especially in tumour documentation, poses a challenge to trust. Hence, it is advisable to also explore alternative methods, including automated data transfer from subsystems, conducting plausibility checks, and making semantic enhancements to existing standards, to effectively enhance data quality in this particular context.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iDs

Pauline Hubert, BSc(Dentistry)

Julia Kasprzak, MA

Lara Kazmaier, MA

Lisa Knaier

Theres Fey, PhD

Daniel Nasseh, PhD

References

Bargaje

(2011) Good documentation practice in clinical research. Perspectives in Clinical Research 2(2): 59–63.

Basisdatensatz (2021) Einheitlicher onkologischer Basisdatensatz. Available at: https://basisdatensatz.de/basisdatensatz (accessed 16 August 2022).

Borner

Schweizer

Fey

, et al. (2022) A source data verification-based data quality analysis within the network of a German comprehensive cancer centre. In: Tschudin

Brown

(eds.) Transdisciplinary Perspectives on Public Health in Europe: Anthology on the Occasion of the Arteria Danubia Project. Wiesbaden: Springer Fachmedien Wiesbaden, pp.189–200.

Brierley

Gospondarowicz

Wittekind

(2017) The TNM Classification of Malignant Tumours, 8th edn. Chichester: Wiley Blackwell.

Bundesinstitut für Arzneimittel und Medizinprodukte (2023) Operationen- und Prozedurenschlüssel (OPS). Available at: https://www.bfarm.de/DE/Kodiersysteme/Klassifikationen/OPS-ICHI/OPS/_node.html (accessed 27 March 2025).

Comprehensive Cancer Centre Munich (2023) Comprehensive Cancer Centre Munich. Available at: https://www.ccc-muenchen.de/ccc-muenchen (accessed 27 March 2025).

Cronin

Ries

Edwards

(2014) The surveillance, epidemiology, and end results (SEER) programme of the National Cancer Institute. Cancer 120: 3755–3757.

Feng

Anoushiravani

Tesoriero

, et al. (2020) Transcription error rates in retrospective chart reviews. Orthopedics 43(5): e404–e408.

Giusti

Martos

Negrão Carvalho

, et al. (2023) Quality indicators: Completeness, validity and timeliness of cancer registry data contributing to the European Cancer Information System. Frontiers in Oncology 13: 1219128.

10.

Henson

Elliss-Brookes

Coupland

, et al. (2020) Data resource profile: National cancer registration dataset in England. International Journal of Epidemiology 49(1): 16–16h.

11.

Hoeksma

(2009) Siemens buys i.s.h.med from T-Systems Austria. Digital Health. Available at: https://www.digitalhealth.net/2009/09/siemens-buys-ishmed-from-t-systems-austria (accessed 6 March 2025).

12.

International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10), Chapter II Neoplasms (C00-D48) (2019) Available at: https://icd.who.int/browse10/2019/en#/II (accessed 7 July 2022).

13.

Jacke

Kalder

Koller

, et al. (2012) Systematische Bewertung und Steigerung der Qualität medizinischer Daten. Bundesgesundheitsblatt – Gesundheitsforschung – Gesundheitsschutz 55(11): 1495–1503.

14.

Kilkenny

Robinson

(2018) Data quality: “Garbage in–garbage out”. Health Information Management Journal 47(3): 103–105.

15.

Kourou

Exarchos

Papaloukas

, et al. (2021) Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Computational and Structural Biotechnology Journal 19: 5546–5555.

16.

Lighterness

Adcock

Scanlon

, et al. (2024) Data quality-driven improvement in health care: A systematic literature review. Journal of Medical Internet Research 26: e57615.

17.

Andersen

Byrjalsen

Bihlet

, et al. (2015) Impact of source data verification on data quality in clinical trials: an empirical post hoc analysis of three phase 3 randomized clinical trials. British Journal of Clinical Pharmacology 79(4): 660–668.

18.

Nasseh

Schneiderbauer

Lange

, et al. (2020) Optimizing the analytical value of oncology-related data based on an in-memory analysis layer: Development and assessment of the Munich online comprehensive cancer analysis platform. Journal of Medical Internet Research 22(4): e16533.

19.

Nonnemacher

Stausberg

(2006) Leitlinie zum adaptiven management von datenqualität in kohortenstudien und registern [Guideline for adaptive management of data quality in cohort studies and registries]. Ergebnisse eines durch die Telematikplattform für Medizinische Forschungsnetze (TMF) eV geförderten Projektes, derzeit noch nicht publiziert.

20.

OnkoZert (2021) OnkoZert. Available at: https://www.onkozert.de/ (accessed 30 February 2021)

21.

Oyebode

(2013) Clinical errors and medical negligence. Medical Principles and Practice 22(4): 323–333.

22.

Python

(2021) Python. Python releases for Windows.

23.

Raghupathi

(2014) Big data analytics in healthcare: Promise and potential. Health Information Science and Systems 2: 1–10.

24.

Voigt

Steinbock

Scheffer

(2010) CREDOS 3.1 ein Baukasten zur Tumourdokumentation für epidemiologische, klinische, tumourspezifische und zentrumsregister integriert in das Kis Sap/r3 Is-h: Po344. Onkologie 33: 52.

25.

Zarour

Alenezi

Ansari

MTJ

, et al. (2021) Ensuring data integrity of healthcare information in the era of digital health. Healthcare Technology Letters 8(3): 66–77.

Is source data verification a valid tool to improve data quality of tumour documentation data? A critical assessment