Abstract
Background:
Remote medical scent detection of cancer and infectious diseases with dogs and rats has been a growing field of research over the last 20 years. If validated, such a technique could be implemented in the clinic, which raises many hopes. This systematic review was performed to assess the evidence for and performance of such methods and their potential relevance in the clinic.
Methods:
The PubMed and Web of Science databases were independently searched following PRISMA standards for articles published between 01/01/2000 and 01/05/2021. We included studies aiming at detecting cancers and infectious diseases affecting humans with dogs or rats. We excluded studies using other animals, studies aiming to detect agricultural diseases or diseases affecting animals, and studies of other conditions such as diabetes and neurodegenerative diseases. Only original articles were included. Data about patient selection, samples, animal characteristics, animal training, testing configurations, and performances were recorded.
Results:
A total of 62 studies were included. Sensitivity and specificity varied widely among studies: while some publications report sensitivities as low as 0.17 and specificities around 0.29, others report a sensitivity and specificity of 1. Only 6 studies were evaluated in a double-blind screening-like situation. In general, the risk of performance bias was high in most evaluated studies, and the quality of the evidence found was low.
Conclusions:
Medical detection using animals’ sense of smell so far lacks the evidence and performance needed to be applied in the clinic. What odors the animals detect is not well understood. Further research should be conducted, focusing on patient selection, samples (choice of materials, standardization), and testing conditions. Extrapolation of such results to free-running detection (direct contact with humans) should be made with extreme caution. Considering this synthesis, we discuss the challenges and highlight the excellent odor detection threshold exhibited by animals, which represents a potential opportunity to develop an accessible and non-invasive method for disease detection.
Keywords
1. Background
1.1. The Burden of Cancer and Infectious Diseases Worldwide
Cancer and infectious diseases are major health issues among men and women and are among the most common causes of morbidity and mortality worldwide. On the one hand, there were an estimated 18.1 million (95% UI: 17.5-18.7 million) new cases of cancer (17 million excluding non-melanoma skin cancer) and 9.6 million (95% UI: 9.3-9.8 million) deaths from cancer (9.5 million excluding non-melanoma skin cancer) worldwide in 2018. 1 On the other hand, infectious diseases such as urinary tract infections (UTI), tuberculosis, Clostridium difficile infections, methicillin-resistant Staphylococcus aureus (MRSA) infections, pandemic outbreaks like Ebola, and most recently severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also claim many lives.
1.2. Early and Rapid Detection of Diseases
Disease detection is the first step before diagnosis and care. However, detection is not easily accessible everywhere. Efforts to control infectious diseases or detect cancers early would benefit from new screening technologies.2,3 For instance, early diagnosis could reduce mortality for many cancer types. 4 Likewise, quick, reliable, and widespread testing is vital to control a pandemic.
1.3. Need for a Noninvasive Low Cost and Reliable Detection Method
Diagnostics rely on direct imaging or on collecting samples from individuals or contaminated environments, transportation of samples to a laboratory, and subsequent laboratory testing to demonstrate the presence or the absence of the pathogen of interest. This results in a significant delay in response times and containment efforts. These procedures can be invasive and require skilled human resources and costly equipment and consumables depending on the diseases. A desirable screening method should be noninvasive, painless, inexpensive, and easily accessible to many patients. In addition, it should allow diagnosis at an early stage. 5
1.4. Diseases Emit VOCs
The human body emits Volatile Organic Compounds (VOCs) originating as a result of normal biochemical or physiological processes (endogenous processes), after absorption of external contaminants (eg, food), and by bacterial metabolism (eg, armpit odor).6-9 VOCs are organic chemicals with a high vapor pressure at typical room temperature, resulting in evaporation or sublimation of the molecules into the air surrounding the source. It has been shown previously that some diseases emit specific VOCs. 9 Disease-related VOCs may be found in the blood, breath, feces, skin, sputum, sweat, urine, and vaginal secretions of affected individuals. Such a signal could pave the way for a new detection technique: using VOCs as biomarkers for disease detection. Research investigating the VOCs profiles associated with various human diseases is underway, primarily driven by the goal of developing instrumentation for use in clinical diagnostics.10,11
Currently, intensive studies are being carried out to identify compounds that could be markers of cancer12-17 and could eventually support or even replace traditional screening methods. To this end, techniques such as gas chromatography-mass spectrometry (GC-MS) have already been developed, and several research teams and companies aim at developing electronic noses (e-noses).18-20 At present, the development of this technology is limited by the high cost of the necessary laboratory instrumentation and by difficulties in standardizing sample collection and preparation procedures in clinical settings. 21 These limitations can, for instance, be due to detection thresholds, non-optimized odor-capturing materials, low signal-to-noise ratio, costs, and the complexity of both the chemical signature and the subsequent data analysis. It is worth noting that the origin and the nature of VOCs emitted by cancers are not well understood. Whether the chemical signature originates from the tumor, from the tumor environment, or both is still under investigation.15,22
1.5. The Consistent Use of Animals to Detect Diseases
Several studies evaluating and reporting the potential ability of trained animals to detect certain diseases thanks to their sense of smell have raised many hopes.23,24 Sense of smell has been extensively studied and is reported to be highly developed in certain species. 25 Primarily, the canine sense of smell has been deeply investigated.26-28 Dogs have been trained to locate explosives, illicit drugs, banknotes, missing persons, and disaster victims.29,30 Rats and several other animals have also been successfully trained to identify targeted substances. 31
Animal olfactory detection of human diseases has attracted increasing interest from researchers in recent years. 28 In 1989, the first case was reported where a dog seemed to have detected his master’s melanoma. 32 Similar cases have been reported in the following years.33,34
Anecdotal findings support the hypothesis that some animals could potentially be used to detect diseases. However, these findings alone do not ensure that animals can be employed as reliable tools to detect diseases. Furthermore, it should be noted that training dogs incurs costs and requires time and an appropriate facility. Therefore, this potential new tool must be further explored and developed following a scientific method.
Several structured research programs have reported the abilities of dogs, rats, ants, and other animals to detect diseases such as cancers, diabetes, epilepsy, tuberculosis, malaria, urinary tract infections (UTI), and SARS-CoV-2, among others.35-40 These studies focus on animal capabilities and seek to optimize sampling protocols and materials, storage and use of odors, scent line-up parameters, animal welfare, and testing conditions. To do so, research programs usually bring together several professionals such as medical staff, chemists, biologists, physicists, statisticians, data scientists, veterinarians, ethologists, and dog handlers.
1.6. Still Many Unanswered Questions
Because of the inconsistent findings reported in this body of research and the complexity of scent detection research, it is difficult to ascertain the potential value of animal detectors in diagnostics.22,41,42 To our knowledge, no disease-specific VOC has been identified so far despite the number of studies reporting the ability of trained animals to detect diseases. With a rising number of publications tackling this issue, a structured and objective review of the state of the art seemed necessary.
1.7. Objectives
In this systematic review, we aim to outline the performances (sensitivity, specificity), as published in peer-reviewed research, of trained dogs and rats in distinguishing human cases of cancer or infectious disease from controls using their sense of smell. Additionally, methodological issues leading to inconsistencies in research are reviewed, and recommendations to improve performances are given. We excluded studies using other animals (nematodes, 43 insects44,45), studies aiming at detecting agricultural diseases, animal infections (dogs, cows, ducks46-49), and other diseases (hypo/hyperglycemia, neurodegenerative diseases, where the dogs mostly serve for both assistance and detection). Only original articles published between 01/01/2000 and 01/05/2021 were included, and reviews were excluded.
2. Methods
2.1. Literature Search
PubMed and Web of Science were independently searched based on the standards of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). 50 Studies published from 01/01/2000 to 01/05/2021 about detecting cancer and infectious diseases (hypo/hyperglycemia and neurodegenerative diseases were excluded) in humans (agricultural diseases, animal infections, and animal cancers were excluded) by dogs’ or rats’ sense of smell were retrieved. The search strategy was adjusted for each database. For PubMed the following string was employed: ((“Dogs”[Mesh]) OR (canine*) OR (“Rats”[Mesh]) OR (“Mice”[Mesh])) AND ((“Volatile Organic Compounds”[Mesh]) OR (Volatile AND Organic AND Compound*) OR (“Odorants”[Mesh]) OR (odor*) OR (odour*) OR (“Smell”[Mesh]) OR (nose) OR (scent*) OR (sniff*) OR (olfact*)) AND ((“Disease”[Mesh]) OR (“Neoplasms”[Mesh]) OR (cancer)). For Web of Science the following query was used: (TI= ((dog* OR canine* OR rat* OR mouse OR mice) AND (cancer* OR neoplasm* OR disease*) AND (smell* OR scent* OR sniff* OR olfact* OR odo$r* OR volatomic* OR (volatile organic compound*) OR (volatile* AND organic* AND compound*)))) OR (AB = ((dog* OR canine* OR rat* OR mouse OR mice) AND (cancer* OR neoplasm* OR disease*) AND (smell* OR scent* OR sniff* OR olfact* OR odo$r* OR volatomic* OR (volatile organic compound*) OR (volatile* AND organic* AND compound*)))). The retrieved records were then screened to retain original articles only. Articles on non-human disease detection and articles written in a language other than English were excluded.
2.2. Study Selection and Eligibility Criteria
In total, 5665 records were identified (2057 from PubMed and 3608 from Web of Science), of which 210 were duplicates (Figure 1). We reviewed the remaining 5455 titles and abstracts to identify relevant studies. Three authors (P.B., M.L., and I.F.) independently reviewed the abstracts and/or full-text manuscripts and selected those regarded as relevant. There was no disagreement on the selection of articles among the 3 reviewers. Original articles describing research on animal olfactory detection of human disease by dogs or rats, using samples collected from human participants, were selected for inclusion. Review articles and articles not directly relevant to the topic were excluded. Sixty-two full-text papers were reviewed for inclusion; none were excluded after full-text review. A total of 62 papers were therefore included in the systematic review.

Figure 1. PRISMA flowchart.
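The record counts given in Section 2.2 can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function and key names are ours, and the number of records excluded at title/abstract screening is not stated in the text but derived here under the assumption that every record not reaching full-text review was excluded at that stage.

```python
# Record counts as reported in Section 2.2.
PUBMED_RECORDS = 2057
WEB_OF_SCIENCE_RECORDS = 3608
DUPLICATES = 210
FULL_TEXT_REVIEWED = 62
EXCLUDED_AFTER_FULL_TEXT = 0


def prisma_flow(pubmed, wos, duplicates, full_text, excluded_full_text):
    """Return the number of records at each stage of the selection flow."""
    identified = pubmed + wos
    screened = identified - duplicates
    # Assumption: records that did not reach full-text review were
    # excluded during title/abstract screening.
    excluded_at_screening = screened - full_text
    included = full_text - excluded_full_text
    return {
        "identified": identified,
        "screened": screened,
        "excluded_at_screening": excluded_at_screening,
        "included": included,
    }


flow = prisma_flow(
    PUBMED_RECORDS,
    WEB_OF_SCIENCE_RECORDS,
    DUPLICATES,
    FULL_TEXT_REVIEWED,
    EXCLUDED_AFTER_FULL_TEXT,
)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

The identified, screened, and included counts reproduce the 5665, 5455, and 62 reported above.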
2.3. Data Extraction
The relevant data were extracted from the 62 selected peer-reviewed journal articles. A standardized table was designed to abstract the studies of interest. Information abstracted from each study included: details on the articles (Table 1), patients (Supplementary Table 1), samples (Supplementary Table 2), animal details (Supplementary Table 3), test setting (Supplementary Table 4), performances (Supplementary Table 5), and the number of samples used (Supplementary Table 6). All graphics and tables were made in Microsoft Excel.
Table 1. Studies Included Within the Systematic Review.
Abbreviations: UC, unclear; FC, forced choice.
3. Results
3.1. Systematic Presentation and Synopsis of the Characteristics and Findings of the Included Studies
The last 20 years have seen an increase in the number of publications dealing with disease detection by dogs and rats (Figure 2). Starting with case reports in 1989, research progressed to several proofs of principle in the early 2000s, followed by more complex studies in the last decade. A summary of key information from each of the studies is provided in Table 1, including the disease targeted for detection, the type of body matrices used, the animal detector, and the sensitivity and specificity reported.

Figure 2. Evolution of the number of peer-reviewed publications per year among the selection.
Cancer detection has received the most attention, with two-thirds of the studies targeting one or more cancers.5,52-55,57-62,66-69,71-74,77,78,80-82,85-88,91-97,99-101,105 The remaining studies targeted tuberculosis, MRSA, 84 malaria, 36 UTI, 79 Clostridium difficile,63,70,83,89,98 and COVID-19 102-104,106-110 (Figure 3). Notably, since the COVID-19 outbreak in 2020, 8 original articles reporting the ability of dogs to detect COVID-19 have already been published, and more will possibly follow.

Figure 3. Proportion of diseases (upper) and different cancers (lower) over the included studies.
3.2. Selection of Subjects
A diagnostic test is designed to accurately discriminate patients from controls. Therefore, the choice of patients and controls is critical. Patient and control description among reviewed studies is reported in Supplementary Table 1.
Positive patients selected are subjects diagnosed with the disease of interest before any treatment. Diagnosis is mainly done with a reference test corresponding to the gold standard (histology, imaging, PCR & immunoassays). Histopathologic diagnosis is usually the reference test for cancer. The accuracy of the reference tests is, however, not systematically reported among reviewed studies.
Controls are of several types: (i) Healthy volunteers (healthy = absence of the disease of interest) who do not have and never had the disease; (ii) Healthy volunteers who do not have the disease anymore; (iii) Volunteers diagnosed with other diseases than the one of interest.
The absence of the targeted disease is not the only criterion for the selection of controls. Some teams report matching age, gender, skin color, smoker status, diet, symptoms, and other comorbidities to limit confounders. Several studies, however, included controls that were not matched to patients on these criteria. A major drawback is that controls were not screened for the disease in most reviewed studies. Unscreened controls may carry the undiagnosed disease, so their samples can be false negatives.
3.3. Type of Body Matrix and Logistics
3.3.1. Diversity of sampled body matrices
When detection was designed to be done without contact between patients and animals, several types of body matrices were collected to present odors to the animal detector. Urine is the main body fluid used (n = 20), followed by breath (n = 18), saliva (n = 10), skin secretions (n = 8), cell cultures (n = 7), feces (n = 6), blood/serum (n = 5), tissue (n = 4), and smear (n = 1). Direct contact with patients or infected areas was used in 4 publications. The total exceeds the number of publications, as some studies report using several types of samples. These data are reported in Supplementary Table 2, as well as in Figure 4. Three studies performed detection in direct contact between animals and humans (patients and controls).

Figure 4. Bar chart of the different body matrices employed within the included studies.
3.3.2. Sampling materials and protocols
Sampling materials and devices to capture VOCs are reported in Supplementary Table 2. For urine, blood, and feces, no material was specially designed to capture VOCs efficiently. Only a receptacle (container, jar, cup, vial) was used. For sweat sampling, cotton pads are usually used. The composition of these pads is not always well described. For breath, 3 types of materials/containers were used: (i) Only a container (eg, breath sampling bag 60 ); (ii) A tube filled with an absorber (eg, cylindrical polypropylene organic vapor sampling tube (Defencetek, Pretoria, South Africa) 53 ); (iii) Other materials (eg, face mask taken off and placed into a Ziploc® bag 81 ). Tissues do not require specific sampling materials. In general, the choices of sampling materials are poorly justified, and the characteristics and properties of the materials that capture VOCs are inaccurately described.
Sampling protocols are essential for reproducibility and limiting bias. They are reported for each study in Supplementary Table 2. They are well described primarily for 2 types of samples: exhaled air and skin secretions. For instance, Thuleau et al report that all patients and controls must shower with identical odorless soap before skin secretions sampling. 88 As well, some studies add information about fasting requirements before sampling, to limit biases (see Supplementary Table 2, column Sampling protocol).82,92-94
3.3.3. Sample storage conditions and conservation duration
Storage temperatures (T) are reported in Supplementary Table 2. They are described for 73% (45 out of 62) of the reviewed articles. We chose to classify them into 3 categories: (i) room temperature; (ii) cold: 0°C < T < 8°C; (iii) frozen: T < 0°C (Figure 5). However, the choices of storage temperatures are not explained except in Willis et al 59 : the team primarily stored samples at −80°C, which has been reported as the most desirable temperature for retaining volatile chemicals. 111

Figure 5. Storage temperatures of the employed samples from the included studies.
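The three storage categories can be written as a small classification rule. This is a minimal sketch, not the procedure actually used for the review: the text does not give a numerical definition of room temperature, so the `room_min_c` threshold below is a hypothetical parameter, and temperatures that fall outside the stated ranges (exactly 0°C, or between 8°C and the assumed room-temperature threshold) are returned as "unclassified".

```python
def classify_storage(temp_c, room_min_c=15.0):
    """Map a storage temperature in degrees Celsius to a review category.

    room_min_c is an assumed lower bound for room temperature; the review
    does not define one.
    """
    if temp_c < 0:
        return "frozen"
    if 0 < temp_c < 8:
        return "cold"
    if temp_c >= room_min_c:
        return "room temperature"
    # 0 degrees exactly, or between 8 degrees and room_min_c: not covered
    # by the categories as stated.
    return "unclassified"


for temperature in (-80, -20, 4, 20):
    print(temperature, classify_storage(temperature))
```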
When samples were stored at room temperature, all studies specify that they were kept in the absence of light. Even where this is not specified for cold and frozen conditions, we assumed that samples were stored in the absence of light in a fridge or a freezer. Such a parameter is essential, as light is known to alter VOCs. 112 Air humidity and atmospheric pressure were not described.
Also, sample conservation duration is poorly described among the reviewed articles and varies from a few days to several months. No data has been found about the quantification and the variation of VOCs captured in samples.
3.4. Animal Types
The data concerning animal details can be found in Supplementary Table 3.
3.4.1. Dogs
Dogs are used in 92% (n = 57, total = 62) of the reviewed studies. In total, 226 dogs have participated in the studies. Among these 226 dogs, 186 (82%) have completed the whole study process (ie, have undergone full training and have participated in final testing). Most of the studies reported that dogs have been trained by professional dog handlers (data not shown).
Dogs seem to be the first choice when it comes to using animals to detect diseases. This choice might come from the extensive use of dogs in drug and explosives detection, the availability of dog trainers in many countries worldwide, and therefore the accumulated knowledge concerning their training. However, little information is provided about dog selection, except that it is based on motivation (willingness to search and play) and sense of smell. Standard selection tests to evaluate animal capacities are not described.
Some breeds such as German and Belgian shepherds, Labradors, and Springers seem to be extensively used (see Supplementary Table 3). The distribution between males and females is the following: 52% males, 48% females. The number of dogs per study varies between 1 and 10, with an average of 3.96 (SD = 2.84) dogs per study. These numbers are reported in Figure 5 and Supplementary Table 3.
3.4.2. Rats
African Giant Pouched Rats (Cricetomys gambianus) have been extensively used for tuberculosis detection in Tanzania (5 studies conducted with the same organization: APOPO). Rats were chosen for their sense of smell, ease of operant conditioning, and availability in Tanzania. It is reported that such animals can live approximately 8 years and can be trained within a few weeks. Mouse studies did not meet the inclusion criteria; it is worth noting that only one mouse study was discarded. 113
3.5. Animal Training and Testing
All reviewed studies report positive operant conditioning methods for training, the reward being food or a toy. A clicker training method is reported in 45% (total = 62) of the studies. Animal living conditions were, however, poorly described or not described at all. A few teams mentioned dogs’ housing conditions (for instance, dogs being hosted in families). Training durations vary from a few weeks 53 to 5 years 60 depending on the teams and the difficulty of the exercise. The frequency of training ranged from once a week to 2 sessions per day, every day. These results are reported in Supplementary Table 3.
3.5.1. Sample presentation/stations
How individual samples were presented to the dogs is reported in Supplementary Table 4. Station design is usually reported to result from a trade-off between odor intensity and avoiding contact between the samples and the dogs’ noses, to limit sample pollution or dog contamination. Station cleaning is not always described and, when it is reported, no rationale is given. For instance, Horvath et al 55 chose to clean both the boxes and the containers with hot water after each exercise. Two years later, the same team 57 switched to cleaning with 95% alcohol. No standardized protocol has been identified.
3.5.2. Scent line-up characteristics
Scent line-up characteristics are reported in Supplementary Table 4. Scent line-ups are usually composed of 2 to 10 stations, arranged in a line or a circle. Most of the studies report a forced-choice design, that is, a fixed number of positive samples (> 0) per scent line-up. This concept of forced-choice design has been described by Edwards et al. 112 In a forced-choice design, the handler knows that the animal must find a fixed number of samples, which can induce bias. Some studies report the possibility of having zero positive samples per line-up (called “blank runs”), which corresponds to an unforced choice. Another type of unforced choice is to vary the number of positive samples per line-up. Unforced choices are less common and often lead to worse performances (see Supplementary Table 5).
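To illustrate why forced-choice results need careful reading, the sketch below computes the performance expected from pure guessing. It rests on a simplifying assumption that is ours, not the reviewed studies': the animal indicates exactly as many stations as there are positive samples, chosen at random. Under that assumption the expected number of hits follows the mean of a hypergeometric distribution, and the function name and parameters are illustrative only.

```python
from fractions import Fraction


def chance_performance(n_stations, n_positive):
    """Expected sensitivity and specificity of a random guesser that
    indicates exactly n_positive stations in a forced-choice line-up."""
    if not 0 < n_positive < n_stations:
        raise ValueError("need 0 < n_positive < n_stations")
    n, k = n_stations, n_positive
    # Mean of a hypergeometric draw: k picks among n stations, k positives.
    expected_hits = Fraction(k * k, n)
    # Every indication that is not a hit falls on a negative station.
    expected_false_alarms = k - expected_hits
    sensitivity = expected_hits / k
    specificity = 1 - expected_false_alarms / (n - k)
    return sensitivity, specificity


for stations in (2, 5, 10):
    sens, spec = chance_performance(stations, 1)
    print(f"{stations} stations, 1 positive: "
          f"sensitivity {float(sens):.2f}, specificity {float(spec):.2f}")
```

Under these assumptions, chance sensitivity equals the share of positive stations and chance specificity is its complement, so a long line-up with a single positive sample yields a high specificity even for an animal that guesses.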
Within scent line-ups, several types of samples can be found: (i) positive samples; (ii) controls (healthy, other diseases); (iii) distractors. Distractors are samples different from positive samples and control samples. For instance, Murarka et al 96 used paper clips, paper towels, cotton balls, and screws as distractors. They are used to stimulate the animals to search.
3.5.3. Number of samples used for training and testing
The average number of samples used for training and testing is reported in Supplementary Table 6 as well as the standard deviation, minimum, and maximum numbers. These numbers are not systematically reported among studies, especially for training. Indeed, only 37% (15 out of 41) of cancer studies and 9.5% (2 out of 21) of infectious diseases studies gave information about the number of samples used for training. Information was more exhaustive for testing: 78% (32 out of 41) of cancer studies and 72% (15 out of 21) of infectious diseases studies gave the exact number of samples used. Considering only studies which provided information, the mean number of samples used for training per study was 258 (SD = 560; min = 20; max = 2600), and the mean number for testing was 184 (SD = 186; min = 14; max = 902).
The number of times samples are used is not well reported. For training, some studies report training with only new samples to avoid 2 biases: (1) the memory effect (ie, animals do not learn to generalize, but remember each sample), and (2) the “novel object preference” effect (ie, animals select every sample they have never encountered before). For instance, Ehmann et al 61 report that during the training and later in the testing, every test tube containing a human breath sample was used only once to preclude simple memory recognition of participants’ unique odor signatures. Even if not always described, the reported numbers make it evident that most of the studies reuse some samples for training.
For testing, most teams used samples (positive and controls) only once per animal. However, a few studies report sample re-uses (eg, Cornu et al 58 (p. 201), Supplementary Table 5). The potential issues brought by reusing samples are discussed below. The mean number of rats used per publication was 11 for training and testing. In the case of dogs, on average 4 dogs were employed for training and testing (see Figure 6). The number of employed animals plays a crucial role in the feasibility of the study.

Figure 6. The mean number of animals used per publication for training and testing.
3.5.4. Blinding conditions
Results on blinding conditions can be consulted in Supplementary Tables 4 and 5. Several studies report having worked in blinded or double-blinded conditions. However, these terms do not seem to be used the same way among studies. In this review, we chose to classify the blinded conditions with the following terms:
Unblinded conditions (UB): the dog handler knows the nature or the position of the sample to evaluate.
Single-blinded conditions (SB): an operator in the room (visible by the dog) knows the nature and the position of the samples to analyze, but the dog handler does not.
Double-blinded conditions (DB): nobody in the room knows the nature or the position of the samples to analyze. This can be subdivided as follows:
DB1: Someone outside the room (or at least completely hidden) knows the nature of the sample to analyze and can give feedback.
■ In this configuration, the animals’ indications can be evaluated each time, and therefore the handler can:
• Reward his animal (= positive reinforcement)
• Decide whether to continue or not the evaluations (because he knows if his animal is doing well or not)
DB2: Nobody knows the nature of the sample to analyze, or at least, cannot communicate it to the field. This condition is similar to a diagnostic condition.
■ In this configuration, the animals’ indications cannot be evaluated each time, and therefore the handler:
• Does not know whether to reward his animal or not
• Does not know when to continue or to stop testing
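The classification above can be read as a short decision rule. The sketch below is one possible encoding, given only for illustration: the three boolean inputs and all names are hypothetical, and real study descriptions are often too incomplete to answer each question unambiguously.

```python
from enum import Enum


class Blinding(Enum):
    UB = "unblinded"
    SB = "single-blinded"
    DB1 = "double-blinded, feedback from outside the room"
    DB2 = "double-blinded, no feedback (screening-like)"


def classify_blinding(handler_knows, someone_in_room_knows, outside_feedback):
    """Assign a blinding category from three yes/no questions.

    handler_knows: the handler knows the nature or position of samples.
    someone_in_room_knows: another person visible to the dog knows them.
    outside_feedback: a hidden person knows them and can give feedback.
    """
    if handler_knows:
        return Blinding.UB
    if someone_in_room_knows:
        return Blinding.SB
    return Blinding.DB1 if outside_feedback else Blinding.DB2


print(classify_blinding(False, False, True).name)
print(classify_blinding(False, False, False).name)
```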
In 78% of reviewed studies (48 out of 62), evaluations were done in double-blinded conditions to avoid the “Clever Hans” bias.114,115 When well described, DB1 is the major double-blinded subtype reported (42%). Such conditions have limitations (see discussion). In 58% of studies (38 out of 62), the scent line-up had a forced-choice configuration.
3.6. Performances
Sensitivity and specificity varied widely, ranging from perfect to chance performance, with considerable variation even among studies examining the same disease and employing the same body matrices and detectors. Results are reported in Supplementary Table 5 and represented in Figure 7 when both sensitivity and specificity were available.

Figure 7. Global performances in different blinding conditions. Data are shown only for studies providing both sensitivity and specificity results.
For “Forced Choice” situations, we report the specificity numbers from the original articles. Configurations using an unforced-choice line-up in double-blind type DB2 correspond to a true “screening-like situation.” Only 6 studies were found with the latter design: 3 with rats detecting tuberculosis, 2 with dogs detecting cancer, and 1 with dogs detecting C. difficile infections.
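For readers less familiar with the two performance measures used throughout this review, the sketch below computes them from the four cells of a confusion table using the standard definitions. The counts in the example are hypothetical and are not taken from any reviewed study.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-table counts.

    sensitivity = TP / (TP + FN): share of positive samples indicated.
    specificity = TN / (TN + FP): share of control samples ignored.
    """
    if true_pos + false_neg == 0:
        raise ValueError("no positive samples: sensitivity is undefined")
    if true_neg + false_pos == 0:
        raise ValueError("no control samples: specificity is undefined")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical test session: 50 positive samples and 100 controls.
sens, spec = sensitivity_specificity(
    true_pos=45, false_neg=5, true_neg=90, false_pos=10
)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```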
4. Discussion
4.1. General Comments
Since 2004, many proofs of concept have been published about the ability of dogs or rats to detect diseases. Furthermore, Buszewski et al 5 employed a chromatographic method for the identification of VOCs, and the results were compared with canine smell recognition. There are often great discrepancies among results. While some publications report sensitivities as low as 0.17 (Gordon et al 54 ) and specificities around 0.29 (Amundsen et al 67 ), others report a sensitivity of 1 (Horvath et al 55 ; Cornu et al 58 ; Sonoda et al 60 ) and a specificity of 1 (Sonoda et al 60 ; Yamamoto et al 100 ).
Only a few studies reported testing performed in screening-like conditions (DB2, unforced choice), and those usually enrolled small numbers of animals. This could be explained by the fact that double-blind testing combined with unforced choices is more challenging for the animals, the handlers, and the operators. Unfortunately, the scarcity of data obtained under such screening conditions limits the possibility of validating the method. These results are discussed in the following paragraphs.
4.2. Considerations About Patients’ Selection and Samples
4.2.1. Patient and control selection: Reference test and populations matching
First, careful diagnosis of patients and controls is critical to avoid bias. That patients have the disease of interest is usually confirmed with the gold standard. The accuracy of such a test must be high to avoid false-positive inclusions. Confirmed negative samples are also critical, and all controls should be tested for the disease of interest. However, very few studies reported having rigorously tested controls. This can be explained by the fact that asking volunteers to undergo detection tests they do not otherwise need is costly, time-consuming, tedious, and invasive, poses ethical issues, and could lead to volunteer disengagement. However, from a scientific point of view, non-tested volunteers could be a source of false-negative samples. Such samples would be detrimental to animal training and testing. Indeed, the animal must be trained with samples of known status. An inaccurate reference test might lead to sample status errors and mislead the detector. For instance, Thuleau et al 88 reported that they trained dogs to detect breast cancer using samples from patients with cancer confirmed by histology and from volunteers with a recent (<12 months) negative mammography. Even if mammography is reliable, false negatives can occur, or cancer can appear within a few months following the screening. During the training phase, dogs are rewarded for not indicating the samples classified as negative by mammography; mammographic false negatives therefore induce mistakes in the training phase and, consequently, in later cancer identification.
If the reference test has poor accuracy, then animal training can be impacted. For instance, the results reported by Cornu et al 58 show that training a dog with potential “rogue” controls affected final performances. Selected controls were patients aged > 50 with elevated Prostate-Specific Antigen (PSA) levels, and were thus comparable with cancer patients regarding these characteristics. Control patients had a mean PSA value of 8.3 ± 4.1 [range: 2-16.8]. Given these values, it can be considered that 20% to 30% of these control patients with negative prostate biopsies had prostate cancer. 58
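The effect of such undiagnosed cases on measured specificity at testing can be illustrated with a simple calculation. The sketch below assumes, hypothetically, that the animal indicates hidden cancers among the controls with the same sensitivity as known cases; the true sensitivity and specificity of 0.95 are invented for the example, and only the 20% to 30% contamination range comes from the text above.

```python
def apparent_specificity(true_sensitivity, true_specificity, hidden_fraction):
    """Specificity measured on a control group in which a fraction of
    controls actually has the disease.

    Truly negative controls are ignored with probability true_specificity;
    hidden positives are ignored only when the animal misses them, that is
    with probability 1 - true_sensitivity.
    """
    if not 0 <= hidden_fraction <= 1:
        raise ValueError("hidden_fraction must be between 0 and 1")
    return ((1 - hidden_fraction) * true_specificity
            + hidden_fraction * (1 - true_sensitivity))


# Hypothetical detector with true sensitivity and specificity of 0.95.
for fraction in (0.0, 0.2, 0.3):
    value = apparent_specificity(0.95, 0.95, fraction)
    print(f"hidden positives {fraction:.0%}: apparent specificity {value:.3f}")
```

In this illustration, a more sensitive animal is penalized more heavily: each correctly indicated hidden case is scored as a false positive.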
Similarly, Willis et al 51 reported they were concerned that “rogue” control specimens from people with undiagnosed cancer elsewhere in the body might be inadvertently added to pooled samples. On one occasion during training, all dogs unequivocally indicated as positive a sample from a participant recruited as a control based on negative cystoscopy and ultrasonography. After further tests, a transitional cell carcinoma was discovered. As animal-based detection methods are not yet validated, not all false positives indicated by animals can be double-checked. More recently, Grandjean et al 102 had a similar issue, with 2 of their supposed SARS-CoV-2 negative controls turning out to be positive.
Second, matching the characteristics of patient and control groups is important to make sure that animals detect the disease itself and not a confounding factor.27,112 Matching has been reported for age, sex, skin color, other diseases, comorbidities, symptoms, smoker status, and diet.
For instance, Bomers et al 63 worked on C. difficile detection with dogs at a hospital. They reported that on the day of the detection round all cases had diarrhea compared with 6% of the controls. In such a situation, we can wonder if the dog successfully indicated the targeted disease (C. difficile), or just the presence of diarrhea.
To prevent such bias, Willis et al 51 exposed the dogs to urine from patients with a broad range of transitional cell carcinomas, in terms of grade and stage, to increase their likelihood of recognizing the common factor or factors. They took particular care to train the dogs with control samples containing elements likely to be present in urine from patients with bladder cancer and commonly occurring in other non-malignant pathologies. This way, they could teach the dogs to ignore non-cancer-specific odors. This led to the inclusion of urine samples from a variety of patients, such as people with diabetes to control for glucose, those with chronic cystitis to deal with the influence of leukocytes and protein, and healthy menstruating women to control for blood.
Several years later, the same team (Willis et al 59 ) assumed that body matrices, tissues, and emissions from young, healthy individuals differ in composition from those of older cancer patients to a greater extent than do samples from age-matched individuals with non-cancerous disease of the same organ. They performed an e-nose study in which the classification accuracy dropped when more diseased individuals were added to the healthy control group. 116 This shows that the choice of controls can markedly affect the level of specificity achieved.
4.2.2. Disease-specific odor and types of body matrices chosen
To our knowledge, it is not known whether a specific cancer has a characteristic chemical signature and, if so, what the source of such a signature would be. The odor of cancer could come from the tumor itself, the modified environment surrounding the tumor, or both. Moreover, it is not yet known whether all cancers share odors. For instance, McCulloch et al’s group reported that dogs performed well after being trained to alert to 2 cancers rather than to discriminate a single cancer. 53 This could mean that there is a general biochemical marker common to all cancers, with specific cancers having additional markers. 54
There are different interpretations considering the localization of disease odor within the body: is it localized, organ-specific or spread? For instance, Horvath et al 55 report that one important observation during the training period was that use of fat from the same individuals from whom the carcinomas were removed did not increase the number of failures. The absence of reaction by the dog suggests that a general body odor including all organs did not exist. However, 2 years later, the same team 57 reported that for the same cancer (ovarian), dogs trained with tumors could discriminate blood samples and vice versa. Their study strongly suggests that the characteristic odor emitted by ovarian cancer samples is also present in the blood (plasma). Similarly, after observing that canine scent judgment can be used on both breath samples and watery stool samples, Sonoda et al 60 concluded that chemical compounds may be circulating throughout the body for colorectal cancer.
Murarka et al 96 comment that Yoel et al 75 found that after being trained on the breast cancer cell line, the dogs were able to detect both skin cancer and lung cancer cell lines, suggesting the possible presence of a general cancer olfactory cue within cancer cell lines. However, this study did not explore whether these dogs could also then detect cancer in patient-derived samples. In this case, there is also the possibility that the dog learned to disregard control samples (which were probably similar) instead of recognizing malignant cell cultures. This seems possible in the observations from Murarka et al, 96 whose research suggests that after training on cell lines to prepare the dogs, there were no spontaneous recognitions of cancer in blood plasma samples.
From these observations, 4 situations can be considered depending on odor specificity and localization, which are presented in Table 2.
Disease Odors Localization and Specificity Hypothesis.
Table 2 shows that the body matrix choice is critical. This also affects the choice of control cohorts. From this table, we see that an odor widespread throughout the body and non-specific to disease will lead to low specificity tests. In such situations, indications of a sample by a trained animal will not give much information on what disease to look for, and therefore will have low added value.
Body matrices used in the reviewed articles are dominated by breath and urine (Figure 4). These have the advantage of being easy to sample (liquid, air, noninvasive) and easy to split into several samples, and therefore allow several trainings and tests per sample without encountering pollution or odor decrease. Liquids like urine are also easy to dilute, for instance, to increase detection difficulty by reducing the quantity of VOCs per sample. These dilutions also made it possible to study animal detection thresholds 113 and to make comparisons with GC-MS and e-noses. Urine samples have the additional advantage that they can be aliquoted and frozen for later usage. Despite this, we regret that the reasons that led to the choices of body matrices were poorly documented, if at all, within the selected articles.
4.2.3. Sampling protocols
Once the body matrix and sampling location have been chosen, sampling protocols and materials are key to obtaining high-quality samples. Most of the studies report the importance of applying the same sampling procedures both for patients and controls to eliminate potential bias and confounders. For instance, Ehmann et al 61 showed that, at first, trained dogs were discriminating not disease state but sampling location, which differed between patients (at the hospital) and healthy volunteers (at home).
If sampling is done at home without supervision, errors can occur, leading to poor sample quality. Thuleau et al 88 report that to sample sweat they asked patients and volunteers to shower with an odorless soap, before sleeping with a cotton pad on the breast overnight. In this case, researchers cannot be sure that the person has followed each step correctly or that no incident occurred. In this example, the pad could have fallen off during the night, resulting in pollution and a limited contact time of the pad with the skin, and therefore a limited quantity of VOCs. Other odors could also have impregnated the sample, such as those of bedsheets, partners, and pets, without any possibility of quality control. Such unsupervised sampling protocols add difficulties and should be controlled as much as possible.
A non-exhaustive list of parameters that can induce bias includes smoking status, sex, age, ethnicity, diet, different sampling locations, different sampling protocols for patients and controls, and treatments. For instance, to limit diet bias, Hackner et al 77 report that for homogeneous sampling, the tested persons were required not to drink, eat, or smoke within 90 minutes before breath sample collection. It is important to note that collecting such data from patients requires regulatory approvals.
4.2.4. Odor sampling materials
Not all types of body matrices require odor-sampling materials. For instance, urine, feces, and blood can be sampled and presented untransformed to animal detectors. However, breath and skin secretions need optimized materials to capture VOCs without releasing other odors that could disturb detection. Some sampling materials have been presented in chapter 3.5.2 and Supplementary Table 2. For instance, Willis et al 78 report that their choice of material for their patches came from studies on canine scenting in forensic science. In terms of the greatest variety and quantity of skin surface VOCs collected and readily released, the optimum fiber appeared at the outset of their study to be 100% cotton, so they employed a widely available, sterile, pure cotton gauze throughout. For the chosen sampling time of 15 minutes, they were again guided by the forensic science literature.
However, such a description is an exception, and as for the choice of body fluids, we regret that the choice of materials is poorly documented. The vast discrepancies among material types strongly suggest this part of the research is still empirical and needs better understanding, characterization, and standardization. In the future, this field of research would benefit from a better description of material parameters, as is often done in publications reporting VOC detection by GC-MS.117,118
4.2.5. Sample conservation
In chemistry, it is known that temperature variations, light, and air humidity can modify VOC profiles.119-123 Such parameters are crucial but neither well described nor consensual.
Most of the reviewed studies stored samples at low temperatures (<0°C), and only a few stored them at room temperature (see Supplementary Table 2 and Figure 5). This choice is usually not justified, except in a few studies. Willis et al 59 report that urine samples were stored primarily at −80°C, which has been the most desirable temperature for retaining chemical species. 111 Mahoney et al 65 report that their samples were frozen at −20°C until the evaluation day (up to 7 days). Though there is some controversy surrounding the cellular impact of freezing and thawing sputum, past research suggests that samples may be kept frozen without significant alteration of cell viability or cell counts. 124 Not much information is given about light. However, most studies report storing samples in a refrigerator or in a freezer, where an absence of light is evident. No information has been found about air humidity or atmospheric pressure. Conservation time and the number of sample openings lack description. The heterogeneity of VOC conservation procedures shows this part is still empirical and needs better understanding and evaluation. Guidelines on minimum, maximum, and optimum conservation conditions would undoubtedly be helpful for standardization.
4.2.6. Considerations about odor threshold
Selected animals have a sense of smell superior to that of humans. 25 For instance, Horvath et al 57 observed that trained dogs could detect a quantity of 20 ovarian carcinoma cells on the abdominal fat. However, the sense of smell is not unlimited, and it loses efficiency below a certain VOC threshold. This threshold effect has been studied in Sato et al 113 (article excluded from this systematic review). Willis et al 51 also report that they had to consider the physical state of the urine when presented to the dog. They opted to train one cohort of dogs on wet samples and another on dry samples. When tested, the dogs trained on liquid urine performed significantly better, suggesting that the more volatile molecules are important in the cancer odor signature.
Odor threshold also plays a role in dog training progression. Some teams chose to directly use the same types of samples at the training start and for testing.54,58,77 On the contrary, others started detection work with samples with a higher number of VOCs and decreased the intensity step by step.55,78,80,87 The latter strategy is supposed to be easier for the animals before lowering the threshold. These samples with more VOCs can be (i) bigger (bigger in volume, surface, quantity); (ii) more concentrated (exhaled air, sweat, etc); (iii) other types of samples, such as tumors or materials directly in contact with the tumor. However, the diversity of samples used before the final configuration is not systematically reported within studies. In addition, there may have been differences in odor intensity between diseases, especially infectious and viral diseases with strong diffusion (to be related to contagion) versus hidden tumors.
4.2.7. Frequency of sample re-use: pollution and memory effect
The number of times samples are used is not always well reported. It is evident, however, that some studies reused samples at least for some training. Here, different types of “reuse” should be considered:
• Case 1: The same sample is presented several times to the same animal detector 58
• Case 2: The same sample is presented to several dogs (several times per dog or not)53,86
• Case 3: Sample replicates of the same patient are presented to an animal detector 60
In cases 1 and 2, there is a risk of pollution (by direct contact with the animal, by its breath, or by the atmosphere), which leads to sample alteration each time the sample is used. Therefore, once smelled, samples are not identical to “new” samples. Moreover, opening a sample several times may lead to a decrease in VOC quantity. In cases 1 and 3, samples from the same person are presented several times to an animal. By doing so, there is a risk of training animals’ memory instead of discrimination. This latter issue has been reported by several teams who saw their results plummet in double-blind situations with only new samples.78,107
On the contrary, however, Willis et al 78 report that multiple uses of the same sample (melanoma samples) during training did not appear to lead to a significant loss of volatile signature since the dog continued to successfully select known melanoma samples used up to 15 times over a period of 18 months post-collection. With such observation, one can assume that the dog did not learn to discriminate samples but instead memorized one specific sample.
Ideally, an animal should smell only new (uncontaminated) samples, only once per patient (to avoid memory effect). The advantage of urine, feces, blood, and breath is that these body matrices are easy to sample or to aliquot, so that several samples can be obtained very quickly. This way, several dogs can be trained with samples from the same person, while preserving their quality.
In some studies (eg, Cornu et al 58 ), some control samples were reused during testing. This does not seem to be a problem in an unforced choice configuration (cf scent line-up characteristics, part 3.7). However, in a forced-choice configuration, reusing some control samples might reduce the number of new possibilities for the dogs, leading to an easier design and higher success rate just by chance.
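The influence of reused controls on chance performance in a forced-choice line-up can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: exactly one positive sample per line-up, a dog that guesses uniformly at random, and a dog that recognizes and excludes every reused control. The function name is hypothetical, and none of the reviewed studies measured such recognition.

```python
from fractions import Fraction


def chance_success(n_samples, n_recognized_controls=0):
    """Probability of selecting the single positive sample by pure guessing.

    Assumptions (illustrative only): one positive per line-up, and every
    reused control is recognized by the dog and excluded before guessing.
    """
    candidates = n_samples - n_recognized_controls
    if candidates < 1:
        raise ValueError("at least one candidate sample must remain")
    return Fraction(1, candidates)


# Six-sample forced-choice line-up: all samples new vs three reused controls.
print(chance_success(6))     # all samples are new to the dog
print(chance_success(6, 3))  # three controls already known to the dog
```

In this example, the chance success rate rises from 1/6 to 1/3 when three of the five controls have already been encountered, which is the kind of inflation described above.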
4.3. Animals
4.3.1. Animals
Giant pouched rats have been extensively used by one team working on tuberculosis detection in Tanzania.64,65,90 Little reasoning has been given in the literature regarding the choice of animal except for their keen sense of smell. Dogs are the most used animals worldwide. This choice can be justified by the availability and experience of dog trainers in many countries, for instance, for drugs and explosives detection. Dogs have the advantage of being adaptable to different fields (battle, airports, rescue, remote scent tracing, and contact with humans). However, for remote disease detection only (detection done in a controlled configuration, at a distance from patients), there is no need for such adaptation. To our knowledge, no validation study has been conducted comparing rats versus dogs. Authors generally report looking for motivated dogs with high olfaction capabilities. However, there seems to be no standard validated test for dog selection, which so far remains empirical in the absence of clear guidelines.
Gordon et al 54 mention that it has been an ongoing theory that certain breeds are better at scent detection than others. 125 However, studies have shown a greater difference in scenting ability between dogs within a breed than between breeds. We observe variation in performance in selected studies between breeds5,101 and within the same breeds.54,86,92 This has been described in Jamieson et al, 126 who concluded that a dog should not be chosen solely based on its breed due to individual variation. In addition, if we consider that evaluated dogs were for the majority selected among the best, under the watchful eyes of an experienced professional, we can assume that even more discrepancies would exist without such selection. There are an estimated 500 million dogs worldwide and, so far, fewer than 200 dogs have been considered potentially adapted to conduct disease screening tasks in controlled studies, with varying results. Such a method seems to have huge potential; however, these low numbers preclude extrapolation.
4.3.2. Selection success
In Elliker et al, 69 only 3 out of 10 dogs initially recruited for the study passed the first stage of training. According to this research team, high failure rates are common when training dogs for specialist roles because of the specific behavior/temperament attributes required.69,127,128
Despite this low selection rate, 82% of the dogs mentioned in the studies completed all the exercises requested. This number may seem high, but several factors might not be reflected in this percentage. For example, it is likely that some studies only mention the dogs who performed well and do not mention all the dogs they evaluated before selecting their champions. The loss rate is greater when the difficulty of the exercise increases (blank runs, double-blind). As most of the studies report forced-choice scent line-ups, more dogs succeed. Interestingly, Murarka et al 96 report that all dogs that left the disease detection program and were switched to other odors (eg, narcotics, bed bugs, accelerants, blood plasma) were rapidly and successfully trained. This strongly illustrates the difficulty of disease detection with dogs compared to other odors.
Elliker et al 69 report that it has been suggested that it may be useful to breed dogs specifically for cancer odor detection, 129 which may help to increase the proportion of suitable dogs available for future studies of this type.
4.3.3. Training duration
Considerable differences in training durations are observed among studies, ranging from a few weeks to several years. Such differences can be explained by the type of disease to detect (Supplementary Table 1), the difference between patients and controls (Supplementary Table 1), the choice of body matrices (Supplementary Table 2), the quality of samples (Supplementary Table 2), training differences (Supplementary Table 3), and animal abilities (Supplementary Table 5). No correlation was observed between training duration (Supplementary Table 3) and specificity or sensitivity (Table 1) among studies (n = 34) (see Figure 8). However, Ehmann et al 61 identified an improvement in lung cancer identification capabilities over the course of the test series and conclude that an ongoing training effect must be assumed, calling for even more extended dog training in future studies.

Sensitivity (left) and specificity (right) are set out as a function of the training time.
4.4. Scent line-up
4.4.1. Scent line-up: Number of samples and line versus circle, the distance between samples
The number of samples presented to animals ranges from 2 (Bomers et al 63 ) to 12 (Essler et al 107 ). No justification was provided for these numbers. No study was performed with only one sample. It has been shown in the literature that dogs were able to perform tests with one sample only. 130 In such a test, dogs have to make an absolute choice. They are asked to “evaluate.” On the contrary, when several samples are presented, the dog can perform a discrimination task and is probably more stimulated. In this situation, they are asked to “search.” All studies reviewed used the latter configuration.
Samples were presented in a line, circle, or randomly (Supplementary Table 4). The choice of a line can be explained by the easiness of designing “blank” runs, such that, at the end of the line, the dog can indicate that no positive sample was found. Blank runs can also be done in a circle configuration. The advantage of the latter is that there is no start or end, so all samples are equivalent.
Space between samples is fundamental for several reasons. The most obvious reason is to preclude cross-contamination between samples. Another, less apparent reason is that it leaves enough time for olfactory latency and persistence. Latency is the time needed to perceive an olfactory stimulus, estimated at 0.5 seconds for dogs. Persistence is the duration for which the olfactory sensation lasts. If samples are too close together, these durations cannot be respected, and the dogs risk either missing a sample or mixing signals.
4.4.2. Scent line-up configurations: Forced versus unforced choice
Using forced versus unforced choice scent line-ups has a strong influence on performances. 112 Unforced choice exercises are more complex. In a forced-choice exercise, the animal learns only one configuration. They know they must “find” the odor of the disease. As a result, they choose the sample that most resembles the target or the one that is the odd one out. Moreover, Bomers et al 63 report that anticipation of a single positive result could have influenced the trainer’s behavior, thereby unintentionally influencing the dog’s response. 116 Such a configuration is therefore easier not only for the animal but also for the handler. On the contrary, animals must evaluate each sample in an unforced choice configuration and cannot choose by simple comparison. This is a difficulty that not all animals can overcome. An unforced choice situation is, however, the only one that could be applied for screening.
With the particular configuration reported by Murarka et al 96 (see results), the dog has only one sample to evaluate, while the distractor is there for stimulation. 96 Such configuration is an interesting tradeoff between one versus several samples scent line-ups described above and can easily be applied for screening (see Supplementary Table 4).
4.4.3. Atmospheric conditions
Atmospheric conditions during training and testing are known to affect dogs’ sense of smell. These conditions are poorly documented within the reviewed studies. Studies that did document them, however, reported working at controlled temperatures between 12°C and 20°C (see Supplementary Table 4). When not controlled, these conditions can negatively impact scent detection work. For instance, Sonoda et al 60 conducted tests from 13 November 2008 to 15 June 2009 because the dog’s concentration tended to decrease during the hot summer season. Likewise, Hackner et al 77 observed some limiting influences, including high humidity and elevated ambient temperature, which were found to be detrimental to the dogs’ performance. They suggest that testing should not be performed during unfavorable weather conditions.
4.4.4. Blinding conditions
4.4.4.1. Proofs of principle versus double-blind clinical trials in a screening-like situation
For a potential deployment of disease detection with animals, only double-blind clinical trials in a screening-like situation (ie, unforced choice) might be useful (see chapter 3.9 for blinded conditions). To date, only 6 studies meet these expectations (Supplementary Table 5). Focusing on these studies, the results usually decreased at first when shifting to double-blind. This drop between training and double-blind testing has often been explained by the Clever Hans effect. 115 To avoid failure, teams must train as much as they can in blind situations, as suggested by Gordon et al 54 who report that the use of blinding during the training should be initiated early to preclude unintended clues by the trainers that may contaminate the process. Willis et al 78 reported that after training the dog in a non-blinded situation, their trainer reported back a near 100% success rate in identifying the melanomas. It was decided to begin a series of double-blind tests. However, after 13 runs, the dog had successfully identified only one of the melanoma samples. 78 Implementing blinded conditions is not easy during training because dog handlers need to know when to reinforce positive behavior. To do so, a non-blinded assistant hidden from the dog and who can quickly tell the handler when to reinforce is needed.
4.4.4.2. Rewarding or not the dogs in a screening-like situation: a puzzling question
In a screening-like situation, nobody knows whether the animal’s indication is correct or not, which can be an issue for the reward. Indeed, if the trainer decides not to reward the animal, the latter can little by little lose interest. On the contrary, if the animal is rewarded every time, this might reinforce biases in case of incorrect indications. Therefore, several strategies are adopted among teams.
For instance, McCulloch et al 53 report that, since the experimenters no longer knew the status of the target breath sample, they did not activate the clicker device after a sitting indication by the dog, and therefore the handler did not reward the dog with any food. Bomers et al, 63 discussing searches for C. difficile infections in hospital wards, confirm that surveillance is fundamentally different from the type of case-directed diagnosis in their study design because the dog cannot immediately receive a reward after a positive identification, potentially extinguishing the trained alert. The same solution was adopted by Willis et al 59 : “Both the trainers and researchers remained blinded throughout the trial, only breaking the sample and positional codes at the very end, meaning that the dogs could not be rewarded for a correct indication immediately after each test run. The trainers reported that, over time, this led to a loss of confidence in the dogs, with a deterioration in their performance.” On the contrary, Elliker et al 69 performed 2 types of tests. In the first one, they were in a DB2 situation and decided to reward the dog for each indication. However, during 3 rigorously controlled double-blind tests involving urine samples from new donors, the dogs did not indicate cancer samples more frequently than expected by chance. The team finally switched to a DB1 situation, to be able to reward the dogs only for positive responses. These are exceptions because most of the studies were conducted in DB1 configuration, which allowed trainers to know whether to reward the dog or not after each line.
According to Biehl et al, 92 rewarding dogs’ work has to be independent of the results achieved and should refer only to the work done. If dogs are only rewarded for positive indications, they will quickly learn to achieve more rewards through positive indications, which could easily lead to higher false-positive results. Hackner et al 77 attributed the inferior results to the true double-blind and screening-like conditions. They report that this factor posed immense stress on the dogs and their handlers, and therefore suggest positive feedback mechanisms for future study designs. According to them, it seems to be favorable to confront dogs relatively often with the pattern odors. Their results suggest that a test situation where dogs will always find an unblinded positive and ignore an unblinded negative sample in the line-up would probably be better. The positive sample would create the opportunity to earn a reward and would reinforce the dogs’ motivation. The negative sample assures the handler that the dog is still performing well. The other samples in the line-up should be the blinded test samples.
Another similar solution would be to alternate training lines and test lines. It could be decided that one test line is performed only after a number (to be determined) of successful training lines. Another training line could be performed right after the test to ensure the dog is still doing well. Such a pattern is feasible for implementation; however, it would slow the testing throughput.
This subject is crucial for implementing such a method, and neither a consensus nor a solution has been reached so far.
4.5. Applications/Implementation
Pickel et al 52 published a proof of concept with dogs directly sniffing human melanoma. Even if scientifically feasible, such a technique seems hardly applicable in the field. Since then, several studies using remote disease detection have step by step built a new scientific discipline. This review shows that no scientific study has validated that animals can be used as a first-line remote detection tool prior to existing technologies. Only APOPO, the organization supervising Giant Pouched Rats detecting tuberculosis in Tanzania, has found its place as a second-line screener, which makes sense for tuberculosis detection.56,131
4.5.1. How many dog decisions are needed to identify the target condition?
Most studies focus on the performances of each animal separately. However, as animals are living organisms, their performances can be subject to variations. Biehl et al 92 reported that literature data show that some dog trainers included only one dog in scent detection, whereas others had 5 to 6 dogs and collected the individual dogs’ data. McCulloch et al 53 stated that the sniffing quality of all dogs was comparable, and therefore the results obtained were similar. However, Ehmann et al 61 found differences in hit rates between individual dogs and consequently defined a “corporate dog decision” that required at least 3 out of 5 dogs with an identical decision. Amundsen et al, 67 as well as Hackner et al, 77 also showed considerable variations in single dogs’ results. These variations might be due to the dogs’ different sniffing capabilities and the dogs’ different daily conditions and training.
Biehl et al 92 report that in their study, single dogs’ results showed great differences, with sensitivity ranging from 0.22 to 0.67 and specificity from 0.71 to 0.89. They conclude that it is advisable not to rely on a single dog’s decision but to define a corporate decision to minimize variations arising from the single dogs. This choice is not straightforward. Indeed, Mahoney et al 65 report that sensitivity declines and specificity increases when 2 individual animals are employed, because a sample must then be indicated twice to count as positive. On the contrary, if only the indication of 1 of the 2 animals is needed, the sensitivity will increase, but specificity will drop. The argument can be extended to more animals and indications. For instance, Gordon et al 54 report that, at the time, their study was the only one to incorporate replicates for assessing specificity. There were 3 and 2 replicates (33 and 18 runs) for the prostate and breast cancer patients, respectively. The team adds that any study ultimately attempting to prove canine superiority over conventional cancer screening must include replicates and, in the future, go head-to-head with standard screening methods. Another example is Mgode et al, 64 where for tuberculosis detection, a sample is considered positive if selected by 2 rats. Such a corporate decision is a tradeoff on which no consensus has been reached yet.
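The tradeoff behind a corporate decision can be sketched numerically. The following is an illustrative calculation only: it assumes that all animals share the same individual sensitivity and specificity and that their indications are statistically independent, which is unlikely to hold exactly in practice (animals smelling the same sample may err together). The function names and the individual values of 0.8 are hypothetical.

```python
from math import comb


def at_least_k_of_n(p, n, k):
    """Probability that at least k of n independent animals indicate a
    sample, when each indicates it with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def corporate_performance(sens, spec, n, k):
    """Sensitivity and specificity of a 'k out of n' corporate decision.

    Assumption (illustrative only): identical and independent animals.
    """
    corporate_sens = at_least_k_of_n(sens, n, k)
    # A negative sample is wrongly called positive when at least k animals
    # give a false alarm, each with probability (1 - spec).
    corporate_spec = 1 - at_least_k_of_n(1 - spec, n, k)
    return corporate_sens, corporate_spec


# Hypothetical animals with individual sensitivity and specificity of 0.8.
for n, k in ((2, 2), (2, 1), (5, 3)):
    s, sp = corporate_performance(0.8, 0.8, n, k)
    print(f"{k} of {n}: sensitivity {s:.2f}, specificity {sp:.2f}")
```

With these hypothetical values, requiring both of 2 animals lowers sensitivity and raises specificity, whereas accepting either of the 2 does the opposite, in line with the pattern described by Mahoney et al 65. A majority rule such as 3 out of 5, as defined by Ehmann et al 61, sits between these two extremes.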
4.5.2. Number of samples to train a dog and maintain performances
The number of samples available for training is crucial. Many samples are needed so that animals learn to generalize rather than memorize each sample. A large supply is also essential to work as often as possible with new (uncontaminated) samples and to limit “novel object preference.” Willis et al 59 report that their protocol also avoids the phenomenon of novel-object preference, whereby dogs preferentially choose unfamiliar items over familiar ones. 132
Meeting this need is not straightforward, as efficient logistics for gathering samples continuously can be challenging to organize. For instance, Gordon et al 54 report that it took longer than anticipated to obtain enough samples to prepare for the final testing. As a result, the training was spread over an extended period of 12 to 14 months. Possibly, the animals were periodically memorizing individual patients rather than recognizing an “odor signature” for cancer, despite the large number of training samples used. An ongoing system of recruitment of patients with cancer and control patients needs to be established so that the dogs have adequate numbers of new samples to maintain their proficiency even after the conclusion of the study. This has also been reported by Ehmann et al, 61 who wrote that, during training and later in testing, every test tube containing a human breath sample was used only once to preclude simple memory recognition of participants’ unique odor signatures.
This need for a continuous supply of new samples is a major limitation. Indeed, if the method is intended for countries with low access to diagnostics, the supply of new samples from screened patients and controls will be limited. It implies continuous logistics and partnerships with hospitals that might not be cost-effective.
4.5.3. Field implementation
If scientifically validated, remote medical scent detection will still have to overcome several issues before implementation. First, if implemented in populations with low access to health care, such detection makes sense only if care can follow. As discussed above, many known samples, from both patients and controls, are required to train animals. If the method is implemented in an area with low access to gold-standard detection, sample recruitment might be compromised.
Routine adoption of such detection raises the question of how many samples can be screened every day, and at what cost. From the studies reviewed, it seems that one dog, if efficient, could screen roughly a dozen new samples per day. Willis et al 78 report that only one new test was conducted per week, with training sessions in between, which is not very efficient for mass screening. Rats, however, seem to be able to screen more samples, as reported by Weetjens et al 56: “The use of trained rats to detect tuberculosis is reliable, potentially cheaper, and faster than sputum smear microscopy. One evaluation cage can contain more than 12 rats per day, and one rat can screen 140 samples in 40 minutes. The evaluation set-up can therefore process up to 1680 samples per day, while a microscopist can process up to only a maximum of 40 samples per day (WHO recommends an average of 20 samples per day).” 133,134
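The throughput figures quoted from Weetjens et al can be checked with simple arithmetic. The short sketch below uses only the numbers given in the quotation. The constant and function names are hypothetical, and the cage-time estimate additionally assumes that the 12 rats work one after another in a single cage, which the quotation does not state.

```python
# Figures taken from the Weetjens et al quotation above.
RATS_PER_CAGE_PER_DAY = 12         # "more than 12 rats per day", used as a lower bound
SAMPLES_PER_RAT = 140              # samples screened by one rat in one session
MINUTES_PER_RAT_SESSION = 40       # duration of that session
MICROSCOPIST_MAX_PER_DAY = 40      # stated maximum for one microscopist
WHO_RECOMMENDED_PER_DAY = 20       # stated WHO average recommendation


def cage_throughput(rats: int = RATS_PER_CAGE_PER_DAY,
                    samples_per_rat: int = SAMPLES_PER_RAT) -> int:
    """Samples processed per day by one evaluation cage."""
    return rats * samples_per_rat


def cage_hours_if_sequential(rats: int = RATS_PER_CAGE_PER_DAY,
                             minutes: int = MINUTES_PER_RAT_SESSION) -> float:
    """Cage occupation time per day.

    Assumption (not from the source): rats are run sequentially in one cage.
    """
    return rats * minutes / 60


if __name__ == "__main__":
    per_day = cage_throughput()
    print("samples per cage per day:", per_day)
    print("ratio to microscopist maximum:", per_day / MICROSCOPIST_MAX_PER_DAY)
    print("ratio to WHO recommendation:", per_day / WHO_RECOMMENDED_PER_DAY)
    print("cage hours if sequential:", cage_hours_if_sequential())
```

The product 12 × 140 gives the 1680 samples per day stated in the quotation, that is, 42 times the stated microscopist maximum and 84 times the WHO-recommended average.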
Another important consideration is the prevalence of the disease to be detected. If very few positive samples are present, animals’ motivation and accuracy could decline. Hence the importance of training sessions that regularly include new known positive samples.
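To give an order of magnitude for how rare positive samples can be in routine screening, the sketch below computes the expected number of positives in a daily batch and the probability that the batch contains none. It is a simplified illustration: it assumes samples are independent draws from a population with a fixed prevalence, and the 1% prevalence is a hypothetical value chosen for the example. The batch size of 12 echoes the "roughly a dozen" samples per dog per day mentioned above.

```python
def expected_positives(n_samples: int, prevalence: float) -> float:
    """Expected number of truly positive samples in a batch."""
    return n_samples * prevalence


def prob_no_positive(n_samples: int, prevalence: float) -> float:
    """Probability that a batch contains no positive sample at all.

    Assumption (not from the source): samples are independent draws.
    """
    return (1 - prevalence) ** n_samples


if __name__ == "__main__":
    batch = 12          # roughly a dozen samples per dog per day
    prevalence = 0.01   # hypothetical prevalence, for illustration only
    print("expected positives per day:", expected_positives(batch, prevalence))
    print("probability of a day without any positive:",
          round(prob_no_positive(batch, prevalence), 3))
```

Under these assumptions, a dog would meet a positive sample on only about one day in nine, which illustrates why known positive samples may need to be added regularly to keep animals rewarded and proficient.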
As discussed in section 4.2.2, such detection will be helpful if the odor and/or the sampling site is specific to a short list of diseases. If not, then in the case of an alert, medical staff will not know what to look for.
Free-running rapid detection might be useful for infectious diseases. Free-running proofs of concept have been published for C. difficile infection detection, with encouraging results. 70 However, such detection has not yet been proven to work in the field for other diseases. So far, published articles report successful proofs of concept in remote conditions (as for cancer). Free-running detection has recently been presented as an objective by several teams working on SARS-CoV-2 detection. For instance, Guest et al 109 report that their preparatory work indicates that 2 dogs could screen 300 people in 30 minutes (for example, the time it takes to disembark from a plane), and PCR would only need to be used to test those individuals identified as positive by the dogs. However, no study has so far demonstrated such an application in real screening conditions, in contact with people in public places. On a different disease, Taylor et al 89 report that, despite being highly trained, dogs are vulnerable to distractions and other foreign stimuli in a unique social environment. 135 Concerning their own study, Essler et al 107 report that, although dogs have previously been shown to be able to discriminate between saliva samples of SARS-CoV-2 positive and negative patients, these studies also used repeated presentations of the same samples. Thus, it is possible that dogs can discriminate between their training set of positive and negative patient samples but are unable to generalize this odor to new samples. These considerations are major limitations that preclude short-term implementation. However, this relatively new field of research is progressing quickly, and future studies may address these issues.
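The two-step scenario described by Guest et al (dogs first, PCR only for people the dogs indicate) can be explored with a back-of-the-envelope calculation. The sketch below is purely illustrative: the prevalence, sensitivity, and specificity are hypothetical placeholders, not results from any study, and the function names are invented for this example. Only the figures of 2 dogs, 300 people, and 30 minutes come from the text above.

```python
def dog_seconds_per_person(n_people: int, n_dogs: int, minutes: float) -> float:
    """Average time one dog can spend on one person."""
    return n_dogs * minutes * 60 / n_people


def triage_expectations(n_people, prevalence, sensitivity, specificity):
    """Expected outcomes when only dog-positive people are sent to PCR.

    All performance values passed in are assumptions supplied by the caller.
    """
    infected = n_people * prevalence
    healthy = n_people - infected
    true_pos = infected * sensitivity          # infected people flagged by the dogs
    false_pos = healthy * (1 - specificity)    # healthy people flagged by mistake
    return {
        "pcr_tests": true_pos + false_pos,
        "missed_infections": infected - true_pos,
    }


if __name__ == "__main__":
    print("seconds per person per dog:", dog_seconds_per_person(300, 2, 30))
    # Hypothetical values, for illustration only.
    result = triage_expectations(300, prevalence=0.02,
                                 sensitivity=0.90, specificity=0.90)
    print("expected PCR tests:", round(result["pcr_tests"], 1))
    print("expected missed infections:", round(result["missed_infections"], 1))
```

The first figure shows that the stated throughput leaves each dog about 12 seconds per person. The other two outputs depend entirely on the assumed performance values, which is precisely what remains to be demonstrated in real screening conditions.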
Finally, disease diagnostics can be expensive and complex to implement because of costly infrastructure and instruments, the need for consumables, and the need for highly skilled professionals (MSc, PhD, MD). In this context, several teams claim that remote medical scent detection with animals might be cheap; however, this has yet to be proven. Cornu et al 58 report that in their proof-of-principle study, they tested a limited number of subjects in a costly, long study that makes it difficult to conceive of extended use for this test in clinical practice. Similarly, Sonoda et al 60 declare “it may be difficult to introduce canine scent judgment into clinical practice owing to the expense and time required for the dog trainer and dog education.” No socio-economic study on the subject was found.
5. Conclusions
According to Hackner et al, 77 a suitable screening method should provide a true negative rate close to 100% to be sufficient for safe use. Despite the number of studies reporting the potential of trained animals as disease detectors in a clinical setting, no validation has been issued so far. Willis et al 78 warn that introducing canine diagnosis of cancer in the absence of adequate validation, and without external quality assurance measures in place, may raise some of the same patient safety issues as those highlighted by the British Medical Association in their 2005 report on unregulated screening tests. 136
Interestingly, several teams do not recommend the routine use of such a technique. For instance, Horvath et al 55 wrote that they do not believe that dogs may be used in clinical practice. Dogs as “living instruments” may be influenced by several factors before and during their work, leading to changes in the accuracy rates. However, under controlled circumstances, they may be used in experiments to further explore the odor of malignancies. In Willis et al, 59 the researchers do not advocate the use of dogs in a clinical setting. The authors hope that a greater understanding of the VOC biomarkers associated with bladder cancer, and urological disease more widely, will help optimize the design of an electronic nose. It has been suggested by Taverna et al 72,73 that dogs could be used to explore the response to cancer treatment or relapses in conjunction with VOC identification. Although no direct comparison studies have been performed, dogs appear, for now, to outperform e-noses. 35,137-139
The use of dogs to detect infections in a free-running setting (in contact with humans) has yet to prove its effectiveness. For instance, Bomers et al 63 report that a limitation of using an animal as a diagnostic tool is that behavior is not fully predictable. The dog’s reaction to other stimuli (eg, children’s play, being beckoned, being offered a treat) illustrates that dogs are still prone to distraction despite a high level of training.
However, this research field has made considerable progress since 2004: research teams, programs, and networks have been established, and the main scientific obstacles seem to have been identified. By carrying out studies on materials and VOC conservation conditions, and by better mastering the selection and variability of dogs, a rigorous process may eventually make implementation possible. Medico-economic studies still need to be conducted.
Finally, the work done in chemistry on the olfactory signature of diseases is complementary and will probably help to better understand and standardize research conducted with animals. Subsequent progress on this subject should determine more clearly what will be possible to implement in the future.
Supplemental Material
Supplemental material, sj-xlsx-1-ict-10.1177_15347354221140516 for Remote Medical Scent Detection of Cancer and Infectious Diseases With Dogs and Rats: A Systematic Review by Pierre Bauër, Michelle Leemans, Etienne Audureau, Caroline Gilbert, Carole Armal and Isabelle Fromantin in Integrative Cancer Therapies
Footnotes
Acknowledgements
Not applicable.
Authors’ Information
P.B. and I.F. are members of the KDOG program, a research project led by Institut Curie aiming at evaluating the capacity of trained dogs to detect breast cancer from skin secretion samples. This program receives funding and support from Royal Canin company and Seris Security company, and private donors.
Authors’ Contributions
P.B.: protocol writing, title and abstract screening, data extraction, manuscript writing, discussion, and submission. M.L.: title and abstract screening, data extraction, discussion, and full-text reading. C.A.: data extraction, discussion, and full-text reading. C.G., E.A.: discussion and full-text reading. I.F.: research idea, data extraction, discussion, and full-text reading. All authors read and approved the final manuscript.
Availability of Data and Materials
All data generated or analyzed during this study are included in this published article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Royal Canin Foundation (KDOG project, 2021 project cycle).
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
References
