Abstract
Contaminated fresh produce remains a leading cause of food-borne illness, which calls for swift and precise pathogen detection to mitigate health risks. This paper introduces a strategy for identifying food-borne pathogens in fresh produce samples from local markets and grocery stores using optical sensing and machine learning. The core of our approach is a photonics-based sensor system that generates optical signals in real time to detect pathogen presence. Machine learning algorithms process the sensor data to predict contamination probabilities in real time. Our results affirm the efficacy of the method in identifying prevalent food-borne pathogens, including Escherichia coli (E. coli) and Salmonella enterica, across diverse fresh produce samples, with detection accuracies of up to 95%, surpassing traditional, time-consuming, and less accurate methods. The method's key advantages are real-time capability, heightened accuracy, and cost-effectiveness, which facilitate its adoption by both food industry stakeholders and regulatory bodies for quality assurance and safety oversight. Implementation has the potential to improve food safety and reduce wastage. Our research is a substantial step toward a dependable, real-time food safety monitoring system for fresh produce. Future research will focus on optimizing system performance, developing portable field sensors, and broadening pathogen detection capabilities. This approach promises substantial improvements in food safety and public health.
Introduction
Food-borne illnesses continue to pose a substantial threat to public health, and fresh produce is a significant contributor. Despite advancements in food safety regulations and technologies, the Centers for Disease Control and Prevention (CDC) reports that fresh produce is responsible for nearly half of all food-borne illness outbreaks. 1 These outbreaks can lead to severe health complications and even fatalities, underscoring the urgent need for more effective methods to detect and prevent the spread of food-borne pathogens. Traditional approaches to detecting these pathogens involve time-consuming and costly laboratory tests, which can take days to yield results. Such delays can result in contaminated produce reaching consumers, triggering further outbreaks of food-related illnesses. Figure 1 presents the types of pathogens present in food.

Figure 1. Pathogens in food products.
Addressing this challenge demands the development of rapid and accurate detection methods that enable real-time monitoring of food safety within the agriculture industry. Recent years have witnessed substantial advancements in photonics-based sensing technologies and machine learning algorithms, both of which hold immense promise for enhancing food safety monitoring. Optical sensors, in particular, have emerged as a highly effective approach for detecting and quantifying pathogens in food samples, driven by their exceptional sensitivity and specificity. State-of-the-art optical sensing techniques, including Raman spectroscopy, Surface Enhanced Raman Spectroscopy (SERS), and fluorescence imaging, 2 offer rapid and precise pathogen detection capabilities, making them suitable for real-time monitoring.
Machine learning algorithms play a crucial role in analyzing data generated by these sensors. They excel at identifying patterns and making predictions about the likelihood of pathogen contamination in produce samples. Furthermore, these algorithms can improve detection accuracy over time by learning from historical data and adjusting detection thresholds accordingly. 3 In recent years, deep learning approaches, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have emerged as powerful tools for automatically extracting features from sensor data. They are particularly well-suited for analyzing complex datasets. In this paper, we present an approach that combines optical sensing and machine learning to enable real-time detection of food-borne pathogens in fresh produce. We describe the design and implementation of a photonics-based sensor system that offers real-time pathogen detection. Additionally, we discuss the machine learning algorithms used to analyze the sensor data. Through empirical evidence, we demonstrate the effectiveness of our approach in identifying common food-borne pathogens across a variety of fresh produce samples. Our research represents a significant step towards creating a practical and reliable system for real-time food safety monitoring in fresh produce, with the ultimate goal of enhancing public health and reducing food waste.
Literature review
Photonics-based sensing for food safety monitoring
Photonics-based sensing technologies have demonstrated significant potential for detecting and quantifying pathogens in food samples, owing to their remarkable sensitivity and specificity. Spectroscopic techniques, including Raman spectroscopy, fluorescence spectroscopy, and surface-enhanced Raman spectroscopy (SERS), have been widely employed for the detection of pathogens in food samples. Additionally, imaging techniques like hyper-spectral imaging and fluorescence imaging have shown considerable promise in real-time detection and identification of food-borne pathogens.
In the paper titled “Detection and discrimination of pathogenic bacteria with nanomaterials-based optical biosensors: A review” by Xiaodong Lin et al., 1 the focus is on the use of nanomaterials-based optical biosensors for the detection and discrimination of bacteria in food safety. The study delves into future trends in bacterial detection and discrimination, with an emphasis on the role of machine learning in enabling intelligent, rapid detection, and accurate identification of bacteria. 1 Another study, “Two birds with one stone: A multifunctional nanoplatform for photothermal-sensitive detection and real-time inactivation of Staphylococcus aureus,” also by Xiaodong Lin et al., 4 introduces a novel approach. This approach involves the use of vancomycin-modified near-infrared (NIR)-responsive copper selenide nanoparticles (Cu2−XSe@Van NPs) for photothermal determination and hyperthermal inactivation of bacteria. It not only provides a sensitive point-of-care testing (POCT) strategy for bacterial determination but also allows for the efficient inactivation of bacteria simultaneously. 4 In yet another paper, “Bacteria-Triggered Multifunctional Hydrogel for Localized Chemodynamic and Low-Temperature Photothermal Sterilization” by Xiaodong Lin et al., a multifunctional hydrogel is developed for low-temperature photothermal sterilization with high efficiency. This is achieved by integrating localized chemodynamic therapy (L-CDT), where hydroxyl radicals are generated for sterilization based on L-CDT within a short range. Coupled with the photothermal properties of CuSNPs, this approach enables low-temperature photothermal therapy (LT-PTT) for sterilization, which enhances antibacterial efficiency while minimizing damage to normal tissues. 2 Several other relevant studies contribute to the understanding of optical biosensors in food safety:
“A review of rapid methods for the analysis of foodborne pathogens” reviews various rapid methods for foodborne pathogen analysis and highlights the potential of optical biosensors to provide rapid, sensitive, and specific pathogen detection in food. 3 “Optical biosensors for pathogen detection in food” discusses the advantages of optical biosensors, including high sensitivity, selectivity, and speed, in the context of pathogen detection in food. 5 “Rapid detection of foodborne pathogens: current and emerging technologies” by Chen et al., 6 reviews current and emerging technologies for rapid foodborne pathogen detection, emphasizing the potential of optical biosensors combined with machine learning algorithms for fast and accurate detection. 7 “Recent advances in optical biosensors for environmental monitoring and early warning” explores recent developments in optical biosensors for environmental monitoring and underscores their potential for the rapid detection of pathogens in food and water. 8
Machine learning-based approaches for food safety monitoring
Machine learning algorithms have found extensive application in various aspects of food safety monitoring, encompassing pathogen detection, quality assessment, and traceability. Concerning pathogen detection, these algorithms analyze data from a range of sensing technologies, including optical sensors, to discern patterns and make predictions about the presence of food-borne pathogens in food samples. Moreover, machine learning algorithms extend their analysis to data from other sources, such as genomics and proteomics, in the development of predictive models for food-borne pathogen detection. Our approach builds upon these established technologies by combining photonics-based sensing with machine learning algorithms to offer real-time food safety monitoring within the agricultural industry.9,10 In the paper titled “Optical biosensors for food quality and safety assurance: A review” by Islam et al., 11 the authors conduct a comprehensive review of various optical biosensors for food quality and safety assurance. They conclude that these biosensors hold the potential to deliver rapid and sensitive pathogen detection in food. 9 Another study, “Recent advances in optical biosensors for point-of-care diagnostics” by Wang et al., 12 reviews recent developments in optical biosensors for point-of-care diagnostics and underscores their potential for swift and sensitive pathogen detection in food and other samples. 10 In “Recent advances in optical biosensors based on localized surface plasmon resonance for detection of foodborne pathogens” by Feng et al., 7 the authors evaluate recent progress in optical biosensors utilizing localized surface plasmon resonance for foodborne pathogen detection. They emphasize the potential of these biosensors for prompt and sensitive pathogen detection in food. 13 Additionally, the paper “A review of recent advances in optical biosensors for food safety monitoring” by Ren et al., 5 reviews the latest advancements in optical biosensors for food safety monitoring. The authors conclude that these biosensors have the potential to deliver rapid and sensitive detection of pathogens in food, as well as other contaminants like pesticides and toxins. 14
Design and implementation of sensor system and data collection process
The design and implementation of a photonics-based sensor system for detecting food-borne pathogens in fresh produce include hardware and software components as well as the data collection process. Figure 2 shows the hardware module of the proposed system.

Figure 2. Hardware module for the proposed system.
Hardware and Software Components- The proposed sensor system consists of a photonics-based sensing module, a data acquisition system, 9 and a machine learning platform. The sensing module is designed to generate optical signals in response to the presence of food-borne pathogens in fresh produce samples. The data acquisition system captures and processes the optical signals generated by the sensing module and sends the data to the machine learning platform for analysis. The machine learning platform includes a set of algorithms for analyzing the data and making predictions about pathogen contamination in the produce samples.
Data Collection Process- To collect data for training and testing our machine learning algorithms, we obtained a variety of fresh produce samples from local markets and grocery stores. We spiked the samples with known concentrations of common food-borne pathogens, such as Escherichia coli (E. coli) and Salmonella enterica, to simulate pathogen contamination.10,13 We then processed the samples using our photonics-based sensor system and collected the resulting optical data for analysis.
The design and implementation of the sensor system and data collection process provide a solid foundation for training and testing our machine learning algorithms for detecting food-borne pathogens in fresh produce.
The dataset used for this research is a primary dataset. We obtained a variety of fresh produce samples from local markets and grocery stores and spiked them with known concentrations of common food-borne pathogens to simulate contamination. Because the samples were collected and prepared specifically for this study, the dataset is not pre-existing and was not collected for another purpose.
In the next section, we describe the machine learning algorithms used to analyze the sensor data and make predictions about pathogen contamination.
Machine learning algorithms for pathogen detection
The machine learning algorithms are used to analyze the sensor data and make predictions 14 about pathogen contamination in fresh produce samples. We provide details on the feature extraction process, the training and testing procedures, and the performance metrics used to evaluate the algorithms.
Feature Extraction- To prepare the optical data for analysis, we first extracted a set of features from the data using various signal processing techniques, such as wavelet transform and principal component analysis (PCA). These features capture the key characteristics of the optical signals generated by the sensing module in response to the presence of foodborne pathogens in the produce samples. 15
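As a rough illustration of the wavelet step, the sketch below applies a Haar discrete wavelet transform to a short intensity trace and summarizes each decomposition level by its energy. The wavelet family, the number of levels, the energy features, and the sample trace are all assumptions made for this example; the paper does not specify them, and the PCA step that would follow is not shown.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels=2):
    """Energy of the detail coefficients at each level, followed by the
    energy of the final approximation (an illustrative feature set)."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features


# Hypothetical 8-point optical intensity trace (arbitrary units).
trace = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
features = wavelet_energy_features(trace, levels=2)
```

Because the Haar transform is orthonormal, the feature energies sum to the energy of the original trace, which is a convenient sanity check on any implementation.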
Training and Testing- We trained and tested our machine learning algorithms using a dataset of optical data collected from fresh produce samples spiked with different concentrations of Escherichia coli (E. coli) and Salmonella enterica. 6 We used a random forest (RF) classifier and a support vector machine (SVM) classifier to predict the presence or absence of pathogen contamination in the samples.
Training Set: This set is used to train the machine learning models. The training set typically contains the majority of the data, around 75% of the total dataset.
Testing Set: The testing set is used to evaluate the performance of the trained models. It is crucial to assess how well the models generalize to unseen data. The testing set usually contains the remaining portion of the data, which can be around 35% of the dataset.
Validation Set: The validation set is used for hyperparameter tuning and model selection. It helps in fine-tuning the model's parameters to achieve the best performance. It is around 15% of the dataset.
The dataset is divided into k equal-sized folds, and the model is trained and tested k times, with each fold serving as the testing set once.
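A three-way partition into training, validation, and testing sets can be sketched as below. The proportions quoted above add up to more than 100%, so the 70/15/15 fractions used here are purely an assumption for illustration, as are the function name, the seed, and the sample identifiers.

```python
import random


def three_way_split(items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle items and cut them into train, validation, and test lists.

    The test set receives whatever remains after the train and
    validation portions are taken. Fractions are assumed values.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test portion")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers for 100 measured samples.
sample_ids = list(range(100))
train_ids, val_ids, test_ids = three_way_split(sample_ids)
```

Fixing the seed keeps the partition reproducible, so that hyperparameter tuning on the validation set and the final evaluation on the test set always see the same samples.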
To analyze the performance of the RF classifier for pathogen detection, the optical data are first preprocessed and features are extracted. The features are then used as input to the RF classifier, which is trained on a labeled dataset of fresh produce samples with known pathogen contamination status. The accuracy of the SVM model depends on the quality of the data, the selection of relevant features, and the optimization of the model. 11 Using SVM, food-borne pathogens in fresh produce can be detected more accurately and efficiently, which can help to reduce the risk of food-borne illness. Stratified k-fold cross-validation is used to evaluate the performance of machine learning models, including Random Forest (RF) and Support Vector Machines (SVM). The general steps are as follows:
Data splitting: The dataset is split into training and testing sets. In stratified k-fold cross-validation, the dataset is split into k folds, where each fold contains a proportional representation of the classes in the dataset. 16
Model training: The RF and SVM models are trained on the training set. For each fold, the model is trained on k-1 folds and tested on the remaining fold.
Model evaluation: The performance of the RF and SVM models is evaluated on the testing set. The evaluation metrics used to measure the performance of the models include accuracy,12,17–20 precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC).
Model averaging: After evaluating the performance of the models on each fold, the results are averaged to get an estimate of the overall performance of the models.
Model selection: Based on the average performance of the models, the best model is selected. This can be done by comparing the performance metrics of the RF and SVM models.
Using stratified k-fold cross-validation can help to reduce the bias in the evaluation of machine learning models and provide a more accurate estimate of their performance. It is particularly useful in situations where the dataset is imbalanced, and the classes are not evenly distributed.21–24
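The splitting and averaging steps above can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation: the round-robin fold assignment, the label counts, and the majority-class baseline used in place of the RF and SVM models are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs in which every fold
    keeps roughly the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold for i in fold)
        yield train, test


# Hypothetical imbalanced labels: 20 contaminated (1) and 80 clean (0) samples.
labels = [1] * 20 + [0] * 80
splits = list(stratified_k_fold(labels, k=5))

# Stand-in "model": always predict the majority class of the training folds.
fold_accuracies = []
for train, test in splits:
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    correct = sum(1 for i in test if labels[i] == majority)
    fold_accuracies.append(correct / len(test))

# Model averaging: mean of the per-fold scores.
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

With these labels the baseline scores 0.8 on every fold without detecting a single contaminated sample, which is why precision, recall, and F1-score are reported alongside accuracy for imbalanced data.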
Experimental results
The PCF sensor used for detecting these food-borne pathogens operates in the visible to near-infrared wavelength region, ranging from 450 nm to 800 nm. Refractive index sensing in this region shows high sensitivity and accuracy. Pathogen detection is treated as a binary classification problem in which the target variable takes two values: 1 for the presence of food-borne pathogens and 0 for their absence. Accuracy, precision, recall, and F1-score are then evaluated as follows:
Accuracy: To calculate accuracy, we need to count the number of correctly predicted cases and divide it by the total number of cases.
Precision: Precision measures the proportion of correctly predicted positive cases out of all the predicted positive cases.
Recall: Recall measures the proportion of correctly predicted positive cases out of all the actual positive cases.
F1-score: The F1-score is the harmonic mean of precision and recall.
Sensitivity: The proportion of actual positive cases that the model correctly predicts; it is equivalent to recall.
Specificity: The proportion of actual negative cases that the model correctly predicts.
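These metrics are calculated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Where: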
TP (True Positive) = number of correctly predicted positive cases
TN (True Negative) = number of correctly predicted negative cases
FP (False Positive) = number of incorrectly predicted positive cases
FN (False Negative) = number of incorrectly predicted negative cases
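The metric definitions above can be computed directly from the four counts TP, TN, FP, and FN. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard binary classification metrics from
    true/false positive and negative counts."""

    def ratio(numerator, denominator):
        # Return 0.0 for an undefined ratio instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for 100 test samples.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
```

For these example counts the accuracy is 0.85 and the precision is 0.90, while the recall is lower because 10 of the 55 contaminated samples are missed.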
Escherichia coli (E. coli) Impact
On the basis of the dataset collected, it was observed that there is a significant impact on hospitalization and fatalities due to Escherichia coli (E. coli). The results are illustrated in Figure 3:

Figure 3. Parameters observed due to Escherichia coli using ML techniques.
Salmonella enterica impact
A second dataset was used for Salmonella enterica, and the results are plotted in Figure 4. Both Salmonella enterica and E. coli were identified as food-borne pathogens.

Figure 4. Parameters observed due to Salmonella enterica using ML techniques.
The detected pathogens show low to moderate concentrations within the specified wavelength region, as presented in Table 1.
Table 1. Different parameters of pathogens detected.
Table 1 shows that the refractive index for Escherichia coli (E. coli) ranges from 1.345 to 1.350, while for Salmonella enterica it ranges from 1.355 to 1.360.
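The refractive index ranges reported above can be encoded as a simple lookup. This is only a sketch of how the ranges might be used for a first-pass check; it is not the classifier used in the study, and the function and variable names are hypothetical.

```python
# Refractive index ranges as reported in the text (dimensionless).
RI_RANGES = {
    "Escherichia coli": (1.345, 1.350),
    "Salmonella enterica": (1.355, 1.360),
}


def candidate_pathogens(refractive_index, ranges=RI_RANGES):
    """Return the pathogens whose reported range contains the reading."""
    return [
        name
        for name, (low, high) in ranges.items()
        if low <= refractive_index <= high
    ]


# Hypothetical readings.
ecoli_match = candidate_pathogens(1.347)
salmonella_match = candidate_pathogens(1.358)
no_match = candidate_pathogens(1.352)
```

A reading that falls between the two ranges returns an empty list, which in practice would be left to the trained classifiers rather than decided by a fixed threshold.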
Accuracy and sensitivity analysis
Accuracy and Sensitivity were observed for SVM and RF and are presented in Figure 5. It shows that both machine learning techniques yield approximately equal outcomes.

Figure 5. Accuracy and sensitivity observed for SVM and RF.
Additionally, a confusion matrix for the training and test datasets for SVM and RF is presented in Figure 6.

Figure 6. Confusion matrix for training and test data set for SVM and RF.
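For readers unfamiliar with how a confusion matrix is assembled, the sketch below tallies its four cells from paired true and predicted labels. The label lists are hypothetical and are not the study's data; the function name is likewise an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four cells of a binary confusion matrix."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Hypothetical labels: 1 = pathogen present, 0 = pathogen absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
```

Computing the same tallies separately for the training and test sets makes it easy to spot overfitting, since a large gap between the two matrices indicates that the model does not generalize.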
Performance evaluation
The results showed that both the random forest and SVM classifiers achieved high prediction accuracy, with F1-scores above 0.9 for both classifiers.
Overall, the presented machine learning model demonstrates the potential of combining photonics-based sensing 13 with machine learning for rapid and accurate detection of food-borne pathogens in fresh produce. In the next section, we discuss the implications of our work and potential directions for future research.
Limitations
This approach may not fully capture the complexity of pathogen contamination in real-world scenarios. The performance of the photonics-based sensor system can be influenced by environmental factors and sample variability, which may not have been comprehensively addressed in this study. As technology and data analysis techniques continue to evolve, the study's findings may become subject to obsolescence, underscoring the need for ongoing research and refinement in this field.
Conclusion
It is concluded that combining photonics-based sensing with SVM and RF classifiers has potential for rapid and accurate detection of food-borne pathogens in fresh produce. Both models (SVM and RF) showed high accuracy and sensitivity. The results showed that the combination of optical sensing and machine learning algorithms can accurately detect and classify food-borne pathogens, such as Escherichia coli (E. coli) and Salmonella enterica, in fresh produce samples. The study also demonstrated the potential of optical sensing as a rapid and non-destructive method for detecting food-borne pathogens. Compared to traditional methods, such as culturing, optical sensing is faster, less expensive, and requires less sample preparation.
Footnotes
Acknowledgements
I am very thankful to almighty God. After that I would like to thank Dr Lokesh Tharani for guiding me and providing me resources.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Information on patient consent
We have not performed any clinical trials on human beings.
Author biographies
Sunil Sharma is a research scholar at RTU, Kota under the supervision of Dr. Lokesh Tharani.
Lokesh Tharani is an associate professor in Department of Electronics Engineering at RTU, Kota.
