Abstract
Robust in-field sensing technologies are essential for advancing precision agriculture and autonomous field robotics toward analysing internal quality attributes of fruits and vegetables. This study demonstrated in-the-field, non-contact near-infrared (NIR) spectroscopy for determining total soluble solids (TSS), a measure of sugar content, in on-the-plant strawberries under daytime conditions. A compact NIR interaction instrument (750–1020 nm), designed for robotic operation, was built and tested in a polytunnel environment under varying day- and night-time conditions. The instrument was calibrated using a partial least squares regression (PLSR) model built on laboratory data collected in 2025 from 200 strawberries of a single variety. It was tested on 100 strawberries of two varieties that were measured in 2024, while still attached to the plant. During night-time operation, TSS was predicted with a standard error of prediction (
Keywords
Introduction
Precision agriculture is becoming increasingly important as the food production sector strives to improve efficiency, sustainability, and product quality. This transformation is accompanied by the growing demand for healthy, fresh, and sustainably produced fruits and vegetables, and consumer willingness to pay for consistent flavour and freshness.1,2
Autonomous field robots are central to this transformation, enabling more automated and precise farming operations. Commercial examples include UV-based treatment of powdery mildew, eliminating the need for pesticides, 3 and robotic weeding systems. 4 While harvesting robots hold great potential for optimising farming operations, commercial deployment remains limited due to significant technical challenges. 5 Their success partly depends on rapid, non-destructive in-field sensing technologies capable of providing information on crop status, ripeness, and quality. Such sensors are essential for resource optimisation through improved crop management, better labour management, and more precise harvest timing.
Thus, there is a growing need for novel sensing technologies that can assess the internal quality of fruits and vegetables. Conventional methods for analysing sensory qualities are destructive, limiting measurements to a few representative samples. For instance, total soluble solids (TSS), a measure of sugar content, is typically determined via refractometric analysis of fruit juice. Non-destructive alternatives include RGB cameras, which offer a low-cost way to evaluate colour, shape, and texture for predicting ripeness and some quality parameters.6,7 However, these cameras probe only the surface and are insufficient for robust estimation of internal attributes such as sugar content, acidity, or dry matter. Similarly, hyperspectral cameras operating in reflection mode have limited penetration depth: they can detect surface features such as microbial infection and surface defects, but not sub-surface properties like internal bruising. 8 While such systems can provide reliable estimates when surface characteristics are representative, for example, for TSS in grapes, 9 they fail for heterogeneous fruits where surface appearance does not reflect internal quality.10,11
Near-infrared spectroscopy (NIRS) offers a feasible alternative for rapid, non-destructive assessment of internal fruit quality. Using an interaction sampling geometry, where illumination and detection fields are spatially separated, NIRS can capture sub-surface information that is more representative of heterogeneous samples. 12 Moreover, NIRS enables fully non-contact operation, which is important to prevent disease spread and mechanical damage to the fruit.
Strawberry (Fragaria
Wold et al. 11 demonstrated that non-contact NIR interaction spectroscopy can robustly determine TSS in strawberries. Since TSS is heterogeneously distributed within the fruit and the surface is prone to weather-dependent variations, interaction measurements outperform reflection geometries, producing prediction models with improved inter-seasonal robustness. However, the inherently weak signals of the interaction geometry, combined with the subtle spectral variations associated with TSS, make strawberries a particularly challenging application requiring instrumentation with a high signal-to-noise ratio (SNR). 11
Although NIRS has been widely explored for post-harvest assessment of fruit and vegetable quality, few instruments combine non-contact and sub-surface interaction sensing, 14 and few studies have demonstrated successful in-field implementation with fruits still attached to the plant. 6 Achieving this capability is critical to fully capitalising on agricultural robots’ potential to provide non-invasive measurements and monitor fruit quality over time without harvesting. In-field operation, however, imposes strict requirements on the sensor, which must be compact, low-power, portable, and robust to challenging outdoor conditions such as strong and fluctuating ambient illumination, ambient temperature variations that affect sample temperatures, and complex measurement scenes. On-the-plant, non-contact configurations are particularly susceptible to interference from ambient light compared with contact measurements, where the ambient light is shielded, and optimal measurement setups, where direct ambient light can be blocked or shaded. Wold et al. 11 recently demonstrated in-field, non-contact determination of TSS in strawberries using an NIR interaction geometry. While that work confirmed the feasibility of field-based NIRS, it used a benchtop instrument that was unsuitable for on-the-plant or robotic operation.
In this work, we demonstrate non-contact measurements of TSS in on-the-plant, polytunnel-grown strawberries under daytime conditions using a compact prototype NIR instrument (750–1020 nm) that we designed and built for robotic operation. We systematically investigate how environmental factors interfere with spectral measurements and assess their impact on data quality and calibration performance under varying field conditions. Furthermore, we evaluate strategies to mitigate these interferences and enhance measurement robustness. Based on these findings, we propose practical design guidelines for developing robust outdoor spectral analysers, contributing to the translation of laboratory-based methods into real-world agricultural applications.
Materials and Methods
Datasets
The ever-bearing strawberry cultivars (cvs.) Favori and Aurora were used in the experiment. These cultivars are commercially important to Norway and well suited to table-top cultivation systems that enable efficient plant care, harvesting, and integration with robotic operations. Plants were grown outdoors in open polytunnels (Haygrove Ltd., UK) on table-top systems with automatic watering and nutrient delivery, using coconut coir as the growth medium.
Two independent datasets, obtained from different farms and growing seasons, were used as calibration and test sets. The in-field test measurements were conducted first, following prior laboratory development and validation of the measurement concept under controlled conditions. A new calibration dataset was subsequently acquired the following year using the same instrument configuration; earlier calibration data collected prior to the field trials were not used, as the instrument configuration had been finalised before the field campaign and those datasets were therefore not directly comparable.
At the time of calibration measurements, only cv. Favori was available, resulting in a calibration set restricted to a single cultivar, while the test set contained both cv. Favori and cv. Aurora. This design provided a fully external evaluation across both seasonal and cultivar differences. Previous studies employing similar sampling geometries have reported robustness across seasons and cultivars, 11 providing a rationale for exploring model generalisation under heterogeneous conditions in the present study. In addition, the focus of this work was not to optimise a deployment-ready calibration spanning multiple cultivars, farms, and seasons, but to examine how varying environmental conditions, particularly illumination and temperature, affect spectral acquisition quality and prediction performance when applying a fixed calibration model.
Calibration Set
Fresh strawberries (
The NIR spectroscopic measurements were performed in a laboratory under stable temperature conditions and weak ambient light (ceiling lights turned off and windows tinted). Measurements were taken on the same day as harvest, after the berries were brought to room temperature (21–22

(a) Illustration of on-the-plant non-contact measurement of a strawberry with the NIR interaction prototype. (b) Sampling geometry consisting of a projected illumination line and detected region (white dot).
Test Set and In-Field Measurement Runs
In week 35 (August) 2024, strawberries grown in Ås, Norway, were measured in situ in an open polytunnel while still attached to the plant. Berries (
Temperature and relative humidity inside the polytunnel were monitored using a data logger (EL-SIE-2+, Lascar Electronics, UK). Given the small size and high surface-area-to-volume ratio of strawberries, berry temperature was expected to broadly follow ambient air temperature during the measurement runs, with possible deviations due to sunlight exposure and delayed thermal response.
Weather on the measurement days was clear with variable cloud cover, no precipitation, and temperatures ranging from 14
Following the in-field measurements, berries were harvested, transported to the laboratory, and brought to room temperature (21–22
Chemical Reference Analysis of TSS
Reference measurements of TSS, expressed in
NIR Spectroscopic Measurements
The prototype NIR spectroscopic instrument, shown in Figure 1a, was designed for on-the-plant non-contact interaction measurements. Compact and lightweight, with a working distance of approximately 20 cm, it can be mounted on an outdoor mobile platform, such as a robotic arm. For the experiments presented here, it was mounted on a tripod, and manually positioned in front of each berry.
The sampling geometry is illustrated in Figure 1b. A halogen light source (20 W) with projection optics generated a 20 mm
Spectra were acquired using an OEM transmission grating spectrometer with a silicon CCD detector array, covering the wavelength range 532–1154 nm with an average sampling interval of 0.6 nm and spectral resolution (FWHM) of approximately 4.5 nm. A curved white barium plate was used as a reference sample.
For the calibration dataset acquired under minimal ambient light, each spectrum had an exposure time of 10 ms, with a total of 60 spectra per measurement sequence. For the test set, including in-field measurements, the exposure time was reduced to 0.5 ms to avoid saturation under strong ambient light. A total of 2000 spectra were collected per measurement sequence, comprising approximately 1000 sample and 1000 ambient spectra, acquired using a light shutter in the source path. Even with a fast spectrometer readout of approximately 5.5 ms per spectrum, only about 1 s of the total 12 s measurement time corresponded to detector exposure, the remainder being spectrometer readout. Although longer exposure times would have been preferable under conditions with less ambient light, identical acquisition settings were applied across all test set measurements for comparison purposes.
Ambient-Correction Methods
A light shutter was used to sequentially block the illumination, enabling alternating acquisition of illuminated spectra containing both sample and ambient contributions, and ambient-only spectra when the lamp was blocked. This acquisition scheme, illustrated in Figure 2a, allowed continuous ambient-correction to isolate the sample spectrum. Operating at 6 Hz (six on–off shutter cycles per second) with equal open and closed intervals, the system corrected for ambient light fluctuations up to the Nyquist frequency of 3 Hz and captured equal numbers of sample and ambient spectra, thereby optimising the SNR under strong ambient light. Each measurement sequence comprised roughly 70 shutter cycles with 28 spectra captured per cycle (14 sample and 14 ambient). The shutter position was synchronised with spectrometer acquisition to ensure accurate identification of illuminated and ambient spectra, while spectra captured during shutter transitions were identified and discarded. Consequently, minor variations occurred in the number of spectra per measurement sequence.
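As a rough cross-check of the acquisition figures, the following sketch recomputes the timing budget from the exposure time, the approximate readout time, and the shutter frequency stated in the text. The variable names are illustrative and are not taken from the instrument software.

```python
# Timing budget for one test-set measurement sequence (values from the text).
n_spectra = 2000        # spectra per measurement sequence
exposure_s = 0.5e-3     # exposure time per spectrum
readout_s = 5.5e-3      # approximate readout time per spectrum
shutter_hz = 6.0        # on-off shutter cycles per second

total_exposure_s = n_spectra * exposure_s             # time the detector integrates
total_time_s = n_spectra * (exposure_s + readout_s)   # exposure plus readout
duty_cycle = total_exposure_s / total_time_s          # fraction of time spent exposing
n_cycles = total_time_s * shutter_hz                  # shutter cycles per sequence
spectra_per_cycle = n_spectra / n_cycles              # before discarding transitions
nyquist_hz = shutter_hz / 2.0                         # fastest correctable fluctuation

print(f"exposure {total_exposure_s:.1f} s of {total_time_s:.1f} s "
      f"({duty_cycle:.1%}), {n_cycles:.0f} cycles, "
      f"{spectra_per_cycle:.1f} spectra per cycle, Nyquist {nyquist_hz:.1f} Hz")
```

The computed values (about 72 cycles and just under 28 spectra per cycle) agree with the rounded figures of roughly 70 cycles and 28 spectra per cycle.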

Acquisition schemes based on alternating illuminated sample and ambient sequences: the acquired sequences (a), a simulated lower ambient light sampling frequency (one third of the original) (b), fewer spectra acquired at the original sampling frequency (c), and the same reduced number of spectra distributed over a longer measurement duration (simulated lower sampling frequency) (d). For illustration purposes, the numbers of spectra per cycle are reduced relative to those used in the study.
Three methods were tested for obtaining ambient-corrected spectra from the sequences of alternating illuminated and ambient measurements. These methods were evaluated based on their ability to compensate for strong ambient light fluctuations and were compared in terms of their resulting prediction performance under varying illumination conditions. Several filtering strategies for excluding unreliable spectra prior to the final averaging step were also investigated, including the removal of spectra acquired during abrupt ambient changes, spectra with intensity spikes relative to their neighbouring spectra, and spectra with deviating shapes. These filters were not retained, as they reduced prediction performance.
Simple Averaging
The simplest correction approach involved subtracting the average ambient spectrum from the average illuminated sample spectrum. Under ideal conditions, with complete data, stable acquisition timing, equal numbers of spectra per shutter cycle, and no detector saturation, this method is effectively equivalent to the interpolation-based correction. However, irregular sampling or missing data can produce non-uniform weighting of different acquisition times throughout the measurement sequence. When ambient light levels fluctuate, such non-uniform weighting can create an imbalance between the ambient contributions in the sample and ambient averages, leading to residual ambient interference in the corrected spectrum.
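A minimal sketch of the simple-averaging correction is given below, assuming the raw spectra are stacked in an array with a boolean flag marking lamp-on acquisitions (the function and variable names are hypothetical). The synthetic example uses a step change in ambient level to show how losing the ambient spectra of one shutter cycle leaves residual ambient signal in the corrected result.

```python
import numpy as np

def simple_average_correction(spectra, illuminated):
    """Subtract the mean ambient spectrum from the mean illuminated spectrum.

    spectra:     array (n_spectra, n_pixels) of raw detector counts
    illuminated: boolean array (n_spectra,), True when the lamp was unblocked
    """
    spectra = np.asarray(spectra, dtype=float)
    illuminated = np.asarray(illuminated, dtype=bool)
    return spectra[illuminated].mean(axis=0) - spectra[~illuminated].mean(axis=0)

# Synthetic data: two shutter cycles, ambient level steps up in the second cycle.
n_pixels = 4
lit = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)
ambient = np.array([100.0] * 4 + [200.0] * 4)      # fluctuating ambient level
sample = 5.0                                       # true lamp-induced signal
raw = (ambient + sample * lit)[:, None] * np.ones(n_pixels)

# Complete data: both averages weight the two cycles equally.
balanced = simple_average_correction(raw, lit)
# Missing ambient spectra from the second cycle: the averages no longer match.
unbalanced = simple_average_correction(raw[:6], lit[:6])
```

In this toy case the balanced correction recovers the sample signal, whereas the unbalanced one retains part of the ambient step.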
Interpolation-Based Correction
To account for acquisition imperfections, missing values, or detector saturation, ambient correction was performed per shutter cycle. For each cycle, the batch of contiguous illuminated spectra was averaged, yielding
Regression-Based Correction
As an alternative to linear interpolation between ambient sequences, each spectrum
Calibration Procedure
The NIR interactance spectra were derived from the ambient-corrected spectra by dividing the sample spectrum by the spectrum of the reference sample (a curved white barium plate). The resulting spectra were smoothed using a second order Savitzky–Golay filter with a window size of 11 points, 15 corresponding to an average window width of 7 nm, and then cropped to the wavelength range 750–1020 nm. This range was selected to include the water absorption band around 970 nm while excluding the region with poor silicon detector quantum efficiency and strong temperature dependence above 1020 nm. The visible region below 750 nm was also excluded, as colour has previously been shown to be a poor predictor of berry sweetness. 11 In addition, the weaker water absorption in this wavelength region (compared to longer wavelengths) allows sufficient signal intensity despite the increased path lengths associated with subsurface interaction measurements. While longer wavelengths cover a richer set of spectral features, their stronger water absorption would significantly reduce the detected signal in the interaction measurement. The selected region also enables the use of low-cost silicon detectors, whereas longer wavelengths require InGaAs detectors.
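The first pre-processing steps can be sketched as follows, assuming SciPy's savgol_filter is an acceptable stand-in for the smoothing step (the text does not name the implementation used). The function name and the synthetic spectrum are illustrative only, and the uniform 0.6 nm grid is a simplification of the instrument's average sampling interval.

```python
import numpy as np
from scipy.signal import savgol_filter

def interactance_750_1020(sample, reference, wavelengths_nm):
    """Return smoothed interactance cropped to 750-1020 nm.

    sample, reference: ambient-corrected spectra, shape (n_pixels,)
    wavelengths_nm:    pixel wavelengths, shape (n_pixels,)
    """
    interactance = np.asarray(sample, float) / np.asarray(reference, float)
    # Second-order Savitzky-Golay smoothing over an 11-point window.
    smoothed = savgol_filter(interactance, window_length=11, polyorder=2)
    keep = (wavelengths_nm >= 750.0) & (wavelengths_nm <= 1020.0)
    return wavelengths_nm[keep], smoothed[keep]

# Synthetic spectrum with a broad dip near 970 nm (assumed shape, not real data).
wl = np.arange(532.0, 1154.0, 0.6)
reference = np.full(wl.shape, 1000.0)
sample = reference * (0.5 - 0.2 * np.exp(-((wl - 970.0) / 30.0) ** 2))
wl_crop, spec_crop = interactance_750_1020(sample, reference, wl)
```

Smoothing before cropping, as done here, avoids edge effects of the filter inside the retained wavelength range.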
The smoothed and cropped spectra were transformed to absorbance spectra by computing the logarithm of the inverse interactance intensity (
Prediction models of TSS were built using partial least squares regression (PLSR), 17 based on the absorbance spectra and corresponding reference TSS values from the calibration dataset.
To reduce temperature dependencies in the calibration model, 60 additional measurements of samples at lower temperatures were included alongside the room-temperature measurements, as described by Segtnan et al. 18 No further temperature corrections were applied.
The optimal number of latent variables was determined by segmented cross-validation, using the root mean square error of cross-validation (
The prediction performance of the calibration model was evaluated using the independent test set measured under ideal laboratory conditions. The model was applied following the same spectral pre-processing as the calibration dataset. To separately evaluate systematic deviations and random error, the root mean square error of prediction (
All data processing and analysis were performed using Python version 3.9 (Python Software Foundation, USA), within the Anaconda3 distribution (Anaconda, USA). Partial least squares regression was implemented using the scikit-learn package (version 1.6.1).
In-Field Measurements
Metrics Describing Data Quality, Prediction Accuracy, and Interfering Factors
Sources of errors that affect data quality and limit prediction accuracy can be divided into two main categories: random noise, which is uncorrelated between neighbouring pixels and spectra, and systematic disturbances, which cause distortions that are correlated across neighbouring pixels and spectra. Examples of random noise sources include electronic noise in the detector and shot noise (photon noise), a fundamental and unavoidable noise source arising from the quantised nature of photons. Shot noise scales with the square root of the number of photoelectrons in the signal, originating from both the sample signal and the ambient light. Systematic disturbances, on the other hand, include spectral distortions caused by sources such as environmental interference (e.g., stray light and uncorrected ambient interference), signal drift, and sampling errors (e.g., sensor misalignment). For a detailed discussion of these error sources, see Tschudi et al. 20
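The square-root scaling of shot noise can be illustrated with a short calculation. The sketch assumes purely shot-noise-limited detection (no read noise or dark current), signals expressed in photoelectrons, and equal numbers of illuminated and ambient spectra; the function name and input values are hypothetical.

```python
import math

def shot_noise_snr(sample_e, ambient_e, n_pairs=1):
    """SNR of an ambient-corrected signal under pure shot noise.

    sample_e, ambient_e: mean photoelectrons per spectrum from the lamp-lit
                         sample and from ambient light (hypothetical inputs)
    n_pairs:             number of illuminated/ambient spectrum pairs averaged
    """
    var_illuminated = (sample_e + ambient_e) / n_pairs   # Poisson variance of the mean
    var_ambient = ambient_e / n_pairs                    # variance of the subtracted mean
    return sample_e / math.sqrt(var_illuminated + var_ambient)

dark = shot_noise_snr(1000.0, 0.0, n_pairs=100)          # no ambient light
sunlit = shot_noise_snr(1000.0, 25000.0, n_pairs=100)    # ambient 25 times the sample
```

Because the ambient photoelectrons contribute noise both to the illuminated spectra and to the subtracted ambient spectra, strong ambient light lowers the SNR even when its mean level is removed perfectly.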
Data Quality
Spectral noise is often estimated as the standard deviation of repeated measurements. 21 While this approach accurately reflects random, uncorrelated noise when systematic disturbances are negligible, the noise estimate is inflated in the presence of spectral drift caused by systematic disturbances, leading to an underestimation of the true SNR. 20
A more robust estimate of the SNR can be obtained from the spectral roughness, which quantifies the amount of high-frequency variation across a single spectrum (i.e., rapid pixel-to-pixel fluctuations), reflecting the effects of random, uncorrelated noise. Unlike the standard deviation approach, it does not require repeated measurements and is therefore simpler and more practical. Roughness was defined as the standard deviation of the residual spectrum
Systematic distortions affecting the overall spectral shape can be quantified by comparing each spectrum to a reference spectrum
Prediction Accuracy
To evaluate the impact of data quality on prediction accuracy, the lab–field prediction deviations, which quantify prediction accuracy after removing contributions from model calibration error, were used.
To separately evaluate the contributions of random, uncorrelated noise and fluctuating systematic disturbances to prediction accuracy, time-independent prediction noise and time-dependent prediction variation were quantified. Time-independent prediction noise reflects the influence of random, uncorrelated noise on the predictions, whereas time-dependent prediction variation captures changes that are correlated between spectra and evolve throughout the measurement sequence. This way, the impact of variable disturbances, such as fluctuating ambient light or minor mechanical disturbances (e.g., tripod vibrations), can be assessed to determine whether acquisition under changing conditions affects predictions. Constant systematic disturbances, such as fixed sensor misalignment, are not captured. Although ambient drift changes the magnitude of random shot noise, this noise remains uncorrelated between spectra and therefore affects only the time-independent prediction noise.
Time-independent prediction noise was quantified by dividing each measurement sequence into approximately 14 groups, each containing spectra from five evenly distributed shutter cycles. For each group, spectra were averaged and used to calculate predictions, and the standard deviation of these predictions provided an estimate of time-independent prediction noise. Similarly, time-dependent prediction variation was estimated by forming groups of sequential spectra. The standard deviation of predictions from these sequential groups reflected all sources of variation, including both time-dependent and time-independent components. By subtracting the time-independent prediction noise from this value, an estimate of the time-dependent prediction variation was obtained.
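The grouping scheme described above can be sketched as follows, assuming one ambient-corrected spectrum per shutter cycle and a generic prediction function. The function name, the use of the sample standard deviation, and the flooring of the difference at zero are assumptions added for illustration.

```python
import numpy as np

def prediction_noise_components(cycle_spectra, predict, cycles_per_group=5):
    """Estimate time-independent noise and time-dependent variation.

    cycle_spectra: array (n_cycles, n_pixels), one corrected spectrum per cycle
    predict:       callable mapping a spectrum (n_pixels,) to a scalar
    """
    cycle_spectra = np.asarray(cycle_spectra, float)
    n_groups = cycle_spectra.shape[0] // cycles_per_group
    used = cycle_spectra[: n_groups * cycles_per_group]

    # Interleaved groups: group g holds cycles g, g + n_groups, g + 2*n_groups, ...
    # Each group spans the whole sequence, so slow changes largely average out.
    interleaved = used.reshape(cycles_per_group, n_groups, -1).mean(axis=0)
    # Sequential groups: group g holds consecutive cycles, so slow changes remain.
    sequential = used.reshape(n_groups, cycles_per_group, -1).mean(axis=1)

    noise = np.std([predict(s) for s in interleaved], ddof=1)
    total = np.std([predict(s) for s in sequential], ddof=1)
    # Plain subtraction as described in the text; the floor at zero is an assumption.
    return noise, max(total - noise, 0.0)

# Synthetic sequence of 70 cycles with a slow drift plus random noise.
rng = np.random.default_rng(1)
n_cycles, n_pixels = 70, 20
drift = 0.1 * np.arange(n_cycles)[:, None]
spectra = 1.0 + drift + 0.01 * rng.normal(size=(n_cycles, n_pixels))
noise, time_dependent = prediction_noise_components(spectra, predict=np.mean)
```

With 70 cycles and five cycles per group, the sketch forms 14 groups in both arrangements, matching the numbers given in the text.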
Interfering Factors
Logged temperature and humidity during the measurement runs were analysed to determine whether the prediction model was robust to variations in sample temperature and environmental conditions, and whether such fluctuations directly or indirectly affected data quality or prediction accuracy.
The ambient intensity and the illuminated sample intensity (not ambient-corrected) were quantified for each measurement sequence using the average signals in the weakly absorbing 750–900 nm region of the respective spectra. The ambient-corrected sample intensity was estimated as the difference between these values.
The stability of the ambient signal was assessed using the ambient drift, quantified as the standard deviation of the difference in signal strength between consecutive ambient spectra.
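The intensity and drift metrics defined above could be computed along the following lines. The function names are hypothetical, and treating all ambient spectra of a sequence as one consecutive series (including across shutter cycles) and using the sample standard deviation are assumptions.

```python
import numpy as np

def band_mean(spectra, wavelengths_nm, lo=750.0, hi=900.0):
    """Mean signal per spectrum in the weakly absorbing 750-900 nm band."""
    band = (wavelengths_nm >= lo) & (wavelengths_nm <= hi)
    return np.asarray(spectra, float)[:, band].mean(axis=1)

def ambient_metrics(illuminated, ambient, wavelengths_nm):
    """Return (ambient intensity, corrected sample intensity, ambient drift)."""
    amb = band_mean(ambient, wavelengths_nm)
    lit = band_mean(illuminated, wavelengths_nm)
    ambient_intensity = amb.mean()
    sample_intensity = lit.mean() - ambient_intensity
    # Drift: standard deviation of differences between consecutive ambient spectra.
    ambient_drift = np.std(np.diff(amb), ddof=1)
    return ambient_intensity, sample_intensity, ambient_drift

# Synthetic example: four spectrally flat ambient spectra and a lamp signal of 8 counts.
wl = np.linspace(532.0, 1154.0, 100)
levels = np.array([100.0, 110.0, 105.0, 120.0])
ambient = levels[:, None] * np.ones(wl.size)
illuminated = ambient + 8.0
ambient_intensity, sample_intensity, ambient_drift = ambient_metrics(
    illuminated, ambient, wl)
```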
Although the instrument’s pose (orientation, positioning, and focus) and minor mechanical disturbances could affect measurements, these factors were not explicitly quantified. Spectral normalisation and outlier detection and removal mitigated the impact of shape deviations and intensity variations in these experiments, although residual SNR variations may persist.
Sample-Related Errors
Factors such as inhomogeneity, granulometric effects, and surface variations could potentially influence predictions; however, these effects were assumed to be of limited importance, as the interaction geometry samples a larger subsurface volume than reflection geometries and is therefore less sensitive to local surface variations. 11 A quantitative assessment of these effects would be challenging within the scope of the present study, and they were therefore not further investigated.
Correlations Between the Parameters
To investigate whether noise, systematic disturbances, or interfering factors were associated with reduced prediction accuracy, correlations were analysed between the data quality metrics (spectral roughness and shape similarity), the prediction accuracy metrics (lab–field prediction deviation, time-independent prediction noise, and time-dependent prediction variation), and the interfering factors (sample intensity, ambient intensity, ambient drift, temperature, and humidity). This analysis aimed to clarify the relationships between measurement quality, environmental conditions, and prediction robustness under varying field conditions.
Outlier Detection and Removal
Outlier detection was performed using the data quality metrics defined in the previous section. Spectra with insufficient SNR were identified by their roughness (
The proposed metrics can be computed in real time for each spectrum without requiring reference values, allowing operators to be immediately alerted when a measurement is unreliable. The approach can also be integrated into robotic systems to automatically repeat measurements until acceptable quality is achieved.
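One way such an automatic repeat could be organised is sketched below. The acquisition hook, the quality check, and the attempt limit are hypothetical placeholders and do not describe the software of the prototype; the quality rule used in the example is a stand-in for the roughness and shape criteria.

```python
def measure_until_acceptable(acquire, is_acceptable, max_attempts=3):
    """Repeat a measurement until it passes the quality check.

    acquire:       callable returning one spectrum (hypothetical instrument hook)
    is_acceptable: callable returning True when the quality metrics pass
    Returns (spectrum, attempts); spectrum is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        spectrum = acquire()
        if is_acceptable(spectrum):
            return spectrum, attempt
    return None, max_attempts

# Stand-in acquisition that yields two poor readings and then a clean one.
readings = iter([[9.0, 1.0, 9.0], [8.0, 2.0, 8.0], [5.0, 5.1, 5.0]])

def spread_ok(spectrum):
    """Placeholder quality rule: accept spectra with a small value spread."""
    return max(spectrum) - min(spectrum) < 1.0

result, attempts = measure_until_acceptable(lambda: next(readings), spread_ok)
```

Bounding the number of attempts keeps the measurement time per berry predictable, which matters for a robot working along a row.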
In-Field Prediction Performance
To evaluate the performance of the NIR instrument under varying environmental conditions, the prediction model was applied to the four in-field measurement runs. Outlier spectra were identified and excluded prior to evaluating prediction performance, ensuring that only reliable measurements contributed to the reported parameters. Prediction performance was assessed in terms of the squared correlation coefficient (
The in-field predictive performance was evaluated using spectra of the highest available data quality, obtained by averaging all 1000 illuminated sample spectra, corresponding to approximately 0.5 s of sample exposure. Ambient-correction was performed using the approach with interpolation-based correction described in Ambient-Correction Methods.
The effects of measurement methodology on prediction performance under varying environmental conditions were further investigated through a series of acquisition strategies encompassing both instrument configuration and data processing approaches. These experiments aimed to identify trade-offs and inform future optimisation of the instrument and acquisition protocol for in-field measurements.
First, the different ambient-correction methods presented in the Ambient-Correction Methods section above were compared based on their resulting prediction performance across measurement runs, in order to optimise ambient light removal.
Next, the effect of sampling frequency was investigated. The ambient light sampling frequency (shutter speed) determines the maximum rate at which ambient fluctuations can be corrected (up to half the sampling frequency, according to the Nyquist limit). The original frequency of 6 Hz was compared with a simulated slower frequency of 2 Hz, obtained by removing every third sequence of spectra (alternating between illuminated and ambient). This effectively merged two consecutive illuminated sequences and two consecutive ambient sequences, thereby tripling the duration of a single shutter cycle, as illustrated in Figure 2b.
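The effect of removing every third sequence can be verified with a small label-based sketch, in which "I" denotes an illuminated sequence and "A" an ambient sequence. The helper names are illustrative.

```python
def drop_every_third(batches):
    """Remove every third batch (3rd, 6th, ...) from an alternating sequence."""
    return [b for i, b in enumerate(batches, start=1) if i % 3 != 0]

def merge_runs(labels):
    """Collapse consecutive identical labels into (label, run_length) pairs."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return [tuple(r) for r in runs]

original = ["I", "A"] * 6             # six shutter cycles at the original frequency
kept = drop_every_third(original)     # eight of the twelve sequences remain
runs = merge_runs(kept)               # interior runs now contain two sequences each
```

Each merged pair of ambient sequences followed by a merged pair of illuminated sequences spans six of the original sequences, that is, three original shutter cycles, which corresponds to the reduction from 6 Hz to 2 Hz.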
To evaluate the impact of spectral sampling frequency (measurement time per spectrum), the same total number of spectra (240) was acquired over two durations, 5 s and 10 s. For the 6 Hz condition, this corresponded to eight spectra per cycle with 30 shutter cycles (5 s) and four spectra per cycle with 60 shutter cycles (10 s). For the simulated 2 Hz condition, it corresponded to 24 spectra per cycle with 10 shutter cycles (5 s) and 12 spectra per cycle with 20 shutter cycles (10 s). These acquisition strategies are illustrated in Figures 2c–d. This design enabled assessment of how shutter speed and measurement duration independently affect prediction accuracy by altering exposure to ambient drift.
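The four acquisition configurations follow directly from the fixed budget of 240 spectra, as the short calculation below shows. The function name is illustrative.

```python
def acquisition_plan(total_spectra, shutter_hz, duration_s):
    """Return (shutter cycles, spectra per cycle) for a fixed spectrum budget."""
    cycles = round(shutter_hz * duration_s)
    return cycles, total_spectra // cycles

# Keys are (shutter frequency in Hz, duration in s).
plans = {
    (hz, seconds): acquisition_plan(240, hz, seconds)
    for hz in (6, 2)
    for seconds in (5, 10)
}

for (hz, seconds), (cycles, per_cycle) in plans.items():
    print(f"{hz} Hz, {seconds} s: {cycles} cycles, {per_cycle} spectra per cycle")
```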
Finally, the measurement speed, defined by the number of spectra acquired and averaged to form the final spectrum, was varied to examine how measurement time and SNR influence prediction accuracy under different environmental conditions.
Results and Discussion
Concentration of TSS
The concentrations and distributions of TSS in the two sample sets are summarised in Table I. Although both datasets were selected using similar criteria, ranging from ripe to very ripe, the test set exhibited a higher standard deviation and greater maximum TSS value, exceeding the range of the calibration set. This appears to result from a wider spread of TSS concentrations in cv. Aurora, which was only included in the test set, compared to cv. Favori, which was included in both datasets. The inclusion of a second calibration sample batch in August, in addition to the first batch in June, helped extend the calibration range, as the mean TSS levels decreased from June to August, consistent with previous findings. 11 This broader span was intended to support a more robust calibration.
Total soluble solids (TSS; g/100 g) in strawberries from the calibration and test datasets. SD = standard deviation.
Spectroscopic Measurements
Pre-processed (intensity-normalised) absorbance spectra of the calibration strawberries are shown in Figure 3a. The spectra are dominated by the OH overtone absorption of water around 970 nm. Differences related to varying sugar content are largely masked by variations in the overall spectral slope and contrast. Contrast differences can be removed by applying standard normal variate (SNV) correction, after which apparent shifts of the water peak around 970 nm toward longer wavelengths in strawberries with higher TSS have been observed. 11 These shifts are attributed to the influence of sugar concentration on the water OH absorption band around 970 nm, which consists of an absorption peak at 960 nm that decreases with increasing sugar concentration and a shoulder at 984 nm that becomes more pronounced with increasing sugar content. 22 Although no contrast corrections were applied in the present study, these variations contributed negligibly to the regression vector, being captured mainly by the second PLS component, which accounted for 71 % of spectral variance but only 1.5 % of TSS variance.

(a) Normalised absorbance spectra of the strawberries in the calibration set, coloured according to their concentration of total soluble solids (TSS). (b) PLS regression coefficients for prediction of TSS concentration.
The resulting PLS regression coefficients for TSS prediction are shown in Figure 3b. These coefficients display main features similar to those of previous models of TSS in strawberries, 11 tomatoes, 23 and mangoes, 24 obtained after applying SNV, indicating that the final result is similar to using SNV explicitly. The OH bands of sugar at 960 nm and 984 nm are not strongly weighted in the model, possibly due to masking by water absorption or, as suggested by Wold et al., 11 their temperature dependence. The regression relies primarily on the region between 850 nm and 920 nm, capturing the sugar absorption peak at 910 nm.22,24
Calibration Results
The calibration for TSS based on the calibration set of cv. Favori required 6 PLS factors and resulted in the regression vector shown in Figure 3b, with an
The prediction performance of the model is quantified by the results obtained for the laboratory run of the test set, presented in Figure 4a. The

Predicted versus measured TSS in the test set of strawberries measured in field under varying environmental conditions: laboratory (a), night (b), morning with diffuse sunlight (c), midday with direct sunlight (d), and afternoon with direct sunlight (e). Diagonal black lines indicate target values. Outlier measurements are marked with red triangles (high roughness
The test set consisted of strawberries from a different season and farm than the calibration set, which may explain the small bias and supports the robustness of the model across seasons. Furthermore, the test set covered a wider span of TSS values than the calibration set. Although the predicted values in the range 12–16% TSS were based on extrapolation, the prediction errors were not significantly higher in this region. This indicates that the model successfully captured spectral variations that correlate with TSS even beyond the calibration range. Despite the calibration set containing only cv. Favori, the prediction errors for cv. Favori and cv. Aurora in the test set were not significantly different, suggesting that the model is transferable between cultivars. As discussed in Wold et al., 11 this demonstrates the advantage of interaction sampling geometries, which, compared to reflection, enable deeper light penetration and a more representative sampling of the berry interior. In that study, an interaction geometry similar to the one used here was employed, and effective penetration depths of approximately 6–8 mm were reported, enabling sampling of a substantial fraction of the fruit flesh. This reduces sensitivity to surface effects and internal heterogeneities, providing a model that is applicable across seasons, farms, and cultivars.
Ambient Conditions
The ambient conditions, characterised by the average ambient signal strength, the ambient drift (defined as the standard deviation of the difference in signal strength between consecutive ambient spectra), the temperature, and the humidity, are described in Table II for each measurement run. At night, when negligible ambient light was present, the ambient and sample temperature decreased and humidity increased compared to the daytime conditions. In contrast, daytime measurements were conducted under strong ambient illumination. The average ambient signal strength during daytime was approximately 20 000 counts, around 25 times higher than the sample signal of about 800 counts. Although this corresponds to roughly one third of the detector’s full-well capacity, the ambient signal exhibited substantial drift with frequent fluctuations, necessitating short exposure times to prevent saturation during peaks in intensity. Morning conditions were dominated by diffuse and relatively stable sunlight, whereas midday measurements experienced the strongest direct illumination with strong drift caused by passing clouds. Afternoon measurements were mostly influenced by lower-angle sunlight and intermittent shadows from surrounding objects, resulting in more pronounced ambient drift relative to the average intensity than at midday.
Ambient conditions during the measurement runs, expressed as averages over all measurements acquired in each run.
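The drift metric defined above can be sketched as follows. This is a minimal illustration, assuming that each ambient spectrum is first reduced to a single signal-strength value by taking its mean count (an assumption, since the reduction is not specified in this section). The function name `ambient_drift` and the example values are hypothetical.

```python
import numpy as np

def ambient_drift(ambient_spectra):
    """Standard deviation of the change in signal strength between
    consecutive ambient spectra.

    ambient_spectra: array of shape (n_spectra, n_pixels), detector counts.
    Signal strength is taken as the mean count per spectrum (an assumption).
    """
    strength = np.asarray(ambient_spectra, dtype=float).mean(axis=1)
    return float(np.std(np.diff(strength), ddof=1))

# Illustrative values only: a steady ambient level versus a fluctuating one.
steady = np.full((5, 4), 20000.0)
fluctuating = np.array([[20000.0] * 4,
                        [20400.0] * 4,
                        [19800.0] * 4,
                        [20600.0] * 4])

print(ambient_drift(steady))           # 0.0
print(ambient_drift(fluctuating) > 0)  # True
```

Because the metric uses differences between consecutive spectra, a constant offset in ambient level does not contribute; only frame-to-frame changes do.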
In-Field Prediction Performance
The prediction results obtained under the various environmental conditions are shown in Figure 4. Night-time measurements produced performance comparable to laboratory conditions, indicating that neither low ambient and sample temperatures nor higher humidity noticeably affected the predictions. This demonstrates that night-time operation is a viable approach for achieving high-quality, on-the-plant measurements.
In contrast, daytime sunlit conditions reduced prediction performance, leading to higher
Effectiveness of Outlier Detection
The two data quality criteria (spectral roughness
Effects of Interfering Sources on Prediction Performance
Comparing the prediction performances across the different in-field measurement runs (Figure 4) provides insight into how the various sources of systematic disturbances affected the predictions.
The bias differences observed between runs may be attributed to differences in sample temperature driven by ambient temperature variations. The bias during night-time measurements (0.65% TSS) was lower than for the laboratory measurements (0.78% TSS), and it increased progressively from morning (0.95% TSS) to midday (1.18% TSS) and afternoon (1.45% TSS). Although ambient temperature peaked at midday, the highest sample temperatures occurred during the afternoon, likely due to cumulative heating from direct solar exposure. The resulting bias differences were relatively modest, indicating that the prediction model is only weakly affected by sample temperature. This robustness likely stems from the inclusion of berries at varying temperatures in the calibration set. In addition, the spectral region above 920 nm, which is more sensitive to variations in sample temperature, was weighted lower in the regression vector, as was the temperature-dependent water absorption around 760 nm. Models employing SNV pre-processing (not shown) assigned greater weight to the region above 920 nm, which resulted in consistently higher biases across all conditions (1.33% TSS for night, 1.54% TSS for morning, 1.45% TSS for midday, and 1.53% TSS for afternoon). Bias differences could be further reduced by applying temperature-correction methods, for example using difference spectra. 18
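For readers less familiar with the performance metrics compared here, the sketch below computes bias, bias-corrected standard error of prediction, and root mean square error of prediction using the conventional chemometric definitions. It assumes bias is defined as the mean of predicted minus reference values; the definitions actually used in this study are given outside this section, and the function name `prediction_stats` and the example values are hypothetical.

```python
import numpy as np

def prediction_stats(predicted, reference):
    """Return (bias, SEP, RMSEP) for predicted versus reference values.

    bias  : mean prediction error (predicted - reference), assumed sign convention
    SEP   : standard deviation of the errors around the bias (n - 1 denominator)
    RMSEP : root mean square of the raw errors
    """
    errors = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    bias = float(errors.mean())
    sep = float(errors.std(ddof=1))
    rmsep = float(np.sqrt(np.mean(errors ** 2)))
    return bias, sep, rmsep

# Illustrative values only (% TSS); not data from this study.
reference = np.array([8.0, 9.0, 10.0, 11.0])
predicted = np.array([9.0, 9.5, 11.5, 12.0])

bias, sep, rmsep = prediction_stats(predicted, reference)
print(f"bias={bias:.2f}, SEP={sep:.3f}, RMSEP={rmsep:.3f}")
```

With these definitions, a systematic offset such as a temperature-driven shift changes the bias and RMSEP but leaves the SEP unchanged, which is why bias and SEP are discussed separately.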
The decrease in performance from night-time to morning, under conditions of relatively stable ambient light with limited drift, primarily illustrates the effects of increased ambient shot noise. The
The further decline in performance from morning to afternoon, despite comparable average ambient light intensity, underscores the additional impact of ambient drift under direct sunlight.
Despite the stronger ambient-light shot noise at midday, the midday run showed slightly better apparent prediction performance than the afternoon. Compared to the afternoon, which had five outliers, the midday conditions produced more outliers (12) with insufficient SNR. Excluding these low-quality spectra improved the apparent accuracy of the remaining data, which explains why the midday run had slightly better prediction performance (
Relationships Between Data Quality and Prediction Accuracy
Correlations of the data quality metrics (roughness and shape similarity) with the prediction accuracy metrics (lab–field prediction deviation, time-independent prediction noise, and time-dependent prediction variation) are presented in Table III.
Correlations of data quality metrics with prediction accuracy metrics for the four field measurement runs. *Clear outliers removed.
During the night-time, when ambient light was absent, data quality was consistently high and lab–field prediction deviations were low. As expected, no meaningful correlations between lab–field prediction deviations and data quality were observed. As ambient light increased during the daytime runs, reaching its maximum at midday, lab–field prediction deviation became moderately correlated with spectral roughness, highlighting the influence of ambient-light shot noise on prediction accuracy. The correlations remained moderate, reflecting the fact that spectra with low SNR may occasionally yield accurate predictions.
Spectral roughness showed strong positive correlations with time-independent prediction noise, confirming that insufficient SNR leads to unstable predictions. Moderate correlations between time-independent prediction noise and shape similarity likely reflect the influence of roughness on shape similarity.
Correlations between shape similarity and lab–field prediction deviation were less clear. Only a small number of spectra showed substantial shape distortions during the midday and afternoon conditions, yet these dominated the correlations. No such distortions occurred at night or in the morning, and correspondingly, correlations were negligible under those conditions. After removing spectra classified as outliers (
Time-dependent prediction variation fluctuated around zero and showed no correlation with spectral roughness, as expected for a metric representing random noise. Weak correlations with shape similarity during midday and afternoon conditions were again driven by the few spectra with distortions, indicating that parts of these distortions were unstable within each measurement sequence and likely arose from sporadic insufficient ambient-light correction during strong ambient drift. Any remaining distortions not captured by these weak correlations were stable throughout the measurement and were likely caused by stray light, imperfect sensor positioning, or other interfering disturbances.
Overall, the results indicate that shot noise was the primary factor limiting prediction accuracy, whereas occasional shape distortions were handled through outlier detection and removal.
Effects of Interfering Sources on Data Quality and Prediction Accuracy
Correlations of data quality metrics (roughness and shape similarity) and lab–field prediction deviations with measurement conditions (sample intensity, ambient intensity, and ambient drift) are presented in Table IV. Ambient drift is here defined as the standard deviation of the difference in signal strength between consecutive ambient spectra. No significant correlations were observed with temperature or humidity.
Correlations of data quality metrics and lab–field prediction deviations with measurement conditions for the four field measurement runs. *Clear outliers removed.
Spectral roughness showed strong negative correlations with sample intensity, which, together with ambient-light shot noise, determines the SNR. These correlations were stronger during night and morning, when ambient-light shot noise levels were low and stable. Moderate correlations of roughness with ambient intensity further reflect the contribution of ambient-light shot noise. This confirms that spectral roughness is a suitable measure of SNR. Weak correlations of roughness with ambient drift during midday and afternoon were likely caused by temporary intensity peaks, which increased the average ambient intensity and, consequently, the shot noise during periods of strong drift.
Moderate correlations between spectral shape similarity and sample intensity likely arose because shape similarity is partly influenced by roughness (i.e., insufficient SNR). These correlations, particularly during midday and afternoon, may also reflect the greater impact of spectral distortions on weaker signals. No correlations were observed with ambient intensity or ambient drift, indicating that ambient-light interference was not systematically increased by higher intensity or drift, and that the correction was generally effective. Only a small number of spectra showed distortions indicative of occasional insufficient ambient-light correction, which were successfully handled through outlier detection.
Lab–field prediction deviations showed moderate negative correlations with sample intensity during daytime conditions. Most spectra identified as outliers, those with high roughness or low shape similarity, resulted from insufficient sample signal, likely due to imperfect sensor positioning. This confirms that sample intensity is the primary factor governing prediction accuracy, as it directly influences SNR and moderates the impact of spectral distortions. Although ambient intensity increases shot noise, no notable correlations were observed with lab–field prediction deviation, either from ambient intensity or ambient drift, likely because variations in sample intensity dominated.
Overall, the ambient-correction method was effective for the current setup, and SNR, determined primarily by sample signal intensity and ambient-light shot noise, remained the main factor affecting data quality and prediction accuracy. To maximise data quality and minimise outliers, maintaining a strong sample signal through optimal sensor positioning is essential, and additional measures to reduce the ambient light, such as active shading, could be beneficial.
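The dependence of SNR on sample signal and ambient light can be illustrated with an idealised shot-noise model. The sketch below is an assumption-laden illustration and not the expression used in this study: it treats detector counts as Poisson-distributed photoelectrons, ignores read noise, dark noise, and detector gain, and assumes the ambient estimate is taken from one ambient frame per sample frame. The function name `shot_noise_snr` is hypothetical; the count levels are the approximate values quoted in the Ambient Conditions section.

```python
import math

def shot_noise_snr(sample_counts, ambient_counts, n_pairs=1):
    """Idealised shot-noise-limited SNR of an ambient-corrected signal.

    Each sample frame contains sample + ambient light (variance = sample + ambient);
    each ambient frame contains ambient light only (variance = ambient).
    Subtracting the two adds the variances; averaging n_pairs independent
    pairs improves the SNR by sqrt(n_pairs).
    """
    variance = (sample_counts + ambient_counts) + ambient_counts
    return sample_counts * math.sqrt(n_pairs) / math.sqrt(variance)

night = shot_noise_snr(800, 0)       # negligible ambient light
day = shot_noise_snr(800, 20000)     # strong ambient light

print(f"night: {night:.1f}, day: {day:.2f}")
print(f"day, 4 pairs averaged: {shot_noise_snr(800, 20000, n_pairs=4):.2f}")
```

Under these assumptions, the SNR rises with sample signal and with the square root of the number of averaged frame pairs, and falls as ambient light increases, which matches the qualitative trends reported above.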
Effects of Measurement Configuration on Prediction Performance
Ambient-Correction Method
The prediction performances obtained using the different ambient-correction methods are shown in Figure 5.

Prediction performances for the three ambient-correction methods across the five measurement runs: simple averaging (blue circles), interpolation-based correction (orange squares), and linear regression-based correction (green triangles).
The simple averaging method performed well under stable ambient conditions (laboratory, night, and morning) but failed to adequately correct spectra acquired under strong ambient drift during the midday and afternoon runs. This was likely due to non-uniform weighting of acquisition times caused by irregular sampling, which was probably introduced by the discarding of spectra recorded during shutter transitions. As a result, the ambient contributions in the sample and ambient averages became imbalanced.
In contrast, both the interpolation-based correction and the linear regression-based correction, which compensate for such sampling irregularities, effectively handled the strong ambient drift. Their performances were nearly identical, reflecting their shared ability to model linear trends in the ambient signal. Because of its simplicity and computational efficiency, the interpolation-based method was chosen for all subsequent analyses.
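The difference between simple averaging and interpolation-based correction can be illustrated with a single-pixel toy example. This is a sketch under stated assumptions, not the implementation used in this study: the ambient signal drifts linearly, the sample signal is constant, noise is omitted, and the frame timestamps, drift rate, and variable names are hypothetical.

```python
import numpy as np

# Hypothetical single-pixel signals (detector counts).
sample_signal = 800.0

def ambient(t):
    """Linearly drifting ambient level (illustrative drift of 3000 counts/s)."""
    return 20000.0 + 3000.0 * t

# Irregular timing, as might result from discarding frames during shutter transitions.
t_ambient = np.array([0.00, 0.02, 0.30, 0.32, 0.60])  # shutter closed: ambient only
t_sample = np.array([0.10, 0.12, 0.14, 0.16, 0.40])   # shutter open: sample + ambient

ambient_frames = ambient(t_ambient)
sample_frames = sample_signal + ambient(t_sample)

# Simple averaging: biased when the mean acquisition times differ.
simple = sample_frames.mean() - ambient_frames.mean()

# Interpolation-based: estimate the ambient level at each sample timestamp.
ambient_at_sample = np.interp(t_sample, t_ambient, ambient_frames)
interpolated = (sample_frames - ambient_at_sample).mean()

print(f"simple averaging: {simple:.1f} counts")
print(f"interpolation:    {interpolated:.1f} counts")
```

In this example the mean sample timestamp is earlier than the mean ambient timestamp, so simple averaging underestimates the true 800 counts, whereas linear interpolation recovers it because the drift itself is linear. Non-linear drift between ambient frames would leave a residual in both methods.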
Non-linear extensions of the regression-based approach, such as polynomial or locally weighted models, may offer improved correction of non-linear ambient variations.
The various strategies tested for excluding unreliable corrected spectra (e.g., those acquired under strong ambient drift) prior to the final averaging step reduced the prediction performance and were therefore not applied. This confirms that, for the current setup, the ambient correction was effective, leaving SNR as the primary factor limiting prediction accuracy. Consequently, including a larger number of spectra, even if some were slightly disturbed, improved the overall prediction accuracy.
Sampling Frequency
Figure 6 shows the prediction performances for 240 averaged spectra acquired over shorter (5 s) and longer (10 s) total acquisition times, and with two ambient sampling frequencies (6 Hz and 2 Hz). Note that in this approach, only approximately one-fourth of the acquired spectra were used, so the reported prediction performances do not represent the full potential of the setup for the specified durations.

Prediction performance for 240 averaged spectra under different measurement durations and ambient sampling frequencies: 6 Hz over 5 s (blue circles); 6 Hz over 10 s (orange squares); 2 Hz over 5 s (green triangles); and 2 Hz over 10 s (red diamonds).
The highest prediction performance was achieved with the fastest ambient sampling frequency (6 Hz) and the shortest measurement duration (5 s). Doubling the measurement duration to 10 s had a larger negative impact on prediction performance than lowering the ambient sampling frequency to 2 Hz, particularly under midday and afternoon conditions with strong ambient drift. Night-time measurements were unaffected, and only minor effects occurred under stable morning conditions. This indicates that slow ambient drift, which accumulates over longer measurement times, degrades prediction performance more than rapid fluctuations above 1 Hz, which are corrected at 6 Hz but not at 2 Hz.
The apparent improvement in performance at 10 s for 2 Hz compared to 6 Hz during morning and midday runs was an artifact resulting from the exclusion of more outliers, reflecting reduced overall spectral quality rather than improved prediction accuracy. In the afternoon, where similar numbers of outliers were removed, the slower sampling frequency produced the expected poorer performance. Overall, the slowest ambient sampling frequency combined with the longest measurement duration resulted in the lowest performance.
While increasing the shutter frequency from 2 Hz to 6 Hz modestly improved performance, minimising total measurement duration was more critical, as it limits the cumulative impact of ambient drift. Short exposures under intense ambient light therefore require fast-readout spectrometers: in this study, the 0.5 ms exposures accounted for only 1 s of the total 12 s measurement time due to 5.5 ms readout delays, making slower-readout spectrometers generally inadequate for these conditions.
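The timing figures quoted above follow from simple arithmetic, sketched below. The calculation assumes that the whole measurement time is spent on back-to-back exposure and readout cycles, with no additional overhead (an assumption), and applies the Nyquist criterion to the two ambient sampling frequencies. Variable and function names are hypothetical.

```python
exposure_ms = 0.5       # exposure time per spectrum
readout_ms = 5.5        # readout delay per spectrum
total_time_s = 12.0     # total measurement time per berry

frame_ms = exposure_ms + readout_ms
duty_cycle = exposure_ms / frame_ms            # fraction of time collecting light
exposed_s = total_time_s * duty_cycle          # effective exposure time
n_frames = total_time_s * 1000.0 / frame_ms    # spectra acquired, assuming no overhead

def nyquist_limit_hz(sampling_hz):
    """Highest fluctuation frequency resolvable at a given sampling frequency."""
    return sampling_hz / 2.0

print(f"duty cycle: {duty_cycle:.1%}")           # duty cycle: 8.3%
print(f"effective exposure: {exposed_s:.2f} s")  # effective exposure: 1.00 s
print(f"frames: {n_frames:.0f}")                 # frames: 2000
print(nyquist_limit_hz(6.0), nyquist_limit_hz(2.0))  # 3.0 1.0
```

The 1 Hz limit at 2 Hz sampling is consistent with the observation that fluctuations above 1 Hz were corrected at 6 Hz but not at 2 Hz.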
Measurement Time
Figure 7 shows prediction performance as a function of the measurement time, using the ambient sampling frequency of 6 Hz, corresponding to 0.167 s per shutter cycle. For each run, the prediction performance obtained from acquiring spectra over 10, 20, 30, 40, 50, 60, and 70 shutter cycles was compared.

Prediction performance as a function of measurement time for the laboratory (blue circles), night (orange squares), morning (green upward triangles), midday (red diamonds), and afternoon (purple downward triangles) runs.
Averaging more spectra increases the SNR, which reduces spectral roughness and lowers the number of outliers. Across the measurement runs, outlier levels scaled with ambient light intensity, confirming that shot noise was the dominant noise source. The number of outliers decreased approximately as
In practical applications, overall measurement speed depends not only on the acquisition time per spectrum but also on the number of outliers, since rejected spectra must be reacquired, adding to the total measurement time. Therefore, the optimal measurement time is the minimum required to maintain an acceptably low outlier rate.
Under laboratory and night-time conditions with minimal ambient light, prediction performance remained stable as the measurement time increased, and the outlier rate stabilised at approximately 3 s. This suggests that, in these conditions, a measurement time of 3 s provides sufficient SNR and data quality.
Under ambient light, longer measurement durations were required to reduce the outlier rate. While 3 s produced stable predictions, roughly one-third of the spectra were rejected. Extending the measurement time to 10 s enabled most spectra to meet the SNR threshold imposed by the outlier detection method, improving robustness and reducing the need for repeated measurements. Under the most challenging midday conditions, 10 s measurements still yielded 12% outliers, but further reductions would require substantially longer acquisitions, which would not shorten the overall measurement time compared with simply remeasuring the discarded spectra. Thus, a measurement duration of approximately 10 s appears to be a practical optimum under these conditions.
Design Guidelines for In-Field Spectral Measurements
Based on the findings of this study, the following guidelines summarise best practices for achieving robust in-field spectral measurements.
The most critical factor during in-field daytime measurements is ambient light, and several measures should be implemented in the instrument design and sampling strategy to reduce its influence.
Because ambient light adds to the sample signal, acquisition should be split over many spectra using very short exposure times to avoid detector saturation under strong ambient illumination. In this study, an exposure time of 0.5 ms was chosen. Consequently, total acquisition time is typically dominated by spectrometer readout time, which adds a few milliseconds or more to each acquisition cycle. To reduce total measurement time, it is advantageous to select a spectrometer with faster readout or larger full-well capacity (allowing longer exposure before saturation), even if this comes at the expense of sensitivity. This contrasts with applications under weak ambient light, where higher spectrometer sensitivity is generally more important than readout speed.
Disturbances from ambient drift under sunlit conditions can be mitigated by rapidly sampling the ambient using a shutter and applying ambient correction to each shutter cycle. The required shutter frequency depends on the speed of ambient drift and the characteristics of the system; in this study, 6 Hz was sufficient. Disturbances from slow ambient drift, which accumulate over time, are reduced by keeping measurement durations short. Achieving this with very short exposure times requires fast spectrometer readout; in this study, a 0.5 ms exposure combined with a 5.5 ms readout gave an acceptable 8% duty cycle. With appropriate configurations, no significant uncorrected drift remains, and the primary factor limiting prediction performance is shot noise.
Shot noise represents a fundamental, unavoidable noise source, and its influence depends on the SNR. When shot-noise limited, the SNR of the ambient-corrected spectrum is approximated by
The proposed outlier detection procedure is effective for maintaining consistent prediction performance, even for low-quality data. Operating on each final averaged spectrum, it is computationally light and straightforward to implement. In practical systems, it can provide real-time feedback to the operator, flagging poor-quality spectra for repetition and suggesting adjustments to optimise measurement duration.
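A two-criterion quality check of this kind can be sketched in a few lines. The definitions below are hypothetical stand-ins, since the roughness and shape-similarity metrics and their thresholds are defined outside this section: roughness is taken here as the root mean square of the second difference relative to the mean signal level, and shape similarity as the Pearson correlation with a reference spectrum. The thresholds and the synthetic spectra are illustrative only.

```python
import numpy as np

def roughness(spectrum):
    """RMS of the second difference, relative to mean signal level (assumed definition)."""
    s = np.asarray(spectrum, dtype=float)
    return float(np.sqrt(np.mean(np.diff(s, n=2) ** 2)) / np.mean(np.abs(s)))

def shape_similarity(spectrum, reference):
    """Pearson correlation with a reference spectrum (assumed definition)."""
    return float(np.corrcoef(spectrum, reference)[0, 1])

def is_outlier(spectrum, reference, max_roughness=0.05, min_similarity=0.95):
    """Flag a final averaged spectrum that fails either criterion (hypothetical thresholds)."""
    return (roughness(spectrum) > max_roughness
            or shape_similarity(spectrum, reference) < min_similarity)

# Synthetic example: a smooth spectrum and a noisy copy of it.
wavelengths = np.linspace(750, 1020, 128)             # nm
reference = np.exp(-((wavelengths - 880) / 60) ** 2)  # smooth synthetic shape
rng = np.random.default_rng(0)
noisy = reference + rng.normal(0.0, 0.2, reference.size)

print(is_outlier(reference, reference))  # False
print(is_outlier(noisy, reference))      # True
```

Both checks use only vectorised operations on a single averaged spectrum, which is why a procedure of this form is cheap enough for real-time feedback.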
Although this study did not specifically address external disturbances such as motion, stray light, or reflections from nearby objects, significant effects from these sources are expected to manifest as spectral shape deviations, which are identified and removed through the outlier detection procedure.
This work focused on the instrument design, sampling strategy, and raw data processing rather than advanced multivariate modelling. Nevertheless, further improvements in prediction performance may be achieved through more sophisticated regression and calibration techniques.
With these guidelines implemented, the results of this study suggest that robust, high-quality, in-field spectral measurements are feasible using compact, practical instrumentation suitable for non-contact robotic operation. Best performance was achieved under night-time or low-ambient conditions, although measurements acquired under bright daytime conditions without active shading also yielded acceptable results. However, stronger sunlight in locations further south than Norway may reduce achievable performance at peak daylight, potentially affecting the practical implementation of such measurements.
Conclusion
This study demonstrated a non-contact NIR interaction instrument with a robust calibration model for rapid determination of TSS in strawberries. The model provided consistent performance across seasons and cultivars and maintained predictive ability even beyond the calibration range of TSS, confirming the advantage of interaction geometries that enable representative sub-surface sampling. By implementing rapid ambient light sampling and correction, the instrument achieved sufficient performance for daytime, in-field measurements of polytunnel-grown on-the-plant strawberries, with a measurement time of approximately 12 s per berry.
The ambient-correction method proved essential for robust predictions, particularly when combined with the proposed outlier detection procedure to reject low-quality spectra. Shot noise from ambient light remained the main limiting factor for daytime performance, indicating that adequate signal quality can be achieved by increasing the measurement time when necessary.
With the proposed methods and guidelines for achieving robust measurement performance under challenging environmental conditions, this work encourages further in-field implementations of NIR spectroscopy for determining internal quality attributes of fruit and vegetables, supporting the advancement of precision agriculture and agricultural robotics.
Footnotes
Acknowledgments
We thank Associate Professor Siv Fagertun Remberg and Head Engineer Kari Grønnerød (Norwegian University of Life Sciences) for access to the strawberry polytunnel and for providing strawberries for the test set. We also thank Simen Myhrene (Ekeberg Myhrene Sylling AS) for supplying strawberries for the calibration set. We acknowledge Bastian Krohg (SINTEF) for assistance with calibration measurements, and Karen Wahlstrøm Sanden (Nofima) for conducting the reference measurements for the test set. We acknowledge the use of ChatGPT-5 for language editing.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical considerations
This article does not contain any studies with human or animal participants.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Research Council of Norway, through the project SFI Digital Food Quality (RCN no. 309259).
