Near-Infrared Interaction Spectroscopy Under Daylight Conditions to Assess Total Soluble Solids in On-the-Plant Strawberries

Abstract

Robust in-field sensing technologies are essential for advancing precision agriculture and autonomous field robotics toward analysing internal quality attributes of fruits and vegetables. This study demonstrated in-the-field, non-contact near-infrared (NIR) spectroscopy for determining total soluble solids (TSS), a measure of sugar content, in on-the-plant strawberries under daytime conditions. A compact NIR interaction instrument (750–1020 nm), designed for robotic operation, was built and tested in a polytunnel environment under varying day- and night-time conditions. The instrument was calibrated using a partial least squares regression (PLSR) model built on laboratory data collected in 2025 from 200 strawberries of a single variety. It was tested on 100 strawberries of two varieties that were measured in 2024, while still attached to the plant. During night-time operation, TSS was predicted with a standard error of prediction ($SEP$) of 0.73 % TSS and a bias of 0.65 % TSS. Under challenging daytime conditions with strong and fluctuating ambient light, measurements were more affected by additional shot noise from the ambient light, resulting in $SEP$ values up to 1.35 % TSS and biases up to 1.45 % TSS, both of which are acceptable for most applications. The measurement time was 12 s. Robust performance was achieved by implementing rapid and continuous ambient light sampling and correction, combined with outlier rejection of spectra of insufficient quality. These findings confirm the feasibility of in-field, on-the-plant NIR spectroscopy for assessing internal fruit quality and provide practical design guidelines to support further in-field implementations of NIR spectroscopy.

Keywords

Near-infrared spectroscopy, NIR spectroscopy, non-contact interaction measurements, on-the-plant measurements, in-field daytime measurements, strawberry, total soluble solids, instrument optimisation

Introduction

Precision agriculture is becoming increasingly important as the food production sector strives to improve efficiency, sustainability, and product quality. This transformation is accompanied by the growing demand for healthy, fresh, and sustainably produced fruits and vegetables, and consumer willingness to pay for consistent flavour and freshness.¹,²

Autonomous field robots are central to this transformation, enabling more automated and precise farming operations. Commercial examples include UV-based treatment of powdery mildew, eliminating the need for pesticides,³ and robotic weeding systems.⁴ While harvesting robots hold great potential for optimising farming operations, commercial deployment remains limited due to significant technical challenges.⁵ Their success partly depends on rapid, non-destructive in-field sensing technologies capable of providing information on crop status, ripeness, and quality. Such sensors are essential for resource optimisation through improved crop management, better labour management, and more precise harvest timing.

Thus, there is a growing need for novel sensing technologies that can assess the internal quality of fruits and vegetables. Conventional methods for analysing sensory qualities are destructive, limiting measurements to a few representative samples. For instance, total soluble solids (TSS), a measure of sugar content, is typically determined via refractometric analysis of fruit juice. Non-destructive alternatives include RGB cameras, which offer a low-cost way to evaluate colour, shape, and texture for predicting ripeness and some quality parameters.⁶,⁷ However, these cameras probe only the surface and are insufficient for robust estimation of internal attributes such as sugar content, acidity, or dry matter. Similarly, hyperspectral cameras operating in reflection mode have limited penetration depth: they can detect surface features such as microbial infection and surface defects, but not sub-surface properties like internal bruising.⁸ While such systems can provide reliable estimates when surface characteristics are representative, for example, for TSS in grapes,⁹ they fail for heterogeneous fruits where surface appearance does not reflect internal quality.¹⁰,¹¹

Near-infrared spectroscopy (NIRS) offers a feasible alternative for rapid, non-destructive assessment of internal fruit quality. Using an interaction sampling geometry, where illumination and detection fields are spatially separated, NIRS can capture sub-surface information that is more representative of heterogeneous samples.¹² Moreover, NIRS enables fully non-contact operation, which is important to prevent disease spread and mechanical damage to the fruit.

Strawberry (Fragaria $\times$ ananassa) is a delicate, high-value fruit valued for its sweetness, flavour, and bright red colour. As a non-climacteric fruit, it undergoes minimal ripening after harvest, making optimal harvest timing essential for achieving desired quality. Harvest decisions are typically based on manual colour assessment, which does not always reflect internal composition. Additional chemical information, such as sugar and acid content, would enable more accurate harvest timing and improved quality control.¹³

Wold et al.¹¹ demonstrated that non-contact NIR interaction spectroscopy can robustly determine TSS in strawberries. Since TSS is heterogeneously distributed within the fruit and the surface is prone to weather-dependent variations, interaction measurements outperform reflection geometries, producing prediction models with improved inter-seasonal robustness. However, the inherently weak signals of the interaction geometry, combined with the subtle spectral variations associated with TSS, make strawberries a particularly challenging application requiring instrumentation with high signal-to-noise ratio (SNR).¹¹

Although NIRS has been widely explored for post-harvest assessment of fruit and vegetable quality, few instruments combine non-contact and sub-surface interaction sensing,¹⁴ and few studies have demonstrated successful in-field implementation with fruits still attached to the plant.⁶ Achieving this capability is critical to fully capitalising on agricultural robots’ potential to provide non-invasive measurements and monitor fruit quality over time without harvesting. In-field operation, however, imposes strict requirements on the sensor, which must be compact, low-power, portable, and robust to challenging outdoor conditions such as strong and fluctuating ambient illumination, ambient temperature variations that affect sample temperatures, and complex measurement scenes. On-the-plant, non-contact configurations are particularly susceptible to interference from ambient light compared with contact measurements, where the ambient light is shielded, and optimal measurement setups, where direct ambient light can be blocked or shaded. Wold et al.¹¹ recently demonstrated in-field, non-contact determination of TSS in strawberries using an NIR interaction geometry. While that work confirmed the feasibility of field-based NIRS, it used a benchtop instrument that was unsuitable for on-the-plant or robotic operation.

In this work, we demonstrate non-contact measurements of TSS in on-the-plant, polytunnel-grown strawberries under daytime conditions using a compact prototype NIR instrument (750–1020 nm) that we designed and built for robotic operation. We systematically investigate how environmental factors interfere with spectral measurements and assess their impact on data quality and calibration performance under varying field conditions. Furthermore, we evaluate strategies to mitigate these interferences and enhance measurement robustness. Based on these findings, we propose practical design guidelines for developing robust outdoor spectral analysers, contributing to the translation of laboratory-based methods into real-world agricultural applications.

Materials and Methods

Datasets

The ever-bearing strawberry cultivars (cvs.) Favori and Aurora were used in the experiment. These cultivars are commercially important to Norway and well suited to table-top cultivation systems that enable efficient plant care, harvesting, and integration with robotic operations. Plants were grown outdoors in open polytunnels (Haygrove Ltd., UK) on table-top systems with automatic watering and nutrient delivery, using coconut coir as the growth medium.

Two independent datasets, obtained from different farms and growing seasons, were used as calibration and test sets. The in-field test measurements were conducted first, following prior laboratory development and validation of the measurement concept under controlled conditions. A new calibration dataset was subsequently acquired the following year using the same instrument configuration; earlier calibration data collected prior to the field trials were not used, as they predated the final instrument configuration adopted for the field campaign and were therefore not directly comparable.

At the time of calibration measurements, only cv. Favori was available, resulting in a calibration set restricted to a single cultivar, while the test set contained both cv. Favori and cv. Aurora. This design provided a fully external evaluation across both seasonal and cultivar differences. Previous studies employing similar sampling geometries have reported robustness across seasons and cultivars,¹¹ providing a rationale for exploring model generalisation under heterogeneous conditions in the present study. In addition, the focus of this work was not to optimise a deployment-ready calibration spanning multiple cultivars, farms, and seasons, but to examine how varying environmental conditions, particularly illumination and temperature, affect spectral acquisition quality and prediction performance when applying a fixed calibration model.

Calibration Set

Fresh strawberries ( $n_{c} = 200$ ) of cv. Favori, grown in Sylling, Norway, were harvested in two batches during the summer of 2025 for calibration measurements: 100 berries in week 26 (June) and 100 in week 33 (August). Berries were visually selected to span the maturity range from medium ripe (mostly red with a lighter tip) to very ripe (uniform dark red). The second batch was collected to ensure sufficient variability and a broad range of sugar content.

The NIR spectroscopic measurements were performed in a laboratory under stable temperature conditions and weak ambient light (ceiling lights turned off and windows tinted). Measurements were taken on the same day as harvest, after the berries were brought to room temperature (21–22 °C). Both the front (sunlit) and back (shaded) sides were measured near the equator of each berry, as illustrated in Figure 1b. To reduce temperature dependencies in the calibration model, an additional 60 measurements (front side of 24 berries from the first batch and both sides of 18 berries from the second) were performed after cooling the berries to approximately 15 °C, thereby introducing controlled temperature variation into the calibration dataset. After the measurements, berries were refrigerated at 4 °C overnight, and refractometric reference measurements of TSS were conducted the following day. The changes in TSS during overnight storage were assumed to be negligible, as strawberries are non-climacteric, and storage at low temperature is expected to strongly reduce respiration and associated biochemical changes.

Figure 1.

(a) Illustration of on-the-plant non-contact measurement of a strawberry with the NIR interaction prototype. (b) Sampling geometry consisting of a projected illumination line and detected region (white dot).

Test Set and In-Field Measurement Runs

In week 35 (August) 2024, strawberries grown in Ås, Norway, were measured in situ in an open polytunnel while still attached to the plant. Berries ( $n_{t} = 100$ ) of cv. Aurora ( $n = 80$ ) and cv. Favori ( $n = 20$ ) were selected to span the maturity range from medium ripe to very ripe along a single row of plants oriented westward toward the polytunnel wall. Four rounds of NIR spectroscopic measurements were performed over a 24-hour period to capture diurnal variation of environmental conditions. Ripeness changes during this interval were assumed negligible. Measurements followed the sequential order of berries along the row, and only the front (sunlit) side was measured to avoid relocating the berries, with the measurement spot near the equator of each berry. Otherwise, the same instrument and sampling geometry were used as for the calibration and laboratory measurements.

Temperature and relative humidity inside the polytunnel were monitored using a data logger (EL-SIE-2+, Lascar Electronics, UK). Given the small size and high surface-area-to-volume ratio of strawberries, berry temperature was expected to broadly follow ambient air temperature during the measurement runs, with possible deviations due to sunlight exposure and delayed thermal response.

Weather on the measurement days was clear with variable cloud cover, no precipitation, and temperatures ranging from 14 °C at night to 26 °C during daytime. Measurements were conducted at midday (11:56–14:42), afternoon (15:45–18:13), night (21:19–22:51), and the following morning (09:17–10:55), with sunset at 20:36, sunrise at 06:02, and peak solar altitude of 41° at 13:18. During midday and afternoon, the berries – which were all west-facing – were exposed to relatively direct sunlight from above and in front, slightly diffused by the white polytunnel wall and varying with drifting clouds. Morning sunlight from the east was indirect and stable, while at night, temperatures were lower and negligible ambient light was present.

Following the in-field measurements, berries were harvested, transported to the laboratory, and brought to room temperature (21–22 °C). A final set of NIR measurements was performed the same day under controlled laboratory conditions with stable temperature and weak ambient light (ceiling lights turned off and windows tinted), using the same instrument and acquisition protocol as for the calibration set, measuring both the front (sunlit) and back (shaded) side. Following the procedure of the calibration set, berries were then refrigerated at 4 °C overnight, and refractometric reference measurements of TSS were conducted the following day, under the same assumptions regarding negligible post-harvest changes in TSS.

Chemical Reference Analysis of TSS

Reference measurements of TSS, expressed in °Brix (%, g/100 g), were obtained using a °Brix-acidity meter (AR200, Reichert GmbH, Germany). Whole berries were homogenised and centrifuged at 2000 RPM for 10 minutes (Heraeus Multifuge 4KR, Kendro Laboratory Products GmbH, Langenselbold, Germany), and the resulting supernatant was measured with the meter. For each sample, two replicate TSS measurements were performed, and the mean value was used for subsequent analysis.

NIR Spectroscopic Measurements

The prototype NIR spectroscopic instrument, shown in Figure 1a, was designed for on-the-plant non-contact interaction measurements. Compact and lightweight, with a working distance of approximately 20 cm, it can be mounted on an outdoor mobile platform, such as a robotic arm. For the experiments presented here, it was mounted on a tripod, and manually positioned in front of each berry.

The sampling geometry is illustrated in Figure 1b. A halogen light source (20 W) with projection optics generated a 20 mm $\times$ 2 mm illuminated line on the sample. The collection optics had a field of view with a diameter of 3.5 mm, with the detection region positioned 15 mm below the line centre, collecting light that propagated from the illuminated region through the sample. This geometry was chosen to accommodate the smallest typical samples of the relevant cultivars, while maintaining sufficient spatial separation between illumination and detection to reduce sensitivity to stray surface reflections. In contrast to configurations using two illumination lines,¹¹ the single-line setup allows larger illumination–detection separations.

Spectra were acquired using an OEM transmission grating spectrometer with a silicon CCD detector array, covering the wavelength range 532–1154 nm with an average sampling interval of 0.6 nm and spectral resolution (FWHM) of approximately 4.5 nm. A curved white barium plate was used as a reference sample.

For the calibration dataset acquired under minimal ambient light, each spectrum had an exposure time of 10 ms, with a total of 60 spectra per measurement sequence. For the test set, including in-field measurements, the exposure time was reduced to 0.5 ms to avoid saturation under strong ambient light. 2000 spectra were collected per measurement sequence, comprising approximately 1000 sample and 1000 ambient spectra, acquired using a light shutter in the source path. Even with a spectrometer with fast readout of approximately 5.5 ms, only about 1 s of the total 12 s measurement time corresponded to detector exposure, the remainder being spectrometer readout. Although longer exposure times would have been preferable under conditions with less ambient light, identical acquisition settings were applied across all test set measurements for comparison purposes.

Ambient-Correction Methods

A light shutter was used to sequentially block the illumination, enabling alternating acquisition of illuminated spectra containing both sample and ambient contributions, and ambient-only spectra when the lamp was blocked. This acquisition scheme, illustrated in Figure 2a, allowed continuous ambient correction to isolate the sample spectrum. Operating at 6 Hz (six on–off shutter cycles per second) with equal open and closed intervals, the system corrected for ambient light fluctuations up to the Nyquist frequency of 3 Hz and captured equal numbers of sample and ambient spectra, thereby optimising the SNR under strong ambient light. Each measurement sequence comprised roughly 70 shutter cycles with 28 spectra captured per cycle (14 sample and 14 ambient spectra). The shutter position was synchronised with spectrometer acquisition to ensure accurate identification of illuminated and ambient spectra, while spectra captured during shutter transitions were identified and discarded. Consequently, minor variations occurred in the number of spectra per measurement sequence.
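The acquisition figures quoted in the text (exposure time, readout time, and shutter frequency) can be cross-checked with a few lines of arithmetic. The sketch below uses only the approximate values stated above; the variable names are illustrative.

```python
# Acquisition budget for one in-field measurement sequence.
# All numbers are the approximate values quoted in the text.
n_spectra = 2000        # spectra per measurement sequence
exposure_ms = 0.5       # exposure time per spectrum
readout_ms = 5.5        # approximate readout time per spectrum
shutter_hz = 6.0        # on-off shutter cycles per second

exposure_total_s = n_spectra * exposure_ms / 1000.0
sequence_total_s = n_spectra * (exposure_ms + readout_ms) / 1000.0
n_cycles = sequence_total_s * shutter_hz
spectra_per_cycle = n_spectra / n_cycles

print(f"exposure time:     {exposure_total_s:.1f} s")
print(f"sequence duration: {sequence_total_s:.1f} s")
print(f"shutter cycles:    {n_cycles:.0f}")
print(f"spectra per cycle: {spectra_per_cycle:.1f}")
```

With these inputs, the detector is exposed for 1 s out of a 12 s sequence, spanning about 72 shutter cycles with just under 28 spectra each, consistent with the rounded figures in the text.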

Figure 2.

Acquisition schemes based on alternating illuminated sample and ambient sequences: the acquired sequences (a), a simulated lower ambient light sampling frequency (one third of the original) (b), fewer spectra acquired at the original sampling frequency (c), and the same reduced number of spectra distributed over a longer measurement duration (simulated lower sampling frequency) (d). For illustration purposes, the numbers of spectra per cycle are reduced relative to those used in the study.

Three methods were tested for obtaining ambient-corrected spectra from the sequences of alternating illuminated and ambient measurements. These methods were evaluated based on their ability to compensate for strong ambient light fluctuations and were compared in terms of their resulting prediction performance under varying illumination conditions. Several filtering strategies for excluding unreliable spectra prior to the final averaging step were also investigated, including the removal of spectra acquired during abrupt ambient changes, spectra with intensity spikes relative to their neighbouring spectra, and spectra with deviating shapes. These filters were not retained, as they reduced prediction performance.

Simple Averaging

The simplest correction approach involved subtracting the average ambient spectrum from the average illuminated sample spectrum. Under ideal conditions, with complete data, stable acquisition timing, equal numbers of spectra per shutter cycle, and no detector saturation, this method is effectively equivalent to the interpolation-based correction. However, irregular sampling or missing data can produce non-uniform weighting of different acquisition times throughout the measurement sequence. When ambient light levels fluctuate, such non-uniform weighting can create an imbalance between the ambient contributions in the sample and ambient averages, leading to residual ambient interference in the corrected spectrum.
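A minimal sketch of this simple-averaging correction is given below, assuming the spectra are stored as a two-dimensional array with one row per acquisition and a boolean shutter flag per row. The function name, array layout, and synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def simple_average_correction(spectra, is_illuminated):
    """Subtract the mean ambient spectrum from the mean illuminated spectrum.

    spectra: array (n_spectra, n_pixels) of raw detector counts.
    is_illuminated: boolean array (n_spectra,), True when the shutter is open.
    """
    spectra = np.asarray(spectra, dtype=float)
    lit = np.asarray(is_illuminated, dtype=bool)
    return spectra[lit].mean(axis=0) - spectra[~lit].mean(axis=0)

# Synthetic check: constant ambient light, 4 shutter cycles of 3 + 3 spectra.
sample = np.linspace(1.0, 2.0, 5)              # "true" sample signal per pixel
ambient = np.full(5, 10.0)                     # constant ambient signal
is_illuminated = np.tile([True] * 3 + [False] * 3, 4)
spectra = np.where(is_illuminated[:, None], sample + ambient, ambient)
corrected = simple_average_correction(spectra, is_illuminated)
```

With constant ambient light the sample signal is recovered exactly; the residual interference described above only appears when the ambient level changes and the two averages weight acquisition times differently.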

Interpolation-Based Correction

To account for acquisition imperfections, missing values, or detector saturation, ambient correction was performed per shutter cycle. For each cycle, the batch of contiguous illuminated spectra was averaged, yielding $x_{j}$ , and compared with the adjacent ambient batches $x_{j - 1 / 2}$ and $x_{j + 1 / 2}$ . All saturated spectra were excluded. The ambient-corrected spectrum $x_{corr, j}$ was obtained by subtracting the ambient contribution $x_{amb, j}$ ,

$$x_{corr, j} = x_{j} - x_{amb, j} \tag{1}$$

The ambient contribution was estimated via linear interpolation between the surrounding ambient sequences,

$$x_{amb, j} = x_{j - 1 / 2} + (t_{j} - t_{j - 1 / 2}) \frac{x_{j + 1 / 2} - x_{j - 1 / 2}}{t_{j + 1 / 2} - t_{j - 1 / 2}} \tag{2}$$

where $t_{j - 1 / 2}$, $t_{j}$, and $t_{j + 1 / 2}$ denote the mean acquisition timestamps of the respective sequences. This interpolation compensates for non-uniform spacing in acquisition times caused by missing spectra or irregular sampling, ensuring that the ambient contribution is weighted consistently across the measurement sequence. In the absence of such irregularities, the method reduces to simple averaging. Finally, the corrected spectra from all shutter cycles were averaged to obtain the final ambient-corrected spectrum.
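The per-cycle interpolation of Eqs. 1 and 2 can be sketched as follows. This is an illustrative implementation under stated assumptions: spectra are rows of an array, the shutter state and saturation flags are boolean arrays, and a cycle is skipped when either neighbouring ambient batch is missing or fully saturated. The latter is a choice made here, not a detail given in the text.

```python
import numpy as np

def interpolation_correction(spectra, times, is_illuminated, is_saturated=None):
    """Per-cycle ambient correction by linear interpolation (Eqs. 1 and 2)."""
    spectra = np.asarray(spectra, dtype=float)
    times = np.asarray(times, dtype=float)
    lit = np.asarray(is_illuminated, dtype=bool)
    keep = (np.ones(lit.size, dtype=bool) if is_saturated is None
            else ~np.asarray(is_saturated, dtype=bool))

    # Split the sequence into contiguous batches of equal shutter state.
    edges = np.flatnonzero(np.diff(lit.astype(int))) + 1
    batches = []
    for idx in np.split(np.arange(lit.size), edges):
        ok = idx[keep[idx]]                      # drop saturated spectra
        if ok.size:
            batches.append((lit[idx[0]], spectra[ok].mean(axis=0), times[ok].mean()))
        else:
            batches.append((lit[idx[0]], None, None))

    corrected = []
    for k in range(1, len(batches) - 1):
        state, x_j, t_j = batches[k]
        _, x_prev, t_prev = batches[k - 1]       # ambient batch before
        _, x_next, t_next = batches[k + 1]       # ambient batch after
        if not state or x_j is None or x_prev is None or x_next is None:
            continue
        x_amb = x_prev + (t_j - t_prev) * (x_next - x_prev) / (t_next - t_prev)  # Eq. 2
        corrected.append(x_j - x_amb)                                            # Eq. 1
    if not corrected:
        raise ValueError("no complete shutter cycle available")
    return np.mean(corrected, axis=0)

# Synthetic check: ambient light drifting linearly in time.
sample = np.linspace(1.0, 2.0, 4)
is_illuminated = np.array(([False] * 3 + [True] * 3) * 3 + [False] * 3)
times = 0.006 * np.arange(is_illuminated.size)
ambient = 10.0 + 5.0 * times[:, None] * np.ones(4)
spectra = ambient + is_illuminated[:, None] * sample
corrected = interpolation_correction(spectra, times, is_illuminated)
```

Because the synthetic ambient drift is linear, the interpolated ambient estimate is exact and the sample signal is recovered without residual.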

Regression-Based Correction

As an alternative to linear interpolation between ambient sequences, each spectrum $x_{j k}$ within an illuminated sequence $j$ was individually corrected by subtracting an estimated ambient contribution $x_{amb, j k}$, obtained from a pixel-wise regression model fitted to all spectra of the adjacent ambient sequences. The regression model captures the temporal trend in the ambient spectra, such as gradual increases or decreases in illumination. In this work, a linear regression function was used, which corrects linear variations in the ambient light and thus behaves similarly to the interpolation-based method. However, non-linear regression functions, such as polynomial or locally weighted models, could further improve correction under non-linear ambient variations.
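The regression-based variant can be sketched with a pixel-wise polynomial fit. The code below is an illustration under assumptions: `numpy.polyfit` stands in for the regression model, timestamps are centred on each illuminated batch for numerical stability, and the corrected spectra are averaged at the end in the same way as for the interpolation-based method. None of these details are confirmed by the text.

```python
import numpy as np

def regression_correction(spectra, times, is_illuminated, degree=1):
    """Correct each illuminated spectrum with a pixel-wise fit to adjacent ambient spectra."""
    spectra = np.asarray(spectra, dtype=float)
    times = np.asarray(times, dtype=float)
    lit = np.asarray(is_illuminated, dtype=bool)

    # Contiguous runs of equal shutter state; neighbours of a lit run are ambient runs.
    edges = np.flatnonzero(np.diff(lit.astype(int))) + 1
    runs = np.split(np.arange(lit.size), edges)

    corrected = []
    for k in range(1, len(runs) - 1):
        if not lit[runs[k][0]]:
            continue
        amb_idx = np.concatenate([runs[k - 1], runs[k + 1]])
        t0 = times[runs[k]].mean()
        # One polynomial per pixel: coef has shape (degree + 1, n_pixels).
        coef = np.polyfit(times[amb_idx] - t0, spectra[amb_idx], degree)
        # Evaluate the fitted ambient trend at each illuminated timestamp.
        x_amb = np.vander(times[runs[k]] - t0, degree + 1) @ coef
        corrected.append(spectra[runs[k]] - x_amb)
    return np.concatenate(corrected).mean(axis=0)

# Synthetic check: ambient light drifting linearly in time.
sample = np.linspace(1.0, 2.0, 4)
is_illuminated = np.array(([False] * 3 + [True] * 3) * 3 + [False] * 3)
times = 0.006 * np.arange(is_illuminated.size)
ambient = 10.0 + 5.0 * times[:, None] * np.ones(4)
spectra = ambient + is_illuminated[:, None] * sample
corrected = regression_correction(spectra, times, is_illuminated)
```

Setting `degree=2` would give the kind of non-linear correction mentioned above, at the cost of needing more ambient spectra per batch to keep the fit well determined.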

Calibration Procedure

The NIR interactance spectra were derived from the ambient-corrected spectra by dividing the sample spectrum by the spectrum of the reference sample (a curved white barium plate). The resulting spectra were smoothed using a second-order Savitzky–Golay filter with a window size of 11 points,¹⁵ corresponding to an average window width of 7 nm, and then cropped to the wavelength range 750–1020 nm. This range was selected to include the water absorption band around 970 nm while excluding the region above 1020 nm, where silicon detectors have poor quantum efficiency and strong temperature dependence. The visible region below 750 nm was also excluded, as colour has previously been shown to be a poor predictor of berry sweetness.¹¹ In addition, the weaker water absorption in the selected region (compared to longer wavelengths) allows sufficient signal intensity despite the increased path lengths associated with subsurface interaction measurements. While longer wavelengths cover a richer set of spectral features, their stronger water absorption would significantly reduce the detected signal in the interaction measurement. The selected region also enables the use of low-cost silicon detectors, whereas longer wavelengths require InGaAs detectors.

The smoothed and cropped spectra were transformed to absorbance spectra by computing the logarithm of the inverse interactance intensity ( $\log_{10} (1 / I)$ ). To reduce intensity variations caused by differences in berry size, instrument positioning, or distance to the sample, the absorbance spectra were intensity-normalised by subtracting the mean absorbance value. Alternative pre-processing methods were evaluated, including standard normal variate (SNV) transformation, i.e., subtracting the mean and dividing by the standard deviation of each spectrum,¹⁶ or evaluating first and second derivatives. However, these approaches did not improve calibration or prediction performance and were therefore not used in the final model.
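The pre-processing chain described above (reference division, Savitzky–Golay smoothing, cropping, absorbance transform, mean subtraction) can be sketched with NumPy and SciPy. The function name and the synthetic spectra are illustrative assumptions; only the filter settings and wavelength limits are taken from the text.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(sample, reference, wavelengths, lo=750.0, hi=1020.0):
    """Interactance -> smoothing -> crop -> absorbance -> mean subtraction."""
    interactance = np.asarray(sample, dtype=float) / np.asarray(reference, dtype=float)
    # Second-order Savitzky-Golay filter with an 11-point window.
    smoothed = savgol_filter(interactance, window_length=11, polyorder=2, axis=-1)
    band = (wavelengths >= lo) & (wavelengths <= hi)
    absorbance = np.log10(1.0 / smoothed[..., band])
    # Intensity normalisation: subtract the mean absorbance of each spectrum.
    return absorbance - absorbance.mean(axis=-1, keepdims=True)

# Synthetic example on a grid resembling the spectrometer range (~0.6 nm sampling).
wavelengths = np.linspace(532.0, 1154.0, 1038)
reference = np.full_like(wavelengths, 1000.0)          # synthetic white-reference counts
sample = 200.0 + 50.0 * np.sin(wavelengths / 40.0)     # synthetic sample counts
a_full = preprocess(sample, reference, wavelengths)
a_half = preprocess(0.5 * sample, reference, wavelengths)   # same shape, half the intensity
```

Halving the signal intensity only adds a constant offset in absorbance, which the mean subtraction removes, so `a_full` and `a_half` coincide. This illustrates why the normalisation reduces sensitivity to berry size and working distance.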

Prediction models of TSS were built using partial least squares regression (PLSR),¹⁷ based on the absorbance spectra and corresponding reference TSS values from the calibration dataset.

To reduce temperature dependencies in the calibration model, 60 additional measurements of samples at lower temperatures were included alongside the room-temperature measurements, as described by Segtnan et al.¹⁸ No further temperature corrections were applied.

The optimal number of latent variables was determined by segmented cross-validation, using the root mean square error of cross-validation ( $RMSECV$ ), defined as

$$RMSE = \sqrt{\frac{1}{N} \sum_{n = 1}^{N} ({\hat{y}}_{n} - y_{n})^{2}} \tag{3}$$

where $N$ is the total number of samples, ${\hat{y}}_{n}$ is the predicted value, $y_{n}$ is the measured reference value, and $n$ indexes the samples from $1$ to $N$. For each cross-validation fold, measurements from twenty berries were randomly excluded, while ensuring that all measurements from the same berry (front side, back side, and at all temperatures) were kept within the same segment to prevent over-fitting. The final model was then rebuilt on the full calibration set using the selected number of latent variables.

The prediction performance of the calibration model was evaluated using the independent test set measured under ideal laboratory conditions. The model was applied following the same spectral pre-processing as the calibration dataset. To separately evaluate systematic deviations and random error, the root mean square error of prediction ( $RMSEP$ ), defined in Eq. 3, was decomposed into bias and standard error of prediction ( $SEP$ ) using the relation

$${RMSEP}^{2} = {SEP}^{2} + {bias}^{2} \tag{4}$$

Here, bias is defined as the mean difference between the predicted values ${\hat{y}}_{n}$ and the reference values $y_{n}$,

$$bias = \frac{1}{N} \sum_{n = 1}^{N} ({\hat{y}}_{n} - y_{n}) \tag{5}$$

and $SEP$ represents $RMSEP$ after bias correction.¹⁹ In addition, the squared correlation coefficient ($R^{2}$) between predicted and reference TSS values was computed after correcting for the bias.
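The decomposition in Eqs. 3–5 can be illustrated with a short function. One assumption is made explicit: $SEP$ is computed with $1/N$ normalisation so that Eq. 4 holds exactly; with the $1/(N - 1)$ convention also found in the literature, the relation is only approximate. The example values are invented for illustration.

```python
import numpy as np

def prediction_metrics(y_pred, y_ref):
    """Return RMSEP, SEP, bias and squared correlation for a set of predictions."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    err = y_pred - y_ref
    bias = err.mean()                                   # Eq. 5
    rmsep = np.sqrt(np.mean(err ** 2))                  # Eq. 3
    sep = np.sqrt(np.mean((err - bias) ** 2))           # 1/N form, so Eq. 4 is exact
    r2 = np.corrcoef(y_pred, y_ref)[0, 1] ** 2
    return {"rmsep": rmsep, "sep": sep, "bias": bias, "r2": r2}

# Invented example values (% TSS).
y_ref = np.array([7.0, 8.0, 9.0, 10.0])
y_pred = np.array([7.5, 8.7, 9.4, 10.8])
metrics = prediction_metrics(y_pred, y_ref)
```

A large bias with a small $SEP$ indicates a systematic offset that could be removed by a simple bias adjustment, whereas a large $SEP$ reflects random scatter that cannot.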

All data processing and analysis were performed using Python version 3.9 (Python Software Foundation, USA), within the Anaconda3 distribution (Anaconda, USA). Partial least squares regression was implemented using the scikit-learn package (version 1.6.1).

In-Field Measurements

Metrics Describing Data Quality, Prediction Accuracy, and Interfering Factors

Sources of errors that affect data quality and limit prediction accuracy can be divided into two main categories: random noise, which is uncorrelated between neighbouring pixels and spectra, and systematic disturbances, which cause distortions that are correlated across neighbouring pixels and spectra. Examples of random noise sources include electronic noise in the detector and shot noise (photon noise), a fundamental and unavoidable noise source arising from the quantised nature of photons. Shot noise scales with the square root of the number of photoelectrons in the signal, originating from both the sample signal and the ambient light. Systematic disturbances, on the other hand, include spectral distortions caused by sources such as environmental interference (e.g., stray light and uncorrected ambient interference), signal drift, and sampling errors (e.g., sensor misalignment). For a detailed discussion of these error sources, see Tschudi et al.²⁰
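As a rough illustration of why strong ambient light degrades the SNR even after ambient correction, consider a simplified shot-noise model. The symbols $N_s$ and $N_a$ (photoelectrons per pixel from the lamp-illuminated sample and from ambient light, respectively) are introduced here for illustration only; Poisson statistics, equal total exposure for illuminated and ambient spectra, and negligible detector noise are assumed.

```latex
\sigma^{2}_{\text{illuminated}} = N_s + N_a, \qquad
\sigma^{2}_{\text{ambient}} = N_a
\quad\Rightarrow\quad
\mathrm{SNR}_{\text{corrected}} \approx \frac{N_s}{\sqrt{N_s + 2 N_a}}
```

Under these assumptions, subtracting the ambient spectrum removes the ambient signal but not its shot noise, which enters twice. When $N_a \gg N_s$, the SNR approaches $N_s / \sqrt{2 N_a}$, so a tenfold increase in ambient light lowers the SNR by roughly a factor of $\sqrt{10} \approx 3.2$.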

Data Quality

Spectral noise is often estimated as the standard deviation of repeated measurements.²¹ While this approach accurately reflects random, uncorrelated noise when systematic disturbances are negligible, the noise estimate is inflated in the presence of spectral drift caused by systematic disturbances, leading to an underestimation of the true SNR.²⁰

A more robust estimate of the SNR can be obtained from the spectral roughness, which quantifies the amount of high-frequency variation across a single spectrum (i.e., rapid pixel-to-pixel fluctuations), reflecting the effects of random, uncorrelated noise. Unlike the standard deviation approach, it does not require repeated measurements and is therefore simpler and more practical. Roughness was defined as the standard deviation of the residual spectrum $x_{j, res}$ , given by

$$\sigma_{j, res} = \mathrm{std}(x_{j, res}) = \mathrm{std}(x_{j} - x_{j, smooth}) \tag{6}$$

where $x_{j, smooth}$ is a smoothed version of the pre-processed spectrum $x_{j}$, obtained using a second order Savitzky–Golay filter with a window size of 11. The roughness $\sigma_{j, res}$ provides a practical estimate of the average SNR across the spectrum, reflecting only random, uncorrelated noise and remaining unaffected by drift or spectral distortions from systematic disturbances.
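The roughness metric of Eq. 6 can be sketched with SciPy's Savitzky–Golay filter. The function name and the synthetic spectra are illustrative assumptions; the filter settings follow the text.

```python
import numpy as np
from scipy.signal import savgol_filter

def spectral_roughness(x, window_length=11, polyorder=2):
    """Standard deviation of the residual after Savitzky-Golay smoothing (Eq. 6)."""
    x = np.asarray(x, dtype=float)
    residual = x - savgol_filter(x, window_length, polyorder)
    return residual.std()

# Synthetic spectra: a smooth quadratic baseline with and without white noise.
rng = np.random.default_rng(0)
pixels = np.arange(450)
smooth = 0.3 + 1e-3 * pixels + 1e-6 * pixels ** 2
noisy = smooth + rng.normal(scale=0.01, size=pixels.size)

roughness_smooth = spectral_roughness(smooth)
roughness_noisy = spectral_roughness(noisy)
```

A slowly varying baseline passes through the filter unchanged and contributes essentially nothing to the residual, whereas pixel-to-pixel noise does, which is what makes the metric insensitive to drift. Note that the filter removes part of the noise as well, so the roughness is somewhat smaller than the underlying noise standard deviation.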

Systematic distortions affecting the overall spectral shape can be quantified by comparing each spectrum to a reference spectrum $x_{ref}$ , representing the expected spectral shape in absence of distortions. The average pre-processed spectrum from the calibration set was used as the reference, and the shape similarity to a pre-processed spectrum $x_{j}$ was quantified for the 750–1020 nm spectral region using the Pearson correlation coefficient, defined as

$$\rho_{j, ref} = \frac{\mathrm{cov}(x_{j}, x_{ref})}{\sigma_{j} \cdot \sigma_{ref}} \tag{7}$$

where $\mathrm{cov}(x_{j}, x_{ref})$ is the covariance between the spectra, and $\sigma_{j}$ and $\sigma_{ref}$ are their standard deviations. Because spectral differences between strawberries of varying sweetness are small and do not substantially affect the spectral shape, this metric effectively highlights distortions caused by systematic disturbances, such as stray light or sensor misalignment. Additionally, as $\rho_{j, ref}$ is reduced by high roughness, the parameter also captures some random, uncorrelated noise and is therefore not fully independent of $\sigma_{j, res}$.
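The shape-similarity metric of Eq. 7 is the Pearson correlation between a spectrum and the reference spectrum. A minimal sketch using `numpy.corrcoef` follows; the synthetic spectra are illustrative assumptions and are taken to be already cropped to 750–1020 nm.

```python
import numpy as np

def shape_similarity(x, x_ref):
    """Pearson correlation between a spectrum and the reference spectrum (Eq. 7)."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(x_ref, float))[0, 1])

# Synthetic reference spectrum and three test spectra.
rng = np.random.default_rng(0)
grid = np.linspace(750.0, 1020.0, 450)
x_ref = np.exp(-((grid - 970.0) / 40.0) ** 2)          # band near 970 nm
x_scaled = 0.1 + 0.8 * x_ref                           # offset and gain change only
x_noisy = x_ref + rng.normal(scale=0.05, size=grid.size)
x_distorted = x_ref + 0.5 * (grid - 750.0) / 270.0     # added sloping background

rho_scaled = shape_similarity(x_scaled, x_ref)
rho_noisy = shape_similarity(x_noisy, x_ref)
rho_distorted = shape_similarity(x_distorted, x_ref)
```

Offset and gain changes leave the correlation at one, while both an added sloping background and random noise lower it, illustrating why the metric is not fully independent of the roughness.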

Prediction Accuracy

To evaluate the impact of data quality on prediction accuracy, the lab–field prediction deviations, which quantify prediction accuracy after removing contributions from model calibration error, were used.

To separately evaluate the contributions of random, uncorrelated noise and fluctuating systematic disturbances to prediction accuracy, time-independent prediction noise and time-dependent prediction variation were quantified. Time-independent prediction noise reflects the influence of random, uncorrelated noise on the predictions, whereas time-dependent prediction variation captures changes that are correlated between spectra and evolve throughout the measurement sequence. This way, the impact of variable disturbances, such as fluctuating ambient light or minor mechanical disturbances (e.g., tripod vibrations), can be assessed to determine whether acquisition under changing conditions affects predictions. Constant systematic disturbances, such as fixed sensor misalignment, are not captured. Although ambient drift changes the magnitude of random shot noise, this noise remains uncorrelated between spectra and therefore affects only the time-independent prediction noise.

Time-independent prediction noise was quantified by dividing each measurement sequence into approximately 14 groups, each containing spectra from five evenly distributed shutter cycles. For each group, spectra were averaged and used to calculate predictions, and the standard deviation of these predictions provided an estimate of time-independent prediction noise. Similarly, time-dependent prediction variation was estimated by forming groups of sequential spectra. The standard deviation of predictions from these sequential groups reflected all sources of variation, including both time-dependent and time-independent components. By subtracting the time-independent prediction noise from this value, an estimate of the time-dependent prediction variation was obtained.
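The two grouping schemes can be sketched as array reshapes over per-cycle corrected spectra. This is an illustration under assumptions: one corrected spectrum per shutter cycle is taken as input, `predict` is a hypothetical stand-in for the calibration model, and the time-dependent variation is obtained by direct subtraction of the two standard deviations as described above.

```python
import numpy as np

def prediction_noise(cycle_spectra, predict, group_size=5):
    """Return (time-independent noise, time-dependent variation) of predictions."""
    cycle_spectra = np.asarray(cycle_spectra, dtype=float)
    n_groups = len(cycle_spectra) // group_size
    used = cycle_spectra[: n_groups * group_size]
    n_pix = used.shape[1]
    # Interleaved: group g averages cycles g, g + n_groups, g + 2 * n_groups, ...
    interleaved = used.reshape(group_size, n_groups, n_pix).mean(axis=0)
    # Sequential: group g averages cycles g * group_size ... (g + 1) * group_size - 1.
    sequential = used.reshape(n_groups, group_size, n_pix).mean(axis=1)
    noise = float(np.std(predict(interleaved)))
    total = float(np.std(predict(sequential)))
    return noise, total - noise

# Synthetic example: 70 cycles, white noise with and without a slow drift.
rng = np.random.default_rng(1)
n_cycles, n_pix = 70, 20
coef = rng.normal(size=n_pix)                 # hypothetical regression vector
predict = lambda s: s @ coef                  # stand-in for the PLSR model
white = rng.normal(scale=0.01, size=(n_cycles, n_pix))
drift = np.linspace(0.0, 1.0, n_cycles)[:, None] * 0.05 * np.sign(coef)

noise_only = prediction_noise(white, predict)
with_drift = prediction_noise(white + drift, predict)
```

With 70 cycles and a group size of five, this yields 14 groups, matching the text. In the drifting case the sequential groups spread out along the drift while the interleaved groups largely average it out, so the second returned value becomes clearly positive.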

Interfering Factors

Logged temperature and humidity during the measurement runs were analysed to determine whether the prediction model was robust to variations in sample temperature and environmental conditions, and whether such fluctuations directly or indirectly affected data quality or prediction accuracy.

The ambient intensity and the illuminated sample intensity (not ambient-corrected) were quantified for each measurement sequence using the average signals in the weakly absorbing 750–900 nm region of the respective spectra. The ambient-corrected sample intensity was estimated as the difference between these values.

The stability of the ambient signal was assessed using the ambient drift, quantified as the standard deviation of the difference in signal strength between consecutive ambient spectra.
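As an illustration, the intensity and drift measures defined above could be computed as follows. This is a sketch under assumptions: spectra are stored as arrays of shape (number of spectra, number of wavelengths), the sample standard deviation is used, and all function and key names are hypothetical.

```python
import numpy as np

def band_mean(spectra, wavelengths, lo=750.0, hi=900.0):
    """Mean signal of each spectrum in the weakly absorbing 750-900 nm region."""
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return spectra[:, mask].mean(axis=1)

def intensity_metrics(illuminated, ambient, wavelengths):
    """Intensity and drift measures for one measurement sequence."""
    illuminated_signal = band_mean(illuminated, wavelengths)
    ambient_signal = band_mean(ambient, wavelengths)
    ambient_intensity = ambient_signal.mean()
    illuminated_intensity = illuminated_signal.mean()
    return {
        "ambient_intensity": ambient_intensity,
        "illuminated_intensity": illuminated_intensity,
        # Ambient-corrected sample intensity: difference of the two averages.
        "sample_intensity": illuminated_intensity - ambient_intensity,
        # Ambient drift: standard deviation of the change in signal strength
        # between consecutive ambient spectra.
        "ambient_drift": np.diff(ambient_signal).std(ddof=1),
    }
```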

Although the instrument’s pose (orientation, positioning, and focus) and minor mechanical disturbances could affect measurements, these factors were not explicitly quantified. Spectral normalisation and outlier detection and removal mitigated the impact of shape deviations and intensity variations in these experiments, although residual SNR variations may persist.

Sample-Related Errors

Factors such as inhomogeneity, granulometric effects, and surface variations could potentially influence predictions; however, these effects were assumed to be of limited importance, as the interaction geometry samples a larger subsurface volume than reflection geometries and is therefore less sensitive to local surface variations.¹¹ A quantitative assessment of these effects would be challenging within the scope of the present study, and they were therefore not further investigated.

Correlations Between the Parameters

To investigate whether noise, systematic disturbances, or interfering factors were associated with reduced prediction accuracy, correlations were analysed between the data quality metrics (spectral roughness and shape similarity), the prediction accuracy metrics (lab–field prediction deviation, time-independent prediction noise, and time-dependent prediction variation), and the interfering factors (sample intensity, ambient intensity, ambient drift, temperature, and humidity). This analysis aimed to clarify the relationships between measurement quality, environmental conditions, and prediction robustness under varying field conditions.

Outlier Detection and Removal

Outlier detection was performed using the data quality metrics defined in the previous section. Spectra with insufficient SNR were identified by their roughness ( $σ_{j, res}$ ) and excluded when $σ_{j, res} > 0.003$ . Spectra exhibiting shape distortions were identified based on their shape similarity to the reference strawberry spectrum ( $ρ_{j, ref}$ ) and classified as outliers when $ρ_{j, ref} < 0.999$ . These thresholds were suitable for the spectral sampling density and variation in this study, although other datasets or instruments may require adjustment. As the shape similarity may also reflect spectral roughness, some spectra were classified as outliers by both methods.

The proposed metrics can be computed in real time for each spectrum without requiring reference values, allowing operators to be immediately alerted when a measurement is unreliable. The approach can also be integrated into robotic systems to automatically repeat measurements until acceptable quality is achieved.
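A minimal sketch of how the two thresholds could be applied in real time, including an automatic repeat loop, is given below. The thresholds are those stated above; the function names, the retry limit, and the `acquire` callback (assumed to return the roughness and shape similarity of a fresh measurement) are hypothetical.

```python
ROUGHNESS_MAX = 0.003    # sigma_{j,res} above this indicates insufficient SNR
SIMILARITY_MIN = 0.999   # rho_{j,ref} below this indicates a shape distortion

def classify_spectrum(roughness, similarity):
    """Return the list of reasons for rejection (empty if the spectrum is accepted)."""
    reasons = []
    if roughness > ROUGHNESS_MAX:
        reasons.append("low SNR")
    if similarity < SIMILARITY_MIN:
        reasons.append("shape distortion")
    return reasons

def measure_until_accepted(acquire, max_attempts=3):
    """Repeat a measurement until it passes both criteria.

    Returns the number of attempts used, or None if every attempt was rejected.
    """
    for attempt in range(1, max_attempts + 1):
        roughness, similarity = acquire()
        if not classify_spectrum(roughness, similarity):
            return attempt
    return None
```

A spectrum can be flagged by both criteria at once, which mirrors the overlap between the two outlier classes noted above.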

In-Field Prediction Performance

To evaluate the performance of the NIR instrument under varying environmental conditions, the prediction model was applied to the four in-field measurement runs. Outlier spectra were identified and excluded prior to evaluating prediction performance, ensuring that only reliable measurements contributed to the reported parameters. Prediction performance was assessed in terms of the squared correlation coefficient ( $R^{2}$ ) between predicted and reference TSS values after bias correction, the $SEP$ , the bias, and the number of excluded outliers.
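For reference, these performance parameters can be computed as sketched below. The sketch assumes the common conventions that bias is the mean of predicted minus reference values and that $SEP$ is the sample standard deviation of the bias-corrected residuals; the squared Pearson correlation is unaffected by a constant bias. The function name is illustrative.

```python
import numpy as np

def prediction_performance(predicted, reference):
    """Bias, SEP and squared correlation between predicted and reference TSS."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    residuals = predicted - reference
    bias = residuals.mean()
    # Standard deviation about the mean residual, i.e. after bias correction.
    sep = residuals.std(ddof=1)
    r2 = np.corrcoef(predicted, reference)[0, 1] ** 2
    return {"bias": bias, "SEP": sep, "R2": r2}
```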

The in-field predictive performance was evaluated using spectra of the highest available data quality, obtained by averaging all 1000 illuminated sample spectra, corresponding to approximately 0.5 s of sample exposure. Ambient-correction was performed using the approach with interpolation-based correction described in Ambient-Correction Methods.

The effect of measurement methodology on prediction performance under varying environmental conditions was further investigated through a series of acquisition strategies encompassing both instrument configuration and data processing approaches. These experiments aimed to identify trade-offs and inform future optimisation of the instrument and acquisition protocol for in-field measurements.

First, the different ambient-correction methods presented in the Ambient-Correction Methods section above were compared based on their resulting prediction performance across measurement runs, in order to optimise ambient light removal.

Next, the effect of sampling frequency was investigated. The ambient light sampling frequency (shutter speed) determines the maximum rate at which ambient fluctuations can be corrected (up to half the sampling frequency, according to the Nyquist limit). The original frequency of 6 Hz was compared with a simulated slower frequency of 2 Hz, obtained by removing every third sequence of spectra (alternating between illuminated and ambient). This effectively merged two consecutive illuminated sequences and two consecutive ambient sequences, thereby tripling the duration of a single shutter cycle, as illustrated in Figure 2b.

To evaluate the impact of spectral sampling frequency (measurement time per spectrum), the same total number of spectra (240) was acquired over two durations, 5 s and 10 s. For the 6 Hz condition, this corresponded to eight spectra per cycle with 30 shutter cycles (5 s) and four spectra per cycle with 60 shutter cycles (10 s). For the simulated 2 Hz condition, it corresponded to 24 spectra per cycle with 10 shutter cycles (5 s) and 12 spectra per cycle with 20 shutter cycles (10 s). These acquisition strategies are illustrated in Figures 2c–d. This design enabled assessment of how shutter speed and measurement duration independently affect prediction accuracy by altering exposure to ambient drift.
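The bookkeeping behind the four acquisition strategies can be checked with a short calculation. The values are taken from the text; the function and key names are illustrative.

```python
# (shutter frequency in Hz, spectra used per cycle, number of shutter cycles)
CONFIGS = {
    "6 Hz, 5 s": (6, 8, 30),
    "6 Hz, 10 s": (6, 4, 60),
    "2 Hz, 5 s": (2, 24, 10),
    "2 Hz, 10 s": (2, 12, 20),
}

def summarise(freq_hz, spectra_per_cycle, n_cycles):
    """Total spectra, duration and Nyquist limit for one acquisition strategy."""
    return {
        "total_spectra": spectra_per_cycle * n_cycles,
        "duration_s": n_cycles / freq_hz,
        # Fastest ambient fluctuation that can still be corrected.
        "nyquist_hz": freq_hz / 2,
    }

summary = {name: summarise(*cfg) for name, cfg in CONFIGS.items()}
for name, values in summary.items():
    print(name, values)
```

All four strategies use 240 spectra, so differences between them reflect the shutter frequency and the total duration rather than the amount of averaged data.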

Finally, the measurement speed, defined by the number of spectra acquired and averaged to form the final spectrum, was varied to examine how measurement time and SNR influence prediction accuracy under different environmental conditions.

Results and Discussion

Concentration of TSS

The concentrations and distributions of TSS in the two sample sets are summarised in Table I. Although both datasets were selected using similar criteria, ranging from ripe to very ripe, the test set exhibited a higher standard deviation and greater maximum TSS value, exceeding the range of the calibration set. This appears to result from a wider spread of TSS concentrations in cv. Aurora, which was only included in the test set, compared to cv. Favori, which was included in both datasets. The inclusion of a second calibration sample batch in August, in addition to the first batch in June, helped extend the calibration range, as the mean TSS levels decreased from June to August, consistent with previous findings.¹¹ This broader span was intended to support a more robust calibration.

Table I.

Total soluble solids (TSS; g/100 g) in strawberries from the calibration and test datasets. SD = standard deviation.

Sample set	Batch	n	Mean	Min	Max	SD
Calib.	Combined	200	9.0	3.3	12.0	1.0
	June 2025, cv. Favori	100	9.4	5.0	12.0	0.8
	Aug. 2025, cv. Favori	100	8.6	3.2	11.2	1.1
Test	Combined	100	8.1	3.3	16.2	2.2
	Aug. 2024, cv. Aurora	80	8.0	3.3	16.2	2.4
	Aug. 2024, cv. Favori	20	8.5	6.2	10.4	1.0

Spectroscopic Measurements

Pre-processed (intensity-normalised) absorbance spectra of the calibration strawberries are shown in Figure 3a. The spectra are dominated by the OH overtone absorption by water around 970 nm. Differences related to varying sugar content are largely masked by variations in the overall spectral slope and contrast. Contrast differences can be removed by using SNV, after which apparent shifts of the water peak around 970 nm toward longer wavelengths in strawberries with higher TSS have been observed.¹¹ These shifts are attributed to the influence of sugar concentration on the water OH absorption band around 970 nm, which consists of an absorption peak at 960 nm that decreases with increasing sugar concentration and a shoulder at 984 nm that becomes more pronounced with increasing sugar content.²² Although no contrast corrections were applied in the present study, these variations contributed negligibly to the regression vector, being captured mainly by the second PLS component, which accounted for 71 % of spectral variance but only 1.5 % of TSS variance.

Figure 3.

(a) Normalised absorbance spectra of the strawberries in the calibration set, coloured according to their concentration of total soluble solids (TSS). (b) PLS regression coefficients for prediction of TSS concentration.

The resulting PLS regression coefficients for TSS prediction are shown in Figure 3b. These coefficients display main features similar to those of previous models of TSS in strawberries,¹¹ tomatoes,²³ and mangoes,²⁴ obtained after applying SNV, indicating that the final result is similar to using SNV explicitly. The OH bands of sugar at 960 nm and 984 nm are not strongly weighted in the model, possibly due to masking by water absorption or, as suggested by Wold et al.,¹¹ their temperature dependence. The regression relies primarily on the region between 850 nm and 920 nm, capturing the sugar absorption peak at 910 nm.²²,²⁴

Calibration Results

The calibration for TSS based on the calibration set of cv. Favori required 6 PLS factors and resulted in the regression vector shown in Figure 3b, with an $RMSECV$ of 0.47 % TSS and an $R^{2}$ of 0.78. The results are comparable with previous calibration models of TSS in strawberries using a benchtop instrument with similar sampling geometry, 9 PLS factors and SNV pre-processing.¹¹

The prediction performance of the model is quantified by the results obtained for the laboratory run of the test set, presented in Figure 4a. The $SEP$ was only slightly higher than the $RMSECV$ obtained during calibration, and the bias was relatively small compared to the range of TSS values in the dataset. The $R^{2}$ was higher than for the calibration set, which is most likely due to a larger span of TSS in the test set.

Figure 4.

Predicted versus measured TSS in the test set of strawberries measured in field under varying environmental conditions: laboratory (a), night (b), morning with diffuse sunlight (c), midday with direct sunlight (d), and afternoon with direct sunlight (e). Diagonal black lines indicate target values. Outlier measurements are marked with red triangles (high roughness $σ_{j, res}$ ) and orange squares (low shape similarity to the reference spectrum $ρ_{j, ref}$ ) and were excluded when estimating performance parameters.

The test set consisted of strawberries from a different season and farm than the calibration set, which may explain the small bias and supports the robustness of the model across seasons. Furthermore, the test set covered a wider span of TSS values than the calibration set. Although the predicted values in the range 12–16% TSS were based on extrapolation, the prediction errors were not significantly higher in this region. This indicates that the model successfully captured spectral variations that correlate with TSS even beyond the calibration range. Despite the calibration set containing only cv. Favori, the prediction errors for cv. Favori and cv. Aurora in the test set were not significantly different, suggesting that the model is transferable between cultivars. As discussed in Wold et al.,¹¹ this demonstrates the advantage of interaction sampling geometries, which, compared to reflection, enable deeper light penetration and a more representative sampling of the berry interior. In that study, an interaction geometry similar to the one used here was employed, and effective penetration depths of approximately 6–8 mm were reported, enabling sampling of a substantial fraction of the fruit flesh. This reduces sensitivity to surface effects and internal heterogeneities, providing a model that is applicable across seasons, farms, and cultivars.

Ambient Conditions

The ambient conditions, characterised by the average ambient signal strength, the ambient drift (defined as the standard deviation of the difference in signal strength between consecutive ambient spectra), the temperature, and the humidity, are described in Table II for each measurement run. At night, when negligible ambient light was present, the ambient and sample temperature decreased and humidity increased compared to the daytime conditions. In contrast, daytime measurements were conducted under strong ambient illumination. The average ambient signal strength during daytime was approximately 20 000 counts, around 25 times higher than the sample signal of about 800 counts. Although this corresponds to roughly one third of the detector’s full-well capacity, the ambient signal exhibited substantial drift with frequent fluctuations, necessitating short exposure times to prevent saturation during peaks in intensity. Morning conditions were dominated by diffuse and relatively stable sunlight, whereas midday measurements experienced the strongest direct illumination with strong drift caused by passing clouds. Afternoon measurements were mostly influenced by lower-angle sunlight and intermittent shadows from surrounding objects, resulting in more pronounced ambient drift relative to the average intensity than at midday.

Table II.

Ambient conditions during the measurement runs, expressed as averages over all measurements acquired in each run.

Parameter	Night	Morning	Midday	Afternoon
Ambient signal (counts)	7	16 832	22 205	15 531
Ambient drift (counts)	2.1	15.6	33.5	32.5
Temperature (°C)	16.8	22.6	24.5	23.8
Relative humidity (%)	71.3	51.2	53.3	53.7
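The relative magnitudes discussed in the text can be reproduced from the values in Table II, as in the following sketch. The sample signal of about 800 counts is the approximate figure quoted in the text; the variable names are illustrative.

```python
# Average ambient signal and ambient drift per run (counts), from Table II.
TABLE_II = {
    "Night": {"ambient": 7, "drift": 2.1},
    "Morning": {"ambient": 16832, "drift": 15.6},
    "Midday": {"ambient": 22205, "drift": 33.5},
    "Afternoon": {"ambient": 15531, "drift": 32.5},
}
SAMPLE_SIGNAL = 800  # approximate sample signal (counts) quoted in the text
DAYTIME = ("Morning", "Midday", "Afternoon")

# Ambient drift relative to the average ambient intensity.
relative_drift = {run: TABLE_II[run]["drift"] / TABLE_II[run]["ambient"]
                  for run in DAYTIME}
# Ratio of ambient signal to sample signal.
ambient_to_sample = {run: TABLE_II[run]["ambient"] / SAMPLE_SIGNAL
                     for run in DAYTIME}

for run in DAYTIME:
    print(f"{run}: relative drift {relative_drift[run]:.2%}, "
          f"ambient/sample {ambient_to_sample[run]:.1f}")
```

The afternoon run shows the largest drift relative to its average intensity, although the absolute drift is similar to that at midday.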

In-Field Prediction Performance

The prediction results obtained under the various environmental conditions are shown in Figure 4. Night-time measurements produced performance comparable to laboratory conditions, indicating that neither low ambient and sample temperatures nor higher humidity noticeably affected the predictions. This demonstrates that night-time operation is a viable approach for achieving high-quality, on-the-plant measurements.

In contrast, daytime sunlit conditions reduced prediction performance, leading to higher $SEP$ and bias, lower $R^{2}$ values, and an increased number of rejected spectra. Nevertheless, even under rapidly changing midday and afternoon conditions, the $SEP$ remained below 1.4 % TSS, corresponding to roughly one sixth of the typical TSS range in ripe strawberries. This accuracy is sufficient for practical applications such as sweetness-based grading. Overall, these results demonstrate that, with the present instrument configuration, in-field, on-the-plant TSS measurements are feasible across seasons and under typical sunlit summer conditions in Norway, without additional shading beyond the thin polytunnel wall.

Effectiveness of Outlier Detection

The two data quality criteria (spectral roughness $σ_{j, res}$ and shape similarity $ρ_{j, ref}$ ) proved effective for identifying unreliable spectra. As shown in Figure 4, the method removed measurements associated with the largest prediction errors while retaining nearly all spectra that produced reliable results. Only a small number of measurements with low errors were flagged; these predictions likely appeared accurate by chance despite originating from low-quality spectra. These results confirm that the outlier detection method reliably removes spectra that would otherwise degrade prediction performance, without discarding an excessive amount of useful data.

Effects of Interfering Sources on Prediction Performance

Comparing the prediction performances across the different in-field measurement runs (Figure 4) provides insight into how the various sources of systematic disturbances affected the predictions.

The bias differences observed between runs may be attributed to differences in sample temperature driven by ambient temperature variations. The bias during night-time measurements (0.65% TSS) was lower than for the laboratory measurements (0.78% TSS), and it increased progressively from morning (0.95% TSS) to midday (1.18% TSS) and afternoon (1.45% TSS). Although ambient temperature peaked at midday, the highest sample temperatures occurred during the afternoon, likely due to cumulative heating from direct solar exposure. The resulting temperature effects on the prediction model were relatively modest, indicating that the prediction model is only weakly affected by temperature. This robustness likely stems from the inclusion of berries at varying temperatures in the calibration set. In addition, the spectral region above 920 nm, which is more sensitive to variations in sample temperature, was weighted lower in the regression vector, as was the temperature-dependent water absorption around 760 nm. Models employing SNV pre-processing (not shown) assigned greater weight to the region above 920 nm, which resulted in consistently higher biases across all conditions (1.33% TSS for night, 1.54% TSS for morning, 1.45% TSS for midday, and 1.53% TSS for afternoon). Bias differences could be further reduced by applying temperature-correction methods, for example using difference spectra.¹⁸

The decrease in performance from night-time to morning, under conditions of relatively stable ambient light with limited drift, primarily illustrates the effects of increased ambient shot noise. The $R^{2}$ decreased from 0.87 to 0.74, while the $SEP$ increased from 0.73% TSS to 1.12% TSS. These results highlight that ambient-light shot noise fundamentally limits measurement precision.

The further decline in performance from morning to afternoon, despite comparable average ambient light intensity, underscores the additional impact of ambient drift under direct sunlight. $R^{2}$ decreased from 0.74 to 0.60, and $SEP$ increased from 1.12% TSS to 1.35% TSS. Most of this degradation likely results from transient peaks in ambient intensity, and correspondingly in the shot noise levels, as most ambient disturbances were removed by the ambient correction.

Despite the stronger ambient-light shot noise at midday, the midday run showed slightly better apparent prediction performance than the afternoon. Compared to the afternoon, which had five outliers, the midday conditions produced more outliers (12) with insufficient SNR. Excluding these low-quality spectra improved the apparent accuracy of the remaining data, which explains why the midday run had slightly better prediction performance ( $R^{2} = 0.69$ , $SEP = 1.23$ % TSS) than the afternoon ( $R^{2} = 0.60$ , $SEP = 1.35$ % TSS). This demonstrates that outlier detection and removal can be used to maintain robust prediction performance under challenging measurement conditions.

Relationships Between Data Quality and Prediction Accuracy

Correlations of the data quality metrics (roughness and shape similarity) with the prediction accuracy metrics (lab–field prediction deviation, time-independent prediction noise, and time-dependent prediction variation) are presented in Table III.

Table III.

Correlations of data quality metrics with prediction accuracy metrics for the four field measurement runs. *Clear outliers removed.

Parameter	Night	Morning	Midday	Afternoon
(i) Correlation with lab–field prediction deviation
Roughness $σ_{j, res}$	0.08	0.28*	0.47	0.31
Shape similarity $ρ_{j, ref}$	−0.08	−0.09*	−0.48	−0.45
(ii) Correlation with time-independent prediction noise
Roughness $σ_{j, res}$	0.71	0.82	0.78*	0.79*
Shape similarity $ρ_{j, ref}$	−0.37	−0.44	−0.38*	−0.37
(iii) Correlation with time-dependent prediction variation
Roughness $σ_{j, res}$	−0.20	−0.18	−0.15	−0.18
Shape similarity $ρ_{j, ref}$	0.11	0.04	0.26	0.34

During the night-time, when ambient light was absent, data quality was consistently high and lab–field prediction deviations were low. As expected, no meaningful correlations between lab–field prediction deviations and data quality were observed. Under daytime conditions, with ambient light reaching its maximum at midday, lab–field prediction deviation became moderately correlated with spectral roughness, highlighting the influence of ambient-light shot noise on prediction accuracy. The correlations remained moderate, reflecting the fact that spectra with low SNR may occasionally yield accurate predictions.

Spectral roughness showed strong positive correlations with time-independent prediction noise, confirming that insufficient SNR leads to unstable predictions. Moderate correlations between time-independent prediction noise and shape similarity likely reflect the influence of roughness on shape similarity.

Correlations between shape similarity and lab–field prediction deviation were less clear. Only a small number of spectra showed substantial shape distortions during the midday and afternoon conditions, yet these dominated the correlations. No such distortions occurred at night or in the morning, and correspondingly, correlations were negligible under those conditions. After removing spectra classified as outliers ( $ρ_{j, ref} < 0.999$ ), correlations for midday and afternoon also became negligible. This indicates that most spectra were not distorted, but the few distorted spectra produced disproportionately large lab–field prediction deviations and were appropriately excluded by the outlier detection. Midday and afternoon conditions were more prone to such distortions, likely due to ambient-light interference. Their limited incidence rate suggests that the ambient-light correction was generally effective.

Time-dependent prediction variation fluctuated around zero and showed no correlation with spectral roughness, as expected given that roughness reflects random, uncorrelated noise. Weak correlations with shape similarity during midday and afternoon conditions were again driven by the few spectra with distortions, indicating that parts of these distortions were unstable within each measurement sequence and likely arose from sporadic insufficient ambient-light correction during strong ambient drift. Any remaining distortions not captured by these weak correlations were stable throughout the measurement and were likely caused by stray light, imperfect sensor positioning, or other interfering disturbances.

Overall, the results indicate that shot noise was the primary factor limiting prediction accuracy, whereas occasional shape distortions were handled through outlier detection and removal.

Effects of Interfering Sources on Data Quality and Prediction Accuracy

Correlations of data quality metrics (roughness and shape similarity) and lab–field prediction deviations with measurement conditions (sample intensity, ambient intensity, and ambient drift) are presented in Table IV. Ambient drift is here defined as the standard deviation of the difference in signal strength between consecutive ambient spectra. No significant correlations were observed with temperature or humidity.

Table IV.

Correlations of data quality metrics and lab–field prediction deviations with measurement conditions for the four field measurement runs. *Clear outliers removed.

Parameter	Night	Morning	Midday	Afternoon
(i) Correlation with sample intensity
Roughness $σ_{j, res}$	−0.85	−0.89	−0.75	−0.79
Shape similarity $ρ_{j, ref}$	0.41	0.46	0.56	0.60
Lab–field prediction deviation	0.04	−0.32	−0.41	−0.33
(ii) Correlation with ambient intensity
Roughness $σ_{j, res}$	−0.09	0.27	0.43	0.60
Shape similarity $ρ_{j, ref}$	0.07	0.06	−0.07	−0.16
Lab–field prediction deviation	−0.04	0.12*	0.15	0.01
(iii) Correlation with ambient drift
Roughness $σ_{j, res}$	−0.05	−0.07	0.26	0.36*
Shape similarity $ρ_{j, ref}$	−0.01	0.02*	−0.03	−0.06*
Lab–field prediction deviation	−0.03	0.13*	0.09	0.20

Spectral roughness showed strong negative correlations with sample intensity, which, together with ambient-light shot noise, determines the SNR. These correlations were stronger during night and morning, when ambient-light shot noise levels were low and stable. Moderate correlations of roughness with ambient intensity further reflect the contribution of ambient-light shot noise. This confirms that spectral roughness is a suitable measure of SNR. Weak correlations of roughness with ambient drift during midday and afternoon were likely caused by temporary intensity peaks, which increased the average ambient intensity and, consequently, the shot noise during periods of strong drift.

Moderate correlations between spectral shape similarity and sample intensity likely arose because shape similarity is partly influenced by roughness (i.e., insufficient SNR). These correlations, particularly during midday and afternoon, may also reflect the greater impact of spectral distortions on weaker signals. No correlations were observed with ambient intensity or ambient drift, indicating that ambient-light interference was not systematically increased by higher intensity or drift, and that the correction was generally effective. Only a small number of spectra showed distortions indicative of occasional insufficient ambient-light correction, which were successfully handled through outlier detection.

Lab–field prediction deviations showed moderate negative correlations with sample intensity during daytime conditions. Most spectra identified as outliers, those with high roughness or low shape similarity, resulted from insufficient sample signal, likely due to imperfect sensor positioning. This confirms that sample intensity is the primary factor governing prediction accuracy, as it directly influences SNR and moderates the impact of spectral distortions. Although ambient intensity increases shot noise, no notable correlations were observed with lab–field prediction deviation, either from ambient intensity or ambient drift, likely because variations in sample intensity dominated.

Overall, the ambient-correction method was effective for the current setup, and SNR, determined primarily by sample signal intensity and ambient-light shot noise, remained the main factor affecting data quality and prediction accuracy. To maximise data quality and minimise outliers, maintaining a strong sample signal through optimal sensor positioning is essential, and additional measures to reduce the ambient light, such as active shading, could be beneficial.

Effects of Measurement Configuration on Prediction Performance

Ambient-Correction Method

The prediction performances obtained using the different ambient-correction methods are shown in Figure 5.

Figure 5.

Prediction performances for the three ambient-correction methods across the five measurement runs: simple averaging (blue circles), interpolation-based correction (orange squares), and linear regression-based correction (green triangles).

The simple averaging method performed well under stable ambient conditions (laboratory, night, and morning) but failed to adequately correct spectra acquired under strong ambient drift during the midday and afternoon runs. This was likely due to non-uniform weighting of acquisition times caused by irregular sampling, presumably introduced by the discarding of spectra recorded during shutter transitions. As a result, the ambient contributions in the sample and ambient averages became imbalanced.

In contrast, both the interpolation-based correction and the linear regression-based correction, which compensate for such sampling irregularities, effectively handled the strong ambient drift. Their performances were nearly identical, reflecting their shared ability to model linear trends in the ambient signal. Because of its simplicity and computational efficiency, the interpolation-based method was chosen for all subsequent analyses.
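The difference between the two behaviours can be illustrated with a simplified, single-channel sketch. It is not the implementation used in this study: it assumes a purely linear ambient drift, scalar signals, and hypothetical function names, and it uses linear interpolation of the ambient signal at the acquisition times of the illuminated spectra.

```python
import numpy as np

def simple_average_correction(illum, amb):
    """Subtract the mean ambient signal from the mean illuminated signal."""
    return np.mean(illum) - np.mean(amb)

def interpolated_correction(t_illum, illum, t_amb, amb):
    """Subtract the ambient signal interpolated to each illuminated time stamp."""
    ambient_at_illum = np.interp(t_illum, t_amb, amb)
    return np.mean(np.asarray(illum) - ambient_at_illum)

# Synthetic example: constant sample signal on top of a linearly drifting ambient.
sample, ambient_start, drift_rate = 800.0, 20000.0, 50.0
t_amb = np.array([0.0, 1.0, 4.0])   # irregular sampling, e.g. discarded spectra
t_illum = np.array([2.0, 3.0])      # lies within the ambient time range
amb = ambient_start + drift_rate * t_amb
illum = sample + ambient_start + drift_rate * t_illum

simple = simple_average_correction(illum, amb)
interpolated = interpolated_correction(t_illum, illum, t_amb, amb)
print(f"simple averaging: {simple:.1f}, interpolation: {interpolated:.1f}")
```

In this example the mean acquisition times of the ambient and illuminated spectra differ, so simple averaging leaves a residual proportional to the drift rate, whereas interpolation recovers the sample signal.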

Non-linear extensions of the regression-based approach, such as polynomial or locally weighted models, may offer improved correction of non-linear ambient variations.

The various strategies tested for excluding unreliable corrected spectra (e.g., those acquired under strong ambient drift) prior to the final averaging step reduced the prediction performance and were therefore not applied. This confirms that, for the current setup, the ambient correction was effective, leaving SNR as the primary factor limiting prediction accuracy. Consequently, including a larger number of spectra, even if some were slightly disturbed, improved the overall prediction accuracy.

Sampling Frequency

Figure 6 shows the prediction performances for 240 averaged spectra acquired over shorter (5 s) and longer (10 s) total acquisition times, and with two ambient sampling frequencies (6 Hz and 2 Hz). Note that in this approach, only approximately one-fourth of the acquired spectra were used, so the reported prediction performances do not represent the full potential of the setup for the specified durations.

Figure 6.

Prediction performance for 240 averaged spectra under different measurement durations and ambient sampling frequencies: 6 Hz over 5 s (blue circles); 6 Hz over 10 s (orange squares); 2 Hz over 5 s (green triangles); and 2 Hz over 10 s (red diamonds).

The highest prediction performance was achieved with the fastest ambient sampling frequency (6 Hz) and the shortest measurement duration (5 s). Doubling the measurement duration to 10 s had a larger negative impact on prediction performance than lowering the ambient sampling frequency to 2 Hz, particularly under midday and afternoon conditions with strong ambient drift. Night-time measurements were unaffected, and only minor effects occurred under stable morning conditions. This indicates that slow ambient drift, which accumulates over longer measurement times, degrades prediction performance more than rapid fluctuations above 1 Hz, which are corrected at 6 Hz but not at 2 Hz.

The apparent improvement in performance at 10 s for 2 Hz compared to 6 Hz during morning and midday runs was an artefact resulting from the exclusion of more outliers, reflecting reduced overall spectral quality rather than improved prediction accuracy. In the afternoon, where similar numbers of outliers were removed, the slower sampling frequency produced the expected poorer performance. Overall, the slowest ambient sampling frequency combined with the longest measurement duration resulted in the lowest performance.

While increasing the shutter frequency from 2 Hz to 6 Hz modestly improved performance, minimising total measurement duration was more critical, as it limits the cumulative impact of ambient drift. Short exposures under intense ambient light therefore require fast-readout spectrometers: in this study, the 0.5 ms exposures accounted for only 1 s of the total 12 s measurement time due to 5.5 ms readout delays, making slower-readout spectrometers generally inadequate for these conditions.
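The timing budget quoted above follows from simple arithmetic, sketched below. The equal split between illuminated and ambient spectra is an assumption, chosen because it is consistent with 1000 illuminated spectra corresponding to about 0.5 s of sample exposure; the variable names are illustrative.

```python
EXPOSURE_US = 500          # 0.5 ms exposure per spectrum
READOUT_US = 5_500         # 5.5 ms readout delay per spectrum
TOTAL_US = 12_000_000      # 12 s total measurement time

cycle_us = EXPOSURE_US + READOUT_US                 # time per acquired spectrum
n_spectra = TOTAL_US // cycle_us                    # spectra in one measurement
exposure_total_s = n_spectra * EXPOSURE_US / 1e6    # time spent collecting light
duty_cycle = EXPOSURE_US / cycle_us                 # fraction of time exposing

# Assumption: half of the spectra are illuminated, half are ambient.
illuminated_exposure_s = (n_spectra // 2) * EXPOSURE_US / 1e6

print(n_spectra, exposure_total_s, illuminated_exposure_s, round(duty_cycle, 3))
```

Only about one twelfth of the measurement time is spent collecting light, which is why readout speed dominates the total duration.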

Measurement Time

Figure 7 shows prediction performance as a function of the measurement time, using the ambient sampling frequency of 6 Hz, corresponding to 0.167 s per shutter cycle. For each run, the prediction performance obtained from acquiring spectra over 10, 20, 30, 40, 50, 60, and 70 shutter cycles was compared.

Figure 7.

Prediction performance as a function of measurement time for the laboratory (blue circles), night (orange squares), morning (green upward triangles), midday (red diamonds), and afternoon (purple downward triangles) runs.

Averaging more spectra increases the SNR, which reduces spectral roughness and lowers the number of outliers. Across the measurement runs, outlier levels scaled with ambient light intensity, confirming that shot noise was the dominant noise source. The number of outliers decreased approximately as $1 / \sqrt{t}$ , where the measurement time $t$ is proportional to the number of spectra $n$ , consistent with the SNR increasing with the square root of the number of averaged spectra ( $\sqrt{n}$ ). Aside from reducing outliers, increasing the measurement time did not significantly improve prediction performance, except for instabilities at very short durations. This indicates that the outlier detection effectively excludes low-quality spectra and maintains robust predictions.
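The square-root scaling can be illustrated with a small simulation of shot-noise-limited averaging. This is a sketch under assumptions: counts are treated as Poisson-distributed photoelectrons (detector gain and read noise are ignored), the mean level of 20 800 counts approximates the daytime ambient plus sample signal quoted earlier, and the function name is hypothetical.

```python
import numpy as np

def noise_after_averaging(mean_counts, n_avg, n_trials=4000, seed=0):
    """Standard deviation of the mean of n_avg Poisson-distributed readings."""
    rng = np.random.default_rng(seed)
    draws = rng.poisson(mean_counts, size=(n_trials, n_avg))
    return draws.mean(axis=1).std(ddof=1)

single = noise_after_averaging(20_800, 1)
averaged = noise_after_averaging(20_800, 100)
print(f"noise for 1 spectrum: {single:.1f} counts")
print(f"noise for 100 averaged spectra: {averaged:.1f} counts")
print(f"reduction factor: {single / averaged:.1f} (expected about 10)")
```

Averaging 100 spectra reduces the noise by roughly a factor of 10, in line with the $\sqrt{n}$ dependence.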

In practical applications, overall measurement speed depends not only on the acquisition time per spectrum but also on the number of outliers, since rejected spectra must be reacquired, adding to the total measurement time. Therefore, the optimal measurement time is the minimum required to maintain an acceptably low outlier rate.

Under laboratory and night-time conditions with minimal ambient light, prediction performance remained stable as the measurement time increased, and the outlier rate stabilised at approximately 3 s. This suggests that, in these conditions, a measurement time of 3 s provides sufficient SNR and data quality.

Under ambient light, longer measurement durations were required to reduce the outlier rate. While 3 s produced stable predictions, roughly one-third of the spectra were rejected. Extending the measurement time to 10 s enabled most spectra to meet the SNR threshold imposed by the outlier detection method, improving robustness and reducing the need for repeated measurements. Under the most challenging midday conditions, 10 s measurements still yielded 12 % outliers, but reducing this rate further would require substantially longer acquisitions, which would not shorten the overall measurement time compared with simply remeasuring the discarded spectra. Thus, a measurement duration of approximately 10 s appears to be a practical optimum under these conditions.

Design Guidelines for In-Field Spectral Measurements

Based on the findings of this study, the following guidelines summarise best practices for achieving robust in-field spectral measurements.

The most critical factor during in-field daytime measurements is ambient light, and several measures should be implemented in the instrument design and sampling strategy to reduce its influence.

Because ambient light adds to the sample signal, acquisition should be split over many spectra using very short exposure times to avoid detector saturation under strong ambient illumination. In this study, an exposure time of 0.5 ms was chosen. With such short exposures, total acquisition time is dominated by spectrometer readout time, which typically adds a few milliseconds or more to each acquisition cycle. To reduce total measurement time, it is advantageous to select a spectrometer with faster readout or larger full-well capacity (allowing longer exposure before saturation), even if this comes at the expense of sensitivity. This contrasts with applications under weak ambient light, where higher spectrometer sensitivity is generally more important than readout speed.

Disturbances from ambient drift under sunlit conditions can be mitigated by rapidly sampling the ambient using a shutter and applying ambient correction to each shutter cycle. The required shutter frequency depends on the speed of ambient drift and the characteristics of the system; in this study, 6 Hz was sufficient. Disturbances from slow ambient drift, which accumulate over time, are reduced by keeping measurement durations short. Achieving this with very short exposure times requires fast spectrometer readout; in this study, a 0.5 ms exposure combined with a 5.5 ms readout gave an acceptable duty cycle of about 8 %. With appropriate configurations, no significant uncorrected drift remains, and the primary factor limiting prediction performance is shot noise.
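The benefit of correcting each shutter cycle separately can be sketched with synthetic data. This is an illustration under stated assumptions, not the processing code used in this study: it assumes that every shutter cycle yields one spectrum containing sample plus ambient light and one ambient-only spectrum taken half a cycle later, and the function name, signal level, and the 2 % per second ambient drift are all made up for the example.

```python
import numpy as np


def ambient_correct(sample_plus_ambient, ambient_only):
    """Subtract each cycle's ambient spectrum from the same cycle's
    sample-plus-ambient spectrum, then average over cycles.
    Both inputs have shape (n_cycles, n_pixels)."""
    sample_plus_ambient = np.asarray(sample_plus_ambient, dtype=float)
    ambient_only = np.asarray(ambient_only, dtype=float)
    if sample_plus_ambient.shape != ambient_only.shape:
        raise ValueError("inputs must have the same shape")
    return (sample_plus_ambient - ambient_only).mean(axis=0)


# Synthetic, noise-free example: 60 cycles at 6 Hz (10 s), drifting ambient.
n_cycles, n_pixels = 60, 5
cycle_s = 1.0 / 6.0
t_sample = np.arange(n_cycles) * cycle_s   # time of each sample-plus-ambient spectrum
t_ambient = t_sample + cycle_s / 2.0       # ambient sampled half a cycle later
signal = np.full(n_pixels, 100.0)          # hypothetical true sample signal


def ambient(t):
    """Hypothetical ambient level rising by 2 % per second."""
    return 1000.0 * (1.0 + 0.02 * t)


sample_frames = signal + ambient(t_sample)[:, None]
ambient_frames = np.repeat(ambient(t_ambient)[:, None], n_pixels, axis=1)

per_cycle = ambient_correct(sample_frames, ambient_frames)
single_reference = (sample_frames - ambient_frames[0]).mean(axis=0)

print("residual error, per-cycle correction:   ", per_cycle[0] - signal[0])
print("residual error, single ambient spectrum:", single_reference[0] - signal[0])
```

In this synthetic case, the per-cycle correction leaves only the small drift that occurs within half a shutter cycle, whereas a single ambient reference taken at the start accumulates the drift over the whole measurement.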

Shot noise represents a fundamental, unavoidable noise source, and its influence depends on the SNR. When shot-noise limited, the SNR of the ambient-corrected spectrum is approximated by

$$\mathrm{SNR} \approx \sqrt{N} \cdot \frac{n_{s}}{\sqrt{n_{s} + 2 \cdot n_{a}}} \tag{8}$$

where $n_{s}$ and $n_{a}$ are the numbers of sample and ambient photoelectrons per spectrum (with the factor 2 arising from the ambient correction), and $N$ is the number of averaged spectra, proportional to the measurement duration. SNR can be increased by raising the signal intensity relative to the ambient, for example, by using stronger sample illumination or shading the ambient. It can also be improved by acquiring and averaging more spectra. A spectrometer with fast readout or large full-well capacity helps minimise measurement time while maintaining sufficient SNR. Practical limitations such as lamp heating, power consumption, instrument size, or cost must be considered, and there is a trade-off between performance, speed, size, and cost.²⁵ Measurement time should be optimised according to the signal and ambient levels to achieve sufficient data quality above the outlier threshold. In this study, 10 s measurement duration was found sufficient.
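Equation 8 can be evaluated directly to explore how signal level, ambient level, and the number of averaged spectra trade off. The sketch below is illustrative only: the function names are hypothetical, and the photoelectron numbers are made up rather than taken from the instrument in this study.

```python
import math


def shot_noise_snr(n_s: float, n_a: float, n_spectra: int) -> float:
    """Shot-noise-limited SNR of the ambient-corrected spectrum (Eq. 8)."""
    if n_s <= 0 or n_a < 0 or n_spectra <= 0:
        raise ValueError("n_s and n_spectra must be positive, n_a non-negative")
    return math.sqrt(n_spectra) * n_s / math.sqrt(n_s + 2.0 * n_a)


def spectra_needed(target_snr: float, n_s: float, n_a: float) -> int:
    """Smallest number of averaged spectra reaching target_snr, from Eq. 8
    rearranged to N = SNR^2 * (n_s + 2 * n_a) / n_s^2."""
    return math.ceil(target_snr ** 2 * (n_s + 2.0 * n_a) / n_s ** 2)


# Hypothetical photoelectron numbers per spectrum (not from this study).
n_s = 1000.0
for n_a in (0.0, 5000.0, 20000.0):
    snr = shot_noise_snr(n_s, n_a, n_spectra=100)
    needed = spectra_needed(50.0, n_s, n_a)
    print(f"n_a = {n_a:7.0f}: SNR at N = 100 is {snr:6.1f}; N for SNR 50 is {needed}")
```

Because the required $N$ grows linearly with $n_{s} + 2 n_{a}$ at fixed $n_{s}$, strong ambient light translates directly into longer measurement durations unless the sample signal is increased or the ambient is shaded.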

The proposed outlier detection procedure is effective for maintaining consistent prediction performance, even for low-quality data. Operating on each final averaged spectrum, it is computationally light and straightforward to implement. In practical systems, it can provide real-time feedback to the operator, flagging poor-quality spectra for repetition and suggesting adjustments to optimise measurement duration.

Although this study did not specifically address external disturbances such as motion, stray light, or reflections from nearby objects, significant effects from these sources are expected to manifest as spectral shape deviations, which are identified and removed through the outlier detection procedure.

This work focused on the instrument design, sampling strategy, and raw data processing rather than advanced multivariate modelling. Nevertheless, further improvements in prediction performance may be achieved through more sophisticated regression and calibration techniques.

With these guidelines implemented, the results of this study suggest that robust, high-quality, in-field spectral measurements are feasible using compact, practical instrumentation suitable for non-contact robotic operation. Best performance was achieved under night-time or low-ambient conditions, although measurements acquired under bright daytime conditions without active shading also yielded acceptable results. However, stronger sunlight in locations further south than Norway may reduce achievable performance at peak daylight, potentially affecting the practical implementation of such measurements.

Conclusion

This study demonstrated a non-contact NIR interaction instrument with a robust calibration model for rapid determination of TSS in strawberries. The model provided consistent performance across seasons and cultivars and maintained predictive ability even beyond the calibration range of TSS, confirming the advantage of interaction geometries that enable representative sub-surface sampling. By implementing rapid ambient light sampling and correction, the instrument achieved sufficient performance for daytime, in-field measurements of polytunnel-grown on-the-plant strawberries, with a measurement time of approximately 12 s per berry.

The ambient-correction method proved essential for robust predictions, particularly when combined with the proposed outlier detection procedure to reject low-quality spectra. Shot noise from ambient light remained the main limiting factor for daytime performance, indicating that adequate signal quality can be achieved by increasing the measurement time when necessary.

With the proposed methods and guidelines for achieving robust measurement performance under challenging environmental conditions, this work encourages further in-field implementations of NIR spectroscopy for determining internal quality attributes of fruit and vegetables, supporting the advancement of precision agriculture and agricultural robotics.

Footnotes

Acknowledgments

We thank Associate Professor Siv Fagertun Remberg and Head Engineer Kari Grønnerød (Norwegian University of Life Sciences) for access to the strawberry polytunnel and for providing strawberries for the test set. We also thank Simen Myhrene (Ekeberg Myhrene Sylling AS) for supplying strawberries for the calibration set. We acknowledge Bastian Krohg (SINTEF) for assistance with calibration measurements, and Karen Wahlstrøm Sanden (Nofima) for conducting the reference measurements for the test set. We acknowledge the use of ChatGPT-5 for language editing.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical considerations

This article does not contain any studies with human or animal participants.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Research Council of Norway, through the project SFI Digital Food Quality (RCN no. 309259).

ORCID iDs

Vilde Vraalstad

Marion O’Farrell

References

1. Török, Yeh C.H., Menozzi, et al. “European Consumers’ Preferences for Fresh Fruit and Vegetables: A Cross-Country Analysis”. J. Agric. Food Res. 2023. 14: 100883. doi:10.1016/j.jafr.2023.100883.

2. Canales, Gallardo R.K., Iorizzo, et al. “Willingness to Pay for Blueberries: Sensory Attributes, Fruit Quality Traits, and Consumers’ Characteristics”. Hort. Sci. 2024. 59(8): 1207-1218. doi:10.21273/HORTSCI17947-24.

3. Grimstad, From. “The Thorvald II Agricultural Robotic System”. Robotics. 2017. 6(4): 24. doi:10.3390/robotics6040024.

4. Gerhards, Risser, Spaeth, et al. “A Comparison of Seven Innovative Robotic Weeding Systems and Reference Herbicide Strategies in Sugar Beet (Beta Vulgaris subsp. vulgaris L.) and Rapeseed (Brassica Napus L.)”. Weed Res. 2024. 64(1): 42-53. doi:10.1111/wre.12603.

5. Zhou, Wang, et al. “Intelligent Robots for Fruit Harvesting: Recent Developments and Future Challenges”. Precis. Agric. 2022. 23: 1856-1907. doi:10.1007/s11119-022-09913-3.

6. Rizzo, Marcuzzo, Zangari, et al. “Fruit Ripeness Classification: A Survey”. Artif. Intell. Agric. 2023. 7: 44-57. doi:10.1016/j.aiia.2023.02.004.

7. Pardede, Husada, Hermana, et al. “Fruit Ripeness Based on RGB, HSV, HSL, L*a*b* Color Feature Using SVM.” In: 2019 International Conference of Computer Science and Information Technology (ICoSNIKOM). Medan, Indonesia: 2019. Pp. 1-5. doi:10.1109/ICoSNIKOM48755.2019.9111486.

8. Saeys, Kim, et al. “Hyperspectral Imaging Technology for Quality and Safety Evaluation of Horticultural Products: A Review and Celebration of the Past 20-Year Progress”. Postharvest Biol. Technol. 2020. 170: 111318. doi:10.1016/j.postharvbio.2020.111318.

9. Bertoglio, Piliego, Guadagna, et al. “On-the-Go Table Grape Ripeness Estimation via Proximal Snapshot Hyperspectral Imaging”. Comput. Electron. Agric. 2024. 226: 109354. doi:10.1016/j.compag.2024.109354.

10. Rodríguez-Ortega, Aleixos, Blasco, et al. “Study of Light Penetration Depth of a Vis-NIR Hyperspectral Imaging System for the Assessment of Fruit Quality. A Case Study in Persimmon Fruit”. J. Food Eng. 2023. 358: 111673. doi:10.1016/j.jfoodeng.2023.111673.

11. Wold J.P., Andersen P.V., Aaby, et al. “Inter-Seasonal Validation of Non-Contact NIR Spectroscopy for Measurement of Total Soluble Solids in High Tunnel Strawberries”. Spectrochim. Acta, Part A. 2024. 309: 123853. doi:10.1016/j.saa.2024.123853.

12. Wold J.P., O’Farrell, Andersen P.V., et al. “Optimization of Instrument Design for In-Line Monitoring of Dry Matter Content in Single Potatoes by NIR Interaction Spectroscopy”. Foods. 2021. 10(4): 828. doi:10.3390/foods10040828.

13. Kader A.A. Postharvest Technology of Horticultural Crops. Oakland, CA: Cooperative Extension, University of California, Division of Agriculture and Natural Resources, 2002.

14. Walsh K.B., Blasco, Zude-Sasse, et al. “Visible-NIR Point Spectroscopy in Postharvest Fruit and Vegetable Assessment: The Science Behind Three Decades of Commercial Use”. Postharvest Biol. Technol. 2020. 168: 111246. doi:10.1016/j.postharvbio.2020.111246.

15. Savitzky, Golay M.J.E. “Smoothing and Differentiation of Data by Simplified Least Squares Procedures”. Anal. Chem. 1964. 36(8): 1627-1639. doi:10.1021/ac60214a047.

16. Barnes R.J., Dhanoa M.S., Lister S.J. “Standard Normal Variate Transformation and De-Trending of Near-Infrared Diffuse Reflectance Spectra”. Appl. Spectrosc. 1989. 43(5): 772-777. doi:10.1366/0003702894202201.

17. Martens, Næs. Multivariate Calibration. New York: Wiley, 1989.

18. Segtnan V.H., Mevik B.H., Isaksson, et al. “Low-Cost Approaches to Robust Temperature Compensation in Near-Infrared Calibration and Prediction Situations”. Appl. Spectrosc. 2005. 59(6): 816-825. doi:10.1366/0003702054280586.

19. Davies, Fearn. “Back to Basics: Calibration Statistics”. Spectrosc. Eur. 2006. 18(2): 31-32.

20. Tschudi, O’Farrell, Bakke K.A.H. “Inline Spectroscopy: From Concept to Function”. Appl. Spectrosc. 2018. 72(9): 1298-1309. doi:10.1177/0003702818788374.

21. Jensen P.S., Bak. “Near-Infrared Transmission Spectroscopy of Aqueous Solutions: Influence of Optical Pathlength on Signal-to-Noise Ratio”. Appl. Spectrosc. 2002. 56(12): 1600-1606. doi:10.1366/000370202321115.

22. Golic, Walsh, Lawson. “Short-Wavelength Near-Infrared Spectra of Sucrose, Glucose, and Fructose with Respect to Sugar Concentration and Temperature”. Appl. Spectrosc. 2003. 57(2): 139-145. doi:10.1366/000370203321535033.

23. Wold J.P., Sanden K.W., Skaret, et al. “Non-Contact Interactance NIR Spectroscopy for Estimating TSS and Sensory Sweetness in Conveyor-Belt Transported Cherry Tomatoes (Lycopersicon esculentum ‘Piccolo’)”. Spectrochim. Acta, Part A. 2025. 335: 125962. doi:10.1016/j.saa.2025.125962.

24. Subedi, Walsh. “Assessment of Sugar and Starch in Intact Banana and Mango Fruit by SWNIR Spectroscopy”. Postharvest Biol. Technol. 2011. 62(3): 238-245. doi:10.1016/j.postharvbio.2011.06.014.

25. Vraalstad, Tschudi, O’Farrell, et al. “First-Principles Methodology for Developing Robust NIR Spectroscopic Solutions for Real-World Applications.” In: Photonic Technologies in Plant and Agricultural Science II. Proc. SPIE. 13357, 2025. p. 1335707. doi:10.1117/12.3038835.