Quantitative analysis of the hexamethylenetetramine concentration in a hexamethylenetetramine–acetic acid solution using near infrared spectroscopy: A comprehensive study on preprocessing methods and variable selection techniques

Abstract

Hexamethylenetetramine (HA) is widely used as a raw material in the medical, chemical, and military industries, and fast quantitative analysis of HA is important for manufacturing processes in these fields. Owing to its efficiency, low cost, nondestructive nature, and convenience, near infrared (NIR) spectroscopy is a powerful technique for quantitatively analyzing the HA concentration in HA–acetic acid (HAc) solutions, demonstrating application potential in the production of hexogen and octogen. A series of preprocessing algorithms and variable selection methods were studied to improve the detection accuracy of the NIR spectroscopic calibration. Forty-six different combinations of standard normal variate (SNV), multiplicative signal correction, first derivative, second derivative, and discrete wavelet transform (DWT) were screened. The effects of four variable selection methods (successive projections algorithm (SPA), uninformative variable elimination, competitive adaptive reweighted sampling, and multiverse optimization (MVO)) were compared. Finally, a model (SPXY-SNV-1stDer-DWT-MVO-RF) was developed by combining the sample set partitioning based on joint x–y distance (SPXY) algorithm with the random forest (RF) calibration model, and MVO was combined with the NIR technique for the first time. The model achieved a coefficient of determination for the calibration set (R²), root mean square error of the calibration set (RMSEC), coefficient of determination for the prediction set (r²), and root mean square error of the prediction set (RMSEP) of 1.000, 0.04%, 0.999, and 0.05%, respectively. This study demonstrated the novelty and feasibility of HA quantitative detection by NIR spectroscopy and provided valuable insights for improving quantitative analysis models through algorithm optimization, indicating the great application potential of NIR spectroscopy in related fields.

Keywords

Preprocessing method, near infrared spectroscopy, multi-verse optimizer algorithm, random forest

Introduction

The quality of raw materials is closely related to product quality and yield in chemical production processes. Conventionally, chemical titration is often used for quality testing.¹ However, this method has inherent limitations because it relies on organic reagents and specialized equipment. Furthermore, the accuracy of the test results may be influenced by variations in the operator skill level. Additionally, the prolonged testing period associated with this method can lead to delayed acquisition of titration results. Consequently, potential issues in the production process may not be identified promptly. Near infrared (NIR) spectroscopy has become a crucial tool in the field of chemical analysis owing to its efficiency, speed, low cost, nondestructive testing, and convenience. NIR spectroscopy has demonstrated significant potential for the quantitative analysis of samples and has found extensive applications in various fields, such as food inspection,² pharmaceutical manufacturing,³ chemical production,^4,5 environmental monitoring,⁶ and agricultural product processing.⁷

Establishing calibration models based on chemometrics and NIR spectroscopy enables the rapid quantitative detection of raw material quality, which provides robust support for improving production efficiency, product quality, and resource utilization efficiency. Nevertheless, the susceptibility of spectroscopic data to disturbances from factors such as stray light, noise, and baseline shifts presents a considerable challenge in the development of quantitative models for NIR analysis. To improve the prediction accuracy, implementing effective data preprocessing before establishing a quantitative model is essential. Common preprocessing methods, including but not limited to standard normal variate (SNV),⁸ multiplicative signal correction (MSC),⁸ discrete wavelet transform (DWT),⁹ first derivative (1stDer), and second derivative (2ndDer),¹⁰ can eliminate the influence of other components in spectral signals, making the identification of spectral information easier. Different preprocessing methods have distinct features. For instance, wavelet transform is highly effective in time-frequency domain analysis, whereas baseline correction methods are better suited for addressing baseline shifts. Combining multiple preprocessing methods may help handle baseline shifts and noise that affect certain spectra. Additionally, excess variable information may result in an oversized dataset for modeling, thereby increasing the computational time. Simultaneously, irrelevant information can be introduced, compromising the overall performance of the model and hindering its ability to achieve the intended objectives. To address this problem, variable selection methods are paired with preprocessing techniques to eliminate redundant information, thereby enhancing the predictive accuracy and stability of the model.
Research indicates that intelligent optimization algorithms, such as genetic algorithms (GA) and particle swarm optimization (PSO), are often employed for wavenumber variable selection because of their superior performance in global search, nonlinear optimization, and handling high-dimensional datasets or complex feature selection problems when compared to traditional methods, such as the successive projections algorithm (SPA), uninformative variable elimination (UVE), and competitive adaptive reweighted sampling (CARS).^11–15 The multiverse optimization (MVO) algorithm offers robust global optimization capabilities, rapid convergence, and a minimal number of parameters,¹⁶ making it well suited for addressing complex problems across various domains, such as trajectory optimization,¹⁷ parameter optimization,¹⁸ location of the critical failure surface of a slope,¹⁹ and optimal band selection for hyperspectral image data.^20,21 However, to date, MVO coupled with NIR spectroscopy for the quantitative determination of chemical compound concentrations has not been reported.

Hexamethylenetetramine (HA) is used widely in the medical, chemical, and military industries. For instance, a solution of HA in acetic acid (HAc) is an important raw material for producing the energetic compounds hexogen and octogen. Quantitative analysis of the HA concentration in HA–HAc solutions may have potential applications in improving the safety and quality consistency of the manufacturing process of these energetic compounds. In this study, the effects of different preprocessing methods and variable selection techniques on the accuracy of a quantitative model for NIR spectroscopy were investigated. Based on the analysis of the NIR spectroscopic signals obtained from the HA–HAc solution, various preprocessing algorithms and combinations (considering the number and order of the combinations) were employed to establish a random forest (RF) calibration model. The modeling effectiveness of different preprocessing algorithms was analyzed. Multiple variable selection methods, namely SPA, UVE, CARS, and MVO, were applied to optimize the variables, and the coefficient of determination of the calibration set (R²), root mean square error of the calibration set (RMSEC), coefficient of determination of the prediction set (r²), and root mean square error of the prediction set (RMSEP) were used to identify the optimal variable selection method. These results offer a reliable and efficient approach for the rapid quantitative analysis of HA concentration in HA–HAc solutions, which may also support the quantification of HA by the NIR technique in the medical and chemical industries.

Experimental and methods

Sample preparation and spectral acquisition

Different quantities of HA and HAc were weighed using an EX12001ZH Electronic Balance (Shanghai Ohaus Instrument Co., Ltd, China), and HA solutions of different concentrations were prepared. The concentration of HA in each HA–HAc solution was determined by chemical titration. In total, 102 HA–HAc solutions were analyzed; the HA concentrations ranged from 6.87% to 13.00% and were evenly distributed, as shown in Figure 1. The industrial-grade HA had a purity exceeding 99.30%, with a water content less than 0.20%. Glacial acetic acid with a purity exceeding 99.0% was purchased from the Beijing Tongguang Fine Chemical Co., Ltd.

Figure 1.

Distribution of the hexamethylenetetramine concentration.

An Antaris II Fourier Transform NIR Spectrometer (ThermoFisher, Waltham, MA, USA) was used, with an empty transmission liquid sample tube (borosilicate glass KIMAX-51 glass tube) as the background, and the spectra were recorded at a room temperature of 25 ± 1°C, with a spectral range of 10,000-4000 cm⁻¹ and a resolution of 8 cm⁻¹. A total of 1557 wavenumbers were acquired per spectrum.

Sample set division

The dataset of 102 HA–HAc solution samples was partitioned using sample set partitioning based on joint x–y distance (SPXY) and random splitting (RS), resulting in a calibration set of 72 samples and prediction set of 30 samples. SPXY software was used to partition the sample set. The spectral data x and HA concentration y affect the modeling results; therefore, when considering the distance between samples, the same importance is given to the distance between the spectral data x and HA concentration y in the space to ensure that the sample distribution is maximally represented and the multidimensional vector space is effectively covered.^22,23 The specific division formulas are as follows:

d_{x}(p, q) = \sqrt{\sum_{j = 1}^{m} [x_{p}(j) - x_{q}(j)]^{2}}, \quad p, q \in [1, n]

(1)

d_{y}(p, q) = \sqrt{(y_{p} - y_{q})^{2}} = |y_{p} - y_{q}|, \quad p, q \in [1, n]

(2)

d_{xy}(p, q) = \frac{d_{x}(p, q)}{\max_{p, q \in [1, n]} d_{x}(p, q)} + \frac{d_{y}(p, q)}{\max_{p, q \in [1, n]} d_{y}(p, q)}, \quad p, q \in [1, n]

(3)

where m represents the number of wavenumber points in the sample spectral data, n is the number of samples, and p and q are different samples. The principle of the RS algorithm involves dividing a dataset by random sampling. Specifically, during dataset division, the algorithm randomly selects a certain proportion of samples as the prediction set, whereas the remaining samples serve as the calibration set. This random sampling process helps reflect the overall characteristics of the dataset to a certain extent, ensuring the randomness and representativeness of both the calibration and prediction sets.
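The partitioning described by equations (1)–(3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPXY software used in the study: the function name `spxy_split` is hypothetical, and the selection loop assumes the usual Kennard–Stone-style max–min rule applied to the joint distance, which the text above does not spell out.

```python
import numpy as np

def spxy_split(X, y, n_cal):
    """Return (calibration_idx, prediction_idx) using joint x-y distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)
    n = X.shape[0]

    # Eq. (1): Euclidean distance between spectra, via the Gram matrix
    sq = (X ** 2).sum(axis=1)
    dx = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    # Eq. (2): absolute difference between concentrations
    dy = np.abs(y[:, None] - y[None, :])
    # Eq. (3): joint distance, each term scaled by its maximum over all pairs
    dxy = dx / dx.max() + dy / dy.max()

    # Assumed selection rule: start from the two most distant samples ...
    first, second = np.unravel_index(np.argmax(dxy), dxy.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(n) if i not in selected]
    # ... then repeatedly add the sample farthest from its nearest selected one
    while len(selected) < n_cal:
        nearest = dxy[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

For the dataset described here, a call such as `spxy_split(X, y, n_cal=72)` would return 72 calibration indices and 30 prediction indices.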

Spectral preprocessing

Figure 2 shows the NIR spectra of the HA–HAc solutions. The spectral signals reveal overlaps in the region between 4000 and 4800 cm⁻¹, accompanied by deformation and slight burrs. These overlaps may stem from baseline shifts, temperature variations, and background noise from other components.²⁴ Therefore, conducting baseline correction, noise filtering, and scattering corrections is necessary. Common techniques for eliminating continuous background shifts and enhancing the spectral resolution involve the use of 1stDer and 2ndDer.¹⁰ The DWT can be employed to break down a signal into a set of overlapping wavelet functions, thereby facilitating background correction and simultaneous noise reduction.⁹ Additionally, to remove any scattering effects on the spectra resulting from inconsistent particle distributions and diverse particle sizes, MSC or SNV should be employed.⁸ The advantages and disadvantages of the five preprocessing methods were compared under three schemes: no preprocessing, single-method preprocessing, and combined preprocessing. Additionally, this study analyzed the impact of various combination sequences on modeling and aimed to identify the most effective preprocessing technique.

Figure 2.

Original NIR spectra of HA–HAc solutions.
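As a rough illustration of the scattering and baseline corrections named above, the following sketch implements SNV, MSC, and a first derivative with NumPy. The function names are hypothetical, and two choices are assumptions rather than details taken from this study: MSC uses the mean spectrum as its reference, and the derivative is a plain first difference (which shortens each spectrum by one point) because the derivative algorithm is not specified.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative signal correction against a reference spectrum."""
    X = np.asarray(X, dtype=float)
    # Assumption: use the mean spectrum when no reference is supplied
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        # Fit row = slope * ref + intercept, then undo the fitted distortion
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def first_derivative(X):
    """First difference along the wavenumber axis (assumed derivative form)."""
    return np.diff(np.asarray(X, dtype=float), axis=1)
```

Because each function takes and returns a samples-by-wavenumbers array, combined preprocessing can be expressed as simple chaining, for example `first_derivative(snv(X))`.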

Variable selection

The MVO algorithm is an optimization algorithm that determines the universe with the best expansion rate through the interaction of white holes, black holes, and wormholes in a multi-universe. During the calculation, each universe in the multi-universe can be regarded as a candidate solution to the optimization problem, and the objects in the universe are the variables of the candidate solution. After several cycles of evolution, an optimal solution is obtained.^16,25,26 Notably, MVO has been successfully applied to solve global optimization problems related to continuity. However, this continuous approach is unsuitable for variable selection problems. In general, variable selection in spectroscopy can be regarded as an optimization problem of variable combinations. Therefore, a vector composed of 1 or 0, which indicates whether the variable was selected or not, was used as the MVO input.

Herein, MVO is applied to the selection of variable information for HA–HAc solution NIR spectroscopy. Models built on the variables selected by four algorithms, namely SPA, UVE, CARS, and MVO, are compared to determine the variable selection method that best improves the predictive performance of the model.
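To make the 0/1 encoding concrete, the sketch below shows a heavily simplified binary MVO loop. It is not the implementation used in this study. The function name `binary_mvo`, the parameter defaults, the thresholding of continuous positions at 0.5 to obtain the mask, and the retention of the best universe in each cycle are all assumptions made for illustration; `fitness` is a user-supplied function to be minimized, such as a cross-validated RMSE computed on the selected wavenumbers.

```python
import numpy as np

def binary_mvo(fitness, n_vars, n_universes=20, n_iter=40,
               wep_min=0.2, wep_max=1.0, seed=0):
    """Minimise fitness(mask) over boolean masks of length n_vars."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_universes, n_vars))          # continuous positions in [0, 1]
    best_pos, best_fit = pos[0].copy(), np.inf
    for t in range(1, n_iter + 1):
        # A universe selects variable j when its position exceeds 0.5 (assumed rule)
        fits = np.array([fitness(p > 0.5) for p in pos], dtype=float)
        order = np.argsort(fits)                     # best (lowest) first
        pos, fits = pos[order], fits[order]
        if fits[0] < best_fit:
            best_pos, best_fit = pos[0].copy(), float(fits[0])
        wep = wep_min + t * (wep_max - wep_min) / n_iter   # wormhole existence probability
        tdr = 1.0 - (t / n_iter) ** (1.0 / 6.0)            # travelling distance rate
        norm = (fits - fits[0]) / (fits[-1] - fits[0] + 1e-12)
        weights = fits[-1] - fits + 1e-12            # better universes donate more often
        prob = weights / weights.sum()
        new_pos = pos.copy()
        for i in range(1, n_universes):              # universe 0 is kept unchanged
            # White hole -> black hole: worse universes receive more objects
            swap = rng.random(n_vars) < norm[i]
            donors = rng.choice(n_universes, size=n_vars, p=prob)
            new_pos[i, swap] = pos[donors[swap], np.flatnonzero(swap)]
            # Wormhole: move objects to the neighbourhood of the best universe
            worm = rng.random(n_vars) < wep
            sign = np.where(rng.random(n_vars) < 0.5, 1.0, -1.0)
            moved = np.clip(best_pos + sign * tdr * rng.random(n_vars), 0.0, 1.0)
            new_pos[i, worm] = moved[worm]
        pos = new_pos
    return best_pos > 0.5, best_fit
```

The returned boolean mask can be used directly to index the columns of the preprocessed spectral matrix, for example `X[:, mask]`.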

RF calibration model construction

In machine learning, an RF is an ensemble of decision trees that builds predictive models on the basis of statistical estimation theory. In addition to performing well in classification tasks, ensembles of decision trees can capture complex data relationships and reduce the risk of model overfitting. Moreover, they handle large numbers of input variables and intricate data structures well, making them particularly well suited to NIR spectroscopy data.²⁷ In this study, the NIR spectral intensities of the calibration-set HA–HAc solutions were paired with the corresponding HA concentrations. An RF calibration model was constructed by aggregating the outputs of all regression trees and was subsequently used to predict the HA concentration of the prediction-set solutions. During model development, performance parameters such as the number of decision trees and the number of nodes per decision tree were optimized using five-fold cross-validation and a grid search.
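The tuning procedure described above might look as follows with scikit-learn. This is a sketch under stated assumptions, not the configuration used in this study: the helper name `fit_rf` is hypothetical, the grid values are placeholders, and mapping "number of nodes per decision tree" to the `max_leaf_nodes` parameter is one possible interpretation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def fit_rf(X_cal, y_cal, param_grid=None, seed=0):
    """Tune an RF regressor by grid search with five-fold cross-validation."""
    if param_grid is None:
        # Placeholder grid; the values searched in the study are not reported
        param_grid = {
            "n_estimators": [50, 100, 200],
            "max_leaf_nodes": [8, 16, None],
        }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        cv=5,                                   # five-fold cross-validation
        scoring="neg_root_mean_squared_error",  # pick the lowest cross-validated RMSE
    )
    search.fit(X_cal, y_cal)
    return search.best_estimator_, search.best_params_
```

The returned estimator is refitted on the whole calibration set, so `model.predict(X_pred)` gives the predicted HA concentrations for the prediction set.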

Results and discussion

Sample set division

To validate the significance of sample set partitioning, after collecting NIR spectroscopy data of HA–HAc solutions, the RS and SPXY algorithms were employed to extract calibration and prediction sets. The constructed RF model was used to predict and assess the effectiveness of the results. The results obtained with the two partitioning methods are compared in Table 1 and indicate that the model based on the SPXY algorithm exhibits the best predictive performance.

Table 1.

Comparison of different modeling methods for HA evaluation.

Method	Calibration set		Prediction set
Method	R²	RMSEC/%	r²	RMSEP/%
SPXY	0.984	0.25	0.980	0.28
RS^*	0.981	0.29	0.946	0.49

Note: R², determination coefficient of the calibration set; RMSEC, root mean square error of the calibration set; r², determination coefficient of the prediction set; RMSEP, root mean square error of the prediction set; RS, randomly selected; *, optimal result among the 50 runs.
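For reference, the four figures of merit in Table 1 can be computed as in the sketch below. The article does not state the formulas, so the common definitions are assumed here: the coefficient of determination as one minus the ratio of residual to total sum of squares, and the root mean square error as the square root of the mean squared residual. The function names are illustrative.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y (here, % HA)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Applied to the calibration set these give R² and RMSEC; applied to the prediction set they give r² and RMSEP.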

This superiority stems primarily from the refined sample-partitioning strategy of the SPXY method, which assigns equal importance to both the spectral and concentration spaces, ensuring a more balanced distribution of samples between the training and testing sets. This prevents the excessive presence of any particular category of data in either the training or testing sets, thereby aiding better model generalization.¹³ Furthermore, in contrast to random partitioning, which causes an uneven feature distribution that affects model learning and predictive capacity, SPXY considers data correlations. This effectively enhances the predictive performance of the model, resulting in superior performance compared to a random distribution.

The concentration statistics of the calibration and prediction sets obtained using the SPXY and RS algorithms were further evaluated, and the results are presented in Table 2. Notably, the lower concentration limit of the prediction set produced by the RS algorithm is 6.87%, which lies below the calibration set’s lower limit of 6.88%. This indicates a deficiency in the dataset divided by the RS algorithm. In contrast, with the SPXY algorithm, the concentration range of the prediction set (6.90%–12.91%) fell within that of the calibration set (6.87%–13.00%), and the two sets had similar standard deviations and relative standard deviations, which contributes to the establishment of a robust regression model.²⁸

Table 2.

Statistical analysis of the HA concentration in calibration and prediction sets under different partitioning methods.

Partitioning method	Sample	Mean/%	Max/%	Min/%	Standard deviation/%	Relative standard deviation/%
SPXY	Calibration set (n = 72)	9.74	13.00	6.87	1.99	20.45
SPXY	Prediction set (n = 30)	10.56	12.91	6.90	1.97	18.63
RS	Calibration set (n = 72)	9.77	13.00	6.88	1.95	19.92
RS	Prediction set (n = 30)	10.45	12.90	6.87	2.10	20.04

Comparative analysis of different preprocessing methods

Five separate preprocessing techniques, SNV, MSC, 1stDer, 2ndDer, and DWT, were individually applied to the NIR spectra of the HA–HAc solutions (Figure 3). The processed spectral data were used as input variables to construct the RF calibration model. To assess the effectiveness of the model under the various preprocessing methods, we compared the determination coefficients and root mean square errors of the model for the calibration and prediction sets. Table 3 summarizes the predictive performance of the RF calibration model under the optimal parameter configurations.

Figure 3.

NIR spectra of single preprocessing methods for HA–HAc solution (a): SNV; (b): MSC; (c): 1stDer; (d): 2ndDer; and (e): DWT.

Table 3.

HA concentration predictive performance of RF calibration models for different pretreatment methods.

Method	Calibration set		Prediction set
Method	R²	RMSEC/%	r²	RMSEP/%
None	0.984	0.25	0.980	0.28
SNV	0.990	0.21	0.990	0.22
MSC	0.990	0.20	0.93	0.52
1stDer	0.997	0.12	0.990	0.20
2ndDer	0.96	0.41	0.93	0.52
DWT	0.990	0.22	0.990	0.24

Combined analysis of Figure 3 and Table 3 indicates that the SNV and MSC treatments (Figure 3(a) and (b)) greatly reduce scattering effects on the spectra, resulting in better spectral quality. After MSC treatment, the RMSEC value of the calibration set decreased; however, the RMSEP value of the prediction set increased, implying unsatisfactory generalization performance. In contrast, SNV treatment enhanced the comparability and stability of the spectral data, enabling the model to capture the overall data structure and generalize well to new data. Consequently, both the RMSEC and RMSEP values decreased.

The 1stDer and 2ndDer treatments (Figure 3(c) and (d)) effectively suppress interference from the instrumental baseline and background. However, variations within the 4000–4800 cm⁻¹ band are amplified, likely because the derivative operation is sensitive to local spectral variations, including noise. The increases in both the RMSEC and RMSEP values after processing with 2ndDer suggest a decline in the performance of the calibration model, possibly because the emphasis on spectral gradients distorts the signal, particularly in the regions between peaks and valleys. In contrast, 1stDer is a smoother operation that assists in extracting information from the spectra, and it decreased both the RMSEC and RMSEP values.

In addition, when DWT was applied to the spectrum (Figure 3(e)), a substantial overlap with the original spectrum was observed. This indicates that the DWT technique effectively preserves spike information from the original spectrum while eliminating incidental background noise. The reduction in the RMSEC and RMSEP values achieved through the application of the DWT algorithm highlights the efficacy of wavelet transformation in enhancing the spectral clarity and effectively removing unwanted noise components while retaining vital information.^29,30 This ultimately leads to improved model performance.
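The idea of removing noise while keeping the main spectral shape can be illustrated with a one-level Haar wavelet and soft thresholding, written here in plain NumPy. This is a minimal sketch only: the wavelet family, decomposition level, and threshold used in this study are not stated, so all three are assumptions, and the function name `haar_denoise` is hypothetical.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the detail band, then reconstruct."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even; pad the spectrum first")
    root2 = np.sqrt(2.0)
    # Decomposition: pairwise averages (approximation) and differences (detail)
    approx = (x[0::2] + x[1::2]) / root2
    detail = (x[0::2] - x[1::2]) / root2
    # Soft thresholding shrinks small detail coefficients (mostly noise) to zero
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse transform
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / root2
    out[1::2] = (approx - detail) / root2
    return out
```

With a threshold of zero the input is reconstructed exactly, whereas a very large threshold replaces each pair of neighbouring points with their mean; a practical threshold lies between these extremes.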

These results highlight how the effectiveness of the preprocessing methods varies with their particular strengths. Because each algorithm suits specific scenarios, combining methods can overcome the limitations of individual techniques, harness their advantages, and handle noise, interference, and other data challenges better. Such combinations may therefore enhance the robustness of the model, enable it to adapt to diverse data variations and anomalies, and improve the overall preprocessing effectiveness.

To explore the potential of different preprocessing combinations, 40 distinct combinations were created by stacking scattering correction (SNV, MSC), baseline correction (1stDer, 2ndDer), and filtering denoising (DWT) in various orders (refer to Table 4). These combinations were applied to the NIR spectra of the HA–HAc solution. Figure 4 shows the resulting RMSEC values for the calibration set and RMSEP values for the prediction set. The red line shows the result of the RF calibration model without preprocessing, and the blue balls represent the results of the model after each preprocessing combination. The purple balls indicate the combinations that yielded the minimum RMSEC and RMSEP values.

Table 4.

Labelling of various combinations of preprocessing algorithms.

Number	Method	Number	Method	Number	Method	Number	Method
1	1stDer + SNV	11	MSC + 2ndDer	21	2ndDer + SNV + DWT	31	MSC + DWT + 1stDer
2	1stDer + MSC	12	MSC + DWT	22	2ndDer + MSC + DWT	32	MSC + DWT + 2ndDer
3	1stDer + DWT	13	DWT + 1stDer	23	2ndDer + DWT + SNV	33	DWT + 1stDer + SNV
4	2ndDer + SNV	14	DWT + 2ndDer	24	2ndDer + DWT + MSC	34	DWT + 1stDer + MSC
5	2ndDer + MSC	15	DWT + SNV	25	SNV + 1stDer + DWT	35	DWT + 2ndDer + SNV
6	2ndDer + DWT	16	DWT + MSC	26	SNV + 2ndDer + DWT	36	DWT + 2ndDer + MSC
7	SNV + 1stDer	17	1stDer + SNV + DWT	27	SNV + DWT + 1stDer	37	DWT + SNV + 1stDer
8	SNV + 2ndDer	18	1stDer + MSC + DWT	28	SNV + DWT + 2ndDer	38	DWT + SNV + 2ndDer
9	SNV + DWT	19	1stDer + DWT + SNV	29	MSC + 1stDer + DWT	39	DWT + MSC + 1stDer
10	MSC + 1stDer	20	1stDer + DWT + MSC	30	MSC + 2ndDer + DWT	40	DWT + MSC + 2ndDer
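The count of 40 follows from taking at most one method from each of the three categories, using at least two categories, and considering every order: 16 ordered pairs plus 24 ordered triples. The short sketch below enumerates them; the grouping dictionary and function name are illustrative, and the order of the output does not follow the numbering of Table 4.

```python
from itertools import combinations, permutations, product

# Method categories as described in the text
GROUPS = {
    "scattering": ["SNV", "MSC"],
    "baseline": ["1stDer", "2ndDer"],
    "denoising": ["DWT"],
}

def preprocessing_combinations(groups=GROUPS):
    """List every ordered sequence using one method from 2 or 3 categories."""
    sequences = []
    for k in (2, 3):
        for chosen in combinations(groups, k):                  # which categories
            for methods in product(*(groups[g] for g in chosen)):  # one method each
                sequences.extend(permutations(methods))          # every order
    return sequences

combos = preprocessing_combinations()
print(len(combos))  # 40
```

Adding the five single methods and the no-preprocessing case to these 40 sequences gives 46 schemes in total.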

Figure 4.

RMSEC values (a) and RMSEP values (b) of RF (red line) and the combination of 40 preprocessing methods with RF (blue balls).

Figure 4 shows that preprocessing combination 25 (SNV + 1stDer + DWT) was the most effective, resulting in reductions of 78.2% and 78.5% in the RMSEC and RMSEP values, respectively. Stacking preprocessing algorithms improves the quality of the spectral data and enhances the predictive performance of the model. Nonetheless, preprocessing combinations 4, 5, 8, and 11 showed inferior effects, as indicated by their elevated RMSEC and RMSEP values. The increase in RMSEP values for preprocessing combinations 12, 15, and 16 indicates inadequate generalization performance of the model, whereas all other preprocessing combinations outperformed no preprocessing. There are three potential reasons for this phenomenon: (1) the preprocessing algorithms may obscure or remove useful signals along with the intended noise, ultimately harming the predictive performance of the model³¹; (2) the order in which the data transformation steps are applied changes the result and can impair the ability of the model to generalize; and (3) the generalization performance of the model is affected by data characteristics such as the baseline shift, scattering, and noise present in the HA–HAc NIR spectral signals. Therefore, baseline correction, noise filtering, and scattering correction can optimize the results. Analyzing the experimental results described above and using the spectral signal characteristics to determine suitable preprocessing algorithms before testing various combinations can enhance the efficiency and scientific validity of the preprocessing prior to modeling. For the HA–HAc NIR spectra preprocessed with SNV + 1stDer + DWT, the values for R² and RMSEC were 0.999 and 0.06%, respectively, and the values for r² and RMSEP were 0.999 and 0.06%, respectively.

Selection of input variables for the SNV-1stDer-DWT-RF calibration model for NIR spectra of HA–HAc solutions

When HA–HAc solution NIR spectroscopy is combined with the SNV-1stDer-DWT-RF algorithm for quantitative analysis, using too many variables increases the modeling time, and the redundant information they contain reduces the accuracy of the model. If too few variables are used, the modeling time is reduced, but important variable information can be missed, so the effective spectral information is not fully exploited and the accuracy of the quantitative model also decreases. Therefore, the screening of input variables is an essential step in modeling. The input variables preprocessed by SNV-1stDer-DWT were screened using the SPA, UVE, CARS, and MVO algorithms to build an RF calibration model. Considering the stochastic nature of the CARS and MVO algorithms, the process was repeated 50 times to select the optimal wavenumber bands and better evaluate the variable selection methods. The optimal variable selection method was determined by comparing the predictive performance of the models. The variables selected by the four algorithms are shown in Figure 5.

Figure 5.

Band comparison preferred by different variable selection methods.

The SPA algorithm selected the fewest variables (only 34), and these variables are mainly concentrated in the wavenumber range of 4000–4800 cm⁻¹. Unlike the SPA algorithm, the UVE algorithm did not select any variables in the wavenumber range of 4000–4800 cm⁻¹ but selected a large number of variables in the ranges of 5700–6000 cm⁻¹ and 8000–8600 cm⁻¹. In contrast, the CARS algorithm showed a balanced trend in variable selection, with variables selected in several band ranges. The MVO algorithm selected the most variables (397), yet this was only 25.50% of the initial variables. These differences may reflect the different strategies and tradeoff considerations used by the algorithms in the variable selection process.

The predictive performances of the RF calibration models constructed using the variables selected by the different variable selection methods are presented in Table 5. The increase in the RMSEC and RMSEP values under the SPA algorithm compared to full-spectrum modeling indicates that the variables selected by the SPA algorithm do not adequately describe the HA concentration in the HA–HAc solution. This is attributed to the SPA algorithm removing important information along with the redundant information, which degrades the overall model performance.³² The RMSEC and RMSEP values of the RF calibration models constructed using the variables extracted by the UVE, CARS, and MVO algorithms decreased, indicating that redundant variable information was removed and the predictive performance of the model improved. The model constructed using the variables selected by MVO, a global optimization method, obtained the best prediction accuracy.

Table 5.

Comparison of different algorithms after variable optimization.

Method	Variables	Calibration set		Prediction set
Method	Variables	R²	RMSEC/%	r²	RMSEP/%
None	1556	0.999	0.06	0.999	0.06
SPA	34	0.999	0.07	0.997	0.10
UVE	189	0.999	0.05	0.999	0.06
CARS	114	1.000	0.04	0.999	0.06
MVO	397	1.000	0.04	0.999	0.05

In terms of the variable count, the SPA algorithm selected the fewest variables, indicating that the quantitative model constructed using SPA-selected variables was less complex. The MVO algorithm selected more variables, suggesting that the calibration model constructed using the MVO-selected variables was more complex and encompassed more spectral information. However, evaluating the model solely based on the variable count is insufficient, and the variable count must be considered together with the generalization performance of the model and the ability of the selected variables to effectively explain the target variable. The characteristic peaks of the NIR spectra of HA were at 4081–4367 cm⁻¹, 7067–7168 cm⁻¹, 8264 cm⁻¹, and 9497 cm⁻¹,³³ and the variables selected by the SPA, UVE, and CARS algorithms did not fully cover these characteristic peaks, indicating the difficulty of these three algorithms in fully capturing the characteristics of HA. In contrast, the MVO algorithm selected variables that captured the characteristic peaks, and its better modeling performance was indicated by the smallest RMSEC and RMSEP values. Notably, both the UVE and CARS algorithms selected fewer variables but achieved comparable predictive results. This can be attributed to a simplified strategy during variable selection, sacrificing some information to enhance computational speed, thereby catering to practical applications that require a balance between real-time capability and accuracy. However, this study placed a stronger emphasis on determining an accurate HA concentration in the HA–HAc solution. In the pursuit of high precision, we prefer to retain more information, even if it involves sacrificing computational speed.
Nevertheless, the sacrifice in computational speed is smaller than might be expected; the prediction times for UVE, CARS, and MVO are 0.41 s, 0.36 s, and 0.58 s, respectively, so MVO costs at most 0.22 s more, which is well within an acceptable range. Therefore, the MVO algorithm can be considered the best variable selection method.

Conclusion

By screening a series of combinations of different preprocessing methods for NIR spectroscopy, the combined approach of SNV + 1stDer + DWT effectively addressed the baseline shift, scattering, and noise in the spectral signals and notably enhanced the performance of the RF calibration model. Additionally, the MVO algorithm was introduced to optimize the quantitative NIR spectral model for the first time and improved the predictive performance of the quantitative analysis of HA–HAc solutions. Although the above research still falls within the realm of offline detection and cannot satisfy the requirement of real-time monitoring or online process control, NIR spectroscopic analysis may replace manual titration and provide a convenient and fast quantitative detection method for HA–HAc solutions. This method may not only provide robust technical support for improving the safety and quality consistency of hexogen and octogen production but also be significant for extending the application of NIR technology for quantitatively detecting HA in the medical and chemical industries.

Footnotes

Author’s note

All authors made substantial contributions to the conception or design of the work or the acquisition, analysis, or interpretation of data for the work. We have drafted the work or revised it critically for important intellectual content, approved the final version to be published, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All persons who made substantial contributions to the work reported in the manuscript, including those who provided editing and writing assistance but were not authors, are named in the Acknowledgments section of the manuscript.

Acknowledgements

The authors are grateful to the editor and two anonymous reviewers for their comments, which have greatly improved the quality of this manuscript. We would like to thank Editage for English language editing.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Kun Chen

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
