Abstract
Breast microcalcifications are crucial clinical features for breast cancer that remain poorly understood, with many contradictory findings and views at present. Prior research, which utilized Raman spectroscopy (RS), only focused on the mid-frequency spectral region. In this work, we applied low-frequency RS in three experimental setups to synthetic calcium oxalate and hydroxyapatite in amorphous and crystalline forms, immersed in and buried under varying depths of paraffin wax. When comparing the full spectra at all offsets for the three experimental setups, we found that the defocused microspatially offset setup performed the best with an area under the curve (AUC) of 0.87. Furthermore, it was found that the low-frequency spectral region was optimal for this setup with an AUC of 0.94. Overall, the low-frequency region outperformed the mid-frequency spectral region for two out of the three experimental setups, with the AUC improving by up to 13% from 0.81 to 0.94. This is attributed to the low-frequency Raman region containing order information of the probed solids, allowing the amorphous and crystalline forms to be more accurately classified.
Introduction
Breast cancer is the most common cancer among women with approximately 287 850 new cases of invasive breast cancer and 51 300 cases of ductal carcinoma in situ in the United States in 2023. 1 Breast microcalcifications are calcium deposits within the breast that are considered to be robust indicators of breast cancer. 2 These are usually picked up during a routine mammogram; however, mammography lacks the chemical sensitivity (SE) to distinguish the two types of microcalcifications. 3 These two types of microcalcifications are calcium oxalate (CaOx), predominantly associated with benign lesions, and hydroxyapatite (HA), which is associated with benign or malignant lesions depending on its composition and structure.4–6 Mammography relies on morphological information to categorize microcalcifications based on an international grading system such as Breast Imaging Reporting and Data System, Madame Le Gal, or rhenium-carbon monoxide (CO)-releasing molecule.7–11 This is then usually followed up with a biopsy to extract and confirm the diagnosis of the microcalcification(s). 12 However, only ∼30% are subsequently found to be malignant, resulting in unnecessary patient trauma and wasting hospital resources.13,14 There is a clear need for a complementary noninvasive procedure to classify microcalcifications. Vibrational spectroscopic methods, such as Raman spectroscopy (RS), are candidate technologies for such applications due to their chemical specificity (SP) and nondestructive nature. 15
Raman spectroscopy (RS) is an inelastic scattering process wherein incident photons probe the vibrational modes of the target molecule; a mode is Raman active when the vibration induces a net change in polarizability (∂α/∂q), where q is the normal coordinate. 16 This phenomenon results in a transfer of energy between the photon and molecule, exciting the molecule to a virtual energy state. 17 The resulting net loss or gain in photon energy corresponds to Stokes or anti-Stokes Raman scattering, respectively. 18 This shift in energy corresponds to discrete vibrational modes of the scattering molecule and hence provides chemically and molecularly specific information about the molecule. 19 Raman scattering is a rare phenomenon that occurs once in every 10^6 to 10^8 Rayleigh scattering events. 20 A high-intensity monochromatic source is required to overcome the weak nature of this phenomenon, with lasers being the preferred choice. 21
Prior groundbreaking research on microcalcifications using RS has focused on the mid-frequency (MF) spectral range, enabling the two types of microcalcifications to be successfully classified and yielding two further findings.22–24 Firstly, malignancy was found to be inversely proportional to carbonate content, which could be measured through properties of the 960 cm−1 peak and is supported by Fourier transform infrared spectroscopy.22,25 Carbonate can substitute into HA at either the hydroxyl (OH−) or phosphate (PO43−) sites, corresponding to A- and B-type substitution, respectively.26,27 For breast microcalcifications, B-type substitution is dominant as the environment is more aligned with its synthetic conditions. 28 Secondly, by applying sub-techniques of RS such as spatially offset and transmission RS, these calcifications have been successfully detected and classified at clinically relevant depths.23,24,29,30
For mid-frequency RS (MFRS; 700 to 1800 cm−1), (∂α/∂q) results from intramolecular vibrational modes of the sample such as bending and stretching of chemical bonds within the molecule.31,32 Comparatively, low-frequency RS (LFRS; 10 to 400 cm−1) probes the intermolecular vibrational modes of the sample through molecules breathing and rocking against each other, again resulting in a net polarizability change. 33 LFRS provides information on the structural characteristics of the sample including crystallinity and solid-state form.34–36 Additionally, LFRS allows the use of both the Stokes (10 to 400 cm−1) and anti-Stokes (−10 to −300 cm−1) spectra, potentially allowing for better classification of the sample. 37 In this work, we tested whether these two features unique to LFRS allowed it to outperform MFRS for discriminating CaOx and partially substituted HA in crystalline and amorphous forms buried under varying depths of wax for three deep LFRS experimental setups. This was done step by step: first by determining which setup performed the best, then by finding the ideal spectral region for that setup and its corresponding optimal offset. We hypothesize that the sensitivity of LFRS to the solid-state form will improve the accuracy of Raman microcalcification analysis.
Experimental
Materials and Methods
Calcium oxalate was prepared via mixing of aqueous calcium chloride and sodium oxalate at room temperature with the precipitate CaOx monohydrate collected by vacuum filtration. HA microcalcifications in amorphous and crystalline forms were produced by chemical synthesis, originally described in Thomas et al., 38 wherein the amorphous forms were obtained via vacuum filtration and drying immediately after precipitation. The crystalline HA forms had an additional step of being placed into a furnace at 800 °C for 2 h before slowly cooling down overnight prior to removal. 39 The samples of 0.6F HA in the crystalline and amorphous form were the apatites used in this test system as they have been previously characterized, and the sample 0.6F-HA-Cry was near the middle of the range of crystallinities obtained across the different substitution patterns studied. 39
Model calcifications of 100 mg were each separately placed into 2.5 g of molten wax and sank to the bottom before the wax hardened. Additional wax blocks of thickness 2.4, 3.6, 4.6, 6.7, 8, 10.2, and 11.8 mm were produced, with any uneven surfaces flattened to ensure minimal air gaps when stacked. This enabled the imitation of microcalcifications buried under varying depths in tissue, when these wax blocks were placed on top of the calcification samples.
Deep Low-Frequency Raman Spectroscopy
The data were gathered using three different deep spatially offset LFRS (SOLFRS) setups: the defocused microspatially offset LFRS (D-microSOLFRS), free space (FS)-SOLFRS, and fiber optic (FO)-SOLFRS, which are discussed in more detail below.40,41 A total of 63 spectra were collected for each of the three calcifications and for wax as a reference medium, with nine spectra at each depth iteration and triplicate spectra at each offset. The spectra were collected using LightField (Princeton Instruments) with a 1 s exposure time and 300 frames that were averaged. The spectra were calibrated using acquired spectra from sulfur and 2-mercaptobenzothiazole by comparing them to reference spectra from McCreery Research Group (National Institute of Nanotechnology, University of Alberta). 42
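The spectrum count can be checked by enumerating the acquisition grid. The Python sketch below is illustrative only: the offsets are those listed for the D-microSOLFRS and FS-SOLFRS setups, while the seven depth values are an assumption (the bare sample plus six stacked wax thicknesses), since the text fixes only their number implicitly through 63 / 9 = 7.

```python
from itertools import product

# Assumed depth steps (mm): bare sample plus six stacked wax thicknesses.
# The text only implies the count (63 spectra / 9 per depth = 7 depths).
depths_mm = [0.0, 2.4, 3.6, 4.6, 6.7, 8.0, 10.2]
offsets_mm = [0.0, 2.0, 5.0]   # offsets used by D-microSOLFRS and FS-SOLFRS
replicates = [1, 2, 3]         # triplicate spectra at each offset

# One entry per acquired spectrum for a single sample type.
acquisitions = list(product(depths_mm, offsets_mm, replicates))

spectra_per_depth = len(offsets_mm) * len(replicates)
total_spectra = len(acquisitions)
print(spectra_per_depth, total_spectra)
```

Three offsets in triplicate give the nine spectra per depth, and seven depths give the 63 spectra per sample type.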
Defocused MicroSOLFRS (D-microSOLFRS)
The D-microSOLFRS utilized a defocused spatial offset geometry, wherein offsets were applied by adjusting the stage along the optical axis. The 200 mW 785 nm excitation laser (Ondax Inc.) was propagated through BragGrate bandpass filters (OptiGrate Corp.), to clean the laser line through the removal of amplified spontaneous emission.43,44 The laser light was then focused using a 20× microscope objective (MPLN20X, Olympus) onto the sample with a power of 18–37 mW. The backscattered light passed back through the microscope objective, and then a pair of volume Bragg grating notch filters (Ondax Inc.) were utilized to remove the Rayleigh scattered light before being coupled to an LS 785 spectrograph spectrometer (Princeton Instruments) with a charge-coupled device (CCD) detector (PIXIS 100 BR CCD, Princeton Instruments).
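To give a sense of how close the low-frequency bands lie to the laser line, the following Python sketch converts a Raman shift into the wavelength of the scattered light for 785 nm excitation. The function name and the list of example shifts are illustrative choices, not part of the original analysis.

```python
def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Wavelength of light scattered with a given Raman shift.

    Positive shifts are Stokes (lower photon energy, longer wavelength);
    negative shifts are anti-Stokes.
    """
    excitation_cm1 = 1e7 / excitation_nm   # absolute wavenumber of the laser
    return 1e7 / (excitation_cm1 - shift_cm1)


LASER_NM = 785.0
# Edges of the LF and MF regions used in this work, in cm-1.
for shift in (-300, -10, 10, 400, 700, 1800):
    wl = scattered_wavelength_nm(LASER_NM, shift)
    print(f"{shift:>6} cm-1 -> {wl:8.2f} nm ({wl - LASER_NM:+.2f} nm from the laser line)")
```

A shift of ±10 cm−1 sits roughly 0.6 nm from the 785 nm line, which illustrates why narrow-band filtering of both the laser and the Rayleigh scattered light is needed to reach this region.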
Free Space Spatially Offset Low-Frequency Raman Spectroscopy (FS-SOLFRS)
The FS-SOLFRS utilized a lateral spatial offset geometry wherein offsets were applied laterally via a mirror, which reflected the beam onto the sample at ∼30° from the optical axis. 40 The FS-SOLFRS used the same excitation laser, objective, bandpass filter, and spectrometer as the D-microSOLFRS, with power at the sample ranging from 68 to 80 mW. The scattered light was propagated through a pair of volume Bragg gratings (Ondax Inc.) before being focused on the spectrometer.
Fiber Optic Spatially Offset Low-Frequency Raman Spectroscopy (FO-SOLFRS)
The FO-SOLFRS probe used the same excitation laser and bandpass filter as mentioned previously, wherein the wavefield was coupled into an FO probe. The FO probe had a three-prong design above the sample with power ranging from 114 to 121 mW and offsets set by changing the output fiber connection. The output optic fiber was connected to a mount in the FS-SOLFRS setup to utilize the notch filters and spectrometer.
Optical Geometry
The D-microSOLFRS and FS-SOLFRS used offsets of 0, 2, and 5 mm, while the FO-SOLFRS was fixed to offsets of 0, 2.5, and 5 mm. The D-microSOLFRS utilized a defocused geometry, as seen in Figure 1, wherein offsets were applied along the optical axis via the stage. This resulted in a change in the accepted volume of scattered photons. 45 In contrast, the other methods utilized lateral spatially offset geometries, wherein offsets were applied laterally via the mirror for the FS-SOLFRS and by changing the connected output fiber for the FO-SOLFRS. 46 Both these methods enable the retrieval of deep propagating photons, which tend to migrate laterally in a random walk-like fashion as they propagate through a turbid medium, but they differ in one key respect.23,47

Schematics of the LFRS setups, showing the different geometries and offsets applied denoted by σ (mm). The D-microSOLFRS setup utilized a defocused geometry, wherein applying an offset along the optical axis resulted in an alteration of the volume of accepted scattered photons. The FS-SOLFRS and FO-SOLFRS utilized spatial offsets by adjusting the mirror laterally and switching output fibers, respectively. The key difference between the two geometries was that the defocused geometry does not filter out any of the surface-scattered photons while spatial offset does so at the expense of S/N.
The spatially offset geometry suppresses the superficial or subsurface scattered photons while the defocused geometry does not. 48 This means that with an increasing offset, the signal-to-noise ratio (S/N) will decrease for the spatially offset geometry, but increase for the defocused geometry, at the expense of spatial resolution. Additionally, this means there is an optimal offset for each geometry: for the defocused geometry, it occurs when the proportion of deeply scattered photons to surface scattered photons is maximized, whereas for the spatially offset geometry, it occurs at the best trade-off between the proportion of deeply scattered photons and S/N. One key advantage of microSORS is that it requires no additional reconfiguration and places no immediate constraints on the applied offsets. 49
Data Analysis
Statistical Analysis
Receiver operating characteristic (ROC) curves are produced by varying the decision threshold of the classifier and marking the true positive rate (TPR) and false positive rate (FPR) at each threshold, yielding an empirical ROC.50,51 This allows ROC curves to assess the performance of the classifier independently of the decision threshold. 52 These points can be fitted using a binormal distribution, yielding a fitted or smooth ROC. 53 Empirical ROC curves were used in this work to ensure there was no potential bias induced from fitting the ROC curves.
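The threshold sweep that produces an empirical ROC can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work; the function name, scores, and labels are hypothetical.

```python
def empirical_roc(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    scores: classifier scores, higher means "more likely positive".
    labels: 1 for the positive class, 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical scores for six test spectra (illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
roc = empirical_roc(scores, labels)
print(roc)
```

Each distinct score contributes one point, so the curve is a staircase from (0, 0) to (1, 1) with no distributional assumption imposed on the scores.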
The area under the curve (AUC) measures the overall performance of the classifier and can be interpreted as the average value of sensitivity (SE) over varying values of specificity (SP).54,55 In general, the AUC must be >0.8 to be considered acceptable in a clinical setting. 56 SE is defined as the proportion of people with a disease who are correctly diagnosed as having it and is equivalent to the TPR. 57 SP is the proportion of people without a disease who are correctly diagnosed as not having it, i.e., the true negative rate (TNR). 58 The complementary rates are the false negative rate (FNR) and the FPR, which can be calculated from the “true rates” via FNR = 1 − TPR and FPR = 1 − TNR. 59 Classification accuracy (CA) is the proportion of people who are correctly diagnosed as having or not having the disease with respect to the total population tested. 60
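These definitions translate directly into code. The Python sketch below computes the rates from confusion counts and integrates an empirical ROC with the trapezoidal rule; the function names, counts, and ROC points are hypothetical and serve only to illustrate the formulas.

```python
def diagnostic_rates(tp, fp, tn, fn):
    """Sensitivity, specificity, and related rates from confusion counts."""
    se = tp / (tp + fn)                    # sensitivity = TPR
    sp = tn / (tn + fp)                    # specificity = TNR
    ca = (tp + tn) / (tp + fp + tn + fn)   # classification accuracy
    return {"SE": se, "SP": sp, "FNR": 1 - se, "FPR": 1 - sp, "CA": ca}


def auc_trapezoid(roc_points):
    """Area under an empirical ROC given (FPR, TPR) points."""
    pts = sorted(roc_points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical counts and ROC points (illustration only).
rates = diagnostic_rates(tp=18, fp=3, tn=27, fn=2)
auc = auc_trapezoid([(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)])
print(rates, auc)
```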
Support vector machine (SVM) classifiers map the data into a higher dimensional feature space set by the kernel to find a line (hyperplane), which successfully separates the data.61–63 This hyperplane is optimal when the distance between the line and the nearest data point for each class (margin) is maximized. 64 SVMs are based on statistical learning theory and use structural risk minimization to avoid overfitting the data. 65 We chose to use an SVM classifier with a radial basis function (RBF) kernel as it performed the best when compared with other classifiers and kernels trialed.64,66,67
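As an illustration of the kernel itself, the Python sketch below evaluates the RBF similarity K(x, y) = exp(−γ‖x − y‖²) between short vectors. It shows only the kernel, not SVM training; the example vectors and the value of γ are assumptions, as the kernel scale used in this work was set automatically.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Three hypothetical two-point "spectra" (illustration only).
a = [0.0, 1.0]
b = [0.1, 0.9]    # close to a
c = [2.0, -1.0]   # far from a
gamma = 0.5       # assumed kernel width; not a value taken from this work

print(rbf_kernel(a, a, gamma), rbf_kernel(a, b, gamma), rbf_kernel(a, c, gamma))
```

Similar spectra give kernel values near 1 and dissimilar spectra give values near 0, so the separating hyperplane in the kernel feature space corresponds to a nonlinear boundary in the original spectral space.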
Experimental Analysis
Cosmic spikes with a maximum width of 5 pixels were removed from the spectra using SpectraGryph. 68 The data were then imported into Matlab, where they were split into low-frequency (LF) and MF spectral regions, ranging from −300 to 400 cm−1 and 700 to 1800 cm−1, respectively. 69 The spectral region from 400 to 700 cm−1 was observed to be silent and was therefore excluded from the data set. The Rayleigh line was removed from the LF spectra from approximately −20 to 20 cm−1 before the LF spectral region underwent linear baseline subtraction to correct for baseline differences while maintaining the presence of the vibrational density of states. 70 In contrast, rubber band baseline correction 71 was performed over the ranges 700 to 1350 cm−1 and 1350 to 1800 cm−1 in the MF spectral region. All the spectra were normalized using standard normal variate (SNV) to correct for intensity variations within the data gathering procedure. 72 Finally, both spectral regions were modeled in a “one-versus-all” approach using an error-correcting output code SVM classifier with an auto-scaled RBF kernel. The classifier was created using the fixed training set and evaluated using the fixed test set samples (Table I). 73 The training and test data sets were created with a staggered approach to cover the range of thicknesses and model calcification types in both data sets. All model performance parameters reported are those of the test set.
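The region splitting, Rayleigh line removal, linear baseline subtraction, and SNV steps can be sketched in Python as follows. This is a simplified illustration, not the Matlab code used in this work: the baseline is assumed to be the straight line joining the first and last points of the region, SNV is applied to each region separately, the rubber band correction of the MF region is omitted, and the spectrum is synthetic.

```python
from statistics import mean, stdev

LF_RANGE = (-300.0, 400.0)        # cm-1, low-frequency region
MF_RANGE = (700.0, 1800.0)        # cm-1, mid-frequency region
RAYLEIGH_WINDOW = (-20.0, 20.0)   # cm-1, removed around the laser line


def select(shifts, intensities, lo, hi, exclude=None):
    """Keep points with lo <= shift <= hi, optionally dropping an inner window."""
    kept = [(s, i) for s, i in zip(shifts, intensities)
            if lo <= s <= hi and not (exclude and exclude[0] <= s <= exclude[1])]
    return [s for s, _ in kept], [i for _, i in kept]


def subtract_linear_baseline(shifts, intensities):
    """Subtract the straight line joining the first and last points (assumed form)."""
    slope = (intensities[-1] - intensities[0]) / (shifts[-1] - shifts[0])
    return [i - (intensities[0] + slope * (s - shifts[0]))
            for s, i in zip(shifts, intensities)]


def snv(intensities):
    """Standard normal variate: zero mean and unit standard deviation per spectrum."""
    m, sd = mean(intensities), stdev(intensities)
    return [(i - m) / sd for i in intensities]


# Synthetic spectrum on a 10 cm-1 grid (illustration only, not measured data):
# a sloping background with one band near 90 cm-1.
shifts = [float(s) for s in range(-300, 1801, 10)]
intensities = [100.0 + 0.05 * s + (50.0 if abs(s - 90) < 15 else 0.0) for s in shifts]

lf_x, lf_y = select(shifts, intensities, *LF_RANGE, exclude=RAYLEIGH_WINDOW)
lf_y = snv(subtract_linear_baseline(lf_x, lf_y))
mf_x, mf_y = select(shifts, intensities, *MF_RANGE)
mf_y = snv(mf_y)   # rubber band correction of the MF region is omitted here
```

Points between 400 and 700 cm−1 fall in neither range, so the silent region is excluded without a separate step.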
Sample set measured and the associated training and test set used to construct and evaluate the SVM model.
M, model (training) set; T, test set.
Scaled wax subtraction was performed separately before or after the aforementioned preprocessing procedure on the combined LF and MF D-microSOLFRS data set in Matlab. The spectra incorporated all offsets and thicknesses with the same staggered model and test set used as in Table I, to construct and evaluate the SVM model with the same aforementioned template. This was achieved by first taking the mean of the wax for each offset, then scaling the mean wax spectra based on the 1294 cm−1 peak and multiplying it by 0.9 to ensure no negative features were present in the spectra. This was then subtracted from each spectrum individually. This process resulted in a negligible increase in performance when scaled wax subtraction was performed after preprocessing and a decrease in performance when applied prior to preprocessing.
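A minimal Python sketch of the scaled wax subtraction is given below. It assumes the scale factor is the ratio of the spectrum to the mean wax spectrum at the grid point nearest 1294 cm−1, which is one plausible reading of scaling "based on the 1294 cm−1 peak"; the function name and the four-point data are hypothetical.

```python
def scaled_wax_subtraction(shifts, spectrum, mean_wax, peak_cm1=1294.0, factor=0.9):
    """Subtract a mean wax spectrum scaled to the spectrum's wax peak.

    The scale is the intensity ratio at the grid point nearest peak_cm1
    (an assumed definition), reduced by `factor` so that the subtraction
    does not overshoot and create negative features.
    """
    k = min(range(len(shifts)), key=lambda j: abs(shifts[j] - peak_cm1))
    scale = factor * spectrum[k] / mean_wax[k]
    return [s - scale * w for s, w in zip(spectrum, mean_wax)]


# Synthetic four-point example (illustration only).
shifts = [960.0, 1133.0, 1294.0, 1441.0]
mean_wax = [0.0, 4.0, 10.0, 8.0]
spectrum = [3.0, 8.0, 20.0, 16.0]   # twice the wax signal plus a feature at 960 cm-1
corrected = scaled_wax_subtraction(shifts, spectrum, mean_wax)
print(corrected)
```

In this example the 960 cm−1 feature is untouched while the wax bands are reduced to 10% of their original intensity, and no value becomes negative.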
The first five principal components (PCs) of the same data set under the same conditions were then fed into an SVM with the same template and a one-versus-all linear discriminant analysis model. However, both yielded worse results (see Figure S2, Supplemental Material) and were not further investigated.
Results
Visual Inspection
The mean spectra of the surface calcifications and wax at 2 mm offset are given in Figure 2. All spectra are dominated by the paraffin wax features at 891, 1063, 1133, 1294, 1416, 1441, and 1463 cm−1. 74 The only distinguishing feature is the ν1 PO43− peak at ∼960 cm−1 for the HAs. The other peaks in the MF spectral region are attributed to paraffin wax. The low wavenumber spectral region is dominated by the underlying wax signature with dominant peaks around −90 and 93 cm−1. We see that the calcifications are challenging to distinguish from each other from spectral observation alone.

Split and preprocessed reference (top) and mean (bottom) Raman spectra of calcifications and wax from the D-microSOLFRS data set. The spectra were split into LF and mid-frequency (MF) spectral regions spanning −300 to 400 cm−1 and 700 to 1800 cm−1, respectively. The Rayleigh line was removed from the LF spectra before undergoing linear baseline subtraction while rubber band baseline correction was performed for the MF spectral region. All the spectra were normalized using SNV.
Principal Component Analysis (PCA)
Principal component analysis (PCA) was performed on the preprocessed spectra to gain insights into the nature of the data set obtained. The scores and loadings of the first three PCs are shown in Figure 3, accounting for 63.6, 6.3, and 3.3% of the explained variance, respectively. We observe that principal component 1 (PC1) solely encompasses information pertaining to the offset applied to the optical setup. This is highlighted in its loading, where we see wax features in the positive loading space, with peaks at −88, 90, 889, 1133, 1418, and 1439 cm−1 (Figure 3c, supported by Figure 2), while in the negative loading space there are broad features that result from the change in offsets. This enables the offsets to be clearly separated along PC1 (Figure 3a). Focusing on calcification type, we see that no single PC or combination of PCs appears to accurately separate all four categories (Figure 3b). Calcium oxalate is separated along principal component 2 (PC2), while the others remain clustered together. The PC2 loadings appear to contain HA and blueshifted wax signals in the negative loading space, with the distinguishing ν1 PO43− peak of HA at 962 cm−1. 75 Calcium oxalate is apparent in the positive loading space through its unique symmetric and antisymmetric CO stretching peaks at 1472 and 1635 cm−1, alongside a redshifted paraffin wax signal.76–79 Finally, the loadings for principal component 3 (PC3) contain wax in the positive loading space with peaks at 886 and 1294 cm−1 as well as a shift from 1129 to 1136 cm−1. Crystalline HA is apparent in the negative loading space with defining peaks at −139, −25, 30, 143, and 961 cm−1 (Figures 2 and 3c). 39

Principal component analysis (PCA) of the preprocessed D-microSOLFRS spectra showing the score plots of the first three PCs (a, b) and their respective loadings (c). (a) Score plots for PC1 versus PC2 and PC2 versus PC3 with labeling based on measurement offset. (b) Score plots for PC1 versus PC2 and PC2 versus PC3 with labeling based on buried calcification composition. (c) The associated loading plots for PC1–PC3 describe the nature of variance observed in the score plot.
Optimal Setup
First, to determine the optimal experimental setup, an ROC curve was produced from the fixed test data for the full spectra of each experimental setup, incorporating all offsets, as shown in Figure 4. We observe from the ROC that the D-microSOLFRS setup performed the best with the highest AUC of 0.87, as well as the highest CA, SE, and SP, as shown in Tables S1–S3 (Supplemental Material). This was followed by the FO-SOLFRS and FS-SOLFRS setups with respective AUCs of 0.69 and 0.57.

Receiver operating characteristic (ROC) curve of the experimental setups modeled using SVMs with an RBF kernel. SVMs were modeled using the full spectra from all the offsets obtained for each setup, with a staggered thickness model and test set. We see from the obtained AUCs that the D-microSOLFRS performed the best.
A potential explanation for this is the defocused geometry utilized by the D-microSOLFRS, wherein increasing the offset increases the volume of accepted photons, which incorporates deep propagating photons at the cost of spatial resolution. This means that there is no loss in S/N, unlike in the spatial offset geometries employed by the other experimental methods, and thus allows the D-microSOLFRS to perform the best.
Optimal Spectral Region
We then focused on the D-microSOLFRS as it performed the best and first evaluated performance of LF, MF, and combined frequency spectral regions at all offsets by an ROC, as shown in Figure 5. From this, we see that the LF spectra performed the best with the highest AUC of 0.94 compared to 0.87 and 0.81 for the full and MF spectral regions, respectively. This is due to the LF spectral region containing information about the structural characteristics of the sample including the solid-state form. This information allowed the amorphous and crystalline forms of HA to be correctly classified.

Receiver operating characteristic (ROC) curve to evaluate the performance of different spectral regions from the D-microSOLFRS experimental setup. The LF spectral region performed the best with the highest AUC of 0.94, as it was able to successfully discriminate between the amorphous and crystalline forms of HA. This is because LFRS probes the intermolecular vibrational modes, obtaining structural information about the sample such as crystallinity, and hence excels at separating materials of different (physical/chemical) states.
Optimal Offset
We then evaluated the performance of the 0, 2, and 5 mm offsets for the D-microSOLFRS in the LF spectral region. From the ROC shown in Figure 6, the 0 mm offset performed slightly better than the 2 mm offset with respective AUCs of 0.93 and 0.92.

Receiver operating characteristic (ROC) curve evaluating the performance of different offsets from LF spectra obtained in the D-microSOLFRS experimental setup. Even though the 0 mm performed the best, we cannot confidently consider it to be the optimal offset due to the large variation of wax thickness employed in the spectral data set.
However, noting the narrow difference between the 0 and 2 mm offsets, it is hard to conclude that the 0 mm offset is generally optimal. In fact, looking through Tables S1–S3 (Supplemental Material), we see that the best-performing offset varies sizably between experimental setups and their spectral regions. This is due to the variation of wax thickness employed in the spectral data set that ranged from 0 to 10.2 mm. Intuitively, the 0 mm offset will perform best for small wax thicknesses and degrade as it increases, and vice versa for the larger offsets. Therefore, in this case, no optimal offset can be rigorously determined.
Finally, we were interested in determining the best-performing spectral regions for all experimental setups. This was achieved by calculating the percentage difference between the LF and MF spectral regions for all the experimental setups with respect to the model evaluation parameters AUC, CA, SE, and SP (Figure 7), wherein a positive or negative percentage difference corresponds to the LF performing better or worse than the MF spectral region, respectively.

Bar graph evaluating the percentage differences between the LF and MF spectral regions of the three experimental setups in terms of the model evaluation parameters. We see that the LF spectral region generally outperforms the MF spectral regions except for the FO-SOLFRS data set.
We see that generally, the LF outperforms the MF spectral regions, with D-microSOLFRS experiencing the largest improvements in AUC, CA, SE, and SP of 13, 11, 22, and 7%, respectively. FO-SOLFRS is the exception to this trend, which could be due to the additional step of removing the signal from the silica glass in the spectra. This step was only required for, and performed on, the FO-SOLFRS data set as it was the only setup in this work that used a fiber optic probe. This procedure may have impacted the LF spectral region more than the MF, or the silica glass signal may have masked important information in the LF spectral region.
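The reported 13% AUC improvement for D-microSOLFRS matches the absolute difference between the LF and MF AUCs expressed in percentage points rather than the relative change. The Python sketch below works through both calculations under that assumption about how the percentage difference was defined; the function names are illustrative.

```python
def percentage_point_difference(lf_value, mf_value):
    """LF minus MF, expressed in percentage points (assumed definition)."""
    return (lf_value - mf_value) * 100


def relative_change_percent(lf_value, mf_value):
    """Relative change of LF with respect to MF, shown for comparison."""
    return (lf_value - mf_value) / mf_value * 100


auc_lf, auc_mf = 0.94, 0.81   # D-microSOLFRS AUCs reported in the text
print(round(percentage_point_difference(auc_lf, auc_mf), 1))
print(round(relative_change_percent(auc_lf, auc_mf), 1))
```

A positive value indicates that the LF region performed better than the MF region, and a negative value the reverse.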
Discussion
It is important to note that when applying these models to clinical diagnostic applications, other aspects need to be considered, such as the confirmation test, in this case being biopsy of the microcalcifications. Biopsying the patient results in additional resources spent by the hospital with the patient undergoing mental, physical, and financial trauma.13,14,80
Recent research on breast microcalcifications has utilized energy-dispersive X-ray, X-ray diffraction, and multimodal imaging approaches as well as RS.5,12,81–86 As mentioned previously, the malignancy of the cancer is correlated with breast microcalcification composition; however, many of these findings are contested so a brief summary will be given. Firstly, carbonate was shown to correlate with benign lesions; however, the simplicity of this is being challenged as carbonate content can be affected by the surrounding tissue environment.5,12,22,25,85 Magnesium, before being confirmed as whitlockite, was initially found to correlate with malignancy; however, more recent studies have found the opposite.5,28,81–86 Finally, sodium has been found to routinely correlate with malignancy.5,28,82,84 Future work will focus on incorporating additional samples of HA substituted with varying concentrations of Mg2+ or CO32− in a more accurate tissue phantom to see if crystalline structural changes from these substitutions can be classified.28,87,88
Conclusion
Breast microcalcifications are crucial clinical features for detecting breast cancer that remain poorly understood, with many contradictory findings and views at present. Prior research, which utilized RS, only focused on the MF spectral region. We applied LFRS in three experimental setups to synthetic CaOx and HA in amorphous and crystalline forms, immersed in and buried under varying depths of paraffin wax. We found that the LF outperformed the MF spectral region for two out of the three experimental setups, as LFRS contains structural information about the probed solid. Of the experimental setups, the D-microSOLFRS performed the best with an AUC of 0.87, as it utilized a defocused geometry as opposed to a spatially offset geometry. The optimal performance of the D-microSOLFRS setup was achieved in the LF spectral region with an AUC of 0.94. We have shown the potential of LFRS to accurately classify breast microcalcifications, with future research focusing on carbonate- and magnesium-substituted HA in hydrogel-based tissue phantoms.
Supplemental Material
sj-docx-1-app-10.1177_27551857241302079 - Supplemental material for Discriminating Model Microcalcifications Immersed and Under Varying Depths of Wax Using Deep Low-Frequency Raman Spectroscopy
Supplemental material, sj-docx-1-app-10.1177_27551857241302079 for Discriminating Model Microcalcifications Immersed and Under Varying Depths of Wax Using Deep Low-Frequency Raman Spectroscopy by Mitchell C. Chalmers, Teemu Tomberg, Keith C. Gordon and Sara J. Fraser-Miller in Applied Spectroscopy Practica
Footnotes
Acknowledgments
Mitchell C. Chalmers thanks the University of Otago for his PhD scholarship.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Royal Society Te Apārangi, Marsden Fast-Start (Grant No. 19-UOO-210) and Te Whai Ao–The Dodd-Walls Center for Photonic and Quantum Technologies.
Supplemental Material
All supplemental material mentioned in the text is available in the online version of the journal.
References
