Abstract
Near infrared spectroscopy was used in discrimination of intact bovine teeth in terms of animal sex, diet, tooth type and place of origin. Discriminant Analysis (DA) models were developed and tested using a stratified random 70:30 data split to calibration and test sets. Of the discriminant techniques of Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares – DA (PLS-DA), Artificial Neural Networks (ANN) and Support Vector Machine (SVM), SVM and PLS-DA models performed best in most instances, with pretreatment choice impacting technique success. For test set prediction of animal sex, an accuracy of 95% was achieved using a PLS-DA model, and similar performance was achieved using an SVM model. SVM, ANN and SIMCA models predicted three categories of tooth type (deciduous, permanent unerupted and erupted) with similar accuracies (89, 79 and 82%, respectively); however, permanent unerupted teeth were conflated with permanent erupted teeth. Prediction accuracy for ANN, SVM and PLS-DA models was also similar for discrimination of teeth from animals on a grain versus grass diet (PLS-DA 82%). There was considerable variation between models in prediction of geographic origin. Of the seven locations, two were discriminated with true positive rate around 70% using SVM and ANN models. Potential applications of this technology are discussed.
Introduction
Mammalian teeth are composed of three bio-inorganic mineralised tissues: enamel, dentine and cementum, and a small amount of organic material, principally protein.1,2 The most abundant mineral in the inorganic structure of tooth enamel and dentine is hydroxyapatite (HAp), 2 a crystalline calcium phosphate with the chemical formula Ca10(PO4)6(OH)2.1,3–6 Water, lipids and proteins make up the balance of the tooth structure. 6 Sex, age, and diet all impact tooth chemistry,7–10 and the relationship between diet and tooth decay is well established. 11 Place of origin can also leave a chemical signature: for example, Laser Ablation-Inductively Coupled Plasma-Mass Spectrometry (LA-ICP-MS) of dental enamel was used to differentiate possum teeth according to geographical origin. 12
The use of near infrared spectroscopy (NIR spectroscopy) in the study of HAp has been recently reviewed. 13 In brief, HAp spectra are characterised by an OH stretching feature at ∼8817 cm−1, 14 a broad feature from 7135 to 6773 cm−1 representing the first overtone OH stretch of hydrogen bonded water molecules15,16 with a sharp feature at 6977 cm−1 typifying structural OH groups.3,17 Small features at ∼7240, 5319 and 4545 cm−1 represent the first overtone band from water species,18–20 structural water bands 3 and highly structured water bonded molecules15,16 respectively.
Changes in tooth chemistry may be evident in NIR spectra. NIR spectroscopy has been used for characterisation of teeth and bone in anthropological and palaeontological studies21,22 and NIR reflectance and transillumination imaging is used to detect caries in teeth. 23 The possibility of sex recognition using Raman spectra of teeth has been investigated. 24 However, the use of NIR spectroscopy for the characterisation of teeth in terms of sex, diet and place of origin has not been explored. Such a method would have value in forensic, agricultural and environmental applications with potential for classification on sex and/or environmental history.
Large numbers of undecayed human teeth are rarely available for experimental work. Primate, 25 bovine, 26 swine,27,28 equine 29 and shark 30 teeth have been used as substitutes for human dental experimentation, given their comparable composition and attributes. For example, the Ca/P ratios associated with mineral movement in and out of enamel during the processes of demineralisation and remineralisation are consistent for both human and bovine enamel. 31 Due to the relative ease with which large numbers of bovine teeth can be obtained, they have become the most commonly used substitute for human teeth in dental experiments. 32 The study of bovine teeth also has practical applications in its own right. For example, food safety concerns have led to the use of radio-frequency identification (RFID) enabled ear tags on all cattle in the Australian national herd. NIR spectroscopy could augment such a traceability system as a validation technique.
Bovine bone density and chemistry are influenced by Ca and P nutrition,33,34 through factors such as sex, 35 calf rearing, feed phosphorus (P) content,33,34 and by the fluoride content of drinking water.36,37 In Australia, there are extensive areas of phosphorus-deficient soils, addressed by use of phosphorus supplemented licks for improved cattle nutrition, 38 and in some areas, the bore water used for cattle watering has high fluoride levels. 37
Several discriminant analysis chemometric techniques have been employed in NIR spectroscopy. Zeng et al. 39 note the common use of Principal Component Analysis (PCA), Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares-based Discriminant Analysis (PLS-DA) and Support Vector Machines (SVM) in applications of NIR spectroscopy in the food industry, and the potential for use of Back Propagation Artificial Neural Networks (BP-ANN) and deeper learning machines such as 1 Dimensional Convolutional Neural Networks (1D-CNN) when large training data sets are available.
In brief, PCA transforms high-dimensional data to lower-dimensional data, viz principal components. 40 SIMCA is a supervised qualitative analysis method based on the PCA algorithm, involving the use of a critical value θ, describing similarities among the samples of different classes, in determining match of a test sample to the categories of the calibration set. PLS-DA is a supervised qualitative analysis method based on PLS regression analysis. 41 PLS-DA is described as a “supervised” version of PCA because it achieves dimensionality reduction with full awareness of the class labels. 41
SVM works relatively well when the number of dimensions is greater than the number of samples42,43 but it is not suitable for large data sets (>10,000 samples) and underperforms when the data set has more noise, that is, when target classes are overlapping.42,43 SVM is commonly used with the radial basis function (RBF) kernel when working with non-linear spectroscopy data.42,43 The RBF kernel uses fewer hyperparameters and supports a faster approximation speed than other common kernels. The penalty parameter cost (
ANN is modelled on biological neural networks. 46 It is a collection of interconnected algorithms organised in layers made up of interconnected nodes. The input layer communicates with one or more hidden layers where the nodes take weighted connections and use an activation function to pass their signal to the output layer. It essentially learns by example as it modifies the weights of connections. 46 BP-ANN is a supervised learning model where the neural net attempts to predict a pattern it is presented with after which it calculates the proximity of the prediction from the actual answer and adjusts the connection weights as required. 46
The current study was undertaken to assess the suitability of NIR spectroscopy for the discrimination of bovine teeth in terms of tooth type, sex, diet, and place of origin, using the chemometric techniques of SVM, PLS-DA, ANN and SIMCA, with models trained and tested on the same data sets.
Materials and methods
Tooth samples
Sample breakdown in terms of diet, sex, tooth type and place of origin (within Queensland, Australia).
Approximately four anterior teeth (front tooth located in the dental arch, including incisors and canines) were extracted per mandible with the use of molar forceps. The teeth were cleaned using a soft toothbrush and distilled water, avoiding introduction of exogenous chemicals and potential contamination of samples. Labial surfaces (the outer surface of the tooth that faces the lips) of anterior bovine teeth crowns (the visible part of the tooth above the gum line) were chosen in preference to posterior teeth because they have a large, relatively flat surface area which facilitates maximal surface area presentation to the integrating sphere chamber used in the NIR spectroscopy instrument used in this study. Teeth were oven dried for 48 h at 45°C and left to cool at room temperature before dry storage in labelled specimen jars. Of 497 extracted teeth, 101 were deciduous (“baby teeth”) while 396 were permanent. Of the permanent teeth, 58 were unerupted (still fully encased in the alveolar bone of the jaw) at the time of their retrieval (Table 1). All erupted permanent teeth were female (
Spectra acquisition
NIR absorbance spectra (4003 – 9886 cm−1; resolution 16 cm−1; 32 scans per spectrum) were acquired from labial surfaces of 497 intact bovine teeth using a diffuse reflectance accessory (10 mm diameter integrating sphere) of a Nicolet Antaris FT-NIR spectrophotometer (Thermo Scientific, Waltham, MA, USA), following the procedure used in Pretorius et al. 47 for human teeth. Spectra were acquired under an air atmosphere, employing the in-built reference.
Data was exported to Microsoft Excel (2303), The Unscrambler ® (X 10.5.1 Camo Analytics, Oslo, Norway), Sigmaplot ® (15.0), and R version 4.2.3 (2023-03-15 ucrt) software.
Other measurements
Hardness of teeth (
A Universal Specific Gravity Kit (SGK-B, Mineralab, Arizona, United States of America, https://www.mineralab.com/SGK-B/) was used to assess specific gravity using the formula SG = Weight in Air/(Weight in Air – Weight in Water). Teeth were weighed with an electronic scale (FX-300i, A&D Company, Tokyo, Japan, https://www.andonline.com/).
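The specific gravity formula above can be written as a short function. This is a minimal sketch only: the function name and the example weights are hypothetical and are not measurements from this study.

```python
def specific_gravity(weight_in_air_g: float, weight_in_water_g: float) -> float:
    """SG = weight in air / (weight in air - weight in water)."""
    displaced = weight_in_air_g - weight_in_water_g
    if displaced <= 0:
        raise ValueError("weight in water must be less than weight in air")
    return weight_in_air_g / displaced


# Hypothetical tooth: 3.00 g in air and 1.50 g when suspended in water.
example_sg = specific_gravity(3.00, 1.50)
```

The guard on the denominator rejects readings where the submerged weight is not lower than the weight in air, which would indicate a weighing error or a sample that floats.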
A carcass fat score utilising a scoring system with a scale from 1 (lean) to 6 (fattest) was provided by the source abattoir. This score is based on fat depth at the ‘PB’ site, that is, aligned with the crest of the third sacral vertebra, measured manually with a cut and measure knife or electronically using a Hennessey Grading probe. 48
Chemometrics and data analysis
The dataset was partitioned on a 70:30 split into calibration and test sets with an R script, using a stratified random approach based on the combined strata of tooth type and diet. PCA was used in inspection of the data prior to development of PLS-DA, ANN, SIMCA and SVM models using calibration set data based on the variables of tooth type (deciduous, permanent unerupted or permanent erupted), diet (grass- or grain-fed), sex (male or female) and place of origin, using the chemometric packages of The Unscrambler (Camo Analytics, Oslo, Norway) and R version 4.2.3 (2023-03-15 ucrt). Models were then used for discrimination of test set samples. In an additional exercise, the dataset was randomly partitioned on a 70:30 split to calibration and test sets on a mandible level, that is, all teeth from a given mandible were retained in either calibration or test sets. This was used in development and testing of a model for sex discrimination.
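The two partitioning schemes described above can be illustrated as follows. The study used an R script that is not reproduced here; this Python sketch is an illustration only, and the function names, the record layout and the synthetic data are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, strata_key, test_fraction=0.30, seed=0):
    """Split samples so each stratum contributes ~test_fraction to the test set."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sample in samples:
        by_stratum[strata_key(sample)].append(sample)
    calibration, test = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        calibration.extend(members[n_test:])
    return calibration, test


def grouped_split(samples, group_key, test_fraction=0.30, seed=0):
    """Split so all samples sharing a group (e.g. one mandible) stay together."""
    rng = random.Random(seed)
    groups = sorted({group_key(sample) for sample in samples})
    rng.shuffle(groups)
    n_test = round(len(groups) * test_fraction)
    test_groups = set(groups[:n_test])
    calibration = [s for s in samples if group_key(s) not in test_groups]
    test = [s for s in samples if group_key(s) in test_groups]
    return calibration, test


# Hypothetical records: (tooth_id, mandible_id, tooth_type, diet).
teeth = [
    (i, i // 4, ("deciduous", "unerupted", "erupted")[i % 3], ("grass", "grain")[i % 2])
    for i in range(120)
]
cal, tst = stratified_split(teeth, strata_key=lambda t: (t[2], t[3]))
cal_g, tst_g = grouped_split(teeth, group_key=lambda t: t[1])
```

Stratifying on the combined (tooth type, diet) key keeps class proportions similar in both sets, while the grouped split guarantees that no mandible contributes teeth to both calibration and test sets.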
Pretreatments of SNV, Savitzky-Golay (SG) smoothing (20 or 31 points) and SG second derivative in context of different wavelength ranges were considered for discrimination of the attributes under consideration. Assessment was based on qualitative inspection of score biplots of PCs/PLS factors for separation of samples based on the attribute of interest, and quantitative evaluation of the promising pretreatment combinations using the various discriminant techniques (PLS-DA, SIMCA, SVM, ANN). The optimal pre-processing determined by calibration results of SVM, ANN and SIMCA models utilised absorbance spectra, SNV and a 2nd derivative SG (2nd degree polynomial and 31 smoothing points), except when otherwise stated. A PLS-DA model yielded best results for second derivative spectra with SG smoothing (2nd degree polynomial and 20 smoothing points unless otherwise stated).
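As an illustration of the pretreatments named above, the sketch below implements SNV and a Savitzky-Golay second derivative from a local second-degree polynomial fit. It is not the implementation used in The Unscrambler or R; the function names are hypothetical, edge points are simply dropped, and the sketch accepts odd window lengths only, so it does not reproduce the 20-point setting.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre a spectrum and scale by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def sg_second_derivative(spectrum, window=31):
    """Savitzky-Golay second derivative using a local quadratic fit.

    Returns interior points only, in units of 1 / (sample spacing)^2.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be odd and >= 3")
    m = window // 2
    offsets = range(-m, m + 1)
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    denom = window * s4 - s2 ** 2
    # Least-squares quadratic coefficient a2; the second derivative is 2 * a2.
    weights = [2 * (window * i ** 2 - s2) / denom for i in offsets]
    return [
        sum(w * spectrum[k + i] for w, i in zip(weights, offsets))
        for k in range(m, len(spectrum) - m)
    ]


# Check on a synthetic parabola, whose second derivative is 2 everywhere.
parabola = [float(x * x) for x in range(100)]
d2 = sg_second_derivative(snv_input := parabola, window=31)
scaled = snv([1.0, 2.0, 3.0])
```

In practice SNV would be applied to each spectrum before the derivative, matching the order "SNV and a 2nd derivative SG" given in the text.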
Parameter settings used in the development of SVM, ANN and PLS models.
Mean comparisons were based on a
To assess the significance of the difference in Accuracy estimates from two models, a proportion test using a z score calculator was undertaken utilising a two tailed
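A two-tailed test for the difference between two accuracy estimates can be sketched as below. The source does not state whether a pooled or unpooled standard error was used, so the pooled form here is an assumption, as are the function name and the example counts.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-tailed z test for the difference of two proportions (pooled SE)."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p_value


# Hypothetical counts of correctly classified teeth out of a 153-tooth test set.
z_stat, p_val = two_proportion_z_test(145, 153, 125, 153)
```

The function assumes the pooled proportion is strictly between 0 and 1; otherwise the standard error is zero and the statistic is undefined.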
Results and discussion
Tooth weight, tooth and jaw hardness and tooth specific gravity
Tooth weight, tooth hardness, jaw hardness and tooth specific gravity (SG) of teeth. Mean and SE (in brackets) presented. Abbreviations are M, male; F, female; GS, grass; GN, grain; D, deciduous; U, permanent unerupted; P, permanent erupted.
Sex
Tooth absorbance spectra were characterised by features expected for HAp and water (Figure 1(A)), with a characteristic 6977 cm−1 feature associated with structural OH groups. 3
The mean absorbance spectra of male and female teeth differed from about 4000 to 7200 cm−1 for the groupings of: (i) all female and all male teeth, and (ii) female and male permanent unerupted teeth (Figure 1, panel A). There was no apparent difference between the mean spectra of deciduous teeth of the two sexes (Figure 1, panel A). (a) Mean absorbance spectra for (i) combined male and female deciduous (male 
Class separation for male and female deciduous teeth was evident in a factor 3 and 4 biplot from a PCA based on SNV treated and smoothed spectra in the range 7300 – 10000 cm−1 (Figure 2). Score bi-plot for PCA based on SNV and smoothed absorbance data (7300 – 10000 cm−1) of male (blue triangles) and female (red circles) deciduous teeth (
PLS-DA model b coefficients were weighted heavily in the 7000-7400 cm−1 and 5200-5400 cm−1 regions (Figure 1(B)), consistent with influence of O-H features. The size of the model coefficients (also seen in the PLS-DA coefficients for a tooth type model) is notable as it may indicate over-fitting of the model, although it may also be due to (a) a scaling issue, with derivative of absorbance values being small; or (b) correlated X variables. While these issues were not further explored, the practical test of the model in terms of its prediction of a test set was documented.
Class separation for male and female unerupted permanent teeth was clearest in a scores biplot from a PLS-DA based on mean centred, second derivative spectra, although separation was not as clear as for the deciduous teeth (data not shown). A possible explanation is that dental eruption occurs earlier in females than in males. 55 Females thus achieve a full set of primary teeth earlier in life which are in turn lost sooner and replaced by permanent teeth earlier than males, 56 meaning that female teeth are typically exposed to environmental influences for longer than their male counterparts. This would not be the case for unerupted permanent teeth, as they are still encased in alveolar bone and have not yet been exposed to the oral environment.
Of the four model types tested, highest accuracy was obtained using a PLS-DA model using second derivative pretreated data (Figure 3), with male and female teeth predicted with an accuracy of 95%, although this result was not significantly different to that of the SVM or ANN models (Table 4). SIMCA produced the poorest results. PLS-DA scatter plot for a test set ( Confusion matrix and calculated True Positive Rate (TPR), True Negative Rate (TNR) and Accuracy (Acc) for male-female discrimination of the test set of 153 teeth using ANN, SVM and SIMCA models based on input of SNV and second derivative pretreatment; and PLS-DA using second derivative pretreated data. Abbreviations are a, actual; p, predicted; M, male and F, female. Accuracy values labelled with a * are not significantly different to that of the PLS-DA model.
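The True Positive Rate, True Negative Rate and Accuracy reported in the confusion matrix tables follow directly from the four cell counts. The sketch below shows the arithmetic; which class is treated as "positive" and the counts themselves are hypothetical and are not the values from Table 4.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return TPR, TNR and accuracy from the cells of a 2 x 2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "TPR": tp / (tp + fn),   # actual positives predicted as positive
        "TNR": tn / (tn + fp),   # actual negatives predicted as negative
        "Acc": (tp + tn) / total,
    }


# Hypothetical counts summing to a 153-tooth test set.
metrics = binary_metrics(tp=90, fn=5, fp=3, tn=55)
```

Reporting TPR and TNR alongside accuracy matters when classes are unbalanced, since a high overall accuracy can hide poor performance on the smaller class.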
A seven factor PLS-DA model was also developed using a 70:30 split of teeth samples to calibration and test sets at a mandible level, with a cut-off score of 1.75 adopted to optimise sex discrimination in the calibration set. This model achieved an accuracy of 95% in discrimination of the test samples, the same as that achieved for the model developed using calibration and test sets based on allocation of teeth samples at an individual tooth level. This result is consistent with the use of teeth as independent samples, irrespective of associated mandible. The 70:30 calibration and test split at an individual tooth level has been used in all following analyses.
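Choosing a cut-off on the calibration set can be sketched as a simple search over candidate thresholds. The class coding is not stated in the text; the sketch assumes, purely for illustration, that the two sexes are coded 1 and 2 and that a predicted score at or above the cut-off is assigned to class 2. The function name, scores and labels are hypothetical.

```python
def best_cutoff(scores, labels, candidates):
    """Return (cutoff, accuracy) for the candidate maximising calibration accuracy.

    Assumed coding: predicted score >= cutoff -> class 2, otherwise class 1.
    """
    best = None
    for cutoff in candidates:
        correct = sum(
            (2 if score >= cutoff else 1) == label
            for score, label in zip(scores, labels)
        )
        accuracy = correct / len(labels)
        if best is None or accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical PLS-DA predicted scores and true class codes for calibration teeth.
cal_scores = [0.9, 1.1, 1.3, 1.6, 1.7, 1.8, 1.9, 2.1, 2.2]
cal_labels = [1, 1, 1, 1, 1, 2, 2, 2, 2]
chosen = best_cutoff(cal_scores, cal_labels, candidates=[1.25, 1.5, 1.75])
```

With these synthetic values the midpoint of 1.5 misclassifies two class 1 samples, whereas a shifted cut-off separates the classes, which is the kind of situation in which a cut-off away from the midpoint would be adopted.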
Tooth type – Deciduous, permanent (unerupted) and permanent (erupted)
There was separation between tooth types (Figure 4(A)), with higher absorbance values for deciduous than permanent (erupted or unerupted). Thus, tooth type may confound consideration of other variables (sex, finishing diet and place of origin). The higher apparent absorbance of deciduous teeth is ascribed to their lower levels of Ca and P mineralisation, and thus higher porosity. 57
The decrease in the 6977 cm−1 feature associated with structural OH groups is also consistent with higher porosity of deciduous teeth. (a) mean absorbance spectra for deciduous (red dotted trace, 
The separation between deciduous and permanent (erupted or unerupted) teeth in a PCA scores biplot using raw absorbance spectra (Figure 5) was not as clear as that obtained using SNV and smoothed derivative absorbance spectra (Figure 6). Score bi-plot for a PCA based on absorbance spectra for deciduous (red triangles, Score bi-plot for a PCA based on absorbance spectra with pre-treatments of SNV and smoothing for deciduous (red triangles, 

The strong influence exerted by tooth type was also evident in the failure of a factor 1 and 2 biplot from a PLS-DA model for sex to discriminate samples by sex (although separation for sex was achieved with factors 3 and 4, data not shown) but did demonstrate separation of deciduous teeth from permanent (erupted- and unerupted) teeth (Figure 7). (a) Score bi-plot for a PLS-DA based on sex (second derivative of absorbance data), for deciduous (red triangles, 
Confusion matrix and calculated True Positive Rate (TPR), true Negative Rate (TNR) and Accuracy (Acc) for discrimination based on tooth type for models developed on a test set (
Diet
There was no visible separation in the absorbance spectra of deciduous teeth from grass- and grain-fed animals ( Mean absorbance spectra for teeth from animals on a grass- or grain-fed diet, displayed for (i) deciduous teeth (grass-fed 
Some separation between grass- and grain-fed samples was noted for unerupted permanent teeth in a biplot using scores of PCs four and five of a PCA, but separation of deciduous teeth based on diet was not noted (data not shown). Separation based on diet was noted for erupted permanent teeth using PCs two and three of the PCA (5400 – 10000 cm−1). Interestingly, when various pre-treatments were applied to spectra of erupted permanent teeth, separation in the PC space became less pronounced, which is contrary to findings for deciduous and unerupted permanent teeth. This might be because all erupted permanent teeth were female, whereas deciduous and unerupted permanent teeth comprised a mix of male and female teeth.
Confusion matrix and calculated True Positive Rate (TPR), true Negative Rate (TNR) and Accuracy (Acc) for diet discrimination of 153 teeth using ANN and SVM models based on SNV and second derivative, and a PLS-DA model based on second derivative data. Abbreviations are a, actual; p, predicted; Gs, grass, and Gn, grain. Accuracy values labelled with a * are not significantly different to that of the PLS-DA model.
Geographic origin
There was no clear separation of samples evident in PCA or PLS-DA score biplots on the basis of geographic origin (data not shown). Consideration of individual tooth types and use of reduced spectral ranges did not improve the outcome.
Confusion matrix and calculated True Positive Rate (TPR) for geographic discrimination of 153 teeth using ANN and SVM models based on SNV and second derivative absorbance spectra, and a PLS-DA model based on second derivative data. Abbreviations are a, actual; p, predicted; Mar, Marlborough; Cle, Clermont; MtG, Mount Gravatt; Wow, Wowan; Com, Comet; Dix, Dixalea; Rag, Raglan.
Conclusion
NIR spectroscopy is a non-destructive method that has been applied for the classification of teeth 58 in the context of demineralisation and decay,59,60 but to our knowledge it has not been used in characterisation of teeth in terms of tooth type, sex, diet, and place of origin. Pretreatment choice impacted discriminant success, and pretreatment was optimised for each model, based on visual inspection of PCA and PLS-DA biplots. SIMCA underperformed relative to other model types across the attributes considered, while no one of the other three model types (SVM, ANN and PLS-DA) stood out as superior.
Deciduous teeth were discriminated from permanent teeth (unerupted and erupted) on the basis of their spectra in qualitative (PCA and PLS-DA score biplots) and quantitative chemometric methods (SVM, ANN and SIMCA). Thus, tooth type is a potential confounding variable when attempting to discriminate sex, diet or geographic location. For discrimination based on tooth type, SVM and ANN models outperformed SIMCA. Discrimination based on sex was achieved using a PLS-DA second derivative model to an accuracy of 95% overall, with better results for female than male classification. Discrimination on diet (grass- vs grain-fed diet) was achieved to an accuracy of 81 and 82% using SVM and PLS-DA models respectively. Discrimination based on location was achieved using SVM and ANN models to a TPR of >70% for four of the seven locations considered, depending on the model used.
The use of NIR spectroscopy to discriminate samples based on geographic origin has high practical potential for the cattle industry, and it is recommended that further work be conducted, involving samples from a wider variety of regions and of a more balanced age and sex distribution. The results of the current study should also encourage similar studies with human teeth, with the potential to use NIR spectroscopy in estimation of, for example, sex, which would hold value in forensic and archaeological investigations.
Footnotes
Acknowledgments
The authors thank Ryan Batley of Central Queensland University, Rockhampton, Australia for arranging procurement of bovine mandibles for use in this study, and JBS Rockhampton for their donation of 100 mandibles. The authors thank Prof. Mark Tennant for academic advice, Dr Mark Griffin of Insight Research Services Associated for statistical advice, Rob Dixon for his advice on bovine nutrition, and Professor Simon Quigley for the use of densitometry equipment. This research project is supported under the Commonwealth Government’s Research Training Program. The authors gratefully acknowledge the financial support provided by the Australian Government.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
