Abstract
Sex estimation from skeletal remains is an important component of personal identification in forensic anthropology. The different rates of skeletal growth and development associated with age, ethnicity and sex form the basis of such identification. The present study was conducted on the contemporary North Indian Haryanvi population to identify the cephalometric measurements and the machine learning algorithm best suited to studying sexual dimorphism. A cross-sectional study was conducted on 200 individuals (100 males and 100 females) aged between 18 and 40 years, and 12 cephalometric measurements were obtained using spreading and sliding callipers. Statistical analysis was done using the Statistical Package for the Social Sciences (SPSS) version 21.0. All 12 variables showed sexual dimorphism, and the sexing accuracy ranged between 62% and 93.5% in univariate analysis. Bizygomatic breadth (ZyBr) and bi-gonial width (BiGoW) together showed an accuracy of 99% in multivariate analysis. Receiver operating characteristic (ROC) curve analysis also showed BiGoW to have the highest area under the curve (AUC) (1.00) and a sexing accuracy of 95.5%. Principal component analysis (PCA) revealed a similar result, with BiGoW, nasal height (NH) and ZyBr having the highest communalities. However, it was concluded that discriminant function analysis (DFA) and ROC analysis showed more promising results in studying sexing accuracy as compared to PCA.
Keywords
Introduction
Sex estimation is one of the major parameters that help in personal identification from skeletal remains. Furthermore, in cases of mass disasters and of charred, mutilated or highly decomposed skeletal remains, identification poses a difficult task for forensic anthropologists.1, 2 Hence, a systematic approach to determining sex from the morphometric differences of the skeleton is important. Modern humans are found to exhibit a lower level of sexual dimorphism as compared to other primates. 3 Sexual dimorphism in the skull is mostly attributed to a larger size in males as compared to females, owing to an extended growth period in males during puberty, the effects of androgens leading to higher bone deposition in the craniofacial region and increased masticatory forces exerted by the masseter muscles. Most of the sexual differences in the skull are related to physiological and functional features such as muscle mass, volume of oxygen intake, metabolic system and bone development. 4 Cephalometric measurements have also shown statistically significant results in establishing sexual dimorphism among various populations.5–9 The discriminant functions developed are specific to each population. However, a rise in immigration and population inter-mixing is one of the most important genetic factors, along with other epigenetic factors, that have prompted the development of new and updated cephalometric sexual discriminant formulas.10, 11 With the inclusion of advanced imaging tools such as computed tomography (CT), cone-beam CT, magnetic resonance imaging (MRI) and digital radiography, the accuracy of analysis has increased, but the expense and the need for trained personnel can be challenging. 12 Further, machine learning algorithms have also shown promising results in analysing large volumes of data. 13 Therefore, the present study aims to provide updated discriminant functions of cephalometric variables for the contemporary North Indian Haryanvi population and to compare different machine learning techniques to ascertain the best tool for observing sexual dimorphism in this population.
Material and Methodology
A total of 200 Haryanvi adults (100 males and 100 females) aged between 18 and 40 years were included in the study. Informed consent, in both English and Hindi, was obtained from all the participants in accordance with the approval of the Institutional Ethics Committee. The inclusion criteria were: (a) participants of Haryanvi origin and (b) participants who gave their consent for the study. The exclusion criteria were: (a) any participant with traumatic or congenital deformities of the head or face and (b) any transgender individual belonging to Haryana.
The participants were asked to sit with their backs straight, their feet touching the ground and their heads oriented in the Frankfurt Horizontal Plane while the measurements were taken. Digital vernier callipers and spreading callipers were used to obtain the measurements. After the measurements were made, all the data were analysed statistically using the Statistical Package for the Social Sciences (SPSS) version 21.0. The following frequently used measurements were obtained by the first author at two different time intervals to assess intra-observer error: maximum head length (MxHL), maximum head breadth (MxHBr), bi-gonial width (BiGoW), minimum frontal width (MnFW), zygomatic breadth (ZyBr), physiognomic facial height (PhyFH), morphological facial height (MorFH), nasal length (NL), nasal breadth (NBr), nasal height (NH), maximum orbital width (MxOW) and minimum orbital width (MnOW).
Statistical Analysis
Descriptive statistics with the independent t-test were performed using IBM SPSS version 21.0. The independent t-test assesses whether the means of two groups differ significantly. The sexual dimorphism index (SDI) was calculated for all the variables. A paired t-test was also performed to evaluate the intra-observer error. Stepwise discriminant analysis and univariate and multivariate direct discriminant function analysis (DFA) were also performed to obtain the highest separation between the sexes. Receiver operating characteristic (ROC) curve analysis was used to identify the cut-off points for each variable. These points were determined by analysing the optimum combination of sensitivity (true positive rate) and 1 - specificity (false positive rate), and the area under the curve (AUC) was obtained for each variable. Principal component analysis (PCA), introduced by Karl Pearson in 1901, is a statistical tool used to explore the interrelationships among variables. The loadings of the variables on PC1, PC2 and PC3 were used to distinguish between the sexes.
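The two descriptive quantities named above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study. The SDI formula is not stated in the text, so the common definition (male mean minus female mean, as a percentage of the female mean) is assumed here, and the measurements are made-up values, not study data.

```python
import math
import statistics

def sexual_dimorphism_index(male, female):
    """Percentage by which the male mean exceeds the female mean.
    ASSUMPTION: this definition is one common convention; the study's own formula is not given."""
    m, f = statistics.mean(male), statistics.mean(female)
    return (m - f) / f * 100.0

def independent_t(male, female):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(male), len(female)
    v1, v2 = statistics.variance(male), statistics.variance(female)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(male) - statistics.mean(female)) / se, n1 + n2 - 2

# Hypothetical bi-gonial widths in mm, for illustration only.
male_bigow = [112.0, 108.5, 115.2, 110.1, 113.4, 109.8]
female_bigow = [98.3, 101.0, 96.7, 99.5, 100.2, 97.9]

sdi = sexual_dimorphism_index(male_bigow, female_bigow)
t_value, dof = independent_t(male_bigow, female_bigow)
print(f"SDI = {sdi:.2f}%, t = {t_value:.2f}, df = {dof}")
```

A positive SDI and a positive t value both indicate larger male dimensions; the p value would then be read from the t distribution with the returned degrees of freedom.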
Results
Table 1 shows the descriptive statistics, including the mean, standard deviation, t value, p value, SDI and cross-validated sexing accuracy, which ranged from 62% to 93.5%. All the variables showed significantly larger dimensions in males (p < .05).
Table 1. Descriptive Statistics of the North Indian Population.
Table 2 depicts the results of the paired t-test, which has been used to determine the extent of intra-observer error. The correlation values depict a strong correlation between the first and second measurements. The p value also exhibits a non-significant difference between the measurements taken at two different times.
Table 2. Paired t-test for Intra-observer Error.
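The intra-observer check described above can be sketched as follows. This is an illustration only, with hypothetical repeated measurements rather than the values in Table 2; it computes the paired t statistic and the Pearson correlation between the two sessions.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic on the per-subject differences, with its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

def pearson_r(x, y):
    """Pearson correlation between the first and second measurements."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical maximum head lengths (mm) for five subjects, measured twice.
session_1 = [182.1, 175.4, 190.2, 168.9, 179.5]
session_2 = [182.4, 175.1, 190.0, 169.3, 179.4]

t_value, dof = paired_t(session_1, session_2)
r = pearson_r(session_1, session_2)
print(f"t = {t_value:.3f} (df = {dof}), r = {r:.4f}")
```

A t value close to zero together with a correlation close to 1 corresponds to the pattern reported in the text: no significant difference between sessions and strong agreement between them.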
Table 3 shows the results of stepwise DFA, where seven variables were selected, which showed maximum classification accuracy. In stepwise DFA, the SPSS system automatically chooses those variables which provide maximum classification.
Table 3. Stepwise DFA.
Table 4 shows the canonical discriminant coefficients and sexing accuracy for the stepwise and direct discriminant functions. In stepwise analysis, seven variables (MxHL, BiGoW, MxHBr, ZyBr, MorFH, NL and NH) were selected and provided 100% sexing accuracy after cross-validation. Direct DFA showed the highest sexing accuracy of 99% (M = 100%, F = 98%) using the variables BiGoW and ZyBr. The combination of ZyBr and BiGoW gives an accuracy of at least 98.5% when any one of the remaining 10 variables is added. Keeping in view the fragmentary condition in which skulls are often recovered, we also created functions using MnOW and MorFH in combination with NH.
Table 4. Canonical Discriminant Coefficients.
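To make the idea of a two-variable direct discriminant function concrete, the following sketch fits Fisher's linear discriminant to two measurements and classifies by a sectioning point midway between the group centroids. It is a simplified illustration under stated assumptions: the data are synthetic (the group means and spreads are invented), the weights are not scaled the way SPSS scales its canonical coefficients, and the accuracy is a resubstitution figure rather than the leave-one-out cross-validation reported in the study.

```python
import random
import statistics

def mean_vector(rows):
    return [statistics.mean(col) for col in zip(*rows)]

def pooled_covariance(group_a, group_b):
    """Pooled within-group 2x2 covariance matrix."""
    def scatter(rows):
        mu = mean_vector(rows)
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s
    sa, sb = scatter(group_a), scatter(group_b)
    dof = len(group_a) + len(group_b) - 2
    return [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]

def fit_discriminant(males, females):
    """Return (weights, sectioning_point) for a two-variable linear discriminant."""
    mu_m, mu_f = mean_vector(males), mean_vector(females)
    (a, b), (c, d) = pooled_covariance(males, females)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [mu_m[0] - mu_f[0], mu_m[1] - mu_f[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Sectioning point: discriminant score of the midpoint between the centroids.
    cut = w[0] * (mu_m[0] + mu_f[0]) / 2 + w[1] * (mu_m[1] + mu_f[1]) / 2
    return w, cut

def classify(w, cut, row):
    score = w[0] * row[0] + w[1] * row[1]
    return "M" if score > cut else "F"

# Synthetic (BiGoW, ZyBr) pairs in mm; parameters are invented for illustration.
rng = random.Random(0)
males = [(rng.gauss(112, 4), rng.gauss(135, 5)) for _ in range(100)]
females = [(rng.gauss(99, 4), rng.gauss(125, 5)) for _ in range(100)]

w, cut = fit_discriminant(males, females)
correct = sum(classify(w, cut, r) == "M" for r in males)
correct += sum(classify(w, cut, r) == "F" for r in females)
accuracy = correct / (len(males) + len(females))
print(f"weights = {w}, sectioning point = {cut:.2f}, accuracy = {accuracy:.3f}")
```

An individual is assigned to the male group when the weighted sum of the two measurements exceeds the sectioning point, which mirrors how a published discriminant function with a sectioning point would be applied to a new case.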
Table 5 provides the results of ROC analysis, depicting the cut-off points along with the sensitivity and specificity of each variable. The highest AUC was shown by BiGoW with a sexing accuracy of 95.5% (M = 100%, F = 91%). Figure 1 shows the ROC plot along with the reference line, where most of the variables occupy the top left corner of the plot, thereby depicting a model with high sensitivity and specificity.
Figure 1. ROC Plot for all Cephalometric Variables.
Table 5. ROC Analysis Depicting AUC, Cut-off Values and Sexing Accuracy of Each Variable.
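The relationship between a cut-off point, its sensitivity and specificity, and the AUC can be illustrated with a short sketch. It treats "male" as the positive class and picks the cut-off that maximises Youden's J (sensitivity + specificity - 1); that criterion is an assumption made here for illustration, since the text does not name the exact rule used. The values are hypothetical, not those in Table 5.

```python
def roc_points(male_values, female_values):
    """(cut-off, sensitivity, specificity) for every observed value used as a cut-off.
    A value at or above the cut-off is classified as male."""
    points = []
    for cut in sorted(set(male_values) | set(female_values)):
        sens = sum(v >= cut for v in male_values) / len(male_values)
        spec = sum(v < cut for v in female_values) / len(female_values)
        points.append((cut, sens, spec))
    return points

def auc(male_values, female_values):
    """AUC as the probability that a random male value exceeds a random female value
    (ties count as half)."""
    wins = 0.0
    for m in male_values:
        for f in female_values:
            wins += 1.0 if m > f else 0.5 if m == f else 0.0
    return wins / (len(male_values) * len(female_values))

def best_cutoff(male_values, female_values):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(male_values, female_values), key=lambda p: p[1] + p[2] - 1)

# Hypothetical bi-gonial widths in mm.
males = [104, 109, 110, 111, 113, 115, 116, 118]
females = [95, 96, 98, 99, 101, 103, 105, 106]

area = auc(males, females)
cut, sens, spec = best_cutoff(males, females)
n_correct = sens * len(males) + spec * len(females)
accuracy = n_correct / (len(males) + len(females))
print(f"AUC = {area}, cut-off = {cut}, sensitivity = {sens}, specificity = {spec}, accuracy = {accuracy}")
```

In this toy example one male falls below the chosen cut-off, so sensitivity is below 1 while specificity is 1; the overall sexing accuracy is the proportion of all individuals on the correct side of the cut-off.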
Table 6 depicts the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy, which ranges from 0 to 1 and indicates whether the data meet the minimum standard for conducting a PCA. A value of 0.6 or more is suggested as adequate for the analysis. The table also shows Bartlett’s Test of Sphericity, which tests whether the correlation matrix of the variables is an identity matrix; here, it indicates that the correlation matrix is not an identity matrix. PCA cannot meaningfully be conducted if the correlation matrix is an identity matrix, because the variables would then be uncorrelated.
Table 6. PCA–KMO and Bartlett’s Test.
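Bartlett's test of sphericity can be written out explicitly. For p variables and n observations, the statistic is -(n - 1 - (2p + 5)/6) ln|R|, where |R| is the determinant of the correlation matrix, with p(p - 1)/2 degrees of freedom. The sketch below evaluates it for a small hypothetical correlation matrix; the matrix and sample size are invented, and the KMO measure is not computed here.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2

# Hypothetical correlation matrix for three variables, 100 observations.
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]

chi2, dof = bartlett_sphericity(corr, 100)
print(f"chi-square = {chi2:.2f}, df = {dof}")
```

For an identity matrix the determinant is 1 and the statistic is 0; the more strongly the variables are correlated, the smaller the determinant and the larger the statistic, which is why a significant result supports proceeding with PCA.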
Table 7 shows the communalities, eigenvalues, percentage of variance and the loading of each variable on the principal components (that is, the correlation between the variable and the principal component) for both sexes. The communalities depict the proportion of each variable’s variance that can be explained by the principal components. BiGoW showed the highest communality for both sexes (M = 0.765, F = 0.738). The first principal component accounted for the most variance in both sexes (M = 33.24%, F = 32.26%), followed by the second component and so on. Three principal components were extracted, and BiGoW, followed by NH and ZyBr, showed the highest correlation with principal component 1 (PC1) for both males and females (Figures 2 and 3).
Figure 2. Scree Plot Depicting the Distribution of Variances (Eigenvalues) of the Principal Components.
Figure 3. Plot for Factors in Rotated Space for PC1 and PC2 Only, as all the Variables Showed a Positive Loading on these Two Components.
Table 7. Communalities, Eigenvalues and Loading of Factors.
The scree plot (Figure 2) depicts the distribution of variances (eigenvalues) of the principal components. PC 1 shows the highest variance. The plot peaks at the first principal component and eventually falls flat, depicting that each successive principal component accounts for gradually smaller variances.
In Figure 3, the plot for factors in rotated space has been obtained for only PC1 and PC2, as all the variables showed a positive loading on these two components, while some variables which show no correlation with sex loaded negatively on the third component.
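The link between loadings, communalities and percentage of variance can be shown with a small sketch. For standardised variables and orthogonal components, a variable's communality is the sum of its squared loadings across the retained components, and a component's share of variance is the sum of squared loadings in its column divided by the number of variables. The loadings below are hypothetical and reduced to three variables for brevity; they are not the values in Table 7.

```python
def communalities(loadings):
    """Sum of squared loadings per variable across the retained components."""
    return {name: sum(v * v for v in row) for name, row in loadings.items()}

def component_variance(loadings):
    """(sum of squared loadings, % of total variance) for each retained component."""
    n_vars = len(loadings)
    n_comp = len(next(iter(loadings.values())))
    result = []
    for k in range(n_comp):
        ss = sum(row[k] ** 2 for row in loadings.values())
        result.append((ss, 100.0 * ss / n_vars))
    return result

# Hypothetical loadings on PC1, PC2 and PC3 (illustration only).
loadings = {
    "BiGoW": [0.80, 0.30, 0.10],
    "NH": [0.70, 0.20, -0.10],
    "ZyBr": [0.60, 0.50, 0.20],
}

h2 = communalities(loadings)
variance = component_variance(loadings)
for name, value in h2.items():
    print(f"{name}: communality = {value:.2f}")
for k, (ss, pct) in enumerate(variance, start=1):
    print(f"PC{k}: sum of squared loadings = {ss:.2f} ({pct:.1f}% of variance)")
```

A negative loading, such as the one on the third component here, still contributes positively to the communality because the loadings are squared.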
Discussion
Population Variability in Sexual Dimorphism
The study of sexual dimorphism from percutaneous measurements can play an important role in forensic facial reconstruction and personal identification. The growth and development of the craniofacial region are affected by a plethora of factors, such as nutrition, environmental stress and masticatory forces influenced by dietary habits. 14 The anatomical and morphological variations are a result of the different ontogenetic trajectories in males and females.15, 16 However, sexual dimorphism is typically found in adult craniofacial measurements, which explains the choice of the 18–40 years age group for the present study. The viscerocranium is responsible for the development of facial features and grows more slowly than the neurocranium. The later-developing areas of the craniofacial region, such as the mandible, maxilla, upper face, cranial base and head height, have a higher likelihood of showing greater classification accuracy. 17
Univariate Discriminant Analysis
This study shows that all cephalometric variables are significantly larger among males and have a sexing accuracy ranging between 62% and 93.5%. NH showed the highest classification accuracy (93.5%), as the nasal bone and piriform aperture are sexually dimorphic. 18 Bhargava and Sharma (1959) found that variation is higher within the nasal region compared to the cranium. 19 The findings of the present study are in accordance with the previous literature, where NH provided a sexing accuracy of 70.8% in males and 68% in females in the Ladakhi population, 64.96% in the European population and 77.2% in males and 68.4% in females within the Dehradun population.20–22 The nose plays a key role in regulating the pressure and temperature of air before it reaches the lungs. A study by Noback et al. (2011) revealed a strong correlation between the shape of the nasal cavity and two climatic variables, namely temperature and humidity. The nasal cavities of populations living in cold-dry climates tend to be relatively narrower and longer compared to those living in hot-humid regions. This adaptation helps to optimise the warming and humidification of air in colder climates, whereas wider nasal passages are advantageous in hot-humid climates for efficient cooling and moisture management. 23 However, this explanation may not be sufficient to account for the sexual dimorphism of NH among the North Indian Haryanvi population. Nasal morphology is a complex trait influenced by multiple factors, including genetics, evolutionary history and adaptation to environmental conditions, which should additionally be considered.
The human mandible is the most durable craniofacial bone and shows functional morphological variation due to higher masticatory forces in males. A sexing accuracy of 91% was obtained in the present study using univariate DFA of BiGoW. This is also in accordance with other studies reporting a sexing accuracy of 92% in Croatian archaeological samples, 86% in the Brazilian population and 70.7% in the Central Indian population.24–26
The sample size, unequal number of male and female participants and secular and temporal variations may lead to different sexing accuracy percentages across different population groups. 27
Multivariate Sex Discriminant Analysis
We achieved exceptionally high sexing accuracy using multivariate direct discriminant analysis: a combination of four variables (MxHL, BiGoW, ZyBr and MxHBr) provided an accuracy of 100%, and a combination of two variables (BiGoW and ZyBr) provided a slightly lower accuracy of 99%. The extra-wide curvature of the zygomatic arch can be associated with the robustness of the male skull and increased development of the masseter muscle, which also affects the development of the mandible.17, 28 Multivariate analysis including bizygomatic width, NH and nasal width has also provided high sexing accuracy in other population groups, that is, 96.7% in Chinese crania, 91.7% in North West Indians, 90.3% in Australian skulls, 89.7% in Japanese skulls and 88.2% and 81.5% in Tibetan and South Indian crania, respectively.9, 29–32
Best Machine Learning Algorithm for Sex Estimation
In the present study, we have used a combination of machine learning techniques, including ROC analysis, PCA and DFA, to determine the best tool for accurate sex determination from craniofacial data. ROC analysis yielded a sexing accuracy of 82%–95.5%, with an AUC of 1.00 for BiGoW. This is in accordance with other studies, in which the AUC was 0.809 with 71.7% sex classification in the Central Chinese population, 0.764 with 79% in the Brazilian population 25 and 0.684 with 79.6% in the Portuguese population.33, 34
This study has also drawn a comparison between a supervised learning algorithm, namely DFA, and an unsupervised technique, namely PCA, in obtaining sexing accuracy from the metric parameters and morphological features. The PCA results focused on identifying differences in the shape and size of cephalometric variables without considering sex, whereas DFA identified the variables that best differentiate between males and females. PC1 accounted for 33.24% of the variance in males and 32.26% in females. These results concur with the findings of a PCA-based study on sex estimation from skeletal collections containing American white crania, where the first principal component explained 38% of the variation in males and 34% in females. 35 This study has thus provided an updated discriminant function for sex estimation and deduced a novel combination of the above-mentioned algorithms to enhance the accuracy of estimation from these metric and morphological features of the North Indian population. The results of this research will also be beneficial in facial reconstruction for forensic anthropology and in reconstructive orthognathic surgery. However, the results may vary slightly with larger sample sizes and an unequal number of males and females. The data collection technique (CT scan, geometric morphometrics or direct measurement) also affects the results when assessing sexual dimorphism within the same population. 36
Footnotes
Acknowledgements
We would like to extend our gratitude to the Department of Forensic Science, SGT University, Gurugram, for their laboratory infrastructure and research support.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Ethical Approval
Informed consent, along with approval from the Institutional Ethics Committee, was obtained (Ref. No. SGTU/FOSC/2023/1481).
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Informed Consent
As mentioned in the manuscript, informed consent in both English and Hindi was obtained from the participants prior to obtaining their measurements. This was in accordance with the approval of the Institutional Ethics Committee.
