Abstract
Several factors have been implicated in schizophrenia (SZ), including human herpes viruses (HHV) and the adaptive immunity Human Leukocyte Antigen (HLA) genes. Here we investigated these issues in 2 complementary ways. In one analysis, we evaluated SZ-HLA and HHV-HLA associations at the level of a single allele by (a) computing a SZ-HLA protection/susceptibility (P/S) score based on the covariance between SZ prevalence and the frequencies of 127 HLA alleles in 14 European countries, (b) estimating in silico HHV-HLA best binding affinities for the 9 HHV strains, and (c) evaluating the dependence of P/S score on HHV-HLA binding affinities. These analyses yielded (a) a set of 127 SZ-HLA P/S scores, varying by >200× (maximum/minimum), which could not be accounted for by chance, (b) a set of 127 alleles × 9 HHV best-estimated affinities, varying by >600×, and (c) a set of correlations between SZ-HLA P/S scores and HHV-HLA binding which indicated a prominent role of HHV1. In a subsequent analysis, we extended these findings to the individual person by taking into account the fact that every individual carries 12 HLA alleles and computed (a) the average SZ-HLA P/S score of 12 randomly chosen alleles (2 per gene), an indicator of HLA-based SZ P/S for an individual, and (b) the average of the corresponding HHV estimated affinities for those alleles, an indicator of overall effectiveness of HHV-HLA binding. We found (a) that HLA protection for SZ was significantly more prominent than susceptibility, and (b) that protective SZ-HLA scores were associated with higher HHV-HLA binding affinities, indicating that HLA binding and subsequent elimination of several HHV strains may confer protection against schizophrenia.
Introduction
Schizophrenia is a complex and highly disabling mental disorder characterized by heterogeneous symptoms including hallucinations, delusions, disorganized speech, blunted affect, avolition, and anhedonia with a lifetime prevalence of ~1%. 1 Although the prevalence of schizophrenia is relatively low, its early onset and low remission together with high disability and psychosocial impairment result in a substantial global burden. 2 Accordingly, decades of research have been aimed at identifying risk and vulnerability factors to reduce the burden associated with this debilitating condition. Those efforts have documented a large genetic component to schizophrenia, with heritability estimated at around 80%. 3 In addition, other factors have also been implicated in schizophrenia, including various infections as well as other environmental factors (eg, obstetric complications, birth season, cannabis use, and adverse childhood experiences).4 -7 Indeed, considerable evidence supports a 2-hit theory that suggests that schizophrenia likely results from genetic vulnerability coupled with environmental hits during key neurodevelopmental periods.6,8 In light of extensive research implicating exposure to infections and genetic liability in schizophrenia risk, here we focus on the influence of genes involved in the human immune response to infections, human leukocyte antigen (HLA) genes, and human herpes viruses on schizophrenia risk.
HLA and schizophrenia
HLA genes, located on chromosome 6, code for cell-surface proteins that are instrumental in immune system surveillance and elimination of foreign (eg, viral, bacterial) antigens via 2 primary pathways that work in concert to provide host protection. Class I HLA molecules (HLA-A, B, C) are expressed on nucleated cells and bind small peptides from proteolytically degraded cytosolic viruses and bacteria; those bound peptides are exported to the cell surface for presentation to CD8+ cytotoxic T cells, signaling cell destruction. Class II HLA molecules (HLA-DPB1, DQB1, DRB1) are expressed on lymphocytes and professional antigen-presenting cells (eg, macrophages, dendritic cells, and monocytes). Class II HLA presents larger peptides derived from endocytosed exogenous antigens to CD4+ T cells to facilitate B cell-mediated antibody production and adaptive immunity. More than 30 000 Class I and Class II HLA alleles have been identified 9 ; that variation maximizes population protection against infection by increasing the scope of antigens that can be bound and presented at the cell surface for destruction. Variation in HLA, particularly within the binding groove, has also been linked to variability in susceptibility to numerous human diseases. 10
HLA is the most highly polymorphic region of the human genome, 11 differing not only across but also within populations.12,13 The large number of polymorphisms coupled with the relatively low prevalence of schizophrenia and methodological variability across studies has resulted in inconsistent and modest associations in HLA candidate gene studies of schizophrenia 14 ; however, early genome-wide association studies (GWAS) of schizophrenia have also documented associations with the HLA region,15 -17 and subsequent evidence has continued to support that association.18 -21 Furthermore, common HLA variants have been shown to influence brain abnormalities associated with schizophrenia including cerebral ventricle size, 22 thalamic volume and asymmetry, 23 and hippocampal volume, 24 and variations in the HLA binding groove have been shown to influence treatment response with antipsychotic medications in patients with schizophrenia. 25 Indeed, HLA-schizophrenia associations have been one of the most commonly reported genetic associations in the literature, 26 though effect sizes are relatively small and findings have varied across studies due to variation between samples and geographical variation in HLA frequency. 27
An epidemiological approach to HLA-schizophrenia associations
HLA confers protection via pathogen elimination; consequently, HLA has evolved in concert with pathogen evolution28,29 and has been shown to vary by population,30,31 presumably reflecting differences in pathogen exposure. HLA-disease associations have been typically investigated using case-control studies. 32 Although this design is solid, the overall results have been rather limited for at least 2 reasons. First, such studies are frequently underpowered due to difficulties in obtaining large numbers of participants, with a notable exception of the study of HLA and malaria in West Africa. 33 Second, the drawback of a small sample size is compounded by the insistence on finding HLA alleles with a very low P-value. Given that diseases are common outcomes of various factors, including pathogen insults, host responses, molecular mimicry, autoimmunity, etc., in various combinations, the objective should be to identify the different alleles involved in a given disease and to determine their degree of involvement, rather than to find “the one” allele with a very low P-value in disease association. Such an allele, when found, would be ranked #1 and would most probably reflect a specific aspect of the disease (eg, autoimmunity), but other alleles could play smaller but still important roles (protective or susceptibility) in disease manifestation, thus effectively modulating the effect of the most prominent allele. Hence, a method that combines signed, weighted contributions of individual alleles for an overall protection/susceptibility assessment would seem more appropriate. Finally, GWAS has been used in recent studies to investigate HLA-disease associations. A limitation of this approach, as noted elsewhere, 34 is that HLA alleles are detected through imputation rather than direct HLA sequencing. While imputation is more time- and cost-effective than direct sequencing, its accuracy for certain alleles (eg, DRB1) has been reported to be as low as 30%. 35 Conversely, due to the highly polymorphic nature of HLA, direct sequencing of large enough samples to identify HLA-disease associations is often prohibitive.
Here we take an alternative immunogenetic epidemiology approach that takes advantage of the population heterogeneity of HLA and permits the identification of a high-resolution population-level HLA profile with regard to disease prevalence. Using this approach, we36 -45 and others30,46 have identified HLA alleles that are negatively associated with disease prevalence (ie, protective alleles) as well as HLA alleles that are positively associated with disease prevalence (ie, susceptibility alleles) in various neurological diseases,36 -41 type 1 diabetes, 42 and 30 types of cancer.43 -45 In light of HLA’s involvement in pathogen elimination and autoimmunity, we presume that protective alleles facilitate pathogen elimination and that susceptibility alleles promote autoimmunity. Here we used this immunogenetic epidemiological approach to evaluate associations between HLA frequency and schizophrenia prevalence and to derive a weighted overall measure along a protection-susceptibility continuum. Our working hypothesis is based on the following 3 assumptions: (1) a number of HLA alleles are involved in SZ, (2) the contribution of an allele to SZ can be negative (protective) or positive (susceptibility), and (3) the ultimate contribution of HLA to SZ in a particular individual lies in the combined contributions of the 12 HLA alleles carried by the individual, namely 2 for each of the 6 classical HLA genes (genes A, B, C of Class I and DPB1, DQB1, DRB1 of Class II); other nonclassical HLA genes (eg, DRB3, DRB4, etc.), not carried by all individuals, may contribute as well. The covariance of SZ prevalence and HLA allele frequency is a suitable quantitative estimate of the protective/susceptibility (P/S) contribution of an allele. 
Covariance possesses the following useful properties: (a) it can be easily calculated from the epidemiological/immunogenetic data (SZ prevalence and allele frequency), (b) it takes negative and positive values, corresponding to a protective or susceptibility contribution, (c) covariance estimates from different alleles can be added to derive an overall SZ P/S contribution of an ensemble of alleles, as in the case of adding the contributions of the 12 alleles carried by an individual, (d) it can be normalized by dividing by the product of the standard deviations of the 2 variables analyzed to yield a correlation coefficient, and, finally, (e) it can be calculated from ranked data to provide a unit-free estimate of the SZ P/S score. Here we use both the raw covariance and its normalized version (ie, correlation coefficient) to investigate (a) the SZ P/S contributions of various alleles and their 12-allele sets (the number of alleles carried by all individuals), and (b) the possible dependence of SZ prevalence on in silico estimated binding affinities of HLA alleles to 9 strains of human herpes virus.
Human herpes viruses (HHV) and schizophrenia
Infections, especially involving prenatal or childhood exposure, have been linked to adult schizophrenia risk purportedly via direct effects of pathogens and/or the effects of an inflammatory response to pathogens or autoimmunity on the developing brain.47 -49 A number of pathogens have been implicated in schizophrenia including several human herpes viruses (HHV). 50 HHV are neurotropic viruses that are nearly ubiquitous and can cause lifelong infection characterized by alternating periods of latency and reactivation.51 -53 They include herpes simplex virus 1 (HSV1/HHV1) and 2 (HSV2/HHV2), varicella zoster virus (HHV3), Epstein-Barr virus (EBV/HHV4), cytomegalovirus (CMV/HHV5), HHV6a, HHV6b, HHV7, and HHV8. Though several studies have implicated HHV in schizophrenia, findings across studies have been highly inconsistent.50,54 Nonetheless, intriguing links between HHV and schizophrenia exist. For instance, increased HHV antibodies have been documented in patients experiencing an initial episode of schizophrenia,55,56 have been linked to higher suicide risk in patients with schizophrenia, 57 and have been associated with structural brain changes 56 and cognitive impairment58,59 among patients with schizophrenia. In addition, studies have documented that anti-herpes treatment and anti-inflammatory medications improve schizophrenia symptoms.55,60 -62 Furthermore, studies suggest that interactions between single nucleotide polymorphisms in the HLA region and exposure to HHV influence schizophrenia risk.19,63,64
Consistent with studies documenting an interaction between the HLA region and HHV on schizophrenia,19,63,64 we have hypothesized that exposure to foreign antigens, such as HHV, in the absence of HLA alleles that are capable of binding with those antigens can lead to antigen persistence, inflammation and/or autoimmunity, and downstream deleterious effects on the brain.65 -68 Here we extend that line of research to schizophrenia, evaluating the relation between the frequency of HLA alleles and the prevalence of schizophrenia in 14 Continental Western European countries and evaluating the correlation between the HLA-schizophrenia profile and HLA-HHV binding affinity.
Methods
Prevalence of schizophrenia
Table 1 shows the population prevalence of schizophrenia in 2016 for each of the following 14 countries in Continental Western Europe (CWE): Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland. Specifically, the total number of people with schizophrenia in each of the 14 CWE countries was identified from the Global Health Data Exchange, 69 a publicly available catalog of data from the Global Burden of Disease study, the most comprehensive worldwide epidemiological study of more than 350 diseases. The number of people with schizophrenia in each country was divided by the total population of each country in 2016 70 and expressed as a percentage. We have previously shown that life expectancy for these countries is virtually identical 36 ; therefore, life expectancy was not included in the current analyses.
Schizophrenia prevalence in the 14 CWE countries.
HLA alleles
We obtained the population frequency of 127 HLA alleles from 14 Continental Western European countries (Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland), as described elsewhere. 39 Those 127 alleles comprised 69 Class I and 58 Class II alleles, all of which were used to analyze the schizophrenia-HLA immunogenetic scores (see below). The individual alleles and their mean frequencies are given in Tables 2 and 3 for HLA Classes I and II, respectively.
The 69 HLA Class I alleles used and their mean frequencies.
The 58 HLA Class II alleles used and their mean frequencies.
SZ-HLA protection/susceptibility scores
The PSCorr scores are the Fisher z-transformed correlations between schizophrenia prevalence and HLA allele frequency (both log-transformed). 36
The PSCov scores are simply the covariance between the log-transformed SZ prevalence and HLA allele frequency. The sign of both scores indicates protection (negative) or susceptibility (positive) to schizophrenia (P/S scores), whereas their absolute value indicates the strength of their P/S effect.
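The two scores can be illustrated with a minimal sketch. It assumes natural logarithms and the sample (n − 1) covariance, neither of which is stated in the text, and it uses made-up prevalence and frequency values for 5 hypothetical countries; the function name `ps_scores` is illustrative only.

```python
import math

def ps_scores(prevalence, frequency):
    """Return (PSCov, PSCorr) for one allele across countries.

    Assumptions: natural logs, sample (n - 1) covariance, and strictly
    positive inputs (a zero frequency has no logarithm).
    """
    x = [math.log(f) for f in frequency]
    y = [math.log(p) for p in prevalence]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    ps_cov = sxy / (n - 1)            # raw covariance of the log data
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation of the log data
    ps_corr = math.atanh(r)           # Fisher z-transformation
    return ps_cov, ps_corr

# Made-up data: prevalence (%) falls as allele frequency rises.
toy_prevalence = [0.30, 0.28, 0.33, 0.25, 0.31]
toy_frequency = [0.010, 0.020, 0.005, 0.040, 0.008]
toy_cov, toy_corr = ps_scores(toy_prevalence, toy_frequency)
```

With these toy values both scores are negative, that is, protective. The Fisher transformation also links the correlations and PSCorr scores reported later: atanh(0.743) is approximately 0.957.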
HHV proteins
The amino acid sequences of surface glycoproteins from HHV 1 to 8 (1, 2, 3, 4, 5, 6A, 6B, 7, 8) were retrieved from the UniProtKB database 71 and are given in the Appendix. Table 4 gives the details of these proteins and associated information regarding the number of n-mers used in the analyses of Class I and Class II alleles.
Attributes of the 9 HHV viruses used.
A sliding window approach67,68 (Figure 1) was used to partition the whole sequence of each glycoprotein into n-mers for HHV-HLA analyses. Since the exact n-mer length for binding to HLA Class I and Class II molecules is not known (only its range), we used 4 n-mer lengths (2 per HLA Class) to cover their approximate optimal range, namely 9- and 10-mer for Class I and 15- and 22-mer for Class II. 72 For each n-mer length, a set of subsequences was generated (number of subsequences = length of glycoprotein − n). For all n-mers (n = 9 and 10 for Class I, and n = 15 and 22 for Class II), subsequences were collected and queried in the IEDB database (www.iedb.org) in order to identify potential epitope peptides that are recognized by and bind to HLA Class I and II surface receptor proteins. IEDB queries were performed for each of the sliding-window aggregated sequences against the HLA alleles available (see below). Binding affinity predictions were obtained using the NetMHCIIpan method. 73 For each n-mer, a binding affinity score was predicted and reported as a percentile rank by comparing the peptide’s score against the scores of 5 million random n-mers selected from the SwissProt database 71 ; smaller percentile ranks indicate higher binding affinity (“good binders”). For each HHV strain and HLA allele, the lowest percentile rank (LPR) was used as the highest estimated binding affinity measure for quantitative analyses. The distribution of LPR was skewed to the left (Figure 2A) in an exponential fashion and was log-transformed (equation (2)) to normalize it (Figure 2B); the corresponding probability-probability plots are shown in Figure 3.
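The windowing and LPR steps can be sketched as follows. The sequence and percentile ranks are made up, the function names are hypothetical, and no call to the IEDB tools is made or implied. The sketch keeps every window that fits in the sequence, which gives L − n + 1 subsequences for a sequence of length L; this differs by one from the count stated in the text, so the boundary convention here should be read as an assumption.

```python
import math

def sliding_windows(sequence, n):
    """Return all consecutive n-mers of a protein sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]

def log_lowest_percentile_rank(percentile_ranks):
    """Return ln(LPR), the log of the lowest percentile rank over all n-mers."""
    return math.log(min(percentile_ranks))

# Made-up 20-residue sequence, not a real HHV glycoprotein.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
toy_nmers = sliding_windows(toy_sequence, 9)

# Made-up percentile ranks, standing in for predictor output
# for one allele and one virus.
toy_ranks = [12.0, 3.5, 0.4, 27.0]
toy_ln_lpr = log_lowest_percentile_rank(toy_ranks)
```

A lower ln(LPR) corresponds to a stronger predicted binder, so negative values arise whenever the best percentile rank is below 1.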

Schematic diagram to illustrate the sliding window approach for estimating exhaustively in silico the binding affinity of all consecutive 9- and 15-mer peptides. The figure refers to HHV1 for illustration but the glycoproteins of all HHV strains (Table 4) were tested, as described in the text.

Frequency distributions of

Probability-probability plots of
Of a total of 127 alleles available (Tables 2 and 3), 113 were used for the analyses involving HHVs because 14 HLA Class I alleles could not be modeled by the NetMHCIIpan method (A*29:01, A*33:03, A*36:01, B*14:01, B*15:17, B*15:18, B*35:02, B*35:08, B*39:06, B*41:01, B*41:02, B*44:05, B*45:01, B*47:01).
Data analysis
Schizophrenia-HLA PSCorr scores
Standard parametric (mean, standard deviation, etc.) and nonparametric statistics (median, interquartile range, etc.) were used to evaluate the distribution of scores. We used Tukey’s 74 fences to identify outlier and extreme values of the PSCorr distribution, as follows.
where Q1, Q3 are the 25th and 75th percentiles, respectively. These fences demarcated outlier values (between inner and outer fences) and extreme values (outside the outer fences). A major objective of this study was to determine whether there was a statistically significant preponderance of protective or susceptibility schizophrenia PSCorr scores overall and within (and among) HLA classes and genes. For that purpose, we used a one-sample t-test with test value = 0 and estimated the mean preponderance, its statistical significance, and the effect size.75,76
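As a sketch of the fence-based classification, the snippet below assumes the conventional Tukey multipliers (1.5 × IQR for inner fences, 3 × IQR for outer fences) and the "inclusive" quartile rule of Python's standard library; the statistical package used in the study may compute quartiles slightly differently, and the function names are illustrative.

```python
import statistics

def tukey_fences(values):
    """Return Tukey's inner and outer fences for a list of scores."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(value, fences):
    """Label a value as 'extreme', 'outlier', or 'regular'."""
    if value < fences["lower_outer"] or value > fences["upper_outer"]:
        return "extreme"
    if value < fences["lower_inner"] or value > fences["upper_inner"]:
        return "outlier"
    return "regular"

# Toy data with Q1 = 3 and Q3 = 7, so IQR = 4.
toy_fences = tukey_fences([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For the toy data the inner fences are −3 and 13 and the outer fences are −9 and 19, so a value of 14 is an outlier and a value of 20 is extreme.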
Random permutations test for assessing the statistical significance of the schizophrenia-HLA profile
The SZ-HLA profile is a vector of length N = 127 containing the estimated PSCorr scores. We tested the null hypothesis that the assignment of PSCorr to specific alleles may be due to chance by performing the permutation test below. Let
Application to individuals
Since every individual carries 12 classical HLA alleles (2 for each of the 3 HLA Class I genes and 2 for each of the 3 Class II genes), an average PSCorr score can be calculated:
Finally, we transformed
We obtained expected estimates of ξ using a bootstrap procedure, 77 as follows. For each gene, 2
Application to populations
In this analysis, we extended the computations above to populations by using the average allele frequencies (Tables 2 and 3) as weighting factors in equation (8):
We also carried out a bootstrap analysis of Ξ (N = 1 000 000) to obtain 1 million bootstrap Ξ* values for evaluating the representation in the population of allele frequency-weighted combinations of randomly selected 12 SZ PSCorr scores.
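A frequency-weighted average of this kind might be computed as below. The normalization by the sum of the weights is an assumption, since the exact form of the weighted equation is not reproduced in the text, and the function name and values are hypothetical.

```python
def weighted_ps_average(scores, frequencies):
    """Frequency-weighted mean of P/S scores (assumed normalized form)."""
    total_weight = sum(frequencies)
    weighted_sum = sum(s * f for s, f in zip(scores, frequencies))
    return weighted_sum / total_weight

# Made-up example: a common protective allele and a rare susceptibility allele.
toy_weighted = weighted_ps_average([-0.4, 0.2], [0.3, 0.1])
```

Here the common protective allele dominates, giving a weighted average of about −0.25, whereas the unweighted mean of the two scores would be −0.1.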
Association of schizophrenia PSCorr scores with HHV affinities
The second objective of this study was to assess the relation between schizophrenia PSCorr scores and the in silico estimated binding affinity of the corresponding alleles with the 9 HHVs (Table 4). For this purpose, we performed 2 analyses. At the single allele level, we carried out a stepwise linear regression, where the schizophrenia PSCorr score was the dependent variable and the HLA-HHV binding affinities, ln(LPR) (equation (2)), of the 9 HHV strains were the independent variables. At the level of the individual, we took into account the fact that an individual carries 12 HLA alleles, each of which has an estimated binding affinity to the 9 HHV strains. We first computed the average of those affinities for each allele and then computed the grand average binding affinity for the whole set of the randomly chosen alleles in the bootstrap procedure above. This provided a set of 1 million pairs consisting of (a) the average PSCorr (ξ*) of the 12 randomly chosen alleles, and (b) the grand average (ζ*) of the estimated HHV affinities for that set of alleles. The correlation between ξ* and ζ* is an estimate of the aggregate association of SZ PSCorr and HHV at the level of the individual.
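The individual-level bootstrap can be sketched as follows. The sketch assumes that the 2 alleles per gene are drawn uniformly and with replacement (so a simulated individual may be homozygous); the sampling details are not fully specified in the text. All allele values are made up, and the function names are illustrative.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_individuals(alleles_by_gene, n_boot, seed=0):
    """Simulate individuals carrying 2 alleles per gene.

    alleles_by_gene maps a gene name to a list of
    (ps_corr, mean_ln_lpr) pairs, one pair per allele.
    Returns the lists of bootstrap averages (xi*, zeta*).
    """
    rng = random.Random(seed)
    xi_star, zeta_star = [], []
    for _ in range(n_boot):
        drawn = []
        for alleles in alleles_by_gene.values():
            # Assumption: uniform draw with replacement, 2 alleles per gene.
            drawn.extend(rng.choices(alleles, k=2))
        xi_star.append(sum(score for score, _ in drawn) / len(drawn))
        zeta_star.append(sum(aff for _, aff in drawn) / len(drawn))
    return xi_star, zeta_star

genes = ["A", "B", "C", "DPB1", "DQB1", "DRB1"]
# Made-up alleles: lower (more protective) scores pair with lower ln(LPR).
toy_alleles = {g: [(-0.6, -2.0), (-0.1, -1.0), (0.4, 0.5)] for g in genes}
toy_xi, toy_zeta = bootstrap_individuals(toy_alleles, n_boot=2000)
toy_r = pearson_r(toy_xi, toy_zeta)
```

Because the toy alleles were built so that protective scores go with lower ln(LPR), the resulting correlation is strongly positive; with real data the sign and size of the correlation are what the analysis estimates.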
Schizophrenia-HLA PSCov scores
The same analyses as above were performed on PSCov scores.
Implementation of analysis procedures
The IBM-SPSS statistical package (version 27) was used for implementing standard statistical analyses, including nonparametric exploratory data analysis, 74 regression analysis, and testing of proportions. All P values reported are 2-sided. The permutation test and bootstrap procedure were implemented using FORTRAN (Geany, version 1.38, built on or after 2021-10-09) and a 64-bit Mersenne Twister random number generator with a large random double-precision odd seed.
Results
The SZ-HLA immunogenetic scores PSCorr and PSCov are given in Tables 5 and 6. The 2 scores were highly significantly and positively correlated (Figure 4; r = .937, P = 3.94 × 10⁻⁵⁹, N = 127).
Schizophrenia Immunogenetic scores for HLA Class I alleles.
Schizophrenia immunogenetic scores for HLA Class II alleles.

SZ-HLA PSCorr scores are plotted against SZ-PSCov scores. See text for details.
Permutation tests
In the permutation test, no cases were found where a random SZ prevalence – HLA allele frequency pairing (out of 1 000 000 runs) matched the observed SZ-HLA PSCorr or PSCov profiles (Tables 5 and 6), thus rejecting the null hypothesis that the observed profile could be accounted for by chance (P < 1 × 10⁻⁶) for both PSCorr and PSCov scores. In addition, in the ranks version of the random permutation test, which relaxed the requirement of an exact PSCorr match, we found that in only 3/1000 runs the ranks of the permuted PSCorr scores matched the ranked observed SZ-HLA PSCorr profile, thus again rejecting the null hypothesis that the observed ranks of the PSCorr scores were due to chance (P = .003); no case matching the PSCov scores profile was found, thus rejecting the null hypothesis that the observed ranks of the PSCov scores were due to chance (P < .001).
Frequency distributions
The ranked SZ-HLA PSCorr scores are plotted in Figure 5, and their frequency distribution and box plot are shown in Figure 6A and B, respectively; detailed statistics are given in Table 7. It can be seen that there were 3 outliers (in red), with scores lower than the lower inner fence (equation (3), −0.081). These were HLA alleles B*27:02, B*35:08, and DRB1*13:05, illustrated in Figure 7, with SZ-HLA PSCorr scores of −0.956, −0.870, and −0.820, respectively. The absolute values of the corresponding correlation coefficients were all >.5 (|r| = .743 for B*27:02, .703 for B*35:08, and .675 for DRB1*13:05), indicating large effect sizes. 76

Ranked SZ-HLA PSCorr scores are plotted against their ranks. Notice that protective scores (blue) are more numerous and stronger than susceptibility scores (red). See text for details.

(A) frequency distribution of the 127 SZ-HLA PSCorr scores. (B) boxplot of the data in Figure 5. Q1, 25th percentile; Q3, 75th percentile; IQR, interquartile range.
Descriptive statistics of the distribution of SZ-HLA PSCorr scores. N = 127 HLA alleles.

Prevalence of schizophrenia is plotted against frequency of B*27:02 (r = −.743, PSCorr = −.956), B*35:01 (r = −.703, PSCorr = −.872), and DRB1*13:05 (r = −.675, PSCorr = −.820), as indicated. The curves are power fits, which are linear in the log-log transformed data; the correlations above are derived from those log-log data.
Very similar results were obtained for PSCov, with the following 2 differences: (a) there was an additional protective outlier allele (B*15:18), and (b) there were 4 outlier susceptibility alleles (C*16:01, DRB1*11:04, DRB1*11:01, and DQB1*02:02, listed from highest to lowest score). These results indicate a higher sensitivity of the PSCov measure.
Tests of score proportions
Since the sign of the PSCorr score is determined by the sign of PSCov score, the following results apply to both scores. (a) There was a statistically significant higher proportion of protective scores (77/127 = 60.6%; P = .017, Wilson test of single proportion). (b) The proportions of protective scores in either Class I or Class II, and in any of the 6 genes did not differ from those of the susceptibility scores.
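The reported proportion test can be checked with a short sketch. It assumes that the Wilson test corresponds to the large-sample score z-test against a null proportion of 0.5 without continuity correction; under that assumption the reported 2-sided P value is reproduced. The function name is illustrative.

```python
import math

def score_test_single_proportion(successes, n, p0=0.5):
    """Score (z) test of one proportion against p0; returns (z, 2-sided P)."""
    p_hat = successes / n
    # The standard error uses the null proportion, as in a score test.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# 77 protective scores out of 127 alleles, as reported in the text.
z_prop, p_prop = score_test_single_proportion(77, 127)
```

For 77 of 127 this gives z of about 2.40 and a 2-sided P of about .017.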
Tests of score magnitude
With respect to PSCorr scores, a statistically significant higher magnitude of protective (vs susceptibility) scores was observed (a) overall, (b) for HLA Class I, and (c) for the HLA B gene (Figure 8); no significant effects were found for Class II or any other gene. Detailed statistics for the significant effects are given in Tables 8 and 9. Similarly, the magnitude of protective PSCov scores was significantly higher overall, for HLA Class I, and for the HLA B gene.

Mean (±95% CI) protective preponderance of SZ-HLA scores (negative/protective – positive/susceptibility SZ-HLA score) for the groups indicated. See text for details.
Statistics for statistically significant preponderance of protective (negative) Schizophrenia-HLA PSCorr scores (one sample t-test, test value = 0).
Effect sizes for the results in Table 8.
Application to individuals
The following results were obtained from the bootstrap analysis of the PSCorr scores (descriptive statistics in Table 10); very similar results were obtained for the PSCov scores (not illustrated). Since the SZ-HLA PSCorr scores are population-level correlations between schizophrenia prevalence and HLA allele frequencies, it is reasonable to interpret them as reflecting the degree of protection/susceptibility conferred in general on an individual of the population by their specific HLA genetic makeup. For a particular individual, the average of the PSCorr scores of the 12 HLA alleles that the individual carries (equation (8)) can serve as an overall estimate, ξ, of SZ protection/susceptibility along a continuum ranging from −1 to +1 (equation (9)). Analysis of the distribution of the bootstrapped 1 million ξ* values revealed the following. (a) The distribution was unimodal, approximating a normal distribution but skewed to the left, that is, toward negative (protection) values (Figure 9). (b) Indeed, negative (protection) values (71.2%) significantly outnumbered positive (susceptibility) values (28.8%) (P < .001, Wilson score test for a single proportion). In addition, the mean of the distribution was negative, indicating a bias toward SZ protection (P < .001, one-sample t-test against the null hypothesis that the mean = 0). (c) There were 4404 (0.440%) negative (protective) ξ* outliers and 3 extreme ones (>Tukey’s outer fence, equation (4); Figures 10 and 11A). (d) There were 3043 (0.304%) positive (susceptibility) outliers (Figures 10 and 11B). These proportions differed highly significantly (Z = 15.8, P < .00001). (e) Remarkably, the percent (0.3%) of positive (susceptibility) outliers of ξ* above was very close to, and statistically not significantly different from, the prevalences of schizophrenia used in this study (Table 1; P = .318, independent samples t-test).
Descriptive statistics of the distribution of ξ*. N = 1 000 000.

A, frequency distribution of 1 million bootstrap ξ* values.


Frequency distribution of protective (A) and susceptibility (B) ξ* values.
Application to populations
In this analysis, we computed the average Ξ of SZ-HLA PSCorr scores weighted by the corresponding allele frequencies (equation (10)) to obtain the population outcome of a particular combination of HLA alleles. The results obtained are shown in Table 11; very similar results were obtained for the SZ PSCov score (data not illustrated). More specifically: (a) The distribution of Ξ* was unimodal, approximating a normal distribution, with a skew to the left (ie, toward protection values). (b) Overall, negative (protection) values (69.3%) significantly outnumbered positive (susceptibility) values (30.7%) (P < .001, Wilson score test for a single proportion). In addition, the mean of the distribution was negative, indicating a bias toward SZ protection (P < .001, one-sample t-test against the null hypothesis that the mean = 0). (c) There were 5562 (0.556%) negative (protective) Ξ* outliers and 11 extreme ones (0.0011%). (d) There were 3134 (0.313%) positive (susceptibility) outliers. These proportions differed highly significantly (Z = 26.09, P < .00001). (e) Similarly to the results for ξ* above, the percent of susceptibility outliers was very close to, and statistically not significantly different from, the prevalences of schizophrenia used in this study (Table 1; P = .434, independent samples t-test).
Descriptive statistics of the distribution of Ξ*. N = 1 000 000.
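The Z values reported for the comparison of protective and susceptibility outlier counts, for both ξ* and Ξ*, are consistent with a pooled two-proportion z-test. The sketch below assumes that test and treats the 2 counts as independent proportions of 1 000 000 bootstrap values each; the function name is illustrative.

```python
import math

def two_proportion_z(count_a, count_b, n_a, n_b):
    """Pooled two-proportion z statistic for count_a/n_a vs count_b/n_b."""
    p_a, p_b = count_a / n_a, count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

n_boot = 1_000_000
# Protective vs susceptibility outlier counts reported in the text.
z_individuals = two_proportion_z(4404, 3043, n_boot, n_boot)   # xi*
z_populations = two_proportion_z(5562, 3134, n_boot, n_boot)   # Xi*
```

Under these assumptions the statistic is about 15.8 for ξ* and about 26.09 for Ξ*, matching the values in the text.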
Schizophrenia PSCorr scores and HLA-HHV binding
In this analysis, we first estimated the highest binding affinity of each HLA allele to the 9 HHV strains (Table 4). We found that HLA alleles of Class I had significantly better binding affinities than those of Class II (Figure 12, P < .001, F-test in ANOVA). The average binding affinities did not differ significantly among genes of the same HLA Class (Figure 13; P = .884 for Class I genes, and P = .492 for Class II genes, ANOVA). Next, we evaluated the possible dependence of SZ-HLA P/S scores (PSCorr and PSCov) on HHV-specific HLA-binding affinities by performing a stepwise multiple linear regression, where the SZ-HLA immunogenetic score of individual alleles was the dependent variable and the 9 corresponding HHV-specific

Average (±95% CI) HHV1-HLA affinities [ln(LPR)] for HLA Class I and II.

Average (±95% CI) HHV1-HLA affinities [ln(LPR)] for genes of HLA Class I and II.

SZ-HLA PSCorr scores are plotted against
Combined SZ-HLA PSCorr scores versus combined HLA-HHV affinities
In the previous section, we analyzed the affinities of single HLA alleles to the various HHV strains. Since an individual carries 12 HLA alleles, it is reasonable to suppose that the ultimate influence of HHV on SZ susceptibility of an individual would be reflected in the correlation between the average PSCorr ξ (equation (8)) and the average ζ of the estimated binding affinities to the 9 HHV strains (equation (11)). For that purpose, we ran a bootstrap procedure (see Methods) and obtained 1 000 000 bootstrap values of ξ* and ζ*. We found that these measures were positively and highly significantly correlated (r = .644, Z = 764.98, P < .001). The data from a portion of that set (N = 1000) are plotted in Figure 15. Very similar results were obtained for PSCov (data not illustrated). These results indicate a strong dependence of overall individual SZ risk on combined HHV affinities, such that an HLA makeup with overall higher HHV binding affinity would confer protection from (ie, lower the risk of) schizophrenia.

One-thousand ξ* values are plotted against the corresponding averages of 9 HHV ln(LPR) values. r = .638, P < .001. See text for details.
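The Z statistic reported for the correlation between ξ* and ζ* is consistent with the usual Fisher z-test of a correlation coefficient, as the following sketch shows. The choice of this test is an assumption inferred from the reported numbers; the function name is illustrative.

```python
import math

def correlation_z(r, n):
    """Fisher z statistic for testing a correlation r from n pairs against 0."""
    return math.atanh(r) * math.sqrt(n - 3)

# r and N as reported for the full bootstrap set.
z_full = correlation_z(0.644, 1_000_000)
```

With r = .644 and N = 1 000 000 the statistic is about 764.98, in agreement with the reported value; such a large Z mainly reflects the very large number of bootstrap values rather than the strength of the association, which is conveyed by r itself.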
Discussion
Methodological considerations
The SZ-HLA PSCorr and PSCov scores were used here as continuously-varying measures of schizophrenia-HLA allele covariation at the population level and not as genetic causative factors testing a specific null hypothesis; hence no statistical tests of significance or P-values were computed for individual PSCorr scores since there was no specific null hypothesis to be tested. Instead, we tested the null hypothesis that the set (vector) of the 127 SZ-HLA PSCorr scores was due to chance by performing an extensive random permutation test. The results of this test rejected the null hypothesis above (P < 1 × 10⁻⁶) as well as its more liberal version that the ranked scores were due to chance (P = .003). In accordance with these results, the set of the observed SZ-HLA PSCorr scores was employed to evaluate its possible associations with the in silico estimated binding affinities to HHV proteins, which were used in the same way, namely as continuously-varying quantitative assessments of HLA-HHV associations.
Practically identical results were obtained in all analyses when SZ PSCov scores were used instead of PSCorr scores, which is unsurprising since these measures were strongly and highly significantly correlated (Figure 4). In a way, the PSCov score would be somewhat more appropriate because it measures pure covariation without standardization, which, although useful for statistical significance testing, alters the covariance estimate itself. It should be mentioned that covariance has been usefully employed in various fields, including evolution and natural selection78,79 and, profitably, finance. 80
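The distinction between covariance and its standardized version can be illustrated with a minimal sketch. The numbers below are hypothetical values for 5 countries, invented purely for illustration and not taken from the study; the functions are plain textbook sample covariance and Pearson correlation, which may differ in detail from the PSCov and PSCorr computations defined in the Methods.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data for 5 countries (illustration only, not study data).
prevalence = [0.30, 0.28, 0.33, 0.31, 0.27]    # disease prevalence, %
allele_freq = [0.05, 0.07, 0.03, 0.04, 0.08]   # allele frequency

cov = covariance(prevalence, allele_freq)
r = correlation(prevalence, allele_freq)

# Changing the units of one variable rescales the covariance
# but leaves the correlation unchanged.
allele_pct = [100 * f for f in allele_freq]
cov_scaled = covariance(prevalence, allele_pct)
r_scaled = correlation(prevalence, allele_pct)

print(f"cov = {cov:.6f}, r = {r:.3f}")
print(f"cov (rescaled) = {cov_scaled:.6f}, r (rescaled) = {r_scaled:.3f}")
```

In this toy example the sign is negative, which in the convention used here would correspond to a protective association. The covariance retains the original units and magnitudes, whereas the correlation discards them through standardization.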
General
A wealth of research has documented genetic and environmental influences on schizophrenia, including exposure to infections. Here we employed an epidemiological-immunogenetic approach30,36 -46 to explore the relations between schizophrenia and HLA at the population level by computing the covariance (and its standardized version of correlation) between schizophrenia prevalence and HLA allele frequencies in 14 CWE countries. The choice of these countries was based mainly on the availability of HLA allele frequencies and on their practically identical life expectancies. The computed SZ-HLA PSCorr (and PSCov) scores were used to (a) estimate the degree of protection (negative score) or susceptibility (positive score) conferred by specific HLA alleles on the individuals of the population, (b) evaluate in silico possible associations between those scores and the binding affinities of the same HLA alleles to proteins of 9 HHV strains, and (c) investigate the possible dependence of the SZ P/S scores on HHV-HLA binding affinities. Our findings documented a preponderance of protective effects of HLA on schizophrenia prevalence and indicated that the SZ-HLA profile was highly associated with the estimated HLA binding affinity for HHV1, such that higher HLA-HHV1 binding affinity was associated with protective SZ-HLA scores. We then extended those analyses to aggregates of SZ-HLA scores and corresponding HHV-HLA binding affinities, since every individual carries 12 HLA alleles. Indeed, we found that at this aggregate level, SZ protection conferred by HLA was highly associated with better (more effective) binding affinities to HHV. Taken together, these findings suggest that HLA binding and subsequent elimination of HHV1 confer protection against schizophrenia in individuals and populations.
HLA and schizophrenia prevalence
With regard to the association of HLA and schizophrenia, we found more negative than positive associations between HLA allele frequency and schizophrenia prevalence, reflecting an overall protective effect. The relative protection conferred by HLA may partially account for the low prevalence of schizophrenia. Significant protective effects were documented here for Class I HLA, and the B gene in particular. Notably, the HLA-B gene has been previously linked to the intergenerational transmission of schizophrenia risk. 81 Here, 63.9% of the HLA-B alleles were protective against schizophrenia and, of the 127 alleles investigated, the strongest protective effect was found for HLA-B*27:02 (r′ = −0.956), an allele that has been strongly linked to risk for ankylosing spondylitis in one of the strongest HLA-disease associations known. 82 Interestingly, this negative SZ-HLA association corresponds to the similarly negative associations between ankylosing spondylitis and schizophrenia documented in epidemiological studies. 83 Although there was a preponderance of protective (ie, negative) HLA-SZ associations, 39% of HLA-SZ associations were positive, suggesting susceptibility toward SZ. HLA is most commonly associated with immune-mediated/autoimmune conditions. 33 In light of substantial evidence indicating robust associations between the risk of schizophrenia and several autoimmune disorders,84 -89 the present findings suggest HLA may mediate that association. Furthermore, it has been suggested that some schizophrenia endophenotypes are driven by an infectious/inflammatory and/or autoimmune component 27,62,90 -93; the present findings suggest that HLA may influence immune-mediated schizophrenia endophenotypes.
Implications for individuals
A major outcome of this study was the derivation of a unique, continuously-varying measure, ξ, assessing the protection/susceptibility of an individual to schizophrenia, based on their specific genetic HLA makeup. The allele-based SZ-HLA PSCorr scores provided the building blocks of ξ, which is a standardized score of the average of the 12 PSCorr scores of the HLA alleles carried by an individual (equations (8) and (9)). A reasonable interpretation of ξ would be as a measure of schizotypy,94 -97 such that high values of ξ would correspond with increased liability for schizophrenia, as supported by the finding that the percentage of very high (outlier) positive (susceptibility) values of ξ* in our simulation was 0.3%, a percentage very close to, and not statistically significantly different from, the prevalences of schizophrenia used in this study (Table 1). The same considerations apply to the negative (protective) ξ values, which could be interpreted as schizophrenia-resilience 96 scores. Since ξ is a normalized score with a range [−1, +1], thresholds on ξ can be used for detecting individuals resilient or susceptible to schizophrenia. Specifically, a value of ξ < −0.25 (the value of the negative inner fence, equation (3)) would indicate schizophrenia resilience, whereas a value of ξ > 0.16 (the value of the positive inner fence, equation (5)) would indicate schizophrenia susceptibility, with continuous gradation in both domains.
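The thresholding rule just described can be written as a minimal sketch. The two fence values (−0.25 and 0.16) and the [−1, +1] range come from the text; the function name, the constant names, and the label "intermediate" for values between the fences are assumptions introduced here for illustration and are not part of the original analysis.

```python
# Inner-fence values quoted in the text (equations (3) and (5)).
RESILIENCE_FENCE = -0.25
SUSCEPTIBILITY_FENCE = 0.16


def classify_xi(xi: float) -> str:
    """Label a normalized xi score using the two inner fences.

    The 'intermediate' label for scores between the fences is a
    hypothetical name; the text only defines the two outer categories.
    """
    if not -1.0 <= xi <= 1.0:
        raise ValueError("xi is expected to lie in [-1, +1]")
    if xi < RESILIENCE_FENCE:
        return "resilient"
    if xi > SUSCEPTIBILITY_FENCE:
        return "susceptible"
    return "intermediate"


for score in (-0.40, 0.00, 0.30):
    print(score, classify_xi(score))
```

Because the text emphasizes continuous gradation in both domains, such labels are best read as a coarse summary of ξ rather than as a replacement for the score itself.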
SZ-HLA PSCorr scores and in silico HHV binding affinity
Several prior studies have documented the influence of HHV1 on schizophrenia. For instance, HHV1 antibodies are elevated in patients with schizophrenia compared to healthy controls56,98 (cf, ref 99 ), and HHV1 exposure is associated with brain morphological anomalies56,61,100 and cognitive impairments58,61,64,100 -102 in patients with schizophrenia. Here, we did not evaluate the association between HHV and schizophrenia but rather the influence of HLA on both by examining the dependence of SZ-HLA immunogenetic scores on HHV-specific HLA-binding affinities. Those analyses revealed a significant association between the HLA-schizophrenia profile and binding affinity only for HHV1, bolstering evidence regarding the influence of HHV1 on schizophrenia. As reviewed elsewhere, 50 associations between other HHV including CMV, HHV2, and EBV with schizophrenia have been mixed. Here, at the allele level, SZ P/S scores were found to be associated with HHV1-HLA affinity. However, at the allele aggregate level, where groups of 12 alleles were considered (as carried by every individual), the association between SZ-HLA P/S scores and HHV-HLA binding affinities was much stronger, indicating the involvement of several HHV strains acting in concert. In fact, this finding is in accord with the central hypothesis of this study, namely that SZ risk/protection is the outcome of a combined effect of groups of HLA alleles and not the exclusive effect of a specific allele.
Considering the above findings as a whole, the present study suggests that the association between HHV and schizophrenia may be partially attributable to HLA composition. The formation of an antigen-HLA molecule complex is a vital step in antigen elimination and host protection against foreign antigens. Given the extremely polymorphic nature of HLA, there is large variability in the binding affinity to a given antigen across HLA alleles. When binding affinity between certain HLA molecules and antigen epitopes is poor, the elimination of those antigens is blocked, resulting in antigen persistence, which may lead to inflammation, autoimmunity, and disease. 65 Indeed, we have previously discussed antigen persistence due to HLA incongruence in relation to several conditions.65 -68,103 -106 We documented here that the HLA-schizophrenia profile, which is primarily characterized by protective effects, is significantly associated with HHV binding, suggesting that HLA binding and elimination of HHV protect against schizophrenia. The converse implication, which follows from the perspective of the persistent antigen theory, is that in the absence of HLA alleles capable of binding HHV antigens with sufficient affinity and immunogenicity, such antigens may persist, 107 ultimately contributing to damaging effects including those associated with schizophrenia. That is, the effect of HHV1 exposure may be partially mitigated by HLA with high binding affinity. Furthermore, the combined influence of HLA composition and HHV1 exposure may partly account for the low prevalence of schizophrenia despite high HHV1 seroprevalence in Europe. 108
The current findings and approach may also have implications that extend beyond schizophrenia given the high degree of genetic overlap that exists between risk for schizophrenia and other psychiatric disorders.109 -111 For instance, previous studies have documented substantial genetic overlap between schizophrenia and bipolar disorder.109 -111 Furthermore, like schizophrenia, both infections (eg, HHV 112 ) and HLA 113 have been implicated in bipolar disorder risk, and, like schizophrenia, anti-inflammatory agents have shown promising treatment effects for bipolar disorder. 114 The extent to which the immunogenetic profile for bipolar disorder and other psychiatric conditions overlaps with that of schizophrenia remains to be investigated and is an interesting avenue for future studies.
The “crowdfunding” nature of HLA contribution to schizophrenia (and other diseases)
In evaluating HLA contributions to schizophrenia, it is important to consider the following: (1) the HLA region is the most polymorphic of the entire human genome, (2) every individual possesses 12 HLA alleles, and (3) each HLA allele may differ in terms of the direction of association with schizophrenia (and other diseases) and the magnitude of those associations. Consequently, rather than focusing on the individual “unique” contribution of specific alleles to schizophrenia susceptibility and protection, the primary emphasis here is on relatively small contributions of individual alleles that, when combined as a set, contribute to overall protection from/susceptibility to schizophrenia in a population and for individuals. This “crowdfunding” nature of HLA contributions to schizophrenia moves away from single gene-disease associations toward considering polygenic contributions to disease.
Conclusions
Overall, the findings of the present study imply that schizophrenia prevalence is partially attributable to the combined effects of HLA composition and HHV1 exposure in Continental Western Europe. A significant strength of this study relative to GWAS studies that rely on imputation is that the approach used here permits evaluation of the association of high-resolution Class I and Class II HLA allele frequencies with schizophrenia prevalence and with HHV binding affinity estimated in silico. High-resolution HLA genotyping allows for the evaluation of HLA-disease associations at the protein level, where small variations may result in discordant disease associations such as that seen here for HLA-C*07:01, which was negatively correlated with schizophrenia prevalence, and C*07:02, which was positively correlated with schizophrenia prevalence. In addition, the immunogenetic epidemiological approach used here takes advantage of HLA heterogeneity to evaluate the association of a large number of HLA alleles with schizophrenia at the population level, providing a broad picture of the influence of HLA on the population prevalence of schizophrenia. Given that schizophrenia is polygenic, our approach of combining signed, weighted contributions of HLA alleles to derive a unique, continuously-varying, normalized measure could easily be extended to include contributions of other genes for a more comprehensive estimate. At the minimum, determination of the complete, high-resolution HLA makeup of individuals in a population would be needed, in addition to population frequencies of other schizophrenia-related genes and mutations identified in GWAS studies.
Limitations of the study
Despite these strengths, it is also prudent to consider the present findings within the context of the study’s limitations. First, these findings hold for the European countries studied, and their extension to other countries remains to be determined, given the geographic heterogeneity of schizophrenia prevalence, 2 HLA allele frequency,30,31 and pathogen exposure. 115 Also, our study was limited to 127 HLA alleles, the only alleles that occurred in 9 or more of the countries investigated at the time the data were obtained; other alleles that were not included here may still be relevant to schizophrenia and HHV, especially in countries in other geographical locations. An additional consideration is that this study was limited to HHV, although several bacterial and other viral infectious agents have been implicated in schizophrenia.4 -7 Future studies evaluating the HLA-schizophrenia profile in relation to the binding affinity of HLA to other infectious agents may reveal additional HLA-mediated infectious contributors to schizophrenia.
Footnotes
Appendix
Amino acid sequences of the 9 HHV strains (Table 4). Strain labels are from UniProt (https://www.uniprot.org/uniprotkb/).
Author Contributions
LMJ and SAC contributed to data retrieval. APG contributed to data analysis. LMJ and APG wrote the paper. LMJ, SAC, and APG edited and approved the paper.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Partial funding for this study was provided by the University of Minnesota (the Kunin Chair in Women’s Healthy Brain Aging, the Brain and Genomics Fund, the McKnight Presidential Chair of Cognitive Neuroscience, and the American Legion Brain Sciences Chair) and the U.S. Department of Veterans Affairs. The sponsors had no role in the current study design, analysis or interpretation, or in the writing of this paper. The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
The datasets generated and analyzed in the current study are publicly available.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Significance Statement
Schizophrenia prevalence covaries with the population frequencies of human leukocyte antigen (HLA) alleles, which, in turn, bind with various affinities to human herpes viruses. HLA alleles protective against schizophrenia had higher binding affinities to HHV, suggesting that HLA binding and subsequent elimination of HHV may confer protection against schizophrenia.
