Abstract
The current paper reports systematic variations of people’s attitudes toward gender and gendered roles between countries and regions in Europe, making regional and national comparisons simultaneously visible on the same scale over time. We operationalized the concept of “gender attitudes” by using a fresh combination of items among those administered by the European Values Survey (in 2008 and in 2017) whose sampling strategy is statistically representative at both national and regional level. Then, we validated our proposed measure by using the Rasch model to test its measurement invariance across European countries and regions, and over time. We included regions under the hypothesis that the variability of gender attitudes is primarily attributable to the local sociocultural milieu people live in, and thus that the variability
Plain language summary
Previous studies have shown that people’s attitudes toward gender and gendered roles are socially, culturally, historically and thus geographically situated. Previous research has thus investigated such a concept longitudinally (to understand if/how it has evolved over time) and across cultures and geographies (e.g., between countries). Nonetheless, very little research (if any) has investigated gender attitudes at sub-national level, e.g. region to region within the same country. The current paper aims to fill this gap. We developed and validated a new scale to measure gender attitudes at regional level, in each European country. Results confirmed our hypothesis that gender attitudes are primarily attributable to local social and cultural factors, and suggested that policy and/or research based on nationally aggregated data should pause and focus on regional (more than on national) data.
Introduction
People’s attitudes toward gender are complex, multidimensional, and socio-culturally situated (Constantin & Voicu, 2015; Larsen & Long, 1988; Permanyer, 2010; Pfau-Effinger, 2004) with regard to (a) the roles (in the family and in wider society) that are deemed more appropriate for men or women, and (b) certain institutional and social contexts (Constantin & Voicu, 2015).
Gender attitudes change over time (Bolzendahl & Myers, 2004; Brewster & Padavic, 2000; Choe et al., 2014; Cotter et al., 2011a; Lomazzi, 2017a; Pampel, 2011), at different speeds and in different directions, depending on the sociocultural characteristics associated with the
In order to account for the role played by sociocultural and historical factors affecting gender attitudes, previous studies have investigated gender attitudes in a cross-country perspective, using “country” as a proxy for such sociocultural factors. Nonetheless, relatively little work has been done to establish the invariance of the proposed measures across geographies, even though measurement invariance is a prerequisite for comparison: “measurement invariance (MI; consistent score quality and meaning) should be confirmed across relevant subgroups (…) to establish a common interpretation framework in diverse settings” (Bulut et al., 2015, p. 1). According to Weziak-Bialowolska (2015), for example, the lack of research on measurement invariance has led to inconsistencies in results. In the present research, we started from this research gap but went a step further by (a) testing the measurement invariance of our scale across geographies, at both national and regional level in Europe; (b) testing it over time, by analyzing data collected by the European Values Survey (EVS) in 2008 and in 2017; and (c) comparing gender attitudes across countries and regions in 2008 and in 2017, to show how gender attitudes have evolved over time at different levels of locality in the EU.
In addition, the scale presented in this paper builds on an innovative combination of EVS items that we claim will better measure people’s gender attitudes. To support such a claim, we compared the psychometric functionality of our proposed scale with the functionality of the original EVS gender attitudes scale.
We thus think that our paper contributes to knowledge by developing and validating a new measure of gender attitudes, at different levels of regionality, that can (a) shed new light onto the relationship between
Region-Based Investigation of Gender Attitudes
Building on evidence that gender attitudes are socio-culturally and historically situated, we hypothesize that the sociocultural and historical factors affecting people’s attitudes toward gender (and thus attitudes toward gender equality) can also vary geographically, that is, by country but also by region within the same country.
Subnational comparisons have gained increasing attention recently, as their study helps to “untangle the operational realities of national systems, and the role of units and processes at the subnational scale in wider patterns” (Sellers, 2019, p. 86). Nationally aggregated data, often used to channel policy and practice, cannot capture the real tendency of social phenomena when they are characterized by high degrees of internal heterogeneity. As claimed by Snyder (2001), the “tendency to unreflectively gravitate toward national-level data and national units of analysis has contributed to a miscoding of cases that can distort causal inferences and skew efforts at theory building. A greater sensitivity to within-nation variation and complexity can help comparativists avoid these pitfalls” (p. 94).
The variability of gender attitudes by region may contribute to explaining, for example, the regional variations of gender gaps in economics (Campa et al., 2011; Uunk, 2015), or female underachievement in mathematics (Gonzalez de San Roman & De la Rica, 2016), among other things. More precisely, we claim that ignoring the variability of gender attitudes at different levels of locality can be misleading, with interpretations of results misguiding both policy and practice. On the other hand, we hypothesize that locating the exploration of gender attitudes at different (national and sub-national) levels can be helpful in identifying connections between gender attitudes and social, cultural and historical factors, and can therefore more “properly” inform policy and practice. Cascella and Pampaka (2020), for instance, have recently shown high regional variability of gender attitudes in Italy where, on average, the traditional perception of women was shown to be stronger in the south than in the center and in the north. Many factors can converge to cause such differences. For example, compared to the south, northern Italy has been characterized by faster economic and industrial development, which has enhanced female involvement in the job market. Such differences exist even today and are sharpened by differences in the availability of services to support the family (Da Roit et al., 2015).
This is just an example of the potential implications: gender attitudes have been used to explain a number of phenomena, including family characteristics/structure and processes, marriage or division of household labor and homework (Braun, 2008; Budig et al., 2012; Carlson & Lynch, 2013; Cunningham, 2008; Farré & Vella, 2013; Voicu et al., 2009), female disadvantage in the job market (Campa et al., 2011; Kmec et al., 2010), female underrepresentation in politics (Dilli et al., 2015; Dilli et al., 2019), differences in informal care (Da Roit et al., 2015), or female underachievement in STEM education (Fryer & Levitt, 2010; Guiso et al., 2008; Nollenberger et al., 2016). Most of these studies are based on nationally aggregated data, with just a few exceptions (such as Campa et al., 2011).
Our study intends to show to what extent individuals’ attitudes toward gender and gendered roles vary by region, in each European country, by validating and using a measure that guarantees measurement invariance across geographies and that can be used to extend previous research and in further studies.
Measuring gender attitudes at different levels of locality calls for setting up an appropriate methodological strategy. The current paper aims to propose an analytical approach within the framework of the Rasch analysis that has rarely been used in previous research, as shown in the next section.
The Measurement of Gender Attitudes in Previous Research
Several different methods have been used to develop and validate gender attitudes measures (for a review, see e.g., Weziak-Bialowolska, 2015). Gender attitudes have been frequently measured by developing composite indices, based on the combination of four- or five-point Likert statements that were summed (Inglehart & Norris, 2003; Motiejunaite, 2008; Puur et al., 2008), averaged (Chesters, 2012; Lucier-Greer & Adler-Baeder, 2011; Philipov, 2008) and/or used to calculate principal component scores (Philipov, 2008) or factor scores (Luck & Hofacker, 2003). Among them, only a few studies verified the dimensionality of their indicators through principal component analysis (i.e., Inglehart & Norris, 2003; Luck & Hofacker, 2003; Motiejunaite, 2008; Philipov, 2008). “Motiejunaite (2008) used classical (exploratory) factor analysis and although conducted it for two analyzed countries separately, entirely ignored the issue of obtaining solutions with different numbers of factors, which strongly implies a lack of concept comparability” (Weziak-Bialowolska, 2015, p. 54). Chesters (2012), Inglehart and Norris (2003), Luck and Hofacker (2003), Motiejunaite (2008), and Westoff and Higgins (2009) verified the consistency of the indicators through the application of Cronbach’s alpha, without testing for measurement invariance.
Measurement invariance can be tested via several alternative approaches (Zumbo, 1999). According to Davidov (2008), the most frequently employed are (a) item response theory (IRT) models (Andrich & Marais, 2019), (b) the differential item functioning approach (applied within Rasch and IRT, Osterlind & Everson, 2009; and also within SEM factor models, Bauer et al., 2020), and (c) the factor analysis framework (Davidov, 2008). The latter is the most frequently used, especially in recent research (Lomazzi, 2018; Weziak-Bialowolska, 2015).
In the current paper, we propose an application of Rasch analysis (Rasch, 1960/1980) to develop our broadened measure of gender attitudes (see the next section) and to validate it at different levels of locality in the European countries.
A Revised and Upgraded Operationalization of the Concept of “Gender Attitude”
Many gender attitude measures have been proposed since the 1950s, with most referring to the “man-breadwinner” (vs. “woman home-maker”) model. These are based on very similar items aimed at capturing respondents’ level of agreement/disagreement with proposed (supposedly) female roles in and outside the family. For example, the scale developed by the European Values Survey (EVS) in 2008 to measure people’s gender attitudes in Europe included the following eight items, which ask about the perceived effects of a mother’s job on family life on a 4-point agreement scale (i.e., 1 = agree strongly to 4 = disagree strongly):
A working mother can establish just as warm and secure a relationship with her children as a mother who does not work (v159).
A pre-school child is likely to suffer if his or her mother works (v160).
A job is alright but what most women really want is a home and children (v161).
Being a housewife is just as fulfilling as working for pay (v162).
Having a job is the best way for a woman to be an independent person (v163).
Both the husband and wife should contribute to household income (v164).
In general, fathers are as well suited to looking after their children as mothers (v165).
Men should take as much responsibility as women for the home and children (v166).
This scale has been used in a number of studies to investigate gender attitudes in Europe (among others, see, e.g., Lomazzi, 2017b; Voicu & Tufiş, 2012) as well as to explore the relationship between gender attitudes and other concepts or dimensions of gender equality that were supposed to be related to gender attitudes, such as female disadvantage in mathematics attainment (e.g., Gonzalez de San Roman & De la Rica, 2016), female disadvantage in the job market (Campa et al., 2011), and so on.
The EVS items listed above refer to the so-called “man breadwinner model” (Pfau-Effinger, 2004) as they include perceptions about the effect of having a job on looking after children and home (conceived as a particularly female duty), the importance of women’s economic independence, and the suitability of men (relative to women) in undertaking family care activities (Baber & Tucker, 2006; Bergh, 2006; Brooks & Bolzendahl, 2004; Cheng et al., 2012; Constantin & Voicu, 2015; Cotter et al., 2011b; Inglehart & Norris, 2009; Kroska & Elman, 2009b; Pampel, 2011; Shu & Zhu, 2012; Treas & Tai, 2012; Walter, 2018b, 2018a; Yu & Lee, 2013).
Nevertheless, stereotypical perceptions about gender have changed over time, making most of the existing measures outdated as measures of variation within some more progressive cultures, and thus calling for a revision or extension of the items and item sets used to construct gender attitude measures in order to include more culturally relevant dimensions (Walter, 2018b). So far, existing measures have explored three main dimensions: (a) a general category, including a clear division of tasks between women and men, both in and outside the family home (Bergh, 2006; Pfau-Effinger, 1993), and two more specific categories: (b) the division of labor within family practices (Alwin, 2005), and (c) the perceptions about gender roles outside the family, including the importance of education, access to the labor market, or engagement in politics (Baxter & Kane, 1995; Jakobsson & Kotsadam, 2010).
In addition to these items, in our operationalization of gender attitudes (and the resulting scale/measure, which is presented later) we also considered items about single motherhood (conceived as an elective woman’s choice rather than as a consequence of events like widowhood), cohabitation out of wedlock, and same sex parenting, under the hypothesis that, in more traditional environments (i.e., where attitudes toward gender and gender roles are more traditional), people might be more likely to condemn motherhood outside marriage, cohabitation out of wedlock, or same-sex parenting.
Accounting for people’s perceptions about same-sex parenting allows for a wider definition of gender attitudes, and thus extends the range of opportunities for relationships within and between genders, making for more choice and equality in principle. In our opinion, exploring people’s perceptions toward an enlarged definition of gender allows one to include a relatively new, but increasingly frequent facet of “modern” (i.e., contemporary) attitudes to gender and family, and to effectively contribute to the debate about masculinity/femininity and thus about gender inequality. In fact, previous studies have shown that “traditional societies” are typically characterized by a clear definition of “masculinity” with a whole set of cultural structures (Bourdieu, 2001) as well as by a clear condemnation of homosexuality (Kligerman, 2007; Shoko, 2010). This is more prominent, for example, in societies and groups strongly influenced by traditional, faith-based, religious injunctions and prejudices, where same-sex relationships are explicitly condemned by referring to the prophet Lut, the same encountered as Lot in the Christian Bible, who preached against homosexuality in the cities of Sodom and Gomorrah. It is observed that geographical areas where homosexuality is explicitly condemned (in traditional/orthodox Jewish, Christian, and Islamic communities/societies) typically have more traditional attitudes toward female roles (Al-Ghanim & Badahdah, 2017).
Methodology
The aim of the current paper is threefold: (1) to validate a scale intended to measure a broader concept of attitudes toward gender and gendered roles (and to show how it performs compared with previous similar measures); (2) to establish its measurement invariance across European countries and regions, and across time; and (3) to use the resulting measure in comparative analysis across countries and regions.
Data
We used data collected by the European Values Survey (EVS), a large-scale and cross-national survey program. The EVS sampling design and frame are based on the Nomenclature of Territorial Units for Statistics (NUTS,
The number of macro-regions (NUTS-1) and regions (NUTS-2) varies significantly across countries depending on many factors. Further information about the sampling criteria adopted by the EVS is available at https://ec.europa.eu/eurostat/web/nuts/principles-and-characteristics and at https://ec.europa.eu/eurostat/web/nuts/background. The EVS sample is statistically representative at each of these levels, but we analyzed and compared results at country level, macro-region (NUTS-1), and region (NUTS-2), and not at NUTS-3, as this level is not available for all countries (Table 1).
Sample Description, by Country.
The most recent EVS wave took place in 2017. Nonetheless, data collected in 2017 (wave 5) have not yet been released for all European countries. We thus based our analysis on wave 4 (2008) to provide an overview of gender attitudes comparatively measured in all countries and regions. Then, we explored measurement invariance of such a measure over time with both wave 4 and 5, and discussed how gender attitudes have changed over time, from 2008 to 2017, in the countries where more recent data have been made available.
The instrument (i.e., the collection of items) used in this paper to measure gender attitudes consists of twelve (both 4- and 5-point Likert) items administered in 2008 and of eight items administered in 2017 (Figure 1). Some of the items administered in 2008 were not administered in 2017. Information about deleted items has been provided in the Supplemental Appendix 1.

EVS items used to construct our gender attitude scale in 2008 and in 2017.
Consistent with Cascella and Pampaka (2020), we hypothesized that the selected EVS items measure the same (unidimensional) trait (i.e., attitude toward gender equality), and that combining them improves measurement discrimination because they extend the measurement range (especially at the most challenging ends). This hypothesis was verified by using the Rasch model, which assumes construct unidimensionality.
Analytical Strategy
The present study is based on a four-step analytical approach. In the first step, we validated our proposed construct with the selected items within the framework of the Rasch analysis to develop an updated measure of gender attitudes, at different levels of locality. Then, we compared the psychometric functionality of our new broadened scale with that of the old EVS gender attitudes scale. Subsequently, we concurrently calibrated EVS items administered in 2008 and in 2017 to explore measurement invariance over time and then to explore the evolution of gender attitudes from 2008 to 2017, at least in the countries where data were available for both years. Finally, we estimated a multilevel model to investigate differences within and between countries and assess their significance.
Scale Validation
For the purposes of the present study, we tested measurement invariance at different levels of regionality in Europe by performing Differential Item Functioning analysis (Osterlind & Everson, 2009) within the framework of the Rasch analysis (Andrich & Marais, 2019; Rasch, 1960/1980). The Rasch model is particularly suitable for the purposes of the present study as it is considered a powerful construct validation tool (Baghaei, 2008; Pampaka, 2021) because its “fit statistics are indications of construct irrelevant variance and gaps on Rasch item-person map are indications of construct under-representation” (p. 1146).
Moreover, since we developed our scale by combining items from more than just one EVS scale/instrument, we used 3-, 4-, and 5-point Likert items (as shown in Figure 1). We thus estimated a Partial Credit Model (PCM) (Masters, 1982), an extension of the Rasch model (Rasch, 1960/1980) for polytomous items with different response formats. Rasch proved that the model that bears his name is the only one in which a continuous measurement obeys the key measurement axioms of unidimensionality, conjoint additivity, and subgroup and subtest independence. The Rasch model assumes that both person and item parameters are invariant measures, that is, they are item-free and sample-free respectively, thus allowing for the comparability of groups of respondents matched on the same variables (such as their place of residence), between groups of items or between items and subjects (Engelhard, 2009, 2013). The Rasch model provides evidence (via the person and item separation reliability) for the sensitivity of the scale in differentiating person parameters depending on their attitude toward gender, and for the distinctiveness of the items in their locations, respectively.
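To make the model concrete, the following is a minimal sketch of how the Partial Credit Model assigns probabilities to the response categories of a single item, given a person location and the item's step difficulties (all in logits). The function name and the numerical values are illustrative assumptions, not estimates from the EVS data, and the sketch is not the estimation routine used in the study.

```python
import math


def pcm_category_probabilities(theta, step_difficulties):
    """Return P(X = k) for k = 0..m under the Partial Credit Model.

    theta: person location in logits.
    step_difficulties: the m step (threshold) difficulties of one item,
        so an item with m + 1 response categories has m steps.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical items with different response formats (3 and 4 categories).
three_point = pcm_category_probabilities(0.5, [-0.8, 0.9])
four_point = pcm_category_probabilities(0.5, [-1.2, 0.1, 1.4])
```

Because each item carries its own list of step difficulties, items with different numbers of response categories can sit on the same logit scale, which is why this model suits a scale that mixes response formats.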
In the current study, we used the same analytical strategy detailed in Cascella and Pampaka (2020), in line with extensive methodological literature (e.g., Bond & Fox, 2007) on the investigation of goodness of fit via the infit and outfit mean squares (Linacre & Wright, 1994), dimensionality diagnostics (Linacre, 2002) to confirm the unidimensionality assumption (Hambleton & Swaminathan), and via Differential Item Functioning (DIF) by sex, age, and education level (Osterlind & Everson, 2009), and then by country, regionality and year (i.e., wave) to explore measurement invariance across geographies and over time.
We expected that some differences in the functionality of the items or of the test/instrument would be observed across countries. To verify whether such differences reflect substantial differences in gender attitudes in different countries or whether they are due to a statistical deficiency of the scale, we also validated our instrument separately within each country and compared person parameters estimated by country with those estimated on the pooled dataset (i.e., the dataset including all countries). This can verify that measures of individual persons were consistent across Europe (Wolfe & Smith, 2007a, 2007b) independently of which (sub-)sample is used for calibration. For each country we explored data-model fit, data dimensionality and DIF by age, sex, education, and region.
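As an illustration of the fit statistics mentioned above, the sketch below computes infit and outfit mean squares for one item from the observed responses, the model-expected scores, and the model variances of those responses. The function name and the toy values are assumptions made for illustration; dedicated Rasch software computes these quantities from the fitted model.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item.

    observed: observed responses, one per person.
    expected: model-expected scores for the same persons.
    variance: model variances of the responses for the same persons.
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals,
    # which makes it sensitive to unexpected responses far from the item.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: information-weighted mean square, dominated by responses
    # from persons located close to the item.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


# Toy dichotomous example: every person has a 0.5 chance of endorsing,
# so each expected score is 0.5 and each variance is 0.5 * 0.5 = 0.25.
infit, outfit = fit_mean_squares(
    observed=[1, 0, 1, 1],
    expected=[0.5, 0.5, 0.5, 0.5],
    variance=[0.25, 0.25, 0.25, 0.25],
)
```

Values near 1 indicate that the responses vary about as much as the model predicts, which is the pattern the analysis looks for.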
The same analysis was performed with the original EVS gender attitudes scale (items v159–v166). This scale’s functionality was compared with the psychometric functionality of our proposed new scale to evaluate how our operationalization (i.e., our selection of EVS items) improves the measurement of gender attitudes.
Finally, we concurrently calibrated item responses collected in 2008 and in 2017 to put them onto the same scale and make them directly comparable (Kolen & Brennan, 2014), which could then enable us to explore the evolution of gender attitudes over time. When conducting such equating with non-equivalent groups (i.e., groups from populations that are not or may not be equivalent, as in our cross-sectional study), the parameters from different instrument versions need to be on the same IRT scale (Kolen & Brennan, 2014), which can be achieved with the set of items common to both versions. The items in both versions (that administered in 2008 and that administered in 2017) were concurrently calibrated (i.e., estimated all together with the pooled dataset) so their resulting parameters were on the same metric/scale, which consequently made them directly comparable (Lord & Wingersky, 1984). The quality of equating (and thus the reliability of the estimates based on it) was assessed by looking at the functionality of the common “anchor” items (infit and outfit close to 1 and low ZSTD) and by performing a DIF analysis by survey date to account for the possible effect of instrument version (over time).
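The data layout behind concurrent calibration can be sketched as follows: responses from both waves are stacked into one pooled matrix, with items that were not administered in a given wave treated as missing by design, so that the common items link the two versions. The item labels, the helper name, and the responses below are hypothetical and do not correspond to EVS variable names.

```python
def pool_waves(responses_by_wave, items_by_wave):
    """Stack wave-specific responses into one pooled list of records."""
    # Ordered union of all item labels across waves.
    all_items = []
    for items in items_by_wave.values():
        for item in items:
            if item not in all_items:
                all_items.append(item)
    pooled = []
    for wave, rows in responses_by_wave.items():
        administered = set(items_by_wave[wave])
        for row in rows:
            record = {"wave": wave}
            for item in all_items:
                # Items not administered in this wave are missing by design.
                record[item] = row.get(item) if item in administered else None
            pooled.append(record)
    return all_items, pooled


# Hypothetical instrument versions sharing two anchor items.
items_by_wave = {
    2008: ["anchor_1", "anchor_2", "only_2008"],
    2017: ["anchor_1", "anchor_2", "only_2017"],
}
responses_by_wave = {
    2008: [{"anchor_1": 2, "anchor_2": 3, "only_2008": 1}],
    2017: [{"anchor_1": 4, "anchor_2": 1, "only_2017": 2}],
}
all_items, pooled = pool_waves(responses_by_wave, items_by_wave)
# Anchor items are those administered in every wave.
anchor_items = [
    item for item in all_items
    if all(item in items for items in items_by_wave.values())
]
```

Estimating the model once on such a pooled matrix is what places the item and person parameters of both waves on one metric.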
Statistical Modelling
After concurrent calibration, the Rasch person scores on this measure were used as the dependent variable in a multilevel regression model to compare people’s gender attitudes between and within countries and regions.
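As a simplified illustration of what comparing scores between and within groups involves, the sketch below computes a one-way ANOVA estimate of the intraclass correlation, that is, the share of variance in person scores that lies between groups such as countries or regions. This is a stand-in under stated assumptions, not the multilevel regression model estimated in the study; the function name, group labels, and scores are hypothetical.

```python
def intraclass_correlation(scores_by_group):
    """One-way ANOVA estimate of the intraclass correlation, ICC(1).

    scores_by_group: mapping from group label to a list of person scores.
    Requires at least two groups and more observations than groups.
    """
    groups = list(scores_by_group.values())
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size, which also handles unequal group sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical person scores (logits) in three hypothetical regions.
scores = {
    "region_a": [1.0, 1.2, 0.8],
    "region_b": [-0.5, -0.3, -0.7],
    "region_c": [0.1, 0.3, -0.1],
}
icc = intraclass_correlation(scores)
```

A value close to 1 would mean that most of the variation lies between groups; a full multilevel model additionally separates several nested levels and yields significance tests.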
Results
The results section is split into two parts. In the first part, we present the psychometric validation of our proposed scale and its functionality in the different EU countries, and compare it with the functionality of the original EVS scale. Results reported in this section are based on items administered in 2008 to show the procedure employed to validate our scale. Data analysis based on data collected in 2017 is reported in Supplemental Appendix 4.
In the second part of this section, we illustrated the evolution of gender attitudes in some European countries (i.e., those where more recent EVS data have been already made available—see Table 1).
Part 1: Scale Validation
All the items in the original EVS scale aimed to measure people’s perceptions about the (negative) implications of women’s paid job on family life and children’s happiness. This consistency was mirrored by infit and outfit statistics very close to 1 (Table 2). For our proposed scale, item separation and reliability were very high (Item Separation = 102.09; Item Reliability = 1.00), whereas person separation and reliability were somewhat low (Person Separation = 1.43; Person Reliability = 0.67). These values nonetheless compared favorably with those of the original EVS scale (Item Separation = 77.85; Item Reliability = 1.00; Person Separation = 0.81; Person Reliability = 0.40). Low person separation (<2, person reliability <0.8) with a relevant person sample indicates that the instrument may not be sensitive enough to distinguish between high and low performers. Low item separation (<3, item reliability <0.9) indicates that the person sample is not large enough to confirm the item difficulty hierarchy (i.e., the construct validity) of the instrument.
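Separation and reliability are two expressions of the same information: in Rasch measurement, reliability equals the squared separation divided by one plus the squared separation. The short sketch below applies this standard relationship to the person separation values reported above; the function names are illustrative.

```python
import math


def reliability_from_separation(separation):
    """Reliability R implied by a separation index G: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def separation_from_reliability(reliability):
    """Separation index G implied by a reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1 - reliability))


# Person separation values reported in the text.
proposed_scale = reliability_from_separation(1.43)  # about 0.67
original_scale = reliability_from_separation(0.81)  # about 0.40
# Separation needed to reach the 0.8 reliability benchmark: exactly 2.
benchmark = separation_from_reliability(0.8)
```

The reported person reliabilities of 0.67 and 0.40 follow directly from the person separations of 1.43 and 0.81, and the benchmark pair (separation 2, reliability 0.8) is one and the same cut-off.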
Item Measures and Fit Statistics.
This could suggest that some more items could be added in future versions at the top of the scale to improve the scale’s functionality and thus better discriminate between respondents depending on their attitudes toward gender and gendered roles.
Nonetheless, both infit and outfit values are close to 1, thus providing good evidence of the unidimensionality assumption for both scales (and the constructed measure) across countries. Further analysis about data dimensionality is presented in the Supplemental Appendix 2.
Further evidence of the scale’s construct validity was provided by the (item-person) Wright map (Figure 2), which reports the items’ hierarchy along the latent trait. Results are consistent with those reported previously for Italy (Cascella & Pampaka, 2020). Items higher up the scale were more difficult to endorse, and high-scoring persons are more likely to agree with them than persons with low scores. The Wright maps provided empirical evidence of the better functionality of our scale compared to the original EVS scale, along with some suggestions about the contribution of each item to the measurement of gender attitudes. In both cases, the items’ difficulty parameters range from −1.00 to +1.00, but the persons’ and items’ locations/distributions in the second graph (our proposed scale) are better aligned (i.e., closer together), thus suggesting that our selection of items can better discriminate respondents along the latent trait, allowing for a more precise measure of people’s attitudes toward gender and gendered roles. Moreover, the Wright maps also show that the addition of our proposed items rescales all the other items’ locations and reveals that both item v164 (Both the husband and wife should contribute to household income) and item v166 (Men should take as much responsibility as women for the home and children) are too easy to endorse and thus do not actually contribute to the measurement of gender attitudes. These items could be deleted and replaced by other, more difficult items.

Person-item maps.
In our proposed scale, the most difficult item to endorse was v154, asking about perceptions on same-sex parenting, whilst the easiest (after items v164 and v166) was item v159 (“A working mother can establish just as warm and secure a relationship with her children as a mother who does not work”). Such a result may be explained by considering that the number of working women has increased over time across Europe, thus making outdated the idea of the mother as a housewife without any kind of paid job; in contrast, same-sex parenting is still highly debated and thus it is not surprising that the percentage of agreement tends to be low.
The location of v159 as the easiest item did not change depending on the respondents’ sociodemographic characteristics. In fact, our proposed measure is invariant by sex, age, and education as assessed with the DIF analysis: although some items showed differential difficulty between sub-groups, the DIF size was always small and lower than 0.43, that is, the value below which some claim it is negligible (Zieky, 1993). The one exception was item v164 (Equal contribution to the household), which is much easier to endorse among more educated people than among those with lower educational qualifications.
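The DIF screening described above can be sketched as a comparison of item difficulties estimated separately for two subgroups, flagging any item whose absolute difference reaches the 0.43-logit threshold cited in the text. The function name, item labels, and difficulty values are hypothetical and are not the estimates reported in the figures.

```python
NEGLIGIBLE_DIF_LOGITS = 0.43  # threshold cited in the text


def flag_dif(difficulties_by_item, threshold=NEGLIGIBLE_DIF_LOGITS):
    """Return {item: (contrast, flagged)} for two-group DIF screening.

    difficulties_by_item: mapping from item label to a pair of item
        difficulties (logits) estimated in group A and in group B.
    """
    results = {}
    for item, (delta_a, delta_b) in difficulties_by_item.items():
        # DIF contrast: how much harder the item is for group A than group B.
        contrast = delta_a - delta_b
        results[item] = (contrast, abs(contrast) >= threshold)
    return results


# Hypothetical difficulties for two education groups.
screening = flag_dif({
    "item_x": (0.10, 0.25),    # contrast of -0.15: negligible
    "item_y": (-0.30, -0.90),  # contrast of 0.60: flagged
})
```

In practice the contrast would also be accompanied by a significance test, since with large samples even negligible differences can be statistically significant.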
For each item, we reported differences in person parameter by gender (Figure 3), age (Figure 4), and education (Figure 5), noting that higher values in the overall measure indicate more modern perceptions whereas lower values indicate more traditional attitudes. The differences in the estimates for the item difficulties (denoted by the letter δ) between the male and female subgroups were largest for the easiest items (i.e., v166, v164, v159, and v165) because they refer to the man-breadwinner model. In contrast, there were no differences between men and women in relation to the most challenging items, that is, those at the top of the latent trait. Both age and education affect people’s attitudes toward gender equality: older and less educated people show more traditional attitudes toward gender equity, in relation to all items.

DIF size by gender (δ denotes overall item difficulty).

DIF size between people aged [15–29], [30–49], and more than 50 (δ denotes overall item difficulty).

DIF size between low-, medium-, and high- educated people (δ denotes overall item difficulty).
Measurement Invariance by Country
To further investigate measurement invariance across countries, a DIF analysis by country was performed. DIF size by country (Supplemental Appendix 3) was in most cases smaller than 0.43, which is considered negligible (Zwick, 2012, p. 4). National and local socioeconomic contexts can interplay with such results because both the percentage of more educated people and the characteristics of the job market vary across geographies; the remaining DIF may therefore be due to real differences in perceptions rather than to item bias.
To quantify how much impact cross-country differences in items’ difficulties have on the individual/person estimates, we validated our gender attitude scale within each country separately. The two person scores (that estimated for each country separately and that estimated for the pooled matrix) were highly correlated (

Correlation analysis between person parameters estimated by analyzing countries all together and separately.
Measurement Invariance of Gender Attitude Across Regions in the European Countries
DIF by region was performed within each country and then on the pooled data matrix. Results are reported in Supplemental Appendix 3. To establish measurement invariance, we explored the association between the person scores estimated for each country separately and those estimated from the pooled matrix. The correlation analysis showed that they are highly correlated (
Measurement Invariance of Gender Attitudes Over Time
To explore the evolution of people’s attitudes toward gender and gendered roles, we concurrently calibrated items administered in 2008 and those administered in 2017 on the same scale to make items’ difficulty (and the resulting person scores) comparable over time (Kolen & Brennan, 2014).
Three out of the twelve items used to construct our gender attitudes scale were administered in both waves (i.e., in 2008 and in 2017). These items are suitable to serve as anchor items because (a) all of them showed infit close to 1, thus indicating that they fitted the Rasch model’s assumptions well (Table 3); and (b) they included relevant aspects of the construct we want to measure (Kolen & Brennan, 2014), such as the perceived effect of a mother’s work on children’s happiness (v72, in 2017; v160, in 2008), the importance of family and children in women’s lives (v73, in 2017; v161, in 2008), and the importance of a job in women’s and men’s lives (v81, in 2017; v103, in 2008).
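The data layout behind a concurrent calibration with anchor items can be sketched as follows. Each pair of variable names listed above is mapped to one shared column, so that a single difficulty is estimated for the item across waves, while wave-specific items remain missing by design for respondents of the other wave. The column labels, the wave-specific item names, and the responses are hypothetical, and the sketch covers only the construction of the pooled matrix, not the Rasch estimation itself.

```python
# Variable-name pairs reported in the text for the three common items.
# The shared labels ("anchor_child", ...) are hypothetical.
ANCHORS = {
    "v72": "anchor_child", "v160": "anchor_child",
    "v73": "anchor_family", "v161": "anchor_family",
    "v81": "anchor_job", "v103": "anchor_job",
}


def pool_waves(*waves):
    """Stack respondents from several waves into one response matrix.

    Each wave is a list of dicts mapping item name to response. Items that
    were not administered to a respondent are coded as None.
    """
    renamed = [
        [{ANCHORS.get(item, item): resp for item, resp in person.items()}
         for person in wave]
        for wave in waves
    ]
    columns = sorted({item for wave in renamed for person in wave for item in person})
    rows = [[person.get(col) for col in columns]
            for wave in renamed for person in wave]
    return columns, rows


# Hypothetical respondents; "only_a" and "only_b" stand for wave-specific items.
wave_a = [{"v160": 1, "v161": 0, "v103": 1, "only_a": 1}]
wave_b = [{"v72": 0, "v73": 1, "v81": 1, "only_b": 0}]
columns, rows = pool_waves(wave_a, wave_b)
```

In the pooled matrix, both respondents have observed values on the three anchor columns, which is what links the two waves to a common scale.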
Item Measures and Fit Statistics After Calibration.
PERSON: Separation:2.14, Reliability:.82; ITEM: Separation:109.60, Reliability:1.00.
In addition to fit statistics, we performed DIF analysis by “wave” to assess the possible effect of time and ensure measurement invariance over time. Results showed statistically significant but negligible (i.e., lower than 0.43, according to Zwick, 2012) DIF by survey, thus providing further evidence of the robustness of linking and the measurement invariance over time (Table 4).
Differential Item Functioning by Survey.
Part 2: The Variation of Gender Attitudes Across European Countries and Regions, and Over Time
Figure 7 shows European countries ordered by their average Rasch person scores on the constructed measure. After concurrent calibration, the Rasch scores show the distribution of people’s attitudes toward gender and gendered roles across countries (the higher the Rasch score, the less traditional the attitudes toward gender, and gendered roles), and its evolution over time. Results (in Figure 7) showed no variation in the countries’ ranking, but in 2017 Rasch test scores are higher than in 2008 thus suggesting, in line with previous studies, that people’s gender attitudes were more traditional in 2008 than in 2017. Northern European countries were ranked at the top of the latent trait, showing relatively less traditional attitudes toward gender compared to eastern and southern countries.

Distribution of persons’ parameter by country, 2008 and 2017, after concurrent calibration.
To explore further the subnational variability of gender attitudes, we performed a multilevel regression analysis considering incrementally the hierarchical structure of the data (i.e., respondents in regions [NUTS-2], in macro-regions [NUTS-1], in countries). The null models in Table 5 report the proportion of the variance of gender attitudes explained at each hierarchical level. To show the contribution of our proposed measure to the measurement of gender attitudes, we comparatively estimated two sets of null models, one using gender attitudes as estimated via the original EVS scale and one using gender attitudes estimated via our proposed measure. In both cases, results indicated that regions (and NUTS-2, in particular) accounted for a large proportion of the total variance (Models 2, 3, 5, and 7; this variance is otherwise wrongly attributed to the individual level, as in Model 4) or, stated otherwise, that the national level cannot capture all the factors affecting gender attitudes. In addition, results in Table 5 showed that our proposed measure can capture (better than the original one) the role of regions in explaining the variability of gender attitudes (Models 2, 3, 5). Moreover, our proposed measure seemed to be able to avoid attributing to the individual level part of the variance explained by contextual factors (Model 4).
Null-models.
Results based on our measure showed that individual characteristics (level 1) explained most of the variation in gender attitudes (more than 50%, in model 2 and model 4, and more than 40% in model 3). With the 4-level model, which accounted for all levels of regionality including country, the biggest percentage of “regional” variation was explained by country (32%).
Nonetheless, 14% of the total variance was explained by region (NUTS-2). The variation in the log-likelihood indicates that the best model is that with four levels (i.e., respondents nested into NUTS-2, NUTS-1, and country), which indicates that (a) more than 50% of the total variance in individuals’ attitudes toward gender and gendered roles is primarily due to personal factors (such as gender, age, education, and so on); (b) around 30% is associated with national factors; and, consistently with our hypothesis, (c) 14% is due to local factors not captured by the national level. Such a result supports our hypothesis that the variability of gender attitudes is largely affected by local (sociocultural) factors that are not captured by the national level.
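The percentages discussed above are shares of the total variance in a null multilevel model, that is, each level's variance component divided by the sum of all components. A minimal sketch of this computation is given below; the variance components are hypothetical and chosen only for arithmetic convenience, and they are not the estimates reported in Table 5.

```python
def variance_shares(components):
    """Share of the total variance attributable to each level.

    components: dict mapping level name to its estimated variance component.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {level: variance / total for level, variance in components.items()}


# Hypothetical variance components for a four-level null model
# (respondents within NUTS-2, within NUTS-1, within countries).
hypothetical = {"individual": 2.0, "nuts2": 0.5, "nuts1": 0.5, "country": 1.0}
shares = variance_shares(hypothetical)
```

Dropping a level from the model does not remove its variance; the variance is absorbed by the adjacent levels, which is why ignoring regions can inflate the share attributed to individuals or countries.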
Results from our study showed that the variability of gender attitudes by region was not a prerogative of big countries (i.e., those with many regions) such as Germany, where the variability of gender attitudes between regions was larger than that between countries in Europe (Figure 8).

The distribution of gender attitudes (on the
Countries with only a few regions showed different sub-national patterns: Poland, for example, showed small differences between regions. Greece, in contrast, showed a much higher sub-national variability of gender attitudes, even though both countries have a similar number of regions (Figure 9).

The distribution of gender attitudes (on the
Discussion
Research about people’s attitudes toward and about gender and gendered roles has received increasing attention over time (e.g., Constantin & Voicu, 2015; Walter, 2018a). So far, such interest has been twofold, as it has mainly aimed at (a) understanding and measuring gender attitudes, often in an international perspective, and their evolution over time; and (b) exploring the possible relationship between gender attitudes (used as a predictive factor) and gender inequality in society at large but, in particular, in the key sectors of (social and human) life, such as health, education, economics, and politics (e.g., Bericat, 2012). Studies of gender attitudes have thus been performed within different disciplines in the Social Sciences, including, for example, investigations of the relationship between stereotyped perceptions of gendered (socially ascribed) skills, attitudes, interests, and preferences, such as in the use of technology (e.g., Leach & Turner, 2015). They have also focused on attempting to explain educational inequality (e.g., Cascella et al., 2022), and/or economic and financial gaps between men and women (e.g., Casarico et al., 2015), and to understand the social construction of gender (e.g., Chen, 2019) and the attached gendered roles in social interactions (e.g., Lever et al., 2015).
Several studies have measured attitudes toward and about gender and gendered roles across countries using the EVS gender attitudes scale. The current study went beyond this research landscape in multiple ways. First, we extended the operationalization of gender attitudes by including appropriate items on contemporary equality debates and showed its robustness by comparing its psychometric functionality with that shown by the original EVS scale. In line with Baghaei (2008), who claimed that “the items which do not fit the Rasch model are instances of multidimensionality and candidates for modification, discard or indications that our construct theory needs amending,” and that “the items that fit are likely to be measuring the single dimension intended by the construct theory” (p. 1146), we showed that our new, extended measurement provides a one-dimensional measure of attitudes toward gender equality that is more useful and more relevant to contemporary debates than the previous EVS operationalization.
The psychometric analysis we presented showed that, even though our measure works (psychometrically) better than the EVS scale, most of the available EVS items are still too easy to be endorsed. Such a result suggests that EVS items may be asking for people’s agreement/disagreement about outdated topics, mainly referring to the male-breadwinner model, and are thus potentially not completely able to capture either a definition of “gender” in step with the times or female empowerment: there is noticeably fast progression on the latter, especially in Europe, even though it has not yet reached gender equality. Our research thus suggests that adding more items challenging the most progressive attitudes toward and about gender and gendered roles (including, e.g., those about single motherhood or fatherhood as a necessary thing to be fulfilled, administered in 2008 but not in 2017 by EVS) would be necessary to better scale people on the continuum we purported to measure.
Nonetheless, the results presented in the current paper showed that our proposed selection of items provides a good enough measure of people’s gender attitudes, and that the addition of items not included in previous gender attitudes measures (based on EVS items) strengthens the scale’s invariance across European countries and regions.
The Rasch model is particularly appropriate to pursue the comparative purposes of the present study as it can ensure the measurement invariance of the estimates (Engelhard, 2009, 2013). Compared with previous studies, we went a step further by testing the measurement invariance of our measure over time (from 2008 to 2017), and not only by country (as usually done in previous research) but also by region, within each European country, under the hypothesis that gender attitudes are primarily attributable to local (more than to nationally aggregated) sociocultural factors and thus that gender attitudes may vary at sub-national levels (across regions) even more than across countries. Our proposed measure is thus fit for comparative purposes, and it is ready to be used in future research, including research interested in analyzing the evolution of gender attitudes over time.
The results presented in the current paper showed that variations at the local (NUTS-2) and macro-regional (NUTS-1) levels are important in understanding gender differences in attitudes toward gender and gendered roles. Moreover, results showed that people’s attitudes toward and about gender and gendered roles vary at sub-national level, region to region within the same country, even more than between countries in Europe, thus suggesting that research intended both to measure gender attitudes and to explore their role in predicting gender inequality should not ignore subnational variability, to avoid misleading interpretations of the studied phenomena.
Our results thus provide an empirical basis upon which future research may construct more robust hypotheses about the possible relationship between gender attitudes and contextual (social, cultural, historical, and
We took some European countries as examples. We contrasted Germany and the UK (two of the biggest countries in Europe, that is, those with the greatest number of regions) with Poland and Greece (two smaller countries) to show that (a) the sub-national variability can be large both in big countries (i.e., countries with many regions, such as Germany) and in small countries, such as Greece; and that (b) different patterns can be observed in different countries, regardless of the number of regions. For example, our results showed that Germany is located at the top of the constructed measure along with the more modern countries regarding attitudes toward gender roles.
Nonetheless, saying that Germany is a modern country is incomplete, if not false, without specifying that, in Germany, there are some of the most and some of the least traditional regions in Europe. In contrast, other big countries, such as the United Kingdom, are characterized by a lower variability between regions thus supporting the argument that the number of regions as such is not enough to explain the sub-national variability of gender attitudes. Similarly, the sub-national variability of gender attitudes in Greece is much bigger than that observed, for example, in Poland (results about the other European countries have been reported in the Supplemental Appendix).
Interpreting these results in terms of specific cultural mediations goes beyond the scope of the current paper for several reasons. First, the number of regions included in the EVS sampling strategy has changed over time, thus making comparison over time difficult, and calling for an in-depth study (and possibly revision) of the criteria used to identify NUTS-2 regions in each European country. Moreover, our results suggest that regionality matters but to different extents in different countries, thus supporting the hypothesis that people’s attitudes toward gender and gendered roles are rooted in the sociocultural and historical identity of place (whose characteristics are only partially captured by the national level, which explains roughly 30% of the variability of gender attitudes, whereas the region explains around 15% of it, as measured via a multilevel analysis that accounts for data hierarchy). Even though the proportion of variance explained by “region” is smaller than that explained by “country,” this 15% would be wrongly attributed to the country level if one ignored the role of regions.
Understanding gender attitudes thus calls for a more local investigation, for example at NUTS-3, a level not available in most of the European countries and thus not included in the current study, which focused on the cross-national and cross-regional comparison. The criteria used by Eurostat to define NUTS-3 (i.e., the “small regions for specific diagnosis”) may nonetheless serve (at least) as a starting point to identify (a) the sociocultural, historical and economic roots of gender attitudes, and/or (b) some relevant covariates that may be used to understand the origin of gender attitudes and, therefore, to better channel policy interventions. In our opinion, however, these criteria may be revised and enlarged by including further dimensions that may help to identify and understand the roots of gender attitudes and their evolution over time, at different levels of locality.
Moreover, even though the data analysis has shown how our broadened scale of gender attitudes performs better than the original EVS scale, it also revealed possible room for improvement for the future EVS waves. Our data analysis has identified (a) the items that contribute less than others to the measurement of gender attitudes, and (b) the areas of content missing in the EVS but potentially useful to better understand people’s attitudes toward gender and gendered roles. As regards the latter, our cross-sectional analysis showed that most of the items administered in the most recent wave (2017) are based on the outdated (e.g., Walter, 2018b) “man bread-winner” model. Therefore, we recommend caution in interpreting results from cross-sectional analyses and, in particular, the jump toward less traditional attitudes in 2017 (compared with those observed in 2008) as such a jump may be due to the selection of the administered EVS items rather than (just) to an actual modernization of people’s gender attitudes.
Despite such a shortcoming, our results confirmed our hypothesis that differences across regions can sometimes be even bigger than those between countries, suggesting that comparing gender attitudes across cultures and geographies should be more appropriately located at different levels of regionality within the same country and not just at national level, with clear implications in terms of policy. The findings support the argument that policies promoting equity should account for regional variability to appropriately understand attitudes and other outcomes in the key sectors of human life such as health, economics, education, politics and so on, at least in cases that relate to gender attitudes.
Finally, the analysis carried out in Europe clearly showed that the variability of gender attitudes at sub-national level does not depend on just the number of regions per country, thus disconfirming the hypothesis that the larger the number of regions, the larger the variability of gender attitudes at sub-national level. Such a result may suggest to the international reader that the regional-national issue may be relevant more widely (e.g., for America, South Asia, Australasia) than the European “case” presented in the current paper.
Conclusions
The research presented in the current paper contributes to knowledge in different ways. First, it has shown that a new combination of items works psychometrically better (as it better scales subjects along the latent trait) than the original scale proposed by the European Values Survey and used in a number of studies aimed at measuring gender attitudes at both national and international level in the EU. We also claimed that our measure captures a more modern conceptualization of gender by including, for example, items aimed at investigating people’s attitudes toward same-sex parenting and cohabitation out of wedlock. Our proposed measure is thus ready to be used in other studies.
Second, our proposed measure has been validated within the framework of the Rasch analysis at different levels of locality (countries and regions), in each European country. Such a regional analysis is new in the literature and has shown that gender attitudes vary locale to locale even more than country to country. Such a result clearly calls for pausing national (or even European) policy interventions based on nationally aggregated data, but also provides information that can channel local interventions.
Of course, we do acknowledge that the unavailability of further information about the socio-cultural and economic characteristics of the “places” where our analysis has been located hinders the possibility of understanding the causes underlying the variability of gender attitudes at local level. For practical reasons, the EVS sampling strategy employs the NUTS classification, which mirrors the territorial administrative division of the Member States: if, on one hand, such a decision supports the availability of data and the implementation capacity of policy, on the other hand, it makes “transparent” all the information we may have used to interpret the variability of gender attitudes at local level. We acknowledge that clustering territories according to further socio-cultural characteristics (not explicitly mentioned in the EVS sampling strategy) would represent an important advance in data collection and could open up the way to advances in knowledge. Yet, such information is not available. In fact, in our study, we assumed that territories going under the same “geographical umbrella” (NUTS-1, NUTS-2, NUTS-3) share something more than other territories. Therefore, even though we acknowledge that our conceptualization of “place” is not fully mirrored by EVS territorial levels, we claim that the EVS sample is to be taken as an appropriate “proxy” of a possible (but not available) more refined, smaller-scale regional sample, explicitly including a wider range of social, cultural, and historical variables. Irrespective of sampling limitations, we claim that the results presented in the current paper can be used as a starting point to open up the way to further (socio-cultural and historical) research.
In addition, an important implication of our research for policy is that regional and national governments need to consider research that pays appropriate attention to regional (and local) versus international effects. In terms of interventions, for example, it seems more likely that the important research evidence will arise from similar regions inside the nation and in other nations, and not from national comparators.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440241259912 for Gender Attitudes Within and Between European Countries: Regional Variations Matter by Clelia Cascella, Maria Pampaka and Julian Williams in SAGE Open
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References