Abstract
Rapid social development and recent changes in views concerning childhood have urged a more holistic approach to measuring children’s well-being, particularly in the domain of rights. In light of articulated provisions within the United Nations Convention on the Rights of the Child, there is obvious interest in understanding more about how children’s perceptions of their rights have evolved. Using both traditional measurement (exploratory factor analysis and confirmatory factor analysis) and Rasch analysis, this study focused on measures gauging Chinese high school students’ perceptions of freedom of expression. The survey was designed to capture students’ perceptions of various forms of freedom of expression (e.g. student publications, dress code), and their experiences with regard to how these rights were respected in their schools. The aim of the study was to examine and evaluate the validity and reliability of the survey with a sample of 838 Chinese students from two high schools, one urban and one rural. Overall, the survey exhibits a degree of validity and reliability and is appropriate for measuring children’s perceptions of freedom of expression. The study points to a number of areas where the survey could be improved. Implications for future research are discussed.
Measuring children’s perceptions of their rights
In the past few decades, interest in children’s rights has intensified. Concepts on childhood have advanced considerably since the 1980s (Qvortrup, 2004). The United Nations Convention on the Rights of the Child (UNCRC), adopted in 1989 and ratified by 193 countries including China, expressed that children are entitled to a full range of rights including access to education to enable moral and social responsibility. The Convention significantly enhanced children’s status both through enumerated human protections and principles conveying political empowerment (Sherrod, 2008). In this vein, children are viewed more as independent persons and social agents rather than property of adults or a class of the “weak, poor and needy” (Moss and Petrie, 2005: 55). Moreover, children are accorded value in their own right and have emerged as competent actors capable of interacting with and shaping the world surrounding them (Mayall, 2000; Qvortrup, 2004).
These entitlements have stimulated interest in discovering new methods of assessing children’s experiences and conditions (Lansdown, 2006). Theories germane to the sociology of childhood, for instance, have led to a number of innovative and critical approaches to investigating children (e.g. Qvortrup et al., 2009). Among them, inclusive and participatory research that places the voices of children at the center of studies has drawn increasing attention from the field (Cahill, 2007). Recent research efforts have moved toward more fully understanding the shift from treating children as passive objects to treating them as active and capable members of society, as well as the shift from the provision of basic needs to quality of life (Lippman et al., 2011). For example, in studying children’s participation in decision-making processes, scholars have recognized the competence of children in providing unique experiences and viewpoints on matters concerning them (Helwig et al., 2003).
Thus, there is general agreement that the perception of the child provides an avenue to more fully understanding well-being (Thomas, 2007). In the past, studies have avoided the subjective perception of children because of a supposed inferior status (Ben-Arieh, 2005). Nowadays, rather than relying mostly on indirect information (e.g. household economy, parents’ or teachers’ reports), a growing number of researchers are discovering that children’s perceptions and experiences offer meaningful insight (Lohan and Murphy, 2001). Using children’s subjective views has been seen as “both a prerequisite and a consequence of the changing field of measuring and monitoring child well-being” (Ben-Arieh, 2005: 578). Listening to children’s voices is also important because reports from parents and other adults do not comprehensively or truly represent the subjective feelings of children, especially older children (Bastos and Machado, 2009). Children need to be the focus and unit of children’s studies (Land et al., 2007).
Furthermore, research suggests that proper democracies should encourage their citizens to express their thoughts and experiences (John, 1996). Such information might provide a missing nuance, particularly in situations as Ben-Arieh (2005) argues, “when it comes to the question of how well different groups in society fare and what changes there are in their situation over time” (p. 578). The approach of self-reporting can also be applied to the discussion of children because their perspectives are essential for (1) treating children as active members of society who are able to understand and even shape the structures and processes around them (James and James, 2004), (2) social and educational policy implications, and (3) the socialization of children (Ben-Arieh, 2008; Melton and Limber, 1992).
In measuring children’s well-being, growing attention has been given to the manner in which children conceive their own rights, especially in the aftermath of the United Nations’ declaration which was formally recognized and encoded (Helwig and Turiel, 2002; Peterson-Badali and Ruck, 2008). As such, children’s well-being refers to the realization of their rights and the provision of the opportunity for every child to reach her or his own potential (Kosher et al., 2014). Children’s perceptions of their rights are considered a crucial component to their development (Cherney and Perry, 1996; Helwig, 1995). In addition, assessing the degree to which children are capable of understanding their rights in a meaningful way is a response to the call for children’s voices to be heard and given consideration (Day et al., 2006). If children’s rights are to be properly realized, it is necessary to understand their perspectives of rights when they are making “rights-related decisions” (Peterson-Badali et al., 2004: 163) and to know if they “understand their rights well enough to take advantage of them in a mature way” (Sherrod, 2008: 772). Enhancing methods to examine children’s rights enables researchers, policymakers, and practitioners to better promote young people’s well-being and development.
Instrument measuring Chinese children’s perceptions of freedom of expression
Children’s civil and political rights, such as free speech, have drawn considerable attention over the past two decades (Invernizzi, 2016). Despite China’s long history of discouraging free thought, the dramatic social transformation in China of late has resulted in significant change in Chinese children. There are signs that youth in China are becoming more autonomous and tend to appeal to personal choice when engaging in conflict with adults (Smetana, 2002). Chinese youth seem more willing to voice what they truly think (Helwig et al., 2003). Whether Chinese children are valuing greater freedom of expression and receiving more tolerance for differing opinions still remains unclear. In addition, as the majority of studies regarding students’ rights have been conducted in Western contexts, more research is needed in other parts of the world and across varying political states. The process for defining freedom of expression and selecting representative constructs that are culturally and politically valid is very complex (Krotoszynski Jr, 2015; Zick, 2010). The contours of free speech from a global standpoint may range from “absolute” or “exceptional” entitlements (i.e. very permissible) to more “paternalistic” arrangements (i.e. less permissible) (Krotoszynski Jr, 2015). To be sure, studies have found differences in children’s understanding of their rights across Western and Eastern societies (e.g. Cherney and Perry, 1996; Melton and Limber, 1992). For instance, researchers in one study found that children in some Western countries developed their concepts and judgments about civil liberties, autonomy, and democracy far sooner than their non-Western peers (Helwig and Turiel, 2002).
To date, limited work has examined Chinese children’s perceptions related to freedom of expression (e.g. Such and Walker, 2005; Torres and Qin, 2017). Researchers in these studies have used different research designs and tools to investigate children’s rights, including their free speech rights. Some researchers reported little support from schools and government for the protection of students’ free speech rights (Such and Walker, 2005). Others revealed increasing tolerance toward expression among Chinese youth, with participants seeming to endorse autonomy, freedom, and rights as much as their Western peers (Helwig et al., 2003; Naftali, 2009). Using a written instrument that contained nine scenarios, Lahat et al. (2009) investigated the understanding of nurturance (e.g. well-being and education) and self-determination (e.g. freedom of religion, freedom of speech) rights in a sample of Chinese adolescents from both urban and rural areas. The respondents appeared to support both types of rights, and the authors also found that urban and older adolescents were more likely to support self-determination rights. Naftali (2014) examined how urban Chinese teachers, parents, and children understood and interpreted the notions of children’s rights using interviews and observation in two public schools in Shanghai. This study revealed the complexities and conflicts related to discourses and practices of children’s rights, especially their civil and political rights.
In light of the paucity of research examining students’ rights in mainland China (Lahat et al., 2009), a survey was designed to capture students’ perceptions of various forms of freedom of expression (e.g. student publications, dress code), and their experiences about whether selected rights were respected in their schools. The data were collected from a sample of 838 high school students, which represented both city and rural areas within a northwest province of China. This study gives close attention to the validation properties of the instrument to assess how well the survey is able to measure students’ perceptions concerning freedom of expression in China.
The first construct of the survey was developed to evaluate the manner in which Chinese high school students understood their free speech rights. The attitudes measured by the survey reflect how participants perceive various forms of freedom of expression. Waldman (2010) suggests students’ speech rights under school settings may include (1) school-sponsored speech (e.g. school newspapers, student bulletin boards) and (2) individual student speech (e.g. student speech occurring on campus during school time). These two aspects embody this construct. Despite significant differences between Western and Chinese societies, notions of freedom of expression in China are similar to those in Western society.
The second construct of the survey was created to gauge the extent to which Chinese students felt that they could freely express their opinions and feelings within their specific school. Items under this construct provide a basis to assess perspectives within a unique organizational context (Limber et al., 1999). Inspired by Bronfenbrenner’s (1979) ecological perspective, Khoury-Kassabri and Ben-Arieh (2009) suggest children’s views of their rights are influenced by both individual traits (e.g. gender and age) and contextual and environmental characteristics (e.g. family and school). Researchers noted that the level of tolerance and acceptance to dissenting voices in a school influenced students’ enforcement of free speech rights (Helwig and McNeil, 2011). Also, the conception of children’s rights can vary across schools within a country (Nucci et al., 1996). Thus, children’s perspectives of their rights should be analyzed in context (Ng, 2010). While it is assumed that schools are instrumental to students achieving academic success and preparing responsible members of society (Cohen, 2006), variations occur which may result in less support for expression and articulation of diverse viewpoints for some children (Jin, 2002; Ng, 2010). Studies have documented a lack of cultural tolerance within the Chinese school system and oppression of student voice.
What makes this study more significant is that there are few survey instruments measuring children’s perceptions and experiences of their rights. Absent advanced tools, evaluating treatment and perceptions of children’s rights will continue to be a challenge. Furthermore, although some studies have examined children’s perceptions of their rights, none so far have reported the psychometric properties of their instruments with a modern item analysis approach (e.g. Rasch models) (Harris and Livingstone, 2001). Some recent efforts have been observed in related fields such as human rights studies (e.g. Lopez et al., 1998; Mullany et al., 2007). Most of these studies used traditional measurement approaches to validate their instruments, such as factor analysis (e.g. exploratory factor analysis (EFA) and confirmatory factor analysis (CFA)) and Cronbach’s alpha. In a study measuring teachers’ attitudes toward human rights in Hong Kong, the authors adopted EFA, CFA, and Rasch analysis to validate a six-dimensional instrument (Lo et al., 2015). Given the value of understanding children’s rights, it is important to develop a psychometrically sound instrument measuring their rights.
Rasch measurement framework
Developing a psychometrically sound instrument is a complex process. Measurement issues such as vagueness in how constructs are defined may distort survey inquiries (Green and Frantom, 2002). A desirable measure is one characterized by high reliability and validity. In recent years, Rasch measurement models have been utilized to develop and validate instruments (e.g. Beglar, 2010; Nam et al., 2011). One advantage to a Rasch model is that it can more accurately validate survey instruments by producing interval measures (Liu and Boone, 2006). Because this study’s instrument was designed to capture children’s perceptions of freedom of expression, the Rasch analysis provides a multifaceted assessment of the survey instrument.
As a modern test theory approach, Rasch analysis offers an alternative framework for understanding measurement and judging the quality of an instrument in social science studies (Rasch, 1980). Rasch models can address the psychometric criteria and diagnose the quality and structure of data. Unlike most of the traditional measurements, Rasch models can examine person and item interactions separately (Andrich, 2011). In Rasch modeling, the parameter of difficulty varies and the probability of a person endorsing an item depends on only the difficulty of the item and the person’s ability. The total scores of both person and item are mapped side by side on a logits scale ruler. This probabilistic relationship can be tested by a set of fit statistics which compare the differences between the theoretical item performance and observed data (Andrich, 2011). The closer the results are to the expected results, the better fit the data are to the Rasch model.
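The probabilistic relationship described above can be illustrated with a minimal sketch of the dichotomous Rasch model, in which the probability of endorsement depends only on the difference between person ability and item difficulty (both in logits). The function name and the example values are hypothetical and are not taken from the study, whose 4-point items call for a polytomous extension of this model.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a person endorses an item under the dichotomous Rasch model.

    Both arguments are expressed in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical illustration: a person located exactly at the item's difficulty
# has an even chance of endorsing it.
print(rasch_probability(ability=0.0, difficulty=0.0))  # 0.5
```

As ability rises above item difficulty the probability approaches 1, and as it falls below it approaches 0, which is what allows persons and items to be placed on the same logit scale.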
In addition to its use with knowledge tests, Rasch analysis has been widely used to measure latent variables such as attitudes, traits, and skills. This study aimed to validate a survey examining students’ perceptions of freedom of expression. Therefore, key terms such as “person ability” and “item difficulty” in the current analysis have been interpreted accordingly. Person ability (student ability) in our study refers to the level of conviction; that is, the extent to which students agree with the statement of each item regarding freedom of expression. Item difficulty means the level of difficulty in endorsing a statement. Full endorsement corresponds to the “correct answer” (Sjaastad, 2014). According to Rasch measurement, scores become higher when participants have more of a specific trait (“a lot like me”) and lower when they have less of the trait (“a little like me”) (Green and Frantom, 2002: 9). In this study, for example, individuals who showed a greater level of endorsement of items were considered more likely to demonstrate support for those rights. On the other hand, students who had less support for the rights could find it hard to endorse some of the items.
This analysis examines the reliability and validity of an instrument measuring freedom of expression from a Rasch perspective. The results are expected to provide information concerning the dimensionality of the instrument, the quality of each item, and the evidence regarding the generalizability (e.g. how items function across groups and settings). This information will help researchers and practitioners determine how to optimally use and develop similar social surveys.
Method
Research purpose and procedure
The freedom of expression survey that comprised 4-point Likert-scale items was validated using a sample of Chinese high school students. Researchers examined and evaluated validity and reliability of the survey using the following research questions:
What validity and reliability evidence can Rasch measurement provide to support the use of the freedom of expression instrument with a sample of Chinese high school children?
What changes or improvements are needed to improve the instrument?
The completed questionnaires were coded and data were entered into the Statistical Package for Social Sciences (SPSS 23). Negatively worded items were reversed for analysis. Then all questions were polarized to be positive (high scores indicated higher level of agreement). The Rasch analysis was conducted using Winsteps. The analyses proceeded in four stages:
The first stage used SPSS to produce descriptive statistics that provide a general picture of the data. By testing the linearity and normality of the data, the analysis focused on whether the data met the requirements and assumptions for further analysis. For each individual item, frequencies, means, and standard deviations were calculated. The study also reported means of items by groups.
The second stage explored the dimensions underpinning the survey, which involved an EFA to determine the appropriate number of factors to extract and rotate. A CFA was then used to compare two different proposed models and decide which one fit the data better.
The third stage involved a separate Rasch analysis of the items for each of the dimensions identified in the second stage. This analysis involved (1) a rating scale analysis, (2) person and item response validity, and (3) hierarchical ordering and test targeting.
The fourth stage applied the Rasch model to investigate the existence of differential item functioning (DIF) items in the data across different groups – city/rural and male/female students. Parameters for this model were estimated using joint maximum likelihood estimation procedures as implemented in Winsteps (Linacre, 2002).
Sample
The sample included 838 participants who were sophomores (Gao er in the Chinese public school system) from a city high school and a rural high school in China (age range was 16–17 years). The definition of “children” used in this study came from Article 1 of the Convention, “every human being below the age of eighteen years unless under the law applicable to the child, majority is attained earlier” (UNCRC, 1989: 1). Children are not a homogeneous block and the concept of children is dynamic in nature (Shek et al., 2007). Researchers suggest that rather than solely focusing on the deficits of young people, more attention should be paid to children’s competencies and potentials. For example, when children reach late adolescence, they become “more proficient in perspective and role taking, emotional expression, and social awareness”.
The sample was drawn from both city and rural settings in order to partly capture the contextual and cultural variation in perspectives concerning individual rights. Some prior findings suggest that both city and rural adolescents express considerable support for self-determination rights (Helwig et al., 2003). Other studies, however, show that rural children are more likely than their city counterparts to be characterized by traditional cultures and authoritarian family values and show less support for autonomy and participation rights (Kipnis, 2006).
Two high schools in northwest China agreed to participate in this study. The city subsample was drawn from one of the top high schools in a major city. Most of the students in this school came from middle class and above families. The rural subsample was drawn from a school located in a small town that was 50 miles away from the city. Most of the parents of students in this school were farmers by occupation. In addition to city–rural differences, gender was included as a variable. The total sample from the two schools comprised 438 males and 400 females. All 838 participants in the study were provided with verbal and written information about the study. The questionnaire was administered in a secluded room on-site and returned to the project representative. Table 1 displays gender breakdowns for the sample.
Table 1. Demographic characteristics.
Instrument
The survey was part of a broader research project aiming to more accurately assess students’ perceptions of rights and school climate toward democratic practices. It included two major sections: students’ demographic information and perceptions and experience relating to freedom of expression in their school. The survey contained questions regarding student involvement in various free expression decisions in school settings. To be sure, some of these practices occur more frequently within the Chinese cultural context. Items in the survey generally inquire about tolerance for different perspectives and dissent, dress codes, and school press activities. The survey provides a clear definition for freedom of expression at the beginning of the survey in order to establish a common understanding among the participants. Questions 1 and 2 (question 2 contained 4 subquestions) were not included in the current analysis due to the different category and different sample size. Therefore, this study included 11 questions in total. All items were presented with four response options: 1 = Disagree, 2 = Somewhat Disagree, 3 = Somewhat Agree, and 4 = Agree.
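Because all items share the same four response options, the reversal of negatively worded items mentioned in the procedure can be sketched as follows. This is only an illustration of the general recoding rule (minimum plus maximum minus the response); the function name is hypothetical, and the original recoding was carried out in SPSS.

```python
from typing import Optional

SCALE_MIN, SCALE_MAX = 1, 4  # 1 = Disagree ... 4 = Agree


def reverse_code(response: Optional[int]) -> Optional[int]:
    """Reverse a response on the 4-point scale; missing values stay missing."""
    if response is None:
        return None
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside the 1-4 scale")
    return SCALE_MIN + SCALE_MAX - response


# Hypothetical responses to a negatively worded item, including one missing value.
raw = [1, 2, 3, 4, None]
print([reverse_code(r) for r in raw])  # [4, 3, 2, 1, None]
```

After this step a higher score indicates a higher level of agreement on every item, which is the polarity the Rasch analysis assumes.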
Results
Data description
Descriptive statistics provide a basic picture of the student response data. Table 2 contains the means and standard deviations of the measures on the scales, along with the skewness and kurtosis of the data distribution. The skewness values fell within the acceptable range of −2 to +2 (Bachman, 2004). The negative kurtosis values indicate a fairly flat distribution. The standard deviations show that responses were concentrated around the mean. The small standard error for each item suggests that the sample mean is a precise estimate of the population mean.
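For readers who wish to check distributional properties of this kind, the following sketch computes skewness and excess kurtosis from central moments. It uses the simple moment-based estimators; statistical packages typically apply small-sample corrections, so their output may differ slightly. The response vector is hypothetical and is not taken from the study data.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using uncorrected central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical item responses spread evenly over the four categories.
responses = [1, 2, 3, 4]
skew, kurt = skewness_and_excess_kurtosis(responses)
print(abs(skew) <= 2.0)  # True: within the -2 to +2 range used in the text
print(kurt < 0)          # True: negative kurtosis, i.e. a flatter distribution
```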
Table 2. Data description (N = 838).
Table 2 illustrates the overall support for freedom of expression. The results show variations across groups. The city participants scored higher on most of the items than their rural peers, especially on the questions associated with perceptions of the school environment.
Missing data
In order to identify whether the missing data occurred randomly, an in-depth missing data analysis was conducted using SPSS Missing Value Analysis. All of the significance values for the EM (Expectation-Maximization) estimations were greater than 0.05, which suggests that the cases with missing values were not significantly different from the cases without missing values. The pattern frequencies chart (Chart 1) illustrates that the first pattern, the one without missing values, is the most prevalent. The other patterns, which contain missing values, are far less prevalent.

Chart 1. Missing value patterns.
The item with the largest portion of missing data was question 13 (5.6%). Besides some common reasons for missing data (e.g. students not paying attention or entry mistakes), missing data could occur if the question was considered to be sensitive to students. Some students might have intentionally not answered the questions due to social desirability concerns (Booth-Kewley et al., 2007).
Dimensionality
One of the important Rasch assumptions is unidimensionality, whereby a single common factor accounts for the variance among all item responses (Embretson and Reise, 2000; Rasch, 1980). We first utilized traditional measurement approaches to explore the dimensionality of the survey.
EFA
EFA with varimax rotation was performed to clarify the dimensionality of the survey. Tabachnick and Fidell (2007) suggest that if factor correlations are not significant, varimax (orthogonal) techniques can be adopted for rotation. In this study, there was no significant correlation between the factors (0.22). In addition, several tests were executed in order to test the appropriateness of the data for factor analysis. The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy for the sample (0.851) was acceptable. Bartlett’s test of sphericity was significant (chi-square (χ2) = 2338.02, p < 0.01), and the goodness-of-fit test was nonsignificant (0.189) (Kaiser, 1974). Together, these test results supported the assumption that the data were suitable for further analysis.
The results of the two-factor analyses are presented in Table 3. The factors together account for 60.59% of the total variance. Among the factors, Factor 1 explained 35.42% of the total variance. Seven items had high loadings on the first factor and the other four items had high loadings (>0.60) on the second factor. Both varimax and promax solutions showed similar patterns of item factor loadings. The only difference was in the absolute values of these loadings. Cronbach’s alpha values of 0.879 for the first factor and 0.793 for the second factor demonstrated adequate internal reliability (Kline, 2005).
Table 3. Exploratory factor analysis (EFA) of the survey (N = 838).
Source: Torres and Qin (2017).
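The internal reliability coefficients reported for the two factors are Cronbach’s alpha values. As a minimal sketch of how such a coefficient is obtained from a respondents-by-items score matrix, the following function applies the standard formula based on item variances and the variance of the total score. The data shown are hypothetical, and the function assumes complete cases (no missing responses).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five students to a four-item subscale (1-4 scale).
subscale = [
    [4, 3, 4, 4],
    [2, 2, 1, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(subscale), 3))
```

Alpha approaches 1 when items rise and fall together across respondents and decreases as item responses become less consistent with one another.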
CFA
In order to strengthen the conclusion of a two-factor solution obtained from the EFA, CFA tests using Mplus 7.1 were performed to test the proposed models (as displayed in Table 4). The indices of root mean square error of approximation (RMSEA), comparative fit index (CFI), and χ2 were chosen to test the model fit (Ullman, 2001).
Table 4. Confirmatory factor analysis (CFA) to test proposed models.
DF: degrees of freedom; RMSEA: root mean square error of approximation; CFI: comparative fit index.
Source: Torres and Qin (2017).
Table 4 indicates that a single-factor model in which all 11 items load on one factor results in a poor fit to the data, as reflected by a large and significant χ2 value of 2127.067 (df = 44, p = 0.000). This was also confirmed by fit indices that yielded a poor CFI = 0.626 and RMSEA = 0.238 (CFI > 0.97 and RMSEA < 0.08 are considered a good fit) (Kline, 2011). The CFA further suggests that the second model with two factors (subscales) fits the data better than the first model (χ2 = 82.816; CFI = 0.993; RMSEA = 0.033). All of the fit indices in the second model met Ullman’s criteria, which supports grouping the items into perceived rights (students reported their perceptions regarding freedom of expression) and perceptions of school environment (students evaluated the school environment in terms of freedom of expression).
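The RMSEA reported for the single-factor model can be recovered from its χ2, degrees of freedom, and the sample size using the usual point-estimate formula. The sketch below assumes the common textbook form with N − 1 in the denominator; some software divides by N instead, which makes no practical difference at this sample size. The degrees of freedom for the two-factor model are not reported in this section, so only the single-factor value is reproduced.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported in the text for the single-factor model (N = 838).
single_factor = rmsea(chi_square=2127.067, df=44, n=838)
print(round(single_factor, 3))  # 0.238
print(single_factor < 0.08)     # False: fails the RMSEA criterion cited in the text
```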
Subscales Rasch analysis
Based on the results of EFA and CFA, the survey could be described as a two-subscale model, with four-item and seven-item subscales. Thus, as a next step, researchers applied a subscales Rasch analysis to assess for model fit. The four-item subscale represents perception of school environment, and the seven-item subscale represents perceived rights.
Item fit statistics and difficulty measures
One of the core purposes of Rasch modeling is to examine how persons and items fit the model. The fit statistics are calculated for both persons and items to test how well items discriminate among persons and how well they meet model expectations (Wright and Stone, 1979). In this study, the fit statistics produced through the Rasch analysis included two indicators. The infit (weighted) statistics are sensitive to ratings of the items that are close to student abilities; the outfit (unweighted) statistics are mostly influenced by the ratings on the off-target items (i.e. questions too easy or too hard for the respondents) (Bond and Fox, 2007). The mean square (MNSQ) is an important item fit index for the lack of consistent responses and validity of scores (Adams and Khoo, 1993). Infit and outfit MNSQ indices have an acceptable range of 0.77–1.30. Any MNSQ larger than 1.30 indicates that the item has low discrimination, and any MNSQ below 0.77 indicates that the item provides redundant information (Wright and Stone, 1979).
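To make the distinction between the two indicators concrete, the following sketch computes infit and outfit MNSQ for a single item. For brevity it assumes dichotomous responses (0/1) and known person abilities, whereas the survey items are polytomous and the estimates in this study were produced by Winsteps; all names and values are hypothetical. Outfit is the plain average of squared standardized residuals, while infit weights the squared residuals by their model variance, which reduces the influence of off-target persons.

```python
import math


def item_fit(responses, abilities, difficulty):
    """Return (infit MNSQ, outfit MNSQ) for one dichotomous item."""
    squared_residuals = 0.0
    model_variance = 0.0
    squared_z = 0.0
    for observed, ability in zip(responses, abilities):
        expected = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
        variance = expected * (1.0 - expected)
        residual_sq = (observed - expected) ** 2
        squared_residuals += residual_sq
        model_variance += variance
        squared_z += residual_sq / variance
    infit = squared_residuals / model_variance   # information-weighted
    outfit = squared_z / len(responses)          # unweighted
    return infit, outfit


def classify(mnsq, low=0.77, high=1.30):
    """Label an MNSQ value using the range cited in the text."""
    if mnsq > high:
        return "low discrimination (noisy)"
    if mnsq < low:
        return "redundant"
    return "acceptable"


print(classify(1.91))  # low discrimination (noisy)
```

Under this rule, the infit value of 1.91 reported for item 6 falls well above the upper bound.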
Item fit statistics and difficulty measures are summarized in Table 5. The residual values represent the differences between an item’s theoretically expected score and the actual scores in the data matrix. Larger residuals indicate a bigger gap between how the item should have performed and its actual performance. In Table 5, except for item 6, the range of 0.71 to 1.00 for infit MNSQ and the range of 0.77 to 0.99 for outfit MNSQ suggest that these items are within acceptable thresholds. The fit index of item 6 (in school, students’ leaflets and publications should be scrutinized and approved by school first) is relatively high (infit 1.91; outfit 1.90). The value of the point-biserial (0.29) also indicates that the item should be flagged. The result shows that this item has a high level of randomness (i.e. noise). One possible reason for this result is that the item was poorly written and ambiguous, or that students did not have relevant knowledge about the statement.
Table 5. Item measure and fit indices for the models.
The bubble chart in Figure 1 gives a more intuitive display of the items. MNSQ statistics display a plot of item difficulty related to measures linked to person ability. In subscale 2, the outfit MNSQ of all the items except for item 6 was closely distributed around the item fit mean (1.00), indicating acceptable measurement properties. Item 6 is outside the range. Figure 1 also contains information concerning the relative difficulty of the items in logits. The higher the item, the more difficult it was for students to endorse it.

Figure 1. Bubble chart on item fit: subscale 2.
Item–person map
The Wright map visually illustrates the relationship between persons and items (Wang and Wilson, 2005). In Figure 2, the vertical dashed line represents the low-to-high continuum of “endorsement” or “agreement” toward the statements on freedom of expression. The items and students share the same linear measurement units. Starting from the bottom, the items are aligned from easy to difficult on the right of the line. On the left, the students’ levels of agreement are aligned in increasing order. Each # equals 11 students. This item–person map (Figure 2) demonstrates how well the items targeted the range of student ability (level of endorsement in this study).

Figure 2. Item–person map for the subscales.
In Figure 2, the distribution of students is consistent, making a curve with a U-like shape around the mean. Compared with the student distribution, the item distribution is quite narrow, mainly centering on the mean, which suggests that the item hierarchy for both subscales is poor. Ideally, items should form a ladder with low agreement on the bottom to high agreement on the top. In this study, all of the items in the two maps are located between −1 and +1 logits, which covers only part of the range of student ability. According to Bond and Fox (2015), the instrument is well targeted if the distribution of the respondents is opposite to the distribution of the items. Thus, the result signals poor targeting of the items to the respondents, and there are not enough items to distinguish the levels of student ability. In addition, items with similar difficulty are located at the same point on the map. For instance, three items are located at the same point on the map. Even though the items were similarly difficult, a decision was made to retain the items as they represent fundamental aspects of student speech rights.
Rating scale analysis
To assess the accuracy of the category usage and improve the reliability of the survey, the rating scale analysis was employed to examine the average measure and threshold of each category (Wang and Wilson, 2005). Figure 3 shows the modeled category probability curve for both subscales. The categories are presented in an ordinal manner using stepwise calibrations. The peak for each category is distinctive, which means that the rating categories are used as intended. Hence, the 4-point rating scale in the instrument yields good construct validity and is appropriate for the measurement.

Category probability curves for the subscales.
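The category probability curves in Figure 3 can be sketched under the assumption that the Andrich rating scale model was used for the 4-point scale. The code below computes the probability of each category for a given person measure and checks whether every category is the most probable one somewhere along the continuum, which is one way to read “a distinctive peak.” The threshold values and function names are hypothetical and are not estimates from the study.

```python
import math
from typing import List, Sequence, Set


def category_probabilities(
    theta: float, delta: float, thresholds: Sequence[float]
) -> List[float]:
    """Rating scale model probabilities for categories 0..len(thresholds)."""
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def modal_categories(
    delta: float, thresholds: Sequence[float], grid: Sequence[float]
) -> Set[int]:
    """Categories that are the most probable response at some grid point."""
    modal = set()
    for theta in grid:
        probs = category_probabilities(theta, delta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal


# Hypothetical ordered thresholds for a 4-point scale (three steps).
thresholds = [-1.0, 0.0, 1.0]
grid = [x / 10 for x in range(-40, 41)]  # person measures from -4 to +4 logits
print(sorted(modal_categories(0.0, thresholds, grid)))
```

When the thresholds are ordered, each of the four categories is modal over some interval of the continuum; with disordered thresholds, the middle categories may never be the most probable response.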
DIF
DIF analysis is designed to test differential item validity across groups (Camilli and Shepard, 1994). DIF is identified when some items favor one group over other group(s). When DIF occurs, individuals with the same ability from different groups might respond to the item(s) differently (Thissen et al., 1993).
In this study, the DIF analysis was conducted for two group comparisons, city versus rural students and male versus female students, to examine item invariance across groups. DIF can be detected by testing differences (paired t tests) in item difficulty calibrations between groups (Camilli and Shepard, 1994). Most of the items did not show DIF across schools or gender; the exception was item 10 in the gender comparison. The DIF measurements in Table 6 display the possible gender DIF occurring in the subscale 1 model. Class 1 represents male students and Class 2 represents female students. The DIF contrast is the difference in the difficulty of the item between the groups. “Prob.” is the probability of observing a contrast at least this large if the item functioned identically in both groups. In Table 6, the difficulty of item 10 is −0.72 logits for the female students and −0.41 logits for the male students, giving a DIF contrast of 0.31 logits. In addition, the Prob. of item 10 is 0.045 (Prob. ≤ 0.05). Thus, item 10 may have functioned differently for male and female students. It appeared significantly more difficult for male students to endorse the statement in item 10 (there should be some limits on what students are free to wear in the school). The observed DIF in item 10 needs to be investigated further to determine whether measurement invariance is actually violated between the gender groups.
DIF on subscale 1 gender group.
DIF: differential item functioning.
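The arithmetic behind the DIF contrast reported for item 10 can be sketched as follows. The two difficulty values come from the text; the standard errors are hypothetical placeholders because they are not given here, and dividing the contrast by a joint standard error is an assumed, commonly used form of the t statistic that may differ from the exact test used in the study.

```python
import math


def dif_contrast(difficulty_a: float, difficulty_b: float) -> float:
    """Difference in item difficulty (logits) between group A and group B."""
    return difficulty_a - difficulty_b


def dif_t_statistic(contrast: float, se_a: float, se_b: float) -> float:
    """Contrast divided by the joint standard error of the two calibrations."""
    return contrast / math.sqrt(se_a ** 2 + se_b ** 2)


# Item 10 difficulties reported in the text (logits).
male_difficulty = -0.41
female_difficulty = -0.72
contrast = dif_contrast(male_difficulty, female_difficulty)
print(f"DIF contrast = {contrast:.2f} logits")  # 0.31, matching the reported value

# Standard errors are hypothetical placeholders; they are not given in the text.
t_value = dif_t_statistic(contrast, se_a=0.11, se_b=0.11)
print(f"t = {t_value:.2f}")
```

A positive contrast in this orientation means the item was harder for male students to endorse than for female students of the same overall level of endorsement.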
Figure 4 offers further evidence. The DIF in gender group in subscale 1 is detected on the overlapping lines where the t value revealed the significance of the unit deviate. Graphically, male students appeared to differ from female students on responses to item 10.

Plot of the items comparing male and female students in subscale 1.
Discussion
Spurred by emerging views of childhood and children’s rights, interest in children’s perceptions is growing, which is consistent with the UN’s position that children’s thoughts and ideas do matter with respect to informing and monitoring legal rights (Ben-Arieh and Frones, 2009). Recent studies have used children’s perceptions and experiences as a construct to measure well-being (Cummins and Lau, 2005; Fattore et al., 2007). However, while self-reports provide a promising avenue to study well-being, few studies have focused on validating how well indicators reflect children’s subjective feelings. It is important to underscore the role of validation in “[supporting] the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (Messick, 1989: 13). It would be problematic to design and implement social and educational policies on the basis of unreliable or invalid instrument results.
By employing a Rasch analysis in tandem with traditional measurement approaches, the researchers attempt to establish validity and reliability of a freedom of expression survey. The study sets out to validate a self-reporting survey measuring the two subscales associated with students’ attitudes and perceptions of freedom of expression in high schools within China. The components and results of this study partly support various aspects of validity and reliability of the instrument.
First, the findings suggest that combining classical test analysis (EFA and CFA) with Rasch models contributes to what the researchers believe is a more comprehensive process for evaluating the soundness of an instrument (Edelen and Reeve, 2007). The results from the EFA and CFA provide a good starting point for assessing constructs prior to using Rasch modeling. The factors were separated as expected, and the expected patterns were generated.
The finding of two subscales, perceptions of free speech rights and perceptions of school environment, optimizes the measurement of freedom of expression within the context of Chinese high schools. The results suggest that indicators of children’s perceptions and attitudes need to be distinguished from indicators of contextual factors, such as the effects of school and family environments (Lippman et al., 2011). Without distinguishing measures of individual well-being from contextual factors, distortions are more likely to occur, hampering the interpretation of the results. In other words, the indicators will be less valid and reliable when informing policy and programmatic interventions (Lippman et al., 2011).
Second, the Rasch analysis provided useful insight into item fit within the two subscales and the appropriate ordering of response options. Overall, most of the item infit and outfit MNSQ statistics are close to the average fit expected under the Rasch model. That is, the appropriateness of these items for measuring Chinese children’s perceptions of freedom of expression was confirmed. The analysis points out one potentially misfitting item (item 6), suggesting that it is a candidate for revision or deletion. Furthermore, the item–person maps reveal that the items are insufficient for measuring issues pertaining to students’ freedom of expression. That is, the components of freedom of expression are not adequately measured by the items. Previous research has shown that the measurement ability of an instrument may be improved when more dimensions or items are introduced (Pallant and Tennant, 2007). Hence, adding more items and even constructs may help to capture children’s concepts and experiences regarding freedom of expression more comprehensively.
Third, by employing a Rasch model, the study was able to assess the DIF of the survey across the city/rural and gender groups. The results reveal that most survey items do not significantly vary across gender and school groups in terms of item functioning, which means that the comparability among participants on those items is assured. There is only one item showing possible DIF in the gender group. Compared to the female students, the male students seem to have more difficulty endorsing the statement that students should follow the school dress code. Further investigation of the reason for the resulting DIF between male and female students on this item should be considered. If the item does not have an equivalent meaning across all groups, it may not be directly comparable between gender groups (Cheung and Rensvold, 1999).
Despite these issues, the above findings together confirm an acceptable degree of validity and reliability for the survey. This finding also suggests that the descriptive statistics results are acceptable. The findings demonstrated students’ favorable perceptions regarding free expression, which is consistent with prior studies showing that Chinese children have become more confident in expressing their own opinions and are more willing to challenge authority (Helwig, 2006; Shen, 2000). In addition, city participants reported more positive attitudes toward the school environments’ treatment of their freedom of expression than their rural peers. Detailed results and discussions can be found in a published study (Torres and Qin, 2017).
Although the results of the survey are promising, the researchers acknowledge its limitations. First, the study only included two demographic groupings (gender and geographic location), with participants at an average age of 17–18 years. The manner by which children perceive their rights derives from the interplay between individual preferences and a variety of resources (e.g. society, family, school, and peers) (Ben-Arieh and Frønes, 2011). Various contextual factors such as culture, society, and political and environmental characteristics all contribute to the variations among children (Zeb, 2008). Differences also arise from their demographic characteristics, such as gender, age, and geographical location (e.g. rural and urban areas) (Zeb, 2008). This may create extra methodological challenges for studies involving children. In addition, researchers may face difficulty generating reliable findings based on different groups, especially age-related groups. Nevertheless, this process is constantly changing along with children’s evolving capacities, which will certainly call for a set of indicators that are responsive to the nuances of different groups and various developmental states of children (Kosher et al., 2014).
Second, this study relies on a convenience sample. Future studies might apply this instrument in other places within China. Furthermore, although the DIF test has indicated item invariance across groups (e.g. gender and geographic location), the results of this study may still be limited to the context of China. As researchers have noted, measures developed and used with limited local samples usually lack social and economic diversity. Cross-country research will be valuable for assessing the generalizability of the instrument (Lippman et al., 2011).
Third, this study has identified only two constructs of children’s freedom of expression, and one of them is related to context/environment evaluation. This finding reflects the challenge of identifying dimensions (constructs) of children’s rights. The development of a unifying framework would be desirable for better defining dimensions and indicators of children’s rights, particularly with respect to those under free speech. It may also help to develop comparable indices that make cross-nation research more feasible and accurate (Fernandes et al., 2012).
Conclusion
According to Mauthner (1997), “When space is made for them, children’s voices express themselves clearly” (p. 2). Rapid social development and recent changes in views of childhood call for more comprehensive and reliable measures to capture children’s well-being, particularly in the area of human rights. What children think about their rights is important to understanding their level of motivation to engage in more democratic action. In addition, how children perceive their rights carries important policy implications for improving quality of life in settings such as school and family (Shek et al., 2007).
In the present analysis, the researchers emphasized the value of children’s views and experiences (Ben-Arieh, 2007). Yet, to understand these views and perceptions more fully and accurately, further investigations into instrument validity and reliability will need to occur. The survey in this study was designed to gather data on the concept of freedom of expression within particular contexts. While freedom of expression cannot be fully captured in an opinion survey alone, the survey does provide a glimpse into students’ interpretation of a cherished right.
This study has shown that the Rasch measurement model can be a powerful tool for evaluating the quality of an instrument. It can help to establish a fundamental measurement baseline before reporting the psychometric attributes (Tennant et al., 2004). This investigation provides evidence for the validity and reliability of a survey on Chinese students’ perceptions of freedom of expression. Overall, the survey exhibited acceptable quality and was appropriate for examining the topic. At the same time, the study points to a number of areas where the survey could be improved. Future survey iterations should address the distribution of item difficulty (fill in the gaps) and possibly include new constructs (items) to increase the survey’s utility as an effective tool to measure the free speech rights of the child.
Despite these limitations, this survey may serve as a base on which to build a tool for the investigation of children’s rights in future studies, one that can be applied not only in the context of Chinese society but also in other countries. Should the instrument be applied in future studies, especially across cultures, Rasch analysis is a desirable method for further validation.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
