Abstract
This study investigated the effects of an instructor's sex, field of study, gendered teaching style, and expected grade on students' evaluations of teaching, using hypothetical scenarios. Participants completed the Instructor Knowledge Scale, Instructor Likeability Scale, Instructor Teaching Ability Scale, and the Expected Grade, Workload, and Attendance scales based on their assigned hypothetical scenario. A multivariate analysis of variance indicated that students rated an instructor with a feminine teaching style more positively on likeability and teaching ability than an instructor with a masculine teaching style. Mediation analyses showed that the differences between feminine and masculine teaching styles in perceived likeability, teaching ability, and expected workload were mediated by the difference in expected grade between the two styles. This study illustrates the complex influence of sex and gender roles on students' evaluations. Overall, however, the findings suggest that students' evaluations may not be an accurate tool for evaluating teaching ability.
Student evaluations of teaching have been routinely employed in higher education since the 1980s (Greenwald, 1997) and are regarded as indices of a teacher's teaching skills. Because they weigh heavily in decisions about continuing status and tenure, student evaluations are among the most consequential tools college instructors rely on to advance their careers (Radmacher & Martin, 2001; Ellis, Burke, Lomire, & McCormack, 2003). Perhaps as a consequence, student evaluations have been found to be significantly correlated with faculty burnout, especially among younger faculty with high aspirations (Lackritz, 2004).
Student evaluations are also used on the assumption that they accurately reflect students' learning. Because of their powerful influence on both faculty careers and the quality of student learning, an entire “Current Issues” section of American Psychologist (American Psychological Association, 1997) was devoted to them. Some early studies suggested a moderate positive correlation among student evaluations, teaching effectiveness, and students' achievement (Greenwald, 1997). However, more recent studies, such as one conducted by Greimel-Fuhrmann and Geyer (2003), have shown that students' evaluations correlate only weakly with students' learning and achievement. Consistent with these findings, Tomasco (1980) had earlier asserted that student evaluations are more like personality contests than valid measures of teaching effectiveness and student learning, and Zabaleta (2007) concluded that they may be relied on too heavily without any analysis of their validity. Despite such criticisms, student evaluations of instructors are frequently used and are exceedingly consequential for instructors. Given the concerns raised about their use and consequences, it is crucial to identify factors that may bias student ratings and to determine whether the evaluations accurately assess what they are designed to evaluate.
Although research on the validity of these evaluations has produced many contradictory results, it seems clear that they are subject to bias (e.g., Greenwald, 1997) and that a number of factors may compromise their validity. If student evaluations are substantially affected by such biases, they are of limited use, and the career prospects of many university faculty rest on a tool that evaluates their teaching inaccurately. This study examined such biases by investigating how the following factors relate to student evaluation outcomes: sex of instructor, gendered teaching style, field of study, sex of participants, and expected grade.
Effects of instructor's sex and gendered teaching style on students' evaluations
The effects of instructor's sex on student evaluations have been investigated thoroughly but have yielded mixed results. Some researchers have found that female professors are rated less favorably than male professors (e.g., Basow & Silberg, 1987), while others have found the opposite (e.g., Feldman, 1993) or no sex effect at all (e.g., Blackhart, Peruche, DeWall, & Joiner, 2006). However, most researchers now agree that student evaluations are influenced by the sex of instructors and by factors that interact with instructor's sex (Freeman, 1992, 1994; Beyer, 1999; Arbuckle & Williams, 2003). For instance, previous researchers have found that male and female students tend to have different expectations of male and female professors. Basow, Phelan, and Capotosto (2006) asked participants to think of their best and worst professors and to describe what made them the best and the worst. They found that male students described interpersonal skills as the most important characteristics of the best female professors. Male students also described the best female professors as flexible, able to relate to students, able to provide a comfortable classroom environment, and willing to take time to help students. Of the best male professors, by contrast, male students expected commendable courses. Female students, on the other hand, did not seem to use the sex of the professor as a guiding factor in their evaluations of best and worst professors (Basow et al., 2006). Given that respondents described the best male and female professors differently, Basow et al. concluded that students, particularly male students, may be affected by gender stereotypes when evaluating professors.
Basow and Silberg (1987) asked college students to evaluate their instructors in terms of teaching effectiveness and sex-typed characteristics. They found that both male and female students rated female professors significantly lower than male professors on teacher effectiveness. In particular, male students evaluated female professors who violated gender stereotypes more poorly, indicating that gender-role stereotypes may have a greater influence on students' evaluations than the sex of the instructor alone. Consistent with this interpretation, Basow and Distenfeld (1985) found that male teachers with stereotypically masculine traits tended to receive higher evaluations than male teachers with stereotypically feminine traits, while female teachers with stereotypically feminine traits received higher evaluations than female teachers with stereotypically masculine traits. The authors concluded that these ratings were more indicative of the instructor's adherence to traditional gender roles than of teaching ability.
According to Nieva and Gutek (1981), the negative effect of gender-role violation on student evaluations can be explained by the gender-role congruency hypothesis, which states that behaviors congruent with gender roles are evaluated more favorably than gender-role-incongruent behaviors. Several studies have tested this hypothesis by evaluating the effects of presenting stereotypical feminine traits (e.g., warm, nurturing) and stereotypical masculine traits (e.g., aggressiveness, expressiveness) on students' evaluations (Kierstead, D'Agostino, & Dill, 1988; Best & Addison, 2000). Most of these studies operationalized masculine and feminine teaching styles by describing teachers with masculine or feminine characteristics. According to Basow and Silberg (1987), such studies may be limited when they incorporate only one gender-role trait for each sex, which was the case in all four of the aforementioned studies.
Therefore, the present study used several of the feminine and masculine qualities identified by Cejka and Eagly (1999), who found eight masculine and 16 feminine personality traits that are significantly associated with respondents' beliefs about occupational success. Six feminine characteristics (e.g., “affectionate,” “sympathetic,” and “cooperative”) and six masculine traits (e.g., “aggressive,” “dominant,” and “resistant to pressure”) were used. These traits were selected because they fit the teaching scenarios and because, to date, they had not been tested in teacher evaluation studies.
Effect of instructor's field of study on student evaluations
Similar in concept to the gender-role congruency hypothesis, Heilman's (1983) lack of fit model addresses how the congruency between perceived job requirements and an employee's sex affects evaluations of job performance. According to this model, perceptions of how successful or unsuccessful a person will be at a given job are determined by the fit between the job requirements and the sex of the employee. Specifically, the model predicts that an individual will be seen as incapable, and thus evaluated unfavorably, when working in a job that requires traits commonly associated with the other sex. Lyness and Heilman (2006), sampling upper-level managers, found that female managers in masculine professions tended to receive less favorable performance evaluations than female managers in feminine professions. They argued that women in masculine jobs are expected to perform poorly and are therefore held to stricter standards, which leads them to be evaluated less favorably than their male counterparts.
On the basis of the lack of fit model and previous findings, it was predicted that student evaluations would show a significant interaction between field of study and instructor's sex. Furthermore, female professors in any field may encounter a greater lack of fit than male professors because, as Bellas and Toutkoushian (1999) have proposed, academia, including college teaching, remains a male-dominated profession. As a consequence, a female professor in a masculine field may be evaluated less favorably by students than a female professor in a feminine field. In the present study, psychology and computer science were chosen as the feminine and masculine fields of study, respectively, in accordance with Beyer's (1999) finding that psychology is generally perceived as the most feminine college major and computer science as the most masculine.
Effects of expected grade on student evaluations
Another factor that may significantly influence student evaluations is the student's grade. Because student evaluations are generally conducted before students receive a final course grade (Burns, 2007), researchers have investigated the effects of both actual and expected grades on students' evaluations. For instance, Blackhart et al. (2006) conducted a study to determine whether class size, course level, instructor's sex, number of publications, actual average grade, or instructor's rank were significantly related to teaching evaluations. They found that the average actual grade given by an instructor was highly positively correlated with the student ratings that instructor received. Focusing on expected rather than actual grades, Greenwald and Gillmore (1997) investigated the effect of expected grades on teacher evaluations using actual student ratings obtained at the University of Washington. They found that students who expected to receive an A rated their professor more highly than students who expected a C. They also found that the effect of grading style was substantial enough that an instructor who scored in the bottom third on his or her evaluations (among colleagues at the same university) moved up to the top third following a transition from a strict to a lenient grading style. Other researchers have suggested that, along with lenient grading policies, high actual grades and high expected grades are both significantly related to higher student evaluations, while course rigor is not significantly correlated with such evaluations (Aleamoni, 1999; Millea & Grimes, 2002; Ellis, Burke, Lomire, & McCormack, 2003; Griffin, 2004; Germain & Scandura, 2005). Redding (1998) argued that grading leniency is associated with the grade inflation problems faced by many universities in the United States.
Clayson, Frost, and Sheffet (2006) and Wachtel (1998) explained this phenomenon in terms of a reciprocity effect: students reward instructors who give them good grades and punish instructors who give them poor grades, regardless of other student characteristics. However, it is unclear how students evaluate their instructors' leniency before they receive any indication of their actual grade. The present authors suggest that an instructor's teaching style may be associated with how students evaluate his or her leniency. When an instructor uses a feminine teaching style, students may expect the instructor to be sympathetic and cooperative, and therefore more lenient in grading. On the other hand, when an instructor uses a masculine teaching style, students may anticipate that the instructor will be authoritative and resistant to their desires for lenient grading. Therefore, it was expected that the effect of teaching style on students' evaluations would be mediated by grade. Because the study used hypothetical scenarios in which no actual grades existed, expected grade was examined as the mediator.
In summary, the purpose of this study was to investigate whether gender biases have a significant effect on college students' evaluations, to test previously proposed explanations for such biases, and to evaluate whether an instructor's sex, gendered teaching style, field of study, and student's sex play an important role in explaining the biases found in students' evaluations. The specific hypotheses were as follows. Hypothesis 1: There would be main effects of gendered teaching style, instructor's sex, field of study, and participants' sex on all dependent measures, namely the Instructor Knowledge Scale, Instructor Likeability Scale, Instructor Teaching Ability Scale, Expected Workload, Expected Attendance, and Expected Grade. In particular, participants would rate a male instructor who used a stereotypically masculine teaching style significantly higher on all dependent variables. Participants, especially male participants, would rate a female professor who teaches in the stereotypically masculine field significantly lower than a female professor who teaches in the stereotypically feminine field. Hypothesis 2: Expected grade would mediate the relationship between teaching style and all dependent measures.
Method
Participants
A total of 465 undergraduate students (207 men, 258 women), recruited from introductory psychology courses at Brigham Young University, participated in this study. Some participants received course credit for their participation (at their instructors' discretion); others received two movie tickets (worth $1 each) for a local movie theater. Participants ranged in age from 18 to 46 years (M = 21.8), and their marital status was reported as single (89%), married (8%), or divorced (3%). Participants' ethnicity was reported as Caucasian (89%), Hispanic/Latino (6%), Asian/Pacific Islander (3%), Black (1%), and “Other” (1%).
Materials
Eight fictitious college course scenarios were created. Each scenario described the course requirements and the instructor's sex, teaching style, and field of study. The scenarios varied on three factors, yielding a 2 × 2 × 2 design: sex of the instructor (male or female), instructor's teaching style (masculine or feminine), and instructor's field of study (introductory psychology or computer science). Descriptions of the instructors' feminine or masculine teaching style were based on masculine and feminine personality characteristics identified by Cejka and Eagly (1999). The scenario template is provided below, with bracketed items representing the variables that differed among scenarios:
Mrs. [Mr.] Smith is your professor who teaches a three-credit-hour introductory psychology [computer science] class. According to her [his] syllabus, your final grade will be based on four exams and one research paper. You also find that attendance will not be taken in her [his] class. You have heard from many students who took her [his] class before that she [he] is affectionate, sympathetic, gentle, and understanding [assertive, dominant, analytical, and competitive]. She [He] uses a feminine [masculine] teaching style, and thus, she [he] is very sensitive to all sorts of students' demands [defends his own beliefs, and thus he is willing to take his stand to resist pressure from all sorts of students' demands]. On the first day of this class, you immediately find that Mrs. [Mr.] Smith is exactly what others described.
Dependent measures
The dependent measures were created by the authors for this study, so no prior validity or reliability data were available. Because these scales had not been used before, their dimensionality was examined with a series of principal component analyses.
Instructor Knowledge Scale. The scale initially consisted of 10 items created by the authors to examine students' perceptions of the instructor's knowledge. Participants were asked to respond using a scale with anchors of 1: Strongly disagree and 7: Strongly agree. A principal component analysis with varimax rotation was performed on all 10 items. Only one factor had an eigenvalue greater than 1.00; it accounted for 42.8% of the variance, and six items loaded .65 or above on it. These six items were combined to form a measure of students' perceptions of the instructor's knowledge in the scenarios. The six items were (a) “I believe that Mrs. [Mr.] Smith will demonstrate a proficient knowledge of the materials,” (b) “I believe that Mrs. [Mr.] Smith understands difficult concepts, ideas, and theories related to this course material,” (c) “I believe that Mrs. [Mr.] Smith received the proper education to be as knowledgeable as she [he] is,” (d) “I believe that Mrs. [Mr.] Smith is competent in all matters pertaining to the subject,” (e) “I believe that Mrs. [Mr.] Smith has a wide breadth of knowledge of this course,” and (f) “I believe that Mrs. [Mr.] Smith knows the materials inside out.” Higher scores on this index reflect greater perceived instructor knowledge in the assigned scenario. Cronbach's alpha for this measure in the current study was .82.
Instructor Likeability Scale. The scale consisted of 10 items assessing respondents' beliefs about an instructor's likeability. Participants rated all 10 items on a scale with anchors 1: Strongly disagree and 7: Strongly agree. A principal component analysis with varimax rotation showed only one component with an eigenvalue greater than 1.00, accounting for 87.6% of the variance. Representative items include (a) “I believe that Mrs. [Mr.] Smith will be liked by the students,” (b) “I believe that Mrs. [Mr.] Smith appears to have a genuine interest in helping students,” (c) “I believe that Mrs. [Mr.] Smith will be kind toward students,” (d) “I believe that Mrs. [Mr.] Smith will be caring and considerate toward everyone in the class,” (e) “I believe that Mrs. [Mr.] Smith will be pleasant to work with,” and (f) “I believe that Mrs. [Mr.] Smith will value student opinions.” Ratings on the 10 items were summed, with higher scores indicating greater perceived likeability. The scale was also used as a repeated measure. Cronbach's alpha for this measure was .95.
Instructor Teaching Ability Scale. The scale initially consisted of 10 items assessing respondents' beliefs about an instructor's teaching ability. Participants were asked to rate the extent to which they agreed with each statement on a scale with anchors of 1: Strongly disagree and 7: Strongly agree. A principal component analysis with varimax rotation was conducted on these items. Only one factor had an eigenvalue greater than 1.00; it accounted for 54.6% of the variance, and six items loaded .75 or above on it. Those items were (a) “I believe that Mrs. [Mr.] Smith will be enthusiastic about the subject,” (b) “I believe that Mrs. [Mr.] Smith will be well prepared for each class,” (c) “I believe that Mrs. [Mr.] Smith will use a variety of methods to get students interested in the course materials,” (d) “I believe that Mrs. [Mr.] Smith will understand the needs of the students,” (e) “I believe that Mrs. [Mr.] Smith will show excellent teaching ability,” and (f) “I believe that Mrs. [Mr.] Smith will be aware of and accommodate various learning styles.” All six items were summed, with higher scores indicating greater perceived teaching ability. Cronbach's alpha for this scale was .90.
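The retention rule used for each scale (keep components whose eigenvalue exceeds 1.00, the Kaiser criterion), the percentage of variance explained, and the loading cutoff used to select items can be illustrated with a short Python sketch. It is not the authors' analysis code: it assumes NumPy, extracts principal components from the item correlation matrix, and runs on simulated continuous ratings. The helper `kaiser_pca` and the data-generating loadings are hypothetical. Varimax rotation is omitted because rotating a single retained component leaves it unchanged.

```python
import numpy as np


def kaiser_pca(responses):
    """Principal components of the item correlation matrix (hypothetical helper).

    responses: array of shape (n_participants, n_items).
    Returns the number of components with eigenvalue > 1 (Kaiser criterion),
    the percentage of total variance explained by the first component,
    and the item loadings on that first component.
    """
    corr = np.corrcoef(responses, rowvar=False)          # item-by-item correlations
    eigvals, eigvecs = np.linalg.eigh(corr)              # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to descending
    n_retained = int(np.sum(eigvals > 1.0))
    pct_first = 100.0 * eigvals[0] / eigvals.sum()       # trace equals the number of items
    loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])       # item correlations with component 1
    if loadings.sum() < 0:                               # eigenvector sign is arbitrary
        loadings = -loadings
    return n_retained, pct_first, loadings


# Hypothetical simulated data: 6 items load strongly (0.8) and 4 weakly (0.4) on one trait.
rng = np.random.default_rng(42)
n = 5000
true_loadings = np.array([0.8] * 6 + [0.4] * 4)
latent = rng.standard_normal(n)
noise = rng.standard_normal((n, 10)) * np.sqrt(1 - true_loadings**2)
ratings = latent[:, None] * true_loadings + noise

n_retained, pct_first, loadings = kaiser_pca(ratings)
kept_items = np.flatnonzero(loadings >= 0.65)   # same .65 cutoff as the knowledge scale
print(n_retained, round(pct_first, 1), kept_items)
```

With these simulated loadings only one component passes the Kaiser criterion, and only the six strongly loading items clear the .65 cutoff, which parallels how six of the ten knowledge items were retained.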
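The reliability figures reported for each scale are Cronbach's alpha, computed from the number of items k, the item variances, and the variance of the summed scale score: alpha = k / (k - 1) × (1 - sum of item variances / variance of the total). A minimal Python sketch, assuming NumPy; the function name and the five-respondent toy ratings are hypothetical and are not study data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical 1-7 ratings: five respondents answering three items.
toy = [[5, 6, 5],
       [3, 3, 4],
       [7, 6, 7],
       [2, 3, 2],
       [4, 4, 5]]
print(round(cronbach_alpha(toy), 3))   # 0.957
```

Because the items in the toy data rise and fall together across respondents, the summed score varies much more than the items do individually, which is what drives alpha toward 1.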
Expected Workload Scale. The scale was created by the authors to assess respondents' expectations of how hard they would have to work in order to do well in the courses described in the scenarios. The scale consists of three items, and participants were asked to rate the extent to which they agreed with each statement on a scale with anchors of 1: Strongly disagree and 7: Strongly agree. These three items were (a) “This class will require long hours of work,” (b) “I will have to work really hard to get a good grade,” and (c) “If I do not spend a lot of time on this class, I will not be able to get a good grade.” The three items were summed, and higher scores represent the respondents' expectation of greater workload. Cronbach's alpha of this measure was .94.
Expected Attendance Scale. The scale was developed by the authors to measure respondents' expectations of their attendance rate for the hypothetical course and consists of three items. Participants rated all three items on a scale with anchors of 1: Strongly disagree and 7: Strongly agree. The three items were (a) “I will attend class regularly,” (b) “I feel that I will be able to skip some lectures,” and (c) “I will not need to attend this class.” The first item was reverse-scored. All three items were summed. Higher scores on the scale indicate greater expected attendance of the class in the assigned scenario. Cronbach's alpha of the scale was .71.
Expected grade. Participants were asked to rate their expected grade from the class in the assigned scenario on a scale of 1 to 12 (1 = A, 2 = A–, 3 = B+, 4 = B, 5 = B–, 6 = C+, 7 = C, 8 = C–, 9 = D+, 10 = D, 11 = D–, and 12 = F). These values were reversed prior to analysis so that higher scores represented participants' expectations of a higher grade.
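The grade coding and its reversal can be made concrete with a small lookup. The paper states only that the values were reversed; the sketch assumes the usual reversal for a 1-12 scale, 13 minus the raw code, so that A becomes 12 and F becomes 1. The dictionary and function names are hypothetical, and plain hyphens stand in for the minus signs in grades such as A-.

```python
# Response codes as listed in the questionnaire (1 = A ... 12 = F).
GRADE_CODES = {
    "A": 1, "A-": 2, "B+": 3, "B": 4, "B-": 5, "C+": 6,
    "C": 7, "C-": 8, "D+": 9, "D": 10, "D-": 11, "F": 12,
}


def reversed_grade_score(letter: str) -> int:
    """Return the analysis score, where higher values mean a higher expected grade.

    Assumes reversal as (min + max) - raw = 13 - raw on the 1-12 scale.
    """
    return 13 - GRADE_CODES[letter]


print([reversed_grade_score(g) for g in ("A", "B", "F")])  # [12, 9, 1]
```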
Procedure
All participants were informed that the study would investigate their evaluation of a college instructor in a hypothetical scenario, and all completed the questionnaires via an online survey system. After giving informed consent, participants were randomly assigned to one of the eight scenarios and asked to read it. They then completed the five scales and the expected-grade item, followed by a demographic questionnaire asking for their age, sex, marital status, and ethnicity.
Results
Effects of instructor's sex, teaching style, field of study, and participant's sex on student evaluations
A 2 (Instructor's sex) × 2 (Teaching style) × 2 (Field of study) × 2 (Participant's sex) multivariate analysis of variance (MANOVA) was performed on six variables: perceived instructor knowledge and teaching ability, likeability of instructor, expected class attendance and workload, and expected grade. Partially consistent with Hypothesis 1, significant main effects were identified for teaching style (F(6, 431) = 431.00, p < .0001, η² = 0.77), instructor's sex (F(6, 431) = 2.27, p < .05, η² = 0.03), and participant's sex (F(6, 431) = 3.37, p < .01, η² = 0.05). There was also a significant interaction of field of study and instructor's sex (F(6, 431) = 2.14, p < .05, η² = 0.03).
Univariate analyses of variance (ANOVAs) were conducted as a follow-up to the MANOVA. Teaching style had significant main effects on every dependent variable (teaching ability, likeability, workload, attendance, and expected grade) except perceived instructor knowledge. That is, participants tended to rate an instructor with a feminine teaching style as more likeable and as having better teaching skills than an instructor with a masculine teaching style. However, participants also tended to expect a lighter workload, less class attendance, and a higher grade from an instructor with a feminine style than from one with a masculine style.
For instructor's sex, a follow-up ANOVA identified a significant main effect only on perceived knowledge. This main effect was qualified by a significant interaction of instructor's sex and field of study on perceived knowledge: participants rated female instructors as more knowledgeable than male instructors in the stereotypically masculine field (computer science), whereas in the stereotypically feminine field (psychology), male instructors were rated as more knowledgeable than female instructors. A follow-up ANOVA also indicated a significant main effect of participants' sex on perceived knowledge and expected grade. Specifically, male participants tended to see instructors of both sexes as more knowledgeable than did female participants, and male participants generally expected lower grades than did female participants. Table 1 presents the means and standard deviations of all dependent and mediating variables as a function of teaching style and instructor's sex, and Table 2 summarizes the follow-up ANOVAs.
Table 1. Dependent and mediator variable means and standard deviations as functions of gendered teaching style and instructor's sex
Table 2. Summary of the 2 × 2 × 2 × 2 multivariate analysis of variance
Mediating role of expected grade
To investigate whether the effect of teaching style on student evaluations was mediated by expected grade, a series of mediation analyses was performed. According to Baron and Kenny (1986), three requirements must be met to establish a mediating effect. First, the predictor variable (teaching style) must show a zero-order relationship with both the hypothesized mediator (expected grade) and the criterion variable (in this case, five variables: perceived knowledge of instructor, likeability, teaching ability, and participants' expectations of workload and class attendance). Second, the mediator must show a first-order relationship with the criterion variable (i.e., controlling for the predictor). Finally, when the mediator is held constant, the previously significant zero-order relationship between the predictor and the criterion should be significantly reduced. This last criterion is manifested in a significant indirect effect of the predictor on the criterion via its influence on the mediator (Cohen & Cohen, 1983). Sobel's (1982) approximate significance test was used to test the indirect effect of the predictor on the criterion via the mediator.
Mediation: perceived likeability, teaching ability, and expected workload. As mentioned earlier, expected grade was regressed on teaching style, and the resulting β was statistically significant (first requirement). The regressions of the criterion variables (perceived likeability, teaching ability, and expected workload) on expected grade (second requirement) and on teaching style (third requirement) were also statistically significant. Therefore, analyses were performed to assess whether the relationship between teaching style and each criterion variable was significantly reduced when expected grade was controlled. Because all three relationships were reduced, the final step was to test the significance of each indirect effect with Sobel's test.
These analyses demonstrated significant indirect effects of teaching style on three of the criterion variables via expected grade (p < .0001). That is, the differences in perceived likeability, teaching ability, and expected workload associated with feminine and masculine teaching styles were mediated by the difference in expected grade between the two styles. However, these analyses found no mediating effect of expected grade for perceived instructor knowledge or expected attendance. All statistical data for the mediation analyses, including regression coefficients and standard errors, are shown in Figs. 1 and 2.
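Sobel's first-order test divides the product of the two path coefficients, a (predictor to mediator) and b (mediator to criterion, controlling for the predictor), by its approximate standard error, sqrt(b² × SE_a² + a² × SE_b²), and refers the result to the standard normal distribution. A minimal Python sketch using only the standard library; the function name and the example coefficients are hypothetical and are not values from this study.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel (1982) first-order test of the indirect effect a*b.

    a, se_a: coefficient and standard error for predictor -> mediator
    b, se_b: coefficient and standard error for mediator -> criterion,
             controlling for the predictor
    Returns (z, two-tailed p) from the normal approximation.
    """
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = a * b / se_ab
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical path estimates, for illustration only.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 2), round(p, 4))
```

The normal approximation is the reason the test is described as approximate: the product a·b is not exactly normally distributed, especially in small samples.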
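The regression sequence described above can be sketched in Python with NumPy ordinary least squares. Teaching style is coded -1 (masculine) and 1 (feminine), as in the path-model figures. The helper `ols`, the simulated effect sizes, and the variable names are hypothetical, and the sketch works with unstandardized rather than standardized coefficients. It estimates the total effect c, the path a, the path b, and the direct effect c' that remains once expected grade is controlled.

```python
import numpy as np


def ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])           # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


# Hypothetical simulated data: style shifts expected grade, which shifts likeability.
rng = np.random.default_rng(1)
n = 400
style = rng.choice([-1.0, 1.0], size=n)                   # masculine = -1, feminine = 1
grade = 0.6 * style + rng.standard_normal(n)
likeability = 0.5 * grade + 0.2 * style + rng.standard_normal(n)

coef_total, _ = ols(likeability, style)                   # total effect c (style -> criterion)
coef_a, se_a = ols(grade, style)                          # path a (style -> expected grade)
coef_full, se_full = ols(likeability, style, grade)       # direct effect c' and path b

c, a, c_prime, b = coef_total[1], coef_a[1], coef_full[1], coef_full[2]
print(f"c = {c:.3f}, c' = {c_prime:.3f}, indirect a*b = {a * b:.3f}")
print(f"SE(a) = {se_a[1]:.3f}, SE(b) = {se_full[2]:.3f}")  # inputs to Sobel's test
```

For ordinary least squares on the same cases, c = c' + a·b holds exactly, so the reduction from c to c' equals the indirect effect a·b whose significance Sobel's test evaluates.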

Fig. 1. Path model of the relationships among teaching style, expected grade, knowledge, and attendance. Knowledge: higher score indicates greater perceived knowledge; Attendance: higher score indicates greater expectation to attend class; Grade: higher score indicates higher expected grade. Teaching style is coded as masculine style = −1 and feminine style = 1. The number inside the parentheses indicates the standardized regression coefficient between Teaching Style and Knowledge (Attendance) after Grade was entered into the regression equation. *Regression coefficient is significant at p < .01.

Fig. 2. Path model of the relationships among teaching style, expected grade, likeability, teaching ability, and workload. Likeability: higher score indicates greater likeability; Teaching Ability: higher score indicates greater teaching ability; Grade: higher score indicates higher expected grade; Workload: higher score represents greater expected workload. Teaching style is coded as masculine style = −1 and feminine style = 1. The number inside the parentheses indicates the standardized regression coefficient between Teaching Style and Likeability (Teaching Ability and Workload) after Grade was entered into the regression equation. *Regression coefficient is significant at p < .01.
Discussion
The purpose of this study was to investigate the effects of instructor's sex, gendered teaching style, field of study, and participant's sex on students' evaluations of hypothetical college instructors. The main focus of the study was to understand how teachers' and participants' sex and perceptions of stereotypical gender roles influence student evaluations. The investigation was structured within the frameworks of Heilman's lack of fit model (Heilman, 1983) and the gender-role congruency hypothesis (Nieva & Guteck, 1981). The present study provides a plausible explanation of how students develop initial expectations regarding their course grades and how teaching style influences such expectations. Student evaluations may not fulfill their purposes of accurately evaluating instructors' teaching abilities and the quality of their students' learning experience.
The results indicate that the sex of the hypothetical instructor has a significant effect on student evaluations, explained by interaction of field of study and instructor's sex. That is, the hypothetical female instructor in the field of computer science was perceived as more knowledgeable than the male instructor in the same field, while the male instructor in the field of psychology was perceived as more knowledgeable than the female instructor in the same field. This finding is, in fact, exactly the opposite of what was expected. It had been hypothesized that a male instructor would be perceived as more knowledgeable than a female instructor, particularly in a stereotypical masculine field. This finding was partially supported by Basow's (1998) argument that there may be significant interaction effects between sex and other context variables that may lead to beliefs of less ability and knowledge for female instructors. The finding suggests that is the case for the female instructor in the stereotypical feminine field (psychology). The interaction effect between instructor's sex and field of study indicates that being a female instructor in a stereotypical masculine field may lead to perceptions of greater ability and knowledge.
To date, only two studies have shown a general tendency for female instructors to receive higher student evaluations than male instructors (Freeman, 1992; Feldman, 1993); however, neither study assessed factors that may interact significantly with instructor's sex. Although we are unable to explain the discrepancy between these results and those of previous research, we offer a tentative explanation for the observed pattern, particularly the higher overall evaluation of the female instructor in a stereotypically masculine field. Students' evaluations of their instructors are influenced by the instructor's perceived qualifications, such as professorial rank (Nation, LeUnes, & Gray, 1976) and awards received (Kaschak, 1981). Participants may therefore have rated the hypothetical female instructor in the stereotypically masculine field favorably because they inferred that, having earned a doctoral degree and survived in an overwhelmingly masculine field, she must be particularly skilled. Alternatively, the pattern may reflect the hypothetical scenarios used here; such methodological differences from other studies may account for the discrepant results. Further investigation is needed to test these explanations.
Regardless of the hypothetical instructor's field of study or sex, students rated instructors described as having a feminine teaching style as more likeable than instructors with a masculine style, and they attributed greater teaching skill to instructors with a feminine style. Spencer and Schmelkin (2002) found that students perceived effective teaching in terms of an instructor's personal characteristics, such as demonstrating concern for students, valuing students' opinions, communicating clearly, and being open to varied opinions, all of which are associated with interpersonal skills (Basow et al., 2006). This may explain the effect of a feminine teaching style on perceived likeability and teaching skill.
An engaging, stereotypically feminine teaching style, however, also appears to be associated with less desirable student expectations. Participants expected a smaller workload, more freedom to miss class, and a higher grade from an instructor described as having a feminine teaching style than from one with a masculine teaching style. In a recent study, Greenberger, Lessard, Chen, and Farruggia (2008) labeled this phenomenon “academic entitlement,” referring to students' expectation that “good grades should not be too hard to come by and that teachers should ‘give them a break’” (p. 1201). Greenberger et al. suggested that this attitude has led to an increasing tendency for students to “beleaguer” their professors for higher grades and special accommodations (p. 1193).
As hypothesized, expected grade was a significant predictor of positive evaluations of the hypothetical instructors. Mediation analyses showed that the differences between masculine and feminine teaching styles in ratings of instructor likeability, teaching ability, and expected workload were mediated by expected grade. That is, when instructors were described as affectionate, sympathetic, gentle, understanding, and sensitive to their students' concerns, students tended to rate them as likeable and highly capable and expected to work less, presumably because they expected to receive a higher grade. It appears that students' perceptions of an instructor's gendered teaching style shape their initial expectations for final grades.
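The mediation logic described above can be illustrated with a minimal sketch. The section does not state which mediation procedure was used, so the sketch assumes the common single-mediator regression approach, with teaching style as the predictor X (coded 0 = masculine, 1 = feminine), expected grade as the mediator M, and a rating such as likeability as the outcome Y. The data are simulated for illustration only and are not the study's data; the function names are hypothetical. With ordinary least squares on the same sample, the total effect c of style on the rating splits exactly into a direct effect c' and an indirect effect a × b that runs through expected grade.

```python
import numpy as np


def ols_coefs(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] for y regressed on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def mediation_paths(x, m, y):
    """Hypothetical helper: single-mediator decomposition of the effect of x on y via m."""
    c = ols_coefs(y, x)[1]               # total effect of style on the rating
    a = ols_coefs(m, x)[1]               # effect of style on expected grade
    _, c_prime, b = ols_coefs(y, x, m)   # direct effect of style and effect of grade, each controlling for the other
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}


# Simulated (hypothetical) data: 0 = masculine style, 1 = feminine style.
rng = np.random.default_rng(0)
n = 200
style = rng.integers(0, 2, size=n).astype(float)
expected_grade = 3.0 + 0.5 * style + rng.normal(0, 0.5, size=n)
likeability = 2.0 + 0.8 * expected_grade + 0.1 * style + rng.normal(0, 0.5, size=n)

paths = mediation_paths(style, expected_grade, likeability)
for name, value in paths.items():
    print(f"{name:>8}: {value:.3f}")
```

In this framing, "mediated by expected grade" means that most of the total effect c is carried by the indirect path a × b, leaving a small direct effect c'. In practice the indirect effect is usually tested with a bootstrap confidence interval or a Sobel-type test; which test the authors applied is not stated here.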
In summary, we found significant effects of instructor's sex, and of several variables interacting with it, on student evaluations. However, this study had several major limitations. First, the findings are based on participants' perceptions of hypothetical instructors and scenarios, so the generalizability of the results is questionable. As Basow (1995) noted, investigating the effects of sex and gender roles on student evaluations is extremely complicated because many variables (e.g., rank, class size, upper- or lower-level course, interaction style, time of class) may significantly influence teacher evaluations. Basow's effort to conduct a large-scale field study that accounts for such variables is commendable, and analyses of real-world student evaluations are preferable to laboratory studies, which may lack external validity. However, many large field investigations ignore such variables.
In the face of this dilemma, and given the highly consequential nature of student evaluations for both college instructors and students, future investigations of student evaluations should incorporate both field and laboratory components. Another limitation is that the scenarios referred to the instructor as “Mr.” or “Mrs.” rather than “Professor” or “Dr.” This was done to ensure that participants recognized the instructor's sex. Although all participants were told that the study concerned student evaluations of college professors, using “Professor” or “Dr.” along with a manipulation check would have been preferable. Further, the data were obtained from students who were members of The Church of Jesus Christ of Latter-day Saints (LDS, or Mormons), an organization that emphasizes traditional gender roles. This unusual sample of highly religious and conservative participants may further limit generalizability. In addition, all dependent measures were created by the authors for this study, and their validity was not examined. It is therefore critical to replicate this study using measures with established validity.
