Abstract
The Rorschach Comprehensive System (CS) may possess distinct advantages as a tool for identifying leaders but current research is inadequate to support its validity in organizational contexts. To strengthen this empirical foundation, the present study used confidence interval analyses to compare data for 30 college student leaders to samples of nonpatients and college students on the following CS scales: emotionally based resources (SumC), introspective tendencies (FD), capacity for collaboration (COP), stimulus integration (DQ+), and perceptual accuracy (X+%). Results indicated that student leaders produced higher scores on SumC, COP, and FD but not DQ+ or X+%. The implications of these findings are discussed and guidelines for advancing research in this area are proposed.
As modern organizations become increasingly complex, effective procedures to select leaders are a potential source of competitive advantage. Organizations that are able to select capable leadership consistently not only improve their ability to meet performance objectives, but also avoid the costs associated with executive turnover. However, the task of selecting leaders is difficult, as organizations must determine whether initial impressions reflect enduring styles of functioning and desirable performance (Sperry, 1999).
One way to improve selection of leaders is through the expanded use of personality assessment. Empirical evidence suggests that cognitive intelligence is a threshold requirement for a leader's competence, whereas social and emotional attributes are determinants of success after this threshold is met (e.g., Rosete & Ciarrochi, 2005; Kerr, Gavin, Heaton, & Boyle, 2006). For example, adaptive constellations of personality traits are predictors of positive outcomes such as the emergence and effectiveness of leaders (see Judge, Bono, Ilies, & Gerhardt, 2002, for a review). In addition, personality shortcomings not detected during the selection process are frequently reported as the primary cause of leaders' failures across industries and cultures (e.g., Van Velsor & Leslie, 1995; Hogan & Hogan, 2001).
The Rorschach may be useful as a supplement to assessment batteries for leaders. First, the Rorschach generates individual personality profiles that are performance-based; respondents react to ambiguous test stimuli by producing their own responses rather than selecting from a fixed set of response options. This type of data is preferred by organizations over self-reports, which may reveal less about candidates' competencies and motivations (Howard, 2007). Second, the meaning of Rorschach data is not obvious to the non-expert because test responses often appear unrelated to the constructs they are intended to measure (Grossman, Wasyliw, Benn, & Gyoerkoe, 2002). The Rorschach therefore may be helpful in controlling the impact of candidates' impression management during selection, which is a legitimate concern for organizations (Rosse, Stecher, Miller, & Levin, 1998). Last, the Rorschach has provided incremental validity in clinical settings when paired with self-reports in assessment batteries (Meyer, 1997). Current research has not adequately addressed whether these advantages of the Rorschach apply in organizational settings. The need to evaluate the empirical foundation of the Rorschach in these contexts is heightened by reports of its contemporary use for selecting leaders (de Villemor Amaral, 2007) and developing leaders (Wasylyshyn, 2003).
The validity of the Rorschach has been a source of controversy for decades (see Wood, Nezworski, Lilienfeld, & Garb, 2003). This controversy is in part driven by a tendency to view the instrument in monolithic terms when in fact its validity is a function of predictor-criterion combinations (Meyer & Archer, 2001). In other words, some Rorschach scales possess solid empirical support in predicting certain phenomena whereas other scales lack sufficient evidence. Moreover, many different scoring and interpretive schemes have been created for using the instrument; some possess better reliability and validity than others. Acknowledging these properties when conducting and evaluating research on the Rorschach is important.
Older research evaluating the usefulness of the Rorschach in selecting leaders is limited but intriguing. Multiple studies from the mid-twentieth century indicated that numerous Rorschach scales were a valuable component of leader selection batteries (e.g., Gibb, 1949; Hampton, 1960; Phelan, 1962). These studies reported that data from the Rorschach differentiated successful leaders from other groups and also provided incremental validity over other cognitive tests. However, most of these authors did not sufficiently describe their methodologies and statistical procedures, making it difficult to appraise the soundness of their results and conclusions.
In contrast, the Perceptanalytic Executive Scale (PES; Piotrowski & Rock, 1963) is a Rorschach method that more clearly outlined its development, methodology, and predictive capabilities. The PES was used to assess a sample of 110 American executives whose managerial performance was determined successful, intermediate, or failing via objective business criteria (e.g., promotions, financial indicators). The authors identified PES scales with the most discriminatory power across these groups; chi-squared analyses demonstrated that positive Rorschach scores (e.g., human movement responses indicative of assertiveness) were associated with successful managerial performance and negative PES scores (e.g., reflections of animals in water) were associated with unsuccessful managerial performance. Despite this potential promise, these results were not replicated and the PES was seldom used in organizations.
The bulk of recent research on the Rorschach has been conducted using the Comprehensive System (CS; Exner, 1974), which possesses better psychometric properties and clearer administrative, scoring, and interpretive guidelines than previous methods. Only two empirical studies have examined the use of the CS with organizational populations. An unpublished dissertation study reviewed archival Rorschach protocols of 50 executive candidates from North America and Europe to produce baseline CS data for this population (Bach, 2006). Analyses indicated that over 50% of executives had some difficulty with stress tolerance and a significant proportion also had ideational problems and relational difficulties. In light of these findings, the author emphasized that CS trends commonly considered pathological may in fact represent adaptive qualities in specialized populations.
Another investigation in this area tested the hypothesis that a sample of 20 relatively successful executives from various countries would exhibit CS personality patterns consistent with effective leadership, such as good emotional adjustment, proficient information processing, and interpersonal adeptness (de Villemor Amaral, 2007). The author selected 17 CS scales that best approximated the critical predictors of executive success reported by the PES. Executives' responses on these scales were compared to a Brazilian normative sample. Results did not support the author's hypotheses: executives had responses that indicated social difficulties, mood problems, and passive dependency. The author conjectured that these surprising findings might indicate that more psychological distress actually exists in corporate executives due to their constant functioning under high pressure.
Two limitations of this study suggest that its results might represent methodological artifacts rather than real personality differences. First, group differences could have been the result of group heterogeneity across the samples. Whereas experimental groups are usually created according to specified homogeneous characteristics, normative samples include a broad range of variations. As such, comparisons between normative and experimental groups are expected to produce differences. For these reasons, Exner (1995) affirmed that normative or general nonpatient samples do not constitute adequate control groups and should not be used to make statistical extrapolations. Second, variables in this study were selected based on PES results that were over four decades old. Perhaps the personality trends of effective leaders manifest differently on the Rorschach now than in previous decades.
In light of the inconsistent findings outlined above, the present study sought to investigate CS variables likely related to leadership as suggested by (1) the current literature concerning the performance of leaders and (2) the current literature concerning the CS. These CS variables are depicted in Table 1 and discussed below in greater detail.
Table 1. CS variables assessed
In recent years the link between the performance of leaders and emotional intelligence has been argued conceptually and demonstrated empirically. Emotional intelligence (EI) has been defined as the ability to use emotional information productively and includes traits such as self-awareness, affective self-regulation, social insight, and relational adeptness (Mayer, Salovey, & Caruso, 2008). Studies have demonstrated that leaders draw upon emotions when carrying out their roles (see Gooty, Connelly, Griffith, & Gupta, 2010, for a review) and that effective leaders show greater degrees of emotional intelligence than ineffective leaders (e.g., Rosete & Ciarrochi, 2005; Kerr, et al., 2006).
Several CS scales provide information that corresponds with an EI framework. The SumC scale is an aggregate index of the number of color responses provided. Higher SumC scores suggest a greater range of affective experiences and openness to emotions (Exner, 2003). No studies have linked SumC to leadership phenomena but evidence has indicated that low rates of SumC are associated with impoverished emotional experiences (Porcelli & Meyer, 2002). In the realm of self-awareness, the FD scale is scored for responses including dimensionality and is associated with introspective tendencies and a capacity for self-reflection (Exner, 2003). This process has implications for leadership, which involves the capacity to move back and forth between engaging the field of action and reflecting upon organizational dynamics from a position of detached objectivity (Heifetz & Laurie, 2001). The FD variable may also reflect self-monitoring, which has predicted the emergence of leaders in various settings (Foti & Reub, 1990; Zaccaro, Foti, & Kenny, 1991).
In the interpersonal sphere, the COP scale is assigned for responses involving cooperative movement and reflects expectations for collaboration in relationships (Exner, 2003). One study found that higher COP scores were produced by college students rated as more leader-like than their peers (Exner, 1987). As leadership is increasingly understood as a social process (see Mumford, Zaccaro, Harding, Jacobs, & Fleishman, 2000), leaders might be expected to show higher mean scores on COP. Further, a major meta-analytic review of the Big Five personality factors (McCrae & Costa, 1987) indicated that Agreeableness was significantly correlated with self- and subordinate-reports of effective leadership (Judge, et al., 2002). As a superordinate construct, Agreeableness encapsulates preferences for cooperation and collaboration.
Regarding the information processing component of EI, the X+% scale assesses the tendency to interpret information in a manner consistent with social norms. Higher X+% scores have been associated with pass-fail rates in military Special Forces training (Hartmann, Sunde, Kristensen, & Martinussen, 2003), suggesting that this scale may also be useful in evaluating leadership phenomena. The DQ+ score is assigned for responses that meaningfully integrate aspects of the inkblot. The DQ+ scale has shown its strongest validity assessing intellectual processes such as global intelligence (Acklin & Bates, 1989) and creative thinking (Ferracuti, Cannoni, Burla, & Lazzari, 1999). Given the overlap that exists between cognitive intelligence and emotional intelligence (see Schulte, Ree, & Carretta, 2004), the DQ+ scale may be useful in evaluating cognitive tendencies that facilitate emotional intelligence.
Based on these findings, this exploratory study hypothesized that leaders should produce higher rates of Rorschach responses than non-leaders on the following CS scales: SumC, FD, COP, X+%, and DQ+. Given the limitations of current research, the purpose of this study was to generate new hypotheses for future testing. College student leaders represent an apt sample for assessment because these individuals are more likely than others to occupy leadership positions following graduation (Smart, Ethington, Riggs, & Thompson, 2002).
Method
Twelve men and 18 women attending a small university in the northeast United States volunteered as participants. All participants were between the ages of 18 and 22 years. The mean, mode, and median ages of participants were 20.6, 22.0, and 21.0 years, respectively. Ethnicity of the participants was 93% White, 3% Black, and 3% Hispanic.
At the time of the study, all participants held a formal (i.e., titular) leadership position at the university and were actively engaged in leadership behavior. Participants held leadership positions (e.g., president, vice president, captain) in various types of student organizations, including the following: athletic teams, student government, R.O.T.C., Intrafraternity Council, and Pan-Hellenic Association. Although these leaders differed in their leadership responsibilities, all were actively engaged in leadership behavior, defined as influencing their organization or team toward common goals (Northouse, 2007). Participants were asked how they obtained their leadership positions, and responses indicated a wide variety of means ranging from self-selection to advisor nomination to formal election victory. As compensation for their involvement, all volunteers were given a gift certificate to a local restaurant and entered into a lottery for a larger cash prize.
Participants were administered the Rorschach CS using the R-Optimized method of administration, which represents a slight variation of testing procedures outlined by Exner (2003). R-Optimized is designed to control for error variance associated with different Rorschach response frequencies (Dean, Viglione, Perry, & Meyer, 2007). This method of administration is especially useful for Rorschach research because it reduces the likelihood that participants will provide invalid test protocols (i.e., insufficient number of responses). An examination of CS reference data indicated that R-Optimized did not alter normative expectations (Dean, et al., 2007).
During Rorschach administration, respondents are presented with a series of 10 inkblots and asked, “What might this be?” In accordance with administration guidelines, prompts were used to ensure that between two and four responses were provided for each inkblot (e.g., “What else might this be?” or “Okay, that's enough responses for this card”). Prompts were also used to help determine how respondents arrived at their perceptions of the inkblots (e.g., “Help me see it like you do,” “What makes it look pretty?”).
A scoring reliability analysis was performed in which 20% of the CS protocols were randomly chosen, roughly the proportion used in previous Rorschach studies (e.g., Hamel, Shaffer, & Erdberg, 2000; Hartman, 2001). These protocols were scored by the author and two independent coders with expertise in the CS to ensure adequate inter-scorer agreement of Rorschach variables. Procedures developed and validated by Meyer (1999) were used to estimate chance agreement and kappa coefficients for the various response segments of the CS. These procedures are commonly used in Rorschach research studies (e.g., Ritsher, Slivko-Kolchik, & Oleichik, 2001; Weizmann-Henelius, Illonen, Viemero, & Eronen, 2006). Rates of chance-corrected agreement for the variables assessed were between .67 and .82, which are similar to those reported in major Rorschach reliability studies (see Meyer, Hilsenroth, Baxter, Exner, Fowler, Piers, et al., 2002) and generally fall in the “good” to “excellent” ranges according to general research standards (Cicchetti, 1994).
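To make the notion of chance-corrected agreement concrete, the following is a minimal sketch of Cohen's kappa for two coders. It is a generic illustration and not the Meyer (1999) procedure used in this study, which estimates agreement across CS response segments and was applied here with three coders. The function name and the example codes are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes (assumption): 1 = scale scored for a response, 0 = not scored.
coder_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
```

In this made-up example the coders agree on 9 of 10 responses, but kappa is lower than .90 because some of that agreement would be expected by chance alone.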
Primary data analyses in this study compared the Rorschach data of college leaders to previously collected Rorschach data of the following non-leader samples: a college sample from the Pacific Northwest obtained in 2000 (n = 65; Meyer, 2000); to control for potential age-based differences, the subset of this sample between the ages of 18 and 22 (n = 35; Meyer, 2000); a college sample from the West Coast obtained between 1996 and 1999 (n = 34; Shaffer, Erdberg, & Haroian, 2007); and an adult nonpatient sample from the West Coast obtained between 1996 and 1999 (n = 283; Shaffer, et al., 2007). Table 2 presents some relevant demographic characteristics of each group used in this study.
Table 2. Demographic data of leader sample and comparison groups
Results
The statistical procedure used to compare the groups was a confidence interval analysis. A confidence interval is a range of values, centered on a sample mean, that is likely to contain the mean of the population from which the sample was drawn (Cumming & Finch, 2005). The confidence interval analysis is not treated here as a formal inferential statistical comparison; it is a way to evaluate whether this study's hypotheses warrant closer empirical scrutiny. Non-overlapping confidence intervals represent signs of possible group differences to be explored in subsequent research using inferential statistical analyses. In other words, the confidence interval analysis identifies which Rorschach variables of the college leader sample may represent statistically significant deviations from the other samples.
Confidence intervals (CIs) were computed for the leader sample and each of the four comparison groups on the five CS variables outlined in Table 1. Because the purpose of this study was to generate hypotheses, CIs were computed at both the 95% and 90% levels to maximize signs of potential group differences. Non-overlapping CIs at 90% should be viewed much more speculatively than those at 95% (Cumming & Finch, 2005).
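As an illustration of how such intervals can be obtained, the sketch below computes a CI for a group mean at the 95% and 90% levels. It assumes a normal (z) multiplier as a large-sample approximation; the original analysis may have used a t multiplier, which would yield slightly wider intervals for a sample of 30. The function name and the scores are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(scores, level=0.95):
    """Return (lower, mean, upper) for the mean of `scores`.

    Assumption: a normal (z) multiplier is used as a large-sample
    approximation; a t multiplier would be somewhat wider for small n.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = mean(scores)
    se = stdev(scores) / sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    return (m - z * se, m, m + z * se)


# Hypothetical scale scores (assumption), one value per participant.
scores = [2, 3, 1, 4, 2, 3, 5, 2, 3, 4]
ci95 = mean_ci(scores, 0.95)
ci90 = mean_ci(scores, 0.90)
```

Because the 90% interval is narrower than the 95% interval for the same data, it is more likely to fall clear of a comparison group's interval, which is why non-overlap at 90% warrants the more speculative reading.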
Rather than analyzing the leader sample against each of the comparison groups separately, CI analyses evaluated the CIs of the leader sample across the CIs of all comparison groups. This level of analysis was chosen based on the assumption that identifying non-overlapping CIs across four comparison groups would be more suggestive of leader-specific trends than non-overlapping CIs across only one group. Specifically, a general nonpatient sample, two college student samples, and a college student sample aged 18–22 were chosen as comparison groups to increase the likelihood that any differences found in the leader sample were related to leadership behavior.
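The decision rule described above, flagging a variable only when the leader interval overlaps none of the comparison intervals, can be sketched as follows. The group labels follow the figure captions; the interval bounds and function names are hypothetical and are not values from the study.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def separated_from_all(target, comparisons):
    """True only if `target` overlaps none of the comparison intervals."""
    return all(not intervals_overlap(target, ci) for ci in comparisons.values())


# Hypothetical (lower, upper) CI bounds (assumption), keyed by comparison group.
comparisons = {
    "NP": (2.8, 3.2),
    "CC": (2.5, 3.6),
    "AC1": (2.9, 3.7),
    "AC2": (2.7, 3.9),
}
flag_all = separated_from_all((4.1, 5.3), comparisons)      # clear of all four groups
flag_partial = separated_from_all((3.5, 4.5), comparisons)  # overlaps at least one group
```

Requiring separation from every comparison group is a stricter criterion than separation from any single group, so a variable that passes it is a stronger candidate for follow-up testing.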
Results indicate that student leaders produced higher SumC and more COP at the 95% comparison; greater FD responses were detected at the 90% comparison. In other words, leaders appeared to possess more emotionally-based psychological resources that guide decision-making (SumC), expect greater collaboration during interpersonal experiences (COP), and engage in more introspection (FD) than nonpatient adults and other college students. These findings were consistent with the hypotheses of this study. In contrast, student leaders did not produce different levels of stimulus integration (DQ+) or perceptual accuracy (X+%). That is to say, student leaders did not appear to engage in more complex information processing or interpret information in a more socially conventional manner. These findings were inconsistent with the hypotheses of this study.
Figs. 1–5 display these findings pictorially using confidence interval bars. CS values are displayed on the vertical axis and comparison groups are listed on the horizontal axis. In Figs. 1–3, note that the range of values for the Leader sample did not overlap with the range of values for the four comparison groups. In Figs. 4 and 5, the range of values for the Leader sample did overlap with the range of values for some of the comparison groups.

Fig. 1. Emotionally-based Resources (SumC). 95% confidence intervals by group related to emotionally-based resources. “Leader” from n = 30 college student leaders ages 18–22; “NP” from n = 283 Nonpatient adults; “CC” from n = 34 California college students; “AC1” from n = 65 Alaska college students; “AC2” from n = 35 Alaska college students ages 18–22. Mean (•); Upper (—); Lower (—).

Fig. 2. Capacity for Collaboration (COP). 95% confidence intervals by group related to capacity for collaboration. “Leader” from n = 30 college student leaders ages 18–22; “NP” from n = 283 Nonpatient adults; “CC” from n = 34 California college students; “AC1” from n = 65 Alaska college students; “AC2” from n = 35 Alaska college students ages 18–22. Mean (•); Upper (—); Lower (—).

Fig. 3. Introspective Tendencies (FD). 90% confidence intervals by group related to introspective tendencies. “Leader” from n = 30 college student leaders ages 18–22; “NP” from n = 283 Nonpatient adults; “CC” from n = 34 California college students; “AC1” from n = 65 Alaska college students; “AC2” from n = 35 Alaska college students ages 18–22. Mean (•); Upper (—); Lower (—).

Fig. 4. Perceptual Accuracy (X+%). 95% confidence intervals by group related to perceptual accuracy. “Leader” from n = 30 college student leaders ages 18–22; “NP” from n = 283 Nonpatient adults; “CC” from n = 34 California college students; “AC1” from n = 65 Alaska college students; “AC2” from n = 35 Alaska college students ages 18–22. Mean (•); Upper (—); Lower (—).

Fig. 5. Stimulus Integration (DQ+). 95% confidence intervals by group related to stimulus integration. “Leader” from n = 30 college student leaders ages 18–22; “NP” from n = 283 Nonpatient adults; “CC” from n = 34 California college students; “AC1” from n = 65 Alaska college students; “AC2” from n = 35 Alaska college students ages 18–22.
Discussion
To interpret the meaning of these results, it is necessary to understand the psychological processes involved in generating responses. By requiring individuals to react to a relatively ambiguous inkblot, the Rorschach poses a complex decision-making task that includes encoding the visual stimulus, discarding potential responses through censorship, and articulating responses on the basis of personality traits or styles (Exner, 2003). Responses are thought to reflect a match between the respondent's inner experience and the properties of the inkblot.
The construct assessed in this study was leadership behavior. Results should therefore be interpreted as personality trends that might facilitate leadership behavior, defined as influencing groups toward common goals. Most generally, the results provided additional support that the CS is capable of measuring adaptive personality traits, not just those associated with maladjustment. Given that the Rorschach is generally associated with the assessment of psychopathology and has been criticized as an overpathologizing test (see Wood, et al., 2003), these results are notable. Organizations are more likely to accept an instrument that identifies positive leadership traits rather than one designed to assess the presence or absence of mental illness (Cook & Cripps, 2005).
The CS scales that emerged as potentially significant—SumC, FD, and COP—pertain to emotional resources, introspection, and collaboration, respectively. Those that did not emerge as significant (i.e., X+%, DQ+) relate to information processing tendencies. This pattern suggests that the Rorschach might be more valuable in assessing social and emotional aspects of leadership rather than cognitive or ideational aspects. While IQ is the single best predictor of leadership effectiveness (see Schmidt & Hunter, 1998, for a review), ample predictor space remains for tools that can generate novel information in non-cognitive areas. One shortcoming of some emotional intelligence (EI) tests is their limited ability to predict outcomes after controlling for IQ (Fiori & Antonakis, 2011). No evidence has correlated SumC, FD, and COP with cognitive intelligence. Thus, these scales may be especially helpful to our understanding of personality-based factors associated with leadership.
The pattern of results also appears logical considering the circumstances in which student leaders are likely to function. In terms of SumC, student leaders may be inclined to use emotional information because their goals entail mobilizing student groups rather than devising complex organizational strategies. Accessing emotionally based resources is probably effective in efforts to motivate and influence their followers. Because student leaders operate in group settings, perhaps the high rate of FD responses represents an introspective capacity especially likely to manifest in social contexts. The use of introspection in social contexts might help catalyze a process by which leaders are able to gauge group dynamics and realign their approach accordingly.
Finally, the limited authority of student leaders relative to corporate leaders may render expectations for collaboration particularly salient. Leaders who sense collaborative possibilities (as seen in higher rates of COP) may be more inclined to treat followers in ways that actually elicit cooperation. Without an internal sense that cooperation with others is possible, engaging in leadership behavior might be discouraged.
Limitations of the present study
Several important limitations of this study must be acknowledged. The first limitation concerns group composition. Namely, the leader sample was obtained from a northeastern university in 2008 whereas comparison samples were gathered from different geographical regions as early as 1996. These differing group characteristics may have produced a cohort effect by which samples were not gathered from the same population. A cohort effect would mean that the observed personality differences had less to do with leadership and were more related to demographic or sociocultural differences. This study also did not control for leaders who may have existed in the comparison samples, thus preventing analysis of truly distinct groups. However, the presence of leaders in those comparison samples could have reduced the magnitude of observed group differences, in which case the findings might actually constitute underestimations of real group differences.
The external validity of the results may also be rather limited. Although the findings suggest that the CS may be capable of differentiating college student leaders from non-leaders, these trends cannot necessarily be generalized to leaders in business contexts. It may be the case that leadership in organizations is associated with an entirely different constellation of CS traits than those observed in this study. In addition, this study does not inform our understanding of effective or ineffective leadership because it assessed leadership via group membership, not performance.
From a practice standpoint, applying the Rorschach in organizations would likely encounter a number of challenges. Organizations may reject the Rorschach due to its ambiguous nature, historical affiliation with psychopathology, and time-intensive administration and scoring requirements. Establishing empirical support for Rorschach scales in organizational settings would improve the likelihood that the instrument is accepted and used properly.
Directions for future research
A primary goal of Rorschach-leadership research should be to develop a constellation of variables capable of predicting effective leadership and ineffective leadership in particular contexts. Establishing a leadership constellation is not only psychometrically indicated, but also practically important because it will allow the Rorschach to be used in streamlined formats. Namely, focused scoring practices in search of relevant variables would help conserve organizations' time and resources.
The results of this study suggest that CS variables related to social and emotional aspects of personality may be more useful than other CS variables when evaluating leadership phenomena. In particular, SumC, FD, and COP may be especially likely to fluctuate in relation to leadership outcomes. These variables might therefore be good candidates to include in inferential statistical comparisons of CS variables in relation to leadership outcomes. The predictive power of a CS scale constellation might be further strengthened if researchers considered using well-validated Rorschach scales that are not formally part of the CS, such as the Rorschach Oral Dependency Scale (see Bornstein, 1996) or the Mutuality of Autonomy Scale (Urist, 1977).
We also recommend using comparison groups that are more closely matched to help ensure that findings are related to the construct of interest. To enhance generalizability, future studies should assess corporate leaders and use more meaningful outcome constructs related to organizational performance. Should subsequent studies produce similar results to those found in this investigation, researchers will have a solid empirical foundation from which to select variables for more targeted predictive purposes.
