Abstract
Many approaches to work-related stress risk assessment suggest integrating a phase where objective data are collected and analyzed with a phase where the results of data collection and analysis are discussed and compared with information coming from the workers. However, stress researchers have criticized the use of self-report job stress measures, because of their potential distortions, and have called for an approach based on the use of objective measures. The Italian law for work-related stress risk assessment, closer to the latter approach, prescribes a two-stage procedure: first, a set of objective measures and then, conditional on the outcome of the first stage, a set of subjective measures. In this article, we analyze, on the basis of psychometric principles, the tool used for the objective stage in the most widely adopted method in Italy. This tool is a checklist, for which we discuss a number of issues suggesting it is not methodologically well founded. Given that assessment outcomes have a considerable impact on workers’ safety measures, we conclude these weaknesses affect the practice of work-related stress risk assessment.
Introduction
Many papers have reported, since at least 1999, that the costs of occupational illness in general, and of work-related stress in particular, are not negligible. We now briefly describe what we feel are the most relevant studies referring to Europe, from the oldest to the most recent.
The European Agency for Safety and Health at Work (EU-OSHA, 1999) surveyed the costs (both in absolute value and in percentage of the Gross National Product) of occupational illness in 15 member states of the European Union (EU-15). The total amounts to a value between 185 and 289 billion euros a year. The European Commission (2000) conservatively estimated the costs of work-related stress at 20 billion euros a year for the EU-15. A report on work-related stress of the European Foundation for the Improvement of Living and Working Conditions (2007) analyzed costs of work-related stress for the Netherlands and Germany. The EU Executive Agency for Health and Consumer (2013) estimated that 14% of employed individuals who suffer from stress will go on to develop depression, and the total costs of depression in all 27 member states of the EU is 617 billion euros.
International studies have also highlighted that work-related stress is one of the most widespread occupational illnesses. Hoel, Sparks, and Cooper (2001) reported that stress accounts for up to 30% of all work-related illness annually, on the basis of a number of reliable studies based on large population samples from the United States, Europe, and Australia. EU-OSHA (2014) provided estimates for socioeconomic costs in a number of European and non-European countries. Its conclusions stated “there is evidence suggesting that appropriately planned and implemented workplace interventions focusing on preventing stress, improving psychosocial work environment and promoting mental health are cost effective” (p. 23).
The need for an approach able to evaluate, prevent, and mitigate stress and psychosocial risks at work is, therefore, well founded on objective data. Stavroula and Aditya (2010), in a report prepared for the World Health Organization (WHO) and based on about 500 research papers, have suggested that the most accurate assessment of work-related stress consists of integrating and correlating objective measures of working conditions (observational measures) with information coming from workers (e.g., self-report questionnaires).
Background
The normative background in Italy for work health and safety is the Legislative Decree 81/2008 (Ministry of Labour, 2008), which has transposed into the Italian law the Framework Directive of EU-OSHA (1989). This Legislative Decree states (art. 28, para. 1) the obligation for all public and private employers to assess, inter alia, work-related stress of their workers according to the content of the European Framework Agreement on Work-Related Stress (2004). This agreement does not contain an exhaustive list of potential stress indicators but describes the necessity of analyzing both objective and subjective factors. The need for such an integrated multi-source approach rests on a number of studies, such as, for example, Frese and Zapf (1988), Hurrell, Nelson, and Simmons (1998), and Giorgi, Leon-Perez, Cupelli, Mucci, and Arcangeli (2014).
The above cited Legislative Decree states also (para. 1bis) that such an assessment has to comply with the guidelines in Permanent Consultative Committee of the Ministry of Labour for Workplace Health and Safety (2010). These guidelines have defined a methodological approach aiming at providing the minimum level of implementation for employers to comply with the legislative duties for work-related stress risk assessment. Such an approach prescribes an evaluation method organized in two sequential phases. The first one (defined preliminary assessment) is mandatory, while the second one (defined in-depth assessment) is required only if both the preliminary assessment has revealed risk elements requiring mitigation and the adopted mitigating actions have been ineffective. The preliminary assessment phase requires measuring only objective factors, as long as they belong to at least three categories: sentinel events, work content, and work context. Such a measurement may be carried out by means of checklists compiled by the health and safety representatives. The in-depth assessment phase allows for the use—on homogeneous groups of workers—of tools, such as self-report questionnaires, focus groups, semi-structured interviews, again investigating at least the same three above cited categories.
The staged assessment defined in the above cited guidelines implicitly prescribes a hierarchical subordination of subjective measures to objective ones. The consequence is that if the first stage reveals a low level of risk, the assessment process terminates without carrying out the second stage (Inter-Regional Technical Coordination Committee of Prevention in Workplaces, 2012, see question G.1). This situation is a clear departure from what the European Framework Agreement on Work-Related Stress (2004) has prescribed and from the literature (Frese & Zapf, 1988; Giorgi et al., 2014; Hurrell et al., 1998; Stavroula & Aditya, 2010). Regarding this Italian approach, Zoni and Lucchini (2012) have already recognized that “A limitation of this approach is represented by the predominant relevance given to the assessment of objective factors in the first steps of the evaluation” (pp. 47-48).
The previous discussion clarifies that a correct and reliable execution of the preliminary assessment phase is essential in the Italian context. A failure to detect a risk situation in this phase would leave it undetected altogether, as no further assessment phase would be performed. The survey conducted in 2013 on the stress level of Italian workers observed that stress often or always affects more than 30% of workers (Italian Institute for Political, Social and Economic Studies, 2014). This outcome is not far from the value found in a 2005 survey by EU-OSHA (2009, Foreword), which reported that work-related stress affects 22% of workers of the EU-27.
Note that the guidelines of the Permanent Consultative Committee have described the need for surveying the results of their application 2 years after their entering into force. As of September 2015, we are not aware of any publication reporting these results. Instead, some authors have criticized these guidelines. For example, Curzi, Fabbri, and Nardella (2013) stated the guidelines have “both diagnosis capability . . . faulty due to their incoherence with respect to the European Framework Agreement on Work-Related Stress and a preventive potential . . . inadequate in terms of identification of corrective measures of organizational nature” (p. 1). Also, Galli, Mencarelli, and Calzolari (2013) discussed these guidelines and stated that the Italian Union of Labour (UIL) is
strongly critical, particularly with respect to the under-evaluation of workers’ role and to the optional role of the assessment of workers’ perception, evaluating the proposed methodology incoherent both with respect to the European Framework Agreement on Work-Related Stress and to the most elementary principles of relevant national and international literature. (pp. 2-3)
Method
The most used method in Italy for the work-related stress risk assessment (Guglielmi, Depolo, & Violante, 2013) is the one designed by the National Institute for Insurance Against Accidents at Work (INAIL), a public non-profit entity safeguarding workers against physical injuries and occupational diseases. The user manual for Italian companies to comply with the obligations deriving from the above cited Legislative Decree 81/2008 (INAIL, 2011) describes the INAIL method. An English translation is available on their website (INAIL, 2013) and describes the method as “based on the Management Standards model of the Health and Safety Executive [HSE]” (Health and Safety Executive, 2004, Preface and Introduction). The INAIL method has proposed a checklist for the preliminary assessment phase and an indicator tool consisting of a 35-item questionnaire for the in-depth assessment, in accordance with the Italian methodological approach described above. The questionnaire is a translation to Italian of the HSE Management Standards Indicator Tool (Health and Safety Executive, 2004), but the INAIL method provides no indication of how the checklist is related to the HSE Management Standards model.
The HSE approach to stress has developed the idea of standards for managing work-related stress in terms of organizational states to be achieved and has discussed how their achievement can be assessed by workers (Cousins et al., 2004; Mackay, Cousins, Kelly, Lee, & McCaig, 2004). Moreover, the HSE approach has emphasized the involvement and the perception of workers, highlighting the need for discussing the outcome of data collection with them by means of focus groups. Issues revealed from the data collection phase may not turn out to be the most important ones for workers and, conversely, new (and possibly more important) issues can emerge in focus groups that have not been revealed by the data collection phase. Instead, in the INAIL method, the data collected during the preliminary assessment phase by means of the checklist are not subject to a focus group discussion but merely to a consultation with health and safety representatives.
The INAIL checklist is structured into three areas:
Sentinel events, collecting the trends of 10 indicators in the sub-areas of injury percentage, sick leaves, staff absence percentage, untaken leaves percentage, internal turnover percentage, external turnover percentage, disciplinary sanctions, unplanned health examinations, formal complaints, and judicial claims filed for downgrade/dismissal/harassment;
Work content, analyzing 36 items in the four sub-areas of work environment, task planning, workload, and working hours; and
Work context, analyzing 30 items in the six sub-areas of organizational culture, role in the organization, career development, autonomy and control, inter-personal relationships, and home-work interface.
The goal of this study is to investigate whether the various methodological steps of the INAIL checklist have been designed according to state-of-the-art principles for the definition of psychometric tools. Hence, in the remainder of this section, we describe how these methodological steps have been defined in the INAIL method’s user manual (INAIL, 2011; INAIL, 2013). Then, in the section “Discussion,” we critically review each of the steps in the light of psychometric principles.
We therefore first analyze how data are collected and a risk level is computed for each area of the checklist, and then how the results of data collection and risk evaluation are combined to produce the final outcome of the preliminary assessment phase. A methodological choice, common to all areas and to the construction of the final outcome, is that the collected data result in a score, and the risk level is computed from the ratio of the obtained score to the maximum attainable score. This partition rule prescribes that if the obtained ratio is less than or equal to 25%, there is a low risk level (termed non-relevant in the Italian version; we use this term in the following text); if it is higher than 25% and less than or equal to 50%, there is a medium risk level; and otherwise there is a high risk level (see headers of colored columns in Figures 1, 2, and 3, extracted from INAIL, 2013).
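As a minimal sketch, the partition rule can be written as a function of the obtained score and the maximum attainable score. The function name and the string labels are illustrative choices and do not appear in the INAIL manual.

```python
def partition_risk_level(score: float, max_score: float) -> str:
    """Map a score to a risk level with the 25% / 50% partition rule.

    Hypothetical helper: only the two thresholds come from the text.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    ratio = score / max_score
    if ratio <= 0.25:
        return "non-relevant"
    if ratio <= 0.50:
        return "medium"
    return "high"


# Example with a 36-point scale: 9 is exactly 25%, 18 is exactly 50%.
for s in (9, 10, 18, 19):
    print(s, partition_risk_level(s, 36))
```

Under this rule, the boundaries on a 36-point scale fall at 9 and 18, and on a 26-point scale at 6 and 13 once scores are restricted to integers.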

Figure 1. Scores and risk levels for Area 1 (sentinel events).

Figure 2. Scores and risk levels for Area 2 (work content).

Figure 3. Scores and risk levels for Area 3 (work context).
Sentinel Events
For each sentinel event except the last two, the compiler observes whether the value of the indicator decreased, remained stable, or increased, and notes down a corresponding score of 0, 1, or 4. The absence or presence of events in the last two categories directly produces a score of 0 or 4, respectively. Decrease, stability, or increase has to be computed with respect to the average of the previous 3 years. For the indicators expressed as a percentage, the method prescribes comparing the last year’s value with this average, while for indicators expressed in absolute value, no specification is given.
The score of the area is eventually obtained by means of the following conversion process (see Figure 1):
0 ≤ sum of scores ≤ 10 results in a non-relevant risk level and an area score of 0;
11 ≤ sum of scores ≤ 20 results in a medium risk level and an area score of 2;
21 ≤ sum of scores ≤ 40 results in a high risk level and an area score of 5.
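The scoring and conversion steps for this area can be sketched as follows. The function and argument names are hypothetical; the scores (0, 1, 4), the eight trend indicators plus two presence/absence indicators, and the conversion bands are those described above.

```python
TREND_SCORES = {"decreased": 0, "stable": 1, "increased": 4}


def sentinel_area(trends: list[str], complaints: bool, claims: bool) -> tuple[int, str, int]:
    """Return (sum of scores, risk level, converted area score).

    `trends` holds the trend of the first eight indicators; `complaints`
    and `claims` flag the presence of formal complaints and judicial claims.
    """
    if len(trends) != 8:
        raise ValueError("expected the trends of the first eight indicators")
    total = sum(TREND_SCORES[t] for t in trends)
    total += 4 if complaints else 0
    total += 4 if claims else 0
    if total <= 10:
        return total, "non-relevant", 0
    if total <= 20:
        return total, "medium", 2
    return total, "high", 5


# Three increasing indicators, five stable ones, and one formal complaint.
print(sentinel_area(["increased"] * 3 + ["stable"] * 5, True, False))
```

Note that the converted area score takes only the values 0, 2, or 5, whatever the sum of scores within each band.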
Work Content
The method investigates each of the 36 items through a yes/no question producing a score of 0 or 1.
The score of the area is exactly equal to the sum of scores and results in the following risk levels (see Figure 2):
0 ≤ area score ≤ 13: non-relevant risk level;
14 ≤ area score ≤ 25: medium risk level;
26 ≤ area score ≤ 36: high risk level.
Note that for this area, the boundaries between risk levels, set at 13 and 25 instead of at 9 and 18, do not respect the partition rule for mapping scores to risk levels.
Risk levels may be computed also for the four sub-areas. Also for these sub-areas, boundaries do not respect the partition rule.
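To make the discrepancy concrete, the sketch below classifies the same work content score twice: once with the boundaries published in the checklist (13 and 25) and once with the boundaries implied by the partition rule on a 36-point scale (9 and 18). The helper names are illustrative assumptions.

```python
def partition_bounds(max_score: int) -> tuple[int, int]:
    """Largest integer scores at or below 25% and 50% of max_score."""
    return max_score // 4, max_score // 2


def risk_level(score: int, low_max: int, medium_max: int) -> str:
    """Classify a score given the upper bounds of the first two levels."""
    if score <= low_max:
        return "non-relevant"
    if score <= medium_max:
        return "medium"
    return "high"


PUBLISHED_BOUNDS = (13, 25)          # boundaries used by the checklist
RULE_BOUNDS = partition_bounds(36)   # boundaries implied by the partition rule

for score in (9, 12, 20, 26):
    print(score,
          risk_level(score, *PUBLISHED_BOUNDS),
          risk_level(score, *RULE_BOUNDS))
```

A score of 12, for instance, is non-relevant with the published boundaries but medium under the partition rule, and a score of 20 is medium with the published boundaries but high under the rule.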
Work Context
The method investigates each of the 30 items through a yes/no question producing a score of 0 or 1. The area score is computed by first summing only the first 26 items. Next, the sum of the last four items is computed. If the latter sum is greater than 0, it is discarded, and the former sum is the area score. Otherwise, 1 is subtracted from the former sum, and the result is the area score. No justification is given for treating these last four items (which make up the whole sub-area “home-work interface”) differently.
The area score results in risk levels according to the following (see Figure 3):
0 ≤ area score ≤ 8: non-relevant risk level;
9 ≤ area score ≤ 17: medium risk level;
18 ≤ area score ≤ 26: high risk level.
Note that for this area, boundaries between risk levels, set at 8 and 17 instead of at 6 and 13, do not respect the partition rule.
Risk levels may be computed also for the first five of the six sub-areas. Also for these sub-areas, boundaries do not respect the partition rule.
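The computation of the work context area score can be sketched as follows. The function names are hypothetical, and classifying a score of -1 (all 30 items at 0) as non-relevant is an assumption of this sketch, because the published ranges start at 0.

```python
def work_context_area_score(answers: list[int]) -> int:
    """Area score from 30 item scores (each 0 or 1), as described above."""
    if len(answers) != 30:
        raise ValueError("expected 30 item scores")
    main_sum = sum(answers[:26])        # first five sub-areas
    home_work_sum = sum(answers[26:])   # four home-work interface items
    if home_work_sum > 0:
        return main_sum                 # the home-work sum is discarded
    return main_sum - 1                 # otherwise 1 is subtracted


def work_context_level(area_score: int) -> str:
    """Published boundaries for Area 3 (8 and 17)."""
    if area_score <= 8:
        return "non-relevant"
    if area_score <= 17:
        return "medium"
    return "high"


base = [1] * 9 + [0] * 17               # nine flagged items among the first 26
no_home_work = base + [0, 0, 0, 0]
one_home_work = base + [1, 0, 0, 0]
for answers in (no_home_work, one_home_work):
    score = work_context_area_score(answers)
    print(score, work_context_level(score))
```

In this example, a single home-work interface item moves the area from non-relevant (score 8) to medium (score 9), while the other three items in that sub-area would add nothing further.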
Construction of the Overall Outcome
The method constructs the overall outcome by summing the scores of the three areas. The result is interpreted as follows (see Figure 4, extracted from table at p. 51 in INAIL, 2013):
0 ≤ sum of area scores ≤ 17: non-relevant risk level;
18 ≤ sum of area scores ≤ 34: medium risk level;
35 ≤ sum of area scores ≤ 67: high risk level.

Figure 4. Scores and risk levels for the overall outcome.
Discussion
In this section, we discuss some of what we feel are critical methodological issues of the INAIL checklist with respect to its psychometric properties and to the quality of data organization and accessibility (Aiken, 1996; Kaplan & Saccuzzo, 2012; Nunnally, 1978). Critical points common to the three investigated areas are:
No justification is provided for converting the sum of scores to an area score for the first area (sentinel event) and not for the other two (work content and work context).
No objective evidence of a correlation with the measured phenomena is provided for the choice of the thresholds adopted for dividing risk levels.
All items have equal significance. For example, the “Diffusion of the organizational chart” (INAIL, 2013, p. 45, question n. 37) has the same weight for the final risk assessment as the “Misconduct of top managers and colleagues are properly managed” (p. 48, n. 61).
One of the publicly available Risk Assessment Documents required to comply with Italian legislative requirements (City of Imola, 2013) explicitly stated “The empirical basis of the method used to compute and weigh scores assigned to items is unknown” (p. 24).
In the following sub-sections, we further discuss critical methodological issues, first those that are specific to each of the three investigated areas and then those related to how the overall outcome is constructed.
Sentinel Events
The most relevant weakness is that increase or decrease is measured using absolute values and not as a percentage. Even for those indicators expressed in percent, depending on the magnitude of the base level, a big difference may exist between an increase of 1 point and 5 points. Moreover, if there is an increase, a 100% increase should weigh more than a 1% increase. Next, we note the method gives little emphasis to the need for providing supporting documentation demonstrating the revealed trends. Finally, we observe that it would have been more appropriate to compare values of sentinel events with reference values for the industrial sector.
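A small numerical illustration, with made-up indicator values, shows why the base level matters. The helper names are assumptions of this sketch; only the trend scores (0, 1, 4) come from the checklist.

```python
def trend_score(previous_avg: float, current: float) -> int:
    """Checklist score for a trend: 0 decreased, 1 stable, 4 increased."""
    if current < previous_avg:
        return 0
    if current == previous_avg:
        return 1
    return 4


def relative_change(previous_avg: float, current: float) -> float:
    """Change relative to the base level (hypothetical alternative measure)."""
    return (current - previous_avg) / previous_avg


# Two invented absence rates, both rising by one percentage point.
for previous_avg, current in ((2.0, 3.0), (20.0, 21.0)):
    print(previous_avg, current,
          trend_score(previous_avg, current),
          f"{relative_change(previous_avg, current):.0%}")
```

Both cases receive the same score of 4, although the first corresponds to a 50% relative increase and the second to a 5% one.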
Work Content and Work Context
A critical element for both areas is that it is difficult to provide an objective answer to some questions, because they often investigate issues with a qualitative nature and their meaning depends on the subjective interpretation of the compilers. For example, expressions such as “Adequacy of equipment resources to accomplish the task” (INAIL, 2013, p. 42, question n.15), “Particularly monotonous works” (p. 42, n.16), or “Roles are clearly defined” (p. 46, n. 49) are not objectively interpretable (Barattucci & Sarchielli, 2013).
Next, the yes/no questions are not able to properly detect elements of risk because the simple positive or negative answer does not give indications on the quality of investigated aspects. For example, a yes/no answer to a question such as “Meetings between management and employees” (INAIL, 2013, p. 45, question n.43) does not allow evaluating conditions, frequency, and quality of the meetings, which are instead highly relevant aspects to be evaluated for the sub-area of organizational culture.
Construction of the Overall Outcome
In this sub-section, we discuss the construction of the overall outcome of the preliminary assessment, focusing on three steps where we have found methodological weaknesses that undermine the significance of the outcome itself.
Area scores are summed
The most critical element of the INAIL method is that the summing of area scores tends to hide risk levels for some of the areas. Consider, for example, a situation where the area scores for the three areas are, respectively, 5 (high), 3 (non-relevant), and 9 (medium). Then the overall risk score is 17, with an overall risk level of non-relevant. Note that, for an overall risk level of non-relevant, the methodological guidelines conclude that the checklist “does not reveal specific conditions that can determine the presence of work-related stress” (INAIL, 2013, p. 51). Hence, on the basis of the staged assessment prescribed by the Italian methodological approach (section “Background”), there is no need for an in-depth assessment.
For such a delicate process, it would be more appropriate to observe the concurrence and concordance of results in the various areas than to simply compute their sum, which, as is well known, has a smoothing effect on the overall result. The construction of a synthesis indicator should be done using the appropriate logical-mathematical combinations with the ability to correct possible distortions of single area indicators (Lazarsfeld, 1966). The INAIL method leads to a contrary result: the synthesis indicator hides the outcome of single area indicators.
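The masking effect of the sum can be reproduced with a short sketch that reports the per-area levels next to the overall level. The function names and the dictionary layout are illustrative; the boundaries are those listed in the section “Method.”

```python
SENTINEL_LEVEL = {0: "non-relevant", 2: "medium", 5: "high"}


def banded(score: int, low_max: int, medium_max: int) -> str:
    """Classify a score given the upper bounds of the first two levels."""
    if score <= low_max:
        return "non-relevant"
    if score <= medium_max:
        return "medium"
    return "high"


def preliminary_outcome(sentinel: int, content: int, context: int) -> dict:
    """Per-area levels and the overall level obtained by summing area scores."""
    total = sentinel + content + context
    return {
        "sentinel events": SENTINEL_LEVEL[sentinel],
        "work content": banded(content, 13, 25),
        "work context": banded(context, 8, 17),
        "overall score": total,
        "overall": banded(total, 17, 34),
    }


outcome = preliminary_outcome(sentinel=5, content=3, context=9)
for key, value in outcome.items():
    print(f"{key}: {value}")
```

With area scores of 5, 3, and 9, the overall score is 17 and the overall level is non-relevant, even though one area is high and another is medium.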
Prescriptions for medium/high overall risk levels are ambiguous
When discussing the overall assessment outcomes of medium and high risk levels, the INAIL method advises that corrective actions need to be taken for those sub-areas of work content and work context “with the highest risk level” and, if ineffective, an in-depth assessment must be performed (INAIL, 2013, pp. 24-25).
These prescriptions present some ambiguity. First of all, they do not explain why no intervention is prescribed in the area of sentinel events: It is true that these describe objective facts that cannot be altered, but the prescriptions could at least have suggested investigating possible correlations/dependencies between them and the work content/context sub-areas with the highest risk levels. Next, these prescriptions do not justify why corrective actions need to be taken only for the sub-areas with the highest risk level. Nor is this indication formulated in an operational way: Does it refer to the two highest or three highest or how many? Finally, these prescriptions suggest the same corrective approach in the two cases of overall risk level of medium and high: What, then, is the difference between the two situations?
As these prescriptions appear under the paragraphs describing the overall risk level of medium and high (INAIL, 2013, pp. 24-25), it is clear that nobody would apply them in the case of an overall risk level of non-relevant. The need for addressing the critical situations in those sub-areas will, thus, be neglected.
Adaptation of the checklist is not discussed
A third important critical element is that it is not clear how the same checklist may be applied to every kind of company, because not all items in the checklist are applicable to all companies. Contrast this procedure with the need for tailoring the tool to the type of company, which is explicitly required, for example, by Satzer and Gerey (2009) and by Alis, Dumas, and Poilpot-Rocaboy (2010). The INAIL method (INAIL, 2011, 2013) has provided neither any guidance for the adaptation of the checklist to the specific sector of an organization nor any discussion on the reliability of such a modification. For example, do threshold values separating risk levels in various areas keep their validity independently of the organization’s sector? Moreover, methodological indications should be provided on how to manage the items that are not applicable to the examined company: How do these affect each area score and the overall result of the preliminary assessment? The INAIL method (INAIL, 2011, 2013) has not addressed this highly relevant problem. Finally, it has not provided any normative database for purposes of comparison and interpretation (Aiken, 1996; Kaplan & Saccuzzo, 2012; Nunnally, 1978).
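The effect of non-applicable items can be illustrated under one possible reading, which is an assumption of this sketch because the method gives no guidance: non-applicable items score 0 and the published boundaries stay unchanged. The sketch uses the work content area (36 items, non-relevant up to a score of 13); the names are hypothetical.

```python
NON_RELEVANT_MAX = 13   # published upper bound of the non-relevant level
TOTAL_ITEMS = 36        # items in the work content area


def non_relevant_share(applicable_items: int) -> float:
    """Share of the attainable maximum still classified as non-relevant.

    Assumes non-applicable items score 0 and the boundary stays at 13.
    """
    if not 0 < applicable_items <= TOTAL_ITEMS:
        raise ValueError("applicable_items must be between 1 and 36")
    return min(NON_RELEVANT_MAX, applicable_items) / applicable_items


for n in (36, 26, 13):
    print(n, f"{non_relevant_share(n):.0%}")
```

Under this reading, the non-relevant band covers about 36% of the attainable score when all items apply, 50% when 26 items apply, and the whole attainable range when only 13 items apply, so such a company could never leave the non-relevant level in this area.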
Validity and implications
INAIL declares (INAIL, 2013, Preface) that its methodological path “has been merged with the experiences of the” Inter-Regional Technical Coordination Committee of Prevention in Workplaces (2012), containing guidelines for the correct risk management within companies and for oversight activities of public health agencies. These prescribe that checklists used for the preliminary assessment need to be:
scientifically valid with respect to:
Evaluated stressors
Objective and verifiable elements examined to estimate stressors
Criteria to assign scores and compute risk level. (p. 18)
But in the INAIL method (INAIL, 2011, 2013), there is no evidence of a scientific validation able to prove the INAIL checklist is a useful, valid, and reliable method (Aiken, 1996; Kaplan & Saccuzzo, 2012; Nunnally, 1978). Ronchetti et al. (2014) report convergent validity between the checklist and the INAIL indicator tool, but the published details of the study are not sufficient to judge the reliability of the reported findings.
Psychometric properties
The previous discussion shows that the INAIL checklist has severe methodological weaknesses, and there is no evidence in the literature that it satisfies the following main psychometric principles:
Internal coherence, as an aspect of reliability;
Stability, as an aspect of reliability;
Content validity;
Criterion validity.
Only in regard to the area of construct validity is there a very preliminary result (discussed above) regarding convergent validity.
The psychometric property of reliability has two characterizations: internal coherence (referring to the fact that the various parts of the tool provide highly correlated indications) and stability (referring to the fact that reusing the tool after some time over an unchanged sample provides the same results). A tool whose reliability has not been established makes it impossible to derive meaningful consequences from its measurements.
Content validity refers to the degree to which the various components of a tool are appropriate to measure the construct and cover the entire construct domain. Its absence means the tool is not measuring what it should measure or is measuring only part of it.
Criterion validity refers to various ways of evaluating the tool as a good operationalization of the psychological construct to be measured (Trochim & Donnelly, 2006). Its absence means the tool is not a proper operationalization of the construct.
Implications
The various methodological weaknesses of the INAIL checklist previously discussed in this section are highly relevant in the light of the fact that their combined effect might result in an under-valuation of the actual risk levels. Considering that the INAIL method is freely and widely available because of the institutional role of such an organization, it is clear the potential consequences of these weaknesses affect a large part of the Italian workforce. Moreover, since workers’ safety measures are defined on the basis of assessment outcomes, we conclude these weaknesses have a clear bearing on the practice of work-related stress risk assessment.
Conclusion
The most widely used European models for work-related stress evaluation emphasize an approach centered on workers’ perception. But, given the widely discussed potential distortions of self-report perceptions (see, for example, Ostry, Kelly, Demers, Mustard, & Hertzman, 2003), several attempts have been made to develop observational methods. We analyzed, as an example of these attempts, an objective tool developed in Italy that declares itself to be inspired by the HSE approach. Zoni and Lucchini (2012) have already observed that the INAIL checklist has departed from HSE’s spirit, because of the “predominant relevance given to the assessment of objective factors in the first steps of the evaluation” (pp. 47-48). Indeed, the use of a checklist as a closed system of measurement and not as a process of evaluation shifts the balance of the assessment method toward objective measures.
We have found that the INAIL checklist has a number of methodological weaknesses in terms of psychometric principles, analyzed and discussed in sections “Method” and “Discussion.” We therefore conclude it is not methodologically well founded. A preliminary version of our work is in Corradini, Marano, and Nardelli (2014). Given that assessment outcomes have a considerable impact on workers’ safety measures, we conclude these weaknesses have a clear bearing on the practice of work-related stress risk assessment. Given the relevant literature and the complexity of the phenomena involved, we think it should be mandatory to evaluate work-related stress risk by means of the integrated use of both well-founded objective measures and adequate workers’ involvement, through focus groups, questionnaires, and similar tools, as suggested, among others, by Albini, Zoni, Parrinello, Benedetti, and Lucchini (2011); Corradini, Marano, and Nardelli (2015a, 2015b); and Panari, Guglielmi, Ricci, Tabanelli, and Violante (2012).
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research and/or authorship of this article.
