Abstract
Background and objective:
The aim of this study was, first, to produce and cross-culturally validate a Finnish version of the FACE-Q Eye module, a patient-reported outcome measure designed for patients undergoing eyelid surgery for esthetic reasons, and, second, to assess the suitability of this instrument for use in a university hospital setting.
Methods:
The FACE-Q Eye module and the general FACE-Q components Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Satisfaction with Outcome were translated according to established guidelines. A postal survey was conducted with the translated instrument and the generic health-related quality of life instrument 15D among 245 patients operated on at the Helsinki University Hospital between 2009 and 2019. Cronbach’s alpha, floor and ceiling effects, measurement reliability with repeat administration, and convergence with 15D dimensions were analyzed.
Results:
The FACE-Q Eye module and general components translated readily into Finnish. Eighty-one patients (33%) responded to the survey, most of whom (78%) had undergone blepharoplasty. Most subscales demonstrated acceptable internal consistency with Cronbach’s alphas 0.79–0.96. A ceiling effect was observed for four of the seven subscales evaluated. Intra-class correlation coefficients were high (0.82–0.91) indicating good reliability. Results of the FACE-Q subscales correlated at best moderately with the 15D dimensions.
Conclusions:
The Finnish versions of the FACE-Q Eye module and the FACE-Q components Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Satisfaction with Outcome perform well when assessing outcomes relevant to patients after eyelid surgery. However, when used in patients operated on for mainly functional reasons, subtle variations may be missed.
Introduction
The periorbital region includes highly visible facial subunits that demand precise surgery preserving both esthetics and function. Blepharoplasty is a common esthetic procedure and one that is, due to its functional benefits, covered by the public healthcare system in Finland. Together with ectropion repair, it is also a frequent adjunct to larger facial esthetic or functional procedures in patients with facial nerve paralysis, the amyloidosis characteristic of Meretoja facial paralysis syndrome, or complex facial trauma. An increasing focus on evidence-based medicine and the trend toward performance monitoring in plastic surgery with patient-reported outcome measures (PROMs) call for a tool for assessment of surgical outcomes in these patients. 1 However, no validated instrument to assess the patient-reported outcomes of eyelid surgery is yet available in Finnish.
FACE-Q is a PROM developed using psychometrically robust methods to assess outcomes of facial esthetic surgery. It contains 40 independently functioning scales, including four designed to assess the appearance of the eyes and one addressing adverse effects of esthetic eye surgery. 2–5 The FACE-Q Eye module is, to our knowledge, currently the only available psychometrically validated PROM for patients undergoing treatment in the periorbital region. The FACE-Q scales are free for physicians to use for clinical and research purposes, and normative values are available for the Satisfaction with Facial Appearance scale. 6 Here, we aimed to produce a Finnish version of the FACE-Q scales relevant to patients undergoing eyelid surgery, and to assess the validity and the reliability of the scales.
Material and Methods
The study protocol was approved by the ethics committee of the Helsinki University Hospital. Patients who had undergone blepharoplasty, ptosis repair, ectropion repair, or removal of an eyelid skin lesion at the Helsinki University Hospital Department of Plastic Surgery between 2009 and 2018 were retrieved from the hospital operating theater records. A total of 432 patients were found. Of these, 300 patients were randomly selected for screening. Among the screened patients, 245 Finnish-speaking adults aged 85 years or younger were identified.
A postal survey was conducted. The questionnaire package included the selected FACE-Q modules, the 15D, the question “How normal do you think your eyes are (0%–100%)?”, and demographic as well as health background questions. The questionnaires were returned, together with informed written consent, in a prepaid envelope. A repeat questionnaire of the FACE-Q modules was sent to each participant upon receipt of the initial questionnaire. The patient records were reviewed for the diagnosis, type of operation, date of operation, and postoperative complications.
Instruments
FACE-Q
Four scales of the original FACE-Q Eye module were selected as relevant for our patient population: eyes overall (seven questions), upper eyelids (seven questions), lower eyelids (seven questions), and eye-specific adverse effects (six questions). The eyelashes scale was not included. In addition, three general FACE-Q scales were used as follows: Satisfaction with Facial Appearance (10 questions), Appearance-Related Psychosocial Distress (eight questions), and Satisfaction with Outcome (six questions).
Translation of the FACE-Q scales was done following the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guidelines. 7 Two separate forward translations were produced by native Finnish speakers fluent in English, based on which a consensus version was agreed on. A back translation was done by a professional translator. The original English version and the back translation were compared by a panel of three surgeons including a native English speaker fluent in Finnish. The final Finnish version was proofread by a linguistic expert and reviewed by a panel of four surgeons. Pilot testing with eight patients invited to comment on the questionnaire revealed no need for further changes.
15D
The 15D is a self-administered generic health-related quality of life (HRQoL) instrument containing 15 dimensions, each scored on a scale from 1 (best) to 5 (worst): moving, seeing, hearing, breathing, sleeping, eating, speech, excretion, usual activities, mental function, discomfort and symptoms, depression, distress, vitality and sexual activity. The 15D was chosen for use in this study because the Finnish version of it has undergone stringent psychometric validation and is widely used in Finland. 8
Statistical methods
Each FACE-Q subscale was analyzed independently. The FACE-Q scale data were screened for missing values. If more than 50% of a patient’s responses in a given scale were missing, the patient was excluded from further analysis of that scale. Otherwise, the missing values were replaced with the mean of the patient’s responses to the other items in the scale. Total scores were scaled from 0, indicating the worst outcome, to 100, indicating the best. The scaling was done according to the original FACE-Q scale scoring. 2–5 Score distributions with medians and interquartile ranges (IQRs) were examined. Floor and ceiling effects were assessed: if more than 15% of patients obtained the minimum or the maximum score, a floor or ceiling effect, respectively, was considered confirmed.
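The missing-value rule and the floor and ceiling criterion described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the use of `None` for an unanswered item, and the example response values are assumptions made for illustration, and the conversion of raw totals to the 0–100 FACE-Q score is deliberately left out.

```python
def impute_scale(responses, max_missing=0.5):
    """Fill unanswered items (None) with the mean of the answered items.

    Returns None when more than `max_missing` of the items are unanswered,
    i.e. when the respondent is excluded from the analysis of this scale.
    """
    answered = [r for r in responses if r is not None]
    missing_share = 1 - len(answered) / len(responses)
    if missing_share > max_missing:
        return None
    item_mean = sum(answered) / len(answered)
    return [float(r) if r is not None else item_mean for r in responses]


def floor_ceiling(totals, lowest, highest, threshold=0.15):
    """Flag a floor or ceiling effect when the share of respondents at the
    lowest or highest possible total exceeds `threshold` (15% by default)."""
    n = len(totals)
    floor_share = sum(t == lowest for t in totals) / n
    ceiling_share = sum(t == highest for t in totals) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor": floor_share > threshold,
        "ceiling": ceiling_share > threshold,
    }


if __name__ == "__main__":
    # Hypothetical responses to a four-item scale; exactly half are missing,
    # which does not exceed the 50% limit, so the gaps are imputed.
    print(impute_scale([1, None, 3, None]))
    # Hypothetical raw totals for a seven-item scale scored 1-4 per item.
    print(floor_ceiling([7, 28, 28, 14, 20, 28, 10, 12, 15, 16], 7, 28))
```

In this sketch a respondent with exactly half of the items missing is retained, because the exclusion rule applies only when the missing proportion is more than 50%.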
Exploratory factor analysis (EFA) was conducted for each FACE-Q scale to evaluate internal structure and unidimensionality of the scales through identifying the number of latent variables influencing the score. Parallel analysis using the maximum likelihood method with 50 iterations of simulated analysis was conducted to obtain the 95th percentile eigenvalues. The obtained simulated eigenvalues were compared against the eigenvalues obtained from the observed data to determine the final number of factors to be included into the EFA. EFA was conducted using maximum likelihood method and promax rotation.
Underlying factor structures of the FACE-Q scales were evaluated individually by examining the eigenvalues of identified factors and loading and communality values of the items against the given factors. Loading value over 0.4 was interpreted as the item representing the given factor sufficiently. Communality value over 0.5 was interpreted as the given factor sufficiently accounting for variance of the item. Acceptable loading and communality values were taken to indicate unidimensionality of the scale, meaning that the scale assesses a single construct. Cronbach’s alphas were calculated to evaluate internal consistency of the scales. Values over 0.7 were interpreted to represent acceptable internal consistency, whereas values over 0.95 were taken to indicate redundancy.
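As an illustration of the internal consistency criterion, the sketch below computes Cronbach’s alpha from a respondents-by-items table and applies the thresholds stated above. It is a hedged example only: the authors performed their analysis in R, and the function names and the toy data here are assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Apply the thresholds used in the text: over 0.7 is acceptable,
    over 0.95 suggests redundant items."""
    if alpha > 0.95:
        return "possible redundancy"
    if alpha > 0.7:
        return "acceptable"
    return "below the acceptability threshold"


if __name__ == "__main__":
    # Hypothetical two-item scale answered by three respondents.
    toy = [[1, 2], [2, 1], [3, 3]]
    alpha = cronbach_alpha(toy)
    print(round(alpha, 3), interpret_alpha(alpha))
```

The interpretation helper treats a value of exactly 0.95 as acceptable rather than redundant, following the "over 0.95" wording.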
Measurement reliability of the FACE-Q scales was assessed utilizing repeated measurements. The difference between the two measurement scores was examined using the Mann–Whitney U test. In addition, the intra-class correlation coefficient (ICC), the standard error of the measurement (SEM), and the repeatability coefficient (R) were calculated between the two measurements. An ICC value over 0.7 was interpreted as sufficient reliability of the measurement. The SEM was estimated as the square root of the analysis of variance (ANOVA) error variance of the measurements. The SEM value of each scale was compared to the IQR of the baseline measurement score, with values close to 0 representing low variation and values close to half of the IQR representing high variation and, thus, low reliability of the measurement. For calculation of the R-value and its 95% confidence interval (CI), a generalized mixed-effects model fitted by restricted maximum likelihood with bootstrapping of 1000 repetitions was used. R-values close to 0 represent high precision of the measurement.
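The relationship between the ANOVA error variance, the SEM, and the ICC can be sketched as follows. The text does not state which ICC form was used, so this example assumes a two-way, single-measure, absolute-agreement ICC. The function name and the test-retest scores are hypothetical, and the bootstrapped R-value is not reproduced.

```python
from math import sqrt


def icc_and_sem(pairs):
    """Two-way ANOVA on subjects x repeated measurements.

    Returns (ICC, SEM), where the ICC is the single-measure absolute-agreement
    form (an assumption) and the SEM is the square root of the error mean square.
    """
    n = len(pairs)        # number of subjects
    k = len(pairs[0])     # number of measurements per subject (2 for test-retest)
    grand = sum(sum(p) for p in pairs) / (n * k)
    row_means = [sum(p) / k for p in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_err = max(ss_total - ss_rows - ss_cols, 0.0)  # guard against rounding below zero

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    icc = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return icc, sqrt(ms_err)


if __name__ == "__main__":
    # Hypothetical first and repeat scores (0-100) for four respondents.
    scores = [(10, 12), (20, 18), (30, 32), (40, 38)]
    icc, sem = icc_and_sem(scores)
    print(round(icc, 3), round(sem, 3))
```

A SEM that is small relative to half of the baseline IQR, together with an ICC over 0.7, would be read as sufficient reliability under the criteria given above.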
Spearman correlation coefficients between the FACE-Q scale scores and the 15D dimensions, and between the scale scores and the question “How normal do you think your eyes are?”, were calculated to examine convergence of the scales with HRQoL dimensions and with the overall perception of eye appearance. Correlation coefficients were interpreted as follows: less than 0.3 is negligible; 0.3–0.5 is low; 0.5–0.7 is moderate; and over 0.7 is high.
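The convergence analysis can be illustrated with a small Python sketch that computes a Spearman coefficient (Pearson correlation of average ranks) and maps it to the interpretation bands above. The function names and the data are hypothetical. The text leaves the band boundaries open, so the sketch assumes that exactly 0.5 counts as moderate and exactly 0.7 as moderate, and that the sign is ignored when grading strength.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def classify_strength(rho):
    """Interpretation bands from the text, applied to the absolute value."""
    size = abs(rho)
    if size < 0.3:
        return "negligible"
    if size < 0.5:
        return "low"
    if size <= 0.7:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical scale scores and ratings for five respondents.
    rho = spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
    print(round(rho, 3), classify_strength(rho))
```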
R (3.6.1) statistical software with packages “tidyverse,” “psych,” “rptR,” “rel,” “GPArotation,” and “ltm” was used in data processing and statistical analysis. 9–15 The results were presented as counts with percentages, medians with IQRs, mean values and standard deviations (SDs), or coefficients with 95% CIs.
Results
Translation
The FACE-Q Eye module, Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Satisfaction with Outcome translated well into Finnish. The concepts covered in the questions were straightforward to phrase with common expressions. The review of the back translation and the pilot testing with eight patients led to no changes in the language of the instrument.
Validation
Eighty-one patients (33% response rate) returned the questionnaire and were included in the analysis. The median age at the time of questionnaire completion was 69 years (range 46–86 years). Most patients (78%) had undergone blepharoplasty. The median time from surgery was 6 years (range 0–10 years). Further participant characteristics are shown in Table 1. The repeat questionnaire was returned by 58 patients (72% of the participants). The median delay between the day the first questionnaire was completed and the day the repeat questionnaire was sent was 17 days (range 3–74 days).
Table 1. Characteristics of the patients.
BMI: body mass index.
Blepharoplasty and ectropion repair; resection of lower eyelid skin; resection of upper orbital rim.
Delayed wound healing.
Wound infection requiring antibiotic treatment.
Results of the EFA are detailed in Table 2. Scores of the FACE-Q subscales, and assessment of the internal consistency of the subscales and reproducibility of the score are shown in Table 3. Repeatability assessment for the subscales is shown in Fig. 1.
Table 2. Results of exploratory factor analysis of the FACE-Q Eye subscales.
Table 3. Scores of the FACE-Q subscales, and assessment of the internal consistency of the subscales and reproducibility of the scores.
IQR: interquartile range; 95% CI: 95% confidence interval.

Fig. 1. Reliability of the FACE-Q subscales on repeat administration.
Satisfaction with Facial Appearance
Scores of the Satisfaction with Facial Appearance scale were normally distributed. No floor or ceiling effects were observed. Parallel analysis of the items suggested that one factor be included in the factor analysis. In the factor analysis, the eigenvalue of the included factor was 6.85. All items of the scale loaded strongly on the factor. The communality of item h was low (0.42), suggesting that the item may reflect an underlying factor other than the one identified. However, the loading value of item h was still acceptable (0.65). Communalities of the other items were high. The high Cronbach’s alpha (0.95) indicated high internal consistency. In repeated measures, scores of the Satisfaction with Facial Appearance scale did not differ significantly. In addition, the observed high ICC value and low SEM and R-values indicate high reliability of the scale.
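The two values reported for item h are linked: in a solution with a single factor, the communality of an item equals its squared loading. Writing the loading of item i as \lambda_i and its communality as h_i^2 (notation introduced here for illustration), the relation can be checked against the reported figures:

```latex
h_i^2 = \lambda_i^2,
\qquad
\lambda_h = 0.65 \;\Rightarrow\; h_h^2 = 0.65^2 = 0.4225 \approx 0.42
```

The loading threshold of 0.4 and the communality threshold of 0.5 therefore are not equivalent in a one-factor model: an item needs a loading of about 0.71 or more to reach a communality of 0.5.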
Appearance-Related Psychosocial Distress
The Appearance-Related Psychosocial Distress scale scores were skewed toward high scores, and a ceiling effect was confirmed (17.3% of patients obtained the maximum score). Parallel analysis provided one factor for the factor analysis. The eigenvalue of this factor was 5.75. Items of the scale presented acceptable loading values (0.46–0.91). Items a and h yielded low communality values (0.49 and 0.21, respectively), indicating involvement of underlying constructs other than the factor under investigation. Communalities of the other items were acceptable. Internal consistency of the scale was high (Cronbach’s alpha 0.95). Reliability of the Appearance-Related Psychosocial Distress scale was found to be excellent based on repeated measure scores, with high ICC and low SEM and R-values.
Satisfaction with Outcome
A total of 16.0% of patients obtained the maximum score on the Satisfaction with Outcome scale, which confirms a ceiling effect. A floor effect was not observed. According to parallel analysis, one factor (eigenvalue 4.84) was selected for further examination. Loading and communality values of the items were high, indicating unidimensionality. In addition, Cronbach’s alpha was high, indicating high internal consistency. The ICC of the scores was high, and SEM and R-values were low, indicating excellent reliability.
Satisfaction with Eyes
No floor or ceiling effects were observed in the Satisfaction with Eyes scale. One factor identified in parallel analysis was found to be sufficient in factor analysis as loading and communality values of the items were high. Eigenvalue of the factor was 5.60. In addition, internal consistency was high (Cronbach’s alpha 0.96), which, in turn, indicates a potential redundancy of the items in the scale. Reliability was high as ICC was high, and SEM and R-values were low.
Appraisal of Upper Eyelids
A ceiling effect was confirmed in the Appraisal of Upper Eyelids scale (maximum score 23.5%). A floor effect was not observed. One factor revealed by parallel analysis was found to be sufficient with an eigenvalue of 5.68 and high loading and communality values for all items. Cronbach’s alpha (0.96) indicated high internal consistency with potential redundancy of the items. ICC as well as SEM and R-values showed excellent reliability of the measurement.
Appraisal of Lower Eyelids
In the Appraisal of Lower Eyelids scale, 24.7% of the patients obtained the maximum score, which confirms a ceiling effect. A floor effect was not observed. One factor was selected for further analysis after parallel analysis. In the factor analysis, loading values against the given factor (eigenvalue 5.57) were high, as were communality values. Internal consistency was high (Cronbach’s alpha 0.96). Reliability of the scale assessed using ICC, SEM, and R was excellent.
Adverse Effects: Eyes
No floor or ceiling effects were confirmed in the Adverse Effects: Eyes scale as the proportion of minimum or maximum scores was less than 15%. Parallel analysis provided one factor for further investigation. In factor analysis, four items of six in the Adverse Effects scale (items 1, 4, 5, and 6) displayed low communalities (0.06, 0.28, 0.19, and 0.45, respectively). In addition, item 1 loaded weakly (0.25) on the identified factor. Observed communality and loading values suggest non-unidimensionality of the Adverse Effects scale. Internal consistency, however, was sufficient (Cronbach’s alpha 0.79). According to ICC, SEM, and R-values, reliability of the measurement was high.
15D
Examination of convergence of the FACE-Q scales with the 15D dimensions showed the most consistent moderate to low correlations with the discomfort and symptoms, vitality, mental function, and depression dimensions (Table 4). There were only minor correlations between the Appraisal of Upper Eyelids or the Appraisal of Lower Eyelids scales and the 15D dimensions, suggesting that satisfaction with the eyelids is not associated with generic HRQoL. The Appearance-Related Psychosocial Distress and the Appraisal of Upper Eyelids scales correlated moderately with responses to the item “How normal do you think your eyes are?” All other scales correlated weakly with this item, except the Appraisal of Lower Eyelids, whose correlation coefficient was of negligible strength.
Table 4. Convergence between FACE-Q Eye module subscale scores and the 15D dimensions or the self-perceived normality of the eyes. The numbers are the correlation coefficient (p-value).
correlation coefficient less than 0.30.
Discussion
Incorporation of patient-reported outcome data into clinical decision-making is increasingly valued. 1 Here, we produced a Finnish version of the FACE-Q Eye module, a PROM designed for assessment of patients seeking eyelid treatment primarily for esthetic reasons. 4 We also included three general FACE-Q scales salient to patients undergoing eyelid surgery: Satisfaction with Facial Appearance, 16 Appearance-Related Psychosocial Distress, 3 and Satisfaction with Outcome. 2 Evaluating the validity of the Finnish version in a university hospital setting, we observed that the instrument is also applicable to a wider range of patients, including those with conditions affecting the appearance and the function of the face outside of the eye region. We also found the results of the scales highly reproducible on repeat administration.
The eye-specific scales Satisfaction with Eyes, Appraisal of Upper Eyelids, and Appraisal of Lower Eyelids all displayed unidimensionality and high reliability. This suggests that the Finnish version of the scale performs well at evaluating the patients’ satisfaction with their eye region. 17 Both appraisal scales demonstrated a ceiling effect, with almost every fourth patient scoring the maximum points. As the scale was originally validated on a group containing both pre- and postoperative patients, 4 our finding may reflect the solely postoperative patient sample used here. However, it raises the possibility that the scales may not detect subtle differences in the outcomes for patients generally satisfied with their eyes.
The general FACE-Q scales Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Satisfaction with Outcome were found to have high internal consistency and reliability. Individual items addressing satisfaction with appearance of the face in photos, unhappiness in appearance, and interest in doing things stood out as possibly influenced by constructs other than that measured by the other questions in the scales. Similarly, responses to Adverse Effects: Eyes questions suggested non-unidimensionality of the scale but sufficient internal consistency. Some of this deviation from the original validation data may be explained by our more diverse patient group including 20% of the participants with conditions affecting their facial function and appearance in ways not influenced by the eyelid surgery.
Our study population may have differed in outcome expectations as well as preoperative status from the primarily esthetic patient population used in the development of the FACE-Q Eye module. In particular, the patients enrolled in our study were all operated on for functional reasons such as the limited field of vision associated with dermatochalasis. This may partly explain why over 15% of the participants received maximum scores on the Satisfaction with Outcome scale and reported no appearance-related psychosocial distress. Both these components of the FACE-Q have previously been found to be reliable in assessment of patients with facial trauma, 18 and FACE-Q subscales have been validated for use in patients undergoing orthognathic surgery. 19 Thus, the ceiling effect observed for several subscales in our study should not, in our view, be interpreted to suggest unsuitability of the scales for use in a more varied patient group.
Results of the FACE-Q scales correlated at best moderately with aspects of general HRQoL evaluated with the 15D questionnaire. A moderate correlation was detected between the 15D dimension discomfort and symptoms and the Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Adverse Effects scales. While the Appraisal of Upper Eyelids correlated moderately with the assessment of the “normality” of the eyes, correlation of the scales measuring satisfaction with the eyes with dimensions of general HRQoL was low or negligible. This emphasizes the need for an eye-specific instrument that enables evaluation of outcomes relevant to patients undergoing surgery in the periorbital region.
The low response rate of 33% and a relatively small study population of 81 participants are significant shortcomings of our study. We were thus unable to perform Rasch analysis to further examine the performance of the Finnish version of the FACE-Q Eye module in our population. Furthermore, the cross-sectional design with postoperative patients did not enable determination of the minimum detectable change or the normative values for the scale. Thus, future studies are needed to better characterize the behavior of the scale.
In conclusion, the Finnish versions of the FACE-Q Eye module and the FACE-Q components Satisfaction with Facial Appearance, Appearance-Related Psychosocial Distress, and Satisfaction with Outcome were produced following accepted guidelines for translation and cross-cultural adaptation. The scales have good internal consistency and reproducibility. Although some of the subscales show a ceiling effect, the scales are suitable for use also in patients outside of the esthetic surgery setting. The FACE-Q scales are available free of charge for clinical and research use at the Q-Portfolio website. 20
Footnotes
Acknowledgements
The authors thank Leena Caravitis for assistance in data processing.
Author contributions
The study conception and design were done by S.P.H., J.P.R., A.J.L., and P.A.L. Production of the Finnish version of the FACE-Q modules was done by S.P.H., J.P.R., A.J.L., and P.A.L. Data collection, data analysis, and article drafting were done by S.P.H. and M.M.U. All authors revised the article critically for important intellectual content and approved the version to be published.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by funding from the Helsinki University Musculoskeletal and Plastic Surgery Research Center.
