Abstract
Purpose:
This study validates and assesses the reliability of the Swedish version of the BREAST-Q Reconstruction Module, a patient-reported outcome measure (PROM) designed to evaluate satisfaction and health-related quality of life (HRQoL) before and after breast reconstruction.
Methods:
This observational study included 300 patients undergoing breast reconstruction at Sahlgrenska University Hospital. The psychometric evaluation assessed content validity (cognitive interviews), internal consistency (Cronbach’s α), test–retest reliability, convergent validity (comparison with the EORTC QLQ-BRECON 23), responsiveness (Wilcoxon signed-rank test), and floor and ceiling effects. BREAST-Q version 1.0 was used to maintain item richness and comparability with historical data.
Results:
Content validity was confirmed through interviews. Internal consistency was acceptable across all domains (Cronbach’s α = 0.70–0.95). Test–retest reliability showed good agreement at the group level. Moderate correlations with the EORTC QLQ-BRECON 23 supported convergent validity. Responsiveness was demonstrated by significant improvements across all domains postoperatively (mean difference = 20.3, p < 0.001). Ceiling effects were observed preoperatively in Physical Well-being abdomen (40.1%) and postoperatively in four domains, notably Physical Well-being chest (45.5%). These effects may reflect true clinical success but also suggest limited sensitivity in high-functioning populations.
Conclusion:
The Swedish BREAST-Q Reconstruction Module demonstrates good validity, reliability, and responsiveness. However, ceiling effects in several domains highlight limitations in detecting subtle differences among patients with high levels of satisfaction, suggesting a need for stratified analysis and potential refinement of the scale.
Preregistration:
ClinicalTrials.gov Identifiers: NCT05233891 and NCT04714463.
Context and Relevance
The primary goal of breast reconstruction is to improve patients’ quality of life. The BREAST-Q Reconstruction Module is a validated PROM designed to assess outcomes from the patient’s perspective. Although it has been translated into more than 30 languages, the Swedish version has not yet been validated. Cultural adaptation is essential to ensure validity and reliability across linguistic contexts. Addressing this gap is crucial for the accurate evaluation of patient-reported outcomes in Swedish-speaking individuals undergoing breast reconstruction.
Introduction
Breast reconstruction is a critical component of breast cancer treatment and positively impacts body image and health-related quality of life (HRQoL).1–3 Any instrument used to measure these outcomes must be thoroughly validated and its reliability tested so that comparisons are as accurate as possible. Several patient-reported outcome measures (PROMs) have been developed; however, a 2021 review identified only three PROMs for breast reconstruction as robustly developed and validated. 4 The BREAST-Q is the most widely used of these 5 and is continuously updated. 6
Cultural adaptation of the BREAST-Q is essential to ensure its validity and reliability across different linguistic and cultural contexts. 7 The BREAST-Q has been translated into more than 30 languages. 5 However, no validation of the BREAST-Q Reconstruction Module in Swedish has been published.
This study aimed to validate and test the reliability of the BREAST-Q Reconstruction Module in the Swedish language. This involved a psychometric evaluation to assess the instrument’s reliability, validity, and responsiveness in the Swedish context.
Methods
Protocol and ethics
This observational study validates a patient-reported outcome measure (PROM) questionnaire for individuals undergoing breast reconstruction after mastectomy. Ethical approval was granted by the Swedish Ethical Review Authority (reference number 2020-04729). The research adhered to the ethical principles of the Declaration of Helsinki, as incorporated into Swedish legislation through the Ethics Review Act (SFS 2003:460). Data management complied with the General Data Protection Regulation (GDPR) and followed the study’s data management plan. Written informed consent for study participation and publication was obtained from all participants. The BREAST-Q Reconstruction Module, developed by Drs Klassen, Pusic, and Cano, was used under license from Memorial Sloan Kettering Cancer Center in New York, USA (https://qportfolio.org/breast-q/). The EORTC QLQ-BRECON 23, developed by Winters et al., 8 was used under license from the EORTC Quality of Life Group (https://qol.eortc.org). The study was reported according to the COSMIN guidelines for studies on measurement properties of PROMs. 9
Context, recruitment, and sample
The research was conducted in the Department of Plastic and Reconstructive Surgery at Sahlgrenska University Hospital, which serves approximately 1.5 million people in Western Sweden and performs about 400 breast reconstructions yearly. Breast reconstruction is performed in women with a body mass index (BMI) ⩽ 30 kg/m2, an American Society of Anesthesiologists (ASA) classification ⩽ 2, and non-generalized disease. Patients must abstain from smoking for 6 weeks preoperatively and 6 weeks postoperatively. All patients referred to the clinic for breast reconstruction who fulfilled these criteria and were scheduled for surgery were asked to participate in the study. The exclusion criteria were insufficient Swedish language skills or inability to give informed consent. The participants were divided into two groups: therapeutic and risk-reducing mastectomies. All reconstructions after therapeutic mastectomy were delayed, so these participants lacked one or both breasts when they answered the preoperative questionnaire. The risk-reducing mastectomies were immediate reconstructions in cancer-free women, so these participants still had both breasts when they answered the preoperative questionnaire. Questionnaires and a prestamped return envelope were sent to 600 patients before surgery (Fig. 1), and two reminders were sent to non-responders. Sixty patients who answered the questionnaire were asked to complete it again after 2 weeks to allow analysis of test–retest reliability. In addition, 200 patients who underwent surgery were asked to complete the questionnaire 12 months postoperatively to analyze responsiveness (Fig. 1). One hundred patients who answered the questionnaire postoperatively were also asked to answer the EORTC QLQ-BRECON 23 to assess convergent validity (Fig. 1). Recruitment started in September 2021 and ended in June 2024.

The course of the study. Figure created by Åsa Bell, medical photographer, Department of Plastic and Reconstructive Surgery, Sahlgrenska University Hospital, Gothenburg, Sweden.
The BREAST-Q reconstruction module
The BREAST-Q Reconstruction Module is a PROM specifically designed to evaluate the outcomes of breast reconstruction surgery from the patient’s perspective. The module was developed through a rigorous qualitative research process and has been thoroughly validated.6,10–12 Two versions of the BREAST-Q Reconstruction Module (1.0 and 2.0)9,13 are currently in use. We chose to validate version 1.0 because its items are more detailed. 14 The BREAST-Q modules comprise six domains: Physical, Psychosocial, and Sexual well-being; Satisfaction with breasts; Outcome; and Care. Each domain contains items rated on a Likert-type scale. Raw item scores are summed and converted into equivalent Rasch-transformed scores ranging from 0 to 100, with higher scores indicating better outcomes and higher satisfaction. The Swedish BREAST-Q for latissimus dorsi 15 and BREAST-Q Expectations 16 have previously been validated. The BREAST-Q Reconstruction questionnaire was previously translated into Swedish following established guidelines 7 and has been used in several studies.14,17
Assessment of psychometric properties and statistical analysis
We examined measurement properties according to the criteria proposed by Terwee et al. 18 Continuous variables were described by mean (standard deviation or 95% confidence interval) and median (minimum and maximum). All tests were two-tailed, and p < 0.05 was considered statistically significant. The analyses were performed on the BREAST-Q Rasch-converted scores (0–100). Statistical tests were performed using SPSS, version 29, for Mac (IBM Corp, Armonk, NY, USA) and Microsoft Excel for Mac, version 16.94.
Content validity
A pilot test of the previously translated version was conducted with five preoperative and five postoperative native Swedish speakers of mixed ages (Fig. 1). A specially trained research nurse conducted a semi-structured interview with them, focusing on how the participants understood and interpreted the items and their acceptability.
Internal consistency
Internal consistency measures how strongly the items of a scale correlate with each other, indicating whether they measure the same concept (construct) and whether it is appropriate to combine their scores into a single score. Cronbach’s α 19 was used to assess internal consistency for each domain. Alpha values between 0.70 and 0.95 are generally considered acceptable. 20 A low Cronbach’s α suggests a lack of correlation between the items, making it unjustified to combine them into a total score. Conversely, a very high Cronbach’s α (>0.95) may indicate redundancy among the items. 18
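For readers who want to reproduce the calculation outside SPSS, the α formula can be sketched in plain Python. This is a minimal sketch, not the procedure used in the study: the function names and example responses are hypothetical, and sample (n − 1) variances are assumed throughout. The formula is α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), where k is the number of items in the domain.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one domain.

    `responses` is a list of rows, one per respondent, each holding that
    respondent's item scores. Sample variances (n - 1) are used:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])  # number of items in the domain
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def interpret_alpha(alpha, low=0.70, high=0.95):
    """Classify alpha against the 0.70-0.95 range described in the text."""
    if alpha < low:
        return "too low to justify a summed score"
    if alpha > high:
        return "possible item redundancy"
    return "acceptable"


# Hypothetical item scores (1-5) from five respondents on a three-item domain
example = [[1, 2, 2], [2, 2, 3], [3, 3, 3], [4, 4, 4], [4, 5, 5]]
alpha = cronbach_alpha(example)
print(round(alpha, 3), interpret_alpha(alpha))
```

In this toy example the items move almost in lockstep, so α lands above 0.95 and is flagged as possible redundancy rather than as good consistency.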
Inter-item correlations were calculated using Spearman’s correlation coefficient (ρ). The inter-item correlation indicates the degree of relationship between items within each domain; values between 0.2 and 0.8 were taken to indicate good consistency, whereas correlations >0.8 might suggest that some items are too similar and therefore redundant.
Construct validity
Criterion and convergent validity are subtypes of construct validity, differing in their focus and application. Criterion validity is assessed by comparing the measure in question to a criterion measure, typically an established or gold standard measure. In contrast, convergent validity is evaluated by examining the correlation between different measures that are theoretically related. There is no gold standard for measuring health-related quality of life (HRQoL) and satisfaction in patients undergoing breast reconstruction.
To assess convergent validity, we compared the BREAST-Q with the EORTC QLQ-BRECON 23. The EORTC QLQ-BRECON 23 incorporates six multi-item scales that evaluate side effects of disease and surgery, sexual functioning, satisfaction with the cosmetic outcome of the breast and nipple, donor site symptoms, and satisfaction with surgery. 8 All scales range from 0 to 100; a high score represents a high level of symptoms on symptom scales and a high level of satisfaction or functioning on functional scales. Goodman and Kruskal’s gamma (γ) was calculated to examine whether the scales were related. We hypothesized positive correlations between the BREAST-Q Satisfaction with breasts and Sexual well-being domains and the corresponding EORTC QLQ-BRECON 23 scales for satisfaction with the cosmetic outcome of the breast and for sexual functioning. Because a high score on the EORTC QLQ-BRECON 23 symptom scales represents a high level of symptoms, we hypothesized that Donor site symptoms would be negatively correlated with the BREAST-Q Physical well-being abdomen domain. Because the BREAST-Q domains comprise more items, formulated differently from those of the EORTC QLQ-BRECON 23, we expected the correlations to be moderate (|γ| from 0.30 to 0.69) and not to reach the threshold usually accepted as a strong correlation (|γ| > 0.70). 18
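Goodman and Kruskal’s gamma compares every pair of respondents: a pair is concordant if both instruments rank the two respondents in the same order and discordant if they rank them in opposite orders, and pairs tied on either instrument are ignored. The stdlib Python sketch below is illustrative only; the function names and the paired scores are hypothetical, and the size labels apply the cut-offs stated in the text (values above 0.70 treated as strong, above 0.30 as moderate) to |γ| so that the hypothesized negative correlation is graded the same way. The pairwise loop is O(n²), which is unproblematic for a sample of about 100 patients.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman and Kruskal's gamma for paired ordinal scores.

    gamma = (C - D) / (C + D), where C and D are the numbers of concordant
    and discordant respondent pairs. Pairs tied on either variable are ignored.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


def classify(gamma, moderate=0.30, strong=0.70):
    """Label the size of |gamma| with the cut-offs stated in the text."""
    size = abs(gamma)
    if size > strong:
        return "strong"
    if size > moderate:
        return "moderate"
    return "weak"


# Hypothetical paired 0-100 scores for four respondents
breast_q = [40, 55, 70, 85]
brecon = [35, 70, 50, 90]
g = goodman_kruskal_gamma(breast_q, brecon)
print(g, classify(g))
```

Of the six respondent pairs in this toy example, five are concordant and one is discordant, giving γ = 4/6 ≈ 0.67, a moderate correlation.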
Reproducibility
Reproducibility measures the consistency of scores when the same patients are assessed repeatedly (test–retest) under stable conditions. 18 Test–retest reliability was evaluated in a subgroup of 60 participants who completed the questionnaire twice, 2 weeks apart. Agreement was assessed using Bland–Altman analysis, 21 which plots the difference between the two scores against their mean. The mean difference and 95% limits of agreement (LOAs) were calculated for each BREAST-Q domain. Voineskos et al. 22 proposed a minimal important difference of 4 points as clinically useful when assessing an individual patient’s outcome with the Reconstruction Module. On this basis, we hypothesized that the mean difference between test and retest scores would be less than 4 points, indicating acceptable reproducibility.
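The Bland–Altman quantities can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors’ analysis: differences are taken as retest minus test (the text does not specify the sign convention), the 95% limits of agreement are the mean difference ± 1.96 times the sample SD of the differences, and the helper `within_mid` and the example scores are hypothetical. The 4-point threshold is the minimal important difference cited from Voineskos et al.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Bland-Altman summary for paired test-retest scores.

    Differences are taken as retest - test (an assumed sign convention).
    Returns the mean difference (bias), the 95% limits of agreement
    (bias +/- 1.96 * SD of the differences), and the (mean, difference)
    points that would be plotted.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    points = [((t + r) / 2, r - t) for t, r in zip(test, retest)]
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd), points


def within_mid(value, mid=4.0):
    """True if |value| is below the 4-point minimal important difference."""
    return abs(value) < mid


# Hypothetical Rasch-converted domain scores (0-100) from four respondents
test = [50, 60, 70, 80]
retest = [52, 58, 71, 83]
bias, (lower, upper), points = bland_altman(test, retest)
print(bias, round(lower, 2), round(upper, 2))
print(within_mid(bias), within_mid(lower) and within_mid(upper))
```

The example shows the distinction that matters in this study: the mean difference (1 point) is well within the 4-point threshold, yet the upper limit of agreement is not, so agreement can be acceptable at the group level while individual scores vary by more than 4 points.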
Responsiveness
Responsiveness is defined as the ability of a questionnaire to detect clinically significant changes with treatment. 23 We used the Wilcoxon signed-rank test for related samples to evaluate the questionnaire’s responsiveness to treatment, comparing scores preoperatively and 12 months postoperatively. The null hypothesis was that there would be no difference between the two measurements. The clinical hypothesis was that there would be a significant improvement postoperatively.
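The logic of the Wilcoxon signed-rank test for paired preoperative and postoperative scores can be sketched in stdlib Python using the large-sample normal approximation. This is a minimal sketch, not SPSS’s implementation, and its z may differ from SPSS output in sign convention: zero differences are dropped, tied absolute differences receive average ranks, the null variance uses the standard tie correction, and no continuity correction is applied. Function names and data are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped, tied absolute differences get average ranks,
    and the null variance is reduced by the standard tie correction. No
    continuity correction is applied. Returns (W_plus, z, p).
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # ranks of improvements
    mean_w = n * (n + 1) / 4  # expected W_plus under the null hypothesis
    tie_groups = Counter(abs(d) for d in diffs).values()
    var_w = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_groups) / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical paired domain scores (0-100) before and 12 months after surgery
pre = [40, 55, 62, 48, 70, 35, 58, 66]
post = [65, 70, 60, 75, 82, 60, 58, 90]
w_plus, z, p = wilcoxon_signed_rank(pre, post)
print(w_plus, round(z, 3), round(p, 4))
```

A rank-based test suits these data because domain scores are bounded and, as the ceiling effects show, can be strongly skewed, so a paired t-test’s normality assumption is doubtful.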
Floor and ceiling effects
Floor and ceiling effects were determined by calculating the percentage of participants who achieved the lowest and highest possible scores (0 and 100 points, respectively). A floor or ceiling effect was considered present if more than 15% of the patients obtained the minimum or maximum score, respectively. 24
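The floor and ceiling criterion amounts to two proportions compared with a 15% threshold, as in the following Python sketch. It assumes, as a hypothetical choice not stated in the text, that missing domain scores (`None`) are excluded from the denominator; the function name and scores are illustrative only.

```python
def floor_ceiling(scores, minimum=0, maximum=100, threshold=15.0):
    """Percentage of respondents at the scale minimum and maximum.

    Missing values (None) are excluded from the denominator (an assumption).
    An effect is flagged when its percentage exceeds `threshold`.
    """
    valid = [s for s in scores if s is not None]
    n = len(valid)
    if n == 0:
        raise ValueError("no valid scores")
    floor_pct = 100 * sum(s == minimum for s in valid) / n
    ceiling_pct = 100 * sum(s == maximum for s in valid) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical domain scores: 4 at 100, 15 mid-scale, 1 at 0, 1 missing
scores = [100] * 4 + [70] * 15 + [0] + [None]
print(floor_ceiling(scores))
```

With 20 valid scores, 4 at the maximum give a ceiling of 20%, which exceeds the 15% criterion, while the single minimum score (5%) does not constitute a floor effect.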
Results
Participants and data completeness
The BREAST-Q Reconstruction questionnaire was sent to 600 patients preoperatively, and 300 replied (50%). Of these 300 patients, 200 received the questionnaire 12 months postoperatively, and 160 (80%) replied. In addition, 55 of 60 patients (92%) were included in the test–retest reliability analysis, and 100 patients answered the EORTC QLQ-BRECON 23 (Fig. 1). The mean age was 51 years in both groups. For the domains answered by all patients, data were ⩾96% complete regardless of the type of reconstruction. Patient demographics and data completeness are summarized in Table 1.
Demographics and data completeness.
SD: standard deviation; DIEP: deep inferior epigastric perforator; LD: latissimus dorsi; N: number of patients.
Three died before surgery, four developed metastases, and 14 changed their minds about wanting a reconstruction.
These domains were answered only by patients who had undergone abdominally based free-flap breast reconstruction; data completeness is therefore lower.
Content validity
The cognitive interviews indicated that participants had no trouble interpreting or understanding the questionnaire. Most women found the items across all domains acceptable, except for some items in the Satisfaction with staff domain, which several participants found unclear. Overall, the questionnaire demonstrated acceptable face validity, and no revisions were deemed necessary. The Satisfaction with staff domain was excluded from the validation process because its items were not considered applicable in the Swedish context. A similar conclusion was reached in Denmark. 25
Internal consistency
Internal consistency was satisfactory for all domains (Cronbach’s α = 0.70–0.95), indicating good interrelatedness of the items and supporting the combination of item scores into a single domain score. Cronbach’s α values are summarized in Table 2.
Cronbach’s α of the different domains.
Inter-item correlations generally indicated good consistency (Electronic Supplement 1). All correlations were within the predefined acceptable interval for Satisfaction with breast/s (Supplemental Table S1) and Sexual well-being (Supplemental Table S2). Three correlations each fell outside the interval for Psychosocial well-being (Supplemental Table S3) and Physical well-being chest (Supplemental Table S4), and one for Physical well-being abdomen (Supplemental Table S5). In the Psychosocial well-being domain, the correlations among the items “Accepting of your body,” “Normal,” and “Like other women” (all introduced by “How often have you felt”) exceeded 0.8, suggesting that these items are similar and possibly redundant. In the Physical well-being chest domain, correlations below the acceptable interval were found between the items on “neck pain” and “upper back pain” and the items on “aching feelings in your breast area” and “throbbing feeling in your breast area,” indicating that these items measure different constructs. Questions about neck and upper back pain are not part of the Physical well-being domain of BREAST-Q version 2.0. In the Physical well-being abdomen domain, the correlation between “Difficulty sitting up because of abdominal muscle weakness” and “Lower back pain” (ρ = 0.173) fell below the acceptable interval, indicating that these items do not measure the same construct.
Convergent validity
As hypothesized, the correlations between the BREAST-Q domains and the corresponding EORTC QLQ-BRECON 23 domains were moderate, indicating that BREAST-Q reconstruction has an acceptable convergent validity (Table 3).
The correlations between the BREAST-Q domains and the corresponding EORTC QLQ-BRECON 23 domains (preoperative values).
Reproducibility
None of the participants had surgery between the test and retest measurements. The Bland–Altman plots (Fig. 2A to E) showed mean differences between the two measurements close to zero for all domains, exceeding 4 points only for Sexual well-being (Fig. 2E). However, the 95% limits of agreement (LOAs) for all domains extended beyond the a priori limit of 4 points, with lower LOAs ranging from −24 to −18 and upper LOAs from 17 to 27. This suggests that scores are stable at the group level but may differ by more than 4 points between measurements for many individuals.

Bland–Altman plots for (A) Satisfaction with Breast, (B) Psychosocial Well-being, (C) Physical Well-being Chest, (D) Physical Well-being Abdomen, and (E) Sexual Well-being. Solid lines mark the mean difference between test and retest scores, and dotted lines mark the limits of agreement.
Responsiveness
The null hypothesis was rejected for all domains (p < 0.001), with an average mean difference between measurements of 20.3 points, indicating that the questionnaire can detect change with treatment. The results of the Wilcoxon signed-rank test are summarized in Table 4.
Responsiveness—preoperative versus postoperative values.
Single item score (equivalent score on a 0–100 scale).
Floor and ceiling effects
Preoperatively, a ceiling effect was reached for Physical well-being abdomen, with 122 (40.1%) participants obtaining the highest possible score. No floor or ceiling effects were reached for any other domain preoperatively (Table 5). One year postoperatively, ceiling effects were reached for four domains: Psychosocial well-being (18.6%), Physical well-being chest (45.5%), Physical well-being abdomen (16%), and Satisfaction with abdomen (26.3%) (Table 5). These findings suggest that the questionnaire may not adequately differentiate among high-functioning individuals, potentially because of too few items.
Floor and ceiling effects of the different domains.
CI: confidence interval.
Discussion
This is a psychometric validation of the Swedish version of the BREAST-Q Reconstruction Module. The study demonstrates good content validity, internal consistency, and convergent validity. Agreement between test and retest scores was high at the group level, although individual scores varied more, and the questionnaire detected change after surgery. Ceiling effects were reached for several domains, suggesting limitations in the questionnaire’s ability to differentiate among high-functioning individuals.
A factor that might affect the validity of a questionnaire study is the response rate and representativeness of the sample. The relatively low preoperative response rate (50%) may have affected the results if the sample did not fully represent the patient group, that is, if patients with very low or very high satisfaction were more likely to decline participation. The sample also included a high proportion of patients awaiting prophylactic mastectomy (21%), who still had both native breasts when answering the questionnaire, which could skew their answers in a more positive direction.
Construct validity is crucial for validating the BREAST-Q, as it ensures that the questionnaire measures what it is intended to measure in breast reconstruction. Without a gold standard, there is no definitive benchmark against which the BREAST-Q can be compared, which makes it difficult to ascertain whether it accurately captures the patient experience or is influenced by external factors. This also increases the risk of measurement bias, which may affect the interpretation and application of the questionnaire in clinical practice. The comparison with the EORTC QLQ-BRECON 23 showed that only three domains were directly comparable because of differences in item content. Despite these differences, moderate correlations were observed between the comparable domains, confirming our expectation that the measures would align to some extent but not perfectly. These moderate correlations support the construct validity of the BREAST-Q and indicate that it measures constructs related to quality of life and patient satisfaction in a manner consistent with other established PROMs. However, the differences in item content and the resulting moderate correlations also highlight the importance of using multiple measures, and a multi-faceted approach to validation, to capture the full breadth of patient experiences.
The presence of ceiling effects in PROMs can substantially affect an instrument’s interpretability and sensitivity. In this study, notable ceiling effects were observed in several domains both preoperatively and 1 year postoperatively. Preoperatively, a ceiling effect of 40% was identified in the Physical well-being abdomen domain. This may reflect a generally healthy baseline population or a limited ability of the scale to differentiate among high-functioning individuals because of a lack of granularity in the response options. Postoperatively, ceiling effects exceeded the commonly accepted threshold of 15% in four domains (Table 5), which raises concerns about the BREAST-Q’s ability to detect subtle improvements or differences among patients who already report high scores. Notably, the postoperative sample had a high proportion of autologous reconstructions (58%), and autologous reconstruction is consistently associated with higher satisfaction and better physical and psychosocial outcomes than implant-based reconstruction.26,27 This may have contributed to the ceiling effects and further suggests that the instrument may not be sensitive enough to differentiate between satisfied and very satisfied patients. The findings may therefore not be fully generalizable to the broader population of breast reconstruction patients, particularly those undergoing implant-based procedures, and highlight the need for stratified analysis by reconstruction type.
Conclusion
The study demonstrates good content validity, internal consistency, and convergent validity for the Swedish BREAST-Q Reconstruction Module. Reproducibility testing indicated good overall agreement between test and retest scores at the group level, and responsiveness testing supported the questionnaire’s ability to detect changes over time. Ceiling effects were reached for several domains, indicating limitations in the questionnaire’s ability to differentiate among high-functioning individuals.
Supplemental Material
sj-docx-1-sjs-10.1177_14574969251387498 – Supplemental material for Validation and reliability testing of the Swedish version of the BREAST-Q reconstruction
Supplemental material, sj-docx-1-sjs-10.1177_14574969251387498 for Validation and reliability testing of the Swedish version of the BREAST-Q reconstruction by Christian Jepsen, Anna Paganini and Emma Hansson in Scandinavian Journal of Surgery
Footnotes
Acknowledgements
We thank all the patients who have answered the questionnaires, RN Susanne Meyer for skillful administrative assistance with the questionnaires and RN Ann Chatrine Edvinsson (posthumous) for conducting the cognitive interviews.
Author contributions
C.J.: Methodology, Project administration, Data curation, Formal analysis, Writing – original draft preparation, and Visualization.
A.P.: Data curation, Project administration, Methodology, and Writing—review and editing.
E.H.: Conceptualization, Funding acquisition, Methodology, Project administration, and Writing—review and editing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was funded by grants from the Swedish Cancer Society (21 0279 SCIA) and the federal government under the ALF agreement (ALFGBG-1005048).
Ethics approval and consent to participate
The study was vetted and approved by the Swedish Ethical Review Authority (2020-04729).
Consent for participation
All participants provided written informed consent for their participation.
Consent for publication
All participants gave their written informed consent to publication.
Availability of data and material
Supporting data are not available due to their sensitive nature.
Supplemental material
Supplemental material for this article is available online.
References
