Abstract
Purpose:
This study aimed to develop and establish the content validity of a patient-reported outcome measure for youth receiving gender-affirming care (GENDER-Q Youth).
Methods:
This mixed-methods study involved concept elicitation interviews with youth who were seeking/receiving gender-affirming care (February 2019–October 2023). Data were used to develop a conceptual framework and a set of independently functioning scales. Scales were refined through clinical and research expert input and cognitive debriefing interviews with youth (December 2023–April 2024). A pilot test was conducted to examine scale psychometric performance, overall content validity, and acceptability (July 2024).
Results:
The concept elicitation interview sample included 47 youth aged 12–19 years. A conceptual framework with four main domains was created and included: health-related quality of life, gender practices, voice, and experience of care. To measure aspects of the framework, 17 scales (292 items) were developed and refined with input from 33 experts and 17 youth. The pilot test sample included 406 youth aged 18–25 years. Most respondents agreed that GENDER-Q Youth was easy to understand, thorough, asked important questions in a respectful way, felt safe to complete, and made them feel that their voice would be heard. The field test version of GENDER-Q Youth includes 16 scales (248 items).
Conclusion:
Evidence of content validity of GENDER-Q Youth was established based on extensive input from experts and youth.
Introduction
A key component that should be included in research on transgender and gender-diverse (TGD) youth is the perspective of youth who obtain gender-affirming care (e.g., hormonal care, mental health care, voice-related care, and other forms of social support). Youths’ perspectives are typically assessed using patient-reported outcome measures (PROMs). PROMs are questionnaires that measure how patients feel and function by asking them directly.1–3 PROMs play a valuable role in patient-centered health care by ensuring the patient’s voice is incorporated into treatment-related decisions.4,5 For PROMs to fulfill this role effectively, they must have high content validity,5 which means that the content is relevant, comprehensive, and comprehensible from the perspective of patients.1–3,6 Unfortunately, currently available PROMs are limited in scope or were not developed with or for youth who identify as TGD.
Two systematic reviews describe PROMs used to evaluate gender-affirming care outcomes in adolescent populations.7,8 Bowman et al. reviewed PROMs used to measure gender dysphoria in adolescents and adults and reported poor content validity across all identified instruments.7 Jackman et al. identified 38 PROMs used to measure outcomes of gender-affirming care for youth.8 The majority of the PROMs identified measured psychological function, and none included younger TGD youth (i.e., under age 17 years) as their target population. Furthermore, eight of the identified PROMs were generic quality-of-life measures.8 While generic measures can be useful,9 such tools are not developed to measure the concerns of specific patient groups and therefore can lack content validity in specific contexts of use (i.e., TGD youth seeking gender-affirming care).5 Three PROMs were found that measure gender-specific concepts,8 but there is a paucity of evidence about how these PROMs were developed and validated. Therefore, a need remains for a rigorously developed and validated PROM designed specifically for TGD youth seeking gender-affirming care.
To support future research and patient-centered clinical care, a robust and valid PROM for youth receiving gender-affirming care is needed. The present study aimed to develop a PROM called GENDER-Q Youth for TGD youth aged 12–25 years. The specific objectives of this study were threefold: (1) to elicit health-related concepts important to youth receiving gender-affirming care; (2) to use the concepts to develop a PROM; and (3) to pilot test the PROM in an online sample of TGD youth.
Methods
Overview of study approach
This study used a mixed-methods, multistep approach (Fig. 1) and adhered to international guidance for PROM development and validation (e.g., United States Food and Drug Administration, Consensus-based Standards for the Selection of Health Measurement Instruments [COSMIN]).1–3,6 We aimed to create a modular PROM composed of a set of independently functioning scales so that only the scales relevant to a specific research objective or clinical situation need to be administered.

Fig. 1. Summary of GENDER-Q Youth study methods. RMT, Rasch measurement theory.
The current article describes the development of GENDER-Q Youth, which involved the following: (1) a qualitative interpretive description study10,11 that aimed to develop a conceptual framework and a modular PROM that consists of a set of independently functioning scales (November 2021–October 2023); (2) scale refinement through input from clinical and research experts and cognitive debriefing interviews with TGD youth (December 2023–April 2024); and (3) a quantitative pilot test to examine psychometric performance, overall content validity, and acceptability of the new PROM (July 2024).
Ethical considerations
The study was coordinated through McMaster University (Canada). Research ethics board (REB) approval was obtained from the Hamilton Integrated Research Ethics Board (#11103) and Children’s Hospital of Eastern Ontario (#22/41X). Approval by the main coordinating site meant that the two United States (US) sites did not require separate local REB approval (i.e., they received “not engaged” status). A safety protocol was developed to promote participant and research team safety during data generation and analysis. 12 Written consent was obtained from participants and their parent/guardian if required by the patient’s institution before data generation; verbal consent was reaffirmed at the start of an interview.
Sample and recruitment
This study builds on the development of GENDER-Q, a PROM for adults seeking gender-affirming care.5,13 Specifically, concept elicitation interviews conducted for the GENDER-Q with Canadian and US youth aged 16–18 years were included in the GENDER-Q Youth sample. In addition, TGD youth aged 12–18 years who were seeking and/or receiving gender-affirming care and were able to communicate in English were recruited from three gender clinics in Canada and two in the United States. Youth were informed of the study by a member of their health care team. Interested youth either consented to be contacted by the research team or contacted the research team directly. Purposeful sampling 6 was used to recruit a diverse sample that varied by age, gender identity, race, and type or stage of gender-affirming care received.
In Canada, youth were also recruited by a community group leader or a study participant (i.e., snowball sampling) who provided information about the study to eligible youth; interested youth were able to contact the research team directly. Last, youth who took part in the GENDER-Q field test study, provided permission to be recontacted for future studies, and met the GENDER-Q Youth study eligibility criteria were contacted by email and invited to consider participating in the study. Youth were offered monetary compensation ($100 CAD/$100 USD) upon completion of the interview.
Concept elicitation
The youth took part in an initial meeting (via phone or institutionally approved web conferencing platform [Zoom Version 5.8.4]) with a research team member (S.L.K.) to review the study consent forms and learn more about the study. An optional preinterview activity to create a timeline of their gender-affirming care journey was introduced. 14 Instructions and activity supplies were sent to youth who expressed interest.
The concept elicitation interviews were conducted one-on-one by an experienced qualitative interviewer (S.L.K.) and audio-recorded using Zoom. At the start of the interviews, participants were asked demographic and clinical questions. The timeline activity (when completed) was used alongside an interview guide (Supplementary Appendix SA1) to elicit concepts. The interview guide was developed to reflect the domains and major themes/subthemes found to be important to youth who took part in concept elicitation interviews to develop the GENDER-Q for adults.5,13 Interviews were conducted and analyzed concurrently so that GENDER-Q Youth interview data could be used to refine the interview guide for subsequent interviews. 11
Data analysis
Interviews were transcribed verbatim into Microsoft® Word (for Microsoft 365). The Word documents were de-identified, password-protected, and coded using the comments feature. The transcripts were coded by an experienced research team member (S.L.K.) and reviewed by a second team member (S.D.C.). Regular meetings were held to review the analysis and update the interview guide. Codes from each transcript were moved into Microsoft® Excel (for Microsoft 365) using Microsoft DocTools© software. Constant comparison was used to refine codes.
The data were analyzed and used to develop a conceptual framework and item pool. The item pool was created by one researcher with extensive experience writing items and developing pediatric and adult PROMs (A.F.K.). A draft set of independently functioning scales was developed using this item pool to measure key concepts from the conceptual framework. The draft scales were reviewed and revised based on feedback from members of the research team (M.N.K., C.R., S.L.K., S.D.C., S.M., N.J.). A document containing participant quotes to support the drafted scales was created. Supplementary Table S1 shows example quotes supporting items from each GENDER-Q Youth scale.
Scale refinement and content validation
To establish content validity, clinical experts in gender-affirming care and research experts in PROM design were invited to provide feedback on the draft scales. 6 An internationally diverse group of experts, including quality-of-life researchers (identified through a quality-of-life-focused international organization’s child health special interest group), clinicians across gender-affirming specialties (through our research team’s network), and community leaders with lived experience (known to our team and involved in health care and transgender advocacy networks) were invited by email to review GENDER-Q Youth and to provide feedback using the comments or track changes features. Experts were instructed to review as many scales as they were able to review and to provide feedback on the instructions, response options, and items, indicating if anything was missing, difficult to understand, or not relevant. Feedback was used to make changes to the scales (Round 1).
Participants from the concept elicitation interviews were invited by email to take part in a cognitive debriefing interview. An experienced qualitative interviewer conducted the interviews (S.L.K.), which were audio-recorded using Zoom. These interviews used the “think aloud” technique whereby participants were asked to review all components of each scale (i.e., instructions, response options, and items) to ensure that they were easy to understand, comprehensive, and relevant (Supplementary Appendix SA2, Interview Guide).15,16 Interviews were conducted in two rounds (Rounds 2 and 3). After each round, two research team members analyzed the interview data (S.L.K., S.D.C.). The results were reviewed by the research team and used to make changes to the scales. Participants received a gift card as a thank-you upon completion of their interview ($100 CAD/$100 USD).
Pilot field test
A pilot field test was conducted in July 2024 using an online crowdsourcing research platform. 17 The Prolific platform is an online research database for individuals aged 18 years or older. 17 Youth aged 18–25 years who identified as gender diverse and were residents of Australia, Canada, Ireland, New Zealand, the United States, or the United Kingdom were invited to complete a Research Electronic Data Capture (REDCap) survey, which included demographic and clinical questions, GENDER-Q Youth scales, and questions to evaluate the overall content validity (was easy to understand, was thorough, asked important questions) and acceptability (asked questions in a respectful way, felt safe to complete, and made me feel like my voice will be heard) of GENDER-Q Youth. Branching logic was used to ensure participants only received scales relevant to them (e.g., if they did not bind, they were not shown the Binding scale). Participants were compensated at a prorated hourly rate of 12.00 GBP.
Rasch measurement theory (RMT) analysis is a modern psychometric item response theory approach to scale development. In this approach, scales are developed “bottom up” from the qualitative data and then tested to see whether the data provided by a sample fit the requirements of the Rasch model. When data fit the Rasch model, the set of items in a scale should map out the construct along a clinically meaningful continuum.18 By providing interval-level measurement, RMT overcomes limitations of other PROM development approaches (e.g., classical test theory).18–20 RMT analysis was performed using RUMM2030 software with the unrestricted Rasch model for polytomous scales (RUMM version 2030, RUMM Laboratory Pty Ltd., Duncraig, Western Australia, Australia, 1998–2023).
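As an illustration only, the response-category probabilities of a polytomous Rasch model can be sketched in a few lines of Python. The sketch assumes the partial-credit form, in which each item has its own set of thresholds. The function names, the person location `theta`, and the threshold values are hypothetical, and the code does not reproduce RUMM2030 or any estimation or fit procedure used in this study.

```python
import math


def category_probabilities(theta, thresholds):
    """Return P(X = 0), ..., P(X = m) for one item with m thresholds.

    theta      -- person location on the latent continuum (logits)
    thresholds -- item thresholds in logits, one per step between categories
    """
    # The exponent for category x is the sum of (theta - threshold_k) over the
    # first x thresholds; category 0 has an empty sum, that is, 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))

    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Expected item score for a person located at theta."""
    probs = category_probabilities(theta, thresholds)
    return sum(score * p for score, p in enumerate(probs))


# Hypothetical four-category item (three thresholds), not taken from GENDER-Q Youth.
example_thresholds = [-1.0, 0.0, 1.0]
for location in (-2.0, 0.0, 2.0):
    print(location, round(expected_score(location, example_thresholds), 3))
```

In this sketch the expected item score rises as the person location increases, which is the ordering property that allows a set of fitting items to map out a construct along a continuum.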
In the RMT analysis, items with extreme misfit to the Rasch model were identified and removed (Round 4). The Flesch–Kincaid Grade Level was calculated for each scale (i.e., instructions, response options, and item set) to ensure that all scale components were at a reading level that could be understood by youth between 12 and 25 years of age.
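For readers unfamiliar with the readability index, the Flesch–Kincaid Grade Level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The following Python sketch applies that formula. The syllable counter is a rough vowel-group heuristic chosen for illustration, so its results can differ from those of the software used in this study (which is not specified here), and the sample text is hypothetical rather than taken from GENDER-Q Youth.

```python
import re


def count_syllables(word):
    """Approximate the syllable count by counting groups of vowels (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in "-le" endings such as "people".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level of a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Hypothetical sample text, not an item from the instrument.
sample = "I feel good. I feel safe."
print(round(flesch_kincaid_grade(sample), 2))
```

Short sentences built from one-syllable words can yield grade levels near or below zero, so very low values are plausible for scales made up of brief items.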
Results
Concept elicitation
Data from eight youth who took part in the GENDER-Q concept elicitation study for adults between February 2019 and February 2020 were included in the present study. 5 An additional 39 interviews were conducted between November 2021 and October 2023. Participants were from Canada (n = 31, 66.0%) and the United States (n = 16, 34.0%). At the time of the interviews, the youth were aged 12–19 years (mean = 16.5 years; 12–15 years = 8 youth, 16–19 years = 39 youth). Thirty-seven participants were assigned female at birth and 10 participants were assigned male at birth. Participants identified as boy or male (n = 25, 53.2%), girl or female (n = 10, 21.3%), nonbinary (n = 11, 23.4%), or preferred to not answer (n = 1, 2.1%). Most participants in the sample identified as White (n = 38, 80.8%). In terms of treatment, 37 participants (78.7%) reported having had pubertal suppression and/or gender-affirming hormones and 29 participants (61.7%) reported having had mental health care. The 47 interviews lasted an average of 105 min (range = 50–177 min).
Thirty-one participants completed the preinterview timeline activity. Timelines depicting the gender-affirming care journey were created on paper (n = 22), digitally (n = 8), or using a combination of paper/digital mediums (n = 1). Eighteen participants used only text in their timeline activity; others used text and images (n = 12) or only images (n = 1). A portion of a completed timeline activity is provided (Fig. 2). The timeline activities stimulated discussion and elicited important concepts from participants that were used to create the PROM.

Fig. 2. Portion of completed timeline activity.
The timeline activities helped participants discuss their gender-affirming care journey and share concepts that might not otherwise have been discussed (“I think [the timeline] was a good way for me to, like, look back at things that I would want to talk about […] Talking about my different experiences with healthcare providers, that might not have been something that I had thought about.”). The timeline activities also helped the research team better understand how participants experienced important concepts, such as interactions with health care professionals (“It’s scary but if you get the right doctors it can go very well [.] so I was really happy to write that down and just to let people know that, like, it’s not all bad. It can be really good.”). Ultimately, the timeline activities facilitated the generation of rich data during the concept elicitation interviews.
Conceptual framework
Analysis of the qualitative data led to the development of a conceptual framework that included four domains: health-related quality of life (HRQL), gender practices, voice, and experience of care. These domains covered a range of outcomes as well as experiences of care that mattered to TGD youth seeking gender-affirming care. Specifically, the HRQL domain covered psychological function (well-being, distress), body image concerns, and social function at school, outside of school, and at home. The gender practices domain covered positive and negative aspects associated with binding and tucking. The voice domain covered how the voice sounds (e.g., too high or too low) and voice-related distress (e.g., feeling self-conscious). The experience of care domain covered the health care team (e.g., explained treatment options), information about gender-affirming hormones (e.g., impact on voice), and an evaluation of gender-affirming treatment (e.g., made life better).
To measure aspects of the framework, 17 independently functioning scales were developed and assigned instructions, a recall period, and a set of response options. Supplementary Table S1 shows the top key concepts with illustrative quotes that were elicited for each aspect of the conceptual framework. These concepts were used to form GENDER-Q Youth scales. Items for the scales incorporated participants’ words as much as possible so that they would resonate with youth and were easy to understand. Of the 17 scales, 12 scales were designed to measure concepts using positive or neutral wording.
Scale refinement and content validation
Scale refinement and content validation were based on one round of expert input and two rounds of cognitive debriefing interviews with TGD youth. Table 1 shows changes made after each round to refine the PROM. Version 1 of GENDER-Q Youth included 17 scales and 292 items. Thirty-three of 41 invited experts provided feedback (response rate = 80.5%) between December 2023 and January 2024. Experts were from Canada (n = 15, 45.4%), the United States (n = 12, 36.3%), Australia (n = 2, 6.1%), Denmark (n = 2, 6.1%), Germany (n = 1, 3.0%), and the United Kingdom (n = 1, 3.0%). Twenty-one experts (63.6%) were health care providers (e.g., adolescent medicine physicians, endocrinologists, mental health care providers, social workers, speech-language pathologists, surgeons), 9 experts (27.3%) were researchers with expertise in PROM development and/or gender-affirming care, and 3 experts (9.1%) were patient partners. Four experts (12.1%) self-identified as TGD.
Table 1. Changes Made to GENDER-Q Youth Scales in Each Round of Scale Refinement
Version 5 is the field test version of the GENDER-Q Youth Module.
EOC, experience of care; GENDER-Q, gender-affirming care; HRQL, health-related quality of life; V, version.
At the end of the expert input round (Round 1 revisions), 181 items were retained, 55 items were revised, 56 items were dropped, and 23 items were added. One scale (Tucking: Well-Being—10 items) was dropped as it was deemed to not be highly relevant from the perspective of experts and was the scale with the least supportive qualitative data.
Seventeen cognitive debriefing interviews were conducted between January 2024 and April 2024 with youth who had participated in the concept elicitation interviews. At the time of the interviews, the youth were aged 14–19 years (mean = 16.5 years; 14–15 years = 4 youth [23.5%], 16–19 years = 13 youth [76.5%]). Participants identified as boy or male (n = 9, 52.9%), girl or female (n = 5, 29.4%), nonbinary (n = 2, 11.7%), or preferred to not answer (n = 1, 5.9%). Most participants identified as White (n = 13, 76.5%). In terms of treatments, almost all participants reported having had pubertal suppression and/or gender-affirming hormones (n = 16, 94.1%) and mental health care (n = 14, 82.3%). These interviews lasted an average of 93 min (range = 43–154 min).
In Round 2, Version 2 (259 items from 16 scales) was examined by seven youth. At the end of this round, 256 items were retained, 3 revised, 0 dropped, and 5 added. In Round 3, Version 3 (264 items from 16 scales) was examined by 10 youth. At the end of this round, 248 items were retained, 10 revised, 6 dropped, and 1 added. Across the two rounds of interviews, each youth reviewed between 2 and 13 scales (mean 6.8, standard deviation 3.5), and each scale was reviewed by 7–11 youth (i.e., 8 scales were reviewed by 7 youth, 6 scales were reviewed by 8 youth, and 1 scale was reviewed by 11 youth), except for the Tucking: Adverse Effects scale, which was reviewed by only three youth.
Over the two rounds of cognitive debriefing interviews (Rounds 2 and 3), most participants reported that the instructions and response options for the set of scales were comprehensible and appropriate. In addition, most participants commented that the content of the PROM overall resonated with their experience and that the scales covered a comprehensive range of important constructs (“I was looking at the questions and I couldn’t really think of anything to add”), and that the scales’ content was appropriate (“I liked every single one. There wasn’t one [scale] that, like, I wouldn’t find value in talking to my doctor about”).
Pilot field test
Version 4 of GENDER-Q Youth included 259 items. On the date of the pilot test, 1763 potential participants were invited to take part in the survey and 452 (25.6%) did so. We excluded duplicates (n = 3, 0.7%) and participants who failed to complete at least one scale (n = 18, 4.0%), were outside the eligible age range (n = 1, 0.2%), or were cisgender (n = 24, 5.3%). Table 2 shows the sample characteristics for the remaining 406 participants. It took participants approximately 20 min to complete the survey (range: 7–54 min). In terms of overall content validity and acceptability, most respondents mostly or strongly agreed that GENDER-Q Youth was easy to understand, was thorough, asked important questions, asked questions in a respectful way, felt safe to complete, and made them feel that their voice would be heard (see Table 3; 402 of 406 participants [99.0%] completed the evaluation questions).
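The sample flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts given in the text; the variable names and the `percent` helper are illustrative, and percentages for exclusions are taken relative to the 452 respondents, which is an assumption that matches the reported values.

```python
# Counts reported in the text for the pilot field test.
invited = 1763
responded = 452
exclusions = {
    "duplicate response": 3,
    "no scale completed": 18,
    "outside eligible age range": 1,
    "cisgender": 24,
}
completed_evaluation = 402


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Respondents remaining after all exclusions are applied.
analyzed = responded - sum(exclusions.values())

print(f"Response rate: {percent(responded, invited)}%")
for reason, n in exclusions.items():
    print(f"Excluded ({reason}): n = {n}, {percent(n, responded)}%")
print(f"Analyzed sample: {analyzed}")
print(f"Completed evaluation questions: {percent(completed_evaluation, analyzed)}%")
```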
Table 2. Characteristics of Pilot Test Participants
Data were self-reported by participants and categorized by the research team for reporting.
Table 3. Responses to the GENDER-Q Youth Evaluation Questions (n = 402)
The RMT analysis identified 11 items from 7 scales with extreme misfit to the Rasch model, and these items were dropped (Round 4 revisions). The final field test version of GENDER-Q Youth contains 16 scales with 248 items. Scales have between 10 and 26 items. The overall Flesch–Kincaid Grade Level for each scale (i.e., instructions, response options, and item set) ranged from 0.5 (e.g., psychological distress) to 5.0 (e.g., family) (see Table 4).
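The item counts reported for each round of refinement can be reconciled with a short tally. In the Python sketch below, retained, revised, and dropped items should sum to the size of the incoming version, and retained, revised, and added items give the size of the next version. The `Round` structure and `tally` function are illustrative names. For Round 4 the text reports only the 11 dropped items, so the zero counts for revised and added items (and the derived retained count) are assumptions.

```python
from typing import NamedTuple


class Round(NamedTuple):
    label: str
    retained: int
    revised: int
    dropped: int
    added: int


ROUNDS = [
    Round("Round 1 (expert input)", 181, 55, 56, 23),
    Round("Round 2 (cognitive debriefing)", 256, 3, 0, 5),
    Round("Round 3 (cognitive debriefing)", 248, 10, 6, 1),
    # Only the 11 dropped items are reported for Round 4; the other
    # counts are assumed (none revised or added, 259 - 11 retained).
    Round("Round 4 (RMT analysis)", 248, 0, 11, 0),
]


def tally(start_items, rounds):
    """Return the item count of each version, checking every round's inputs."""
    counts = [start_items]
    for r in rounds:
        incoming = r.retained + r.revised + r.dropped
        if incoming != counts[-1]:
            raise ValueError(f"{r.label}: expected {counts[-1]} items, got {incoming}")
        counts.append(r.retained + r.revised + r.added)
    return counts


versions = tally(292, ROUNDS)
for number, items in enumerate(versions, start=1):
    print(f"Version {number}: {items} items")
```

Under these assumptions, the tally runs from the 292 items of Version 1 to the 248 items of the field test version without any mismatch between rounds.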
Table 4. GENDER-Q Youth Field Test Version
a Flesch–Kincaid Grade Level for each scale, including instructions, response options, and item set.
Discussion
This study describes the development of GENDER-Q Youth, a modular PROM designed to measure a comprehensive range of outcomes and experiences of care for TGD youth aged 12–25 years. The field test version of GENDER-Q Youth includes 16 independently functioning scales that were developed from detailed concept elicitation interviews with TGD youth receiving gender-affirming care. Input from experts and youth was used to refine the instrument and provided evidence of the content validity of this new PROM. Each scale contains a comprehensive set of relevant items that are easy for the target population to understand. The pilot field test shortened the PROM and provided evidence of overall content validity and acceptability in a sample of TGD youth aged 18–25 years.
The GENDER-Q Youth development focused on obtaining high-quality concept elicitation interview data. The involvement of TGD youth who were seeking or receiving gender-affirming care in the PROM development process was essential to ensuring their unique voices informed the development 21 and content included in the outcome and experience of care scales.2,3,6 Including a heterogeneous sample of youth from two countries with two different health care systems and political environments ensured that a diverse range of gender-affirming care experiences were captured and used to form the scales.2,6
Most of the interview data were generated using the timeline activity, which proved to be an effective means of helping youth remember and discuss important aspects of their gender-affirming care journey that they might otherwise have overlooked. Using a combination of the timeline activity and a semi-structured interview guide enabled the collection of detailed and wide-ranging information about participants’ experiences seeking and receiving gender-affirming care and about outcomes that mattered most to them. This study thus supports the use of novel techniques to enhance concept elicitation interviews in research with youth, adding to the sparse literature on this topic in relation to PROM development.21
Limitations
A first limitation of this study pertains to diversity within the qualitative study sample: most participants were White, and fewer participants were female-identifying. However, this pattern is in line with other research reporting fewer female-identifying youth in clinic-based recruitment.22 Second, one scale (Tucking: Adverse Effects) was reviewed by only three participants during the cognitive interviews, which is fewer than the number recommended by COSMIN.6 Third, the use of an online platform for the pilot field test did not allow us to verify the demographic or clinical information provided by participants. However, data from Prolific have been found to be of higher quality than data from other crowdsourcing platforms.23,24
Last, youth aged 12–15 years were underrepresented, and youth aged 20–25 years were not included in the concept elicitation and cognitive debriefing interviews, potentially impacting the relevance and applicability of the GENDER-Q Youth across the full spectrum of adolescent and young adult experiences. Furthermore, the pilot field test included older youth aged 18–25 years and not the full age range for whom GENDER-Q Youth is designed (i.e., 12–25 years). The forthcoming international field test study will address these limitations by recruiting participants across the entire age spectrum. The differential item functioning of the scales by age group will be assessed to evaluate whether the scales perform consistently across developmental stages and to ensure the content remains valid and meaningful for all age groups.
Conclusion
GENDER-Q Youth fills a gap in the literature by providing the first rigorously designed comprehensive PROM that measures outcomes and experiences of gender-affirming care from the youths’ perspective. The GENDER-Q Youth international field test study is now underway. Data will be used to shorten scales and examine their psychometric performance. Once development is finished, this new PROM could be used before, during, and after treatment to ensure that the perspectives of TGD youth inform their gender-affirming care (e.g., shared decision-making). GENDER-Q Youth could also be used in clinical trials of treatments and in other studies designed to examine factors (e.g., sociocultural and political) that influence outcomes important to youth.
Footnotes
Authors’ Contributions
S.L.K.: Methodology, formal analysis, investigation, writing—original draft, writing—review and editing, and project administration. S.D.C.: Formal analysis and writing—review and editing. M.N.K.: Methodology, formal analysis, investigation, writing—review and editing, project administration, and funding acquisition. S.M.: Resources, writing—review and editing, and supervision. C.R.: Formal analysis, investigation, writing—original draft, writing—review and editing, and project administration. N.J.: Resources, writing—review and editing, and supervision. K.K.: Resources, writing—review and editing, and supervision. K.A.: Resources and writing—review and editing. M.M.: Resources, writing—review and editing, and supervision. G.S.: Resources, writing—review and editing, and supervision. B.B.: Resources and writing—review and editing. A.F.K.: Conceptualization, methodology, formal analysis, investigation, writing—original draft, writing—review and editing, supervision, project administration, and funding acquisition.
Author Disclosure Statement
G.S. has received funding from Pivotal Ventures for participation in an advisory board. The GENDER-Q Youth is owned by McMaster University and Mass General Brigham. A.F.K. and M.N.K. are codevelopers and will receive a share of license revenues as royalties for its use in for-profit research based on their institution’s inventor-sharing policy. A.F.K. provides research consulting services to the pharmaceutical industry through EVENTUM Research (Hamilton, ON, Canada).
Funding Information
This study received funding from the Canadian Institutes of Health Research (CIHR; funding reference numbers GSL-171376 and PEG-157062) and the Plastic Surgery Foundation (Award #568801). The study sponsors had no involvement in any part of this study.
