Abstract
Narrative story stem techniques (NSSTs) offer insight into attachment and other representational aspects of preschool to young school-aged children’s inner lives. While these methods moved into the academic and clinical mainstream some 35 years ago, their applicability to “non-Western” contexts remains little understood. This synthesis comprises 31 NSST studies of samples from parts of Africa, East Asia, Latin America, and the Middle East, and from US and UK ethnocultural minoritized backgrounds. In the reviewed studies, three specific NSSTs dominated, story stems were used most to evaluate attachment, and some were clinically focused. However, there was also a strong cultural focus and over half of samples were socioeconomically disadvantaged. Studies revealed both universal and culturally specific features of NSSTs. Attachment distributions were as expected, given the high clinical risk in pooled samples (49% secure, 19% avoidant, 12% ambivalent, 20% disorganized), including by clinical and socioeconomic risk status. Gender differences were similar to “Western” findings. However, the growing evidence for convergent validity across cultural groups is tempered by low reporting of psychometrics. Narratives may sometimes reflect children’s unintended interpretations of the task and therefore not activate internal representations, or may reflect reality but lack equivalent meaning in coding schemes. We discuss how researchers and clinicians can enhance the validity of NSSTs by considering the role of culture in the sense-making process. Pending further validation work, NSSTs have the added potential to give a voice to young children from underrepresented backgrounds.
Over the last 35 years, narrative story stem techniques (NSSTs) evolved from therapeutic doll play into semi-structured assessments to evaluate and better understand the “lived reality” of young children. Story stems are commonly employed to evaluate attachment “internal working models” in the preschool and early school years (Bosman & Kerns, 2015) when storytelling and play are “natural” forms of expression, bridging the gap between the direct behavioral measures of infancy and the “pure” interview-based measures of adulthood. They can also be used to understand a range of other representations and experience, particularly in the social-emotional realm (e.g., Bretherton & Oppenheim, 2003; Page, 2001). NSSTs offer researchers and clinicians insight into the child’s world without needing direct observation or direct questioning, which may otherwise be intrusive to families or ask too much of the child with respect to articulation or disclosure. Research demonstrates that NSST narratives reflect components of children’s real family interactions (e.g., George & Solomon, 2016; Yuval-Adler & Oppenheim, 2023), are associated with child functioning and family factors (e.g., Hennefield et al., 2022; Martoccio et al., 2016; Poehlmann, 2005; Wan & Green, 2010), and statistically predict their later social-emotional outcomes (e.g., Berzenski & Yates, 2017; Davies et al., 2008; Müller et al., 2014; Pass et al., 2012).
However, the question of whether NSSTs are methodologically equivalent when applied to “non-Western” populations and diverse groups in the “West” has received sparse attention. Standardization was pivotal to the success of NSSTs entering the academic and clinical mainstream in the 2000s, and the method is sufficiently established to have warranted several reviews in the last decade (Allen et al., 2018; Kelly & Bailey, 2021; Tang et al., 2018; Yuval-Adler & Oppenheim, 2015). However, its empirical base is still informed primarily by White, “Western” samples (Bettmann & Lundahl, 2007; Jewell et al., 2019; Kelly & Bailey, 2021). This is neither globally representative nor does it reflect the diversity found in preschools, classrooms, and clinics in the urban “West” today. While the cultural universality of attachment has been subject to much scholarly interest using the “gold standard” measure of infant attachment (e.g., Keller, 2018; Van Ijzendoorn, 1990), NSST-derived attachment representations have escaped similar scrutiny. Redressing this cultural bias, which is commonplace in developmental science (Draper et al., 2022), is long overdue.
NSSTs share a common structure in which the child is presented with a series of story beginnings (“stems”), based on familiar, often family-based, scenarios, usually involving conflict, distress, or some dilemma, which they are then instructed to complete (Emde, 2003; Yuval-Adler & Oppenheim, 2015). These completions (often videotaped) are then evaluated, most commonly using a coding scheme, on the premise that the narratives (and often how the child completes them) reflect some aspect of the child’s internal states, perspectives, or behavior (Kelly & Bailey, 2021). However, based on the lack of research addressing their cultural validity, NSSTs seem to carry at least three cultural assumptions: that narrative construction, and its development, are not subject to systematic cultural influence in ways that impact coding (i.e., the narrative features are universal); that story stems activate representations with similar salience across cultures (i.e., stems share universal meaning); and that social-emotional experience is organized and accessed in the same way (i.e., the target constructs are elicited through the same stems universally). As social experience is culturally mediated, how children engage with story stems is likely to vary across cultural settings, but whether these are in ways that make a difference to NSST evaluation is unclear.
This review synthesizes the “non-Western” NSST studies published to date to examine the cultural applicability of NSSTs in “non-Western” countries and samples. While the dichotomizing of a multidimensional construct like culture is recognized as highly simplistic, our motivation was to focus on underrepresented perspectives (Kahalon et al., 2022) in a manageable way. Our review centers around three questions: (1) How have NSSTs been utilized in “non-Western” samples? (2) What themes can be identified across studies regarding the narrative content produced and authors’ interpretations of the findings? and (3) Are NSSTs feasible and valid in “non-Western” populations?
Method
Database Search
Following PROSPERO registration (ref: CRD42023383568), four databases (Figure 1) were searched on 4 January 2023 (updated 26 June 2023) using the following search terms: [“story stem*” OR “story completion” OR “doll play narrative*” OR “narrative representation*” OR “attachment narrative*” OR “attachment representation*” OR “attachment script*”] (all fields in Ovid, title/abstract/keyword in Scopus) AND [cultur* OR cross-cultur* OR non-Western OR indigenous OR immigra* OR migra* OR Eastern OR Africa* OR Asia* OR “Latin America*” OR Hispanic OR “Middle East*”] (all fields in Ovid, title/abstract/keyword in Scopus) AND [child* OR preschool* OR pre-school* OR kindergart*] (abstract in Ovid, title/abstract in Scopus). Publications were unrestricted by date, but were restricted to English language journals (also in Ovid: empirical human populations within “childhood: birth to 12 years”). Note that as this is a systematic review, institutional ethical approval was not required.
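The three term groups above combine as OR within a group and AND between groups. The following is a minimal sketch of how such a Boolean string could be assembled, assuming a generic syntax; it does not reproduce the Ovid or Scopus field restrictions described above, and the variable and function names are illustrative only.

```python
# Term groups copied from the search description above.
# All names here are hypothetical; this is not Ovid or Scopus syntax.
METHOD_TERMS = [
    "story stem*", "story completion", "doll play narrative*",
    "narrative representation*", "attachment narrative*",
    "attachment representation*", "attachment script*",
]
CULTURE_TERMS = [
    "cultur*", "cross-cultur*", "non-Western", "indigenous", "immigra*",
    "migra*", "Eastern", "Africa*", "Asia*", "Latin America*",
    "Hispanic", "Middle East*",
]
AGE_TERMS = ["child*", "preschool*", "pre-school*", "kindergart*"]


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes; leave single tokens bare."""
    return f'"{term}"' if " " in term else term


def or_block(terms: list[str]) -> str:
    """Join one group of terms with OR and wrap it in parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query() -> str:
    """AND together the method, culture, and age blocks."""
    groups = (METHOD_TERMS, CULTURE_TERMS, AGE_TERMS)
    return " AND ".join(or_block(g) for g in groups)


query = build_query()
print(query)
```

Keeping each group as a list makes it straightforward to rerun the search with an added term when the search is updated.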

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) Flow Diagram of the Systematic Search, Screening and Assessment Process.
Eligibility criteria for inclusion into the review are outlined below, using the SPIDER criteria.
Sample
Children aged 3–12 years from a “non-Western” cultural setting or family heritage were included (see next section for how “non-Western” was defined). As relatively few studies were expected to fit this criterion, studies from a “Western” setting in which over half of the sample had a minoritized ethnic(-cultural) background were also included if they reported subgroup analyses or analyzed ethnocultural groups as an independent variable. Studies that reported analyses by both ethnoracial and racial categories were eligible, but those that categorized on racial (skin color) or minoritized ethnic status overall were ineligible. Studies that did not report the sample’s country, ethnocultural, or heritage characteristics were excluded.
Phenomenon of Interest
An NSST was employed, comprising verbal story beginnings, prompting the child to generate a story narrative (may include doll play) that is understood to symbolically represent the child’s internal schemas or emotions. Other narrative (not story stem) and representational (non-narrative) techniques were excluded.
Design
Empirical studies utilizing a narrative story stem method with a typical structure were included. Studies with an atypical structure (i.e., those without a standard administrator-delivered story beginning, those that involved real past event recall, and those that included other participants in the story-generation process) were excluded.
Evaluation
Story stem narrative completions from which internalized representations and qualities are inferred. Nonrepresentational features (e.g., narrative skill) were excluded.
Research Type
Quantitative, qualitative, and mixed. Case studies with at least
Defining “Non-Western” for the Current Review
The dichotomizing of cultural groupings is inherently problematic. While various terms are used to describe populations that are not “Western” (e.g., “WEIRD” [Western, Educated, Industrialized, Rich and Democratic; Henrich et al., 2010], global majority, global south), the term “non-Western” was chosen for this review as a convenient label that is widely understood and that serves as a juxtaposition to samples typically represented in psychological and developmental science (Draper et al., 2022; Nielsen et al., 2017).
As no consensus exists on what constitutes “non-Western,” our starting point was to exclude countries defined as “Western” based on the World Population Review (2024): countries with cultures that are strongly influenced by European values or whose populations include a large percentage descended from European colonists. Furthermore, our definition of “non-Western” is led by a cultural rather than a geographic, economic, or sociopolitical definition (though we recognize the sometimes considerable overlap), due to a proposed mediating effect of culture on story stem completions. A trial run of our full search was conducted to more closely examine the sample’s country location or cultural context of each study to assist in arriving at the parameters of what constitutes “non-Western.” Those studies based in Europe, North America, and Oceania were removed, except for studies of groups residing in these countries of “non-Western” family cultural heritage. Thus, studies of minoritized ethnoracial groups were included, as their values may vary from the dominant perspective, as aligned with a cultural-ecological perspective (Ogbu, 1981; Smith et al., 2019).
Our informal review of countries and contexts in which relevant studies were located identified studies from countries that are sometimes discussed as “non-Western” or “minoritised” culturally and/or sociopolitically (including differing from mainstream “Western” family orientation in the academic literature) yet also sometimes considered “Western” based on geography and other indicators (e.g., religion, colonial influence), and vice versa. Here, inclusion was based on a continuing influence of indigenous and other non-European cultural values that is
Abstract Screening Considerations
Screening was a complex process as most abstracts did not refer to the sample’s cultural background. Several authors have established that White “Western” samples are treated as the “default” in research (e.g., Cheon et al., 2020; Roberts & Mortenson, 2023) and therefore such characteristics do not require mention. If sample descriptors were used in the abstract that might hint at potentially fitting the “non-Western” criterion, such as ethnically/ethnoracially diverse, refugees, internationally adopted, or being from a broad geographic region, or the paper was published in an English language journal that has a non-Western country in its title (e.g., Korean), the paper was screened positive. Abstracts that gave no hint of “cultural” information were screened out. As a check, the first author retrospectively fully assessed >100 articles previously screened out due to a lack of cultural background information in the abstract. No “non-Western” articles had been omitted in error. In addition, three other searches were conducted in case any eligible papers were missed in the database search process (see below).
Database Search Results
Of the 1,167 unique abstracts identified and screened, 51 full articles were assessed, of which 26 met full inclusion criteria (Figure 1). A second reviewer independently assessed 30 randomly selected articles from those that screened positive. Three disagreements were easily resolved; the second reviewer had excluded studies that were included by consensus agreement: a Puerto Rican study (included due to “non-Western” cultural influence) and two studies of mixed samples (eligible as they included separate analyses by ethnocultural group). Of the 25 excluded studies, most did not use NSSTs and a few did not fit sample requirements.
Manual Search Strategy and Results
Three further searches were completed: (1) reference lists of eligible articles were hand searched and yielded no additional studies; (2) a forward citation search involved screening abstracts of papers that had cited an eligible article, yielding 686 screened abstracts and three eligible papers after assessment; and (3) a forward citation search was completed of story stem methodology protocols published since 2010, as outlined in Kelly and Bailey’s (2021) review, yielding no further studies. A further eligible paper (Katsurada et al., 2017) was sourced from communicating with the author (E Katsurada). Finally, Zahn-Waxler et al. (1996) was located during a general literature search. We checked that this had not appeared in the main search; the abstract, title, and keywords gave no hint of the attachment measure utilized. Overall, 31 eligible papers were found.
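The counts reported across the search and screening stages can be reconciled with a short tally. The sketch below uses only figures stated in the text; the dictionary keys and variable names are illustrative labels, not terms from the review protocol.

```python
# Eligible papers contributed by each search route, as reported in the text.
eligible_by_source = {
    "database search": 26,
    "reference list hand search": 0,
    "forward citation search of eligible articles": 3,
    "forward citation search of methodology protocols": 0,
    "author communication": 1,
    "general literature search": 1,
}

# Full-text stage of the database search.
full_texts_assessed = 51
excluded_at_full_text = full_texts_assessed - eligible_by_source["database search"]

total_eligible = sum(eligible_by_source.values())

print(f"Excluded at full-text assessment: {excluded_at_full_text}")
print(f"Total eligible papers: {total_eligible}")
```

The tally agrees with the 25 exclusions at full-text assessment and the 31 eligible papers reported overall.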
Data Extraction
The following were extracted from eligible papers: (1) authors, year of publication, countries of authors’ affiliation; (2) sample characteristics (
To present key methodology and study quality information, the following was extracted from papers: (1) local involvement in study design, including in the authorship; (2) any study focus on cultural understanding or comparison; (3) whether the index sample is described as clinical (or at risk), socioeconomically disadvantaged, or migrants; (4) whether the NSST evaluated attachment or otherwise, any reported cultural modifications or language of translation, and reported interrater reliability; and (5) whether findings supported NSST applicability to the cultural group studied and/or any formal comparison with a “Western” sample (e.g., internal properties, cross-cultural comparisons, comparisons with previous published cohorts, qualitative observations, and convergent validity with independent measures). All tables and findings were independently double checked at a minimum.
For the narrative synthesis, (1) study characteristics were compiled and summarized with respect to the groups studied, the research focus, the psychological concepts of interest derived from the story stems, and the specific story stem measures employed and any cultural adaptations; (2) common patterns and themes were drawn from study findings regarding qualities of the story stem narratives produced, ensuring representation from all cultural groups and with more focus given to higher quality studies; and (3) data on the psychometrics and validity of story stem measures were summarized to investigate their utility and applicability in “non-Western” populations.
Results
Study Characteristics
Thirty-one studies were included comprising 2,612 children (excluding comparison groups from “Western” or White majority backgrounds). However, a few studies involved or likely involved the same or a substantially overlapping sample (Table 1). A range of cultures and heritages were represented. These cover four regions: five African (two Ghana, two South Africa, one Uganda), nine East Asian (three Chinese US, four Japan, three Korea), six Latin American (four Mexican US, one Puerto Rico, one Mexico and Peru combined), and five Middle Eastern (four Türkiye; one Iraq, Iran, Lebanon, Palestine, Syria and Afghanistan combined). A fifth group of seven studies involved children from minoritized ethnocultural backgrounds (four African American, two diverse US, one diverse UK). Two studies included two “non-Western” groups.
Table 1. Summary of NSST Study Methodology and Findings by Broad Geographical Region.
Around half of the studies have been published since 2017. All were quantitative, except for one qualitative thematic analysis and one mixed methods. Sample sizes ranged widely:
How NSSTs Have Been Utilized in “Non-Western” Samples
What Internal Representations Were NSSTs Employed to Measure?
Twenty-two (71%) studies adopted NSSTs to evaluate attachment outcomes (Table 1), half reporting classifications and the other half, rating scales. While there is considerable variation within how attachment was evaluated, these studies offer the highest comparability. Other studies utilized NSSTs to evaluate (1) relationships—positive (empathic relations, peer caring experiences, moral themes, reparation in conflict situations) and negative (interpersonal conflict, aggression, avoidance strategies); (2) the child’s perception of the parents (positive and negative parenting representations, mother–child relationship expectations, narrative coherence); (3) emotionality (dysregulated aggression, dissociation, positive and negative narrative emotions); and (4) sense of self (self-worth). Some studies evaluated elements spanning multiple domains. While “dealing with relationships” tended to involve child–parent relationships (but was not described as attachment), some focused on relationships with other family members (or between other family members, including siblings), peers, and/or teachers. Some constructs were not necessarily about close relationships (e.g., moral themes), but were applied to relationships in story stem scenarios. One study utilized a data-driven approach and identified content themes from MCAST stems, including household chores completed by children and child discipline, obedience, and compliance (Hosny et al., 2020).
Several researchers developed novel aspects to their studies. A US study of children of China-born parents and European US children devised their own story stems of peer scenarios (Song et al., 2018), and a study of Middle Eastern refugees devised trauma-related stories based around fleeing war-torn areas and separation from family (Dalgaard et al., 2016). In terms of analysis, a Ghanaian study involved qualitatively thematically analyzing narratives rather than implementing a coding scheme to understand primary school children’s daily lives (Hosny et al., 2020), and a Ugandan study interpreted narratives both in a conventional way and according to gender stereotyping (Dent & Goodman, 2021). Three studies tested the idea that the narrative themes of children of migrants (or migrant children) reflect their heritage cultural values (Chinese or Mexican), which may change over time as children assimilated into the United States (Howes et al., 2011; Petrowski et al., 2009, 2014).
Preferred NSST Procedures
Around three quarters of studies adopted one (or more) of three story stem sets (including adaptations): the
The studies we reviewed varied considerably in how NSSTs were implemented, reflecting the variability in NSST methodology in the studies typically published (Yuval-Adler & Oppenheim, 2015). NSSTs varied in the number of stems (range = 2–13 stories), the characters, the types of figure and props, requirements of the doll representing the child (e.g., matching for child gender), the scenarios and scripting, the level of emotiveness or arousal induction in stem delivery, and the extent of modifications to suit the culture (e.g., scripting, props) and for other reasons (e.g., to suit the age range, to have a trauma focus). Coding also varied: some studies focused on content themes, while others emphasized the qualities of completion or process, even when studying seemingly the same construct. The ASCT, used by nine studies, was subject to the most variation in stems, procedures, and coding system. Just one reviewed study used the full standard ASCT coding system. By contrast, all seven MCAST studies adhered to the standard format; any modifications were to fit the local context. The MSSB (seven studies) has been used mostly in its standard form (with selectivity in number of stems) to evaluate the widest range of social-emotional outcomes, as it was designed to do. However, all MSSB studies involved children living in the United States as “minoritized” groups, except for a Japanese study.
A Cultural or Clinical Focus, But Not Both
Studies varied in the degree to which culture was an explicit consideration, but none utilized a cross-cultural design to compare “Western” and strictly “non-Western” samples. Some studies of children of non-Western heritage (i.e., “migrant” samples) employed a comparison group design, although in these cases, all children will have had some degree of exposure to US culture. Seven “migrant” samples and two ethnoculturally diverse US samples were statistically compared with a “Western” or dominant cultural grouping (Table S1; see supplementary materials). Despite the lack of true cross-cultural studies, 19 (61%) studies incorporated some cultural focus in their research questions (Supplementary Table S1). A further five (16%) studies—mostly of community samples—made
Of the 11 studies that reported modifying the NSST for better cultural fit, six detailed changes beyond translation. Changes were made to the scenarios and scripts, choice of family characters (e.g., involving extended family), offering racially matched dolls, using bilingual administrators for language switching, and incorporating more culturally appropriate props. Following the use of slightly adapted versions, a few authors reflected that there may have been issues around children relating to the stem scenarios (Dalgaard et al., 2016; Hosny et al., 2020; Wan et al., 2017) and coding within an existing scheme (Wan et al., 2017). Goodman and Aber (2010) briefly reflected that culturally sensitive researchers can collect ecologically valid videotaped NSST data in the homes of African American families. No East Asian studies and only one Latin American study reported cultural adjustments. Three Turkish studies (Ildız et al., 2022; Sahin-Bayraktar & Seven, 2022; Seven & Ogelman, 2012) utilized the IDFSS, a measure developed by Cassidy (1988) that was translated into Turkish, after which an “expert panel” checked it for cultural appropriateness, resulting in no specific changes (Seven & Aytar, 2010). Sixteen studies were considered to have had local cultural input, based on author affiliation or the method description.
All eight clinically focused studies involved assessing attachment, mainly in Asian (4/9) and African (2/5) samples, rarely alluding to cultural factors. Their interest was mainly in specific adversities; that is, institutionalized foster care, child maltreatment, refugee trauma, reactive attachment symptoms in a “slum,” and behavioral risk in a diverse low socioeconomic background UK sample. Few studies were focused on strictly clinical samples (i.e., outpatients in psychiatric and other mental health services). In terms of attachment coding, the six studies in “non-Western” settings adopted the four-way classification system (as the measures were designed for) while the two European-based studies (UK, Denmark) reported aggregated or specific rating scale outcomes.
Intersectionality With Socioeconomic Disadvantage and Poverty
Seventeen (55%) studies involved children from disadvantaged backgrounds, including 14 studies of samples from (predominantly) low-income or economically disadvantaged backgrounds, as described or strongly suggested (e.g., based on living conditions of poverty) anywhere in the paper, and three studies of institutionalized children. This may reflect the fact that a large proportion (16 studies) were set in, or involved samples whose parents were from, low- and middle-income countries (LMICs), and that there was an interest in studying disadvantaged groups in studies set in, or involving samples from, high-income countries (HICs; Organisation for Economic Cooperation and Development, 2023). However, among studies in a HIC, only four involved fairly representative community samples (two Korean, one Japanese, one Puerto Rican), as the others involved minoritized, foster care home, and clinical samples (where no information was provided on socioeconomic background). A research focus on clinical questions in specific populations also necessitated studying either “at risk” or clinical samples.
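The study proportions quoted in the Results are expressed against the 31 included studies and can be re-derived from the counts given in the text. The following is a minimal sketch of that arithmetic; the labels and function name are illustrative, and rounding to whole percentages is an assumption about how the figures were reported.

```python
TOTAL_STUDIES = 31  # number of included studies

# Study counts as stated in the Results; labels are illustrative.
counts = {
    "attachment-focused": 22,
    "cultural focus in research questions": 19,
    "disadvantaged backgrounds (all)": 17,
    "low-income or economically disadvantaged": 14,
}


def percent_of_total(n: int, total: int = TOTAL_STUDIES) -> int:
    """Return n as a whole-number percentage of total."""
    return round(100 * n / total)


percentages = {label: percent_of_total(n) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct}%")
```

On these counts, 17 of 31 studies corresponds to 55% and 14 of 31 to 45%.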
Key Themes in the Story Stem Narrative Content
The Distributions of Attachment Patterns Across Cultures
Around half of studies investigated the attachment security depicted in children’s family-based narratives, eliciting a range of attachment strategies in all regions. Secure attachment rates varied from 0% in a small Japanese care institution sample to 78% in a large rural African American sample (Brown et al., 2017). Pooling across six clinical or clinical risk and four community samples (

NSST Attachment Classification by Clinical Risk Status.
Unfortunately, high variation in sample characteristics reduced our ability to examine attachment distributions by pooled cultural samples. Descriptively, within the East Asian (Korean and Japanese) studies, disorganized attachment rates were highest in the foster care home samples (31% and 21%; Katsurada, 2007; Katsurada et al., 2017), followed by the maltreatment (23.9%; Han, 2020), clinical middle class (11.8%; Jin et al., 2018), and nonclinical samples (4.4%; Jin et al., 2018). In the African studies, secure attachment rates were 73% in a community sample (Wan et al., 2017), 55% in a neighborhood risk sample (Pritchett et al., 2020), and 3% in a clinical sample (Gericke & Bain, 2020), while the reverse was found for disorganized attachment (10%, 23%, and 43% respectively). On the other hand, the chances of a secure classification were relatively low in the Middle Eastern studies (36.9%–48.1%; one refugee and two community samples), and high among African Americans (77.7% and 75.5%, though Gustafsson et al., 2017, was likely a subsample of Brown et al., 2017).
Gender Differences
Of the attachment-focused studies that provided gender-based analysis, most found no significant differences in NSST outcomes (e.g., Brown et al., 2017; Jin et al., 2018; Nóblega et al., 2019; Suwa et al., 2012; Vu & Howes, 2012). When reported, however, there were distinct thematic differences by gender with remarkable consistency across cultures. First, relative to boys, girls depicted more positive mothers or relationships (Grey & Yates, 2014; Sher-Censor et al., 2013), more coherent narratives (Grey & Yates, 2014, but not in Sher-Censor et al., 2013), more reparative and affiliative behaviors (Zahn-Waxler et al., 1996), higher self-worth (Gullón-Rivera, 2013), more moral themes (Petrowski et al., 2009), fewer interpersonal conflicts with age (Petrowski et al., 2009), and more secure attachment (Ahmetoglu et al., 2018), relating to mixed US ethnocultural, Puerto Rican, Japanese, Latin US and Turkish samples, respectively. Ugandan girls who included more empathic relations also gave more gender flexible narratives, unlike boys (Dent & Goodman, 2021). However, the gender difference in prosocial themes found in US children did not apply to a small Japanese sample living in the United States for a year (Zahn-Waxler et al., 1996). Environmental stability may matter; girls of African American mothers who were more supported depicted less aggression than those who were not, and aggression increased with recent maternal employment (Goodman & Aber, 2010).
Second, boys depicted more dysregulated and aggressive themes than girls (Goodman & Aber, 2010; Petrowski et al., 2009, 2014), including at the start and end of a kindergarten year in Mexican US and Chinese US children relative to Austrian children (Petrowski et al., 2009). Goodman and Aber (2010) propose that mothers’ heightened vigilance toward their sons may exacerbate their sons’ aggression. At age 6, Turkish boys were more commonly classified as having a “hostile/negative” attachment than girls (Seven & Ogelman, 2012). No NSST gender differences were reported in a Ugandan study, yet counterintuitively, rejecting father themes were associated with less attachment avoidance among boys only (Dent & Goodman, 2021).
Narratives Reflect Real Parenting Behavior That Lacks Equivalent Meaning in Coding Schemes
Several findings raised questions on the universality of what the narrated content means. Depictions of controlling and argumentative parenting behaviors were not linked to negative outcomes to the same degree as in White, “Western” studies in African (or Black) American (Brown et al., 2017; Grey & Yates, 2014) or Turkish cultures (Ahmetoglu et al., 2018). The infrequency of avoidant attachment narratives in normative East Asian samples also emerged (7%, Jin et al., 2012; 13%, Katsurada, 2007; 14%, Han, 2020). A factor analysis on Jin et al.’s (2018) community sample data produced a factor combining elements that would not be expected to be positively related, namely attachment avoidance (“self-care”) and ambivalence (“reversal”), suggesting the behaviors may carry a different meaning in the Korean context. Of Ghanaian narratives, Hosny et al. (2020) noted depictions in which the parent’s giving of food or a gift was reciprocated by the child who then said, for example, they will financially provide in future. This behavior may be interpreted as a type of “reversal” or enmeshment when considered from an attachment perspective, but may be commonplace in collectivist cultures as reflecting socialized expectations of role reciprocation. Ghanaian children’s narratives also commonly incorporated death-related themes and spirituality-related attributions when misfortune occurs (Hosny et al., 2020), which the authors suggest are normative culturally and when there are concerns for survival, yet could be challenging to interpret within existing coding schemes.
There is no suggestion by the authors that these findings do not reflect real behavior (as internalized by the children); rather, such findings may reflect cultural differences of family experiences and parenting. Brown et al. (2017) suggest that “prototypical” sensitive care may not be required for secure attachment in the African American context. Jin et al. (2018) proposed that the very low rates of avoidant attachment among Korean children were due to societal norms that dictate high maternal proximity and therefore do not reflect the child’s sense of
Narratives May Reflect Children’s Meaning-Making of the Task Itself
In societies that emphasize “right conduct” as an important socialization goal, NSSTs may be interpreted as morality-based. When instructed to “tell me and show me what happens next in the story,” children may interpret this as a request to convey understanding of what
This tendency may explain the high rate of prototypically “secure” depictions in some samples from socioeconomically deprived backgrounds, such as African Americans (77%; Brown et al., 2017), Ghanaians (73%; Wan et al., 2017), and South Africans in a periurban “slum” (55%; Pritchett et al., 2013); these depictions notably lacked contextual detail, which is unusual for a “secure” classification. Secure attachment in African American children was not associated with parenting sensitivity (Brown et al., 2017) or maternal depression (Goodman et al., 1998). Also, NSST outcomes were
The Role of Culture in Migrant and Minoritized Group Narrative Content
Some of the most convincing evidence for true cultural variations in children’s NSST completions comes from “migrant” studies, as they focused on cultural comparisons. Chinese US children’s narratives involved less conflict resolution, less emotion, and more avoidant themes compared with other cultural groups, interpreted as reflecting values of group interdependence over individual assertiveness and independent problem-solving around relationships (Petrowski et al., 2009, 2014). However, this conflict resolution effect was not later replicated (Song et al., 2018). Similarly, relative to US children, Japanese children displayed less aggressive (verbal and behavioral), angry, and avoidant behavior (Zahn-Waxler et al., 1996). However, Japanese children also showed fewer affiliation and prosocial themes (verbal and behavioral) and more overregulation of emotion, while US children showed more underregulation of emotion. The findings were interpreted as reflecting real differences in children’s behaviors, feelings, and speech related to cultural values that have been internalized by the child from their parents’ heritage socialization goals. European US children’s story stem portrayals of conflict resolution were related to positive self-peer descriptions (and lower social anxiety at follow-up), but this was not so for Chinese US children, which was interpreted as reflecting different cultural beliefs around what constitutes desirable peer relations (Song et al., 2018).
Narrative content by Mexican US children was interpreted as reflecting Mexican values, such as depictions of emotionally “deactivated” child–teacher relationships attributed to cultural emphasis on education (Vu & Howes, 2012), and a focus on moral themes due to a strong Catholic influence (Petrowski et al., 2009), relative to Chinese US children. However, the latter was not replicated (Petrowski et al., 2014). While Mexican US and Chinese US narrative similarities (e.g., less conflict resolution) relative to European US narratives may reflect a shared collectivist orientation as suggested by the authors, the two groups also share a minoritized status, exposure to the US pre/school context, and bilingualism. Furthermore, Chinese US children’s higher depiction of avoidance, lack of assertiveness, and low prosocial and affiliative behavior in response to distressing or dilemma stories may reflect low cultural integration, or internalizing behaviors resulting from the change in cultural environment. Length of narrative was not controlled for, so the Chinese US children’s lower frequency of particular behaviors in a narrative may be because they said or enacted less. Other explanations include unfamiliarity with this kind of creative task and socioeconomic differences (although Zahn-Waxler et al.’s [1996] sample was middle class).
By contrast, four studies focused specifically on African Americans in particular contexts (e.g., rural) marked by high levels of multiple social adversities (Brown et al., 2017; Goodman & Aber, 2010; Goodman et al., 1998; Gustafsson et al., 2017). These studies examined negative factors that could be associated with social adversity (parental harshness and intrusiveness, interpartner violence, narrative aggression). Findings also tended to be interpreted in terms of norms linked with socioeconomic adversity and marginalization rather than (sub-)cultural explanations.
The Cultural Utility and Validity of Story Stem Measures
Internal Psychometrics and Face Validity
The 10 studies that reported on NSST internal characteristics at the scale level provided general support for their robustness, but this was mostly limited to consistency across narrative responses and/or between those responses and the aggregate outcome (usually attachment classification). More critically, the “hurt knee” (e.g., Gericke & Bain, 2020; Nóblega et al., 2019) and “getting lost and reunion” narratives (e.g., Brown et al., 2017; Jin et al., 2018) were occasionally highlighted as having low predictive value for the overall attachment outcome and/or as being less consistent with the other stems. When Korean children were presented with the “lost in shopping mall” stem, their responses involved asking a policeman for help, thus not differentiating attachment themes (Jin et al., 2018).
Only three attachment studies conducted factor analysis, all arriving at three-factor solutions, comprising a positive factor around attachment security and at least one negative factor, in common with European studies (Wan et al., 2017). The negative factors showed variation between Ghanaians (Wan et al., 2017), Koreans (Jin et al., 2018) and Ugandans (Dent & Goodman, 2021), variously comprising atypical/unresolved, invasive, negative relations, attachment avoidance, rejecting father, self-care, and/or reversal. This may reflect variations in how insecure attachment tends to manifest locally, but also the rating scales utilized (e.g., the MCAST gives more focus to atypical phenomena).
Furthermore, the story stems in two studies elicited themes that seem to reflect societal norms. Ghanaians depicted religious and spiritual themes and themes in which the protagonist doll promised to do well in future (e.g., study well, get a good job) to reciprocate implicit forms of parental “affection,” reflecting spiritual and collectivist values (Hosny et al., 2020), and Ugandan gender differences in narrative content were viewed as reflecting rural Ugandan society that has a highly patriarchal structure, such as polygamous living arrangements being a social norm (Dent & Goodman, 2021).
Inter-Rater Reliability
Inter-rater reliability (IRR)—usually in the form of intraclass correlations or Cohen’s Kappa—was reported and found to be acceptable or high in 22 of the 30 (73%) quantitative studies. IRR data were missing mostly in Middle Eastern and East Asian studies. One study reported that IRR was established but provided no statistics. IRR was most often based on attachment classifications before aggregating into dichotomous classifications for analysis, masking any insecure/disorganized distinction, or involved nonattachment rating scales. Many studies lacked methodological details that may be expected of observational studies, for example, blinding, randomization, type of intraclass correlation or how disagreements were dealt with.
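As an illustration of the Cohen’s kappa statistic mentioned above, the following minimal Python sketch computes kappa for two coders at the four-way classification level and again after collapsing the labels to secure versus insecure. The ratings are hypothetical and invented purely for illustration (they are not data from any reviewed study), and the function names cohens_kappa and dichotomize are illustrative assumptions rather than part of any NSST coding manual.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one label per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def dichotomize(labels):
    """Collapse four-way attachment labels into secure vs. insecure."""
    return ["secure" if label == "secure" else "insecure" for label in labels]


# Hypothetical four-way classifications from two coders (illustration only).
rater_1 = ["secure", "secure", "avoidant", "ambivalent", "disorganized",
           "secure", "avoidant", "disorganized", "secure", "ambivalent"]
rater_2 = ["secure", "secure", "ambivalent", "ambivalent", "avoidant",
           "secure", "avoidant", "disorganized", "secure", "disorganized"]

kappa_four_way = cohens_kappa(rater_1, rater_2)
kappa_two_way = cohens_kappa(dichotomize(rater_1), dichotomize(rater_2))

print(f"four-way kappa: {kappa_four_way:.2f}")
print(f"secure/insecure kappa: {kappa_two_way:.2f}")
```

In this made-up example, the coders agree perfectly on secure versus insecure (kappa = 1.0) but less well on the four-way classification (kappa of about 0.58), which shows how the level at which IRR is computed affects what the statistic conveys.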
Story Stem Stability
Four samples in three studies were tested on the same stems twice with a 1-year interval at minimum, mostly to investigate socioemotional development. Between the start and end of the kindergarten year, several MSSB interpersonal variables (e.g., moral themes, empathic relations) showed consistency, though emotional expression increased overall. Cultural differences were found in avoidance (increased for Chinese US children, decreased for Mexican US children), and interpersonal conflict (increased for Chinese US children, decreased particularly among Mexican US boys; Petrowski et al., 2009). Using linear regression, a Turkish study found that IDFSS-attachment scores at 6 years predicted such scores at 9 years (Seven & Ogelman, 2012). Song et al.’s (2018) peer story stems study did not formally analyze stability; however, social anxiety and loneliness mean scores were similar when tested a year later.
Convergent Validity With Independent Measures of the Same Construct
Attachment-based NSSTs showed some concordance with other concurrently tested attachment measures (Jin et al., 2018), and with measures taken in infancy (Howes et al., 2011; Pritchett et al., 2013), and later childhood (Seven & Ogelman, 2012). Attachment outcomes were derived from family drawings, observational Q-sort, the Strange Situation Procedure (Ainsworth et al., 2015) and child report, and were based in Korea, Mexico, South Africa, and Türkiye, respectively. However, in Howes et al. (2011), 14-month observed attachment was associated with story stem narrative coherence, and not security or deactivation. Taking a closer look, whether one would conceptually expect stability between early-middle childhood attachment and disorganized attachment measured in infancy (Pritchett et al., 2013), reactive attachment disorder (Pritchett et al., 2013), or clinical assessments of attachment based on case histories (Gericke & Bain, 2020) is debatable (e.g., Minnis et al., 2009).
Outside of attachment, one study presented seven challenging situation stories in two formats: story stems, and a format in which the child pretended to be in a particular situation (supported with pictures) and indicated what they would do by selecting from specific response options. Moderate to strong correspondence on various themes was found between the formats (Zahn-Waxler et al., 1996).
Convergent Validity With Independently Measured Child, Parenting and Family Outcomes
NSST validity rests on the assumption that children’s produced narratives represent some dimension of the child’s real-life psychological, emotional, and social functioning and the caregiving environment (Kelly & Bailey, 2021), either directly or mediated by attachment (e.g., Martoccio et al., 2016). The strongest evidence for convergent validity comes from measures of personal and social functioning, including self-evaluations, popularity/leadership, peer relations, and inhibitory control and aggression (at 2 years), as found in Koreans (Shin, 2019), Puerto Ricans (Gullón-Rivera, 2013), Mexicans, Peruvians (Nóblega et al., 2019), and a diverse US sample (Grey & Yates, 2014). The associations were strongest with ASCT. In addition, Turkish studies linked NSST narratives with negative social-emotional outcomes: sense of loneliness (Sahin-Bayraktar & Seven, 2022; but not in Song et al., 2018), and poor emotion regulation in families with both high punitive parenting and high minimization of distress (Ahmetoglu et al., 2018). The link between MCAST outcomes and various child difficulties in nonclinical samples was moderate, including social and behavior difficulties in a diverse UK sample (Futh et al., 2008), reactive attachment symptoms in South Africa (Pritchett et al., 2013), and Ghanaian parent-rated child difficulties (Wan et al., 2017). MSSB outcome was linked with child IQ in a diverse US sample (Sher-Censor et al., 2013).
ASCT outcomes were also associated with parent measures of sensitivity, narrative elaboration, “mother comforting child” narratives, anxiety, depressed affect, and “unfiltered speech” in the context of parental trauma—in African American (Goodman & Aber, 2010; Goodman et al., 1998), Korean (Shin, 2019), Middle Eastern heritage (Dalgaard et al., 2016), or Mexican children (Howes et al., 2011). Outcomes from other NSSTs were linked with parents’ reported emotion socialization strategies, namely, minimization reactions, problem-focused coping (Ildız et al., 2022), and emotional availability during mother–child interaction (Suwa et al., 2012). ASCT and MCAST outcomes were associated with child maltreatment, intimate partner violence, and earlier harsh parenting in African Americans (Brown et al., 2017; Gustafsson et al., 2017; Han, 2020).
By contrast, NSST outcomes were
Discussion
This review brings “non-Western” NSST studies together for the first time. Publications have doubled since 2017, with representation across diverse world regions. Overall, the use of NSSTs was primarily motivated by an interest in child–caregiver attachment, and studies favored standardized NSSTs developed in the “West.” NSST procedures varied across studies, even among those using the same stem battery, although the same is found in “Western” studies (Kelly & Bailey, 2021; Madigan et al., 2016; Tang et al., 2018; Yuval-Adler & Oppenheim, 2015). Among the reviewed studies, however, almost two thirds had a cultural focus and a large proportion of samples came from socioeconomically disadvantaged backgrounds. A third of studies reported considering cultural modifications to their stems, but no new stems or coding systems were developed in or for a “non-Western” context.
Support for the Utility and Validity of NSSTs in “Non-Western” Populations
Broad support emerged for the universality of NSSTs: narrative content has symbolic value in reflecting aspects of young children’s lived reality. Several attachment distribution findings and narrative themes showed cross-cultural patterns, supporting the universality of attachment (Mesman et al., 2016). Several findings mirror those of “Western” studies: narratives classified as representing secure attachment were less common in clinical risk and socioeconomically disadvantaged groups (Bakermans-Kranenburg et al., 2004; Van Ijzendoorn et al., 2006), and when gender differences were found, these were remarkably consistent and mirrored other studies (Gloger-Tippelt et al., 2016; Pierrehumbert et al., 2009; Posada et al., 2019; Zevalkink & Ankone, 2022). These gender differences could be attributed to girls’ relatively more advanced social and language skills and/or to how mothers talk differently with daughters (Posada et al., 2019) and sons (Goodman & Aber, 2010). Furthermore, some findings suggested representational “idealisation” (Mahama et al., 2009; Wan & Green, 2010) and that insecure narratives were most common among institutionalized children (Torres et al., 2012).
Based on the evidence, NSST outcomes reflect child personal and social functioning, and to some degree, child difficulties, parent variables, and family adversities. Children’s NSST responses in different cultural contexts have some concordance with observed and parent-reported outcomes, and may provide added value to clinical information. Importantly, NSSTs capture cultural differences in children’s experiences growing up in their respective societies, most notably of parenting behaviors directed by cultural socialization goals (Dent & Goodman, 2021; Hosny et al., 2020; Jin et al., 2018). For example, themes such as low antisocial behavior (e.g., aggression), less emotion, more overregulation of emotion, and more avoidance depicted by Chinese US and Japanese children may reflect Asian societies’ collectivist orientation that devalues free emotional expression.
Still, gaps in the evidence remain, including a need for cross-cultural designs, a need to consider intersecting variables such as socioeconomic status and language, larger validation studies, and studies to understand whether, and which, NSST cultural modifications are helpful. Inequities in the research environment must also be recognized; researchers in some contexts face systemic barriers in terms of resourcing studies, NSST training availability, and the thresholds expected for publishing in English language journals (Draper et al., 2022).
Evidence Questioning the Utility and Validity of NSSTs
Few studies directly compared (ethno)cultural groups. However, of those that did, the looser association between attachment and child outcomes than expected in two non-White/non-“Western” groups (Futh et al., 2008; Grey & Yates, 2014) raises the possibility of inequivalence in meaning between groups. For example, parenting behaviors perceived as argumentative, confrontational, or intrusive in one culture may be perceived as care and concern in another, although this could also be partly attributed to family socioeconomic status as this is lower on average in minoritized groups. Similarly, looser associations between parenting and socioemotional outcomes are sometimes found in minoritized groups in studies using other parenting style measures (e.g., Guerrero et al., 2021). Thus, narrative content or delivery, especially depictions of parenting behavior, may reflect real behavioral differences between groups that may render interpretations within the framework of existing coding schemes invalid. Although the association between caregiver insensitivity and attachment is generally robust across cultures (Bakermans-Kranenburg et al., 2004), NSST coding often involves making inferences based on
A few findings from our review suggest that NSSTs may evoke unintended themes in some cultures. Low internal consistency was reported across stems in some studies, which is notable given how few studies provided item-level analysis across stem responses. Furthermore, story stems may particularly evoke moral themes. For example, Ghanaian children tended to provide narratives with child compliance and obedience themes (Hosny et al., 2020) and Korean children tended to report going to a policeman in response to the “getting lost” story stem (Jin et al., 2018). More hierarchical, collectivist societies may emphasize “right conduct” to children as part of valuing group cooperation, which is similarly reflected in children’s non-NSST narratives in, for example, China (Wang & Leichtman, 2000). Narrative research suggests that variations in narrative content and structural features are shaped by a society’s cultural goals (Bliss & McCabe, 2008; Gutierrez-Clellen et al., 1995; McCabe, 1997; Mulvaney, 2011). Adults transmit cultural values and socialization goals to children via storytelling, which children implicitly understand from an early age, and adult-administered NSST stems may be interpreted in similar ways by children.
Children in some studies tended to respond to story stems with more “normalised” yet less detailed or less resolved narratives, which was more common in disadvantaged groups. This finding may suggest that the stem content was not sufficiently evocative to compel these children to resolve the situation (e.g., conflict or distress). How children engage with the stories, how familiar they are with this kind of task, and how they understand the task instructions can drastically affect activation of internalized representations. The few studies that reflected on the cultural modifications they made suggest that they may not have gone “far enough.” Little account has been taken of cultural differences in how children tell narratives (Bliss & McCabe, 2008) or in the child’s perceived power distance based on social norms within the child’s culture.
Clinical and Research Implications
A reflective stance, incorporating cultural humility (Mosher et al., 2017), is needed when administering or evaluating NSSTs, to encourage us to consider children’s meaning-making of the task and scenarios, and our own inferences in relation to existing coding systems. Probing possible culturally embedded explanations, such as those offered in this review, may aid culturally sensitive interpretation. As an alternative to coding schemes, qualitative data–driven thematic analysis (e.g., Hosny et al., 2020) carries fewer assumptions, and can be used to develop culturally relevant coding schemes and to monitor feasibility. Exemplar quotes can supplement quantitative studies as they provide a rich source for more contextualized discussion. When using narrative research methods with adults, Lenette et al. (2022) discuss “embracing messy stories” and favoring story fragments rather than story stems to redress the issue of Western constructions of storytelling. However, this seems to be at odds with attachment NSST measures, which attribute secure attachment to linear, coherent responses with the typical story arc. Future research should attempt to reconcile these differences.
In terms of research, a key recommendation informed by our review is that the field can only progress with more transparent and robust reporting, reflection, and refinement. At a minimum, reporting should include detailed internal psychometrics when NSSTs are implemented in cultures that differ from those in which the measures were developed. Our findings suggest a need to consider the potency of factors that may intersect with culture (e.g., socioeconomic disadvantage, marginalization, mental health, acculturation) in the interpretation of NSST narratives and outcomes, as they were almost never considered in the reviewed studies. Ultimately, co-produced NSSTs would be a valuable way forward, in which individuals from specific cultural groups of interest are involved to better understand storytelling norms and inform story stem design (Lenette et al., 2022).
First, NSSTs yield culturally sensitive output, which opens up their potential for use longitudinally, such as for better understanding the intergenerational transmission of bicultural values in “migrant” samples (e.g., Petrowski et al., 2014) or of the effects of parental trauma (Dalgaard et al., 2016). Second, doll play is often used with story stems, which may assist narrative interpretation and decrease reliance on language, so future work could investigate whether including both may increase parity across cultural contexts (e.g., Bosman & Kerns, 2015). Third, future work could focus on the cross-cultural suitability of narrative methods relative to other modalities for accessing internal representations. Methods using visual modalities, such as family drawings (Jin et al., 2018), are arguably more “automatic” in process, but this has been little investigated cross-culturally.
Review Limitations
Several limitations are important to recognize. First, the focus on “non-Western” studies will almost certainly have inflated a cultural divide, including an assumption that the evidence for NSSTs in the “West” is well characterized and validated, which may not be the case (Jewell et al., 2019; Madigan et al., 2016). Second, as most abstracts and full articles do not provide cultural or country setting information, devising a manageable strategy to identify eligible studies was challenging. Studies may have been missed if samples were not recruited for their cultural background or if they evaluated, for example, attachment without stating the method used in the abstract. Third, restricting to English language publications will have biased representation toward works by authors with a “Western” orientation. Samples tended to be geographically clustered; most of the countries have close ties to the “West” via colonialism or recent migration. This may also reflect sample access by and the research interests of the handful of research groups who contributed multiple papers to the review. No studies involved South Asian or First Nations peoples. Fourth, care was taken to consider multiple interpretations of results, but our review is limited by what authors reported and reflected on, and our own largely “Western” orientation as reviewers. Studies had a variety of objectives and many made NSST modifications accordingly, which may have been a strength for the study itself but which reduced study comparability for the review. Some studies may not have reported cultural adjustments that were made, for example, if authors perceived measurement standardization to be preferred for academic publication. Fifth, while cultural (and other) modifications of NSSTs were recognized in this review, little is yet known about the impact of these modifications, as studies did not evaluate them directly.
Finally, interpretation of findings must take into account the heterogeneity of comparison groups and the fact that potentially interrelated confounders (e.g., language fluency, acculturation, socioeconomic disadvantage) were rarely controlled for. Based on existing study designs, we cannot exclude the possibility that family socioeconomic disadvantage is a key explanatory factor in many of the studies.
Conclusion
Redressing potential bias has been pressing across different developmental methods, as Western societies have become increasingly multicultural and “non-Western” societies have increasingly adopted NSSTs derived from “Western” samples. Pooling the evidence to date has facilitated our cultural understanding of NSSTs in a way that can in future be “fit for purpose” in a globalized context. Our review identified many studies broadly supporting the cultural validity of NSSTs as their outcomes reflect some dimensions of children’s inner lives and meaning-making of their world relating to family, others, and self. Thus, NSSTs can give “voice” to young children whose voices may be rarely heard.
Still, this review also highlighted areas that may show inequivalences in meaning across cultures, based on existing coding schemes. Story stems may not activate representations with equivalent salience across cultures, and little is yet known about whether social-emotional experience is organized and accessed in the same way universally. This synthesis highlights areas of strength that researchers can build on, and urges more robust and transparent reporting and a culturally reflective approach to future study design, interpretation, and clinical practice when using NSSTs with “non-Western” and multicultural groups.
Supplemental Material
sj-docx-1-jbd-10.1177_01650254241268594 – Supplemental material for Are narrative story stem methods valid in “non-Western” contexts? A systematic review
Supplemental material, sj-docx-1-jbd-10.1177_01650254241268594 for Are narrative story stem methods valid in “non-Western” contexts? A systematic review by Ming Wai Wan, Alice Taylor, Ruby Rainbow and Crystal Liyadi in International Journal of Behavioral Development
