Sage Journals: Discover world-class research

Abstract

Aims and Objectives:

Sentence repetition (SRep) tasks are a popular, valid, and cost-effective method of measuring language development. However, the heterogeneity of their use and the flexibility with which they can be adapted may affect the results obtained. This may be particularly significant when SRep tasks are used to identify language problems in children.

Methodology:

To bring attention to this issue, we carried out a systematic review of English-language studies that used SRep tasks with samples of bilingual children with and without language problems. We report the review according to the 2020 PRISMA guidelines.

Data and Analysis:

A total of 774 records were screened, resulting in 141 studies subjected to a narrative analysis to characterize the versions of the reported SRep tasks used.

Findings:

Aside from summarizing broad publication trends regarding SRep tasks in studies on bilingual children, our systematic review found variability in the specific formal characteristics of the SRep tasks as reported in the publications. In particular, task standardization, length, and procedure, as well as the use of analog versus digital versions of the task, emerged as potentially significant areas of difference.

Originality:

We offer an overview of how SRep tasks are structured and reported in the literature on bilingual children and language problems, pointing to areas of difference which, unless further examined, may impair conclusions and generalizations. We also offer suggestions on how to improve the transparency and clarity of reporting the methodological details of the SRep tasks.

Implications:

The systematic review lays out directions for further studies to refine SRep methodology as applied to bilingual children and to the identification of language problems. Our findings have the potential to stimulate empirical research into how various characteristics of the SRep task may introduce unwanted variability into measurement.

Keywords

Sentence repetition; systematic review; bilingualism; children; specific language impairment; developmental language disorder; COST Action IS0804

Introduction

Sentence repetition (SRep) tasks are a form of language assessment in which participants are asked to repeat sentences verbatim as they are presented one at a time. They are commonly used to assess children’s morphosyntactic development and to identify language difficulties. SRep tasks have traditionally been used in studies of language processing and in clinical neuropsychological diagnosis (e.g., Fraser et al., 1963; Jarvella, 1971; Meyers et al., 2000; Vinther, 2002). They are also a valuable diagnostic tool for language disorders (Klem et al., 2015), particularly specific language impairment (SLI), developmental language disorder (DLD), and/or language impairment (LI) in bilingual children (Bishop, 2017). We use these terms as they are defined and/or used in the publications we cite. SLI refers to language impairments of unidentifiable cause in children whose development and cognitive skills are otherwise typical. The term DLD addresses SLI’s reliance on definition by exclusion: it refers to children with language acquisition difficulties and does not use cognitive functioning as a criterion. Finally, LI is a broader term that encompasses various language difficulties without reference to the specific criteria of SLI, focusing on the impact of language difficulties on functioning regardless of cause, cognitive level, or development in other areas (see Reilly et al., 2014; Volkers, 2018).

In this context, SRep tasks have shown high sensitivity to SLI in both monolingual and bilingual children (Conti-Ramsden et al., 2001; Marinis & Armon-Lotem, 2015; Riches, 2012). Combined with their relatively simple design and use, this has made SRep tasks popular (Rujas et al., 2021). They are included in test batteries or used as stand-alone tasks (e.g., Marshall et al., 2015; Stokes et al., 2006) and are versatile in their application and purpose. For example, Kueser and Leonard (2020) employed an original SRep task involving “a series of four-word sequences” that varied in their frequency in spoken language and in the predictability of the fourth word from the preceding three (e.g., “All through the town,” “All through the hay,” “Go for a ride,” and “Go for a bath” as high frequency/high predictability, low frequency/high predictability, high frequency/low predictability, and low frequency/low predictability, respectively, p. 1171). Importantly, predictability was based on corpus data. In this way, the task allowed the authors to measure word frequency and predictability effects on the performance of children with DLD compared with typically developing children. Notably, from the perspective of our systematic review, Kueser and Leonard (2020) also reported the modality of the stimuli (audio recordings, female speaker), the mode of administration (presented in person by the experimenter), and whether training/trial items were used and feedback was provided.
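Corpus-based predictability of this kind is often operationalized as a conditional probability estimated from n-gram counts. The sketch below assumes that definition, P(w4 | w1 w2 w3) = count(w1 w2 w3 w4) / count(w1 w2 w3), which is not necessarily the exact measure Kueser and Leonard (2020) used. The toy corpus and the function names are hypothetical, and a real estimate would require a large corpus of spoken language.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (stored as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def fourth_word_predictability(corpus_tokens, sequence):
    """Estimate P(w4 | w1 w2 w3) = count(w1 w2 w3 w4) / count(w1 w2 w3).

    Returns 0.0 when the three-word context never occurs in the corpus.
    """
    words = sequence.lower().split()
    if len(words) != 4:
        raise ValueError("expected a four-word sequence")
    context_count = ngram_counts(corpus_tokens, 3)[tuple(words[:3])]
    if context_count == 0:
        return 0.0
    return ngram_counts(corpus_tokens, 4)[tuple(words)] / context_count


# Hypothetical toy corpus; a real estimate would use a large spoken-language corpus.
corpus = "go for a ride go for a ride go for a walk go for a bath".split()

print(fourth_word_predictability(corpus, "Go for a ride"))  # 0.5
print(fourth_word_predictability(corpus, "Go for a bath"))  # 0.25
```

Under this definition, "ride" is the more predictable continuation of "go for a" in the toy corpus because it follows that context in two of its four occurrences.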

In contrast, among standardized SRep tasks, one widely used tool (although not the only one available; see, for example, Ward et al., 2024) is the LITMUS-SRep, developed as part of the COST Action IS0804 project and adapted to various languages for cross-linguistic comparison (Marinis & Armon-Lotem, 2015). Developing parallel SRep tasks across languages required balancing cross-linguistic comparability against language-specific sensitivity. To achieve this, LITMUS-SRep tasks include two types of structures: those known to be challenging for children with SLI across languages, such as object relative clauses and object wh-questions, and those that prior research has shown to be particularly difficult in a given language. For instance, one structure used across languages is the object relative clause, which involves both embedding and syntactic movement. Examples include:

English: “The swan that the deer chased knocked over the plant”

Russian: “jeto devochka, kotoruju narisovala mama” (“This is the girl that the mother drew”)

Hebrew: “ra’iti et ha-kelev še ha-sus daxaf” (“I saw the dog that the horse pushed”)

These structures were included because they have consistently been found to pose difficulties for children with SLI across typologically different languages, making them suitable for cross-linguistic assessment of morphosyntactic development.

There are several crucial features of SRep tasks to consider. Above all, they must be constructed separately for each language to account for language-specific features and developmental difficulties (Antonijevic & Meir, 2024; Fleckstein et al., 2016; Marinis & Armon-Lotem, 2015). For instance, passives are generally difficult across languages, but the nature of this difficulty varies across populations. English-speaking children with SLI have difficulty with passives, whereas typically developing bilingual children do not. In other languages, including Hebrew and Russian, passives pose a general difficulty because of their infrequency or the complexity of the case system, which is essential for understanding passive structures; this makes them particularly challenging for bilinguals. In Polish, complex sentences with relative clauses and prepositional phrases, such as “Ona idzie po trawie, na której leżą deski” [She is walking on the grass, on which boards are lying], are difficult because of the complex inflectional system. The noun “trawa” (grass) is inflected as “trawie” to mark the locative case, which is required by the preposition “po” in this locational sense. Similarly, the relative pronoun “której” agrees with its antecedent “trawie” in gender and number (feminine, singular), while its locative case is assigned by the preposition “na” within the relative clause “na której leżą deski.” The verb “leżą” is inflected for the third person plural, agreeing with its plural subject “deski.” This alignment of case, number, and gender across different parts of the sentence through inflection exemplifies the complexity.

Although the task may appear to be memory-based, research has demonstrated that performing well on syntactically complex items requires morphosyntactic processing. Specifically, when sentence structures are sufficiently complex, accurate repetition requires speakers to access and apply grammatical knowledge rather than rely on simple recall (Erlam, 2006; Frizelle et al., 2017). As Devescovi and Caselli (2007) have noted, reproducing such structures involves the speaker reconstructing them internally based on their linguistic competence.
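As an illustration of how verbatim repetition can be scored at the word level, the sketch below counts word insertions, deletions, and substitutions between the target sentence and a child's response (a Levenshtein distance computed over words) and derives a binary verbatim score from it. This is a generic scheme assumed for illustration, not the scoring procedure of any particular SRep task, and the function names are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens, ignoring punctuation."""
    return re.findall(r"[\w']+", sentence.lower())


def word_errors(target, response):
    """Word-level Levenshtein distance: the minimum number of word
    insertions, deletions, and substitutions turning target into response."""
    t, r = tokenize(target), tokenize(response)
    prev = list(range(len(r) + 1))  # distances from an empty target prefix
    for i, t_word in enumerate(t, start=1):
        curr = [i]
        for j, r_word in enumerate(r, start=1):
            curr.append(min(
                prev[j] + 1,                       # target word omitted
                curr[j - 1] + 1,                   # extra word added
                prev[j - 1] + (t_word != r_word),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def score_item(target, response):
    """Return a binary verbatim score and the number of word-level errors."""
    errors = word_errors(target, response)
    return {"verbatim": int(errors == 0), "word_errors": errors}


target = "The swan that the deer chased knocked over the plant"
print(score_item(target, "The swan that the deer chased knocked over the plant."))
print(score_item(target, "The swan that chased the deer knocked over the plant"))
```

In the second response, turning the object relative clause into a subject relative clause costs only two word-level errors, even though the target structure is lost. This is one reason why word-level counts may be complemented by structure-level scoring.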

SRep tasks are also employed for a wide range of research goals, and their characteristics may be adjusted accordingly. For example, researchers may need to construct a new SRep task to measure proficiency in a given, typically minority, language. Alternatively, they may analyze SRep data in nonstandard ways, for example, by focusing on instances of codeswitching (Soesman et al., 2022).

The way SRep tasks are administered also varies considerably. Sentences may be read aloud by the person administering the task or presented as prerecorded audio on a tablet or computer. Short visual elements and animations may also be added to make the task more engaging for children. Moreover, SRep tasks may differ in length, may be administered on their own or within a longer measurement session, and may involve intermittent rewards or encouragement. Finally, with bilingual children, the instructions and administration of the SRep task need not be in the same language as the stimulus sentences themselves (e.g., Aguilar-Mediavilla et al., 2019; Grech, 2022). On another level, reporting these features in publications is also important for transparency. As with other methodological choices and procedures, omitting this information or assuming it is self-evident risks obscuring its potential impact on the results (see Zogmaister et al., 2024).

Given the observable variability in the specific elements and administration of SRep tasks, which, as in other areas of psychological testing, may affect the results obtained and their reliability and/or validity in clinical practice, we conducted a systematic review to describe and systematize these differences. In research, unacknowledged variability between versions of a task (and the unverified assumption that they are equivalent) may hinder the comparison and generalization of results across samples and studies (Gilmore & Campbell, 2019; Nosek et al., 2022; Wolfe-Christensen & Callahan, 2008). Thus, our systematic review examined this variability in SRep task design and reporting to highlight potential concerns, stimulate further research, and suggest best practices.

Sentence repetition tasks in bilingual children

Notably, despite the growing volume of research on bilingualism, most language development assessment tools, including SRep tasks, are designed primarily with monolingual speakers in mind. The tasks frequently rely on normative data derived solely from monolingual populations, which may not adequately capture the linguistic abilities of bilingual speakers. This can further exacerbate existing issues with SRep task standardization and the interpretation of results. Differences in how bilingual children perform on SRep tasks compared with monolingual peers can be attributed to their different language development trajectories, particularly because bilinguals often develop language skills in a nonlinear manner. Although bilingual children reach early language milestones (specifically babbling, the first word, the 10th word, the 50th word, and the first multi-word utterance) at the same age as their monolingual peers (Muszyńska et al., 2025), their grammatical development may follow a different trajectory from that of monolingual children. Their language development might progress rapidly in some areas and slowly in others, influenced by varying levels of exposure to each language, the differing contexts in which the languages are used, and the cognitive demands of managing two linguistic systems simultaneously (see Marinis & Armon-Lotem, 2015). Given that bilingual children’s language development may follow different trajectories, tasks standardized for multilingual speakers are needed. Otherwise, if bilingual children are tested with tasks standardized for monolinguals, there is a significant risk of inaccurately assessing their language abilities and, in applied settings, of making suboptimal decisions based on these data, which may affect the children’s education and functioning (Cao & Yan, 2024; de Jong, 2024; Hunt et al., 2022). Therefore, it is important to study how tasks assessing language development work with bilingual children specifically.

However, few studies thus far have examined the potential impact of differences in SRep task characteristics on the results obtained. For example, Banasik-Jemielniak et al. (2023) administered an online and an in-person version of the SRep-PL task (Banasik et al., 2012) to a sample of 60 Polish-English and Polish-German bilingual children aged 4 to 7 years. Despite a positive correlation between the online and in-person versions, the authors found statistically significant differences in the children’s scores across the two modalities and noted that the pattern of performance differed between multilingual and monolingual children. In contrast, Pratt et al. (2022) compared the scores of 10 children (5 monolingual English speakers and 5 Spanish-English bilinguals) aged 4 to 8 years on online and in-person versions of a range of language assessment tools, including an SRep task, and found that the scores obtained in the two modalities were highly correlated, which the authors took to indicate no significant differences between the modalities. A high correlation, however, does not by itself rule out systematic differences in score levels, as the findings of Banasik-Jemielniak et al. (2023) illustrate. Therefore, the available evidence suggests potential variability in bilingual children’s SRep scores depending on administration modality, a point that deserves further attention and study.

COST Action IS0804 and the LITMUS-SRep

A notable initiative in the context of standardizing SRep tasks for use with bilingual children is the 2009 COST Action IS0804, entitled “Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment.” It aimed to enhance the understanding of language impairment in multilingual settings by addressing the complexities of diagnosing and assessing language impairments in individuals who speak multiple languages. The core of the project was to identify typical and atypical linguistic patterns and developmental trajectories among multilingual children, a population that is growing in many parts of Europe due to increased migration and mobility.

One of the significant outcomes of COST Action IS0804 was the development and widespread adoption of the LITMUS (Language Impairment Testing in Multilingual Settings) tools, which include a variety of tasks designed to assess language abilities in children speaking different languages. One of the tools is the LITMUS-SRep task. It is designed to be adaptable across various languages to disentangle bilingualism from SLI. The primary goal was to create a standardized method to assess sentence repetition performance across a wide range of languages while taking into account linguistic and cultural differences. Two main principles guided the LITMUS-SRep task framework: inclusion of syntactically complex and simple structures and adaptation to language-specific features. That is, tasks included syntactically complex structures known to be difficult for children with SLI or language impairments across languages, such as object relative clauses, conditionals, and sentences involving syntactic movement. Also, control structures, such as mono-clausal sentences, were included to provide baseline comparisons. Each language version of the LITMUS-SRep task incorporated structures uniquely challenging for children with SLI or language impairments, but manageable for bilingual children with typical language development (Marinis & Armon-Lotem, 2015).

The current study

Although the COST Action IS0804 represents a crucial step toward improving the standards of SRep task use, the methodologies of SRep task use and their various elements remain heterogeneous across studies. This heterogeneity is a potential, though untested, source of confounds, which may have consequences in both research and applied settings. Therefore, to stimulate further attention and research in this area, we examined the patterns of variability in the formal characteristics of SRep tasks as reported in English-language studies on language development in bilingual children with and without SLI. Importantly, we highlighted both the SRep features themselves and the ways in which they were reported. To this end, we carried out a systematic review of published empirical studies. We sought to describe the most common patterns of features, contexts of use, and reporting practices of SRep tasks in these studies; outline potential future directions of research and areas requiring more attention; raise awareness of current methodological issues in the reporting of SRep tasks; and suggest tentative best practices for both SRep administration and its reporting.

We are aware of several other systematic reviews and/or meta-analyses on this topic. First, Pawłowska (2014) carried out a meta-analysis of studies using verb tense grammaticality judgment tasks, nonword repetition (NWR) tasks, and SRep tasks to establish their accuracy in diagnosing language impairment in monolingual English-speaking children (aged 3–12, with one study on a sample of 21-year-olds). Her systematic review included four studies using two types of SRep tasks: Redmond’s (2005) Sentence Repetition and the Recalling Sentences subtest from the CELF-4 (Semel et al., 1994). In contrast, our systematic review focused only on SRep tasks. In addition, we did not examine their diagnostic accuracy per se. Rather, we were interested in examining differences in their contexts of use and formal characteristics, and in drawing attention to the need to systematize them. Second, Rujas et al. (2021) carried out a scoping review in which they established the patterns of SRep task use for diagnosing language difficulties in the years 2010–2021, focusing, for example, on task languages, the populations studied, and the constructs assessed using SRep tasks (e.g., language abilities vs cognitive abilities). Our systematic review aimed to answer a more specific research question regarding the features of SRep tasks in the context of bilingual children. While SRep tasks have been extensively used in linguistic and clinical research, the diversity in their implementation raises questions about the comparability of findings across studies, especially in bilingual children’s language assessment. This heterogeneity reflects the adaptability and versatility of SRep tasks in meeting specific research needs, but it also points to potential methodological challenges in ensuring consistent and reliable research and diagnostic outcomes. Third, Ward et al. (2024) showed that SRep tasks are widely used in research on monolingual children’s language ability. Their systematic review and meta-analysis also classified the SRep tasks across studies as standardized vs nonstandardized. However, their specific focus was on examining the ability of SRep tasks to discriminate between typically developing children and children with DLD. Moreover, their review focused only on monolingual children. In contrast, we focused (a) in greater detail on the structural characteristics of SRep tasks beyond standardization itself and (b) on studies of bilingual children.

Therefore, our review aimed to contribute by providing an overview of how SRep tasks have been used, adapted, and reported in studies involving bilingual children. By examining the formal characteristics of these tasks—ranging from the mode of sentence delivery to the inclusion of interactive elements—we hoped to shed light on the nuances that may influence task performance. This exploration may have implications for the development of best practices in the administration and reporting of SRep tasks. In summary, our systematic review aimed to answer the following research questions:

How do SRep tasks in studies on bilingual children with and without SLI/DLD/LI vary in their formal/structural features (e.g., languages tested, task formats, and scoring methods)?

How are these features reported in publications (i.e., are there pervasive omissions) and how could their reporting standards be improved?

Method

We report the systematic review in accordance with the 2020 PRISMA guidelines (Page et al., 2021) as applied to the current topic. The PRISMA flowchart of the literature search is shown in Figure 1. The current systematic review was not pre-registered.

Figure 1.

PRISMA flowchart of the current systematic review.

The systematic review included empirical studies published in English in academic journals. The inclusion criteria were studies (a) using SRep tasks in any form or medium, (b) for their general intended purpose, that is, to measure morphosyntactic development or some of its aspects, (c) on clinical (i.e., with SLI or other language disorders/difficulties) and/or typically developing samples of bilingual children, (d) speaking any two languages. That is, although the systematic review was limited to English-language publications, studies concerning any language were included. Accordingly, the exclusion criteria were (a) studies using SRep tasks for purposes other than measuring morphosyntactic development in whole or in part (e.g., to measure hearing or stuttering) and (b) studies conducted solely on samples of monolingual children or adults. The databases searched were Scopus, EBSCO (all databases), and PubMed. In addition, the list of publications on the official LITMUS-SRep task website (https://www.litmus-srep.info/) was screened, since the LITMUS-SRep task, resulting from the COST Action IS0804, represents a crucial development in SRep tasks for bilingual language assessment. This also served to improve the reliability of our search. No restrictions on publication date were set.

The same search string was used in all three databases, save for syntax or input modifications necessitated by differences between the databases’ user interfaces. Because SRep tasks are referred to in heterogeneous ways in the literature (e.g., sentence repetition vs sentence elicitation), the search string was refined by trial and error (i.e., terms were added or removed for each line separately) to maximize the number of results. All three authors designed the search string together. The final search string was:

“sentence repetition” OR “sentence imitation” OR “sentence elicitation” OR srep
AND bilingualism OR bilingual OR multilingual OR “second language”
AND children OR child OR development OR acquisition
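Because databases differ in how they resolve the precedence of AND and OR, an unparenthesized string like the one above can be parsed in more than one way. The sketch below assumes that the terms on each line are joined by OR and that the lines are combined with AND, and it builds the query with explicit parentheses; the variable and function names are hypothetical.

```python
# Term groups from the search string; grouping each line's terms with OR
# and joining the lines with AND is an assumption made for this sketch.
TERM_GROUPS = [
    ["sentence repetition", "sentence imitation", "sentence elicitation", "srep"],
    ["bilingualism", "bilingual", "multilingual", "second language"],
    ["children", "child", "development", "acquisition"],
]


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups):
    """OR the terms within each group and AND the parenthesized groups together."""
    blocks = ["(" + " OR ".join(quote(term) for term in group) + ")" for group in groups]
    return " AND ".join(blocks)


print(build_query(TERM_GROUPS))
```

Generating the string from term lists also makes the trial-and-error refinement easy to document, since each version of the query can be regenerated from its lists.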

The database search was carried out on 26 June 2022 by the first author. The search results from every database and from the LITMUS website were exported into a spreadsheet and collated. A total of 1,202 publications were retrieved. Duplicates were then removed in two stages: first using the Zotero software and then manually by the first author, who removed any remaining duplicates. A total of 537 duplicates were removed in this way. On 27 March 2023, the first author ran the database search again to account for potential new publications. A total of 59 publications were retrieved, of which 50 were duplicates. Finally, on 26 January 2024, a second update of the search was carried out in the same way. A total of 102 publications were retrieved, of which 3 were duplicates. Thus, a total of 774 publications were subjected to title and abstract screening.
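The following is a minimal sketch of automated deduplication, assuming each exported record is a dictionary with optional `doi`, `title`, and `year` fields. Records are matched on a normalized DOI when one is present and on a normalized title plus year otherwise. This is not Zotero's duplicate-detection algorithm, and the records and function names are hypothetical. It also shows why a manual pass remains useful: a record exported with a DOI from one database and without one from another would not be matched.

```python
import re


def dedup_key(record):
    """Match on a normalized DOI when present, else on normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)  # drop resolver prefix
    if doi:
        return ("doi", doi)
    title = re.sub(r"[\W_]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year"))


def remove_duplicates(records):
    """Keep the first record for each key; return (unique, duplicates)."""
    seen, unique, duplicates = set(), [], []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Hypothetical exported records (not from the review's dataset).
records = [
    {"doi": "10.1000/ABC", "title": "Sentence repetition in bilinguals", "year": 2020},
    {"doi": "https://doi.org/10.1000/abc", "title": "Sentence Repetition in Bilinguals.", "year": 2020},
    {"title": "A study: SRep in children", "year": 2019},
    {"title": "A Study - SRep in Children", "year": 2019},
    {"title": "A study: SRep in children", "year": 2021},
]
unique, duplicates = remove_duplicates(records)
print(len(unique), len(duplicates))  # 3 2
```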

Title and abstract screening was carried out in two stages. First, the publication dataset was divided into three equal parts and screened by three authors (Author A, Author B, and Author C) working in rotating pairs: each pair (A–B, B–C, and A–C) reviewed one-third of the dataset. The results of the screening were collated into a single spreadsheet. Second, all three authors reviewed the spreadsheet together. When the two members of a screening pair disagreed about including a given article, all three authors revisited its title and abstract together to reach a unanimous decision. When necessary, the full text of the publication was examined to ascertain the aim and methodology of a given study. This step was necessary because the titles and abstracts of the publications in the dataset frequently used varying terminology to refer to the SRep tasks employed. In addition, abstracts sometimes named larger test batteries containing SRep tasks, or referred not to the SRep tasks but to the constructs they were used to measure in a given study (e.g., “the language and reading achievement of 44 children adopted from Eastern European orphanages were clinically assessed with standardized tests and natural-language samples to determine the extent and types of problems present in the areas of language [i.e., overall spoken language, receptive language, morphology, semantics, syntax, pragmatics] and reading,” Hough & Kaczmarek, 2011). Finally, the publications included in the scoping review by Rujas et al. (2021) were examined, and those that met the inclusion criteria of the current systematic review were included. Only one publication was included in this way. Overall, title and abstract screening yielded 163 eligible reports, one of which could not be retrieved, resulting in 162 reports included in full-text screening.
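The rotating-pair design can be sketched as follows, assuming records are identified by consecutive numbers and each screening decision is recorded as "include" or "exclude"; the helper names are hypothetical. Each of the three pairs receives one contiguous third of the records, and records on which a pair disagreed are flagged for joint review by all three authors.

```python
from itertools import combinations


def split_into_parts(items, k):
    """Split items into k contiguous parts whose sizes differ by at most one."""
    size, extra = divmod(len(items), k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def assign_to_pairs(record_ids, screeners=("A", "B", "C")):
    """Give each screener pair (A-B, A-C, B-C) one part of the records."""
    pairs = list(combinations(screeners, 2))
    return dict(zip(pairs, split_into_parts(record_ids, len(pairs))))


def disagreements(decisions):
    """Return IDs of records on which the two screeners of a pair disagreed.

    `decisions` maps record_id -> (first screener's decision, second screener's decision).
    """
    return sorted(rid for rid, (first, second) in decisions.items() if first != second)


assignment = assign_to_pairs(list(range(1, 775)))  # 774 records, numbered 1-774
print({pair: len(ids) for pair, ids in assignment.items()})
# Hypothetical decisions for three records screened by one pair.
print(disagreements({1: ("include", "include"), 2: ("include", "exclude"), 3: ("exclude", "exclude")}))  # [2]
```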

Full-text screening and data extraction were carried out by one author, who extracted information about the stated study aims, sample characteristics, SRep task characteristics explicitly provided in the text, other tasks used in the study, and raw SRep scores (if reported). In addition, for each publication, the same author prepared a general narrative description of the results pertaining to the SRep task. These categories were designed before full-text screening began. This part of the analysis was carried out by one author because it did not involve any qualitative coding. No measure of study quality or publication bias was used in full-text screening. To cross-verify the results of full-text screening, the other authors each rescreened 13 publications (10% of the dataset) chosen at random. Texts were assigned to authors for cross-verification by algorithmically controlled randomization: a language model (ChatGPT) was used to generate 3 different sets of 13 non-repeating numbers. This approach was intended to ensure a randomized yet methodologically consistent assignment of texts to authors. The results of the cross-verification were then discussed between the authors, and ambiguities were resolved. For this reason, no quantitative measure of reliability/agreement was calculated.
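The review reports using a language model to generate the number sets. As an alternative technique, not the procedure used here, a seeded pseudorandom generator produces sets that can be regenerated and audited exactly. The sketch assumes that publications are numbered 1 to 135, that the sets should not overlap, and an arbitrary seed; all names are hypothetical.

```python
import random


def draw_verification_sets(n_publications, n_sets, set_size, seed, disjoint=True):
    """Draw n_sets sets of set_size distinct publication numbers from 1..n_publications.

    With disjoint=True no publication appears in more than one set. A fixed
    seed means the same sets can be regenerated and checked later.
    """
    rng = random.Random(seed)
    population = range(1, n_publications + 1)
    if disjoint:
        drawn = rng.sample(population, n_sets * set_size)
        return [sorted(drawn[i * set_size:(i + 1) * set_size]) for i in range(n_sets)]
    return [sorted(rng.sample(population, set_size)) for _ in range(n_sets)]


n_publications = 135
set_size = int(0.10 * n_publications)  # 10% of 135, rounded down to 13
sets = draw_verification_sets(n_publications, n_sets=3, set_size=set_size, seed=2024)  # arbitrary seed
for s in sets:
    print(s)
```

Recording the seed alongside the sets allows anyone to reproduce the assignment, which is harder to guarantee with the output of a language model.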

Where information about a given aspect of a study was not explicitly stated in the text (e.g., whether the children were simultaneous or sequential bilinguals), no inferences were made, and the aspect was coded “N/A.” Full-text screening led to the exclusion of 27 publications that had passed title and abstract screening: 19 publications did not report studies in which an SRep task was used, 5 studies did not involve a bilingual child sample, 2 publications reported studies duplicating those in publications already included in the systematic review, and 1 publication was a theoretical review. Thus, the final number of publications in the systematic review was 135, reporting a total of 141 studies.
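A quick consistency check of the reported screening counts, using only the numbers stated above (the variable names are illustrative):

```python
# Counts as reported in the Method section.
eligible_after_title_abstract = 163
not_retrievable = 1
exclusions = {
    "no SRep task used": 19,
    "no bilingual child sample": 5,
    "duplicate of a study already included": 2,
    "theoretical review": 1,
}

full_text_screened = eligible_after_title_abstract - not_retrievable
excluded = sum(exclusions.values())
included = full_text_screened - excluded
print(f"{full_text_screened} screened, {excluded} excluded, {included} included")
# prints: 162 screened, 27 excluded, 135 included
```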

Results

Due to their volume, the full results of the systematic review are presented in the online Supplementary Materials: https://osf.io/tcpnz/. To provide context, we first present a general overview of the dataset before examining the features of the SRep tasks and their reporting in detail. Table 1 presents a shortened summary of the sample characteristics and the standardized/nonstandardized character of the SRep task in each of the studies included in the systematic review. Table 2 presents how frequently each formal characteristic of the SRep task that we focused on was reported.
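Frequencies of the kind reported in Table 2 can be computed by counting, for each feature, the records not coded "N/A". The sketch below assumes each extracted record is a dictionary keyed by feature name and applies the count to the first five rows of Table 1 (gender and SRep task length only). The resulting percentages describe those five rows alone and are not values from Table 2; the function name is hypothetical.

```python
NA = "N/A"  # code used when a feature was not explicitly reported


def reporting_frequency(records, feature):
    """Return (number of records reporting the feature, total records, percentage)."""
    reported = sum(1 for record in records if record.get(feature, NA) != NA)
    total = len(records)
    return reported, total, (100 * reported / total if total else 0.0)


# First five rows of Table 1, gender and SRep task length columns only.
rows = [
    {"report": "Andreou et al. (2023)", "gender": NA, "length": "25"},
    {"report": "Torregrossa, Eisenbeiß, & Bongartz (2023)", "gender": "16 girls", "length": "25"},
    {"report": "Castilla-Earls et al. (2023)", "gender": "45 girls, 55 boys", "length": NA},
    {"report": "Torregrossa, Caloi, et al. (2023)", "gender": NA, "length": "28"},
    {"report": "San (2023)", "gender": NA, "length": "36"},
]

for feature in ("gender", "length"):
    print(feature, reporting_frequency(rows, feature))
# gender (2, 5, 40.0)
# length (4, 5, 80.0)
```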

Table 1.

Summary of sample characteristics and SRep task standardization in the studies included in the systematic review.

Report	Languages	Sample size	Age	Gender	Typically developing (TD) vs developmental language disorder (DLD)/other	SRep task standardization	SRep task length (sentences)
Andreou et al. (2023)	L1 Greek, L2 Italian	N = 37	M = 9;9 years	N/A	N/A	Task made by the authors	25
Torregrossa, Eisenbeiß, & Bongartz (2023)	L1 Greek, L2 Italian	N = 33	M = 9;4 years	16 girls	N/A	Task made by the authors, based on “existing SRTs” (p. 691)	25
Castilla-Earls et al. (2023)	L1 Spanish, L2 English	N = 81 at T3	M = 70.07 months	45 girls, 55 boys	46 children receiving speech-language pathologist services	Standardized—the BESA (Peña et al., 2018) and the BESA-ME (Peña et al., 2020), the CELF sentence repetition subtest	N/A
Torregrossa, Caloi, et al. (2023)	L1 Italian, L2 German	N = 2	M = 10;03 years	N/A	TD	Task made by the authors, based on Torregrossa, Eisenbeiß, & Bongartz (2023) and Andreou et al. (2021)	28
San (2023)	L1 Arabic or Turkish, L2 German	N = 31 (21 children, 10 adults)	Children: M = 94.80-127.37 months	N/A	16 TD children, 5 children with DLD	Standardized—the SRep subtest from the TODIL battery (Topbas & Guven, 2017)	36
Marc Goodrich et al. (2023)	Spanish and English	N = 117 (86 kindergarteners, 31 first-graders)	M = 6.28 years	N/A	TD	Standardized—the BESA (Peña et al., 2014)	N/A
Fitton et al. (2023)	Spanish and English	N = 115 (84 kindergarteners, 31 first-graders)	M = 6.29 years	N/A	TD	Standardized—the BESA (Peña et al., 2014)	9; 10
Tribushinina & Mackaaij (2023)	L1 Dutch and a variety of L2s	N = 72 (49 monolingual children, 22 bilingual children)	M = 131.6-137.4 months	13 girls	DLD	Standardized—the Dutch LITMUS-SR (de Jong et al., 2021)	30
Komeili et al. (2023)	L1 Farsi, L2 English	N = 38	M = 103.81 months	N/A	N/A	Standardized—the English and Farsi LITMUS-SR-30 tests (Komeili et al., 2020; Marinis & Armon-Lotem, 2015)	30
Yang et al. (2023)	L1 Kam, L2 Mandarin Chinese	N = 55 (23 children had both parents move for work; 32 children had one parent move for work)	5-9 years	N/A	TD, as per a caretaker questionnaire.	Task adapted by the authors	57
Soesman et al. (2022)	L1 English, L2 Hebrew	N = 78	M = 5;11 years	41 girls	65 children were classified as TD, 13 children were classified as having DLD based on CELF-2 (Wiig et al., 2004) scores for English and the Goralnik Screening Test (Goralnik, 1995) for Hebrew.	Task made by the authors	36
Franck & Delage (2022)	Varying L1s, L2 French	N = 140 (20 monolingual children, 86 non-refugee bilingual children, 34 refugee bilingual children)	M = 5;10-7;2 years	N/A	N/A	Standardized—LITMUS-SR-French (Fleckstein et al., 2016)	30
Friesen et al. (2022)	L1 English, varying L2s. Late sequential bilinguals had varying L1s and L2 English	N = 92 (30 English monolinguals, 27 early simultaneous bilinguals, 29 late sequential bilinguals)	M = 10;6-10;9 years	48 girls across subsamples	N/A	Task adapted from Redmond (2005)	16
Sopata & Długosz (2022a)	L1 Polish, L2 German	N = 58	M = 9;3 years	N/A	TD	Task made by the authors	30
Grech (2022)	L1 Maltese, L2 English	N = 241	24-72 months	134 girls	N/A	Standardized—Language Assessment for Maltese Children (Grech et al., 2011)	10
Taha et al. (2022)	Italian as either L1 or L2	N = 74 (18 monolingual good readers, 19 monolingual poor readers, 21 bilingual good readers, 16 bilingual poor readers)	M = 10;3-10;6 years	32 girls across subgroups	TD	Standardized—shortened version of the Italian LITMUS SA (Marinis & Armon-Lotem, 2015).	30
Monsrud et al. (2022)	Varying L1, Norwegian L2	N = 560	6;0-12;11 years	299 girls	14 children with DLD	Task made by the authors	16
Miciak et al. (2022)	L1 Spanish, L2 English	N = 331	8-9 years	56% girls	N/A	Standardized—the WJ-III Sentence Recall (Woodcock et al., 2007)	N/A
Rose et al. (2022)	L1 English, L2 Hebrew	N = 131 (onset of bilingualism at 0-24 months, onset of bilingualism at 25-48 months)	M = 68-69 months	65 girls across subgroups	TD	Standardized—CELF Preschool-2 (Wiig et al., 2004) for English and the Goralnik Screening Test for Hebrew (Goralnik, 1995)	N/A
Wofford et al. (2022)	L1 Spanish, L2 English	N = 133	M = 6.34 years	N/A	TD	Standardized—the SRep task from the Bilingual English-Spanish Assessment (BESA, Peña et al., 2014)	N/A
De Cat & Melia (2022)	28 different L1s in the bilingual sample, L2 English	N = 174 (87 bilingual and 87 monolingual children)	M = 5;10-6;0 years	96 girls across subsamples	TD	Standardized—LITMUS SRep task (Marinis & Armon-Lotem, 2015; Marinis et al., 2010)	30
Sopata & Długosz (2022b)	L1 German, L2 Polish	N = 42 (10 German monolingual children, 10 simultaneous bilingual children living in Germany, 11 simultaneous bilingual children living in Poland, 10 sequential bilingual children living in Poland)	M = 9;05-9;11 years	18 girls across subsamples	TD	Task made by the authors	25
Wagley et al. (2022)	L1 Spanish, L2 English	N = 132	M = 8.75 years	52% girls	TD	Standardized—the SRep task from the Bilingual English-Spanish Assessment (BESA, Peña et al., 2014)	N/A
Prentza et al. (2022)	L1 Greek in the monolingual sample, L1 Albanian and L2 Greek in the bilingual sample	N = 110 (53 monolingual children, 57 bilingual children)	M = 9.87-9.9 years	47 girls across subsamples	TD	Standardized, “developed within the COST Action IS0804 [. . .] and used in previous studies” (p. 379)	32
Friesen et al. (2021)	L1 English in the monolingual sample, various L2s in the bilingual sample	N = 94 (31 monolingual children, 63 bilingual children)	M = 10.7-10.8 years	52 girls across subsamples	TD	Task adapted from Redmond (2005)	16
Sopata et al. (2021)	Mixed sample, German and Polish	N = 117 (28 simultaneous Polish-German bilingual children, 44 early sequential bilingual children, 45 monolingual children)	M = 7;11-12;1 years	N/A	N/A	Task made by the authors	N/A
Soesman & Walters (2021)	L1 English, L2 Hebrew	N = 65	M = 5;10 years	33 girls	TD	Task made by the authors	36 Hebrew; 36 English
Quirk (2021)	Mixed sample, English and French	N = 30	M = 7;1 years	19 girls	TD	Standardized—LITMUS SR (Marinis & Armon-Lotem, 2015)	30
Blom et al. (2021)	L1 Arabic, L2 English or Dutch	N = 103 (Syrian refugee children residing in Canada and the Netherlands)	M = 8;7-8;8 years	57 girls across subsamples	N/A	Standardized—LITMUS-SRT (Marinis & Armon-Lotem, 2015)	31 Syrian-Arabic; 31 English; 30 Dutch
Soto-Corominas et al. (2022)	L1 Syrian Arabic, L2 English	N = 119	M = 9.37 years	57 girls	TD	English: Standardized—LITMUS SRT for English (Marinis & Armon-Lotem, 2015). Syrian Arabic: Task made by the authors, based on the standardized English SRep	N/A
Stadtmiller et al. (2022)	L1 Russian, L2 German	N = 53	M = 5;3 years	27 girls	TD	German: Made by the authors (Lindner et al., 2013). Russian: Standardized—Russian LITMUS-SRep (Marinis & Armon-Lotem, 2015)	30
Razak et al. (2021)	L1 Mandarin, L2 English, L3 Malay	N = 10	M = 63.50 months	N/A	TD	Task made by the authors	24
Gatlin-Nash et al. (2021)	Mixed sample, English and Spanish	N = 81	M = 5;5-5;11 years	39 girls across subsamples	28 children with DLD	Standardized—the BESA (Peña et al., 2018)	9
Makrodimitris & Schulz (2021)	L1 Greek, L2 German	N = 27	M = 9;1 years	N/A	TD	Standardized—Greek LITMUS SRep task (Chondrogianni et al., 2013)	32
Andreou et al. (2021)	L1 Albanian, L2 Greek	N = 70 (Albanian-Greek monoliterate bilingual children, Greek monolingual children)	M = 9.66-9.68 years	N/A	TD	Standardized—Greek LITMUS SRep task (Chondrogianni et al., 2013)	32 Greek; 60 Albanian
Hamim & Hamid (2021)	L1 Malay, L2 English	N = 60	range = 4;0-6;11 years	N/A	TD	Malaysian English: Task was adapted from the Multilingual Sentence Imitation Task (Multi-SIT, Marinis et al., 2012). Malay: Task was adapted from Abu Bakar (2017)	24
Chilla et al. (2021)	L1 Arabic, Portuguese, or Turkish, L2 French or German	N = 198 (10 TD German monolingual children; 48 TD bilingual children, L2 German; 8 bilingual children with SLI, L2 German; 37 TD French monolingual children; 69 TD bilingual children, L2 French; 26 bilingual children with SLI, L2 French)	M = 75-84.3 months	86 girls across subsamples	34 bilingual children with SLI across subsamples	Standardized—LITMUS SR German (Hamann & Abed Ibrahim, 2017), LITMUS SR French (Tuller et al., 2018)	N/A
Paradis et al. (2021)	L1 Syrian Arabic, L2 English	N = 102	M = 9.5 years	49 girls	TD	English: Standardized—LITMUS SRT (Marinis & Armon-Lotem, 2015). Syrian Arabic: Task made by the authors “using the English SRT as a model”	32
Pretorius et al. (2022)	English and Afrikaans	N = 63 (English-speaking children who attended school in English and Afrikaans-speaking children who attended school in Afrikaans)	8-9 years	Male-to-female ratio of 1:1.3-1:1.9 across subsamples	TD	English: Standardized—a subtest of the Test of Auditory Processing Skills-Third Edition (TAPS-3, Martin & Brownell, 2005). Afrikaans: Task made by the authors, based on the TAPS-3	N/A
Pratt et al. (2021)	English and Spanish, not differentiated	N = 136 (TD children and children with DLD)	M = 100.19-103.45 months	69 girls across subsamples	36 children with DLD	Standardized—SRep task from the Bilingual English Spanish Assessment (BESA, Peña et al., 2018)	21 English; 21 Spanish
Öwerdieck et al. (2021)	L1 Arabic, L2 German	N = 20 (TD monolingual children and TD Arabic-German bilingual children)	M = 9;4-10;5 years	N/A	TD	Standardized—German LITMUS-SR task (Hamann et al., 2013)	9
Kaltsa et al. (2020)	L1 Albanian, L2 Greek	N = 60 (monolingual children, simultaneous bilingual children and sequential bilingual children)	M = 8.6-8.9 years	34 girls across subsamples	TD	Standardized—developed within the SR COST Action (Marinis & Armon-Lotem, 2015)	32
Jordaan & Ngwanduli (2020)	L1 Sepedi, L2 English	N = 102 (children attending an English-speaking school, children attending a Sepedi-speaking school with English, and children attending an English-speaking school with Sepedi)	M = 8;7-8;9 years	59 girls across subsamples	TD	English: Standardized—the Recalling Sentences subtest of the CELF-4 (Semel et al., 2003). Sepedi: Task adapted by the authors from the CELF-4	32
Quirk (2020)	L1 French, L2 English	N = 30	M = 7;1 years	19 girls	TD	Standardized—short version of the English LITMUS SR task (Marinis & Armon-Lotem, 2015).	30
Andreou, Tsimpli, et al. (2020)	Greek and Albanian, both counted as L1s (2L1)	N = 56 (monolingual children with ASD and bilingual children with ASD)	10.4 years	18 girls across subsamples	TD in terms of language development	Standardized—“developed within the COST Action following the guidelines outlined in Marinis & Armon-Lotem (2015, p. 8)”	32
Bogliotti et al. (2020)	French and French Sign Language	N = 62	M = 8;8 years	38 girls	TD	Task made by the authors	20
Peeters-Podgaevskaja et al. (2020)	L1 Russian or Polish, L2 Dutch	N = 157 (monolingual Russian children, monolingual Polish children, bilingual Russian-Dutch children, bilingual Polish-Dutch children)	M = 4;9-5;4 years	N/A	TD	Russian: Standardized, based on the LITMUS SR task (Marinis & Armon-Lotem, 2015). Polish: Task made by the authors, translated from the Russian task.	48
Bedore et al. (2020)	L1 Spanish, L2 English	N = 21 (intervention group and control group)	M = 6;11 years	6 girls	At risk for language/literacy impairment	Standardized—SRep task from the Bilingual English Spanish Oral Screener (BESOS, Peña et al., 2008)	N/A
Scheidnes (2020)	L1 English, L2 French	N = 33	M = 6;10 years	N/A	TD	Standardized—LITMUS SR French (Prevost et al., 2012)	30
Zebib et al. (2020)	Varying L1s, L2 French	N = 76 (TD children and children with DLD)	M = 84.6-86.6 months	N/A	23 children with DLD	Standardized—LITMUS SR French (Fleckstein et al., 2016)	30
Meir & Novogrodsky (2020)	L1 Russian, L2 Hebrew	N = 86 (14 monolingual children with high-functioning autism, 14 bilingual children with high-functioning autism, 28 TD monolingual children, and 30 TD bilingual children)	M = 80-83 months	37 girls across subsamples	12 monolingual children with high-functioning autism achieved results implying DLD. 4 bilingual children with high-functioning autism achieved results implying DLD	Standardized—shortened versions of the Hebrew and Russian LITMUS SR tasks (Marinis & Armon-Lotem, 2015)	30 Hebrew; 30 Russian
Antonijevic-Elliott et al. (2020)	English, with multilingual children additionally speaking one of 16 various languages	N = 88 (41 monolingual children and 47 multilingual children)	M = 79-82 months	N/A	N/A	Task “designed within the COST Action IS0804” (p. 8). English: task adapted from the School-Age Sentence Imitation Test-E32 (Marinis et al., 2011). Polish: task adapted from Banasik et al. (2012). Russian: task adapted from Meir et al. (2016)	24
Andreou, Dosi, et al. (2020)	L1 Albanian, L2 Greek	N = 70 (31 Albanian-Greek heritage bilingual children without L1 literacy support, 10 Albanian-Greek heritage bilingual children with L1 literacy support, and 29 Albanian-Greek bilingual children with both L1 and L2 literacy support)	M = 10.5-10.6 years	N/A	TD	Standardized—“designed (Chondrogianni et al., 2013; by COST Action IS0804)” (p. 183)	32 Greek; 60 Albanian
Talli & Stavrakaki (2020)	L1 Russian or Albanian, L2 Greek	N = 70 (16 monolingual children with DLD, 16 bilingual Albanian/Russian-Greek children with DLD, 20 monolingual children without DLD, and 18 Albanian/Russian-Greek bilingual children without DLD)	M = 8;4-8;11 years	20 girls across subsamples	32 children with DLD across subsamples	Standardized—SRep task from the Diagnostic Test of Verbal Intelligence (DVIQ; Stavrakaki & Tsimpli, 2000)	15
Simon-Cereijido & Méndez (2020)	L1 Spanish, L2 English	N = 135 (TD children and children with SLI)	M = 4;3-4;5 years	65 girls across subsamples	74 children with SLI	Task made by the authors	21 English; 21 Spanish
Hamann et al. (2020)	L1 Arabic, L2 German	N = 27 (12 heritage simultaneous/early sequential and 15 refugee sequential bilingual children)	Range = 6;0-12;9 years	7 sequential bilingual girls	TD	Standardized—German and Syrian Arabic LITMUS-SRT (Hamann et al., 2013)	45 German; 36 Arabic
De Cat (2020)	Varying L1s, L2 English	N = 174 (87 monolingual children, 87 bilingual children)	M = 5;10-6;0 years	96 girls across subsamples	TD	Standardized—short version of the LITMUS SR task (Marinis & Armon-Lotem, 2015; Marinis et al., 2010)	30
Abed Ibrahim et al. (2020)	L1 Arabic, L2 German	N = 22 (11 L1 Arabic children living in Germany and 11 L1 Arabic Syrian refugee children)	M = 88.4-114 months	12 girls across subsamples	TD	Standardized—the German LITMUS SRep (Hamann et al., 2013), the Lebanese Arabic LITMUS SRep (Henry et al., n.d.)	45
Hopp (2019)	L1 Turkish and L2 German for the bilingual English learner subsample	N = 62 (31 Turkish-German bilingual children learning English and 31 monolingual German children learning English)	M = 10.37-10.54 years	25 girls across subsamples	TD	Task made by the author	60
Thordardottir & Rioux (2019)	Varying L1, L2 French	N = 15 (3 monolingual children and 12 bilingual children)	M = 57.7 months	3 girls	Children with DLD	Standardized—SRep task adapted from the CELF Preschool (Royle & Thordardottir, 2003)	N/A
Aguilar-Mediavilla et al. (2019)	L1 Catalan, L2 Spanish	N = 28 (children with DLD and typically developing children)	M = 5.37-5.84 years	12 girls across subsamples	14 children with DLD	Standardized—the SRep task from the Spanish adaptation of the Developmental Neuropsychological Assessment (NEPSY; Korkman et al., 1998)	17
Fitton et al. (2019)	L1 Spanish, L2 English	N = 291	M = 66 months	142 girls	TD	Standardized—the SRep task from the BESA (Peña et al., 2018)	10 Spanish; 9 English
Wood & Hoge (2019)	L1 Spanish, L2 English	N = 25	Kindergarten and first grade according to the US mandatory education system	16 girls	TD	Standardized—SRep task from the Bilingual English and Spanish Assessment (BESA, Peña et al., 2014)	N/A
Méndez & Simon-Cereijido (2019)	L1 Spanish, L2 English	N = 74	M = 4;3 years	34 girls	Diagnosed with SLI	Task from Simon-Cereijido & Méndez (2018; see also Simon-Cereijido, 2009)	21
Janssen & Meir (2019)	L1 Russian, L2 Dutch or Hebrew	N = 72 (Russian-Dutch bilingual children, Russian-Hebrew bilingual children, age-matched Russian monolingual children, and younger Russian children)	M = 50.11-67.06 months	N/A	TD	Task made by the authors: the Ru-SRT-Young (Janssen, 2016), based on the Russian LITMUS SRep task (Meir & Armon-Lotem, 2015)	14
Abed Ibrahim & Fekete (2019)	L1 German, L2 Arabic, European Portuguese or Turkish	N = 77 (10 TD monolingual, 11 monolingual SLI, 10 TD bilingual German-Arabic, 18 TD bilingual German-European Portuguese, 16 TD bilingual German-Turkish, and 12 bilingual SLI)	5;6–9;0 years	N/A	23 children with SLI	Standardized—German LITMUS-SRT (Hamann et al., 2013)	45
Chondrogianni & Kwon (2019)	L1 Welsh, L2 English	N = 52 (25 Welsh-English 7- to 9-year-old L2-TD children, 17 4- to 6-year-old L2-TD children, and 10 4- to 6-year-old children with SLI)	M = 61.3-93.72 months	N/A	10 children with SLI	Task made by the authors	42
Gagarina et al. (2019)	L1 Russian or Turkish, L2 German	N = 25 (TD children and children at risk for DLD)	M = 53.2-53.4 months	N/A	9 children at risk for DLD	Standardized—German LITMUS-SR (Hamann et al., 2013)	30
Limacher (2018)	Varying L1s, L2 English	N = 34 (17 monolingual children and 17 bilingual children)	M = 55.71-59.06 months	N/A	TD	Task adapted from Hack et al. (2012)	3
Chondrogianni & John (2018)	L1 Welsh, L2 English	N = 28 (TD children and children at risk of SLI)	M = 63-66.5 months	11 girls across subsamples	10 children at risk of SLI	English: Standardized—the Recalling Sentences subtest from the Clinical Evaluation of Language Fundamentals-Preschool 2 (CELF-Preschool 2, Semel et al., 2004). Welsh: Task made by the authors	N/A
Tuller et al. (2018)	Varying L1s, L2 French or German	N = 222 (48 and 47 bilingual French children receiving/not receiving speech-language therapy, 16 and 40 bilingual German children receiving/not receiving speech-language therapy, 37 TD monolingual French children, 17 monolingual French children with SLI, 10 TD monolingual German children, 12 monolingual German children with SLI)	M = 75.9-92.2 months	102 girls across subsamples	63 children with SLI	Standardized—the French and German LITMUS-SR tasks (Fleckstein et al., 2016; Hamann et al., 1998; Jakubowicz & Tuller, 2008; Marinis & Armon-Lotem, 2015)	N/A
Altman et al. (2018)	L1 Russian, L2 Hebrew	N = 68 (21 L1 dominant bilingual children, 15 L2 dominant bilingual children, and 32 monolingual L2 children)	M = 67-69.66 months	N/A	TD	Standardized—SRep task from the Goralnik Screening Test for Hebrew (Goralnik, 1995)	N/A
Fleckstein et al. (2016)	L1 Arabic or English, L2 French	N = 50 (35 TD children, 12 children with SLI)	M = 6;04-6;11 years	N/A	12 children with SLI across subgroups	Standardized—French LITMUS-SR-FR (Armon-Lotem & Meir, 2016)	30
Taliancich-Klinger et al. (2018)	L1 Spanish, L2 English	N = 148	M = 101.69 months	51% girls	TD	Task made by the authors—The BESA-ME, based on the Bilingual English Spanish Assessment (BESA, Peña et al., 2014)	N/A
Meir (2018)	L1 Russian, L2 Hebrew	N = 119 (38 balanced bilingual children, 39 Hebrew-dominant bilingual children, 19 Russian-dominant bilingual children, and 23 bilingual children with SLI)	M = 71-72 months	N/A	23 children with SLI	Standardized, Russian LITMUS-SR (Meir & Armon-Lotem, 2015) and Hebrew LITMUS-SR (Meir et al., 2016)	56 Russian; 56 Hebrew
Simon-Cereijido & Méndez (2018)	L1 Spanish, L2 English	N = 61	M = 4;5 years	31 girls	TD	Task made by the authors in parallel Spanish and English versions, based on Simon-Cereijido (2009)	N/A
Abed Ibrahim et al. (2018)	Varying L1s, L2 German	N = 78 (10 TD monolingual children, 12 monolingual children with SLI, 45 TD bilingual children, and 11 bilingual children with SLI)	M = 75.90-85.6 months	N/A	23 children with SLI	Standardized—the German LITMUS SRT (Hamann et al., 2013)	45
Paradis & Jia (2017)	L1 Cantonese or Mandarin Chinese, L2 English	N = 21	M = 101 months at T1	N/A	TD	Standardized—the Recalling Sentences subtest from the CELF-4 (Semel et al., 2003)	N/A
Antonijevic et al. (2017)	L1 English, L2 Irish	N = 28	M = 85.19 months	16 girls	TD	Task made by the authors, based on the LITMUS SRep model (Marinis & Armon-Lotem, 2015) and the School-Age Sentence Imitation Test-E32 (Marinis et al., 2011). The SASIT-E32 was also used as the English SRep task in the study.	24 English; 24 Irish
Bosman & Janssen (2017)	L1 Turkish, L2 Dutch	N = 86 (38 Turkish-Dutch bilingual children and 48 monolingual Dutch children)	M = 7;2-7;4 years	37 girls across subsamples	TD	Standardized—the SRep task from the Diagnostic Test of Bilingualism by the National Institute of Educational Testing (CITO; Verhoeven et al., 1995)	20
Simon-Cereijido (2017)	L1 Spanish, L2 English	N = 80 (40 Latino preschoolers with primary language impairment and 40 TD preschoolers)	M = 50.3-51.3 months	N/A	40 children with primary language impairment	Task made by the author	21
Haman et al. (2017)	L1 Polish, L2 English	N = 233 (88 bilingual and 145 monolingual children)	M = 5.69 years	N/A	TD	Standardized—the Polish adaptation of the LITMUS-SRep task (Banasik et al., 2012)	68
Whiteside & Norbury (2017)	Varying L1s (mostly Polish, Bengali, Urdu), L2 English (English only in the monolingual group)	N = 86 (43 bilinguals and 46 monolinguals)	63-104 months	40 girls across subsamples	37 children with low language proficiency	Standardized—the School-Age Sentence Imitation Test-E32 (SASIT-E32; Marinis et al., 2011)	32
Fortier & Simard (2017)	L1 French, varying L2s (mostly Portuguese, English, Arabic)	N = 83	M = 10.5 years	40 girls	N/A	Task made by the authors, “repetition of ungrammatical sentences”	40
Meir (2017)	Russian and Hebrew (both L1 and L2 in different subgroups)	N = 190 (90 TD Russian-Hebrew children, 18 bilingual Russian-Hebrew children with SLI, 35 TD Hebrew monolingual children, 13 Hebrew monolingual children with SLI, 20 TD monolingual Russian children, and 14 monolingual Russian children with SLI)	M = 69.64-72.75 months	N/A	45 children with SLI	Standardized—The LITMUS-SRep tasks in Russian (Meir & Armon-Lotem, 2015) and in Hebrew (Meir et al., 2016)	56 Russian; 56 Hebrew
Meir & Armon-Lotem (2017b)	Russian and Hebrew	N = 120 (44 low SES bilingual children, 44 mid-high SES bilingual children, 16 low SES monolingual children, and 16 mid-high SES monolingual children)	M = 72-73 months	N/A	TD	Standardized—The Hebrew LITMUS-SRep task (Meir et al., 2016)	56
Courtney et al. (2017)	L1 English, L2 French	N = 165 at T5	9-10 years at the beginning of the study	N/A	TD	Task made by the authors	18
Graham et al. (2017)	L1 English, L2 French	N = 164 at T3 (children in schools emphasizing oracy and children in schools emphasizing literacy)	9-10 years	N/A	TD	Task made by the authors, based on Marinis & Armon-Lotem (2015).	18
Meir et al. (2017)	L1 Russian, L2 Hebrew	N = 150 (47 bilingual children with L2 age of acquisition before 24 months, 20 bilingual children with age of acquisition between 24 and 48 months, 20 bilingual children with age of acquisition after 48 months, 20 monolingual Hebrew children, and 20 monolingual Russian children)	M = 71.55-72.75 months	N/A	TD	Standardized—LITMUS tasks (Marinis & Armon-Lotem, 2015)	56 Russian; 56 Hebrew
Hamann & Abed Ibrahim (2017)	L1 Arabic, Portuguese, or Turkish, L2 German	N = 54 (10 TD monolingual German children, 12 monolingual German children with SLI, 46 TD bilingual children with L1 Arabic, Portuguese, or Turkish, and 8 bilingual children with SLI).	M = 75.90-88.68 months	41 girls across subsamples	20 children with SLI, 38 bilingual children were attending speech-language therapy	Standardized—the German LITMUS SRep (Marinis & Armon-Lotem, 2015)	45
De Almeida et al. (2017)	L1 Arabic, Portuguese, Turkish, L2 French	N = 136 (61 TD bilingual children, 21 bilingual children with SLI, 37 TD monolingual children, and 17 monolingual children with SLI)	M = 6;10-7;10 years	N/A	37 of the bilingual children and 17 of the monolingual children were receiving speech-language therapy	Standardized—the French LITMUS SRep task	30
Gavarró (2017)	Monolingual Catalan children with various levels of L2 exposure (uncontrolled)	N = 35 (14 6-year-olds, 16 7-year-olds, and 5 children with SLI)	M = 6;11-10;7 years	20 girls	TD	Task made by the author, modeled on Marinis et al. (2011) and the LITMUS SR tasks.	60
Abed Ibrahim & Hamann (2017)	L1 Arabic or Turkish, L2 German	N = 54 (11 TD monolingual children, 12 monolingual children with SLI, 22 TD bilingual children, and 9 bilingual children with SLI)	M = 6;4-7;1 years	N/A	21 children with SLI	Task made by the authors, “designed according to the LITMUS principles (Marinis & Armon-Lotem, 2015, p. 4)”	45
Meir & Armon-Lotem (2017a)	L1 Russian, L2 Hebrew	N = 81 (21 bilingual children with SLI, 39 Hebrew-dominant bilingual children, and 19 Russian-dominant bilingual children)	M = 71-72 months	N/A	23 children with SLI	Standardized—Russian LITMUS SRep (Meir & Armon-Lotem, 2015); Hebrew LITMUS SRep (Meir et al., 2016)	56 Russian; 56 Hebrew
Hamann et al. (2017), Study 1	L1 German, L2 Russian	N = 27 (7 older TD monolinguals and 20 older TD bilinguals)	8;0-10;5 years	N/A	TD	Study 1: Task made by the authors within the framework of COST Action IS0804	45
Study 2	L1 German, L2 Russian	N = 31 (8 younger TD monolinguals, 15 younger TD bilinguals, and 8 monolingual children with SLI)	5;6-7;8 years	N/A	8 children with SLI	Study 2: Same as Study 1	45
Study 3	Various L1s (Arabic, Portuguese, or Turkish), L2 German	N = 38 (15 TD bilingual children, 7 bilingual children with SLI, 8 TD monolingual children, and 8 monolingual children with SLI)	5;6-8;11 years	N/A	15 children with SLI	Standardized—a “short version” of the LITMUS SRep task	45
Armon-Lotem & Meir (2016)	Hebrew and Russian	N = 230 (117 TD bilingual children, 27 bilingual children with SLI, 38 TD Hebrew monolingual children, 14 Hebrew monolingual children with SLI, 20 TD Russian monolingual children, and 14 Russian monolingual children with SLI)	M = 70-73 months	N/A	55 children with SLI	Standardized—Russian LITMUS-SRep (Meir et al., 2016), Hebrew LITMUS-SRep (Meir et al., 2016)	56 Russian; 56 Hebrew
Buil-Legaz et al. (2016)	Spanish and Catalan	N = 37 (18 TD children and 19 children with SLI)	5.7-5.8 years during first assessment	17 girls across subsamples	19 children with SLI	Standardized—SRep from the Spanish adaptation of the Developmental Neuropsychological Assessment: NEPSY (Korkman et al., 1998)	17
Meir et al. (2016)	L1 Russian, L2 Hebrew	N = 85 (30 TD bilingual children, 15 bilingual children with SLI, 20 monolingual Russian children, and 20 monolingual Hebrew children)	M = 72.2-73.7 months	36 girls across subsamples	15 bilingual children with SLI	Task made by the authors, based on the LITMUS SRep task (Marinis & Armon-Lotem, 2015)	56 Russian; 56 Hebrew
D. Liu et al. (2016)	L1 Cantonese, L2 English	N = 199	4-5 years	N/A	TD	Task made by the authors, based on P. D. Liu et al. (2010)	30
Gangware (2016)	L1 Cantonese, L2 English	N = 50	M = 52.5 months	27 girls	TD	Task adapted from Devescovi & Caselli (2007)	26
Lein et al. (2016)	Varying L1s, L2 German	N = 30 (5 TD monolingual children, 5 monolingual children with SLI, 15 TD bilingual children, 5 bilingual children with SLI)	M = 71.8-95 months	N/A	10 children with SLI	Standardized—the LITMUS German SRep task	45
Glennen (2015)	Russian, Hungarian, or Romanian L1, English L2	N = 44	M = 5;5 years	24 girls	TD	Standardized—the Recalling Sentences subtest from the CELF-P2 (at age 5; Semel et al., 2004) and the CELF-4 (at ages 6-7; Semel et al., 2003)	N/A
Marinis & Armon-Lotem (2015)	L1 Polish or Russian, L2 English or Hebrew	N = 112 (60 TD monolingual children, 20 TD Polish-English bilingual children, 20 English monolingual children with SLI, and 12 TD Russian-Hebrew bilingual children)	6-8 years old	N/A	30 children with SLI	Standardized—the LITMUS English, Russian, and Hebrew SRep tasks	56 Russian; 56 Hebrew; 56 French
Babayigit (2015)	Varying L1s, L2 English	N = 183 (102 monolingual children, 82 bilingual children)	M = 115.38-115.46 months	93 girls across subsamples	N/A	Standardized—the Recalling Sentences subtest from the CELF-4 UK (Semel et al., 2006)	N/A
Tuller et al. (2015)	L1 Arabic, L2 French	N = 54 (19 TD bilingual children, 14 TD monolingual children, 10 bilingual children with SLI, and 11 monolingual children with SLI)	M = 6;0-7;7 years	22 girls across subsamples	21 children with SLI	Standardized—the LITMUS SR-French (Marinis & Armon-Lotem, 2015; Prevost et al., 2012)	N/A
Ebert (2014)	L1 Spanish, L2 English	N = 47	M = 8;6 years	N/A	Children with language impairment	Standardized—the Recalling Sentences subtest from the CELF-4E (Semel et al., 2003) and the CELF-4S (Wiig et al., 2004)	N/A
Du (2014)	L1 Mandarin Chinese, L2 English	N = 45 (younger and older children)	M = 63-81.8 months	24 girls across subsamples	2 boys with SLI	Task made by the author	3
Woon et al. (2014)	L1 Mandarin, L2 English	N = 5	3.7-6.3 years	3 girls	N/A	Task made by the authors	20
Erdos et al. (2014)	L1 English, L2 French	N = 86	M = 5;6 years	50 girls	TD, with some identified at risk based on the study results	Standardized—the Recalling Sentences subtest from the CELF-4 (Semel et al., 2003)	N/A
Simard et al. (2014)	L1 Portuguese, L2 French	N = 73 (37 monolingual and 36 bilingual children)	M = 10.3-10.8 years	35 girls	TD	Task made by the authors—a French SRep task consisting of ungrammatical sentences	40
Altman et al. (2014)	L1 Russian, L2 Hebrew	N = 65 (16 children with strict pro-monolingual family language policy, 33 children with mild pro-monolingual family language policy, and 16 children with pro-bilingual family language policy)	M = 6;0 years	N/A	TD	Task made by the authors	20
Aguilar-Mediavilla et al. (2014)	Spanish and Catalan, mixed dominance within the sample	N = 32 (children with and without SLI)	M = 6.2 years	12 girls across subsamples	17 children with SLI	Standardized—the sentence repetition task from the Spanish adaptation of the NEPSY (Aguilar-Alonso & Moreno-Gonzalez, 2012)	N/A
Armon-Lotem (2014)	L1 Russian or English, L2 Hebrew	N = 43 (25 TD Russian-Hebrew children, 7 TD English-Hebrew children, and 7 monolingual Hebrew children with SLI)	M = 70-74 months	24 girls across subsamples	7 children with SLI	Task made by the authors	24
Thordardottir & Brandeker (2013), Study 1	L1 English, L2 French	N = 84 (16 monolingual English children, 19 monolingual French children, and 49 English-French bilingual children)	M = 56.4-60.5 months	N/A	TD	Standardized—an adapted Recalling Sentences in Context subtest of the CELF-Preschool (Royle & Thordardottir, 2003) for French and the Recalling Sentences in Context subtest from the CELF Preschool-2 (Wiig et al., 2004) for English	N/A
Study 2	L1 English, L2 French	N = 56 (bilingual children with primary language impairment, TD bilingual children, monolingual children with primary language impairment, TD monolingual children)	M = 57.4-61.5 months	N/A	14 bilingual children with primary language impairment, 14 monolingual children with primary language impairment
Komeili & Marshall (2013)	L1 Farsi, L2 English	N = 36 (18 monolingual English children and 18 bilingual children)	M = 8;2-8;8 years	20 girls across subsamples	TD	Standardized—the SASIT-E32 (Marinis et al., 2011)	32
Ziethe et al. (2013)	Varying L1s, L2 German	N = 73 (19 monolingual German children with SLI, 25 TD monolingual German children, 15 bilingual children with suspected SLI, and 14 TD bilingual children)	5-8 years	N/A	34 children with diagnosed or suspected SLI	Standardized—the SRep task from the Language Development Test for Children (Grimm, 2001)	15
Gillam et al. (2013)	L1 Spanish, L2 English	N = 167	M = 63.2 months	83 girls	21 children with SLI	Standardized—the Sentence Imitation subtest from the Test of Language Development—Primary; 3rd Edition (TOLD-P:3; Newcomer & Hammill, 1997)	N/A
Thordardottir & Juliusdottir (2013)	Varying L1s, L2 Icelandic	N = 39 at T1	M = 140.9 months at T1	N/A	TD	Standardized—the sentence imitation subtest of the Icelandic Test of Language Development-Primary and Intermediate (Simonardottir & Guðmundsson, 1996)	N/A
Chiat et al. (2013), Study 1	L1 Russian or English, L2 Hebrew	N = 110 (75 Russian-Hebrew bilingual children and 35 English-Hebrew bilingual children)	M = 5;9-5;10 years	60 girls across subsamples	TD	Study 1: Standardized—the SRep subtest from the Goralnik Screening Test for Hebrew (Goralnik, 1995)	5
Study 2	L1 Russian or English, L2 German	N = 61	M = 5;6 years	30 girls	TD	Study 2: Standardized—the SRep task from the Sprachscreening für das Vorschulalter (Grimm, 2003)	15
Study 3	L1 Turkish, L2 English	N = 32 (15 monolingual children and 17 bilingual children)	M = 7;10 years	18 girls across subsamples	TD	Standardized—the SRep task from the Clinical Evaluation of Language Fundamentals III (CELF-3; Semel et al., 2000)	28
Petersen & Gillam (2013)	L1 Spanish, L2 English	N = 63 at the start	M = 65.3 months	29 girls	At risk for language impairment, as per the BESOS screener (Peña et al., 2008)	Task made by the authors	N/A
Pérez-Leroux et al. (2011)	L1 Spanish, L2 English	N = 20 (simultaneous and sequential bilingual children)	M = 67.9-75.6 months	N/A	TD	Task made by the authors	N/A
Hough & Kaczmarek (2011)	Unspecified L1, L2 English	N = 44	M = 98.3 months	19 girls	TD	Standardized—the Sentence Imitation subtest from the Test of Language Development Third Edition, Primary and Intermediate (TOLD-P; Newcomer & Hammill, 1997)	N/A
Korytkowski Longo (2011)	L1 Spanish, L2 English	N = 23	M = 6;8 years	N/A	Children suspected of language/reading disability based on the BESOS screener and the Woodcock Language Proficiency Battery-Revised	Task made by the authors, based on target structures identified in the BESA, the Austin Independent School District first grade list, the Harris-Jacobson Core List, and the Corpus of Contemporary American English.	5-7
Hirata-Edds (2011)	L1 English, L2 Cherokee	N = 23 (10 L2 immersion program participants and 13 control group children)	M = 5;0-5;2 years	18 girls across subsamples	TD	Task made by the author	29
Simon-Cereijido (2009), Study 1	Varying L1, mostly Spanish, L2 English	N = 60 (20 children with SLI, 20 age-matched controls, and 20 vocabulary-matched controls)	M = 3;6-4;3 years	27 girls across subsamples	20 children with SLI	Study 1: Task made by the author	21
Study 2	Varying L1, mostly Spanish, L2 English	N = 40 (20 children with SLI and 20 age-matched controls)	M = 4;4-4;5 years	21 girls across subsamples	20 children with SLI	Study 2: Identical to Study 1
Westman et al. (2008)	Mixed sample, Swedish and Finnish as spoken languages	N = 81 (33 children at risk of language impairment and 48 TD children)	M = 6;4-6;5 years	42 girls across subsamples	33 children at risk of language impairment	Standardized—the Sentence Repetition subtest from the NEPSY (Korkman et al., 1998, 2000)	N/A
Trofimovich & Baker (2007)	Korean and English	N = 40 (10 native Korean-English bilingual children with short English experience, 10 age-matched monolingual English children, 10 native Korean-English bilingual adults with long English experience, and 10 monolingual English adults)	M = 10.5-10.7 years	N/A	TD	Task made by the authors	6
Gutiérrez-Clellen et al. (2006)	L1 Spanish (varying dialects), L2 English	N = 160 (39 monolingual and 141 bilingual children)	4;0-7;0 years	N/A	80 children with language impairment	Task made by the authors—the S-MST. The S-MST includes cloze items and SRep items together	51
Manis et al. (2004)	L1 Spanish, L2 English	N = 303	M = 67.8 months	52.5% girls	TD	Standardized—the Memory for Sentences subtest of the Woodcock Language Proficiency Battery (WLPB; Woodcock & Munoz-Sandoval, 1995)	N/A
Verhoeven (1994)	L1 Turkish, L2 Dutch	N = 98 (74 children enrolled in an L2 submersion program, 25 children enrolled in an L1/L2 transition program)	M = 6.7 years	48 girls	TD	Task made by the authors	24 Turkish, 24 Dutch
Verhoeven & Boeschoten (1986)	L1 Turkish, L2 Dutch	N = 32 (16 monolingual children living in Turkey and 16 bilingual children living in the Netherlands)	M = 5;4-7;4 years	15 girls across subsamples	TD	Task made by the authors	N/A
Snow & Hoefnagel-Höhle (1978)	L1 English, L2 Dutch	N = 94 (10 children aged 3-5, 8 children aged 6-7, 13 children aged 8-10, 9 children aged 12-15, and 11 adults with beginner English; 6 children aged 6-7, 5 children aged 8-10, 8 children aged 12-15, and 10 adults with advanced English)	N/A	N/A	TD	Task made by the authors	37
Teitelbaum (1977)	English and Spanish, undifferentiated	N = 99	Children from kindergarten through grade four, i.e., up to 10 years old	N/A	TD	Task adapted from Natalicio & Williams (1971)	15
Hamayan et al. (1977)	L1 Arabic, L2 English	N = 60 (20 third-grade children, 20 sixth-grade children, and 20 university students)	M = 8;8-11;9 years	N/A	TD	Task adapted from Smith (1973)	28
Clay (1971)	L1 English, L2 Samoan	N = 320 (“Professional group. Children with an optimum model of English language,” children with parents speaking regular English, monolingual English children of Maori parents, bilingual children of Samoan parents)	5;0-7;3 years	N/A	N/A	Task made by the author	36

Note. For ease of presentation and brevity, the age means and/or ranges for individual subsamples in each study have been collated into ranges where applicable. Similarly, the number/proportion of girls in each subsample has been summed into a single value where applicable. Full and detailed sample characteristics for each study are available in the Supplementary Materials. Studies included in the systematic review are marked with an asterisk in the end reference list.

Table 2.

Frequencies of reporting of SRep task characteristics in the studies included in the systematic review.

SRep task feature	Number of studies
Task language (only n > 4)
English	62
Spanish	27
German	20
Hebrew	16
French	16
Russian	13
Greek	8
Turkish	6
Task standardization
Standardized	110
Described as adapted	14
Nonstandardized/made by the authors	50
Task length
Specified	104
Not specified	34
Practice items
Mentioned	45
Not mentioned/not reported	93
Task computerization
Computerized	52
In person	22
Not specified	64
Sentence format
Prerecorded	60
Spoken aloud	30
Not specified	48
Visual stimuli
Any present	27
PowerPoint	34
Reported as not present	14
Not specified	97
Feedback provision
Reported as present	18
Not specified	120
Task scoring scheme
Binary scoring based on repetitions	44
Binary scoring based on target structures	31
Multipoint scores	20
Percentage calculations	16
Instance counting	20
Other	19

Publication years

Figure 2 presents the distribution of publication years of the studies identified in our systematic review. The earliest publication identified was Clay (1971), where the SRep task was used as a measure of language acquisition in the school context. Only 11 publications were identified in the 1971–2009 period. The publication by Marinis and Armon-Lotem (2015) on the SRep tasks created within the COST Action IS0804 can be considered an influential landmark that potentially facilitated subsequent studies; it had been cited 210 times according to Google Scholar at the time of writing. However, determining the specific impact of the COST Action IS0804 and Marinis and Armon-Lotem (2015) on the design and reporting of SRep tasks in studies with bilingual children was beyond the scope of the current systematic review. After 2015, the number of publications increased overall, though not consistently year over year. Nevertheless, these numbers represent a relatively narrow area of application of SRep tasks. For example, the study by Conti-Ramsden et al. (2001), which is also considered a foundational publication on the SRep task (Marinis & Armon-Lotem, 2015; Rujas et al., 2021), concerned only monolingual children, a strand of research that was excluded from our systematic review. SRep studies with bilingual children involve additional methodological considerations, possibly the most crucial being the frequent need for SRep tasks in at least two languages. We turn to these considerations now.

Figure 2.

Distribution of publication years in the systematic review.

Study aims and tasks used in conjunction with SRep

Study aims

Although the inclusion criteria of our systematic review limited it strictly to reports of studies using SRep tasks to investigate language development in bilingual children with or without SLI/DLD/LD, we also examined the study aims of the included reports. We qualitatively categorized the studies based on their stated goals, the measures employed alongside SRep tasks, and the types of reported results (e.g., total SRep scores only or SRep scores divided by morphosyntactic categories), deriving the following five categories from the dataset. The same report could be classified into multiple categories if its stated aims, methodology, and reported results supported this; consequently, the category percentages below sum to more than 100%. For example, Ziethe et al. (2013) compared bilingual children with and without SLI on their SRep and digit span performance and tested how these scores predicted their language abilities. Thus, the report fit several of the categories below:

(a) General language development (68 reports, 50.37% of the dataset), or studies which examined broad language-related variables such as language acquisition, reading outcomes, or narrative skills as well as various facets of language background. These studies also reported SRep scores in general, without analyzing specific morphosyntactic categories. For example, Graham et al. (2017) examined the impact of an oral versus literacy-based teaching approach on children’s linguistic outcomes, operationalizing these via the SRep and photo description task total, vocabulary, and grammar scores. Hough and Kaczmarek (2011) examined language skills in adopted children, including the SRep task from a standardized language assessment battery in a composite score of syntactic proficiency. Similarly, Wagley et al. (2022) analyzed bilingual reading comprehension utilizing SRep scores within a composite score of morphosyntactic knowledge. This category was the most frequent in our dataset, although it is also the broadest conceptually.

(b) Specific language development (48 reports, 35.55% of the dataset), or studies which focused on the development, acquisition, or performance in an explicitly defined element of language, reporting SRep scores related to this element rather than only generally. For example, Friesen et al. (2021) measured “differences in English syntactic knowledge” (p. 2951) and reported SRep scores broken down into short and long sentences, active and passive sentences, and noun, verb, and prepositional phrases. Sopata and Długosz (2022) studied word order specifically, dividing SRep scores into categories of negation, inversion, complex verb structures, and subordinate clauses. Finally, Janssen and Meir (2019) focused specifically on the accusative case in Russian, analyzing SRep scores based on four syntactic categories (SVO, SVOO, SOV, and OVS).

(c) Cognitive factors (32 reports, 23.70% of the dataset), or studies which focused on variables related to cognitive processes beyond assessing them during participant recruitment, sample matching, or screening for language difficulties. These chiefly included short-term or working memory and intelligence. This category also included studies which used SRep as a measure of cognitive processes related to language. For example, Talli and Stavrakaki (2020) measured verbal short-term and working memory in children with and without DLD and analyzed their influence on syntactic production measured by an SRep task. Aguilar-Mediavilla et al. (2019) measured visual and auditory attention, phonological awareness, verbal short-term memory, and access to verbal information as part of a longitudinal study of children with DLD where the SRep total score was used as one of the measures of verbal short-term memory. Meir (2017) also used an SRep task to measure verbal short-term memory differences in monolingual and bilingual children with and without SLI.

(d) Language difficulties (58 reports, 42.96% of the dataset), or studies which concerned language difficulties such as DLD, SLI, or dyslexia in any capacity, whether using SRep as a diagnostic tool or to measure language-related variables in children experiencing language difficulties. For comparison, Rujas et al. (2021) reported that 76 studies in their dataset of 203 (37.44%) included children with DLD, SLI, LI, or language delay. Regarding our dataset, for example, Abed Ibrahim and Fekete (2019) examined the relative and combined diagnostic accuracy of the LITMUS SRep (both identical repetition and target structure repetition scores) and nonword repetition tasks. Similarly, Tuller et al. (2018) “focused on whether and how well [. . .] LITMUS-NWR and LITMUS-SR, can identify SLI in 5–8-year-old bilingual children having one of the three home languages (Arabic, Portuguese or Turkish), but growing up in different sociocultural and L2 settings, in France and in Germany” (p. 890), utilizing the identical repetition score. In contrast, Bedore et al. (2020) studied a sample of Spanish-English children with DLD, using the SRep total score as an index of their grammatical proficiency before and after a targeted teaching intervention, Language and Literacy Together.

(e) SRep task development (7 reports, 5.18% of the dataset), or studies which strictly reported on the process of creating or refining an SRep task, rather than using an SRep task (whether standardized or made by the authors for the purpose of the study) as part of a larger study. For example, Grech (2022) reported creating a Maltese-language SRep task and presented evidence for its reliability and validity. Fitton et al. (2019), in turn, examined the psychometric properties of the SRep task within the BESA test battery, namely, its factor structure and validity, concluding that from an applied point of view, “the sentence repetition tasks can be treated as essentially unidimensional” (p. 26). Finally, the landmark chapter by Marinis and Armon-Lotem (2015) introduces the COST Action IS0804 LITMUS SRep tasks together with a discussion of preliminary data concerning their diagnostic accuracy for SLI in monolingual and bilingual children. It bears mentioning, however, that the small number of studies in this category may have been a consequence of our search strategy, which explicitly focused on studies on bilingual children. Additional data on SRep task development may exist in other strands of the literature.

Tasks used in conjunction with SRep tasks

For further insight into the contexts in which SRep tasks are used in studies on bilingual children, we also examined the tasks used alongside them. In 15 studies (10.64%), the SRep task was the only measure used, while the remaining studies reported using one or more other tasks. One report containing three studies presented only the SRep results but stated that “the children participated in a battery of standardized and non-standardized assessments and experimental tasks,” without specifying them further (Chiat et al., 2013, p. 78). Given our focus on studies of bilingual children with or without SLI/DLD/LI, the studies in our dataset unsurprisingly used SRep tasks in conjunction with other language-related measures. These included various language background/history questionnaires, language proficiency and vocabulary tests, and more specific measures, for example, of narrative abilities (Whiteside & Norbury, 2017) or codeswitching (Quirk, 2020). In addition, studies frequently employed cognitive measures, such as intelligence testing batteries or tasks measuring short-term/working memory. Due to the substantial heterogeneity of these measures in terms of their target and language, we did not subject them to further classification.

Notably, aside from SRep tasks, NWR tasks are another popular measure of language difficulties and/or working memory (Conti-Ramsden et al., 2001; Pawłowska, 2014; Pham & Ebert, 2020). In these tasks, children are asked to repeat phonologically well-formed sequences that conform to the phonotactic rules of the target language but lack semantic content. Because these nonwords are not part of the mental lexicon, the task requires the listener to rely on phonological encoding, short-term storage, and articulation without drawing on existing lexical knowledge (Coady & Evans, 2008; Gathercole et al., 1994). Originally conceptualized within the framework of working memory theory, NWR has been interpreted as a measure of phonological short-term memory, particularly engaging the phonological loop, a component responsible for the temporary retention and rehearsal of unfamiliar verbal material (Gathercole & Baddeley, 1990). By excluding familiar lexical items, NWR isolates the capacity to process and retain novel phonological forms, which is critical during the early stages of language acquisition (Gathercole et al., 1992; Archibald & Gathercole, 2006). Research has shown that performance on NWR tasks correlates with broader language abilities and can indicate developmental language disorder (Polišenská, 2011), and NWR tasks have also been developed into language-independent tools, such as the cross-linguistic nonword repetition task (CL-NWR; Polišenská et al., 2020). In our dataset, 33 studies (23.40%) reported using NWR tasks together with an SRep task in one or more languages.

Languages measured using SRep tasks

Of the 141 studies in the systematic review, 60 (42.55%) reported using SRep tasks for two or more languages within a single study, while the remaining studies used only one SRep task. Regarding the studied languages, the most frequent was English (n = 62), followed by Spanish (n = 27), likely reflecting the dominance of the United States context (i.e., Spanish-English bilingual children) in terms of publications. Other frequently studied languages included German (n = 20), Hebrew (n = 16), French (n = 16), Russian (n = 13), Greek (n = 8), and Turkish (n = 6). This broadly reflects the findings of the scoping review of SRep tasks in the 2010–2021 literature by Rujas et al. (2021), especially the dominance of English. The remaining studied languages were Afrikaans, Albanian, Arabic, Cantonese, Catalan, Dutch, Farsi, French Sign Language, Icelandic, Irish, Italian, Kam, Malay, Malaysian English, Maltese, Mandarin, Norwegian, Polish, Quebecois French, Sepedi, Somali, Swedish, Syrian-Arabic, Tamil, Urdu, Vietnamese, and Welsh (n < 4).

Samples

Next, we classified the studies in our dataset based on whether the sample consisted of typically developing bilingual children, bilingual children with any type of language disorder or disability, or both within a mixed sample. Importantly, the methods or procedures of language development assessment themselves were outside of the scope of our systematic review.

Seventy-three studies (51.77%) reported an entirely typically developing sample, the most frequent sample type in our dataset. By contrast, seven studies (4.96%) focused specifically on children with any type of language disorder, while 47 studies (33.33%) reported a mixed sample. We assigned a study to the language disorder/disability or mixed category if it reported any sign or presence of language difficulties, at any frequency. This included, for example, risk of dyslexia in Taha et al. (2022), children receiving speech pathologist services in Castilla-Earls et al. (2023), and children identified as at risk for language learning difficulties on the basis of the study results themselves in Erdos et al. (2014). Importantly, 14 studies (9.92%) did not clearly specify whether the sample was typically developing. We counted them as a separate category due to our focus on reporting standards, although, in the absence of any information to the contrary, these samples were likely typically developing.
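As a worked check on the figures above, the four sample categories are mutually exclusive and together cover the 141 studies (73 + 7 + 47 + 14 = 141). The minimal Python sketch below reproduces the reported shares under one assumption inferred from the numbers themselves rather than stated in the text: each percentage appears to be n/141 truncated, not rounded, to two decimals (rounding would give 9.93% rather than 9.92% for the 14 unspecified samples). The helper name `truncated_percentage` is hypothetical.

```python
TOTAL_STUDIES = 141

# Sample-type counts as reported in the text; each study falls into exactly one category.
sample_counts = {
    "typically developing": 73,
    "language disorder/disability": 7,
    "mixed": 47,
    "not specified": 14,
}

def truncated_percentage(n: int, total: int) -> str:
    # Integer arithmetic avoids floating-point surprises: compute hundredths of a
    # percent, truncate toward zero, then format with exactly two decimals.
    hundredths = n * 10_000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"

# The categories partition the dataset, so the counts must add up to the total.
assert sum(sample_counts.values()) == TOTAL_STUDIES

shares = {k: truncated_percentage(v, TOTAL_STUDIES) for k, v in sample_counts.items()}
print(shares)
# -> {'typically developing': '51.77', 'language disorder/disability': '4.96', 'mixed': '33.33', 'not specified': '9.92'}
```

Because each share is cut off rather than rounded, the four values sum to 99.98% rather than exactly 100%.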

Two of the studies in our systematic review (Andreou, Tsimpli, et al., 2020; Meir & Novogrodsky, 2020) concerned samples of bilingual children with autism spectrum disorder (ASD), with Meir and Novogrodsky (2020) including subsamples of neurotypical and ASD children both with and without DLD. Nevertheless, since our systematic review focused on the use of SRep tasks for purposes of language assessment in bilingual children, both typically and atypically developing, we did not exclude those two studies as they met these criteria.

SRep task features and reporting

SRep task standardization

The main aim of our systematic review was to describe the variety of SRep tasks as reported in the empirical literature on bilingual children’s typical and atypical language development. To this end, we first examined the proportion of SRep tasks reported in the publications as standardized, that is, being part of a published battery of language tests (e.g., the Bilingual English-Spanish Assessment, BESA, Peña et al., 2014, or the Clinical Evaluation of Language Fundamentals Preschool-2, CELF Preschool-2, Wiig et al., 2004) or developed within the COST Action IS0804. In case the same task was used in multiple studies within a single report, we only counted it once.

We identified 110 (78.01%) reported mentions of a standardized SRep task and 50 (35.46%) mentions of a new SRep task created by the authors specifically for the purpose of the study (if the same task was used in multiple studies within a single report, we only counted it once). For example, Torregrossa, Eisenbeiß, and Bongartz (2023) reported creating their own version of an Italian SRep task “based [in terms of the examined syntactic structures] on existing SRTs” (p. 691). Monsrud et al. (2022) back-translated the standardized Språk 6–16 SRep task in Norwegian (Ottem & Frost, 2005), and Simon-Cereijido and Méndez (2020) reported using “researcher-developed sentence repetition tasks” in English and in Spanish. We also identified 14 (9.92%) mentions of a task “adapted” from another task or a standardized version. For example, Antonijevic et al. (2017) created an Irish SRep task “following the principles used for other LITMUS-SRep tasks outlined by Marinis and Armon-Lotem (2015) and commenced from an adapted ‘School-Age Sentence Imitation Test-E32’” (p. 364). Similarly, Yang et al. (2023) used “a sentence repetition task (SRep) adapted into Kam (Marinis & Armon-Lotem, 2015)” (p. 5). Gangware (2016) reported using a Cantonese and English SRep task which was “adapted from the sentence repetition task by Devescovi and Caselli (2007)” (p. 15). It is not always clear what this adaptation process entailed, but the classification highlights the diversity with which authors use the SRep methodology broadly and the COST Action IS0804 guidelines more narrowly. Thus, the approaches included using published tasks, using SRep subtests from larger language batteries, translating existing tasks into other languages, and designing SRep tasks from the ground up, with or without elements of existing SRep tasks, while taking into account the studied language and/or syntactic structures.
For comparison, in their scoping review, Rujas et al. (2021) reported that 41% of the studies in their dataset used previously published versions of SRep tasks, 33% used SRep tasks included in assessment batteries, and 25% used SRep tasks created for the purpose of the specific study. Our results likewise show that using a standardized SRep task was the most frequent choice, although standardization took various forms across studies. This heterogeneity of SRep tasks entails heterogeneity in their specific characteristics, which, moreover, are not always comprehensively reported. We turn to this point in greater detail now.

SRep task length and presence of practice items

Task length

The number of items included in the SRep tasks was specified in 104 studies (73.75%; if the same task was used in multiple studies within a single report, we only counted it once). Korytkowski Longo (2011) was an exception in reporting a range rather than a fixed length: in her study on the effects of a language teaching intervention on bilingual children with language/reading impairment, the children were tested with “5 to 7 sentence probes during each week of intervention over an eight-week period” (p. 11). Otherwise, the shortest reported SRep task consisted of three items: Limacher (2018) used three “sentences developed by Hack and colleagues (2012) for their sound inventory” (p. 19), and the children’s recordings were later rated for accentedness and comprehensibility. The longest reported SRep task contained 68 sentences: Haman et al. (2017) used the Polish-language LITMUS-SRep, based on the English SASIT task (Marinis et al., 2010), to compare Polish proficiency between early bilingual migrant children and nonmigrant monolingual children. The mean length of the reported SRep tasks was 32.26 items (SD = 15.29), and the mode was 30 items.

In 34 studies (24.11%), the number of items was not specifically reported. However, this does not necessarily mean that the length of the task is unknown. In most of these cases, the authors relied on a properly cited standardized SRep task but omitted its details. For example, Chondrogianni and John (2018) reported using the Sentence Structure subtest from the CELF-2 without specifying any further details. The distribution of SRep task lengths is shown in Figure 3.

Figure 3.

SRep task length across studies.

Practice items

The presence of practice items was specified in 45 studies (31.91%; if the same task was used in multiple studies within a single report, we only counted it once). The remaining 93 studies (65.95%) either did not use practice SRep items or used them without reporting it (e.g., because they used a standardized version of the task, followed the manual, and omitted these details for reasons of brevity or space).

Computerization of SRep tasks

A potentially significant aspect of SRep task usage that might affect the obtained results is whether the task was administered in a computerized or an analog version. Traditionally, SRep tasks involved the examiner reading the sentences aloud and the child/examinee repeating them verbatim (see, e.g., Conti-Ramsden et al., 2001; Snow & Hoefnagel-Höhle, 1978). Later studies, however, began using computerized versions. The first such report identified in our dataset comes from Trofimovich and Baker (2007), in which the children were presented with prerecorded SRep stimuli “in a quiet location using a personal computer and stimulus presentation software” (p. 256). Both versions of the SRep task are reported in the literature. In our dataset, the largest proportion of studies, 64 (45.39%; if the same task was used in multiple studies within a single report, we only counted it once), did not specify whether a computer was used to administer the SRep task. It should be noted, however, that we arbitrarily assumed that the five studies in our dataset published prior to 1994 (ranging from 1971 to 1986) would not have used computers, and we therefore classified them as analog rather than as having omitted this information from the report. Twenty-two studies (15.60%) specified that the SRep task was administered in person. For example, Chiat et al. (2013) used the SRep task from the Goralnik Screening Test for Hebrew (Goralnik, 1995) and stated that “the sentences of the sentence repetition subtest from the Goralnik test were presented to the children by a native speaker of Hebrew” (p. 68). Finally, 52 studies (36.87%) clearly specified using a computerized version of the SRep task. For example, Abed Ibrahim and Hamann (2017) used the German version of the LITMUS-SRT task (Hamann et al., 2013), reporting that it was “administered using a pseudo-randomized computerized version” (p. 8).

For a more detailed analysis, we also considered the individual aspects of task computerization: prerecording of sentences and the use of visual stimuli within the task.

Prerecorded SRep sentences

Forty-eight studies (34.04%; if the same task was used in multiple studies within a single report, we only counted it once) did not clearly report whether the SRep sentences were prerecorded and played back or read aloud. These were often studies which only reported using an SRep task from a standardized battery of tests and which presumably followed its administration instructions. For example, Blom et al. (2021) only reported using the Syrian Arabic LITMUS-SRT task. Although the report does not state whether the stimulus sentences were prerecorded, readers familiar with the LITMUS-SRT tasks can infer that the task is computerized. By contrast, Altman et al. (2014) reported designing their own SRep task specifically for the age group included in their study (Russian-Hebrew bilingual children aged between 4;6 and 6;9) but gave no details about its format or mode of administration. Thirty studies (21.27%) reported that the SRep items were read aloud, while 60 studies (42.55%) reported using prerecorded items, making this the most frequent choice in our dataset. An additional concern is that the gender of the item speaker was rarely reported. We identified 15 studies (10.63%) in which the gender of the prerecorded item speaker was noted (1 male, 2 “male and female,” and 12 female). Interestingly, only one study in our dataset noted the gender of the examiner who read the items aloud (female, in Jordaan & Ngwanduli, 2020).

Visual stimuli in SRep tasks

Another feature of SRep task computerization that we focused on was the presence of visual stimuli accompanying the items. SRep tasks are intended to tap into language skills, and since they are frequently reported to be effective diagnostic tools for SLI, their administration also needs to be standardized. When the tasks are used with children, however, keeping them focused, on task, and engaged can be difficult, and enriching the task with visual stimuli is one potential solution. Accordingly, in our dataset, 27 studies (19.14%; if the same task was used in multiple studies within a single report, we only counted it once) reported using some form of visual stimuli in the SRep task. The nature of these stimuli varied. For example, Sopata and Długosz (2022) incorporated a “picture of a speaking child” as a “signal for the child to repeat the sentence” (p. 9) in their German SRep task. Limacher (2018) presented children with English SRep items “with picture support using a tablet” (p. 19). In the French SRep task used by Courtney et al. (2017), “as the learners heard the sentence, they were also shown an image of the objects referred to in the sentence to encourage them to focus on meaning” (p. 831).

Thirty-four studies (24.11%) specified administering the SRep task via a PowerPoint presentation. This number is higher than the number of studies reporting any use of visual stimuli because these PowerPoint presentations were not always described in detail and, in some instances, may simply have been a technical means of presenting the prerecorded sentences. Nevertheless, we report this figure because the most typical visual enrichment of the SRep task was found in the LITMUS-SRT tasks, which are usually administered as a “child-friendly” PowerPoint presentation. The presentation tells a short pictorial story, sometimes referred to as the “treasure hunt,” which advances as the child repeats the sentences (see Marinis & Armon-Lotem, 2015). In our dataset, 13 studies clearly reported using this child-friendly format. Otherwise, 14 studies (9.92%) clearly reported not using any visual stimuli, while 97 studies (68.79%), a decisive majority, did not explicitly report whether visual stimuli were present. Importantly, this category may include studies which did use visual stimuli but did not state it. For example, the Goralnik Screening Test for Hebrew uses pictorial stimuli alongside the target sentences in its SRep subtest. It was the SRep task reported in three studies in our dataset (Altman et al., 2018; Chiat et al., 2013, Study 1; Rose et al., 2022), although the pictorial stimuli were mentioned only by Chiat et al. (2013). Similarly, other studies likely used SRep tasks from standardized test batteries which included pictorial stimuli without reporting this explicitly.

Feedback given to children

Another method of maintaining children’s focus on the SRep task involves providing feedback, either in the form of verbal encouragement or small gifts contingent on performance. It may be argued that this element is crucial for assessing the task’s validity (especially when using SRep tasks to identify SLI), as the administrator’s behavior can significantly affect the child’s performance. Nevertheless, this information was very rarely disclosed in the reports in our dataset: only 18 studies (12.76%) provided information about the feedback given to the children. For example, Torregrossa, Eisenbeiß, and Bongartz (2023) reported in the Supplementary Materials that during their gamified, story-based Italian SRep task administration, the children “each received positive feedback (i.e., ‘well done’), irrespective of the accuracy of the repetition” (p. 9). Woon et al. (2014) reported that during their Mandarin Chinese and English SRep task administration, “the researcher would also praise the children for any attempt at repetition” (p. 145). On the other hand, Janssen and Meir (2019) only reported that “positive feedback” (p. 14) was an element of the entire procedure.

SRep task scoring schemes

Marinis and Armon-Lotem (2015) describe six default scoring schemes for SRep tasks: (a) binary scoring of identical repetitions, (b) 3-point scoring for repetition accuracy, (c) counting content and function words, (d) binary scoring of overall repetition grammaticality, (e) binary scoring of target structure repetition, and (f) counting the number of changes between the item and the repetition. This variety stems from different traditions and procedures inspired by test batteries predating the COST Action IS0804 LITMUS tasks, as well as from the different aims for which SRep tasks can be used.
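To make the differences between these schemes more concrete, the following Python sketch scores a single item-response pair under schemes (a), (b), and (f), plus a simplified version of (e). It is an illustration under stated assumptions, not the LITMUS scoring procedure: the function names are hypothetical; "changes" are operationalized as word-level substitutions, insertions, and deletions (a Levenshtein distance over words); the 3-point cut-offs (3 = no change, 2 = one change, 1 = two or three changes, 0 = four or more) are assumed and should be checked against the relevant task manual; and the target structure is simplified to a contiguous word sequence. Schemes (c) and (d) depend on human judgments of word class and grammaticality and are not modeled.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # Lowercase and keep word characters only; a real protocol would also need
    # explicit rules for contractions, fillers, and self-corrections.
    return re.findall(r"\w+", sentence.lower())

def word_changes(item: str, response: str) -> int:
    """Scheme (f), as assumed here: word-level Levenshtein distance."""
    t, r = tokenize(item), tokenize(response)
    prev = list(range(len(r) + 1))
    for i, tw in enumerate(t, start=1):
        curr = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cost = 0 if tw == rw else 1
            curr[j] = min(prev[j] + 1,         # item word omitted
                          curr[j - 1] + 1,     # extra word added
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]

def binary_identical(item: str, response: str) -> int:
    # Scheme (a): 1 only for a word-identical repetition.
    return int(word_changes(item, response) == 0)

def three_point(item: str, response: str) -> int:
    # Scheme (b) with ASSUMED cut-offs: 3 = 0 changes, 2 = 1, 1 = 2-3, 0 = 4+.
    n = word_changes(item, response)
    if n == 0:
        return 3
    if n == 1:
        return 2
    if n <= 3:
        return 1
    return 0

def binary_target_structure(target: str, response: str) -> int:
    # Simplified scheme (e): the target structure must appear unchanged
    # as a contiguous word sequence somewhere in the repetition.
    s, r = tokenize(target), tokenize(response)
    return int(any(r[k:k + len(s)] == s for k in range(len(r) - len(s) + 1)))

item = "The boy who is chasing the dog is tall."  # hypothetical item
response = "The boy who chasing the dog is tall."  # omits one "is"
print(word_changes(item, response), binary_identical(item, response),
      three_point(item, response), binary_target_structure("who is chasing", response))
# -> 1 0 2 0
```

The example sentence is hypothetical. A single omission drops the binary identical score to 0 but only lowers the 3-point score to 2, and because the omission falls inside the relative clause, the target-structure score is also 0; had the same omission fallen outside the target structure, scheme (e) would still award the point. The same response can therefore contribute quite differently to a total score depending on the scheme chosen.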

Accordingly, the studies in our dataset also differed in the scoring schemes applied, with these differences stemming either from the task itself (i.e., using an SRep task from a specific test battery) or from the study aims. Of the studies reviewed, 36 (25.53%; if the same task was used in multiple studies within a single report, we only counted it once) reported using more than one scoring scheme. Twenty-three studies (16.31%) had no clearly identified scoring scheme, although the SRep task results were reported, typically as means or percentages of correct responses. Six studies (4.25%) reported applying the scoring scheme of the test battery from which the SRep task was taken but did not describe this scheme in the report.

As regards the schemes, binary scoring based on identical repetitions was the most popular one, reported in 44 studies (31.20%). Binary scoring based on target structure repetition was the second most popular, reported in 31 studies (21.98%).
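The two most popular schemes can diverge for the same response. As a rough sketch, assuming that the target structure can be represented as a contiguous word sequence (a simplification; in practice, target-structure scoring is done by trained coders and may tolerate, e.g., lexical changes within the structure), binary target-structure scoring might look as follows. The function names and the passive example item are hypothetical.

```python
from typing import List, Sequence


def tokenize(sentence: str) -> List[str]:
    """Lowercase words and strip surrounding punctuation (simplifying assumption)."""
    words = (w.strip(".,!?;:\"'").lower() for w in sentence.split())
    return [w for w in words if w]


def contains_run(tokens: Sequence[str], run: Sequence[str]) -> bool:
    """True if `run` occurs as a contiguous word sequence within `tokens`."""
    n = len(run)
    return any(list(tokens[i:i + n]) == list(run) for i in range(len(tokens) - n + 1))


def binary_target_structure(response: str, target_structure: str) -> int:
    """Simplified binary target-structure scoring: 1 if the target structure
    is reproduced intact, regardless of changes elsewhere in the sentence."""
    return int(contains_run(tokenize(response), tokenize(target_structure)))


# Hypothetical passive item "The girl was chased by the dog";
# the target structure is assumed to be "was chased by".
for resp in ["The girl was chased by a big dog", "The dog chased the girl"]:
    print(resp, "->", binary_target_structure(resp, "was chased by"))
```

Under these assumptions, “The girl was chased by a big dog” would receive 0 under binary identical-repetition scoring but 1 under binary target-structure scoring, whereas “The dog chased the girl” would receive 0 under both.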

Alternative scoring schemes were employed in 19 studies (13.47%). In several instances, these involved qualitative analyses conducted alongside more traditional scoring schemes. For example, Meir (2018) analyzed error profiles in addition to comparing quantitative scores based on target structure repetition. Similarly, Hamim and Hamid (2021) applied binary scoring schemes for accuracy, grammaticality, and sentence type alongside a qualitative analysis of error types. Other studies, however, used scoring schemes specifically matched to their aims. For example, Soesman et al. (2022) extracted instances of codeswitching from the recorded SRep trials and divided them into categories based on the language and the position of the switch within the sentence before subjecting these categories to further quantitative analysis. Bogliotti et al. (2020) administered a French Sign Language SRep task and relied on qualitative ratings by annotators to identify regionalisms and substitutions more broadly, and phonological variables within them (e.g., location or orientation of the hands) more specifically. Finally, Trofimovich and Baker (2007) had native speakers judge the accentedness of English SRep sentences produced by Korean-English bilingual children and adults.

Multipoint scoring schemes were reported in 20 instances (14.18%), percentage calculations in 16 instances (11.51%), and counting of specified instances in 20 instances (14.18%). As regards the targets of scoring, aside from identical repetition and target structure repetition, they included correctness of repetition, grammaticality, fluency, accentedness, word order and position, lexical or morphological targets, and so forth. Altogether, these schemes were tailored to very specific study aims and are likely not intended for broader application.

Discussion

Our systematic review aimed to describe the characteristics of SRep tasks used in published English-language empirical studies on language development in typically developing bilingual children and children with SLI. We sought to draw attention to the variability in various aspects of the SRep tasks, as well as in the typical procedures for reporting them, since both may affect the obtained results and/or their utility. In this way, we hoped to call for greater care and transparency in this regard.

Accordingly, our systematic review identified a large set of published studies (N = 141) in which SRep tasks were used to examine a broad range of topics, from typical and atypical language development, through cognitive functioning, to SLI diagnosis. We identified SRep-based investigations of 35 languages in total, including majority languages like English, Spanish, or French as well as less commonly studied ones like Irish, Sepedi, or Quebecois French. Many studies examined two or more languages using SRep tasks, and the majority used SRep tasks in conjunction with other well-established measures of language ability and cognitive processes.

Based on our classification, the most prototypical use of the SRep task to study language development in bilingual children, that is, the one following the most popular trends we identified in our dataset, would involve a typically developing sample and two or more SRep tasks, one of them in English. The tasks would have about 30 sentences each and would be used together with a range of other language and/or cognitive tasks. They would likely be standardized, contain prerecorded sentences, and be scored according to a binary scheme, potentially in conjunction with one or more other schemes. However, a report of such a study would likely omit any mention of computerization/gamification, the use of practice items, and whether the language of task administration differed from the language of the task.

Overall, SRep is a useful measure in studies of language development across all domains and can be flexibly deployed in a variety of settings and populations, as well as in conjunction with a wide range of other tasks. SRep tasks are also largely available for the major Western European languages. However, additional studies are needed to respond appropriately to the call for increased availability of valid and reliable language screening tools for children speaking a variety of minority languages (Fleckstein et al., 2016; Rose et al., 2022). Furthermore, the classification of study aims testifies to the successful use of SRep in studies of bilingual child development (with studies on typical bilingual language development and on bilingual language difficulties enjoying roughly similar popularity), as well as of the characteristics, developmental dynamics, and detection of language difficulties (e.g., SLI, DLD, or dyslexia). This is particularly noteworthy, as bilingual language development remains a crucial area of research to which results from monolingual populations should not be uncritically carried over. This is because bilinguals may follow different developmental pathways (Muszyńska et al., 2025), which not only require separate studies but also call for appropriately designed, standardized, and normed tests of language skills, SRep being one of them. In this context, the studies differed considerably both in the features of the SRep tasks they used and in the quality of their description. Our systematic review therefore highlights this area as important for facilitating further studies on bilingual language development. We consider recognizing the heterogeneity of SRep tasks ostensibly used toward the same research or applied purpose, as well as of the standards for reporting them, a critical step toward improving both.
In turn, greater clarity, transparency, and standardization of SRep tasks in published research will hopefully lead to intensified efforts aimed at designing SRep tasks in more languages, creating norms, and propagating standardization in SRep task use in applied contexts as well.

Probably the most significant area of difference involves computerized versus analog versions of the SRep task. The use of a PowerPoint presentation that frames the SRep task in the context of a picture-book-like story was popularized by its implementation in the COST Action IS0804 SRep tasks (see Marinis & Armon-Lotem, 2015). Although the intention of making the SRep task more engaging for children in this way is persuasive, this solution may introduce unwanted variability. On the one hand, the gamification elements in computerized versions of the task may yield higher overall scores than analog versions because they are more engaging, motivating, or gratifying. On the other hand, they may be too distracting or may impose an additional cognitive load on the children, which could disrupt their task performance. Although some evidence on both points is available (Bernecker & Ninaus, 2021; but see Jagušt et al., 2018; Zhan et al., 2022), it comes from studies of tasks and contexts other than SRep. More specific studies are therefore needed to test these assumptions, especially since, as mentioned above, the available evidence comparing these task versions is equivocal (Banasik-Jemielniak et al., 2023; Pratt et al., 2022). Moreover, SRep task computerization may also vary between studies. For example, a recent study by Torregrossa, Caloi et al. (2023) used a different visual frame than the “treasure hunt” characteristic of the COST Action IS0804 SRep tasks. Nevertheless, despite these potential risks, computerized versions of SRep tasks may help control for a variety of other confounding factors related to the testing situation (e.g., better or poorer rapport of a particular examiner with a particular child).

A related area is the use of prerecorded versus live SRep stimuli. With prerecorded stimuli, all children hear exactly the same voice, with the same pace, tone, and intonation. These may vary in live reading, introducing additional variables that are not accounted for. This may be especially important when using SRep tasks to identify SLI, as eliminating various paralinguistic aspects and systematizing the stimuli may presumably help tap more precisely into the examined child’s language processing capabilities. Moreover, prerecorded stimuli may be much more cost-effective and easier to use. On the other hand, the impersonal nature of prerecorded material may not be as motivating or engaging as a live interaction, which might create a different dynamic. For example, Rice and Redcay (2016) showed that the mere belief that a conversation being listened to is carried out live rather than prerecorded engaged mentalization to a greater degree, implying a difference in cognitive processing. As above, the impact of this feature on SRep performance in bilingual children specifically requires dedicated empirical testing. Nevertheless, it may be tentatively assumed that, unlike the standardized, uniform delivery of prerecorded sentences, live reading allows for natural variability in speech, which can make the task feel more personalized and less mechanical. This can be particularly beneficial for maintaining children’s attention and motivation, as they may find the more interactive nature of live reading more appealing (e.g., “We decided to use live voice rather than prerecorded sentences as this helped engage these young children more readily in the task,” Frizelle et al., 2017, p. 1443). Another feature of live sentences is that participants can use visual cues, such as lip-reading, to enhance their comprehension in the presence of auditory noise (Ma et al., 2009).
The presence of a live speaker can also provide real-time feedback, which might be both an advantage (due to higher ecological validity) and a disadvantage (due to uncontrolled variance and the influence of potential examiner biases) for the quality of data collection. Live interaction also makes it easier to adapt task administration to the child’s needs. We suggest that these costs and benefits be weighed depending on the specific aim of the study, although the specific impact of either of these SRep design features remains unknown and unexamined, despite some evidence from other pertinent areas of psychological assessment.

The above highlights another important issue: SRep task standardization. Using standardized SRep tasks, for example, those that form part of a test battery or the COST Action IS0804 tasks, may contribute to the generalizability of the results and their valid interpretation. In applied contexts, it may also help accumulate a large amount of data that could contribute to norm creation. As one of the aims of standardization is to minimize error resulting from different assessment conditions, potential divergences between those conditions and task forms raise a question regarding the validity of interpretation. Specifically, would the study results be different if the assessment procedure were altered (see, for example, McCarthy & Elson, 2018), and if so, how? This question requires specific empirical testing in the context of SRep tasks used with bilingual children.

However, standardized SRep tasks are not available for many languages. Those that are available may also not always fit the specific aims of a given study, for example, when it focuses on the acquisition of a particular morphosyntactic feature. While recognizing these limitations in availability, we strongly recommend using SRep tasks that are standardized or have been used in previous studies whenever possible. This is all the more important because, as we have sought to show in our systematic review, there are numerous other features of the SRep task beyond its language and the morphosyntactic structures it tests that should also be considered and that may affect the obtained scores or the possibility of interpreting them. Indeed, our systematic review showed that SRep task length, the presence or absence of practice items, any feedback given to participating children, scoring schemes, and, in some cases, the reporting of raw SRep scores differed between the studies. In many cases, information about these aspects of the SRep task was not reported directly, which complicates the interpretation of the methodology and results of these studies.

SRep tasks are frequently deployed together with numerous other, often complex tasks (e.g., intelligence tests, language ability tests, working memory tasks). There is some evidence that lengthy study procedures lead to higher subjectively reported cognitive fatigue, but not to lower objective performance, in adult participants (Ackerman et al., 2010). Although it may be assumed that a similar effect occurs in children completing SRep tasks, potentially impairing their participation, and that it may be compounded by SRep task length, this needs specific empirical testing. In this context, based on the results of our systematic review, reporting the use of practice items within the SRep task does not currently appear to be customary in the literature. As above, this does not necessarily mean that practice items are not used, as they may nevertheless be part of the procedure for a given standardized task that the authors of a given study have used and cited properly. Nevertheless, the use of practice items in SRep tasks would benefit from additional examination of its impact on children’s performance or motivation, and a uniform practice of reporting their presence or absence would increase the clarity of the reports.

In addition, establishing reasonable (i.e., motivating without biasing the results) rapport with the children and providing feedback should also be considered. Nevertheless, these solutions were used or reported to have been used relatively infrequently. Moreover, SRep task length was not directly reported in 34 studies (24.11%) of our dataset. This also applies to the SRep scoring schemes. By our classification, only 33.33% of the reports in our dataset reported SRep scores in specific morphosyntactic categories, with the rest reporting total scores only. Breaking down the SRep scores into more specific categories is not necessary in each study. Nevertheless, considering the variability of the SRep tasks, greater transparency in reporting the methodology and results appears warranted.
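Reporting scores broken down by morphosyntactic category need not be laborious once item-level scores are coded. The Python sketch below is a hypothetical illustration: it assumes binary item-level scores, each tagged with a category label of the researcher’s choosing, and returns raw counts and percentages per category alongside the total score.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize(scored_items: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each category (plus 'TOTAL') to (items correct, items administered, percent correct).

    `scored_items` holds (category, binary_score) pairs, one per SRep item."""
    correct: Dict[str, int] = defaultdict(int)
    administered: Dict[str, int] = defaultdict(int)
    for category, score in scored_items:
        for key in (category, "TOTAL"):
            correct[key] += score
            administered[key] += 1
    return {
        key: (correct[key], administered[key], round(100 * correct[key] / administered[key], 2))
        for key in administered
    }


# Hypothetical item-level scores for one child.
items = [("passive", 1), ("passive", 0), ("relative clause", 1),
         ("relative clause", 1), ("wh-question", 0)]
for category, (n_correct, n_items, pct) in summarize(items).items():
    print(f"{category}: {n_correct}/{n_items} ({pct}%)")
```

In this invented example, the child scores 60.0% overall, but 50.0%, 100.0%, and 0.0% on passives, relative clauses, and wh-questions, respectively, information that a total score alone conceals.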

Finally, regarding the reporting of SRep task design specifically, for every formal SRep feature we considered (with the exception of task language), we were able to identify a subset of studies in which sufficiently clear information on it was not reported. This involves both features that may be considered relatively less important (although we argue otherwise), like the presence of visual stimuli in the task or the provision of feedback or reinforcement to the children throughout the task, and vital, core information like task length in sentences. Even acknowledging that some studies either relied on an in-text citation to lead readers to further details or used a standardized version of the task (i.e., from a popular language testing battery) with which informed readers are likely to be familiar, in our view this still represents an important oversight that should be corrected. On the one hand, lack of transparency regarding the specific features of the SRep task used in a given study hinders appropriate judgments of its methodological quality, and thus of the reliability and validity of the obtained results. On the other hand, it makes it more difficult for readers less experienced with this specific research context (e.g., students, early career researchers or researchers beginning to use the SRep methodology, or clinicians attempting to update their knowledge) to obtain accurate and detailed information, and thus proficiency. It also propagates a suboptimal standard in the literature, potentially discouraging the field from paying attention to, and further studying, the effects of SRep task heterogeneity. For similar cases in other areas of psychology, consider the over-reliance on Cronbach’s alpha as an index of reliability, or the neglect of reporting effect sizes or confidence intervals in favor of relying solely on statistical significance (Fidler et al., 2004; Peng et al., 2013; Sijtsma, 2009).

Therefore, we would like to suggest that studies using SRep tasks directly report the following elements at minimum:

Language of the SRep task,

Origin of the SRep task: test battery, published task, author-made task according to the COST Action IS0804, author-made task according to other guidelines,

Length of the SRep task, presence and number of practice items,

Gamified computerization (including its characteristics, for example, visual elements, intermittent rewards) or in-person administration,

Use of prerecorded versus live sentences,

Feedback given to the children, if any,

Scoring scheme,

Raw SRep scores (means or percentages).
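The minimum reporting set could also be kept as a structured record, for example, when extracting data for a review or drafting a methods section. The following Python sketch is a hypothetical illustration: the field names are our own labels for the elements listed above, and None marks an element that a report leaves unstated, as distinct from an element explicitly reported as absent (e.g., zero practice items or no feedback).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SRepReport:
    """Hypothetical record of the minimum SRep reporting elements.

    None means "not reported"; explicit absence is recorded as a value
    (e.g., n_practice_items=0 or feedback="none")."""
    task_language: Optional[str] = None
    task_origin: Optional[str] = None        # test battery, published task, COST-based, other
    n_sentences: Optional[int] = None
    n_practice_items: Optional[int] = None
    administration: Optional[str] = None     # gamified computerized vs in-person
    stimuli: Optional[str] = None            # prerecorded vs live
    feedback: Optional[str] = None
    scoring_scheme: Optional[str] = None
    raw_scores_reported: Optional[bool] = None


def missing_elements(report: SRepReport) -> List[str]:
    """Return the names of checklist elements the report leaves unstated."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


example = SRepReport(task_language="English", n_sentences=30,
                     scoring_scheme="binary, identical repetition",
                     raw_scores_reported=True)
print(missing_elements(example))
```

A report stating only the task language, length, scoring scheme, and raw scores would thus be flagged for omitting the task origin, practice items, mode of administration, stimulus type, and feedback.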

We consider this minimum to be relatively parsimonious, such that it should not be drawn out or redundant even if the authors use an SRep task from a popular test battery. Indeed, there are additional SRep elements which, if reported, could further improve methodological transparency, for example, the gender of the SRep item speaker, the length of SRep items in syllables, the presence or absence of parents or caregivers with the child during SRep task administration, or the language of task instruction (reported separately from the language of the task). However, we did not focus on these in our systematic review, as the initial stages of the full-text screening revealed that they were reported very infrequently. In addition, their influence on SRep scores is difficult to infer without empirical testing, whereas the elements we did highlight appear to be more meaningful for the SRep methodology, and some preliminary results on them are already available (Banasik-Jemielniak et al., 2023; Pratt et al., 2021).

Limitations and future directions

The chief limitation of our systematic review is its scope. We focused only on English-language publications which included studies on bilingual children’s language development. However, SRep tasks may be used in a range of different contexts as well, both basic and applied. Therefore, our results may not necessarily describe the state of the literature in its entirety, and further reviews are likely needed. Nevertheless, we believe that greater transparency and systematization in terms of methodology reporting can only be beneficial.

Furthermore, we have limited our systematic review to only a general overview of SRep task characteristics. Although we suggest that differences in these characteristics may contribute to distorting the obtained results in uncontrolled and undesirable ways, further studies are needed to test this suggestion. These studies should include both direct comparisons as well as meta-analyses of extracted, appropriately described SRep data.

Finally, within our dataset, we focused only on SRep task characteristics. There are other methodological trends in the field of child bilingualism studies which would be valuable to analyze in greater detail, for example, the languages studied, the socioeconomic characteristics of the samples, the proportion of simultaneous or sequential bilingual children in the samples, or the methods of screening for SLI or gathering language background data. As with our systematic review, we believe that systematization in reporting these aspects of studies also has the potential to raise their scientific standard.

From another perspective, it may be valuable to examine in depth the impact of the COST Action IS0804 and the resulting SRep task framework on changes in methodology and reporting standards in the literature (e.g., can specific changes in SRep task design or in the way it is reported be tracked or compared between the pre- and post-COST Action IS0804 periods?). In this way, valuable insights could be gained, potentially serving to formulate effective standards, guidelines, or reporting practices.¹

In sum, we identified areas of significant variability in SRep task use within English-language studies on typical and atypical language development in bilingual children: task standardization, computerization (the presence of visual elements, prerecorded sentences, as well as gamification), length, and scoring schemes. We hope that studies will proliferate as SRep tasks for more languages are introduced. We also hope that by calling attention to the need for greater transparency and uniformity in reporting the details of the SRep methodology, our systematic review will facilitate this proliferation. Finally, we hope that higher standardization and reporting standards in empirical studies will dovetail with more robust, high-quality use of SRep tasks in applied settings, leading to more trustworthy scores and more accurate decision-making.

Contribution

Our systematic review sheds light on the varied use and reporting of SRep tasks in research with bilingual children. It not only demonstrates the wide range of methodologies but also identifies areas where inconsistencies in task design and reporting could potentially affect the interpretation of research findings, such as mode of administration (computerized vs analog) and format of stimulus presentation (prerecorded vs live). However, further studies are needed to verify this. We highlighted these variables, which may inform the design of future studies. In addition, our review emphasizes the need for clearer reporting standards in the field. This way, we hope to contribute to increased standardization, transparency, and detail, thereby facilitating replications and meta-analyses as well as higher standards of practice. This is particularly important for emerging areas of research and less commonly studied languages, where standardized tools and practices are still being developed.

Footnotes

ORCID iDs

Piotr Kałowski

Dawid Walczak

Natalia Banasik-Jemielniak

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Centre, Poland (Narodowe Centrum Nauki), under the grant number 2021/41/B/HS2/01036, as part of the project “Finding the common denominator in the study of bilingual children’s language development: morphosyntactic skills measured by sentence repetition tasks. Meta-analysis, method comparison and validation studies.”

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online at the Open Science Framework.

Notes

Author biographies

Piotr Kałowski is an associate professor at the School of Human Sciences, VIZJA University, Poland. He wrote his Ph.D. thesis on verbal irony in internal dialogues. His research interests chiefly focus on individual differences in verbal irony.

Maria Obarska is a Ph.D. student at the Faculty of Psychology, University of Warsaw.

Jakub Romaneczko is a Ph.D. student at the Chair of Developmental Psychology, John Paul II Catholic University of Lublin. His scientific interests center around therapeutic interventions for and promotion of well-being. He is writing his thesis on the developmental mechanisms of self-compassion across the lifespan.

Dawid Walczak has an M.A. in psychology. His scientific interests center around personality psychology, especially the Dark Triad. He is also interested in statistical methods and psychometrics.

Natalia Banasik-Jemielniak is an assistant professor at the Institute of Psychology at the Maria Grzegorzewska University in Warsaw, Poland. She holds a Ph.D. in psychology and three master’s degrees, in linguistics, psychology, and intercultural education.

References

*Abed Ibrahim & Fekete (2019). What machine learning can tell us about the role of language dominance in the diagnostic accuracy of German LITMUS non-word and sentence repetition tasks. Frontiers in Psychology, 9, 2757. https://doi.org/10.3389/fpsyg.2018.02757

*Abed Ibrahim & Hamann (2017). Bilingual Arabic-German and Turkish-German children with and without specific language impairment: Comparing performance in sentence and nonword repetition tasks. In Proceedings of the 41st Annual Boston University Conference on Language Development (pp. 1–17). Cascadilla Press.

*Abed Ibrahim, Hamann, & Fekete (2020). Language assessment of Bilingual Arabic-German heritage and refugee children: Comparing performance on LITMUS repetition tasks. In M. M. Brown & Kohut (Eds.), Proceedings of BUCLD (Vol. 44, pp. 1–17). Cascadilla Press.

Abu Bakar (2017). Kemahiran sintaksis dalam kalangan kanak-kanak Melayu dengan kecelaruan bahasa spesifik [The syntactic abilities of Malay children with specific language impairment (SLI); Unpublished doctoral dissertation]. Universiti Kebangsaan Malaysia.

Ackerman, P. L., Kanfer, Shapiro, S. W., Newton, & Beier, M. E. (2010). Cognitive fatigue during testing: An examination of trait, time-on-task, and strategy influences. Human Performance, 23(5), 381–402. https://doi.org/10.1080/08959285.2010.517720

Aguilar-Alonso, Á., & Moreno-González (2012). Neuropsychological differences between samples of dyslexic and reader children by means of NEPSY. Anuario de Psicología, 42(1), 33–50.

*Aguilar-Mediavilla, Buil-Legaz, López-Penadés, Sanchez-Azanza, V. A., & Adrover-Roig (2019). Academic outcomes in bilingual children with developmental language disorder: A longitudinal study. Frontiers in Psychology, 10, 531. https://doi.org/10.3389/fpsyg.2019.00531

*Aguilar-Mediavilla, Buil-Legaz, Perez-Castello, J. A., Rigo-Carratalà, & Adrover-Roig (2014). Early preschool processing abilities predict subsequent reading outcomes in bilingual Spanish–Catalan children with specific language impairment (SLI). Journal of Communication Disorders, 50, 19–35. https://doi.org/10.1016/j.jcomdis.2014.03.003

*Almeida, L. D., Ferré, Morin, Prévost, Santos, C. D., Tuller, . . . Barthez, M. A. (2017). Identification of bilingual children with specific language impairment in France. Linguistic Approaches to Bilingualism, 7(3–4), 331–358. https://doi.org/10.1075/lab.15019.alm

*Altman, Burstein Feldman, Yitzhaki, Armon Lotem, & Walters (2014). Family language policies, reported language use and proficiency in Russian–Hebrew bilingual children in Israel. Journal of Multilingual and Multicultural Development, 35(3), 216–234. https://doi.org/10.1080/01434632.2013.852561

*Altman, Goldstein, & Armon-Lotem (2018). Vocabulary, metalinguistic awareness and language dominance among bilingual preschool children. Frontiers in Psychology, 9, 1953. https://doi.org/10.3389/fpsyg.2018.01953

*Andreou, Dosi, Papadopoulou, & Tsimpli, I. M. (2020). Heritage and non-heritage bilinguals: The role of biliteracy and bilingual education. In Brehmer & Treffers-Daller (Eds.), Lost in transmission: The role of attrition in heritage language development (pp. 172–196). John Benjamins.

*Andreou, Torregrossa, & Bongartz (2023). The use of null subjects by Greek-Italian bilingual children. In Fotiadou & I. M. Tsimpli (Eds.), Individual differences in anaphora resolution (pp. 166). John Benjamins.

*Andreou, Tsimpli, I. M., Durrleman, & Peristeri (2020). Theory of mind, executive functions, and syntax in bilingual children with autism spectrum disorder. Languages, 5(4), 67. https://doi.org/10.3390/languages5040067

*Andreou, Tsimpli, I. M., Masoura, & Agathopoulou (2021). Cognitive mechanisms of monolingual and bilingual children in monoliterate educational settings: Evidence from sentence repetition. Frontiers in Psychology, 11, 613992. https://doi.org/10.3389/fpsyg.2020.613992

*Antonijevic, Durham, & Chonghaile, Í. N. (2017). Language performance of sequential bilinguals on an Irish and English sentence repetition task. Linguistic Approaches to Bilingualism, 7(3–4), 359–393. https://doi.org/10.1075/lab.15026.ant

Antonijevic & Meir (2024). Cross-linguistic perspectives on morphosyntax in child language disorders. In M. J. Ball, Müller, & Spencer (Eds.), The handbook of clinical linguistics (2nd ed., pp. 259–271). Wiley Blackwell.

*Antonijevic-Elliott, Lyons, O’Malley, M. P., Meir, Haman, Banasik, . . . Fitzmaurice (2020). Language assessment of monolingual and multilingual children using non-word and sentence repetition tasks. Clinical Linguistics & Phonetics, 34(4), 293–311. https://doi.org/10.1080/02699206.2019.1637458

Archibald, L. M. D., & Gathercole, S. E. (2006). Nonword repetition: A comparison of tests. Journal of Speech, Language, and Hearing Research, 49(5), 970–983. https://doi.org/10.1044/1092-4388(2006/070)

*Armon-Lotem & Meir (2016). Diagnostic accuracy of repetition tasks for the identification of specific language impairment (SLI) in bilingual children: Evidence from Russian and Hebrew. International Journal of Language & Communication Disorders, 51(6), 715–731. https://doi.org/10.1111/1460-6984.12242

*Armon-Lotem (2014). Between L2 and SLI: Inflections and prepositions in the Hebrew of bilingual children with TLD and monolingual children with SLI. Journal of Child Language, 41(1), 3–33. https://doi.org/10.1017/S0305000912000487

*Babayiğit (2015). The relations between word reading, oral language, and reading comprehension in children who speak English as a first (L1) and second language (L2): A multigroup structural analysis. Reading and Writing, 28, 527–544. https://doi.org/10.1007/s11145-014-9536-x

Banasik, Haman, & Smoczynska (2012). Sentence repetition task [Unpublished material]. Faculty of Psychology, University of Warsaw.

Banasik-Jemielniak, Kochańska, Haman, Czartoryski, Modzelewska, Obarska, & Zajączkowska (2023). Towards verifying the reliability of online testing in studying grammatical competence of children: The Polish Sentence Repetition Task used online and onsite [Conference presentation]. International Symposium on Bilingualism, Sydney, Australia.

*Bedore, L. M., Peña, E. D., Fiestas, & Lugo-Neris, M. J. (2020). Language and literacy together: Supporting grammatical development in dual language learners with risk for language and learning difficulties. Language, Speech, and Hearing Services in Schools, 51(2), 282–297. https://doi.org/10.1044/2020_LSHSS-19-00055

Bernecker & Ninaus (2021). No pain, no gain? Investigating motivational mechanisms of game elements in cognitive tasks. Computers in Human Behavior, 114, 106542. https://doi.org/10.1016/j.chb.2020.106542

Bishop, D. V. (2017). Why is it so hard to reach agreement on terminology? The case of developmental language disorder (DLD). International Journal of Language & Communication Disorders, 52(6), 671–680.

*Blom, Soto-Corominas, Attar, Daskalaki, & Paradis (2021). Interdependence between L1 and L2: The case of Syrian children with refugee backgrounds in Canada and the Netherlands. Applied Psycholinguistics, 42(5), 1159–1194. https://doi.org/10.1017/S0142716421000229

*Bogliotti, Aksen, & Isel (2020). Language experience in LSF development: Behavioral evidence from a sentence repetition task. PLOS ONE, 15(11), e0236729. https://doi.org/10.1371/journal.pone.0236729

*Bosman, A. M., & Janssen (2017). Differential relationships between language skills and working memory in Turkish–Dutch and native-Dutch first-graders from low-income families. Reading and Writing, 30(9), 1945–1964. https://doi.org/10.1007/s11145-017-9760-2

31. *Buil-Legaz, Aguilar-Mediavilla, & Adrover-Roig (2016). Longitudinal trajectories of the representation and access to phonological information in bilingual children with specific language impairment. International Journal of Speech-Language Pathology, 18(5), 473–482. https://doi.org/10.3109/17549507.2015.1126638

32. Cao & Yan (2024). Diagnostic accuracy of current assessment measures for developmental language disorders in bilingual children: A systematic review. Journal of Interactional Research in Communication Disorders, 15(1). https://doi.org/10.1558/jircd.26978

33. *Castilla-Earls, Ronderos, & Francis, D. J. (2023). Longitudinal examination of morphosyntactic skills in bilingual children: Spanish and English standardized scores. Journal of Speech, Language, and Hearing Research, 66(8), 2671–2687. https://doi.org/10.1044/2023_JSLHR-22-00495

34. De Cat & Melia (2022). What does the sentence structure component of the CELF-IV index, in monolinguals and bilinguals? Journal of Child Language, 49(3), 423–450. https://doi.org/10.1017/S0305000920000823

35. Chondrogianni, Andreou, Nerantzini, Varlokosta, & Tsimpli, I. M. (2013). The Greek sentence repetition task. COST Action IS0804.

36. *Chiat, Armon-Lotem, Marinis, Polišenská, Roy, Seeff-Gabriel, & Gathercole, V. M. (2013). Assessment of language abilities in sequential bilingual children: The potential of sentence imitation tasks. In Mueller Gathercole, V. C. (Ed.), Issues in the assessment of bilinguals (pp. 56–89). Multilingual Matters.

37. *Chilla, Hamann, Prévost, Ibrahim, L. A., Ferré, dos Santos, . . . Tuller (2021). The influence of different first languages on LITMUS nonword repetition and sentence repetition in second language French and second language German: A crosslinguistic approach. In Grohmann & Armon-Lotem (Eds.), LITMUS in action: Comparative studies (pp. 227–263). John Benjamins.

38. *Chondrogianni & John (2018). Tense and plural formation in Welsh–English bilingual children with and without language impairment. International Journal of Language & Communication Disorders, 53(3), 495–514. https://doi.org/10.1111/1460-6984.12363

39. *Chondrogianni & Kwon (2019). The development of English tense and agreement morphology in Welsh–English bilingual children with and without specific language impairment. Applied Psycholinguistics, 40(4), 821–852. https://doi.org/10.1017/S0142716418000772

40. *Clay, M. M. (1971). Sentence repetition: Elicited imitation of a controlled set of syntactic structures by four language groups. Monographs of the Society for Research in Child Development, 36, 1–85. https://doi.org/10.2307/1165821

41. Coady, J. A., & Evans, J. L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). International Journal of Language & Communication Disorders, 43(1), 1–40. https://doi.org/10.1080/13682820601116485

42. Conti-Ramsden, Botting, & Faragher (2001). Psycholinguistic markers for specific language impairment (SLI). Journal of Child Psychology and Psychiatry, 42(6), 741–748.

43. *Courtney, Graham, Tonkyn, & Marinis (2017). Individual differences in early language learning: A study of English learners of French. Applied Linguistics, 38(6), 824–847.

44. *De Cat (2020). Predicting language proficiency in bilingual children. Studies in Second Language Acquisition, 42(2), 279–325. https://doi.org/10.1017/S0272263119000597

45. *De Cat & Melia (2022). What does the sentence structure component of the CELF-IV index, in monolinguals and bilinguals? Journal of Child Language, 49(3), 423–450. https://doi.org/10.1017/S0305000920000823

46. de Jong (2024). Developmental language disorder in a bilingual context. In Ball, M. J., Müller, & Spencer (Eds.), The handbook of clinical linguistics (2nd ed.). Wiley.

47. de Jong, Blom, & Van Dijk (2021). LITMUS Srep—een zinsherhaaltaak voor het Nederlands [LITMUS SRep: A sentence repetition task for Dutch]. Stem-, taal- en spraakpathologie, 26, 96–116. https://doi.org/10.21827/32.8310/2021-96

48. Devescovi & Caselli, M. C. (2007). Sentence repetition as a measure of early grammatical development in Italian. International Journal of Language & Communication Disorders, 42(2), 187–208. https://doi.org/10.1080/13682820601030686

49. *Du (2014). Mandarin morphosyntax development in bilingual Mandarin-English children with and without SLI [Doctoral dissertation]. University of Texas at Austin.

50. *Ebert, K. D. (2014). Role of auditory non-verbal working memory in sentence repetition for bilingual children with primary language impairment. International Journal of Language & Communication Disorders, 49(5), 631–636. https://doi.org/10.1111/1460-6984.12090

51. *Erdos, Genesee, Savage, & Haigh (2014). Predicting risk for oral and written language learning difficulties in students educated in a second language. Applied Psycholinguistics, 35(2), 371–398. https://doi.org/10.1017/S0142716412000422

52. Erlam (2006). Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics, 27(3), 464–491. https://doi.org/10.1093/applin/aml001

53. Fidler, Thomason, Cumming, Finch, & Leeman (2004). Editors can lead researchers to confidence intervals, but can’t make them think: Statistical reform lessons from medicine. Psychological Science, 15(2), 119–126. https://doi.org/10.1111/j.0963-7214.2004.01502008.x

54. *Fitton, Goodrich, J. M., Thayer, Pratt, & Luna (2023). Bilingual vocabulary assessment: Examining single-language, conceptual, and total scoring approaches. Journal of Speech, Language, and Hearing Research, 66(9), 3486–3499. https://doi.org/10.1044/2023_JSLHR-22-00573

55. Fitton, Hoge, Petscher, & Wood (2019). Psychometric evaluation of the Bilingual English–Spanish Assessment sentence repetition task for clinical decision making. Journal of Speech, Language, and Hearing Research, 62(6), 1906–1922. https://doi.org/10.1044/2019_JSLHR-L-18-0354

56. *Fleckstein, Prévost, Tuller, Sizaret, & Zebib (2016). How to identify SLI in bilingual children: A study on sentence repetition in French. Language Acquisition, 25(1), 85–101.

57. *Fortier & Simard (2017). Exploring the contribution of phonological memory to metasyntactic abilities in bilingual children. Language Awareness, 26(2), 78–95. https://doi.org/10.1080/09658416.2017.1345919

58. *Franck & Delage (2022). The interplay of emotions, executive functions, memory and language: Challenges for refugee children. Languages, 7(4), 309. https://doi.org/10.3390/languages7040309

59. Fraser, Bellugi, & Brown (1963). Control of grammar in imitation, comprehension, and production. Journal of Verbal Learning and Verbal Behavior, 2(2), 121–135. https://doi.org/10.1016/S0022-5371(63)80076-6

60. *Friesen, D. C., Edwards, & Lamoureux (2021). Predictors of verbal fluency performance in monolingual and bilingual children: The interactive role of English receptive vocabulary and fluid intelligence. Journal of Communication Disorders, 89, 106074. https://doi.org/10.1016/j.jcomdis.2020.106074

61. *Friesen, D. C., Ward, & Archibald, L. M. (2022). Sentence repetition performance differences in bilingual and monolingual children. Journal of Speech, Language, and Hearing Research, 65(8), 2948–2961. https://doi.org/10.1044/2022_JSLHR-21-00596

62. Frizelle, O’Neill, & Bishop, D. V. (2017). Assessing understanding of relative clauses: A comparison of multiple-choice comprehension versus sentence repetition. Journal of Child Language, 44(6), 1435–1457. https://doi.org/10.1017/S0305000916000635

63. *Gagarina, Gey, & Sürmeli (2019). Identifying early preschool bilinguals at risk of DLD: A composite profile of narrative and sentence repetition skills. ZAS Papers in Linguistics, 62, 168–189. https://doi.org/10.21248/zaspil.62.2019.448

64. *Gangware, E. B. (2016). Sentence repetition in sequential bilingual preschool children [Undergraduate honors thesis]. University of Colorado Boulder.

65. Gathercole, S. E., & Baddeley, A. D. (1990). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29(3), 336–360. https://doi.org/10.1016/0749-596X(90)90004-J

66. Gathercole, S. E., Willis, Emslie, & Baddeley (1992). Phonological memory and vocabulary development during the early school years: A longitudinal study. Developmental Psychology, 28, 887–898.

67. Gathercole, S. E., Willis, C. S., Baddeley, A. D., & Emslie (1994). The children’s test of nonword repetition: A test of phonological working memory. Memory, 2(2), 103–127. https://doi.org/10.1080/09658219408258940

68. *Gatlin-Nash, Peña, E. D., Bedore, L. M., Simon-Cereijido, & Iglesias (2021). English BESA morphosyntax performance among Spanish–English bilinguals who use African American English. Journal of Speech, Language, and Hearing Research, 64(10), 3826–3842. https://doi.org/10.1044/2021_JSLHR-20-00737

69. *Gavarró (2017). A sentence repetition task for Catalan-speaking typically-developing children and children with specific language impairment. Frontiers in Psychology, 8, 1865. https://doi.org/10.3389/fpsyg.2017.01865

70. *Gillam, R. B., Peña, E. D., Bedore, L. M., Bohman, T. M., & Mendez-Perez (2013). Identification of specific language impairment in bilingual children: I. Assessment in English. Journal of Speech, Language, and Hearing Research, 56(6), 1813–1823. https://doi.org/10.1044/1092-4388(2013/12-0056)

71. Gilmore & Campbell (2019). ‘It’s a lot trickier than I expected’: Assessment issues and dilemmas for intern psychologists. The Educational and Developmental Psychologist, 36(1), 3–7. https://doi.org/10.1017/edp.2019.3

72. *Glennen (2015). Internationally adopted children in the early school years: Relative strengths and weaknesses in language abilities. Language, Speech, and Hearing Services in Schools, 46(1), 1–13. https://doi.org/10.1044/2014_LSHSS-13-0042

73. Goralnik (1995). Goralnik screening test for Hebrew. Matan.

74. *Graham, Courtney, Marinis, & Tonkyn (2017). Early language learning: The impact of teaching and teacher factors. Language Learning, 67(4), 922–958. https://doi.org/10.1111/lang.12251

75. *Grech (2022). The association of sentence imitation with other language domains in bilingual children. Journal of Child Science, 12(1), e15–e23. https://doi.org/10.1055/s-0042-1743528

76. Grech, Franklin, & Dodd (2011). Language Assessment for Maltese Children (LAMC). University of Malta.

77. Grimm (2001). Sprachentwicklungstest für drei- bis fünfjährige Kinder (SETK 3–5) [Language development test for three- to five-year-old children]. Hogrefe Verlag.

78. Grimm (2003). Sprachstandscreening für das Vorschulalter [Language level screening for preschool age]. Hogrefe.

79. *Gutiérrez-Clellen, V. F., Restrepo, M. A., & Simón-Cereijido (2006). Evaluating the discriminant accuracy of a grammatical measure with Spanish-speaking children. Journal of Speech, Language, and Hearing Research, 49(6), 1209–1223. https://doi.org/10.1044/1092-4388(2006/087)

80. Hack, Marinova-Todd, S. H., & May Bernhardt (2012). Speech assessment of Chinese–English bilingual children: Accent versus developmental level. International Journal of Speech-Language Pathology, 14(6), 509–519. https://doi.org/10.3109/17549507.2012.718361

81. *Haman, Wodniecka, Marecka, Szewczyk, Białecka-Pikul, Otwinowska, . . . Foryś-Nogala (2017). How does L1 and L2 exposure impact L1 performance in bilingual children? Evidence from Polish-English migrants to the United Kingdom. Frontiers in Psychology, 8, 1444. https://doi.org/10.3389/fpsyg.2017.01444

82. *Hamann & Abed Ibrahim (2017). Methods for identifying specific language impairment in bilingual populations in Germany. Frontiers in Communication, 2, 16. https://doi.org/10.3389/fcomm.2017.00016

83. *Hamann, Chilla, Gagarina, & Abed Ibrahim (2017). Syntactic complexity and bilingualism: How (a)typical bilinguals deal with complex structures. In Di Domenico (Ed.), Complexity in acquisition (pp. 142–178). Cambridge Scholars Publishing.

84. *Hamann, Chilla, Ibrahim, L. A., & Fekete (2020). Language assessment tools for Arabic-speaking heritage and refugee children in Germany. Applied Psycholinguistics, 41(6), 1375–1414. https://doi.org/10.1017/S0142716420000399

85. Hamann, Chilla, Ruigendijk, & Abed Ibrahim (2013). A German sentence repetition task: Testing bilingual Russian-German children. Poster presented at the COST Action IS0806 Conference, Kraków.

86. Hamann, Penner, & Lindner (1998). German impaired grammar: The clause structure revisited. Language Acquisition, 7(2–4), 193–245. https://doi.org/10.1207/s15327817la0702-4_5

87. *Hamayan, Saegert, & Larudee (1977). Elicited imitation in second language learners. Language and Speech, 20(1), 86–97. https://doi.org/10.1177/002383097702000109

88. *Hamim, R. A. R., & Hamid, B. A. (2021). The morphosyntactic abilities of bilingual Malay preschool children based on the Malay and English sentence repetition tasks. Pertanika Journal of Social Science and Humanities, 29(1), 71–90. https://doi.org/10.47836/pjssh.29.1.04

89. *Hirata-Edds (2011). Influence of second language Cherokee immersion on children’s development of past tense in their first language, English. Language Learning, 61(3), 700–733. https://doi.org/10.1111/j.1467-9922.2011.00655.x

90. *Hopp (2019). Cross-linguistic influence in the child third language acquisition of grammar: Sentence comprehension and production among Turkish-German and German learners of English. International Journal of Bilingualism, 23(2), 567–583. https://doi.org/10.1177/1367006917752523

91. *Hough, S. D., & Kaczmarek (2011). Language and reading outcomes in young children adopted from Eastern European orphanages. Journal of Early Intervention, 33(1), 51–74. https://doi.org/10.1177/1053815111401377

92. Hunt, Nang, Meldrum, & Armstrong (2022). Can dynamic assessment identify language disorder in multilingual children? Clinical applications from a systematic review. Language, Speech, and Hearing Services in Schools, 53(2), 598–625. https://doi.org/10.1044/2021_LSHSS-21-00094

93. *Ibrahim, L. A., Hamann, & Öwerdieck (2018). Identifying specific language impairment (SLI) across different bilingual populations: German Sentence Repetition Task (SRT). In Bertolini, A. B., & Kaplan, M. J. (Eds.), Proceedings of the 42nd annual Boston University Conference on Language Development (pp. 1–14). Cascadilla Press.

94. Jagušt, Botički, & So, H. J. (2018). Examining competitive, collaborative and adaptive gamification in young learners’ math learning. Computers & Education, 125, 444–457. https://doi.org/10.1016/j.compedu.2018.06.022

95. Jakubowicz & Tuller (2008). Specific language impairment in French. In Ayoun (Ed.), Studies in French applied linguistics (pp. 97–134). John Benjamins.

96. Janssen (2016). The acquisition of gender and case in Polish and Russian: A study of monolingual and bilingual children. Pegasus Oost-Europese Studies, 27. Uitgeverij Pegasus.

97. *Janssen & Meir (2019). Production, comprehension and repetition of accusative case by monolingual Russian and bilingual Russian-Dutch and Russian-Hebrew-speaking children. Linguistic Approaches to Bilingualism, 9(4–5), 736–765. https://doi.org/10.1075/lab.17021.jan

98. Jarvella, R. J. (1971). Syntactic processing of connected speech. Journal of Verbal Learning and Verbal Behavior, 10(4), 409–416. https://doi.org/10.1016/S0022-5371(71)80040-3

99. *Jordaan & Ngwanduli, M. H. (2020). Contextual influences on sentence repetition as a tool for the identification of language impairment in Grade 3 Sepedi-English bilinguals: A case against bilingual norms. South African Journal of Communication Disorders, 67(1), 1–8.

100. *Kaltsa, Prentza, & Tsimpli, I. M. (2020). Input and literacy effects in simultaneous and sequential bilinguals: The performance of Albanian–Greek-speaking children in sentence repetition. International Journal of Bilingualism, 24(2), 159–183. https://doi.org/10.1177/1367006918819867

101. Klem, Melby-Lervåg, Hagtvet, Lyster, S.-A. H., Gustafsson, J.-E., & Hulme (2015). Sentence repetition is a measure of children’s language skills rather than working memory limitations. Developmental Science, 18(1), 146–154. https://doi.org/10.1111/desc.12202

102. Komeili, Marinis, Tavakoli, & Kazemi (2020). Sentence repetition in Farsi-English bilingual children. Journal of the European Second Language Association, 4, 1–12. https://doi.org/10.22599/jesla.55

103. *Komeili & Marshall, C. R. (2013). Sentence repetition as a measure of morphosyntax in monolingual and bilingual children. Clinical Linguistics & Phonetics, 27(2), 152–162. https://doi.org/10.3109/02699206.2012.751625

104. *Komeili, Tavakoli, & Marinis (2023). Using multiple measures of language dominance and proficiency in Farsi-English bilingual children. Frontiers in Communication, 8, 1153665. https://doi.org/10.3389/fcomm.2023.1153665

105. Korkman, Kirk, & Kemp, S. L. (2000). NEPSY. Neuropsykologisk bedömning 3:0–12:11 år [NEPSY: Neuropsychological assessment, 3:0–12:11 years]. Psykologiförlaget.

106. Korkman, Kirk, & Kemp (1998). NEPSY: Developmental neuropsychological assessment. Psychological Corp.

107. *Korytkowski Longo (2011). Sentence repetition as a tool to measure grammatical progress in English-dominant bilingual children with language and/or reading impairment [Master’s thesis]. University of Texas.

108. Kueser, J. B., & Leonard, L. B. (2020). The effects of frequency and predictability on repetition in children with developmental language disorder. Journal of Speech, Language, and Hearing Research, 63(4), 1165–1180. https://doi.org/10.1044/2019_JSLHR-19-00155

109. *Lein, Hamann, Rothweiler, Abed Ibrahim, Chilla, & San (2016). SLI in bilinguals: Testing complex syntax and semantics in German. In Stringer, Garrett, Halloran, & Mossman (Eds.), Proceedings of the 13th Generative Approaches to Second Language Acquisition Conference (GASLA 2015) (pp. 124–135). Cascadilla Proceedings Project.

110. *Limacher (2018). Speech assessment in bilingual children: Relationship between perceptual judgments of accent/comprehensibility and formal test measures [Master’s thesis]. University of Alberta.

111. Lindner, Schmitt, & Kühfuss (2013). Satzwiederholung im Deutschen. Ein Test für Vorschulkinder [Sentence repetition in German: A test for preschool children] [Unpublished manuscript]. University of Munich.

112. *Liu, Chung, K. K., & McBride (2016). The role of SES in Chinese (L1) and English (L2) word reading in Chinese-speaking kindergarteners. Journal of Research in Reading, 39(3), 268–291. https://doi.org/10.1111/1467-9817.12046

113. Liu, P. D., McBride-Chang, Wong, A. M.-Y., Tardif, Stokes, Fletcher, & Shu (2010). Early oral language markers of poor reading performance in Hong Kong Chinese children. Journal of Learning Disabilities, 43, 383–386. https://doi.org/10.1177/0022219410369084

114. Ma, W. J., Zhou, Ross, L. A., Foxe, J. J., & Parra, L. C. (2009). Lip-reading aids word recognition most in moderate noise: A Bayesian explanation using high-dimensional feature space. PLOS ONE, 4(3), e4638. https://doi.org/10.1371/journal.pone.0004638

115. *Makrodimitris & Schulz (2021). Does timing in acquisition modulate heritage children’s language abilities? Evidence from the Greek LITMUS Sentence Repetition Task. Languages, 6(1), 49. https://doi.org/10.3390/languages6010049

116. *Manis, F. R., Lindsey, K. A., & Bailey, C. E. (2004). Development of reading in grades K–2 in Spanish-speaking English-language learners. Learning Disabilities Research & Practice, 19(4), 214–224. https://doi.org/10.1111/j.1540-5826.2004.00107.x

117. *Marc Goodrich, Fitton, & Thayer (2023). Relations between oral language skills and English reading achievement among Spanish–English bilingual children: A quantile regression analysis. Annals of Dyslexia, 73(1), 6–28. https://doi.org/10.1007/s11881-022-00257-1

118. *Marinis & Armon-Lotem (2015). Sentence repetition. In Armon-Lotem, de Jong, & Meir (Eds.), Assessing multilingual children: Disentangling bilingualism from language impairment (pp. 95–124). Multilingual Matters.

119. Marinis, Chiat, & Armon-Lotem (2012). Multi-lingual sentence imitation task (Multi-SIT)-30 sentences. University of Reading.

120. Marinis, Chiat, Armon-Lotem, Gibbons, & Gipps (2010). School-Age Sentence Imitation Test (SASIT). University of Reading.

121. Marinis, Chiat, Armon-Lotem, Piper, & Roy (2011). School-age sentence imitation test–E32. COST Action IS0804.

122. Marshall, Mason, Rowley, Herman, Atkinson, Woll, & Morgan (2015). Sentence repetition in deaf children with specific language impairment in British Sign Language. Language Learning and Development, 11(3), 237–251. https://doi.org/10.1080/15475441.2014.917557

123. Martin, N. A., & Brownell (2005). Test of auditory processing skills (3rd ed., TAPS-3). Academic Therapy Publications.

124. McCarthy, R. J., & Elson (2018). A conceptual review of lab-based aggression paradigms. Collabra: Psychology, 4(1), 4. https://doi.org/10.1525/collabra.104

125. *Meir (2017). Effects of specific language impairment (SLI) and bilingualism on verbal short-term memory. Linguistic Approaches to Bilingualism, 7(3–4), 301–330. https://doi.org/10.1075/lab.15033.mei

126. *Meir (2018). Morpho-syntactic abilities of unbalanced bilingual children: A closer look at the weaker language. Frontiers in Psychology, 9, 1318. https://doi.org/10.3389/fpsyg.2018.01318

127. Meir & Armon-Lotem (2015). Disentangling bilingualism from SLI in Heritage Russian: The impact of L2 properties and length of exposure to the L2. In Hamann & Ruigendijk (Eds.), Language acquisition and development: Proceedings of GALA 2013 (pp. 299–314). Cambridge Scholars Publishing.

128. *Meir & Armon-Lotem (2017a). Delay or deviance: Old question—New evidence from bilingual children with specific language impairment (SLI). In LaMendola & Scott (Eds.), Proceedings of the 41st annual Boston University Conference on Language Development (pp. 495–508). Cascadilla Press.

129. *Meir & Armon-Lotem (2017b). Independent and combined effects of socioeconomic status (SES) and bilingualism on children’s vocabulary and verbal short-term memory. Frontiers in Psychology, 8, 1442. https://doi.org/10.3389/fpsyg.2017.01442

130. *Meir & Novogrodsky (2020). Syntactic abilities and verbal memory in monolingual and bilingual children with high functioning autism (HFA). First Language, 40(4), 341–366. https://doi.org/10.1177/0142723719849981

131. *Meir, Walters, & Armon-Lotem (2016). Disentangling SLI and bilingualism using sentence repetition tasks: The impact of L1 and L2 properties. International Journal of Bilingualism, 20(4), 421–452. https://doi.org/10.1177/1367006915609240

132. *Meir, Walters, & Armon-Lotem (2017). Bi-directional cross-linguistic influence in bilingual Russian-Hebrew children. Linguistic Approaches to Bilingualism, 7(5), 514–553. https://doi.org/10.1075/lab.15007.mei

133. *Méndez, L. I., & Simon-Cereijido (2019). A view of the lexical–grammatical link in young Latinos with specific language impairment using language-specific and conceptual measures. Journal of Speech, Language, and Hearing Research, 62(6), 1775–1786. https://doi.org/10.1044/2019_JSLHR-L-18-0315

134. Meyers, J. E., Volkert, & Diep (2000). Sentence Repetition Test: Updated norms and clinical utility. Applied Neuropsychology, 7(3), 154–159. https://doi.org/10.1207/S15324826AN0703_6

135. *Miciak, Ahmed, Capin, & Francis, D. J. (2022). The reading profiles of late elementary English learners with and without risk for dyslexia. Annals of Dyslexia, 72(2), 276–300. https://doi.org/10.1007/s11881-022-00254-4

136. Monsrud, M. B., Rydland, Geva, & Lyster, S. A. H. (2022). First and second language sentence repetition: A screening measure for dual language learners? Language and Education, 36(4), 312–328. https://doi.org/10.1080/09500782.2022.2063059

137. Muszyńska, Krajewski, Dynak, Garmann, N. G., Romøren, A. S. H., Łuniewska, . . . Haman (2025). Bilingual children reach early language milestones at the same age as monolingual peers. Journal of Child Language, 1–24. https://doi.org/10.1017/S0305000924000655

138. Natalicio, D. C., & Williams (1971). Repetition as an oral language assessment technique. University of Texas.

139. Newcomer, P. L., & Hammill, D. D. (1997). Test of language development—Primary (3rd ed.). Pro-Ed.

140. Nosek, B. A., Hardwicke, T. E., Moshontz, Allard, Corker, K. S., Dreber, . . . Vazire (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73(1), 719–748. https://doi.org/10.1146/annurev-psych-020821-114157

141. Ottem & Frost (2005). Språk 6–16. Screening test [Language 6–16: Screening test].

142. *Öwerdieck, Hamann, Ibrahim, L. A., Dionne, & Vidal Covas (2021). Studying a bilingual population’s production and comprehension of relative clauses longitudinally: Preliminary results. In Dionne & Vidal Covas, L.-A. (Eds.), Proceedings of the 45th Annual Boston University Conference on Language Development (pp. 40–51). Cascadilla Press.

143. Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, Hoffmann, T. C., Mulrow, C. D., . . . Moher (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. International Journal of Surgery, 88, 105906. https://doi.org/10.1016/j.ijsu.2021.105906

144. *Paradis & Jia (2017). Bilingual children’s long-term outcomes in English as a second language: Language environment factors shape individual differences in catching up with monolinguals. Developmental Science, 20(1), e12433. https://doi.org/10.1111/desc.12433

145. *Paradis, Soto-Corominas, Daskalaki, Chen, & Gottardo (2021). Morphosyntactic development in first generation Arabic–English children: The effect of cognitive, age, and input factors over time and across languages. Languages, 6(1), 51. https://doi.org/10.3390/languages6010051

146. Pawłowska (2014). Evaluation of three proposed markers for language impairment in English: A meta-analysis of diagnostic accuracy studies. Journal of Speech, Language, and Hearing Research, 57(6), 2261–2273. https://doi.org/10.1044/2014_JSLHR-L-13-0189

147. *Peeters-Podgaevskaja, A. V., Janssen, B. E., & Baker, A. E. (2020). The acquisition of relative clauses in Russian and Polish in monolingual and bilingual children. Linguistic Approaches to Bilingualism, 10(2), 216–248. https://doi.org/10.1075/lab.17031.pee

148. Peña, E. D., Bedore, L. M., Gutiérrez-Clellen, V. F., Iglesias, & Goldstein, B. A. (2008). Bilingual English Spanish Oral Screener (BESOS) [Unpublished manuscript].

149. Peña, E. D., Bedore, L. M., Lugo-Neris, & Albudoor (2020). Identifying developmental language disorder in school age bilinguals: Semantics, grammar, and narratives. Language Assessment Quarterly, 17(5), 541–558. https://doi.org/10.1080/15434303.2020.1827258

150. Peña, E. D., Gutiérrez-Clellen, V. F., Iglesias, Goldstein, B. A., & Bedore, L. M. (2014). BESA: Bilingual English-Spanish Assessment Manual. AR-Clinical Publications.

151. Peña, E. D., Gutiérrez-Clellen, V. F., Iglesias, Goldstein, B. A., & Bedore, L. M. (2018). Bilingual English–Spanish Assessment (BESA). Brookes.

152. Peng, C. Y. J., Chen, L. T., Chiang, H. M., & Chiang, Y. C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209. https://doi.org/10.1007/s10648-013-9218-2

153. *Pérez-Leroux, A. T., Cuza, & Thomas (2011). Clitic placement in Spanish–English bilingual children. Bilingualism: Language and Cognition, 14(2), 221–232. https://doi.org/10.1017/S1366728910000234

154. *Petersen, D. B., & Gillam, R. B. (2013). Accurately predicting future reading difficulty for bilingual Latino children at risk for language impairment. Learning Disabilities Research & Practice, 28(3), 113–128. https://doi.org/10.1111/ldrp.12014

155. Pham & Ebert, K. D. (2020). Diagnostic accuracy of sentence repetition and nonword repetition for developmental language disorder in Vietnamese. Journal of Speech, Language, and Hearing Research, 63(5), 1521–1536. https://doi.org/10.1044/2020_JSLHR-19-00366

156. Polišenská (2011). The influence of linguistic structure on memory span: Repetition tasks as a measure of language ability [Doctoral dissertation]. City University London.

157. Polišenská, Chiat, Fenton, & Roy (2020). Assessing young children from diverse backgrounds: Novel ways to measure language abilities and meet the requirements of the early years foundation stage. Languages, Society & Policy. https://doi.org/10.17863/CAM.54064

158. Pratt, A. S., Anaya, J. B., Ramos, M. N., Pham, Muñoz, Bedore, L. M., & Peña, E. D. (2022). From a distance: Comparison of in-person and virtual assessments with adult–child dyads from linguistically diverse backgrounds. Language, Speech, and Hearing Services in Schools, 53(2), 360–375. https://doi.org/10.1044/2021_LSHSS-21-00070

159. *Pratt, A. S., Peña, E. D., & Bedore, L. M. (2021). Sentence repetition with bilinguals with and without DLD: Differential effects of memory, vocabulary, and exposure. Bilingualism: Language and Cognition, 24(2), 305–318. https://doi.org/10.1017/S1366728920000498

160. *Prentza, Tafiadis, Chondrogianni, & Tsimpli, I. M. (2022). Validation of a Greek Sentence Repetition task with typically developing monolingual and bilingual children. Journal of Psycholinguistic Research, 51(2), 373–395. https://doi.org/10.1007/s10936-022-09853-z

161. *Pretorius, M. J., Le Roux, & Geertsema (2022). Verbal working memory in second language reading comprehension: A correlational study. Communication Disorders Quarterly, 43(4), 234–245. https://doi.org/10.1177/1525740121991475

162. Prévost, Tuller, & Zebib (2012). LITMUS-SR-French. François Rabelais University.

163. *Quirk (2020). Beyond exposure: Markers of English proficiency in school-aged French–English bilinguals [Doctoral dissertation]. City University of New York.

164. *Quirk (2021). Interspeaker code-switching use in school-aged bilinguals and its relation with affective factors and language proficiency. Applied Psycholinguistics, 42(2), 367–393. https://doi.org/10.1017/S0142716420000752

165. *Razak, R. B., Cho, C. K. S., & Bakar, N. A. B. A. (2021). Preliminary findings on a newly developed Malaysian multilingual sentence repetition task for multilingual Chinese children in Malaysia. IIUM Medical Journal Malaysia, 20(3), 487–509. https://doi.org/10.31436/imjm.v20i3.1704

166. Redmond, S. M. (2005). Differentiating SLI from ADHD using children’s sentence recall and production of past tense morphology. Clinical Linguistics & Phonetics, 19, 109–127. https://doi.org/10.1080/02699200410001669870

167. Reilly, Tomblin, Law, McKean, Mensah, F. K., Morgan, . . . Wake (2014). Specific language impairment: A convenient label for whom? International Journal of Language & Communication Disorders, 49(4), 416–451. https://doi.org/10.1111/1460-6984.12102

168. Rice & Redcay (2016). Interaction matters: A perceived social partner alters the neural processing of human speech. NeuroImage, 129, 480–488. https://doi.org/10.1016/j.neuroimage.2015.11.041

169. Riches, N. G. (2012). Sentence repetition in children with specific language impairment: An investigation of underlying mechanisms. International Journal of Language & Communication Disorders, 47(5), 499–510. https://doi.org/10.1111/j.1460-6984.2012.00158.x

170. *Rose, Armon-Lotem, & Altman (2022). Profiling bilingual children: Using monolingual assessment to inform diagnosis. Language, Speech, and Hearing Services in Schools, 53(2), 494–510. https://doi.org/10.1044/2021_LSHSS-21-00099

171. Royle & Thordardottir (2003). Le grand déménagement [The big move: French adaptation of the Recalling Sentences in Context subtest of the CELF–P]. Unpublished research tool. McGill University.

172. Rujas, Mariscal, Murillo, & Lázaro (2021). Sentence repetition tasks to detect and prevent language difficulties: A scoping review. Children, 8(7), 578. https://doi.org/10.3390/children8070578

173. *Şan, N. H. (2023). Subordination in Turkish heritage children with and without developmental language impairment. Languages, 8(4), 239. https://doi.org/10.3390/languages8040239

174. *Scheidnes (2020). Sentence repetition and non-word repetition in early total French immersion. Applied Psycholinguistics, 41(1), 107–131. https://doi.org/10.1017/S0142716419000420

175. Semel, Wiig, E. H., & Secord, W. A. (1994). Clinical Evaluation of Language Fundamentals—Revised. The Psychological Corporation.

176. Semel, Wiig, E. H., & Secord, W. A. (2000). Clinical Evaluation of Language Fundamentals (CELF-III UK) (3rd ed.). Psychological Corporation.

177. Semel, Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals-4 (4th ed.). Psychological Corporation.

178. Semel, Wiig, E. H., & Secord, W. A. (2004). Clinical Evaluation of Language Fundamentals—Preschool 2. Pearson Assessment.

179. Semel, Wiig, E. H., & Secord, W. A. (2006). Clinical Evaluation of Language Fundamentals (4th ed.). Pearson Assessment.

180. Sijtsma (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107–120. https://doi.org/10.1007/s11336-008-9101-0

181. *Simard, Foucambert, & Labelle (2014). Examining the contribution of metasyntactic ability to reading comprehension among native and non-native speakers of French. International Journal of Bilingualism, 18(6), 586–604. https://doi.org/10.1177/1367006912452169

182. Símonardóttir & Guðmundsson (1996). TOLD-I, málþroskapróf, íslensk staðfærsla [Language assessment test, Icelandic adaptation]. Rannsóknastofnun uppeldis- og menntamála.

183. *Simon-Cereijido (2009). Verb argument structure deficits in Spanish-speaking preschoolers with specific language impairment who are English language learners [Doctoral dissertation]. University of California, San Diego.

184. *Simón-Cereijido (2017). Sentence repetition in typical and atypical Spanish-speaking preschoolers who are English language learners. In Auza Benavides & R. G. Schwartz (Eds.), Language development and disorders in Spanish-speaking children (pp. 205–215). Springer.

185. *Simon-Cereijido & Méndez, L. I. (2018). Using language-specific and bilingual measures to explore lexical–grammatical links in young Latino dual-language learners. Language, Speech, and Hearing Services in Schools, 49(3), 537–550. https://doi.org/10.1044/2018_LSHSS-17-0058

186. *Simon-Cereijido & Méndez, L. I. (2020). Similarities and differences in the lexical-grammatical relation of young dual language learners with and without specific language impairment. Clinical Linguistics & Phonetics, 34(1–2), 92–109. https://doi.org/10.1080/02699206.2019.1611926

187. Smith (1973). An experimental approach to children’s linguistic competence. In Ferguson & Slobin (Eds.), Studies of Child Language Development (p. 497).

188. *Snow, C. E., & Hoefnagel-Höhle (1978). The critical period for language acquisition: Evidence from second language learning. Child Development, 49, 1114–1128. https://doi.org/10.2307/1128751

189. *Soesman & Walters (2021). Codeswitching within prepositional phrases: Effects of switch site and directionality. International Journal of Bilingualism, 25(3), 747–771. https://doi.org/10.1177/13670069211000855

190. *Soesman, Walters, & Fichman (2022). Language control and intra-sentential codeswitching among bilingual children with and without developmental language disorder. Languages, 7(4), 249. https://doi.org/10.3390/languages7040249

191. *Sopata & Długosz (2022a). Age of onset effects in child bilingual acquisition: Identifying the turning point. GEMA Online Journal of Language Studies, 22(3). https://doi.org/10.17576/gema-2022-2203-01

192. *Sopata & Długosz (2022b). The effects of language input on word order in German as a heritage and majority language. Language Acquisition, 29(2), 198–228. https://doi.org/10.1080/10489223.2021.1992409

193. *Sopata, Długosz, Brehmer, & Gielge (2021). Cross-linguistic influence in simultaneous and early sequential acquisition: Null subjects and null objects in Polish-German bilingualism. International Journal of Bilingualism, 25(3), 687–707. https://doi.org/10.1177/1367006920988911

194. *Soto-Corominas, Daskalaki, Paradis, Winters-Difani, & Al Janaideh (2022). Sources of variation at the onset of bilingualism: The differential effect of input factors, AOA, and cognitive skills on HL Arabic and L2 English syntax. Journal of Child Language, 49(4), 741–773. https://doi.org/10.1017/S0305000921000246

195. *Stadtmiller, Lindner, Süss, A., & Gagarina (2022). Russian–German five-year-olds: What omissions in sentence repetition tell us about linguistic knowledge, memory skills and their interrelation. Journal of Child Language, 49(5), 869–896. https://doi.org/10.1017/S0305000921000325

196. Stavrakaki & Tsimpli, I. M. (2000). Diagnostic verbal IQ test for Greek preschool and school age children: Standardization, statistical analysis, psychometric properties. In Proceedings of the 8th symposium of the Panhellenic Association of Logopedists (pp. 95–106). Ellinika Grammata.

197. Stokes, S. F., Wong, A. M., Fletcher, & Leonard, L. B. (2006). Nonword repetition and sentence repetition as clinical markers of specific language impairment: The case of Cantonese. Journal of Speech, Language, and Hearing Research, 49(2), 219–236. https://doi.org/10.1044/1092-4388(2006/019)

198. *Taha, Carioti, Stucchi, Chailleux, Granocchio, Sarti, . . . Guasti, M. T. (2022). Identifying the risk of dyslexia in bilingual children: The potential of language-dependent and language-independent tasks. Frontiers in Psychology, 13, 935935. https://doi.org/10.3389/fpsyg.2022.935935

199. *Taliancich-Klinger, C. L., Bedore, L. M., & Pena, E. D. (2018). Preposition accuracy on a sentence repetition task in school age Spanish–English bilinguals. Journal of Child Language, 45(1), 97–119. https://doi.org/10.1017/S0305000917000125

200. *Talli & Stavrakaki (2020). Short-term memory, working memory and linguistic abilities in bilingual children with developmental language disorder. First Language, 40(4), 437–460. https://doi.org/10.1177/0142723719886954

201. *Teitelbaum (1977). The validity of various techniques measuring children’s bilingualism. Working Papers on Bilingualism, No. 13.

202. *Thordardottir, E. T., & Brandeker (2013). The effect of bilingual exposure versus language impairment on nonword repetition and sentence imitation scores. Journal of Communication Disorders, 46(1), 1–16. https://doi.org/10.1016/j.jcomdis.2012.08.002

203. *Thordardottir, E. T., & Juliusdottir, A. G. (2013). Icelandic as a second language: A longitudinal study of language knowledge and processing by school-age children. International Journal of Bilingual Education and Bilingualism, 16(4), 411–435. https://doi.org/10.1080/13670050.2012.693062

204. *Thordardottir, E. T., & Rioux, E. J. (2019). Does efficacy equal lasting impact? A study of intervention short term gains, impact on diagnostic status, and association with background variables. Folia Phoniatrica et Logopaedica, 71(2–3), 71–82. https://doi.org/10.1159/000493125

205. Topbaş & Güven (2017). Türkçe Okul Çağı Dil Gelişimi Testi TODIL [Turkish School-Age Language Development Test]. Detay.

206. *Torregrossa, Caloi, Listanti, & Romano (2023). The acquisition of syntactic structures in heritage Italian: Assessing the role of language exposure at critical periods. In F. B. Romano (Ed.), Studies in Italian as a heritage language (pp. 155–194). De Gruyter.

207. *Torregrossa, Eisenbeiß, & Bongartz (2023). Boosting bilingual metalinguistic awareness under dual language activation: Some implications for bilingual education. Language Learning, 73(3), 683–722. https://doi.org/10.1111/lang.12552

208. *Tribushinina & Mackaaij (2023). Bilingual advantages in foreign language learning: Evidence from primary-school pupils with developmental language disorder. Frontiers in Education, 8, 1264120. https://doi.org/10.3389/feduc.2023.1264120

209. *Trofimovich & Baker (2007). Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners’ acquisition of five suprasegmentals. Applied Psycholinguistics, 28(2), 251–276. https://doi.org/10.1017/S0142716407070130

210. *Tuller, Abboud, Ferré, Fleckstein, Prévost, Dos Santos, . . . Zebib (2015). Specific language impairment and bilingualism: Assembling the pieces. In Hamann & Ruigendijk (Eds.), Language acquisition and development: Proceedings of GALA 2013 (pp. 533–567). Cambridge Scholars Publishing.

211. *Tuller, Hamann, Chilla, Ferré, Morin, Prévost, . . . Zebib (2018). Identifying language impairment in bilingual children in France and in Germany. International Journal of Language & Communication Disorders, 53(4), 888–904. https://doi.org/10.1111/1460-6984.12397

212. *Verhoeven, L. T. (1994). Transfer in bilingual development: The linguistic interdependence hypothesis revisited. Language Learning, 44(3), 381–415. https://doi.org/10.1111/j.1467-1770.1994.tb01112.x

213. *Verhoeven, L. T., & Boeschoten, H. E. (1986). First language acquisition in a second language submersion environment. Applied Psycholinguistics, 7(3), 241–255. https://doi.org/10.1017/S0142716400007554

214. Verhoeven, L. T., Narain, Extra, Konak, O. A., & Zerrouk (1995). Toets tweetaligheid handleiding [Bilingualism test: Manual]. Cito.

215. Vinther (2002). Elicited imitation: A brief overview. International Journal of Applied Linguistics, 12(1), 54–73. https://doi.org/10.1111/1473-4192.00024

216. Volkers (2018). Does an SLI label really restrict services? It all depends who you ask. Or perhaps more importantly, where they live. The ASHA Leader, 23(12), 54–61. https://doi.org/10.1044/leader.FTR2.23122018.54

217. *Wagley, Marks, R. A., Bedore, L. M., & Kovelman (2022). Contributions of bilingual home environment and language proficiency on children’s Spanish–English reading outcomes. Child Development, 93(4), 881–899. https://doi.org/10.1111/cdev.13748

218. Ward, Polisenska, & Bannard (2024). Sentence repetition as a diagnostic tool for developmental language disorder: A systematic review and meta-analysis. Journal of Speech, Language, and Hearing Research, 67(7), 2191–2221. https://doi.org/10.1044/2024_JSLHR-23-00490

219. *Westman, Korkman, Mickos, & Byring (2008). Language profiles of monolingual and bilingual Finnish preschool children at risk for language impairment. International Journal of Language & Communication Disorders, 43(6), 699–711.

220. *Whiteside, K. E., & Norbury, C. F. (2017). The persistence and functional impact of English language difficulties experienced by children learning English as an additional language and monolingual peers. Journal of Speech, Language, and Hearing Research, 60(7), 2014–2030. https://doi.org/10.1044/2017_JSLHR-L-16-0318

221. Wiig, E. H., Secord, W. A., & Semel (2004). Clinical evaluation of language fundamentals: Preschool (2nd ed.). Psychological Corporation.

222. *Wofford, M. C., Cano, Goodrich, J. M., & Fitton (2022). Tell or retell? The role of task and language in Spanish–English narrative microstructure performance. Language, Speech, and Hearing Services in Schools, 53(2), 511–531. https://doi.org/10.1044/2021_LSHSS-21-00055

223. Wolfe-Christensen & Callahan, J. L. (2008). Current state of standardization adherence: A reflection of competency in psychological assessment. Training and Education in Professional Psychology, 2(2), 111. https://doi.org/10.1037/1931-3918.2.2.111

224. *Wood & Hoge (2019). Average change in sentence repetition by Spanish-English speaking children: Kindergarten to first grade. International Journal of Bilingual Education and Bilingualism, 22(6), 725–740. https://doi.org/10.1080/13670050.2017.1308310

225. Woodcock, R. W., McGrew, Mather, & Schrank (2007). Woodcock-Johnson III NU tests of achievement.

226. Woodcock, R. W., & Munoz-Sandoval, A. F. (1995). Woodcock Language Proficiency Battery—Revised, Spanish form. Riverside.

227. *Woon, C. P., Yap, N. T., Lim, H. W., & Wong, B. E. (2014). Measuring grammatical development in bilingual Mandarin-English speaking children with a Sentence Repetition Task. Journal of Education and Learning, 3(3), 144–157.

228. *Yang, Chan, & Gagarina (2023). Left-behind experience and language proficiency predict narrative abilities in the home language of Kam-speaking minority children in China. Frontiers in Psychology, 13, 1059895. https://doi.org/10.3389/fpsyg.2022.1059895

229. *Zebib, Tuller, Hamann, Abed Ibrahim, & Prévost (2020). Syntactic complexity and verbal working memory in bilingual children with and without developmental language disorder. First Language, 40(4), 461–484. https://doi.org/10.1177/0142723719888372

230. Zhan, Tong, Liang, Guo, & Lan (2022). The effectiveness of gamification in programming education: Evidence from a meta-analysis. Computers and Education: Artificial Intelligence, 3, 100096. https://doi.org/10.1016/j.caeai.2022.100096

231. *Ziethe, Eysholdt, & Doellinger (2013). Sentence repetition and digit span: Potential markers of bilingual children with suspected SLI? Logopedics Phoniatrics Vocology, 38(1), 1–10. https://doi.org/10.3109/14015439.2012.664652

232. Zogmaister, Vezzoli, Facchin, Conte, F. P., Rizzi, Giaquinto, . . . Simioni (2024). Assessing the transparency of methods in scientific reporting. Collabra: Psychology, 10(1). https://doi.org/10.1525/collabra.121243

Differences in sentence repetition tasks in studies on bilingual children’s language development: A systematic review