Abstract
The Singing Voice Development Measure (SVDM) was created to provide some consistency in what children are asked to sing, how their singing is prompted, and how their singing is evaluated, particularly in research studies. Although the SVDM patterns encouraged children to use all registers of their singing voices, these patterns had an upper limit of B-flat4, which just touches the upper register (B-flat4 and above). Some children can push their middle register up to B-flat4 or higher and would therefore be scored as “singers” when they are really “initial range singers.” Therefore, the purpose of this study was to investigate whether children sang better and were more likely to use their upper register when prompted to echo a set of patterns with a range of D4 to D5 rather than the original patterns, which had a range of C#4 to B-flat4. Kindergarteners (n = 80) and first graders (n = 92) echoed a teacher singing both sets of patterns on text and on a neutral syllable. Two raters scored all the recordings, and interrater and intrarater reliability estimates were all strong. Correlation coefficients exploring the convergent validity of the modified patterns were all significant and of moderate strength. The revised set of patterns is valid and can be used in subsequent research studies.
Children’s singing behaviors have been of interest to researchers for almost a century (e.g., Jersild & Bienstock, 1931). Methods for assessing singing have varied widely in what children have been asked to sing, how their singing has been prompted or initiated, and how their performances have been evaluated, particularly in research studies. All of these procedural decisions would seem to matter, yet they are often based on tradition rather than empirical investigation, particularly with regard to what children are asked to sing.
Rutkowski (1986, 1990, 1996, 2018, 2019) created the Singing Voice Development Measure (SVDM) to provide some consistency in evaluating children’s singing. The evaluation scale (see Appendix) focuses on a child’s use of singing voice, that is, the vocal registers the child is able to access when singing. Other scales have focused on pitch accuracy (Brophy, 1997; Demorest et al., 2017; Goetze & Hori, 1989; Hickey, 1995; Liao & Davidson, 2016; Moore, 1994; Nichols, 2016). However, Rutkowski contended that if a child does not yet have use of, for example, the “initial range,” then that child would not be able to sing patterns or songs with a range of D4 to A4 in tune. When investigating whether “kindergarten and first grade children sing patterns more accurately if the pitches of those patterns fall within their accessible registers,” Rutkowski (2015, p. 286) found that “children are more accurate when they are assessed with accessible patterns based on their SVDM classification” (p. 289). She concluded, “For children who cannot yet sing in all registers, a focus on how well they sing in tune may not help them learn to sing with more success in the long run” (p. 290). Evaluating children’s use of singing voice is therefore important for teachers, and the SVDM can be used informally in classroom settings during regular music activities (see Rutkowski, 2018).
However, when conducting research, it is especially critical to be mindful of what children are asked to sing and how their singing is prompted. Over more than 30 years of study, Rutkowski (1985, 1986, 1990, 1993, 1996, 2010, 2018, 2019) reported echoing three-tone patterns to be the most efficient means of obtaining singing performances for evaluation. Guerrini (2006) also found children to be significantly more accurate when singing three-tone patterns. Pattern singing therefore appears to be easier for children.
Rutkowski designed the SVDM patterns with relatively restricted ranges that nonetheless prompted children to use all registers of their singing voices (see Figure 1, “See the Bird”). The intent was to give children an opportunity to sing above the lift into the upper register while keeping the prompts largely within what was thought to be children’s initial singing range (for a thorough review of children’s singing ranges, see Phillips, 2014; Welch, 1979). The original patterns had an upper limit of B-flat4, which just touches the upper register (B-flat4 and above). However, it has become clear that some children can push their middle register to sing B-flat4 or even higher. An initial range singer may therefore be able to force out the B-flat4 without truly using the upper register. This could result in an inaccurate assessment of that child’s use of singing voice, rating the child as a “singer” when the child is really an initial range singer. In hindsight, patterns encompassing a wider range, particularly ones that prompt a leap into the upper register (from A4 up to D5), may elicit more valid performances and therefore more accurate assessments. A child not yet accessing the upper register would be unable to sing the D5 and therefore would not be scored as a singer on the revised patterns. Because Wassum (1979) found that 50% of elementary school children could sing an octave, using an octave in patterns designed to assess use of singing voice seems reasonable.

Figure 1. Singing Voice Development Measure patterns: “See the Bird” (original) and “Feel the Wind” (revised).
Purpose and Research Questions
Therefore, the purpose of this study was to investigate whether children were more likely to use their upper register when prompted to echo a set of patterns with a range of D4 to D5 (see Figure 1, “Feel the Wind”) rather than the original patterns with a more limited range of C#4 to B-flat4 (see Figure 1, “See the Bird”).
The research questions were as follows: (1) Are rater reliabilities (both intrarater and interrater) acceptable for both sets of patterns? (2) Does a significant difference exist in SVDM scores as a function of the two sets of patterns? (3) Does a significant difference exist between neutral syllable and text performances within each set of patterns and between each set of patterns? (4) Do significant sex or grade-level differences exist for any of the aforementioned? (5) Is the new set of patterns a valid task?
Method
Modified Patterns
Rutkowski explored the efficacy and efficiency of prompting children by having them either sing a song or echo patterns and determined echoing patterns to be the most efficient (e.g., Rutkowski, 1986, 1996). The set of patterns used for assessment since 1996 is presented in Figure 1 (“See the Bird”; hereafter “Bird”). For this study, the third and fourth measures of “Bird” were modified to span an octave, D4 to D5 (see Figure 1, “Feel the Wind”). The D5, above the register lift, was approached by a leap and followed by descending pitches. New text was devised to avoid confusing the two sets of patterns, resulting in “Feel the Wind” (hereafter “Wind”). Although neither set of patterns should be considered a song, the patterns follow logically from one another and are framed around a typical Western chord progression.
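To make the difference between the two ranges concrete, the sketch below converts the pitch names used in this article into MIDI note numbers and counts semitones. It assumes scientific pitch notation with C4 = 60 (middle C), a convention of this illustration rather than something the study specifies; the helper names `midi_number` and `semitone_span` are hypothetical.

```python
import re

# Semitone offsets of the natural pitch classes above C.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTAL = {"": 0, "#": 1, "-sharp": 1, "b": -1, "-flat": -1}


def midi_number(name: str) -> int:
    """Convert names such as 'C#4', 'Bb4', or 'B-flat4' to MIDI numbers (C4 = 60)."""
    match = re.fullmatch(r"([A-G])(#|b|-sharp|-flat)?(-?\d)", name)
    if match is None:
        raise ValueError(f"unrecognized pitch name: {name!r}")
    letter, accidental, octave = match.group(1), match.group(2) or "", int(match.group(3))
    return 12 * (octave + 1) + PITCH_CLASS[letter] + ACCIDENTAL[accidental]


def semitone_span(low: str, high: str) -> int:
    """Number of semitones from the lower pitch up to the higher pitch."""
    return midi_number(high) - midi_number(low)
```

By this count, the original patterns span 9 semitones (C#4 to B-flat4, a major sixth), the revised patterns span 12 semitones (D4 to D5, an octave), and the leap from A4 up to D5 covers 5 semitones (a perfect fourth).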
Participants
Kindergarten (n = 126) and first-grade (n = 122) students who attended two elementary schools in New York State were given the opportunity to participate in the assessment. Of these, 46 kindergarteners and 30 first graders did not engage in or did not complete the task, or their parents did not give consent for their participation. The final sample consisted of 80 kindergarteners (n = 43 females) and 92 first graders (n = 51 females), or 69.4% of the eligible students.
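The retention figure follows directly from the counts reported above:

```latex
\frac{(126 - 46) + (122 - 30)}{126 + 122} = \frac{80 + 92}{248} = \frac{172}{248} \approx 69.4\%
```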
After deeming the study ethical and appropriate, the teachers’ school district required strict oversight of it. All parental consent forms were signed and collected, and participants provided verbal assent as part of the data collection process. All authors had access only to de-identified data, and anonymity was maintained throughout.
The Assessment Task
Two of the four researchers on this project were the participants’ teachers. They collected student responses to sung prompts on 1:1 devices (iPads) using a platform called Seesaw. The Seesaw website describes the platform as follows:
Seesaw is a platform for student engagement. Teachers can empower participants to create, reflect, share, and collaborate. Participants “show what they know” using photos, videos, drawings, text, PDFs, and links. It’s simple to get student work in one place and share it with families. (Seesaw Learning, Inc., n.d.)
Seesaw platform
Seesaw allows students to have individual journals for each class in which they are enrolled. Teachers push assignments out virtually for students to complete independently. Teachers can design activities as slides that users can navigate independently. Assignments can contain media, including pictures, audio, and video, embedded into individual slides within an activity. Students sign into the program with individual credentials and access their class assignments. They can respond to the media or assignments in a variety of ways, including but not limited to virtual drawing tools, audio recording, and video recording. Students in this study received assignments with each SVDM pattern on its own slide, which they could navigate through independently.
Seesaw activities
Using Seesaw, the teachers constructed the SVDM activities across 17 slides: one slide with written and spoken instructions, followed by 16 slides, each housing one three-note pattern. The two sets of patterns, “See the Bird” and “Feel the Wind” (see Figure 1), were sung by an adult female, and each set had two versions (neutral syllable and text) of eight slides each. To counterbalance, one ordering presented the patterns with song text (T) first and the neutral syllable (NS) “bum” second; the other presented NS first and T second.
Participants tapped an audio icon to hear the pattern and then tapped a microphone icon to record themselves echoing it (see Appendix). Upon completion, the individual pattern responses were downloaded from Seesaw, which automatically provides three file formats: MP3, MP4, and PNG. Researchers used the MP3 and MP4 files for subsequent analyses.
Seesaw administration
To control for order effects, participants were randomly assigned to one of eight groups. Teachers assigned half of each class to complete the “Bird” patterns first and the “Wind” patterns second; the other half did the reverse. Additionally, half of each of these groups completed their first activity with T first and NS second, and the other half completed NS first and T second; each participant then reversed that order for the second activity. The activities were administered in a classroom setting. Half the class used their 1:1 devices (iPads) to complete the SVDM Seesaw activities independently while the other students completed other quiet tasks. In the following music class (approximately 3–6 days later), the groups switched: Students who had previously completed the SVDM tasks completed other activities, and the rest of the class completed the SVDM activities. All assessments were designed to be self-administered: Students selected icons to play patterns prerecorded by the researchers and then used the Seesaw audio recording feature to record their responses.
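One way to picture the eight groups is as every combination of three two-level factors: which pattern set comes first, which language (T or NS) comes first in the first activity (reversed in the second), and whether a student sang in the first or the following class period. That factor breakdown is our reading of the procedure, not something the study states, and all names below are hypothetical. The sketch shuffles a class roster and deals students round-robin into the groups so that group sizes differ by at most one.

```python
import itertools
import random

# Hypothetical reading of the design: three two-level factors, 2 x 2 x 2 = 8 groups.
SET_ORDERS = [("Bird", "Wind"), ("Wind", "Bird")]
FIRST_ACTIVITY_LANGUAGE_ORDERS = [("T", "NS"), ("NS", "T")]
SESSIONS = ["first class period", "following class period"]

GROUPS = list(itertools.product(SET_ORDERS, FIRST_ACTIVITY_LANGUAGE_ORDERS, SESSIONS))


def assign_groups(student_ids, seed=None):
    """Randomly assign students to the eight groups as evenly as possible."""
    rng = random.Random(seed)
    roster = list(student_ids)
    rng.shuffle(roster)
    # Dealing the shuffled roster round-robin keeps group sizes within one of each other.
    return {student: GROUPS[i % len(GROUPS)] for i, student in enumerate(roster)}


def presentation_order(group):
    """List the (pattern set, language) blocks a student in `group` completes."""
    (first_set, second_set), (lang_a, lang_b), _session = group
    # The second activity reverses the language order used in the first.
    return [
        (first_set, lang_a),
        (first_set, lang_b),
        (second_set, lang_b),
        (second_set, lang_a),
    ]
```

Seeding the random generator makes an assignment reproducible, which can help when a teacher needs to reconstruct which order a given student received.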
Participants completed the assessment in small-group centers in a classroom setting. Students wore headphones and sat in relatively separate locations throughout the classroom to work independently. The order of the assignments followed each student’s assigned group (which pattern set came first, T or NS first, etc.). Students listened to the pattern on each slide and then responded using the audio recorder on their 1:1 iPads. Audio files were then downloaded and compiled for each student for scoring.
The activities were completed over the course of four to six class periods. Some participants required significant teacher assistance, such as (but not limited to) the teacher tapping icons, repeated directions or patterns, or a separate testing location with one-on-one support. Because participants completed the activity in a nonclinical classroom setting, with other participants completing the activity nearby, other voices may be heard in the background of some recordings. Strategies to reduce this interference included (but were not limited to) having students wear headphones, spacing students four to six feet from their peers, encouraging students to face in different directions from their peers, and limiting the number of students singing at one time. Only half of the class sang at a time while the other half worked on quiet tasks in other music-making applications, and the singing half was typically split into two groups in opposite corners of the room. In addition, these students had participated in similar tasks in the past and were relatively familiar with the procedure.
Scoring
Upon completion of the tasks, multimedia files were downloaded from the Seesaw platform and organized into individual digital participant folders. Identifying information was removed, and each participant was assigned a random participant number to reinforce anonymity. The teachers independently scored all participant performances using the SVDM scoring scale (Rutkowski, 2010, 2018). The SVDM has nine scoring levels (see Appendix); for simplicity of rating, the scale is anchored by five levels, with the intermediate levels between anchors indicating inconsistent register use.
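The structure of the scale, five anchors plus four intermediate levels, can be sketched as follows. The anchor labels are the SVDM level names as we understand them and should be checked against the Appendix, which is authoritative; the function names are hypothetical. The combined score mirrors the note to Table 2, where scores are the sum of the two raters’ scores.

```python
# Assumed anchor labels for the five SVDM anchor levels; verify against the Appendix.
ANCHORS = {
    1.0: "pre-singer",
    2.0: "speaking range singer",
    3.0: "limited range singer",
    4.0: "initial range singer",
    5.0: "singer",
}
# Nine levels: the five anchors plus the four half-point levels between them.
LEVELS = [1.0 + 0.5 * i for i in range(9)]


def svdm_label(score):
    """Return the level name for a single rater's SVDM score."""
    if score not in LEVELS:
        raise ValueError(f"not an SVDM level: {score}")
    if score in ANCHORS:
        return ANCHORS[score]
    # Half-point levels mark inconsistent use of the next anchor's register.
    return "inconsistent " + ANCHORS[score + 0.5]


def combined_score(rater_1, rater_2):
    """Sum of two raters' scores, so combined scores range from 2 to 10."""
    for score in (rater_1, rater_2):
        svdm_label(score)  # raises ValueError for an invalid score
    return rater_1 + rater_2
```

On this reading, the group means near 9 reported below correspond to most children being rated at or near the top anchor by both raters.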
Results
To establish intrarater reliability, each teacher randomly selected 12% of cases (n = 20) from their respective schools to rescore. Using SPSS 28, we calculated intraclass correlation coefficients (ICCs) with 95% confidence intervals (CIs) based on a single-measurement, absolute-agreement, two-way mixed-effects model. Following Koo and Li (2016), ICCs were excellent for Teacher 1 (ICC = .94, 95% CI [.90, .96]) and Teacher 2 (ICC = .96, 95% CI [.93, .97]).
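The point estimate for this model can be sketched from two-way ANOVA mean squares, using the absolute-agreement, single-measurement form described by McGraw and Wong; to our understanding this is the same estimate SPSS reports for the mixed-effects, absolute-agreement, single-measures option. The sketch omits confidence intervals, and the function names are hypothetical. The labels follow the Koo and Li (2016) guideline bands (below .50 poor, .50 to .75 moderate, .75 to .90 good, above .90 excellent).

```python
from statistics import fmean


def icc_absolute_single(ratings):
    """Two-way, absolute-agreement, single-measurement ICC.

    `ratings` is a list of rows: one row per case (child), one column per
    rating (two raters, or first vs. second scoring occasion).
    """
    n = len(ratings)          # number of cases
    k = len(ratings[0])       # number of ratings per case
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement counts systematic rater differences (ms_cols) against the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


def koo_li_label(icc):
    """Koo & Li (2016) guideline label applied to a point estimate."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```

For example, `icc_absolute_single([[1, 2], [2, 3], [3, 4]])` returns about 0.67: the second rating is always one point higher, so the ratings are perfectly consistent, but the systematic offset lowers absolute agreement.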
To establish interrater reliability, each teacher independently scored all tasks from both schools. ICCs calculated with the same parameters were excellent for scores overall (ICC = .96, 95% CI [.95, .96]), by pattern set (“Bird”: ICC = .97, 95% CI [.96, .98]; “Wind”: ICC = .93, 95% CI [.91, .95]), and by performance with either NS (ICC = .95, 95% CI [.94, .96]) or T (ICC = .96, 95% CI [.95, .97]). Therefore, scores from both teachers were combined for subsequent analyses.
We explored the convergent validity of the modified “Wind” patterns relative to the original “Bird” patterns by computing correlations and paired-sample t tests across the four sets of SVDM scores (Table 1). Scatterplots indicated that relationships between scores were monotonic, linear, and free of outliers. Because the scores violated the assumption of normality, we used Spearman’s rho for the correlation analyses. All correlations were positive and statistically significant (p < .001), and all were moderate in strength (rs = .49–.63) according to the cutoffs recommended by Schober et al. (2018).
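Because SVDM scores take only a few values, ties are common, and Spearman’s rho is best computed as the Pearson correlation of tie-averaged ranks. The sketch below does exactly that; the strength bands are the Schober et al. (2018) cutoffs as we understand them (about .10, .40, .70, and .90) and should be checked against that source. Function names are hypothetical, and significance testing is omitted.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def schober_label(r):
    """Strength label using the cutoffs we attribute to Schober et al. (2018)."""
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.40:
        return "weak"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "strong"
    return "very strong"
```

Under these bands, both ends of the reported range (.49 and .63) fall in the moderate category.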
Table 1. Correlations Between “See the Bird” (“Bird”) and “Feel the Wind” (“Wind”) by Task.
Note. All Spearman’s rho correlations were significant (p < .01). NS = song sung with a neutral syllable; T = song sung with text.
We used two paired-sample t tests to compare scores on the two sets of patterns, once for neutral syllable performances and once for text performances. Box plots revealed one outlier, but inspection of its values did not warrant omitting or revising the case. Difference scores between the two sets of patterns were normally distributed for both NS and T performances.
Participants scored slightly higher when performing “Wind” with an NS (M = 8.97, SD = 1.05) than when performing “Bird” with an NS (M = 8.86, SD = 1.39); however, the difference was not statistically significant, t(119) = 0.96, p = .34, Cohen’s d = 0.09. Participants also scored very slightly higher on “Wind” with T (M = 8.70, SD = 1.37) than on “Bird” with T (M = 8.67, SD = 1.56), but, as with the neutral syllable, the difference was not statistically significant, t(119) = 0.28, p = .78, Cohen’s d = 0.03 (Table 2).
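The paired comparison can be sketched with the standard library: the t statistic is the mean of the difference scores divided by its standard error, and one common paired effect size, d_z, is the mean difference divided by the standard deviation of the differences, which equals t divided by the square root of the number of pairs. The reported effect sizes appear consistent with d_z for 120 pairs (implied by t(119)), but that is our inference, not a statement from the study. Function names are hypothetical, and p values (which need the t distribution) are omitted.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(first, second):
    """Paired-sample t statistic, degrees of freedom, and Cohen's d_z."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    t = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff  # equivalently t / sqrt(n)
    return t, n - 1, d_z


def d_from_t(t, n_pairs):
    """Recover d_z from a reported paired t statistic."""
    return t / sqrt(n_pairs)
```

For example, `d_from_t(0.96, 120)` is about 0.088 and `d_from_t(0.28, 120)` is about 0.026, which round to the 0.09 and 0.03 reported above.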
Table 2. Singing Voice Development Measure Means and Standard Deviations (in Parentheses) by Gender and Grade.
Note. Scores represent the sum of two raters’ scores. “Bird” = “See the Bird”; “Wind” = “Feel the Wind”; NS = song sung with a neutral syllable; T = song sung with text.
A two-way multivariate analysis of variance (MANOVA) was used to examine differences in SVDM scores on four tasks (“Bird” sung with an NS, “Bird” sung with T, “Wind” sung with an NS, and “Wind” sung with T) by grade level (kindergarten/first) and student sex (male/female). We screened the data for univariate outliers and normality. One multivariate outlier was omitted from subsequent analyses; no other multivariate outliers existed per Mahalanobis distance (p > .001). Data violated the assumption of normality for kindergarten females, first-grade females, and first-grade males (p < .0125). We continued with the analysis given the robustness of MANOVA to violations of normality (Bray & Maxwell, 1985). The assumption of linearity was tenable per scatterplot matrices, the assumption of no multicollinearity was tenable per Pearson correlation coefficients (no r exceeded .9), and the assumption of homogeneity of variance was tenable per Levene’s tests (p > .05).
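The multivariate outlier screen can be sketched as follows: compute each case’s squared Mahalanobis distance from the centroid of the four task scores and compare it with the chi-square critical value for 4 degrees of freedom at p = .001 (about 18.47). The study does not say whether distances were computed within each grade-by-sex cell or across the whole sample, so this sketch simply uses whatever rows it is given; the helper names are hypothetical, and a small Gauss-Jordan inversion keeps it dependency-free where a real analysis would use a statistics package.

```python
from statistics import fmean

# Chi-square critical value for df = 4 (four task scores) at p = .001.
CHI2_CRITICAL_DF4_P001 = 18.467


def covariance_matrix(rows):
    """Column means and sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    means = [fmean(row[j] for row in rows) for j in range(p)]
    cov = [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in rows) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]
    return means, cov


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def mahalanobis_squared(rows):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    means, cov = covariance_matrix(rows)
    inv = invert(cov)
    p = len(means)
    distances = []
    for row in rows:
        dev = [row[j] - means[j] for j in range(p)]
        distances.append(
            sum(dev[a] * inv[a][b] * dev[b] for a in range(p) for b in range(p))
        )
    return distances


def flag_multivariate_outliers(rows, critical=CHI2_CRITICAL_DF4_P001):
    """Indices of rows whose squared distance exceeds the critical value."""
    return [i for i, d2 in enumerate(mahalanobis_squared(rows)) if d2 > critical]
```

A useful sanity check: with the sample covariance matrix, the squared distances always sum to (n - 1) times the number of variables.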
The assumption of homogeneity of covariance matrices was violated (Box’s M test, p < .001). We proceeded because our group sizes were relatively balanced, with all groups between a 1.5:1 and 2:1 ratio (Huberty & Olejnik, 2006), and we used Pillai’s trace to interpret the results because group sizes were unequal (Olson, 1976; Tabachnick & Fidell, 2014).
Using a two-way MANOVA, we did not find a statistically significant interaction effect between sex and grade on the combined dependent variables, F(4, 104) = 1.92, p = .113, Pillai’s trace = .07, ηp² = .07. We therefore proceeded to interpret the main effects. There were statistically significant effects of grade level, F(4, 104) = 3.83, p = .006, Pillai’s trace = .13, ηp² = .13, and of sex, F(4, 104) = 3.466, p = .011, Pillai’s trace = .12, ηp² = .12, on the combined dependent variables.
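Why Pillai’s trace and partial eta squared coincide here: grade and sex each have two levels, so every effect has one hypothesis degree of freedom and s = min(p, df_h) = 1 with p = 4 dependent variables. In that case Pillai’s trace V converts exactly to F through F = (V / (1 - V)) (df2 / p), and the multivariate partial eta squared V / s equals V. The sketch below recovers V from the reported F values using df2 = 104 from the text; the function names are hypothetical.

```python
def pillai_from_f(f, p, df2):
    """Pillai's trace V from the exact F of a one-df multivariate effect (s = 1).

    F = (V / (1 - V)) * (df2 / p) rearranges to V = p * F / (p * F + df2).
    """
    return p * f / (p * f + df2)


def f_from_pillai(v, p, df2):
    """Exact F for a one-df effect from Pillai's trace V."""
    return (v / (1 - v)) * (df2 / p)


def partial_eta_squared(v, s=1):
    """Multivariate partial eta squared for Pillai's trace: V / s."""
    return v / s
```

Applied to the reported statistics, F = 1.92, 3.83, and 3.466 give V of about .069, .128, and .118, which round to the .07, .13, and .12 in the text.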
Because both independent variables were significant, we interpreted the univariate main effects for grade level and sex separately. Regarding grade level, there were statistically significant main effects for “Bird” sung with NS, F(1, 110) = 6.901, p = .01, Cohen’s d = 0.46; “Bird” sung with T, F(1, 110) = 13.47, p < .001, Cohen’s d = 0.59; “Wind” sung with NS, F(1, 110) = 8.45, p = .004, Cohen’s d = 0.48; and “Wind” sung with T, F(1, 110) = 8.65, p = .004, Cohen’s d = 0.51. First-grade scores were significantly higher than kindergarten scores for all dependent variables, although medium effect sizes were found only for “Bird” sung with T and “Wind” sung with T.
Regarding sex, there were statistically significant main effects for “Bird” sung with T, F(1, 110) = 12.48, p < .001, Cohen’s d = 0.53, and “Wind” sung with NS, F(1, 110) = 6.15, p = .015, Cohen’s d = 0.34. Female scores were higher than male scores for all dependent variables, but the difference was significant with a medium effect size only for “Bird” sung with T. All other differences yielded small effect sizes.
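The small and medium labels above appear to follow Cohen’s conventional benchmarks for d (0.2 small, 0.5 medium, 0.8 large); the study does not name its benchmarks, so this is an assumption. Under it, 0.48 and 0.34 count as small, while 0.51 and 0.53 count as medium. A minimal sketch, with a hypothetical function name:

```python
def cohen_d_label(d):
    """Label |d| with Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"
```

Because the cutoffs are conventions rather than thresholds of practical importance, values near a boundary (such as 0.48 versus 0.51) differ little in substance.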
Summary, Discussion, Recommendations, Conclusions
Summary
Based on the analysis, we answered the research questions as follows.
Are Rater Reliabilities (Both Intrarater and Interrater) Acceptable for Both Sets of Patterns?
Yes, all reliabilities were acceptable and consistently high, as in other studies using SVDM.
Does a Significant Difference Exist in SVDM Scores as a Function of the Two Sets of Patterns (“See the Bird,” “Feel the Wind”)?
No, significant differences did not emerge.
Does a Significant Difference Exist Between NS and T Performances Within Each Set of Patterns and Between Each Set of Patterns?
No, significant differences did not emerge.
Do Significant Sex or Grade-Level Differences Exist for Any of the Aforementioned?
First-grade scores were significantly higher than kindergarten scores for all tasks, and those differences were most noticeable (medium effect sizes) for the T performances. Female scores were higher than male scores on all tasks, but significantly so only for “Bird” sung with T and “Wind” sung with an NS.
Is the New Set of Patterns a Valid Task?
Correlation coefficients, exploring the convergent validity of the modified patterns, were all significant and moderate, providing evidence that the new set of patterns (“Wind”) may be a valid task.
Discussion
Given the lack of significant differences between the two sets of patterns, we posit that the modified patterns may also be used to assess children’s use of singing voice. We note that scores for this sample of kindergarten and first-grade children were quite high, particularly compared with previous studies (Levinowitz et al., 1998; Rutkowski, 1996). Different results may emerge with a sample of children who do not yet have such well-developed use of their singing voices.
Of the 248 children in these kindergarten and first-grade classes, data from only 172 were available for analysis. It is possible that children who were uncomfortable or nervous about singing did not participate at all. If so, that may explain why the mean scores were higher than those reported in other studies with children of these ages. However, only 14 children fell into this category: Seven were withdrawn by their parents, and seven chose not to participate. The remaining children (n = 62; n = 21 kindergarteners, n = 41 first graders) participated but did not complete the task. One of the teachers provided additional context:
In this instance, we did not necessarily “recruit” students as we allowed students already enrolled in our classes to “opt out” of the study. Parents were notified virtually and in written form of the study and what it entailed. We had very few students opt out of the study. In fact, if I recall, the students who opted out were not necessarily students I would have expected to score lower on the scale but students whose parents are more wary of their child’s data being shared with persons unknown. In this case, the bias for students who would “opt in” and “opt out” due to their anxiety around singing is less of a concern of mine. (Personal communication, May 16, 2025)
Most of the children for whom we did not have complete data either did not complete the task or had recording issues. We therefore concluded that discomfort with singing did not contribute to the relatively high scores for this sample of children.
We did not find a significant interaction effect between grade and sex; we did, however, find significant main effects for both grade and sex. Previous research, including a meta-analysis of children’s singing ability, indicates that children’s use of their singing voices tends to improve between kindergarten and first grade (e.g., Rutkowski & Miller, 2003; Svec, 2018), particularly when singing instruction is included in general music classes. The grade-level differences supported by our study were therefore expected; the sex differences were not. Although studies investigating singing accuracy often find girls’ singing to be better than boys’ (e.g., Goetze, 1985; Goetze & Hori, 1989), studies of use of singing voice that used the SVDM as the assessment tool have typically either found no sex differences (Rutkowski, 1986; Rutkowski et al., 2002) or, where sex differences were found, small effect sizes (Svec, 2018), as was the case in the current study. Although female scores were higher than male scores for all dependent variables, the only meaningful effect size was for “Bird” sung with T. It is possible that girls develop use of their singing voices earlier than boys but that these differences did not emerge in previous studies because overall scores were lower. Because use of singing voice in this sample was quite developed, such differences may have become detectable.
Assessing the singing voices of such a large sample of students is certainly a challenge. Having each student complete the task in a room apart from other students would be ideal, but that is not practical in a school setting. Using small-group centers and having each child wear headphones seemed the most efficient means of data collection. Although the raters may have heard background noise in the recordings, it is unlikely that this distracted the children, because they wore headphones while completing the tasks.
As with most research, we recommend additional studies to explore whether assessing use of singing voice with higher-range patterns yields a more accurate picture of children’s use of singing voice. Studies with children whose singing voices are less developed would be particularly insightful. It would also be beneficial to assess students taught with a variety of pedagogical approaches.
We suggest that teachers and researchers consider using the modified “Feel the Wind” patterns in their work. We found evidence of convergent validity: Scores on the two sets of patterns were moderately correlated, and the differences between them were not statistically significant. That may not be the case with children whose use of singing voice is less developed, because the modified “Wind” patterns require children to use more of their extended range. The modified patterns may therefore discriminate better with such a sample.
The protocol used in this study for prompting and recording children’s singing (Seesaw) provided an efficient means of administering the assessment in a classroom setting. Children were comfortable using the technology, and it enabled the teachers to minimize the time spent on assessment. We recommend that teachers explore this or similar tools for assessing their students’ use of singing voice. It is also notable that the SVDM scores of these children were much higher than in previous studies. Because participants were students of two different music teachers, this result does not seem to be simply a teacher effect. An in-depth study of these teachers’ practices would be insightful.
The SVDM continues to provide a valid and reliable means of assessing children’s use of singing voice. Both sets of patterns yielded similar, correlated scores for this sample of children, and the patterns employing a larger range did not hinder the children’s use of singing voice. Assessing children’s singing in order to provide nurturing environments and helpful strategies that assist all children in gaining full use of their singing voices should remain one of the primary goals of music classes, particularly at the elementary level.
Appendix
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
