Development of phonologically-balanced and perceptually equivalent Singapore Mandarin word lists for word recognition test

Abstract

Background

Word recognition tests must be conducted in the native language of the listener to obtain valid word recognition scores. Existing Singapore Mandarin speech audiometry test materials are disyllabic, lack sensitivity and are not phonologically balanced.

Objective

The purpose of this study was to address those limitations and develop a set of word lists for testing speakers of Singapore Mandarin.

Methods

The word lists were developed over three phases. (1) Frequently occurring words were chosen from a database consisting of subtitles from local Mandarin programmes, and subjected to familiarity rating by 50 native Singapore Mandarin speakers. (2) Phonologically balanced word lists were created and professionally recorded using words rated as familiar in phase 1. (3) Psychometric curves of words were obtained from 20 normal-hearing native speakers and word lists were analysed for perceptual equivalence.

Results

Sixteen phonologically balanced word lists consisting of 25 monosyllables each were created in phase 2. Eight of the 16 lists were found to be perceptually equivalent.

Conclusion

Word lists developed in this study addressed the limitations of existing Singapore Mandarin speech audiometry materials. The word lists need to be validated with normal-hearing and hearing-impaired Singapore Mandarin speakers before they can be used clinically.

Keywords

Speech audiometry, speech recognition tests, Singapore Mandarin, speech perception, speech discrimination tests

Introduction

Speech audiometry is an integral part of the audiologic battery of tests. By using speech stimuli, it supplements information that pure tone audiometry and other tests within the battery are unable to provide. Speech audiometry can be used in validating pure tone audiograms, providing a differential diagnosis and assessing the amount of benefit that a hearing device is providing for an individual with hearing impairment.^1–3

It is well documented that speech audiometry should be conducted in the native language of the patient,^4,5 so as to minimise the chances of low speech scores being confounded by poor linguistic ability. As such, many countries have developed speech audiometry test materials in their non-English native language for testing their own population.^6–8 For example, Mandarin speech audiometry materials have been developed in China^9,10 and Taiwan¹¹ for their primarily Mandarin-speaking populations.

In Singapore, where the total population is 5.69 million, 74.2% of the resident population is ethnic Chinese.¹² A total of 1.08 million residents who are over 5 years of age cited Mandarin as the language that they speak most frequently at home.¹³ There is thus a need for Mandarin speech audiometry materials to be available in Singapore to test this group of Mandarin-speaking residents. This was the motivation behind the development of the SC-10, a set of word recognition test materials in Singapore Mandarin, by Lee and Lee.¹⁴ Soh and Loo¹⁵ further supported this effort, as the Mandarin speech audiometry materials developed in China and Taiwan were found to contain words that are unfamiliar to Singapore Mandarin speakers, and are hence not suitable to be adapted for local use.

The SC-10 Singapore Mandarin speech test materials developed by Lee and Lee¹⁴ contain 10 lists of 10 disyllabic word pairs. The words were chosen from The Frequency Dictionary of Daily Chinese Words Encountered by Singapore Students.¹⁶ A two-stage elimination process was carried out to shortlist the most appropriate and familiar words for final recording. The final wordlists were validated on 25 normal-hearing subjects; test-retest reliability was established, as there was no significant difference between test and retest scores for 20 of the normal-hearing subjects. Although the SC-10 was developed with good intentions for practical use in clinical settings, this existing Singapore Mandarin speech audiometry material has a few flaws.

First, Lee and Lee¹⁴ reasoned that disyllabic words were more appropriate for developing Mandarin speech test materials, as Mandarin words are rarely monosyllabic. However, the authors did not acknowledge the possibility of increased contextual cues in disyllabic words that might affect the speech recognition scores of hearing-impaired individuals. Word recognition (or discrimination) test materials in English have traditionally been designed to be monosyllabic with good reason. An increase in the number of syllables within a word increases the chance of the word being heard correctly, due to the presence of contextual cues.¹⁷ In Mandarin, the contextual cues added to each item by an additional word are two-fold: the possible word pairings and the tones of the word pairs.

Validation of the SC-10 had only been conducted with normal-hearing subjects.¹⁴ Subsequently, Soh¹⁸ conducted a preliminary study using the SC-10 word lists to test normal-hearing participants (n = 29) as well as hearing-impaired participants (n = 23). The study found that the hearing-impaired participants with mild to severe sensorineural hearing loss (SNHL) scored as well as the normal-hearing participants, achieving a maximum score of 95%–100% for speech recognition tests in quiet. This is inconsistent with existing literature on maximum scores in English word recognition tests, which are expected to decrease with greater severity of SNHL.^19–21 In other words, the SC-10 wordlists are not sensitive enough to differentiate hearing-impaired individuals from the normal-hearing population.

Soh¹⁸ posited that the unexpectedly good performance of the hearing-impaired group using the SC-10 wordlists might be due to the rich contextual cues present in disyllabic Mandarin wordlists. Mandarin is a tonal language with most of the word finals and suprasegmental tones present in low frequencies. As most of the participants with SNHL had better hearing in lower frequencies than in high frequencies, it is not surprising that they were able to correctly guess the words in the SC-10 based on the possible final and tonal pairings. Thus, the resulting scores were inflated by contextual cues derived from possible word pairings, rather than truly reflecting the discrimination difficulties faced by a person with hearing impairment.

Second, Lee and Lee¹⁴ argued, based on the report by Martin, Champlin and Perez,²² that the need for phonetically balanced wordlists in speech audiometry materials remains debatable. Hence, the SC-10 wordlists were designed based on word familiarity, instead of meeting the criteria of phonetic or phonemic balance. The concept of phonemic balance in speech audiometry^17,23,24 serves to ensure that the different sounds occurring in the language are tested when word recognition tests are conducted. This enables clinicians to identify specific sounds that the patient might have more difficulty in recognising. However, Mandarin words have a different structure from English words. Therefore, instead of phonetic or phonemic balance, an equivalent concept in Mandarin would be that of phonological balance, where initial consonants, finals and tones are balanced in the same proportions in which they occur naturally throughout the language.⁹ Although Lee and Lee¹⁴ recognised that phonological balance requires sufficient words (at least 50) per list, they included only 10 words per list in the SC-10 for clinical use, citing practicality as the main reason. Hence, the SC-10 wordlists are limited in their ability to incorporate phonological balance due to the brevity of each list.

Moreover, Thornton and Raffin²⁵ found that the variability of word recognition scores (WRS) increases when the number of items within the lists decreases. At a word recognition score of 50% where standard deviation is the largest, the intra-subject standard deviation for a 50-item list is around 7%. This increases to approximately 10% for a 25-item list, and further increases to about 16% for a 10-item list. Hence, 10-item word lists such as those in the SC-10 may not be reliable in providing an estimate of a patient’s speech recognition abilities, even though they take the least amount of time to administer.
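The figures quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming the binomial model of word recognition scores in which each test item is an independent trial with the same probability of being recognised; the function name is illustrative and not taken from the cited work.

```python
import math

def wrs_standard_deviation(score_percent, n_items):
    """Binomial standard deviation of a word recognition score, in percentage points.

    Assumes every item is an independent trial with the same probability
    of being recognised (a simplifying assumption).
    """
    p = score_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_items)

# Variability at a true score of 50%, where the binomial SD is largest.
for n_items in (50, 25, 10):
    print(n_items, round(wrs_standard_deviation(50.0, n_items), 1))
# 50 7.1
# 25 10.0
# 10 15.8
```

Under this model, halving the list from 50 to 25 items raises the standard deviation by a factor of about 1.4, whereas cutting it to 10 items more than doubles it.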

As such, the aim of this study is to develop an alternative set of Singapore Mandarin word recognition test materials that will address the identified gaps of the existing SC-10 wordlists. This new set of monosyllabic speech audiometry materials has been designed to incorporate phonological balance into each word list, such that the proportions of initial consonants, finals and tones within each word list mimic their respective frequencies of occurrence in Singapore Mandarin. To maintain a balance between clinical efficiency and a reasonable standard deviation in the obtained WRS, 25 items (monosyllabic words) were created for each list. As the materials are ultimately meant for auditory testing, emphasis was placed on sources of spoken Singapore Mandarin instead of written texts. With reference to Brysbaert and New’s²⁶ study which found film subtitles to be good estimates of word frequencies, a database of spoken Singapore Mandarin samples was approximated from subtitles of local Mandarin television programmes. This database served as the basis on which the occurrence frequencies of the initial consonants, finals and tones were determined. Phonologically-balanced word lists were created based on analysis of this database, and the word lists were subsequently tested to be perceptually equivalent to one another.

Methods

The development of the word recognition test materials in this study involved three phases: (1) selection of suitable words to be included as test items, (2) creation of phonologically-balanced word lists and (3) establishing the perceptual equivalence of word lists.

Phase 1: Selection of suitable words and familiarity rating

It is important that the test materials are made up of words that are familiar to the average speaker within the test population, so that test results are not affected by one’s language competency.⁵ Conventionally, corpora of written texts have been used as sources of test items to be included in a speech recognition test.^17,23,27 There are limitations to this approach, as written texts are often edited to read better and have fewer repetitions.²⁶ This could lead to the inclusion of words that would not normally be used in everyday speech. On the other hand, building a corpus of spoken language can be extremely laborious, as it involves recording spontaneous speech and transcribing the collected data thereafter.²⁸ Brysbaert and New²⁶ proposed using film subtitles as a convenient and valid way to establish a speech corpus, since subtitles are essentially transcriptions of speech. Instead of using local films, which are produced at a rate of only about 11 films per year with poor attendance,²⁹ a database of commonly spoken Singapore Mandarin words was built using subtitles from local Mandarin variety shows on free-to-air channels.

Free-to-air television has traditionally enjoyed high viewership in Singapore, and the two available Mandarin channels are among the top three channels with the highest reach.³⁰ The variety shows that had their subtitles included were those that were produced locally and featured local hosts and guests. In total, over 45 hours of content from 15 different programmes were included. The programmes covered a range of themes such as food, media, health, and Singaporean history. The total number of words extracted from the subtitles of these programmes was over 500,000, and the number of unique Mandarin characters totalled 3,391. These words formed the database of words that were commonly spoken and heard by the average Singapore Mandarin speaker.

With the intention to create at least 16 initial word lists of 25 monosyllables each, the top 500 most frequently occurring characters in the database were chosen. These 500 characters were divided into non-overlapping sub-lists of 100 characters each and subjected to a familiarity rating exercise by Singapore Mandarin speakers. Participants had to be born and raised in Singapore, or have grown up in Singapore ever since they began formal education. They needed to be able to read local Mandarin newspapers and be conversant in Mandarin. Ethics approval for the study in this phase was granted by the National University of Singapore Institutional Review Board (S-20-012). Participants provided informed written consent before participating in the study.

Fifty Singaporean participants, aged between 40 and 90 years, were recruited for this familiarity rating exercise. This age group was chosen because it has the highest number of native speakers of Singapore Mandarin, as Chinese schools (which used Chinese as the language of instruction) were as popular as schools that taught in English (English schools) in earlier years. Following a change in education policies four decades ago, English has become the main instructional medium used in schools. Hence, the majority of Singaporeans younger than 40 years of age are primarily English speakers.³¹

During the familiarity rating exercise, each participant was presented with a sub-list of 100 characters in print, with the Hanyu Pinyin accompanying each character. For each character, they were asked “How frequently do you hear this word in your daily life?” and instructed to provide a rating on a 5-point Likert scale (1-Never; 2-Almost never; 3-Sometimes; 4-Frequently; 5-Always). The mean rating was calculated for each of the characters, and characters that were rated as familiar were used in the creation of phonologically balanced word lists.

Phase 2: Creation of phonologically balanced word lists

In order to create phonologically balanced word lists, the occurrence frequencies of each initial consonant, final and tone had to be determined by analysing the entire database of spoken Singapore Mandarin. With the relative percentages, it would be possible to determine how many occurrences there should be for each initial consonant, final or tone within each list of 25 monosyllables. This was done by analysing each unique character within the database and deconstructing the pronunciation of the characters into their respective initial consonants, finals and tones. For example, the character “包” (bāo) would be deconstructed into the initial consonant ‘b’, the final ‘ao’ and Tone 1. The counts for the initial consonant ‘b’, the final ‘ao’ and Tone 1 were then each weighted by the total number of times the character “包” occurred in the database. This process was repeated for every character within the database to obtain the frequency of occurrence of each initial consonant, final and tone (Tables 1–3). Mandarin characters that have more than one pronunciation were analysed in each context of occurrence to determine which pronunciation was employed for that occurrence, before the pronunciation was deconstructed into its initial consonant, final and tone.
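The deconstruction and weighting described above can be sketched as follows. This is an illustration only, not the procedure actually used: it assumes pronunciations are supplied as tone-numbered pinyin (e.g. "bao1"), the character counts are hypothetical, and the splitter does not normalise y-/w- spellings, ü written as u after j/q/x, or the apical vowel listed as "i-" in Table 2.

```python
from collections import Counter

# Pinyin initials; the two-letter initials come first so that "zh", "ch"
# and "sh" are matched before "z", "c" and "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")
ZERO_INITIAL = "(none)"  # syllables without an initial consonant

def split_syllable(pinyin):
    """Split tone-numbered pinyin such as "bao1" into (initial, final, tone)."""
    body, tone = pinyin[:-1], int(pinyin[-1])
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return ZERO_INITIAL, body, tone

def tally(occurrences):
    """Weight each initial, final and tone by how often its character occurs."""
    initials, finals, tones = Counter(), Counter(), Counter()
    for pinyin, count in occurrences:
        initial, final, tone = split_syllable(pinyin)
        initials[initial] += count
        finals[final] += count
        tones[tone] += count
    return initials, finals, tones

def as_percentages(counter):
    """Convert raw weighted counts into occurrence frequencies (%)."""
    total = sum(counter.values())
    return {key: 100.0 * value / total for key, value in counter.items()}

# Hypothetical (pronunciation, number of occurrences) pairs.
occurrences = [("bao1", 120), ("bu4", 300), ("da4", 80)]
initials, finals, tones = tally(occurrences)
print(as_percentages(initials))  # {'b': 84.0, 'd': 16.0}
```

Characters with several pronunciations would need one entry per pronunciation, with the count split according to the context of each occurrence, as the text describes.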

Table 1.

Frequency of occurrence for each initial consonant.

Initial consonant	Occurrence frequency (%)	Occurrence in every 25 words	Occurrence in every 50 words
—	15.65	4	8
b	4.21	1	2
c	1.20	0	1
ch	2.47	1	1
d	10.92	3	5
f	1.98	0	1
g	4.64	1	2
h	5.71	1	3
j	6.24	2	3
k	2.28	1	1
l	4.60	1	2
m	4.89	1	2
n	3.64	1	2
p	1.02	0	1
q	2.38	1	1
r	2.01	1	1
s	1.41	0	1
sh	7.64	2	4
t	3.60	1	2
x	4.76	1	2
z	3.28	1	2
zh	5.48	1	3

Table 2.

Frequency of occurrence for each final.

Final	Occurrence frequency (%)	Occurrence in every 25 words	Occurrence in every 50 words	Occurrence in every 100 words
a	4.88	1	2	5
ai	4.58	1	2	5
an	3.15	1	2	3
ang	2.43	1	1	2
ao	3.78	1	2	4
e	11.71	3	6	12
ei	1.62	0	1	2
en	4.76	1	2	5
eng	1.85	0	1	2
er	0.41	0	0	0
i	9.57	2	5	10
i-	7.30	2	4	7
ia	1.18	0	1	1
ian	3.75	1	2	4
iang	1.96	0	1	2
iao	2.53	1	1	3
ie	1.92	0	1	2
in	1.77	0	1	2
ing	2.25	1	1	2
iong	0.35	0	0	0
iu	3.58	1	2	4
o	0.30	0	0	0
ong	2.13	1	1	2
ou	2.50	1	1	3
u	5.04	1	3	5
ü	1.49	0	1	1
ua	0.46	0	0	0
uai	0.34	0	0	0
uan	1.18	0	1	1
üan	0.56	0	0	1
uang	0.53	0	0	1
üe	0.79	0	0	1
ueng	0.00	0	0	0
ui	3.25	1	2	3
un	0.62	0	0	1
ün	0.25	0	0	0
uo	5.25	1	3	5

Table 3.

Frequency of occurrence for each of the four suprasegmental tones.

Tone	Occurrence frequency (%)	Occurrence in every 25 words
1	20.95	5
2	18.03	5
3	23.22	6
4	37.80	9
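The per-list counts in Tables 1–3 appear to follow from scaling each occurrence frequency to the list length and rounding to the nearest whole item. The sketch below makes that rule explicit; the rounding rule is an inference from the tabulated values rather than a procedure stated in the text, and the function name is illustrative.

```python
import math

def target_count(percent, list_length):
    """Expected occurrences in a list, rounded to the nearest whole item (halves round up)."""
    return math.floor(percent * list_length / 100.0 + 0.5)

# Tone frequencies (%) taken from Table 3.
tone_percent = {1: 20.95, 2: 18.03, 3: 23.22, 4: 37.80}
tone_targets = {tone: target_count(p, 25) for tone, p in tone_percent.items()}
print(tone_targets)                # {1: 5, 2: 5, 3: 6, 4: 9}
print(sum(tone_targets.values()))  # 25
```

Rounded targets are not guaranteed to sum to the list length in general; they happen to do so for the tone and initial-consonant frequencies reported here.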

Word lists were created by mimicking the occurrence frequency of each initial consonant, final and tone as closely as possible, using the syllables that had been rated as familiar to the local population in phase 1. In general, the number of initial consonants within each list was determined first according to Table 1. For example, the initial consonant “b” had an occurrence frequency of 4.21%, and was to occur once in every list. All syllables rated as familiar in phase 1 and beginning with the initial consonant “b” were put through a random list generator to determine which list each “b-” syllable would be assigned to. This was repeated for every initial consonant in accordance with Table 1. With each list now populated by syllables and meeting the targets for initial consonants, the occurrence of each final and tone within each list was then tabulated. The subsequent step required manual swapping of syllables within the list of familiar syllables to ensure that the finals and tones were represented in the proportions stated in Tables 2 and 3 respectively. Finals with an occurrence frequency of less than 0.5% were excluded from the word lists as they were deemed uncommon in everyday speech. Neutral tones were also excluded from the count as they cannot occur independently in isolated monosyllables.³² A total of 16 word lists (Lists 1–16) consisting of 25 monosyllables each were created (Appendix A). Words within each list were randomised, and subsequently arranged such that consecutive test items did not share the same initial consonant, final or tone.
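The final ordering constraint, that consecutive items share no initial consonant, final or tone, can be illustrated with a simple shuffle-and-check routine. This is a sketch under stated assumptions: the text does not say how the arrangement was carried out, the items below are hypothetical, and repeated shuffling is only one of several ways to satisfy the constraint.

```python
import random

def shares_feature(a, b):
    """True if two (initial, final, tone) items have any feature in common."""
    return any(x == y for x, y in zip(a, b))

def arrange(items, seed=0, max_attempts=10000):
    """Shuffle until no two consecutive items share an initial, final or tone."""
    rng = random.Random(seed)
    order = list(items)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if not any(shares_feature(a, b) for a, b in zip(order, order[1:])):
            return order
    raise ValueError("no valid ordering found; the constraints may be too tight")

# Hypothetical items, each written as (initial, final, tone).
items = [("b", "ao", 1), ("d", "e", 4), ("sh", "i-", 4), ("m", "a", 3),
         ("(none)", "ai", 4), ("j", "ian", 2), ("d", "uo", 1), ("g", "e", 3)]
ordered = arrange(items)
for item in ordered:
    print(item)
```

For a full 25-item list in which Tone 4 accounts for nine items, valid orderings are rarer, so a constructive method or manual adjustment may be more practical than blind reshuffling.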

Of the 400 test items contained in the 16 lists of 25 monosyllables, there were 144 duplicated syllables. The remaining 256 unique syllables were recorded by one male and one female speaker, both of whom are native speakers of Singapore Mandarin. The male speaker majored in linguistics and is proficient in Mandarin and English. The female speaker is a speech therapist and is also effectively bilingual in Mandarin and English. Both speakers were born and raised in Singapore and received at least 12 years of formal education in Singapore.

Recording was carried out in a professional recording studio with the assistance of a recording engineer. In line with the International Organization for Standardization (ISO)³³ recording requirements for speech audiometry, the recording environment had a reverberation time of 0.24 s and the signal-to-noise ratio was maintained above 40 dB throughout the recording process. A condenser microphone (Neumann KM184) with a linear frequency response (125 Hz–8000 Hz) was used together with a SE Electronics pop filter. A Unison Enabled Mic Preamplifier was used, and signal digitization was done with the Apollo Firewire audio-interface from Universal Audio. The sampling rate and bit depth employed were 44.1 kHz and 16 bits respectively.

Each speaker was asked to record each syllable four times before moving on to record the next syllable. This was done to make allowances for any tokens that were not pronounced ideally during the recording process. Thereafter, a three-step approach was taken to pick the best token for each syllable. The tokens were first analysed acoustically and visually via their spectrograms in Praat software. Tokens that had unintentional distortions, clicks, noise and uneven tone contours were removed.³⁴ Subsequently, four listeners were asked to listen to the remaining tokens of each syllable and (1) type out what they thought the syllable was and (2) pick the token that sounded the most natural. The four listeners consisted of two audiologists and two speech therapists, all of whom are native Singapore Mandarin speakers. Syllables had to be correctly identified by at least three out of four listeners in order to be included in the subsequent phase of the study. For each of these syllables, the token with the highest number of votes for naturalness was used. The average root-mean-square (RMS) value of all the tokens was calculated and all the tokens were then scaled to this value using Audacity software.
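The level equalisation step can be sketched in a few lines. This is an illustrative reading of the description, not the Audacity workflow itself: it assumes each token is a sequence of sample amplitudes, takes the target as the mean of the per-token RMS values, and uses hypothetical data.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one token."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def scale_to_common_rms(tokens):
    """Scale every token so that its RMS equals the mean RMS across all tokens.

    Assumes no token is silent (an RMS of zero would cause a division by zero).
    """
    target = sum(rms(token) for token in tokens) / len(tokens)
    return [[s * target / rms(token) for s in token] for token in tokens]

# Two hypothetical tokens with RMS values of 0.1 and 0.4.
tokens = [[0.1, -0.1, 0.1, -0.1], [0.4, -0.4, 0.4, -0.4]]
scaled = scale_to_common_rms(tokens)
print([round(rms(token), 3) for token in scaled])  # [0.25, 0.25]
```

Equalising RMS aligns the overall level of the tokens but does not make them equally intelligible.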

Phase 3: Establishing perceptual equivalence of word lists

This phase of the study involved recruiting native Singapore Mandarin speakers to establish the psychometric curve of each recorded monosyllable. A total of 20 subjects participated in the study, and they all had normal hearing in at least one ear (i.e. hearing thresholds that were less than or equal to 20 dBHL at octave frequencies from 250 Hz to 8000 Hz). Insert earphones were used and each participant’s better hearing ear (determined by the pure tone average of 500 Hz, 1 kHz and 2 kHz) was tested in this study. Each monosyllable was played from a sub-threshold level in 2 dB increments and participants were asked to repeat the syllable out loud as soon as they could recognise it. There were no carrier phrases, but a predictable brief interval was held after every presentation to allow the participant time to respond should they have recognised the word. If there was no response from the participant, the same syllable was then presented 2 dB louder than the previous presentation. The lowest intensity level at which the syllable was correctly recognised was documented. This was repeated for 247 syllables recorded in the female voice and 252 syllables recorded in the male voice. Ethics approval for this phase (phase 3) of the study was granted by the National University of Singapore Institutional Review Board (NUS-IRB-2021-24). Participants provided informed written consent before participating in the study.

Based on the percentage of participants who could correctly recognise the syllable at each intensity level, psychometric curves were fitted for each syllable, using the following function:

f(x) = \frac{1}{1 + e^{-k (x - x_0)}}

where x is the intensity level, k is the steepness of the curve, and x₀ is the intensity level at the curve’s mid-point. The slope of the psychometric curve at 50% was compared between the male and female recordings of each syllable, and the recording with the steeper slope was used for further analysis. The syllables were then grouped according to the 16 phonologically balanced (PB) lists that were created, and each participant’s score for each list at various intensity levels (5 dBHL to 50 dBHL) was computed. At each intensity level, the scores of the lists were compared using a repeated measures analysis of variance. Lists with mean scores that were significantly different were removed, as they were deemed not to be perceptually equivalent to the rest of the lists.
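For the logistic function above, the derivative is f′(x) = k·f(x)·(1 − f(x)), so the slope at the mid-point (where f = 0.5) is k/4, or 25k when expressed in percentage points per dB. The sketch below illustrates this relationship together with a coarse least-squares fit. The fitting method is an assumption, since the text does not specify one, and the grid ranges, function names and data are hypothetical.

```python
import math

def logistic(x, k, x0):
    """Proportion of listeners recognising a syllable at intensity level x."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def slope_at_midpoint(k):
    """Slope at the 50% point in percentage points per dB (100 * k / 4)."""
    return 25.0 * k

def proportion_recognised(thresholds, level):
    """Share of participants whose recognition threshold is at or below level."""
    return sum(t <= level for t in thresholds) / len(thresholds)

def fit_logistic(levels, proportions):
    """Coarse grid search minimising squared error (illustrative, not optimised)."""
    best = None
    for i in range(5, 101):           # k from 0.05 to 1.00 in steps of 0.01
        k = i / 100
        for j in range(0, 121):       # x0 from 0 to 60 dB in steps of 0.5 dB
            x0 = j / 2
            error = sum((logistic(x, k, x0) - p) ** 2
                        for x, p in zip(levels, proportions))
            if best is None or error < best[0]:
                best = (error, k, x0)
    return best[1], best[2]

# Hypothetical noise-free data generated from k = 0.3 and x0 = 20 dB.
levels = list(range(0, 42, 2))
proportions = [logistic(x, 0.3, 20.0) for x in levels]
k_fit, x0_fit = fit_logistic(levels, proportions)
print(k_fit, x0_fit, slope_at_midpoint(k_fit))  # 0.3 20.0 7.5
```

With real data, the proportions at each level would come from a tabulation such as proportion_recognised, and a continuous optimiser would normally replace the grid search.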

Results

Phase 1: Selection of suitable words as test items

A total of 50 participants (13 males, 37 females) took part in this phase of the study. The mean age of all participants was 55 years (SD = 9.99). All 50 participants had completed at least primary school education and 22 of them attended primary schools that used Mandarin as the primary language of instruction. Of the 500 most commonly occurring characters within the database, 494 characters had mean ratings of 3.0 or higher. Of these 494 characters, 467 had fewer than 25% of respondents giving a rating of 1 (Never) or 2 (Almost never).
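The two familiarity criteria reported above can be expressed as a simple check. The sketch below is illustrative only: the function name and the ratings are hypothetical, and the thresholds are those stated in the text (a mean rating of at least 3.0, and fewer than 25% of respondents rating 1 or 2).

```python
def is_familiar(ratings, min_mean=3.0, max_low_share=0.25):
    """Apply both criteria to one character's Likert ratings (1 to 5)."""
    mean_rating = sum(ratings) / len(ratings)
    low_share = sum(r <= 2 for r in ratings) / len(ratings)
    return mean_rating >= min_mean and low_share < max_low_share

# Hypothetical ratings from ten respondents for three characters.
print(is_familiar([5, 4, 4, 3, 3, 4, 5, 4, 2, 4]))  # True: mean 3.8, 10% low ratings
print(is_familiar([5, 5, 5, 5, 1, 1, 1, 3, 4, 4]))  # False: mean 3.4, but 30% low ratings
print(is_familiar([1, 2, 2, 3, 3, 2, 1, 3, 2, 3]))  # False: mean 2.2
```

The second example shows why the mean alone is insufficient: a character can average above 3.0 while still being unfamiliar to a sizeable minority of respondents.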

Even though there were a total of 467 characters that met the criteria to be deemed as familiar to the participants, the number of unique syllables from these 467 characters was only 389. This is because there are many homophones in Mandarin. For example, the character “不” which means “no”, and the character “步” which translates into “(a) step” both have the Hanyu Pinyin of “bù” and have identical pronunciations. In Mandarin speech audiometry, having two different characters with the same pronunciation is redundant as they would essentially be regarded as the same syllable. Hence, the focus here is on the number of unique syllables, and these 389 syllables were used as a basis for the creation of the PB word lists in the subsequent phase.

Phase 3: Establishing perceptual equivalence of word lists

The 499 monosyllables recorded in the male and female voice were tested on 20 normal-hearing ears from nine males and 11 females. The mean pure tone average (PTA) of 500 Hz, 1000 Hz and 2000 Hz of the participants was 9.65 dBHL (SD = 5.00). Psychometric curves were fitted for each of the monosyllables and the steepness of the slopes at 50% was calculated. Stimuli with steeper slopes are more sensitive than those with less steep slopes, as they require a smaller change in intensity level to elicit a change in the percentage of participants who recognise the word.³⁵ For each monosyllable, the recording with the steeper slope among the two voices was selected for further analysis and inclusion into the final track list. In general, the male recordings had steeper slopes than female recordings (Table 4) and hence most of the selected recordings were those with the male voice.

Table 4.

Steepness of slope at 50% of psychometric curves fitted for male and female recordings.

	Male recordings (%/dB)	Female recordings (%/dB)
Mean	6.69	5.09
SD	1.69	1.84
Minimum	2.09	1.03
Maximum	11.28	10.93

The selected recordings were grouped according to the 16 PB lists created in Phase 2. Each participant’s WRS for each list was calculated for a range of intensity levels, from 5 dBHL to 50 dBHL in 5-dB steps. At each intensity level, repeated measures ANOVA (RMANOVA) was conducted to find out if there were significant differences in mean scores across lists. Whenever the RMANOVA determined that there were significant differences in mean scores across lists (p < 0.05), post hoc analysis with a Bonferroni adjustment was done to reveal which pairs of lists had statistically different mean scores (Figure 1(a)). Figure 1(b) summarises the number of times each list appeared in Figure 1(a). Lists that appeared frequently, such as Lists 6, 8 and 13, were lists that had significantly different scores from many other lists. They were deemed to be not perceptually equivalent to the rest of the word lists and were removed.
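The post hoc bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions: the raw p-values are hypothetical, the Bonferroni adjustment is taken to multiply each p-value by the number of pairwise comparisons among the lists at one intensity level, and the statistical software used may organise the correction differently.

```python
from collections import Counter
from math import comb

def bonferroni(p_values, n_comparisons):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    return {pair: min(1.0, p * n_comparisons) for pair, p in p_values.items()}

def tally_significant(adjusted, alpha=0.05):
    """Count how often each list appears in a significantly different pair."""
    counts = Counter()
    for (list_a, list_b), p in adjusted.items():
        if p < alpha:
            counts[list_a] += 1
            counts[list_b] += 1
    return counts

n_lists = 16
n_pairs = comb(n_lists, 2)  # 120 pairwise comparisons among 16 lists

# Hypothetical raw p-values for a few list pairs at one intensity level.
raw = {(1, 12): 0.0003, (6, 8): 0.0001, (8, 13): 0.0002, (2, 3): 0.02}
adjusted = bonferroni(raw, n_pairs)
counts = tally_significant(adjusted)
print(counts.most_common(1))  # [(8, 2)]
```

Summing such tallies across intensity levels gives the kind of per-list count summarised in Figure 1(b), from which the most frequently differing lists are identified for removal.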

Figure 1.

(a) List pairs with statistically significant mean difference in scores at various intensity levels. For instance, at 15 dBHL, participants’ scores on list 1 and list 12 were significantly different. (b) Participants’ scores for list 8 differed significantly from their scores on other lists in 15 instances.

Figure 2(a) and (b) show results of the same process after removing Lists 6, 8 and 13. Since the aim of the study was to produce lists that are perceptually equivalent, the process was repeated and Lists 7, 10 and 14 were removed. The remaining lists that had significantly different scores from others are shown in Figure 3.

Figure 2.

(a) List pairs with statistically significant mean difference in scores at various intensity levels, after lists 6, 8 and 13 were removed from the analysis. (b) Lists 7, 10 and 14 had a relatively high number of instances in which participants’ scores on them differed significantly from their scores on other lists.

Figure 3.

Remaining list pairs with statistically significant mean difference in scores at various intensity levels, after further removal of lists 7, 10 and 14.

Figure 3 shows that Lists 9 and 12 were the remaining lists that had significantly different scores from the other lists. After removing Lists 9 and 12, the post hoc analysis revealed no more pairs of lists with a Bonferroni-adjusted p-value of less than 0.05. The remaining eight lists, which were found to be perceptually equivalent from 5 dBHL to 50 dBHL at 5-dB intervals, are Lists 1, 2, 3, 4, 5, 11, 15 and 16.

Discussion

Word selection process

The development of speech audiometry test materials typically begins with selection of test items from an existing corpus of the language of interest.^14,17,27 Existing corpora enable speech audiometry test developers to easily select words or sentences that are common and frequently seen in texts. Although there is The Frequency Dictionary of Daily Chinese Words Encountered by Singapore Students¹⁶ for Singapore Mandarin, this study began by amassing a database of subtitles from local television programmes. This was due to the fact that The Frequency Dictionary of Daily Chinese Words Encountered by Singapore Students was developed by analysing a compilation of textual sources. Words found in textual sources may not be those that are used in daily conversations.

Of the resident population that identified Mandarin as the language that they speak most frequently at home, 47% do not possess post-secondary level qualifications.³⁶ As such, word recognition test materials should consist of simple, everyday words that even speakers of Singapore Mandarin who are conversant but illiterate would be able to recognise. This warrants a source that consists of common, spoken materials rather than textual collections. In view of this, the authors of this study chose to reference Brysbaert and New’s²⁶ study, which found film subtitles to be good estimates of word frequencies.

It is worth highlighting, though, that subtitles from local television programmes were used in the current study instead of film subtitles as used in the study by Brysbaert and New.²⁶ As this modified method has yet to be validated to yield high frequency characters that are familiar to the population, the additional step of recruiting Singapore Mandarin speakers to rate the characters for familiarity was included in this study. The results obtained suggest that a large majority (98%) of the characters were indeed familiar to the local population, and that these characters could therefore be included in the word lists developed in phase 2 of this study. This also indicates that the modified method of using subtitles from television programmes could be a good starting point for studies that are looking to examine conversational languages used by other ethnic groups in Singapore (i.e. Malay and Tamil).

Length of word lists

The current study developed phonologically balanced (PB) word lists of 25 items each. As some initial consonants and finals have a very low frequency of occurrence in the language, they were not represented in every single word list, but only once in every two or four word lists. This could have been avoided by having word lists that consist of more test items, such as 50-item word lists. Having more items within each word list would also decrease the standard deviation and variability in the results obtained. However, more items within each test list would increase the time required for test administration. To save on the time required to administer word recognition tests, many audiologists in the United States reportedly use half-lists of 25 words to test each ear, even though the materials were designed and validated as 50-word lists.³⁷ This practice is not ideal and could lead to inaccurate and invalid results,³⁸ but it also shows how difficult time-pressed clinicians find it to administer tests using 50-word lists.

Therefore, the current study designed word lists to contain 25 test items in order to strike a balance between clinical efficiency and the reliability of results. To ensure that using 25-word lists does not compromise the validity and reliability of the results, the materials have to be designed and validated as 25-word lists from the outset. Although the test-retest reliability of these 25-item word lists will only be validated in the subsequent phases of the study (with normal-hearing and hearing-impaired participants), the test-retest reliability of a 25-word list might not be significantly different from that of a 50-word list.³⁹

Perceptual equivalence of word lists

The psychometric functions of the individual monosyllables from Phase 3 revealed that there was a wide range in the steepness of slopes across test items. In other words, the individual monosyllables were not perceptually equivalent to one another, even though the recordings had already been scaled to the same average RMS value. This variability in audibility is to be expected, and reflects the complex nature of speech sounds, where certain words tend to be more easily perceived than others due to their phoneme composition.

Of course, it is possible to scale the recordings of monosyllables further to achieve greater perceptual equivalence among individual test items. However, forcing the individual words to be perceived equally loudly would be an unnatural representation of them in everyday speech.⁴⁰ Furthermore, scaling the recordings to a particular reference point would only make them equally perceptible at the chosen intensity level, and would not alter the slopes of the psychometric functions. Because performance increases with intensity at different rates across words, there would still be a lack of perceptual equivalence if the words were presented at an intensity level different from the one chosen for scaling.
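The argument about scaling can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each word's psychometric function is logistic and that amplifying a recording shifts its curve along the intensity axis without changing its slope. The two words, their midpoints and their slopes are hypothetical values, not data from this study.

```python
import math


def psychometric(level_db: float, midpoint_db: float, slope_per_db: float) -> float:
    """Proportion correct at a presentation level, using a logistic curve."""
    return 1.0 / (1.0 + math.exp(-slope_per_db * (level_db - midpoint_db)))


def shift_by_gain(word: dict, gain_db: float) -> dict:
    """Amplifying a recording by gain_db moves its curve left by gain_db.

    The slope is left untouched, which is the point of the illustration.
    """
    return {
        "midpoint_db": word["midpoint_db"] - gain_db,
        "slope_per_db": word["slope_per_db"],
    }


# Two hypothetical words: one with a steep function, one with a shallow function.
steep = {"midpoint_db": 20.0, "slope_per_db": 0.40}
shallow = {"midpoint_db": 26.0, "slope_per_db": 0.15}

# Scale the shallow word so that both words reach 50% correct at the same level.
gain_db = shallow["midpoint_db"] - steep["midpoint_db"]
shallow_scaled = shift_by_gain(shallow, gain_db)

for level in (10.0, 20.0, 30.0):
    p_steep = psychometric(level, **steep)
    p_shallow = psychometric(level, **shallow_scaled)
    print(f"{level:4.0f} dB: steep = {p_steep:.2f}, shallow (scaled) = {p_shallow:.2f}")
```

In this sketch the two words are matched only at the level chosen for scaling; 10 dB above or below that level their predicted scores differ by more than ten percentage points, because the slopes remain different.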

Therefore, this study chose to focus on ensuring homogeneity in scores across word lists, rather than across individual words. Word lists that had significantly different scores from the rest were removed from the set of materials. This process is essential to ensure that, when normal-hearing listeners are tested at levels between 5 dBHL and 50 dBHL, they are likely to achieve similar WRS regardless of which list is used to test them.

Of the initial 16 PB word lists, the study team eliminated the eight lists that were not perceptually equivalent to the rest. The number of remaining word lists in the current study is consistent with existing Mandarin speech recognition tests developed in China¹⁰ and Taiwan,¹¹ both of which comprise eight lists of 25 test items each.

Conclusion

This study has developed a set of phonologically balanced and perceptually equivalent word lists in Singapore Mandarin for word recognition testing. It has addressed the limitations of the SC-10, firstly by reducing the contextual cues that are present in disyllabic Mandarin test items, and secondly by incorporating phonological balance into the word lists while maintaining clinical efficiency. Candidate test items were drawn from a database of subtitles from local free-to-air television programmes. Words with a high frequency of occurrence were rated as familiar by native Singapore Mandarin speakers, and were subsequently composed into 16 phonologically balanced word lists of 25 monosyllables each. Psychometric functions of the words were obtained with normal-hearing participants, and lists that were not perceptually equivalent to the rest were removed. Further work is required to validate the remaining eight perceptually equivalent word lists with the local Mandarin-speaking population. Test-retest reliability and normative data need to be established with normal-hearing and hearing-impaired Singapore Mandarin speakers before the materials are ready for clinical use.

Supplemental Material

Supplemental Material - Development of phonologically-balanced and perceptually equivalent Singapore Mandarin word lists for word recognition test

Supplemental Material for Development of phonologically-balanced and perceptually equivalent Singapore Mandarin word lists for word recognition test by Kimberly Wanxian Soh and Jenny Hooi Yin Loo in Proceedings of Singapore Healthcare.

Footnotes

Acknowledgements

We would like to thank Mr Lim Hong Han for his assistance in data collection in Phase 1 of this research.

Author contributions

KWS researched literature and conceived the study. JHYL was involved in protocol development and gaining ethical approval. KWS carried out patient recruitment, data analysis, and wrote the first draft of the manuscript. Both authors reviewed and edited the manuscript and approved the final version of the manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was made possible by a gift donation from Sivantos Pte Ltd.

Ethical approval

Ethical approval for this study was obtained from the National University of Singapore Institutional Review Board (S-20-012 and NUS-IRB-2021-24).

Informed consent

Written informed consent was obtained from all subjects before the study.

Data availability

The datasets generated and/or analysed during the current study are available from ScholarBank@NUS at .

ORCID iDs

Kimberly Wanxian Soh

Jenny Hooi Yin Loo

Supplemental Material

Supplemental material for this article is available online.

References

1. American Speech-Language-Hearing Association. Determining threshold level for speech [guidelines]. Rockville, MD: American Speech-Language-Hearing Association, 1988. http://www.asha.org/policy/GL1988-00008.htm (accessed 01 June 2021).

2. Hall. Diagnostic applications of speech audiometry. Semin Hear 1983; 4: 179–203.

3. Talbott, Larson. Research needs in speech audiometry. Semin Hear 1983; 4: 299–308.

4. Carhart. Problems in the measurement of speech discrimination. Arch Otolaryngol 1965; 82: 253–260. DOI: 10.1001/archotol.1965.00760010255007.

5. Hudgins, Hawkins, Kaklin. The development of recorded auditory tests for measuring hearing loss for speech. Laryngoscope 1947; 57: 57–89.

6. Rathna Kumar, Sudharshan Reddy B, Kranthi. Development of word lists in Telugu for assessing speech recognition threshold: alternative forms to existing word lists. Int J Otorhinolaryngol Head Neck Surg 2016; 2: 147–156. DOI: 10.18203/issn.2454-5929.ijohns20162181.

7. Nissen, Harris, Channell, et al. The development of psychometrically equivalent Cantonese speech audiometry materials. Int J Audiol 2011; 50: 191–201. DOI: 10.3109/14992027.2010.542491.

8. Weisleder, Hodgson. Evaluation of four Spanish word-recognition-ability lists. Ear Hear 1989; 10: 387–392.

9. Han, Wang, Zhang, et al. Development of Mandarin monosyllabic speech test materials in China. Int J Audiol 2009; 48: 300–311. DOI: 10.1080/14992020802607456.

10. Nissen, Harris, Jennings L-J, et al. Psychometrically equivalent Mandarin bisyllabic speech discrimination materials spoken by male and female talkers. Int J Audiol 2005; 44: 379–390.

11. Nissen, Harris, Dukes. Word recognition materials for native speakers of Taiwan Mandarin. Am J Audiol 2008; 17: 68–79.

12. Singapore Department of Statistics. Singapore residents by age group, ethnic group and sex, end June, annual. Population and population structure; key annual indicators on population. Singapore: Singapore Department of Statistics, 2021.

13. Singapore Department of Statistics. Resident population aged 5 years and over by language most/second most frequently spoken at home, place of birth and sex. Census of population 2020. Singapore: Singapore Department of Statistics, 2021.

14. Lee GJC, Lee SLH. Development of SC-10: a psychometrically equivalent Singapore Mandarin disyllabic word list for clinical speech audiometry use. World J Otorhinolaryngol Head Neck Surg 2021; 7: 247–256. DOI: 10.1016/j.wjorl.2020.02.011.

15. Soh, Loo JHY. A review of Mandarin speech recognition test materials for use in Singapore. Int J Audiol 2021; 60(6): 399–411. DOI: 10.1080/14992027.2020.1826587.

16. Goh, Lin, Zhao. The frequency dictionary of daily Chinese words encountered by Singapore students. Singapore: NTU-SCCL Press, 2013.

17. Egan. Articulation testing methods. Laryngoscope 1948; 58: 955–991.

18. Soh KW. Validation of Mandarin speech audiometry materials in Singapore. Master of Science (Audiology). Singapore: National University of Singapore, 2017.

19. Dirks, Kamm, Bower, et al. Use of performance-intensity functions for diagnosis. J Speech Hear Disord 1977; 42: 408–415.

20. Dubno, Lee F-S, Klein, et al. Confidence limits for maximum word-recognition scores. J Speech Hear Res 1995; 38: 490–502.

21. Jerger. Diagnostic significance of PB word functions. Arch Otolaryngol 1971; 93: 573–580.

22. Martin, Champlin, Perez. The question of phonetic balance in word recognition testing. J Am Acad Audiol 2000; 11: 489–493.

23. Hirsh, Davis, Silverman, et al. Development of materials for speech audiometry. J Speech Hear Disord 1952; 17: 321–337.

24. Tillman, Carhart. An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University auditory test no. 6. USAF School of Aerospace Medicine Technical Report. Fort Belvoir, VA: DTIC Document, 1966.

25. Thornton, Raffin MJM. Speech-discrimination scores modeled as a binomial variable. J Speech Hear Res 1978; 21: 507–518.

26. Brysbaert, New. Moving beyond Kučera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behav Res Methods 2009; 41: 977–990. DOI: 10.3758/BRM.41.4.977.

27. Lehiste, Peterson. Linguistic considerations in the study of speech intelligibility. J Acoust Soc Am 1959; 31: 280–286.

28. Koh, Mislan, Khoo, et al. Building the Singapore English national speech corpus. Interspeech 2019; 2019: 321–325.

29. Infocomm Media Development Authority of Singapore. Overview of Singapore cinema 2020. Singapore: Infocomm Media Development Authority of Singapore, 2020. https://www.imda.gov.sg/-/media/Imda/Files/Infocomm-Media-Landscape/Research-and-Statistics/Data-on-Singapore-Cinema/Overview-of-SG-Cinema-2020.pdf (accessed 29 June 2021).

30. Channel News Asia. TV and internet the highest consumed media platforms in Singapore: Nielsen survey. Singapore: Channel News Asia, 2018. https://www.connectedtoindia.com/tv-and-internet-highest-consumed-media-platforms-in-singapore-nielsen-survey-4760.html (accessed 05 June 2021).

31. Goh, Gopinathan. Education in Singapore: developments since 1965. Washington, DC: The World Bank, 2006, pp. 80–108.

32. Fan, Chen. Perception of lexical neutral tone among adults and infants. Front Psychol 2018; 9: 322. DOI: 10.3389/fpsyg.2018.00322.

33. BS EN ISO 8253-3. Acoustics - Audiometric test methods. Part 3: Speech audiometry. Geneva, Switzerland: BS EN ISO, 2022.

34. Cheoy, Chong, Mazlan, et al. Development of the Mandarin nonsense word identification test. Int J Audiol 2021; 60: 578–587. DOI: 10.1080/14992027.2020.1864485.

35. McArdle, Hnath-Chisolm. Speech audiometry. In: Katz (ed). Handbook of clinical audiology. 7th ed. Philadelphia, PA: Wolters Kluwer Health, 2015, pp. 61–76.

36. Singapore Department of Statistics. Resident non-students aged 15 years and over by language most frequently spoken at home, highest qualification attained and sex. Census of population 2010. Singapore: Singapore Department of Statistics, 2015.

37. Lord. Word recognition testing: “repeat after me…”. 2014. https://wdh02.azureedge.net/-/media/e3-diagnostics/shared/pdf/articles-white-papers/2014/word-recognition-testing–repeat-after-me-autumn-2014.pdf?la=en&rev=263B (accessed 02 March 2021).

38. Hornsby BWY, Mueller. Monosyllabic word testing: five simple steps to improve accuracy and efficiency. 2013. http://www.audiologyonline.com/articles/word-recognition-testing-puzzling-disconnect-11978 (accessed 12 June 2021).

39. Kim, Lee, et al. Test-retest reliability of word recognition score using Korean standard monosyllabic word lists for adults as a function of the number of test words. J Audiol Otol 2015; 19: 68–73.

40. Carhart. Instruments and materials for speech audiometry. Acta Otolaryngol 1951; 40: 313–329. DOI: 10.3109/00016485109123646.
