Fast Query-by-Singing/Humming System That Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm

Abstract

We propose a query-by-singing/humming (QbSH) system that uses both preclassification and multiple classifiers by combining the linear scaling (LS) and quantized dynamic time warping (QDTW) algorithms in order to enhance both matching accuracy and processing speed. The system is suited to high-speed QbSH in a large distributed server environment. This research is novel in the following three ways. First, because QDTW is generally much slower than LS, we perform QDTW matching only when the matching distance obtained by the LS algorithm is smaller than a predetermined threshold, which reduces the overall processing time while maintaining the matching accuracy. Second, we use different measures of matching distance in the LS algorithm according to the characteristics of the reference database. Third, we combine the distances calculated by the LS and QDTW algorithms through score level fusion in order to enhance the matching accuracy. Experimental results with the 2009 MIR-QbSH corpus and the AFA MIDI 100 database showed that the proposed method reduced the total search time over the reference data while obtaining higher accuracy than QDTW alone.

1. Introduction

With the spread of music content and music databases on the Internet, portable media devices, and smartphones, fast and accurate content-based search systems are required. Query-by-singing/humming (QbSH) is a convenient and intelligent approach to content-based music retrieval. It finds the reference music file that corresponds to a user's humming query, so a music file can be retrieved from the melody hummed or sung by the user, without knowing the singer's name or the song title.

Various kinds of QbSH systems have been studied in previous research [1–22]. Ghias et al. proposed representing the pitch contour features extracted from humming or whistling data as an up-down-repeat (UDR) string and using this string for matching [2]. McNab et al. proposed the MELDEX system, which is based on pitch contour, interval, and duration and uses a string matcher [3, 4]. The Tuneserver system proposed in [5] represents the pitch contour as a UDR string, like the method of [2]. Kornstadt et al. developed the Themefinder system, which can search for musical themes in the Humdrum database of 16th-century classical music and folk songs on the web [6, 7]. The retrieval method in [8] uses the changes of melody and the UDR string. Ryynänen and Klapuri proposed extracting pitch vectors with a fixed-size time window and matching them by locality sensitive hashing (LSH) [9]. Another study [10] adopted the earth mover's distance (EMD), which measures melodic similarity by calculating the minimum cost between the features of the humming and reference data with changes of weight. The content-based music retrieval method in [11] first filters out 80% of the unlikely candidates by hierarchical filtering and then compares the input query with the remaining candidates. Salamon and Rohrmeier proposed a two-stage retrieval method for QbSH systems [12]. In the first stage, the number of candidates is reduced by indexing with n-grams; in the second stage, detailed matching with the remaining candidates is performed based on local alignment with modified cost functions. Wang et al. proposed a QbSH system that combines EMD and dynamic time warping (DTW) classifiers based on the weighted SUM rule [13].

Previous QbSH systems can be roughly categorized into top-down and bottom-up matching systems. As a top-down approach, Wu et al. proposed a recursive alignment algorithm that first compares the two feature sequences globally and then compares them locally [14]. The methods of [11, 12] also belong to this category. In contrast, bottom-up methods calculate the local distance between the query and reference data at each position and search for the optimal path to obtain a final matching score [2–4, 6–8, 15, 16].

The DTW algorithm has been widely used as the matcher in QbSH systems. It has also been widely used in speech recognition and readily handles the time alignment problem. Since the input humming/singing is generally misaligned in time with the reference music file, the DTW algorithm is suitable for QbSH systems, but it has the limitation of high computational cost. Jang and Gao converted the input query data into pitch vectors [15] and measured the similarity between singing/humming and reference songs based on the distance calculated by DTW with high accuracy; however, this method is also computationally demanding [23, 24]. Krishnamoorthy et al. also used DTW as the distance measure for a QbSH system on embedded platforms [19]. However, their system still suffers from the high computational cost of DTW and from the lower matching accuracy of a single classifier. Li et al. proposed a multistage matching-based system to enhance the performance of QbSH [20]. It includes three stages. The first and second stages reduce the number of candidates in a large database by using the earth mover's distance (EMD) based on tune and profile features, respectively. Finally, DTW calculates the matching distance for the remaining candidates. However, because the final matching relies only on the single DTW classifier, its accuracy is limited. The linear scaling (LS) method has the advantage of fast processing, but its accuracy is relatively lower than that of DTW [16].

All of these previous studies use only a single classifier, only multiple classifiers, or only preclassification; none of them combines preclassification with multiple classifiers. To overcome these problems, we propose a QbSH system that uses both preclassification and multiple classifiers by combining the LS and quantized DTW (QDTW) algorithms in order to enhance both matching accuracy and processing speed. Although QDTW is a modified version of DTW designed to enhance the matching accuracy and reduce the processing time, it is still generally much slower than the LS method. We therefore perform QDTW matching only when the matching distance obtained by the LS algorithm is smaller than a predetermined threshold, which reduces the overall processing time by more than 30% compared with QDTW alone while maintaining the matching accuracy. We use different measures of matching distance in the LS algorithm according to the characteristics of the reference database. In addition, we combine the distances calculated by the LS and QDTW algorithms through score level fusion in order to enhance the matching accuracy. Table 1 summarizes the comparison between the proposed and previous methods.

Table 1

Summarized comparisons of the proposed method to previous ones.

Category	Item	Description
Only by single classifier-based method	Method	Matching with single classifier to calculate the distance between input query data and reference data [2–10, 14, 15, 18, 19]
Only by single classifier-based method	Advantage	High processing speed
Only by single classifier-based method	Disadvantage	Limitation to enhance the matching accuracy only by single classifier
Only by multiple classifier-based method	Method	Combining the matching scores (by two or more classifiers) [13, 16, 17]
Only by multiple classifier-based method	Advantage	Higher matching accuracy than that by single classifier-based method
Only by multiple classifier-based method	Disadvantage	Lower processing speed
Only by preclassification-based method	Method	The system firstly reduces the number of candidates in large amount of database by preclassification method, and it calculates the matching distance with remaining candidate data [11, 12, 20]
Only by preclassification-based method	Advantage	Higher matching speed by reducing the number of candidates based on preclassification
Only by preclassification-based method	Disadvantage	The final matching only by the single classifier has the limitation of lower accuracy
Considering both preclassification and multiple classifier-based method (proposed method)	Method	Considering both preclassification and multiple classifier-based method by combining LS and QDTW algorithm
Considering both preclassification and multiple classifier-based method (proposed method)	Advantage	Higher matching speed with higher accuracy
Considering both preclassification and multiple classifier-based method (proposed method)	Disadvantage	Lower matching speed than that only by LS algorithm

The rest of this paper is structured as follows. Section 2 explains the proposed QbSH system. Section 3 discusses the experimental results, and Section 4 states the conclusions of this study.

2. Proposed Method

2.1. Overview of the Proposed Method

Figure 1 shows a flowchart of the proposed method. First, pitch data are extracted from the user's input humming file by a musical note estimation method. Second, we perform normalization, starting with the removal of pitch values of 0 from the extracted pitch data, since these are meaningless values obtained from silent periods of the melody.

Figure 1

Flowchart of the proposed method.

In general, the melody of the input humming/singing is relatively inaccurate compared to the reference musical instrument digital interface (MIDI) data because it is hummed or sung by an amateur. The pitch data of the input therefore differ considerably from those of the MIDI file, which requires further normalization of the pitch data in both the input and MIDI files as follows. After eliminating the 0 values, the input humming data are normalized through mean shifting, median filtering, average filtering, and min-max scaling [16, 17, 21]. Median and average filtering remove peak and vibration noise, and min-max scaling adjusts for amplitude variations.

With the normalized pitch data, preclassification is performed based on the distance calculated by the LS algorithm in order to decide whether the QDTW algorithm should be executed. In detail, the LS algorithm calculates the matching distance between the input query data and the reference MIDI data in the matching window. If the matching distance is greater than a specific threshold, the QDTW algorithm is not run because the humming and MIDI data are regarded as different. The matching window of the MIDI data is then moved to the next matching position, and the preclassification procedure is repeated. If the matching distance is less than the threshold, QDTW is executed in order to obtain a more accurate matching score. These procedures are iterated until the matching window reaches the end of the MIDI data. At that point, the final matching distance between the input humming/singing and the MIDI file is determined by combining the matching distances of QDTW and LS through score level fusion. The correct MIDI file is selected based on the final matching distance.
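The control flow described above can be sketched in Python as follows. This is only an illustrative outline, not the authors' implementation: the function name `cascade_match`, the `hop` step, and the idea of passing the LS and QDTW distance functions in as the callables `coarse` and `fine` are assumptions made for the example. The text does not state how the per-window distances are aggregated over a MIDI file; the sketch simply keeps the smallest distance of each kind.

```python
from typing import Callable, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def cascade_match(
    query: Sequence[float],
    reference: Sequence[float],
    window: int,
    threshold: float,
    coarse: Distance,  # cheap distance, standing in for LS
    fine: Distance,    # expensive distance, standing in for QDTW
    hop: int = 1,      # hypothetical step between matching positions
) -> Tuple[float, float]:
    """Return (best coarse distance, best fine distance) over all windows."""
    best_coarse = float("inf")
    best_fine = float("inf")
    for start in range(0, len(reference) - window + 1, hop):
        segment = reference[start:start + window]
        d_coarse = coarse(query, segment)
        best_coarse = min(best_coarse, d_coarse)
        # Preclassification: run the expensive matcher only on
        # windows that the cheap matcher considers promising.
        if d_coarse < threshold:
            best_fine = min(best_fine, fine(query, segment))
    return best_coarse, best_fine
```

The two returned distances would then be combined by score level fusion to give the final matching distance for the MIDI file. If no window passes the threshold, the fine distance stays at infinity, which a caller would need to handle.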

2.2. Pitch Extraction and Normalization

In order to extract the pitch data, we used a voice-activity detection (VAD) algorithm [16, 17, 21]. First, the VAD algorithm estimates the voiced frames; then the pitch, as an integer value, is extracted every 32 ms by the spectrotemporal autocorrelation (STA) method, which is based on temporal and spectral autocorrelations.

However, the extracted pitch data generally contain a lot of noise. In addition, muted regions exist, and the pitch data of the input differ considerably from those of the MIDI file, since users cannot hum or sing as precisely as MIDI music. The extracted pitch data should therefore be normalized to obtain an accurate matching result. In this research, we normalize by removing 0 values, mean shifting, median filtering, average filtering, and min-max scaling.
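A minimal sketch of these five normalization steps, in the order listed, is given below. The filter window sizes, the shrinking windows at the sequence edges, and the target range [0, 1] for min-max scaling are assumptions for illustration; the text does not specify them, and the function names are hypothetical.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def _sliding(values: Sequence[float], size: int,
             reducer: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply `reducer` over a centered window that shrinks at the edges."""
    half = size // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(reducer(values[lo:hi]))
    return out


def normalize_pitch(pitch: Sequence[float],
                    median_size: int = 3,    # assumed window size
                    average_size: int = 3    # assumed window size
                    ) -> List[float]:
    # 1. Remove 0 values (silent frames).
    voiced = [float(p) for p in pitch if p != 0]
    if not voiced:
        return []
    # 2. Mean shifting.
    mu = mean(voiced)
    shifted = [p - mu for p in voiced]
    # 3. Median filtering, then 4. average filtering.
    smoothed = _sliding(shifted, median_size, median)
    smoothed = _sliding(smoothed, average_size, mean)
    # 5. Min-max scaling to [0, 1] (assumed target range).
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:
        return [0.0] * len(smoothed)
    return [(p - lo) / (hi - lo) for p in smoothed]
```

For example, `normalize_pitch([0, 60, 0, 62, 64, 0])` drops the three silent frames and returns a rising three-point contour scaled to the range 0 to 1. With this particular scaling range the mean shift does not change the final values; it is kept only to mirror the listed steps.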

2.3. Preclassification by LS Algorithm

2.3.1. LS Method

The LS algorithm has been widely used in QbSH systems, since its processing complexity is very low [16]. It calculates the matching distance between the input query data and the reference MIDI data by linearly changing the length of the input or reference data along the time axis. In this research, we change the length of the reference MIDI data. Figure 2 shows an example of the LS algorithm.

Figure 2

Example of the operation of the LS algorithm.
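As a rough illustration of linear scaling applied to the reference data, the sketch below stretches or compresses a reference segment to the length of the query by linear interpolation and keeps the smallest distance over a few scale factors. The scale factors, the interpolation scheme, the absolute-difference distance, and the names `resample_linear` and `ls_window_distance` are assumptions for the example, not details taken from the paper.

```python
from typing import List, Sequence


def resample_linear(values: Sequence[float], new_length: int) -> List[float]:
    """Linearly rescale `values` along the time axis to `new_length` samples."""
    if new_length <= 0 or not values:
        return []
    if new_length == 1 or len(values) == 1:
        return [float(values[0])] * new_length
    step = (len(values) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * step
        left = min(int(pos), len(values) - 1)
        right = min(left + 1, len(values) - 1)
        frac = pos - left
        out.append(values[left] * (1 - frac) + values[right] * frac)
    return out


def ls_window_distance(query: Sequence[float],
                       reference: Sequence[float],
                       start: int,
                       scales: Sequence[float] = (0.5, 0.75, 1.0, 1.25, 1.5),
                       ) -> float:
    """Smallest summed absolute difference over the assumed scale factors."""
    best = float("inf")
    for scale in scales:
        length = round(len(query) * scale)
        segment = reference[start:start + length]
        # Skip scales for which the reference is too short at this position.
        if length < 2 or len(segment) < length:
            continue
        stretched = resample_linear(segment, len(query))
        dist = sum(abs(q - r) for q, r in zip(query, stretched))
        best = min(best, dist)
    return best
```

Because each scale factor costs only one pass over the query, the total work per matching position grows linearly with the query length, which is why LS is much cheaper than a dynamic-programming matcher.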

2.3.2. Measuring Method of Matching Distance

In general, the characteristics of MIDI data differ according to the reference database. Whereas the 2009 MIR-QbSH corpus mostly consists of children's songs and folk songs, the AFA MIDI 100 database includes a wider variety of songs. The melodies of the 2009 MIR-QbSH corpus are therefore usually simpler than those of the AFA MIDI 100 database. In addition, the AFA MIDI 100 database contains more noise. For these reasons, we use different measures of matching distance in the LS algorithm according to the characteristics of the reference database.

In general, the Euclidean distance is used to measure the dissimilarity between input query data and reference MIDI data in LS algorithm as shown in

$$ED = \sqrt{\sum_{i = 0}^{n} (q_{i} - r_{i})^{2}}, \tag{1}$$

where $q_{i}$ and $r_{i}$ mean ith query and reference MIDI data, respectively, and n means the length of data. For lower processing time, the following equation can be used instead of (1):

$$SquareED = \sum_{i = 0}^{n} (q_{i} - r_{i})^{2}. \tag{2}$$

In (2), we define $(q_{i} - r_{i})^{2}$ as $Dist_{i}$ like (3), and we select one of the four functions of (4) as the $Dist_{i}$ according to the kind of reference database:

$$SquareED = \sum_{i = 0}^{n} Dist_{i}, \tag{3}$$

$$Dist_{i} = \begin{cases} (q_{i} - r_{i})^{2} \\ |q_{i} - r_{i}| \\ \log(|q_{i} - r_{i}| + 1) \\ \arctan(|q_{i} - r_{i}| - 0.5) + 0.5. \end{cases} \tag{4}$$

Figure 3 shows the relationship between absolute difference ($|q_{i} - r_{i}|$ of (4)) and calculated $Dist_{i}$ of (4). The square, log, and atan have the characteristics of nonlinearity between the input and output values whereas the abs has the characteristics of linearity.

Figure 3

The relationship between absolute difference ($|q_{i} - r_{i}|$ of (4)) and calculated $Dist_{i}$ of (4) (square, abs, log, and atan mean the 1st~4th functions of (4), resp.).
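The four candidate functions of (4) and the summed distance of (3) can be written compactly as below. This is a sketch under two assumptions: "log" is taken to be the natural logarithm, and the query and reference sequences have already been brought to the same length by LS. The names `DIST_FUNCTIONS` and `ls_matching_distance` are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# Each function maps the absolute difference d = |q_i - r_i| to Dist_i.
DIST_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "square": lambda d: d * d,
    "abs": lambda d: d,
    "log": lambda d: math.log(d + 1.0),           # natural log assumed
    "arctan": lambda d: math.atan(d - 0.5) + 0.5,
}


def ls_matching_distance(query: Sequence[float],
                         reference: Sequence[float],
                         kind: str = "abs") -> float:
    """Sum Dist_i over all positions, as in (3), for the chosen function."""
    if len(query) != len(reference):
        raise ValueError("query and reference must have the same length")
    dist = DIST_FUNCTIONS[kind]
    return sum(dist(abs(q - r)) for q, r in zip(query, reference))
```

For a single large difference of 10, the square function contributes 100 and the abs function 10, whereas the log and arctan functions contribute only roughly 2.4 and 2.0. Outliers therefore weigh far less under the last two functions.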

2.4. Matching by QDTW

As shown in Figure 1, if the matching distance obtained by the LS algorithm is less than the predetermined threshold, QDTW is executed to calculate a more accurate matching distance. In general, the MIDI and humming phrases differ in length. This time alignment problem can be overcome by the DTW algorithm, which calculates the dissimilarity between two patterns while allowing insertions and deletions [16, 17, 21]. At each matching position of DTW, the dissimilarity between the humming and MIDI features is calculated by the Euclidean distance. In this research, we adopted QDTW, which differs from DTW only in that it uses quantized pitch values instead of the original ones. Since the original pitch values vary because of noise, they are represented as quantized integer values in QDTW.
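The idea can be illustrated with a textbook DTW recurrence applied to quantized pitch values. The sketch below is not the authors' implementation: the quantization step `step`, the unconstrained warping path, and the function names are assumptions. For scalar pitch values the Euclidean distance at each position reduces to the absolute difference.

```python
from typing import List, Sequence


def quantize(pitch: Sequence[float], step: float = 1.0) -> List[int]:
    """Map each pitch value to the index of its nearest quantization level.

    Ties follow Python's round-half-to-even behaviour.
    """
    return [round(p / step) for p in pitch]


def qdtw_distance(query: Sequence[float],
                  reference: Sequence[float],
                  step: float = 1.0) -> float:
    """Accumulated DTW cost between the quantized query and reference."""
    q = quantize(query, step)
    r = quantize(reference, step)
    inf = float("inf")
    if not q or not r:
        return inf
    # prev[j] holds the best accumulated cost for the previous query sample.
    prev = [0.0] + [inf] * len(r)
    for qi in q:
        curr = [inf] * (len(r) + 1)
        for j, rj in enumerate(r, start=1):
            cost = abs(qi - rj)  # 1-D Euclidean distance
            # Deletion, insertion, or match relative to the warping path.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]
```

The nested loops visit every pair of query and reference samples, so the cost per matching position is proportional to the product of the two lengths. This is the expense that the LS preclassification is meant to avoid for unpromising windows.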

Before matching by QDTW, we detect the zero-to-nonzero positions (the positions where the pitch value changes from zero to nonzero) of the MIDI data and match the starting position of the humming data with each zero-to-nonzero position of the MIDI data. If the time interval between two zero-to-nonzero positions is less than a threshold, only the first of them is used for matching, which reduces the processing time and enhances the matching accuracy.
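One way to read this rule is sketched below. It rests on assumptions: intervals are measured in samples, each candidate is compared with the last position that was kept, and a nonzero first sample counts as a start position. The name `onset_positions` and the parameter `min_gap` are hypothetical.

```python
from typing import List, Sequence


def onset_positions(midi_pitch: Sequence[float], min_gap: int) -> List[int]:
    """Indices where the pitch changes from zero to nonzero.

    A position closer than `min_gap` samples to the previously kept
    position is skipped, so only the first of a close pair is used.
    """
    onsets: List[int] = []
    for i, value in enumerate(midi_pitch):
        previous = midi_pitch[i - 1] if i > 0 else 0
        if previous == 0 and value != 0:
            if not onsets or i - onsets[-1] >= min_gap:
                onsets.append(i)
    return onsets
```

With `midi_pitch = [0, 60, 60, 0, 62, 0, 0, 0, 64]` and `min_gap = 4`, the candidate positions are 1, 4, and 8; position 4 is dropped because it lies only three samples after position 1, leaving `[1, 8]`.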

2.5. Score Level Fusion of Matching Distances

Score level fusion has been widely used to enhance matching accuracy, and many fusion methods exist. In this paper, we combined the two matching distances obtained by the LS and QDTW methods using simple fusion rules, namely the MIN, MAX, PRODUCT, and SUM rules, and compared the performance of each. The MIN and MAX rules select the smaller and the greater of the two matching distances as the final matching score, respectively. The PRODUCT and SUM rules calculate the final matching score by multiplying and summing the two matching distances, respectively. Experimental results showed that the MIN rule performed best among these rules.
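The four rules are simple enough to state directly in code. The sketch below assumes that the two distances are already on comparable scales, since the text does not describe any score normalization; `fuse` and `rank_candidates` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

FUSION_RULES: Dict[str, Callable[[float, float], float]] = {
    "MIN": min,
    "MAX": max,
    "PRODUCT": lambda a, b: a * b,
    "SUM": lambda a, b: a + b,
}


def fuse(ls_distance: float, qdtw_distance: float, rule: str = "MIN") -> float:
    """Combine the LS and QDTW matching distances into one score."""
    return FUSION_RULES[rule](ls_distance, qdtw_distance)


def rank_candidates(distances: Dict[str, Tuple[float, float]],
                    rule: str = "MIN") -> List[str]:
    """Order candidate files by fused distance, smallest (best) first."""
    return sorted(distances, key=lambda name: fuse(*distances[name], rule))
```

All four rules are constant-time operations on two numbers that have already been computed, so the fusion step adds essentially nothing to the processing time.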

3. Experimental Results

For the experiments, we used two databases. The first was the 2009 MIR-QbSH corpus, which consists of 48 reference MIDI files and 4431 singing and humming queries stored as wav files [22]. A total of 118 persons sang or hummed for 8 s per query in various environments, such as over telephones and microphones. Since the 2009 MIR-QbSH corpus provides pitch vector (PV) files containing manually extracted pitch data, we used the PV files in the experiments to exclude pitch extraction errors.

The second database was the audio feature analysis (AFA) MIDI 100 database, which includes 1,000 singing and humming files recorded by microphone, and 100 MIDI files which are made up of 84 Korean songs, 6 children's songs, and 10 pop songs. The average time length of the input singing/humming files is 12 s. We performed our experiments on a desktop computer with a 3.4 GHz CPU and 8 GB RAM. To measure the matching accuracy, the mean reciprocal rank (MRR) is used as the criterion of performance, and it has been frequently used for measuring the accuracy of QbSH system [12, 16, 17, 21]:

$$MRR = \frac{1}{K} \sum_{i = 1}^{K} \frac{1}{rank_{i}}, \tag{5}$$

where K is the number of input singing/humming files and $rank_{i}$ is the ranking of the correct MIDI file (corresponding to the input file), as calculated by the proposed method. If all of the correct MIDI files (corresponding to the input files) are accurately measured as the 1st in rank, the calculated MRR becomes 1, and the maximum MRR is 1 [12, 16, 17, 21]. Top 1, Top 10, and Top 20 indicate that the rank of the MIDI file is included within rank 1, rank 10, and rank 20, respectively.
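Both criteria follow directly from the list of ranks of the correct MIDI file, one rank per query. A small sketch (the function names are hypothetical):

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR of (5): the average of 1/rank over all K queries."""
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r for r in ranks) / len(ranks)


def top_n_rate(ranks: Sequence[int], n: int) -> float:
    """Percentage of queries whose correct file is ranked within the top n."""
    if not ranks:
        raise ValueError("at least one query is required")
    return 100.0 * sum(1 for r in ranks if r <= n) / len(ranks)
```

For three queries whose correct files are ranked 1, 2, and 4, the MRR is (1 + 1/2 + 1/4) / 3, about 0.583, the Top 1 rate is about 33.3%, and the Top 10 rate is 100%.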

As the 1st experiment, we measured the matching accuracy of the LS algorithm for the distance measures of (3) and (4), as shown in Tables 2 and 3. With the AFA MIDI 100 database, which contains a lot of noise, the log and arctan functions give better accuracy than the other functions. With the 2009 MIR-QbSH corpus, which contains less noise, the abs function gives the best Top 1 rate and MRR. From this, we can confirm that the linear distance function performs better on the database with less noise, whereas the log and arctan nonlinear functions are more accurate on the database with more noise.

Table 2

The matching accuracy of LS algorithm with PV files of 2009 MIR-QbSH corpus database according to various distance measurement methods of (3) and (4).

Distance measurement method	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
Square function	68.564	83.966	89.905	0.736
Abs function	71.138	84.575	89.363	0.754
Log function	70.912	84.124	88.799	0.751
Arctan function	69.422	82.611	88.482	0.738

Table 3

The matching accuracy of LS algorithm with AFA MIDI 100 database according to various distance measurement methods of (3) and (4).

Distance measurement method	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
Square function	40.8	65.9	74.6	0.492
Abs function	49.4	70.0	77.8	0.559
Log function	51.7	71.1	79.7	0.579
Arctan function	51.0	70.4	79.5	0.575

As the 2nd experiment, we measured the matching accuracy and processing time when the LS algorithm is used as the preclassification method before the QDTW algorithm. Based on Tables 2 and 3, the square function was excluded as a distance measure for the LS algorithm because of its lower matching accuracy. As shown in Tables 4 and 5, the proposed method greatly reduced the processing time compared with the QDTW method, while its MRR is the same as that of QDTW.

Table 4

The performance of the methods which combine LS and QDTW algorithm with PV files of 2009 MIR-QbSH corpus database.

Method	Processing time (ms)	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
QDTW without preclassification by LS	61.855	76.536	90.199	94.490	0.810
Preclassification by LS: LS (abs function) + QDTW	40.228	76.671	89.860	93.835	0.810
Preclassification by LS: LS (log function) + QDTW	43.169	76.649	89.973	94.038	0.810
Preclassification by LS: LS (arctan function) + QDTW	49.334	76.558	90.041	94.219	0.810

Table 5

The performance of the methods which combine LS and QDTW algorithm with AFA MIDI 100 database.

Method	Processing time (s)	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
QDTW without preclassification by LS	9.294	67.1	80.5	85.2	0.715
Preclassification by LS: LS (abs function) + QDTW	7.603	67.1	80.6	85.2	0.715
Preclassification by LS: LS (log function) + QDTW	8.205	67.1	80.5	85.2	0.715
Preclassification by LS: LS (arctan function) + QDTW	6.161	67.1	80.6	85.1	0.715

As the 3rd experiment, we compared the processing time and MRR of the original QDTW and the proposed method as a function of the threshold for preclassification by the LS method. If the matching distance obtained by the LS method is greater than the threshold, QDTW-based matching is not performed and the matching window is moved to the next position; otherwise, QDTW-based matching is performed. As the threshold increases, the matching distance obtained by the LS method falls below the threshold in more cases. Consequently, QDTW-based matching is performed more often, which enhances the MRR but increases the processing time. As shown in Figures 4 and 5, the processing time of the proposed method is much lower than that of QDTW while the MRR is maintained. Comparing Figures 4(a), 4(b), and 4(c) confirms that, for the 2009 MIR-QbSH corpus, preclassification based on the abs function of (4) performs best. Likewise, comparing Figures 5(a), 5(b), and 5(c) confirms that, for the AFA MIDI 100 database, preclassification based on the arctan function of (4) performs best.

Figure 4

The relationship between the matching accuracy and processing time with the 2009 MIR-QbSH corpus database in case of using the following: (a) abs function of (4); (b) log function of (4); (c) arctan function of (4).

Figure 5

The relationship between the matching accuracy and processing time with the AFA MIDI 100 database in case of using the following: (a) abs function of (4); (b) log function of (4); (c) arctan function of (4).

The predetermined threshold for the LS method was determined experimentally as the value that minimizes the processing time while maintaining the MRR (matching accuracy) of our method. The thresholds for Figures 4(a)–4(c) and 5(a)–5(c) are 3.2, 1.3, 1.5, 3.9, 1.5, and 1.5, respectively. At these thresholds, the processing time is minimal while the MRR of our method does not degrade. As shown in Figures 4 and 5, the thresholds differ according to the dataset and the measure of matching distance (equation (4)) used in the LS algorithm.
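This selection rule can be expressed as a small search over a threshold sweep. The sketch below is an assumption about how such a selection could be automated, not the procedure the authors report using; the name `pick_threshold`, the tuple layout of the sweep, and the `tolerance` parameter are hypothetical, and the numbers in the example are illustrative rather than measurements from the paper.

```python
from typing import Iterable, Optional, Tuple

# One sweep entry: (threshold, processing time, MRR).
SweepRow = Tuple[float, float, float]


def pick_threshold(sweep: Iterable[SweepRow],
                   baseline_mrr: float,
                   tolerance: float = 0.0005) -> Optional[float]:
    """Threshold with the lowest processing time whose MRR is maintained.

    `baseline_mrr` is the MRR of QDTW without preclassification. Returns
    None if no threshold in the sweep keeps the MRR within `tolerance`.
    """
    eligible = [row for row in sweep if row[2] >= baseline_mrr - tolerance]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row[1])[0]


# Illustrative sweep only (made-up values, not results from the paper).
example_sweep = [
    (1.0, 10.0, 0.780),
    (2.0, 20.0, 0.805),
    (3.0, 35.0, 0.810),
    (4.0, 50.0, 0.810),
]
chosen = pick_threshold(example_sweep, baseline_mrr=0.810)
```

In the illustrative sweep, the two lowest thresholds are rejected because the MRR drops, and of the remaining two the one with the shorter processing time is chosen.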

The above results of Tables 4 and 5 and Figures 4 and 5 were obtained without combining the two matching distances of LS and QDTW. As the last experiment, we compared the performance when the matching distances of the LS and QDTW algorithms are combined. Since the matching distance of the LS algorithm has already been calculated for preclassification and the processing time of score fusion by the MIN, MAX, PRODUCT, or SUM rule is almost 0 ms, combining the two matching distances does not increase the final processing time.

Tables 6 and 7 show the results of fusion of two matching distances. Based on the above results of Tables 4 and 5, the abs function-based LS algorithm was used for 2009 MIR-QbSH corpus database, and the arctan function-based LS algorithm was used for AFA MIDI 100 database.

Table 6

The results of fusion of two matching distances with 2009 MIR-QbSH corpus database.

Fusion method	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
MIN (proposed method)	79.178	89.770	93.428	0.819
MAX	75.903	88.031	92.480	0.797
PRODUCT	76.445	89.228	93.157	0.805
SUM	76.220	88.979	93.135	0.803

Table 7

The results of fusion of two matching distances with AFA MIDI 100 database.

Fusion method	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
MIN (proposed method)	69.6	80.6	85.2	0.725
MAX	58.8	74.9	80.4	0.636
PRODUCT	61.8	80.3	85.5	0.683
SUM	61.2	79.8	84.8	0.673

Tables 8 and 9 compare the performance of the proposed method with that of other methods on the 2009 MIR-QbSH corpus and the AFA MIDI 100 database, respectively. As shown in Table 8, with the 2009 MIR-QbSH corpus the Top 10 and Top 20 rates of the proposed method are slightly lower than those of QDTW and of QDTW with preclassification by LS (without combining the two matching distances). Except for this case, the accuracy of the proposed method is at least as high as that of the other methods in all cases, as shown in Tables 8 and 9. In most QbSH systems, accuracy is evaluated based on the MRR of (5) and the Top 1 rate. We can therefore confirm that the proposed method enhanced the matching accuracy compared with the other methods while reducing the processing time by more than 30% compared with QDTW. Although LS has the lowest processing time among them, it cannot be used as a single classifier because of its poor matching accuracy.
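The reduction in processing time can be checked from the per-query times reported in Tables 8 and 9 for QDTW and for the proposed method. The helper name `relative_reduction` is hypothetical; the input values are taken from those tables.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Reduction of `proposed` relative to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Processing time per query in ms: QDTW versus the proposed method.
mir_qbsh = relative_reduction(61.855, 40.228)   # Table 8
afa_midi = relative_reduction(9294.0, 6161.0)   # Table 9

print(f"2009 MIR-QbSH corpus: {mir_qbsh:.1f}%")  # prints 35.0%
print(f"AFA MIDI 100: {afa_midi:.1f}%")          # prints 33.7%
```

Both reductions exceed 30%, consistent with the figure quoted in the text.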

Table 8

Performance comparison of the proposed method with other single classifiers with the 2009 MIR-QbSH corpus database.

Method	Processing time per query (ms)	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
LS (abs function)	1.027	71.138	84.575	89.363	0.754
QDTW	61.855	76.536	90.199	94.490	0.810
Preclassification by LS (abs function) + QDTW (not combining two matching distances)	40.228	76.671	89.860	93.835	0.810
Preclassification by LS (abs function) + QDTW (combining two matching distances) (proposed method)	40.228	79.178	89.770	93.428	0.819

Table 9

Performance comparison of the proposed method with other single classifiers with the AFA MIDI 100 database.

Method	Processing time per query (ms)	Top 1 (%)	Top 10 (%)	Top 20 (%)	MRR
LS (arctan function)	130	51.0	70.4	79.5	0.575
QDTW	9,294	67.1	80.5	85.2	0.715
Preclassification by LS (arctan function) + QDTW (not combining two matching distances)	6,161	67.1	80.6	85.1	0.715
Preclassification by LS (arctan function) + QDTW (combining two matching distances) (proposed method)	6,161	69.6	80.6	85.2	0.725

4. Conclusions

In QbSH systems, DTW is typically adopted as a matcher. However, this method is computationally expensive, and a reduction in processing time is required for real-time QbSH systems. To overcome this problem, in this paper we proposed a fast QbSH system that combines LS algorithm and QDTW algorithm. The experimental results showed that the proposed method enhanced the matching accuracy and reduced the processing time compared to the result when the QDTW algorithm was used as single classifier.

In future work, we will compare the performance of our proposed method with other methods on a larger database and on various platforms, including mobile devices.

Footnotes

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2012R1A1A2038666) and in part by the Public Welfare and Safety Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2010-0020810).

References

1. Typke, Wiering, Veltkamp R. C. A survey of music information retrieval systems. Proceedings of the International Conference on Music Information Retrieval; September 2005; pp. 153–160.

2. Ghias, Logan, Chamberlin, Smith B. C. Query by humming: musical information retrieval in an audio database. Proceedings of ACM International Conference on Multimedia (MULTIMEDIA '95); November 1995; pp. 231–236. doi: 10.1145/217279.215273.

3. McNab R. J., Smith L. A., Witten I. H., Henderson C. L., Cunningham S. J. Towards the digital music library: tune retrieval from acoustic input. Proceedings of the 1st ACM International Conference on Digital Libraries; March 1996; pp. 11–18.

4. McNab R. J., Smith L. A., Bainbridge, Witten I. H. The New Zealand digital library melody index. D-Lib Magazine. 1997;3(5):4–15.

5. Prechelt, Typke. An interface for melody input. ACM Transactions on Computer-Human Interaction. 2001;8(2):133–149. doi: 10.1145/376929.376978.

6. Kornstadt. Themefinder: a web-based melodic search tool. Computing in Musicology. 1998;11:231–236.

7. Themefinder. http://www.themefinder.org/

8. Blackburn, DeRoure. A tool for content based navigation of music. Proceedings of ACM International Conference on Multimedia; 1998; pp. 361–368.

9. Ryynänen, Klapuri. Query by humming of midi and audio using locality sensitive hashing. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing; April 2008; pp. 2249–2252. doi: 10.1109/icassp.2008.4518093.

10. Typke, Giannopoulos, Veltkamp R. C., Wiering, Oostrum R. V. Using transportation distances for measuring melodic similarity. Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR '03); October 2003; Baltimore, Md, USA. pp. 107–114.

11. Jang J.-S. R., Lee H.-R. Hierarchical filtering method for content-based music retrieval via acoustic input. Proceedings of the ACM International Conference on Multimedia; October 2001; pp. 401–410.

12. Salamon, Rohrmeier. A quantitative evaluation of a two stage retrieval approach for a melodic query by example system. Proceedings of the 10th International Society for Music Information Retrieval Conference; October 2009; Kobe, Japan. pp. 255–260.

13. Wang, Huang, Liang. An effective and efficient method for query by humming system based on multi-similarity measurement fusion. Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '08); July 2008; Shanghai, China. pp. 471–475. doi: 10.1109/icalip.2008.4590167.

14. Liu, Yang, Yan. A top-down approach to melody match in pitch contour for query by humming. Proceedings of the International Symposium of Chinese Spoken Language Processing; 2006; pp. 669–680.

15. Jang J.-S. R., Gao M.-Y. A query-by-singing system based on dynamic programming. Proceedings of the International Workshop on Intelligent Systems Resolutions; 2000; pp. 85–89.

16. Nam G. P., Luong T. T. T., Nam H. H., Park K. R., Park S. J. Intelligent query by humming system based on score level fusion of multiple classifiers. EURASIP Journal on Advances in Signal Processing. 2011;2011, article 21:11. doi: 10.1186/1687-6180-2011-21.

17. Nam G. P., Park K. R., Park S.-J., Lee S.-P., Kim M.-Y. A new query-by-humming system based on the score level fusion of two classifiers. International Journal of Communication Systems. 2012;25(6):717–733. doi: 10.1002/dac.1187.

18. Song, Bae S. Y., Yoon. Mid-level music melody representation of polyphonic audio for query-by-humming system. Proceedings of the International Symposium on Music Information Retrieval; October 2002; Paris, France. pp. 133–139.

19. Krishnamoorthy, Bhatt, Srinivas, Kumar. Query by humming system for embedded platforms. Proceedings of the Annual IEEE India Conference; December 2010; pp. 1–5. doi: 10.1109/indcon.2010.5712695.

20. Han, Shi. An efficient approach to humming transcription for query-by-humming system. Proceedings of the 3rd International Congress on Image and Signal Processing (CISP '10); October 2010; Yantai, China. IEEE; pp. 3746–3749. doi: 10.1109/cisp.2010.5646801.

21. Kim, Park K. R., Park S.-J., Lee S.-P., Kim M. Y. Robust query-by-singing/humming system against background noise environments. IEEE Transactions on Consumer Electronics. 2011;57(2):720–725. doi: 10.1109/tce.2011.5955213.

22. Wang C.-C., Jang J.-S. R., Wang. An improved query by singing/humming system using melody and lyrics information. Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR '10); August 2010; pp. 45–50.

23. Keogh E. J., Pazzani M. J. Scaling up dynamic time warping to massive datasets. Principles of Data Mining and Knowledge Discovery: Third European Conference, PKDD'99, Prague, Czech Republic, September 15–18, 1999. Proceedings. Vol. 1704. Berlin, Germany: Springer; 1999. pp. 1–11. (Lecture Notes in Computer Science). doi: 10.1007/978-3-540-48247-5_1.

24. Youssef A. M., Abdel-Galil T. K., El-Saadany E. F., Salama M. M. A. Disturbance classification utilizing dynamic time warping classifier. IEEE Transactions on Power Delivery. 2004;19(1):272–278. doi: 10.1109/tpwrd.2003.820178.